JP2008058727A

JP2008058727A - Speech coding device

Info

Publication number: JP2008058727A
Application number: JP2006236803A
Authority: JP
Inventors: Hirokazu Takeuchi; 広和竹内
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-08-31
Filing date: 2006-08-31
Publication date: 2008-03-13

Abstract

PROBLEM TO BE SOLVED: To provide a speech coding device that performs band-extension coding with reduced noise for a speech signal whose low-frequency component is absent or very small.
SOLUTION: When a low-frequency signal ratio detection part 25 in an extension band encoder part 21 of the speech coding device detects a speech signal whose low-frequency component is absent or very small, a coding system selection part 24a makes an HFC (High Frequency Coding) coding part 29 code the speech signal of the extension band. The HFC coding part 29 performs parameter coding or waveform coding according to the tonality of the speech signal of the extension band.
COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a speech encoding apparatus, and more particularly to the encoding of high-frequency audio signals.

It is known to encode an audio signal by a band extension encoding method, that is, to encode the low-frequency signal, which is the basic portion, and the high-frequency signal, which is the extension portion, by different encoding methods. Because the high-frequency signal and the low-frequency signal are correlated, this method encodes the high-frequency signal by encoding power, noise level, and tone signal components as the correlation information. This allows highly efficient encoding of audio signals.

For example, it is known that, when an audio signal sampled at a certain sampling frequency is encoded by the AAC (Advanced Audio Coding) method, encoding by the SBR (Spectral Band Replication; registered trademark) method (hereinafter, the SBR method) is used in combination.

When an audio signal is encoded by the AAC method and the SBR method, the low-frequency signal, obtained by converting the audio signal into a signal sampled at, for example, half the above sampling frequency, is encoded by the AAC method.

The high-frequency signal, that is, everything other than the low band, is encoded by the SBR method. Specifically, the signal is converted into subband signals by QMF (Quadrature Mirror Filter) analysis. First, grid information that delimits the envelope information is generated based on the result of transient detection. The power information of each delimited region (segment) of subband samples is then encoded as envelope information. Signals that cannot be fully represented by the envelope information are encoded as additional information.

Finally, the signal encoded by the AAC method and the signal encoded by the SBR method are arranged into a predetermined format, and the encoding ends (see, for example, Patent Document 1).

In the decoder, each sample signal encoded by the SBR method is reconstructed on the basis of a signal copied from the low-frequency subband signals obtained by decoding the AAC-encoded information.
Patent Document 1: JP-T-2005-510772 (pages 2-6, FIG. 9)

However, with the method disclosed in Patent Document 1, when the low-frequency signal is absent or very small, the high-frequency signal cannot be represented and encoded accurately, and noise may be perceived when the encoded audio signal is decoded.

The present invention has been made to solve this problem, and its object is to provide a speech encoding apparatus that performs band extension encoding with reduced noise on an audio signal whose low-frequency signal is absent or very small.

To achieve the above object, a speech encoding apparatus according to the present invention encodes an audio signal by encoding a low-frequency signal and a high-frequency signal of the audio signal using different methods, and comprises low-frequency signal encoding means for encoding the low-frequency signal and high-frequency signal encoding means for encoding the high-frequency signal by encoding correlation information of the high-frequency signal with respect to the low-frequency signal. The high-frequency signal encoding means calculates a ratio obtained by dividing the power of the high-frequency signal by the power of the low-frequency signal and, when the calculated ratio is equal to or greater than a predetermined threshold, multiplies the high-frequency signal by a gain that is a monotonically decreasing function of the ratio before encoding it.

Further, a speech encoding apparatus according to the present invention encodes an audio signal by encoding a low-frequency signal and a high-frequency signal of the audio signal using different methods, and comprises low-frequency signal encoding means for encoding the low-frequency signal, and high-frequency signal encoding means that calculates a ratio obtained by dividing the power of the high-frequency signal by the power of the low-frequency signal, encodes the high-frequency signal by encoding correlation information of the high-frequency signal with respect to the low-frequency signal when the calculated ratio is equal to or less than a predetermined threshold, and encodes the high-frequency signal irrespective of its correlation with the low-frequency signal when the calculated ratio exceeds the predetermined threshold.

According to the present invention, it is possible to provide a speech encoding apparatus that performs band extension encoding with reduced noise on an audio signal whose low-frequency signal is absent or very small.

Embodiments of a speech encoding apparatus according to the present invention are described below with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus according to the first embodiment of the present invention. This apparatus receives and encodes a PCM audio signal 11 and generates an encoded audio signal 12. It comprises an extension band encoder unit 21 that encodes the high-frequency signal in the PCM audio signal 11, a low sampling frequency conversion unit 31, an AAC encoder unit 32 that encodes the low-frequency signal in the PCM audio signal 11, and a stream formatter unit 41.

FIG. 2 is a block diagram showing the configuration of the extension band encoder unit 21. The extension band encoder unit 21 comprises a QMF analysis unit 22 that receives the PCM audio signal 11, a grid information generation unit 23, a gain control unit 24, a low-frequency signal ratio detection unit 25, an additional parameter calculation unit 26, an envelope information calculation unit 27, and an extension band stream formatter unit 28 connected to the stream formatter unit 41.

FIG. 3 is a block diagram showing the configuration of the low-frequency signal ratio detection unit 25. The low-frequency signal ratio detection unit 25 comprises a low band power calculation unit 25a connected to the QMF analysis unit 22, a high band power calculation unit 25b connected to the QMF analysis unit 22, and a power ratio calculation unit 25c connected to the gain control unit 24.

The operation of each unit of the speech encoding apparatus according to the first embodiment, configured as described above, is described with reference to FIGS. 1 to 3.

The QMF analysis unit 22 converts the received PCM audio signal 11 into the frequency domain and generates subband signals. The grid information generation unit 23 generates grid information that delimits the envelope information, based on the result of transient detection on the subband signals generated by the QMF analysis unit 22.

The low-frequency signal ratio detection unit 25 receives the subband signals generated by the QMF analysis unit 22 and detects the ratio between the power of the low-frequency signal and the power of the high-frequency signal. That is, it generates the ratio obtained by dividing the power of the high-frequency signal (Phigh) by the power of the low-frequency signal (Plow).

The gain control unit 24 uses the ratio (Phigh/Plow) detected by the low-frequency signal ratio detection unit 25, that is, the ratio of the power of the high-frequency signal encoded by the extension band encoder unit 21 (Phigh) to the power of the low-frequency signal encoded by the AAC encoder unit 32 (Plow). Based on this ratio, it generates a signal by applying a predetermined gain to the high-frequency subband signals generated by the QMF analysis unit 22.

Here, the gain is 1 if the ratio is equal to or less than a predetermined value. If the ratio exceeds the predetermined value, the gain is a monotonically decreasing function of the ratio that takes a value less than 1. That is, for a constant high-frequency signal power, the smaller the low-frequency signal power, the smaller the gain, in other words, the greater the attenuation.
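The gain rule above can be sketched as follows. This is only an illustration under stated assumptions: the description fixes the shape of the function (1 up to the predetermined value, monotonically decreasing and below 1 beyond it) but not its formula, so the name `high_band_gain`, the default `threshold` of 4.0, and the power-law exponent `slope` are hypothetical choices.

```python
def high_band_gain(power_ratio: float, threshold: float = 4.0, slope: float = 0.5) -> float:
    """Gain applied to the high-frequency subband signals.

    threshold and slope are illustrative assumptions; the description only
    requires a gain of 1 up to the threshold and a monotonically decreasing
    gain below 1 beyond it.
    """
    if power_ratio <= threshold:
        return 1.0
    # (threshold / ratio) ** slope is below 1 and strictly decreasing
    # for ratio > threshold, so it satisfies the stated shape.
    return (threshold / power_ratio) ** slope


if __name__ == "__main__":
    for ratio in (1.0, 4.0, 16.0, 64.0):
        print(ratio, high_band_gain(ratio))
```

Any other function with the same shape (for example a piecewise-linear curve in the log domain) would fit the description equally well.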

The operation of each unit of the low-frequency signal ratio detection unit 25 is described with reference to FIG. 3. The low band power calculation unit 25a calculates the power (Plow) of the low-frequency subband signals among the subband signals generated by the QMF analysis unit 22. The high band power calculation unit 25b calculates the power (Phigh) of the high-frequency subband signals among the subband signals generated by the QMF analysis unit 22. The power ratio calculation unit 25c then generates the power ratio (Phigh/Plow) by dividing the high-frequency signal power (Phigh) calculated by the high band power calculation unit 25b by the low-frequency signal power (Plow) calculated by the low band power calculation unit 25a.

Note that the boundary between the low-frequency and high-frequency signals in the frequency domain, that is, the boundary between AAC encoding and extension band encoding, depends on the sampling frequency and the encoding rate. The power ratio calculation unit 25c therefore corrects the ratio by multiplying it by the ratio of the number of low-frequency subbands Nlow to the number of high-frequency subbands Nhigh (Nlow/Nhigh). The corrected power ratio is thus Pr = (Phigh/Plow) × (Nlow/Nhigh).
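The corrected ratio Pr = (Phigh/Plow) × (Nlow/Nhigh) can be illustrated with a short sketch. It assumes that each band power is the sum of squared magnitudes over all samples in the band, in which case the correction factor turns the ratio of totals into a ratio of per-subband averages. The function names and the small `eps` guard against division by zero are hypothetical; the description does not say how an exactly zero Plow is handled.

```python
from typing import Sequence


def band_power(subbands: Sequence[Sequence[complex]]) -> float:
    """Total power: sum of |x|^2 over every sample of the given subbands."""
    return sum(abs(x) ** 2 for band in subbands for x in band)


def corrected_power_ratio(subbands: Sequence[Sequence[complex]], n_low: int,
                          eps: float = 1e-12) -> float:
    """Pr = (Phigh / Plow) * (Nlow / Nhigh) for subbands ordered low to high.

    n_low is the number of subbands handled by the low-band (AAC) coder.
    eps is an assumed guard so that a silent low band does not divide by zero.
    """
    n_high = len(subbands) - n_low
    if n_low <= 0 or n_high <= 0:
        raise ValueError("both bands need at least one subband")
    p_low = band_power(subbands[:n_low])
    p_high = band_power(subbands[n_low:])
    return (p_high / max(p_low, eps)) * (n_low / n_high)


if __name__ == "__main__":
    # Two low subbands with amplitude 1, four high subbands with amplitude 2.
    example = [[1, 1], [1, 1], [2, 2], [2, 2], [2, 2], [2, 2]]
    print(corrected_power_ratio(example, n_low=2))
```

In the example the total powers are 4 and 32, and the correction factor 2/4 reduces the raw ratio of 8 to the per-subband ratio of 4.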

Returning to FIG. 2, the description of the units of the extension band encoder unit 21 continues. The additional parameter calculation unit 26 examines the signals sent from the gain control unit 24, in the subbands delimited by the grid information generated by the grid information generation unit 23, and detects signals that cannot be represented by the envelope information. It then obtains additional information, such as a noise level, that represents those signals and generates parameters indicating the additional information.

The envelope information calculation unit 27 takes the subband signals delimited by the grid information generated by the grid information generation unit 23 and encodes, as envelope information, the power information of each region (segment) of high-frequency subband samples.
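As a rough illustration of per-segment power information, the sketch below computes the mean power of each subband within each time segment delimited by a grid. The data layout (`subbands[k][t]`), the representation of the grid as a list of time-slot borders, and the function name are assumptions; quantization and entropy coding of the resulting values are omitted.

```python
from typing import List, Sequence


def envelope_powers(subbands: Sequence[Sequence[complex]],
                    grid: Sequence[int]) -> List[List[float]]:
    """Mean power per segment and subband.

    subbands[k][t] is the sample of subband k at time slot t (assumed layout).
    grid holds increasing time-slot borders, e.g. [0, 2, 4] for two segments.
    """
    envelope = []
    for start, stop in zip(grid, grid[1:]):
        length = stop - start
        envelope.append([
            sum(abs(x) ** 2 for x in band[start:stop]) / length
            for band in subbands
        ])
    return envelope


if __name__ == "__main__":
    bands = [[1, 1, 2, 2], [0, 0, 3, 3]]
    print(envelope_powers(bands, grid=[0, 2, 4]))
```

A transient would typically cause the grid to place a border near the onset, so that the power before and after it is described separately.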

The extension band stream formatter unit 28 receives the grid information generated by the grid information generation unit 23, the parameters indicating the additional information generated by the additional parameter calculation unit 26, and the high-frequency audio signal encoded by the envelope information calculation unit 27, and arranges this extension band (that is, high-frequency) encoding information into a stream of a predetermined format. The encoded high-frequency audio signal formatted by the extension band stream formatter unit 28 is sent to the stream formatter unit 41.

The low sampling frequency conversion unit 31 receives the PCM audio signal 11 and generates the low-frequency signal contained in it by downsampling. For example, the usual combination of AAC encoding and SBR encoding generates a signal sampled at half the frequency at which the input signal was sampled, but what matters in the present invention is that a signal with a lower sampling frequency is generated, and the factor is not limited to one half. The AAC encoder unit 32 receives the low-frequency audio signal generated by the low sampling frequency conversion unit 31, encodes it by the AAC method, and sends the encoded signal.

The stream formatter unit 41 receives the encoded high-frequency audio signal formatted by the extension band stream formatter unit 28 and the low-frequency audio signal encoded by the AAC encoder unit 32, arranges them into a stream of a predetermined format, and sends the result as the encoded audio signal 12.

As described above, the smaller the power of the low-frequency audio signal relative to the high-frequency audio signal, as detected by the low-frequency signal ratio detection unit 25, the greater the attenuation applied to the high-frequency audio signal before encoding. The reason is as follows. When the power of the low-frequency audio signal is small, the high-frequency signal obtained by copying it is produced by envelope correction (gain increase) based on envelope information with low frequency resolution, so the distortion becomes large and is perceived as noise, particularly when the correlation between the low-frequency and high-frequency signals is low. In addition, because the low-frequency signal, which is important to human hearing, is small and provides no masking of high-frequency distortion, noise is perceived more easily. The attenuation therefore makes a user who listens to the signal decoded from the encoded audio signal 12 less likely to be disturbed by this noise.

(Second embodiment)
The speech encoding apparatus according to the second embodiment of the present invention differs from that of the first embodiment in the extension band encoder unit 21, so the extension band encoder unit 21 according to the second embodiment is described here. FIG. 4 is a block diagram showing its configuration. Parts of the extension band encoder unit 21 according to the second embodiment that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

This extension band encoder unit 21 comprises a QMF analysis unit 22 that receives the PCM audio signal 11, a grid information generation unit 23, an encoding scheme selection unit 24a, a low-frequency signal ratio detection unit 25, an additional parameter calculation unit 26, an envelope information calculation unit 27, an extension band stream formatter unit 28a connected to the stream formatter unit 41, and an HFC (High Frequency Coding) encoding unit 29.

That is, compared with the extension band encoder unit 21 according to the first embodiment, this extension band encoder unit 21 has the encoding scheme selection unit 24a in place of the gain control unit 24 and the extension band stream formatter unit 28a in place of the extension band stream formatter unit 28, and additionally has the HFC encoding unit 29.

FIG. 5 is a block diagram showing the configuration of the HFC encoding unit 29. The HFC encoding unit 29 comprises a tonality calculation unit 29a connected to the encoding scheme selection unit 24a and the grid information generation unit 23, an encoding mode selection unit 29b connected to the grid information generation unit 23, a parameter encoding unit 29c connected to the extension band stream formatter unit 28a, and a waveform encoding unit 29d connected to the extension band stream formatter unit 28a.

The operation of the extension band encoder unit 21 according to the second embodiment, configured as described above, is described with reference to FIGS. 4 to 6.

The encoding scheme selection unit 24a receives the power ratio (Pr) generated by the low-frequency signal ratio detection unit 25. If this ratio is equal to or less than a predetermined threshold, it selects encoding of the high-frequency signal by the additional parameter calculation unit 26 and the envelope information calculation unit 27, and sends the high-frequency subband signals generated by the QMF analysis unit 22 to these units for encoding.

If, on the other hand, the power ratio (Pr) exceeds the predetermined threshold, it selects encoding of the high-frequency signal by the HFC encoding unit 29 and sends the high-frequency subband signals generated by the QMF analysis unit 22 to the HFC encoding unit 29 for encoding.
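The selection performed by the encoding scheme selection unit 24a amounts to a single comparison, sketched below. The enum, the function name, and the default threshold of 4.0 are hypothetical; the description only speaks of a predetermined threshold.

```python
from enum import Enum


class HighBandScheme(Enum):
    PARAMETRIC = "envelope information + additional parameters (units 26 and 27)"
    HFC = "high frequency coding (unit 29)"


def select_scheme(power_ratio: float, threshold: float = 4.0) -> HighBandScheme:
    """Choose the high-band coder from the corrected power ratio Pr.

    The default threshold is an illustrative assumption.
    """
    if power_ratio <= threshold:
        return HighBandScheme.PARAMETRIC
    return HighBandScheme.HFC


if __name__ == "__main__":
    for pr in (0.5, 4.0, 40.0):
        print(pr, select_scheme(pr).name)
```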

The tonality calculation unit 29a receives the high-frequency subband signals generated by the QMF analysis unit 22 and calculates the tonality of each subband signal in the regions delimited by the grid generated by the grid information generation unit 23. The tonality is calculated using, for example, the linear prediction gain. The tonality values are weighted according to power so that the characteristics of higher-power signals are more readily reflected. The weighting is performed by normalizing with the power of the subband signal that has the maximum power within the scale factor band.
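One way to read this step is sketched below: the linear prediction gain (signal energy divided by prediction residual energy) serves as the raw tonality, and each value is scaled by the subband's power relative to the strongest subband in the same scale factor band. Several details are assumptions not given in the description: real-valued samples, a second-order predictor obtained with the autocorrelation method and the Levinson-Durbin recursion, multiplication as the form of the weighting, and all function names.

```python
import math
import random


def prediction_gain(x, order=2, eps=1e-12):
    """Linear prediction gain: signal energy divided by residual energy.

    Autocorrelation method with the Levinson-Durbin recursion; real-valued
    samples and order=2 are assumptions made for brevity.
    """
    n = len(x)
    r = [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(order + 1)]
    if r[0] <= eps:
        return 1.0  # silent band: nothing to predict
    err = r[0]
    a = []  # predictor coefficients of the current order
    for m in range(order):
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= 1.0 - k * k
        if err <= eps:
            err = eps
            break
    return r[0] / err


def weighted_tonalities(scale_factor_band):
    """Tonality of each subband, weighted by power relative to the band maximum."""
    powers = [sum(v * v for v in band) for band in scale_factor_band]
    p_max = max(powers)
    if p_max <= 0.0:
        return [0.0] * len(powers)
    return [prediction_gain(band) * (p / p_max)
            for band, p in zip(scale_factor_band, powers)]


if __name__ == "__main__":
    tone = [math.sin(0.3 * t) for t in range(64)]
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    print(prediction_gain(tone), prediction_gain(noise))
    print(weighted_tonalities([tone, [0.1 * v for v in tone]]))
```

A sinusoid is highly predictable and yields a large gain, whereas a noise-like sequence yields a gain close to 1. The weighting keeps a weak but tonal subband from dominating the decision for its scale factor band.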

The encoding mode selection unit 29b receives the tonality calculated by the tonality calculation unit 29a and selects an encoding mode for each scale factor band, based on the tonality of each subband signal within the scale factor band in the region delimited by the grid generated by the grid information generation unit 23. That is, the decision is based on whether the highest tonality value among the subband signals in the scale factor band is equal to or greater than a predetermined threshold.

If that tonality is equal to or less than the predetermined threshold, the scale factor band is judged to be a noise-like signal, and the encoding mode selection unit 29b has the parameter encoding unit 29c encode parameter information, including power information, for the signal in the region delimited by the grid. If the tonality exceeds the predetermined threshold, it has the waveform encoding unit 29d encode the signal.

The parameter encoding unit 29c encodes, for each segment, a flag indicating noise-like signal encoding together with the power information. The waveform encoding unit 29d waveform-encodes each subband sample. In waveform encoding, a scale factor (quantization step) is determined for each segment according to the target bit rate of the extension band stream, in other words, according to the target amount of information sent by the extension band stream formatter unit 28a. The difference values of the quantized subband samples are then Huffman coded.
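The waveform encoding path (quantize with a step size, take difference values, Huffman code them) can be sketched as follows. This is a simplified illustration with several assumptions: real-valued samples, differences taken between consecutive samples in a list, a fixed `step` rather than one derived from a target bit rate, and a Huffman code built from the data itself, whereas a real codec would more likely use predefined tables. All names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def quantize_and_delta(samples, step):
    """Quantize with the given step, then keep the first value and differences."""
    q = [round(s / step) for s in samples]
    if not q:
        return []
    return [q[0]] + [b - a for a, b in zip(q, q[1:])]


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    deltas = quantize_and_delta([0.1, 0.9, 2.1, 2.0, 1.1], step=1.0)
    code = huffman_code(deltas)
    bits = "".join(code[d] for d in deltas)
    print(deltas, code, bits)
```

A larger `step` yields smaller quantized values and fewer bits at the cost of more quantization error, which is the trade-off the scale factor controls.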

The extension band stream formatter unit 28a receives the grid information generated by the grid information generation unit 23, the parameters indicating the additional information generated by the additional parameter calculation unit 26, and the high-frequency audio signal encoded by the envelope information calculation unit 27, and in addition the parameter information encoded by the parameter encoding unit 29c and the high-frequency audio signal encoded by the waveform encoding unit 29d. It arranges the received signals into a stream of a predetermined format carrying the encoded extension band, that is, the encoded high-frequency audio signal, and sends it.

FIG. 6 shows an example of how each subband is encoded by one of the methods in the speech encoding apparatus according to the second embodiment. Here, the signal is assumed to be divided into 16 subbands. The six low-frequency subbands are AAC-encoded by the AAC encoder unit 32. In FIG. 6, these subbands are hatched from upper right to lower left.

The high-frequency subbands are divided, in order of increasing frequency, into a first scale factor band (sfb1) with three subbands, a second scale factor band (sfb2) with three subbands, and a third scale factor band (sfb3) with four subbands.

Within each scale factor band, the encoding mode selection unit 29b identifies the subband with the highest tonality, and that tonality is compared with the threshold for the scale factor band. In FIG. 6, the subband with the highest tonality in each scale factor band is hatched from upper left to lower right.

If that tonality is equal to or less than the predetermined threshold, the signal of the scale factor band is encoded by the parameter encoding unit 29c. If the tonality exceeds the predetermined threshold, it is encoded by the waveform encoding unit 29d.

In the example shown in FIG. 6, the maximum tonality in sfb1 exceeds the threshold TH1 for sfb1 (tonal(max) > TH1), so the signal of sfb1 is encoded by the waveform encoding unit 29d. Likewise, the maximum tonality in sfb2 exceeds the threshold TH2 for sfb2 (tonal(max) > TH2), so the signal of sfb2 is encoded by the waveform encoding unit 29d.

The maximum tonality in sfb3 is below the threshold TH3 for sfb3 (tonal(max) < TH3), so the signal of sfb3 is encoded by the parameter encoding unit 29c as parameter information including power information. Taking auditory characteristics into account, the thresholds are made larger toward higher frequencies (TH1 < TH2 < TH3) so that parameter encoding is selected more readily at higher frequencies.
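The per-band decision in this example can be reproduced with a few lines. The tonality values and thresholds below are invented for illustration and chosen only so that the outcome matches FIG. 6 as described (sfb1 and sfb2 waveform-encoded, sfb3 parameter-encoded, with TH1 < TH2 < TH3); the function name is hypothetical.

```python
from typing import List, Sequence


def select_modes(tonalities_per_sfb: Sequence[Sequence[float]],
                 thresholds: Sequence[float]) -> List[str]:
    """Pick 'waveform' or 'parametric' for each scale factor band.

    A band is waveform-encoded when its highest subband tonality exceeds
    that band's threshold; otherwise it is treated as noise-like.
    """
    modes = []
    for tonalities, threshold in zip(tonalities_per_sfb, thresholds):
        modes.append("waveform" if max(tonalities) > threshold else "parametric")
    return modes


if __name__ == "__main__":
    # Illustrative values only: 3 + 3 + 4 high-band subbands as in FIG. 6.
    tonalities = [
        [0.2, 0.9, 0.3],       # sfb1
        [0.8, 0.4, 0.1],       # sfb2
        [0.3, 0.5, 0.2, 0.4],  # sfb3
    ]
    thresholds = [0.5, 0.6, 0.7]  # TH1 < TH2 < TH3
    print(select_modes(tonalities, thresholds))
```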

Comparing the operation of this extension band encoder unit 21 with that of the extension band encoder unit 21 according to the first embodiment: when the power ratio (Pr) is high, tonal signals, for which distortion is readily perceived, are waveform-encoded by the waveform encoding unit 29d, so the high-frequency signal can be encoded with low distortion.

In that case, the code produced by the extension band encoder unit 21 may require many bits. However, a high ratio means that the low-frequency signal is absent or very small, so few bits are allocated to the low-frequency signal produced by the AAC encoder unit 32. As a result, the number of bits of the encoded audio signal 12 can be kept from becoming excessive.

(Other embodiments)
The speech encoding apparatus according to the embodiments of the present invention may be a computer that operates under the control of a program. The present invention can of course be applied to any apparatus that encodes audio signals. Needless to say, the audio signal here includes human voice, musical sound, and any other sound. The elements described in the above embodiments may also be combined as appropriate. The present invention is not limited to the above configurations, and various modifications are possible.

FIG. 1: Block diagram showing the configuration of the speech encoding apparatus according to the first embodiment of the present invention.
FIG. 2: Block diagram showing the configuration of the extension band encoder unit according to the first embodiment of the present invention.
FIG. 3: Block diagram showing the configuration of the low-frequency signal ratio detection unit according to the first embodiment of the present invention.
FIG. 4: Block diagram showing the configuration of the extension band encoder unit according to the second embodiment of the present invention.
FIG. 5: Block diagram showing the configuration of the HFC encoding unit according to the second embodiment of the present invention.
FIG. 6: Diagram showing an example of the selection of the encoding method for each subband according to the second embodiment of the present invention.

Explanation of symbols

11 PCM audio signal
12 Encoded audio signal
21 Extension band encoder unit
22 QMF analysis unit
23 Grid information generation unit
24 Gain control unit
24a Coding method selection unit
25 Low-frequency signal ratio detection unit
25a Low-frequency band power calculation unit
25b High-frequency band power calculation unit
25c Power ratio calculation unit
26 Additional parameter calculation unit
27 Envelope information calculation unit
28, 28a Extension band stream formatter unit
29 HFC encoding unit
29a Tonality calculation unit
29b Encoding mode selection unit
29c Parameter encoding unit
29d Waveform encoding unit
31 Low sampling frequency conversion unit
32 AAC encoder unit
41 Stream formatter unit

Claims

A speech encoding apparatus that encodes an audio signal by encoding a low-frequency signal and a high-frequency signal of the audio signal using different methods, the apparatus comprising:
low-frequency signal encoding means for encoding the low-frequency signal; and
high-frequency signal encoding means for encoding the high-frequency signal by encoding correlation information of the high-frequency signal with respect to the low-frequency signal,
wherein the high-frequency signal encoding means calculates a ratio obtained by dividing the power of the high-frequency signal by the power of the low-frequency signal and, when the calculated ratio is equal to or greater than a predetermined threshold, multiplies the high-frequency signal by a gain that is a monotonically decreasing function of the ratio and then encodes it.

A speech encoding apparatus that encodes an audio signal by encoding a low-frequency signal and a high-frequency signal of the audio signal using different methods, the apparatus comprising:
low-frequency signal encoding means for encoding the low-frequency signal; and
high-frequency signal encoding means for calculating a ratio obtained by dividing the power of the high-frequency signal by the power of the low-frequency signal, encoding the high-frequency signal by encoding correlation information of the high-frequency signal with respect to the low-frequency signal when the calculated ratio is equal to or less than a predetermined threshold, and encoding the high-frequency signal independently of its correlation with the low-frequency signal when the calculated ratio exceeds the predetermined threshold.

The speech encoding apparatus according to claim 2, wherein, when the calculated ratio exceeds the predetermined threshold, the high-frequency signal encoding means divides the high-frequency signal into a plurality of scale factor bands, calculates a tonality for each of the scale factor bands, parameter-encodes the signal included in a scale factor band when the calculated tonality is equal to or less than a tonality threshold, and waveform-encodes the signal included in the scale factor band when the calculated tonality exceeds the tonality threshold.

The speech encoding apparatus according to claim 3, wherein the tonality for each scale factor band is the maximum of the tonalities of the subband signals included in the scale factor band.

The speech encoding apparatus according to claim 3, wherein the tonality threshold takes a larger value for a higher scale factor band.

The speech coding apparatus according to claim 3, wherein the parameter coding is coding of a parameter including power information of the signal.