JP2820117B2 - Audio coding device

Info

Publication number: JP2820117B2
Application number: JP8134812A
Authority: JP
Inventors: 俊之石野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-05-29
Filing date: 1996-05-29
Publication date: 1998-11-05
Anticipated expiration: 2016-05-29
Also published as: JPH09321628A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a speech coding apparatus, and more particularly to a speech coding apparatus that uses psychoacoustic analysis.

[0002]

[Description of the Related Art] FIG. 3 is a block diagram showing the configuration of a conventional, general speech coding apparatus.

[0003] Conventionally, speech coding schemes that use a psychoacoustic analysis function encode on a frame-by-frame basis. In this speech coding apparatus 20, the frequency division filter bank 11 divides the input audio signal data of the Nth frame into frequency-domain components. Meanwhile, as shown in FIG. 4, which illustrates the structure of the input audio signal data, when the data of the Nth frame is encoded, the psychoacoustic analysis unit 17 feeds the input audio signal data of region A, spanning from the (N-i)th frame to the Nth frame, into the spectrum calculator 14 for frequency analysis. Taking into account this frequency analysis result and the masking effect of human hearing, the masking curve predictor 15 calculates a masking curve. Based on this masking curve, the quantization step width predictor 16 predicts the quantization step width, and the quantizer 12 quantizes the data output from the frequency division filter bank 11 with the predicted quantization step width.
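The frame selection of the conventional scheme (region A, frames N-i through N) can be sketched as follows. This is a minimal illustration only: it assumes frames are held as a Python list of equal-length sample lists, and the function name `region_a` and the toy data are hypothetical, not taken from the patent.

```python
def region_a(frames, n_index, i):
    """Return the samples of frames (N-i)..N, the conventional analysis window.

    frames  : list of frames, each a list of samples (hypothetical layout)
    n_index : index N of the frame being encoded
    i       : number of past frames included in the analysis
    """
    if i < 0 or n_index - i < 0 or n_index >= len(frames):
        raise ValueError("window falls outside the available frames")
    window = []
    for frame in frames[n_index - i : n_index + 1]:
        window.extend(frame)
    return window


# Toy data: 5 frames of 4 samples each; frame f holds the values 4f..4f+3.
frames = [[4 * f + s for s in range(4)] for f in range(5)]
window = region_a(frames, n_index=3, i=1)   # frames 2 and 3 only
```

Note that the window never reaches past frame N, so nothing that follows the frame being encoded can influence the analysis.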

[0004] As a practical example, consider the MPEG audio coding scheme, part of the MPEG standard for high-efficiency compression of moving pictures drawn up by the working group ISO/IEC JTC1/SC29/WG11 (MPEG; Moving Picture Experts Group) of the International Organization for Standardization. In MPEG Audio Layer I/II/III, when the Nth frame of the audio input signal is encoded, the audio signal data input to the psychoacoustic analysis unit consists of the data of the Nth frame and the data of the (N-1)th frame. Other coding schemes that use psychoacoustic analysis include the ATRAC audio coding scheme used in the MD (MiniDisc) and the PASC audio coding scheme used in the DCC (Digital Compact Cassette).

[0005] The masking effect and how it is used are described here. Masking comprises simultaneous masking and temporal masking, and temporal masking in turn comprises forward masking and backward masking.

[0006] Simultaneous masking refers to the case where the masking sound (masker) and the masked sound (maskee) are presented at the same time.

[0007] Within temporal masking, forward masking is the masking of a following sound by a temporally preceding sound, and backward masking is the masking of a preceding sound by a following sound. The respective masking amounts are given in the paper "Auditory Psychology and Audio" published by Kuwano of Osaka University in JAS Journal, June 1993 issue, pages 13-25. An excerpt is shown below as Table 1.

[0008]

[Table 1]

[0009] From the masking curve predicted by the masking curve predictor 15, the quantization step width predictor 16 assigns a coarse quantization step width to frequency component data for which the masking amount in human hearing is large and a fine quantization step width to frequency component data for which the masking amount is small, and outputs the resulting step widths.
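The coarse-versus-fine allocation described above can be illustrated with a small sketch. The patent does not specify how a masking amount maps to a step width, so the rule used here (the step width doubles for every 6 dB of masking) is purely an assumption, as are the function names and the sample values.

```python
def step_widths(masking_db, base_step=1.0, db_per_doubling=6.0):
    """Map a masking curve (dB per band) to quantization step widths.

    Assumption: the step width doubles for every `db_per_doubling` dB of
    masking, so heavily masked bands get coarse steps and lightly masked
    bands get fine steps. The patent does not specify this mapping.
    """
    return [base_step * 2.0 ** (m / db_per_doubling) for m in masking_db]


def quantize(values, steps):
    """Uniformly quantize each band value with its own step width."""
    return [round(v / s) for v, s in zip(values, steps)]


masking_db = [0.0, 6.0, 12.0]           # hypothetical masking amounts per band
steps = step_widths(masking_db)          # coarser steps for more masking
codes = quantize([10.0, 10.0, 12.0], steps)
```

Bands with more masking end up with larger steps and therefore smaller code values, which is where the bit savings come from.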

[0010]

[Problems to Be Solved by the Invention] When encoding the data of the Nth frame, the conventional speech coding apparatus described above performs psychoacoustic analysis using only data that precedes the frame being encoded, so it obtains the masking curve using only the forward masking effect. The resulting masking curve therefore does not account for all of the data that is actually masked. It cannot be said to be the optimum masking curve, and coding efficiency is poor.

[0011] An object of the present invention is to provide a speech coding apparatus that achieves higher coding efficiency with an equivalent amount of computation.

[0012]

[Means for Solving the Problems] The speech coding apparatus of the present invention comprises: frequency division means that converts input audio signal data, which is divided into a plurality of frames of fixed length and input via an input terminal, into frequency-divided data frame by frame; psychoacoustic analysis means that receives the plurality of fixed-length frames, performs spectrum analysis on each frame, and, for the latest frame, calculates a quantization step width using the spectrum analysis results of the latest frame and the i (i = 1, 2, ... n) frames preceding it together with human auditory characteristics including the masking effect; quantization means that quantizes the data frequency-divided by the frequency division means with the quantization step width calculated by the psychoacoustic analysis means; and multiplexing means that multiplexes the quantized data produced by the quantization means into a coded bit stream. In this apparatus, input audio signal data storage means that temporarily stores the input audio signal data is provided between the input terminal and both the frequency division means and the psychoacoustic analysis means. The psychoacoustic analysis means receives from the input audio signal data storage means the i frames that precede and follow, and thus sandwich, the frame whose quantization step width is to be calculated, calculates the quantization step width using the spectrum analysis result for that frame and human auditory characteristics including the masking effect, and outputs it to the quantization means.

[0013] In the speech coding apparatus of the present invention, the frequency division means may be a subband division filter bank.

[0014] In the speech coding apparatus of the present invention, the frequency division means may use the modified discrete cosine transform (MDCT).

[0015] In the speech coding apparatus of the present invention, the psychoacoustic analysis means may comprise a spectrum calculator that receives a plurality of fixed-length frames from the input audio signal data storage means and performs frequency analysis on each frame, a masking curve predictor that obtains a masking curve in consideration of the result from the spectrum calculator and the masking effect, which is a human auditory characteristic, and a quantization step width predictor that obtains the quantization step width from the masking curve obtained by the masking curve predictor.

[0016]

[Embodiments of the Invention] Next, embodiments of the present invention are described with reference to the drawings.

[0017] FIG. 1 is a block diagram showing a first embodiment of the present invention.

[0018] The speech coding apparatus 10 of the present invention comprises: an input data memory 8 that temporarily stores input audio signal data, which is divided into a plurality of frames of fixed length and input via an input terminal 9; a frequency division filter bank 1 that converts the input audio signal data into frequency-divided data frame by frame; a psychoacoustic analysis unit 7 that receives from the input data memory 8 the i frames that precede and follow the frame whose quantization step width is to be calculated, and calculates the quantization step width using the spectrum analysis result for that frame and human auditory characteristics including the masking effect; a quantizer 2 that quantizes the data frequency-divided by the frequency division filter bank 1 with the quantization step width calculated by the psychoacoustic analysis unit 7; and a multiplexer 3 that multiplexes the quantized data produced by the quantizer 2 into a coded bit stream. The psychoacoustic analysis unit 7 includes a spectrum calculator 4 that receives a plurality of fixed-length frames from the input data memory 8 and performs frequency analysis on each frame, a masking curve predictor 5 that obtains a masking curve in consideration of the result from the spectrum calculator 4 and the masking effect, which is a human auditory characteristic, and a quantization step width predictor 6 that obtains the quantization step width from the masking curve obtained by the masking curve predictor 5.
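The role of the input data memory 8, which holds frames so that a frame can be analysed together with later frames, can be sketched as a small buffer. This is a hypothetical illustration: the class name, the `push` method, and the use of separate counts `j` (past frames) and `k` (future frames) are assumptions made for the sketch, not details given in the patent.

```python
from collections import deque


class InputDataMemory:
    """Holds recent frames so a frame can be analysed with its neighbours.

    j past frames and k future frames surround the frame to be encoded.
    The class itself is hypothetical and only illustrates the buffering idea.
    """

    def __init__(self, j, k):
        self.j = j
        self.k = k
        self.frames = deque(maxlen=j + k + 1)

    def push(self, frame):
        """Store a new frame; return (target, context) once enough frames exist."""
        self.frames.append(frame)
        if len(self.frames) < self.j + self.k + 1:
            return None                      # still filling the buffer
        context = list(self.frames)          # frames N-j .. N+k
        target = context[self.j]             # frame N, delayed by k frames
        return target, context


memory = InputDataMemory(j=1, k=1)
outputs = [memory.push(f) for f in ["f0", "f1", "f2", "f3"]]
```

The sketch makes the cost of look-ahead visible: frame N can only be released after k further frames have arrived. For simplicity the first j frames are never emitted as targets here; a real encoder would need some start-up handling.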

[0019] Next, the relationship between the data input to the spectrum calculator 4 of the psychoacoustic analysis unit 7 in FIG. 1 and the input audio signal data that is actually encoded is described with reference to FIG. 2, which illustrates the structure of the input audio signal data. The names and reference numerals are those shown in FIG. 1.

[0020] When the Nth frame is encoded, the input audio signal data fed to the spectrum calculator 4 is the data of region B, which spans from the (N-j)th frame to the (N+k)th frame and contains the Nth frame within it. The input audio signal data in this case consists of (i+1)×n samples, where i = j + k.
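The selection of region B and its sample count can be checked with a short sketch. It reads n as the number of samples per frame, which the text implies but does not state explicitly; the function name `region_b` and the toy data are hypothetical.

```python
def region_b(frames, n_index, j, k):
    """Return the samples of frames (N-j)..(N+k), the analysis window B."""
    start, stop = n_index - j, n_index + k + 1
    if j < 0 or k < 0 or start < 0 or stop > len(frames):
        raise ValueError("window falls outside the available frames")
    return [sample for frame in frames[start:stop] for sample in frame]


n = 4                                        # samples per frame (toy value)
frames = [[n * f + s for s in range(n)] for f in range(6)]
j, k = 2, 1
i = j + k
window = region_b(frames, n_index=3, j=j, k=k)   # frames 1..4
```

With j = 2 and k = 1 the window covers four frames, so it holds (i+1)×n = 16 samples, the same count a purely backward-looking window with i = 3 would use.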

[0021] The (i+1)×n input audio signal samples of region B are input to the spectrum calculator 4 for frequency analysis, and the analysis result is input to the masking curve predictor 5, which obtains the masking curve. The quantization step width predictor 6 then predicts the quantization step width from this masking curve information. This makes it possible to exploit both the forward masking and the backward masking effects of temporal masking.
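The three-stage chain (spectrum calculator, masking curve predictor, quantization step width predictor) can be sketched end to end. Everything inside the stages is a stand-in: the naive DFT, the neighbour-spreading masking rule, and the square-root step mapping are assumptions chosen for brevity, not the psychoacoustic model of the patent or of any standard.

```python
import cmath
import math


def power_spectrum(samples):
    """Naive DFT power spectrum (stands in for spectrum calculator 4)."""
    size = len(samples)
    spectrum = []
    for b in range(size // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * b * t / size)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / size)
    return spectrum


def masking_curve(power, spread=0.1):
    """Toy masking curve: each bin is masked by a fraction of its neighbours.

    Placeholder for masking curve predictor 5; not a real psychoacoustic model.
    """
    curve = []
    for b in range(len(power)):
        left = power[b - 1] if b > 0 else 0.0
        right = power[b + 1] if b + 1 < len(power) else 0.0
        curve.append(spread * max(left, power[b], right))
    return curve


def step_widths(curve, floor=1e-3):
    """Step width grows with the masking threshold (predictor 6 stand-in)."""
    return [math.sqrt(max(c, floor)) for c in curve]


# Window B stand-in: a pure tone at bin 2 over 16 samples.
window = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
steps = step_widths(masking_curve(power_spectrum(window)))
```

In this toy run the bins around the tone receive coarse steps while distant bins fall back to the fine floor value.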

[0022] When the masking curve is obtained from the same number of samples, (i+1)×n, the ratio between forward masking and backward masking can be varied to find the masking curve with the largest masking amount. The quantization step width is predicted from this masking curve, and data quantized with the predicted step width is the most efficiently coded among encodings of the same amount of data.
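Varying the forward/backward ratio while keeping the window size fixed amounts to trying every split j + k = i and keeping the one with the most masking. The sketch below illustrates that search under loud assumptions: the masking measure (neighbour energy attenuated by a per-frame decay factor), the decay values, and the function names are all hypothetical and are not taken from the patent or from Table 1.

```python
def masking_amount(energies, n_index, j, k, fwd_decay=0.5, bwd_decay=0.25):
    """Toy total masking exerted on frame N by j past and k future frames.

    Assumption: a neighbour d frames away contributes energy * decay**d,
    with separate hypothetical decay factors for forward masking (past
    frames) and backward masking (future frames).
    """
    past = sum(energies[n_index - d] * fwd_decay ** d for d in range(1, j + 1))
    future = sum(energies[n_index + d] * bwd_decay ** d for d in range(1, k + 1))
    return past + future


def best_split(energies, n_index, i):
    """Try every j + k = i and keep the split with the most masking."""
    candidates = []
    for j in range(i + 1):
        k = i - j
        if n_index - j < 0 or n_index + k >= len(energies):
            continue                          # window would leave the signal
        candidates.append((masking_amount(energies, n_index, j, k), j, k))
    if not candidates:
        raise ValueError("no valid split for this frame")
    amount, j, k = max(candidates)
    return j, k, amount


# Quiet past, loud onset right after frame N: a split with look-ahead should win.
energies = [1.0, 1.0, 1.0, 1.0, 64.0, 1.0]
j, k, amount = best_split(energies, n_index=3, i=2)
```

Because the total number of frames analysed stays at i + 1 for every candidate, the search changes which frames are examined rather than how many.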

[0023] The description so far has used the frequency division filter bank 1 as the frequency division means, but the apparatus works in the same way, and can perform the most efficient coding, when the frequency division means is a subband division filter bank.

[0024] Likewise, the most efficient coding can be performed when the frequency division means uses the modified discrete cosine transform (MDCT).

[0025]

[Effects of the Invention] As described above, when predicting the quantization step width, the present invention uses the forward masking effect and the backward masking effect in combination. Compared with the conventional approach, which calculates the masking curve using only the forward masking effect, this yields a larger masking amount and therefore enables more efficient coding.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention.

FIG. 2 is an explanatory diagram illustrating the structure of the input audio signal data in the first embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of a conventional, general speech coding apparatus.

FIG. 4 is an explanatory diagram illustrating the structure of the input audio signal data in a conventional, general speech coding apparatus.

[Explanation of symbols]

1 Frequency division filter bank
2 Quantizer
3 Multiplexer
4 Spectrum calculator
5 Masking curve predictor
6 Quantization step width predictor
7 Psychoacoustic analysis unit
8 Input data memory
9 Input terminal
10 Speech coding apparatus

Continuation of front page

(56) References: JP-A-8-123490 (JP, A); JP-A-8-51366 (JP, A); JP-A-8-204575 (JP, A); JP-A-8-204574 (JP, A); JP-A-7-183818 (JP, A); JP-A-3-139923 (JP, A); JP-A-6-242797 (JP, A); JP-A-6-232825 (JP, A); JP-A-6-201744 (JP, A); JP-A-8-211899 (JP, A)

(58) Fields investigated (Int. Cl.⁶, DB name): H03M 7/30; G10L 7/04; G10L 9/18

Claims

(57) [Claims]

1. A speech coding apparatus comprising: frequency division means for converting input audio signal data, which is divided into a plurality of frames of fixed length and input via an input terminal, into frequency-divided data frame by frame; psychoacoustic analysis means for receiving the plurality of fixed-length frames, performing spectrum analysis on each frame, and calculating, for the latest frame, a quantization step width using the spectrum analysis results of the latest frame and the i (i = 1, 2, ... n) frames preceding it together with human auditory characteristics including the masking effect; quantization means for quantizing the data frequency-divided by the frequency division means with the quantization step width calculated by the psychoacoustic analysis means; and multiplexing means for multiplexing the quantized data produced by the quantization means into a coded bit stream, wherein input audio signal data storage means for temporarily storing the input audio signal data is provided between the input terminal and both the frequency division means and the psychoacoustic analysis means, and the psychoacoustic analysis means receives from the input audio signal data storage means the i frames that precede and follow the frame whose quantization step width is to be calculated, calculates the quantization step width using the spectrum analysis result for that frame and human auditory characteristics including the masking effect, and outputs it to the quantization means.

2. The speech coding apparatus according to claim 1, wherein the frequency division means is a sub-band division filter bank unit.

3. The speech coding apparatus according to claim 1, wherein the frequency dividing means is a modified discrete cosine transform (MDCT).

4. The speech coding apparatus according to claim 1, wherein the psychoacoustic analysis means comprises: a spectrum calculator that receives a plurality of frames of fixed length from the input audio signal data storage means and performs frequency analysis on each frame; a masking curve predictor that obtains a masking curve in consideration of the result from the spectrum calculator and the masking effect, which is a human auditory characteristic; and a quantization step width predictor that obtains the quantization step width from the masking curve obtained by the masking curve predictor.