JP2000293200A

JP2000293200A - Audio compression coding method

Info

Publication number: JP2000293200A
Application number: JP11103625A
Authority: JP
Inventors: Toshiyuki Ozaki; 敏之尾崎
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-04-12
Filing date: 1999-04-12
Publication date: 2000-10-20

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of computation by skipping the processing needed for quantization/coding when the audio data stay in the same state from frame to frame. SOLUTION: One frame of audio data X_n(i) (i = 1 to 1152) is captured. If all of its samples are at or below a threshold value, they are set to 0. The audio data of the previous frame and the current frame are then compared (step S14). If they match, the audio bitstream of the previous frame is output in place of the current frame's bitstream (step S16), and the processing of the auditory model weighting section and the quantization/coding section is omitted.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an audio compression coding method that compresses audio data according to MPEG (Moving Picture Image Coding Experts Group).

[0002]

2. Description of the Related Art

In recent years, high-efficiency coding of moving pictures and audio has been pursued for use on large-capacity media. MPEG is an international standard for this kind of high-efficiency coding, and it is described in detail in ISO/IEC 11172.

[0003]

As shown in FIG. 2, an audio compression coding apparatus that encodes digital audio data by MPEG comprises a subband analysis unit 21, a quantization/coding unit 22, a bit multiplexing unit 23, and an auditory model weighting unit 24. The subband analysis unit 21 converts the audio data of each frame into 32 subband samples. The auditory model weighting unit 24 determines the bit allocation for each subband and passes it to the quantization/coding unit 22. Using this allocation, the quantization/coding unit 22 quantizes and encodes the subband samples output by the subband analysis unit 21. The bit multiplexing unit 23 then outputs the frame-packed, high-efficiency coded data as an audio bitstream. This bitstream is stored on a recording medium or transmitted to a decoder on the receiving side.
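The FIG. 2 dataflow can be sketched as a chain of four stages. This sketch illustrates only the wiring between the units. The stage callables (`analyze`, `allocate`, `quantize`, `pack`) are hypothetical placeholders, not the patent's implementation. The figure of 36 samples per subband is simply the 1152-sample Layer II frame divided across 32 subbands.

```python
from typing import Callable, List, Sequence

FRAME_SAMPLES = 1152                                 # PCM samples per MPEG-1 Layer II frame
NUM_SUBBANDS = 32                                    # outputs of the subband analysis unit 21
SAMPLES_PER_SUBBAND = FRAME_SAMPLES // NUM_SUBBANDS  # 36 samples per subband per frame

SubbandSamples = List[List[float]]  # indexed [subband][sample]


def encode_frame(
    pcm: Sequence[float],
    analyze: Callable[[Sequence[float]], SubbandSamples],    # hypothetical stand-in for unit 21
    allocate: Callable[[Sequence[float]], List[int]],        # hypothetical stand-in for unit 24
    quantize: Callable[[SubbandSamples, List[int]], bytes],  # hypothetical stand-in for unit 22
    pack: Callable[[bytes], bytes],                          # hypothetical stand-in for unit 23
) -> bytes:
    """Push one frame through the four units of FIG. 2 and return its bitstream."""
    if len(pcm) != FRAME_SAMPLES:
        raise ValueError(f"expected {FRAME_SAMPLES} samples, got {len(pcm)}")
    subbands = analyze(pcm)
    if len(subbands) != NUM_SUBBANDS:
        raise ValueError("analysis must yield 32 subbands")
    bit_allocation = allocate(pcm)  # the auditory model works from the PCM input (FFT, step S1)
    payload = quantize(subbands, bit_allocation)
    return pack(payload)


# Trivial stand-in for illustration only; NOT a real polyphase filter bank.
def split_by_stride(pcm: Sequence[float]) -> SubbandSamples:
    return [list(pcm[k::NUM_SUBBANDS]) for k in range(NUM_SUBBANDS)]


stream = encode_frame(
    [0.0] * FRAME_SAMPLES,
    split_by_stride,
    lambda pcm: [0] * NUM_SUBBANDS,
    lambda subbands, alloc: bytes(len(subbands)),
    lambda payload: b"HDR" + payload,
)
```

Injecting the stages as callables keeps the sketch runnable without committing to any particular filter bank or psychoacoustic model.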

[0004]

The auditory model weighting unit 24 can use either of two signal-processing methods, Model 1 or Model 2. The processing of Model 1 is shown in FIG. 3. In the first step S1, an FFT analysis is performed on the input audio signal. In step S2, the spectrum is divided into a plurality of subbands, and the sound pressure level of each subband is calculated. In step S3, the data of each subband are classified into tonal (pure-tone) and non-tonal components. In step S4, low-level frequency components, chiefly non-tonal components, are decimated. In step S5, the individual masking thresholds are calculated. In step S6, the global masking threshold over all subbands is calculated. In step S7, the minimum masking level of each subband is determined. In step S8, the signal-to-mask ratio (SMR) is calculated. The auditory model weighting unit 24 thus obtains the SMR and, based on it, instructs the quantization/coding unit 22 on the bit allocation.
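Steps S7 and S8, together with the hand-off to unit 22, can be sketched as follows. SMR is the subband signal level minus its minimum masking level, both in dB. The sketch then performs a greedy bit allocation that repeatedly gives one bit to the subband with the worst mask-to-noise ratio (MNR = SNR - SMR). The allocation step is a simplification. It assumes roughly 6.02 dB of SNR per quantizer bit instead of the standard's quantizer tables, and it measures the budget in abstract "bit increments". Both of these are assumptions of the sketch.

```python
from typing import List, Sequence

DB_PER_BIT = 6.02  # assumed SNR gain per quantizer bit (rule of thumb, not the standard's table)


def signal_to_mask_ratio(level_db: Sequence[float], min_mask_db: Sequence[float]) -> List[float]:
    """Step S8: SMR per subband = signal level minus minimum masking level (dB)."""
    return [level - mask for level, mask in zip(level_db, min_mask_db)]


def greedy_allocation(smr_db: Sequence[float], total_bits: int, max_bits: int = 16) -> List[int]:
    """Hand out bit increments one at a time to the subband with the lowest MNR."""
    alloc = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [k for k in range(len(alloc)) if alloc[k] < max_bits]
        if not candidates:
            break
        # MNR = SNR - SMR; the smallest MNR means the most audible quantization noise.
        worst = min(candidates, key=lambda k: alloc[k] * DB_PER_BIT - smr_db[k])
        alloc[worst] += 1
    return alloc


smr = signal_to_mask_ratio([60.0, 40.0, 20.0], [30.0, 35.0, 25.0])  # [30.0, 5.0, -5.0]
allocation = greedy_allocation(smr, total_bits=6)
```

With these numbers, the loudest subband relative to its mask receives most of the budget. The subband whose signal is already below its mask (negative SMR) receives none.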

[0005]

As described above, audio compression coding is performed frame by frame, and it requires an enormous amount of computation. The computation in the auditory model weighting unit 24 is especially large. If the audio data of the frame currently being processed are the same as those of the previous frame (that is, the same audio waveform), the output of the bit multiplexing unit 23 should be the same audio bitstream. Even in this case, however, the conventional audio compression coding method must perform the full sequence of coding processes and repeat the same computation.

[0006]

SUMMARY OF THE INVENTION

The present invention has been made in view of this conventional problem. Its object is to provide an audio compression coding method that greatly reduces the amount of computation in the coding process. When the audio data of the current frame are regarded as identical to those of the previous frame, the method skips the sequence of coding processes and reuses the audio bitstream of the previous frame as it is.

[0007]

The invention according to claim 1 of the present application is an audio compression coding method that generates an audio bitstream as follows. An audio signal is sampled at a specified frequency. The obtained audio data are grouped into frames of a predetermined number of samples, and subband analysis is performed frame by frame. The audio data of each subband are weighted based on an auditory model, and quantization and coding are performed based on the obtained weights.

The method is characterized as follows. The audio data of the previous frame, and the audio bitstream obtained by quantizing and coding those data, are stored in a memory. When the audio data of the current frame are input, they are compared with the previous frame's audio data held in the memory. If the comparison determines that the audio data of the current and previous frames consist of the same audio waveform, the audio bitstream of the previous frame is read from the memory and output as the audio bitstream of the current frame.
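The claimed mechanism amounts to a one-entry cache keyed on the previous frame's samples. The sketch below shows this under some stated assumptions. Samples are taken to be integer PCM values, so exact equality is meaningful. `full_encode` is a hypothetical stand-in for the normal path through units 21 to 24. The extra-slot handling of the embodiment (steps S13, S17 and S18) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class ReuseMemory:
    """The 'memory' of claim 1: previous frame's audio data and its coded bitstream."""
    prev_pcm: Optional[Tuple[int, ...]] = None
    prev_stream: Optional[bytes] = None


def encode_with_reuse(
    pcm: Sequence[int],
    memory: ReuseMemory,
    full_encode: Callable[[Sequence[int]], bytes],  # hypothetical normal encoder
) -> bytes:
    """Return the current frame's bitstream, reusing the previous one for an identical waveform."""
    current = tuple(pcm)
    if memory.prev_stream is not None and current == memory.prev_pcm:
        return memory.prev_stream  # same waveform: skip the auditory model, quantization and coding
    stream = full_encode(current)  # different waveform: run the full encoder
    memory.prev_pcm, memory.prev_stream = current, stream
    return stream
```

Only one frame of history is kept. This matches the claim, which compares only against the immediately preceding frame.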

[0008]

With this method, when the audio waveform of the previous frame is identical to that of the current frame, the arithmetic processing in the coding process can be greatly reduced.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An audio compression coding method according to an embodiment of the present invention is described below with reference to the drawings. The audio compression coding apparatus has the same configuration as that shown in FIG. 2, but its signal-processing method differs from the conventional example. FIG. 1 is a flowchart showing the audio compression coding method of this embodiment. The following description is limited to the coding process known as MPEG-1 Audio Layer II.

[0010]

In step S11 of FIG. 1, one frame's worth of audio samples is captured. These data are denoted X_n(i) (i = 1, 2, ..., 1152; n = frame number). The sampling frequency used to capture the audio data is 32 kHz, 44.1 kHz, or 48 kHz. In the next step S12, it is checked whether all of the audio data X_n(i) of the frame are at or below a predetermined threshold. If they are, the current frame is regarded as containing only noise, and the values of X_n(i) are set to 0.
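Step S12 can be sketched as a frame-level noise gate. The patent does not give the threshold value, and it does not say whether samples are compared by magnitude. This sketch therefore assumes integer PCM samples compared by absolute value against a caller-supplied threshold. Zeroing a noise-only frame presumably also lets consecutive near-silent frames compare as identical in step S14.

```python
from typing import List, Sequence

FRAME_SAMPLES = 1152  # samples per Layer II frame (i = 1 to 1152)


def gate_noise_frame(frame: Sequence[int], threshold: int) -> List[int]:
    """Step S12: if every sample's magnitude is <= threshold, treat the frame as noise and zero it."""
    if len(frame) != FRAME_SAMPLES:
        raise ValueError(f"expected {FRAME_SAMPLES} samples, got {len(frame)}")
    if all(abs(x) <= threshold for x in frame):
        return [0] * FRAME_SAMPLES  # whole frame regarded as noise only
    return list(frame)              # at least one sample exceeds the threshold: keep the frame as is
```

The gate is all-or-nothing per frame. A single sample above the threshold keeps the entire frame unchanged.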

[0011]

In the next step S13, it is checked whether the previous frame contains an extra slot. At a sampling frequency of 32 kHz or 48 kHz, no extra slot occurs, because of the relationship between the 1152 samples per frame and the number of bytes allocated to one frame. At 44.1 kHz, however, the number of bits in one frame's stream is not constant (a fractional remainder arises), so a frame may contain an extra slot (1 byte). When the previous frame contains an extra slot, the current frame's stream cannot be replaced by the previous frame's stream. The process therefore branches to step S15, where normal coding is performed. It then proceeds to step S19, where the data of a new stream are created.
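The extra-slot behaviour follows from the Layer II frame size. The mean frame size is 144 × bitrate / fs bytes, where 144 = 1152 samples / 8 bits per byte. Consider the standard Layer II bitrates, which are all even numbers of kbit/s. At these rates the mean is an integer at 32 kHz and 48 kHz, but fractional at 44.1 kHz. The sketch below reproduces one way to spread that fraction over 1-byte padding slots by cumulative rounding. Actual encoders may schedule the padding differently, so the exact pattern is an assumption.

```python
from fractions import Fraction
from typing import List

LAYER2_SAMPLES_PER_FRAME = 1152


def layer2_frame_lengths(bitrate_bps: int, fs_hz: int, n_frames: int) -> List[int]:
    """Byte length of each frame; a frame one byte longer than the floor carries an extra slot."""
    # Exact mean frame size in bytes: (1152 / 8) * bitrate / fs.
    mean = Fraction(LAYER2_SAMPLES_PER_FRAME // 8 * bitrate_bps, fs_hz)
    # Emit floor(k * mean) bytes after k frames; each frame is the difference of successive totals.
    return [int((k + 1) * mean) - int(k * mean) for k in range(n_frames)]


print(layer2_frame_lengths(128_000, 48_000, 4))  # constant length, no extra slot
print(layer2_frame_lengths(128_000, 44_100, 4))  # mixes 417- and 418-byte frames
```

At 128 kbit/s and 44.1 kHz the mean is about 417.96 bytes, so frames alternate between 417 bytes and 418 bytes (417 plus one extra slot).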

[0012]

If step S13 determines that the previous frame does not contain an extra slot, the process branches to step S14. Step S14 checks whether the audio data X_n(i) of the current frame match the audio data X_{n-1}(i) (i = 1 to 1152) of the previous frame. For this purpose, the previous frame's audio data X_{n-1}(i) are stored in memory in advance, and X_n(i) is compared with X_{n-1}(i).

[0013]

Let Y(i) be the difference between X_n(i) and X_{n-1}(i). If Y(i) is 0 at every sampling point i, the previous frame's audio data X_{n-1}(i) and the current frame's audio data X_n(i) are regarded as identical. If the two frames do not match in step S14, the process branches to step S15 described above, where normal coding is performed. If they match, the process branches to step S16. There, the bitstream for the current frame is replaced with the previous frame's bitstream, which was stored in memory in advance.
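The step S14 test can be written directly in terms of Y(i). The function names are illustrative only. The samples are assumed to be integer PCM values, for which "Y(i) = 0 for all i" is equivalent to plain list equality.

```python
from typing import List, Sequence


def difference(cur: Sequence[int], prev: Sequence[int]) -> List[int]:
    """Y(i) = X_n(i) - X_{n-1}(i) at every sampling point i."""
    if len(cur) != len(prev):
        raise ValueError("frames must have the same length")
    return [c - p for c, p in zip(cur, prev)]


def frames_identical(cur: Sequence[int], prev: Sequence[int]) -> bool:
    """Step S14: the frames are the same waveform only if Y(i) == 0 for every i."""
    return all(y == 0 for y in difference(cur, prev))
```

A practical encoder could short-circuit on the first nonzero difference. The explicit Y(i) form is kept here to mirror the text.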

[0014]

In the next step S17, it is checked whether the current frame contains an extra slot. If the previous frame does not contain an extra slot but the current frame does, the process moves to step S18, where 0x00 is appended to the end of the current frame's stream, and processing ends. If step S17 finds that the current frame does not contain an extra slot, processing ends.
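The decision path from step S13 to step S18 can be combined into a single function. It returns either a reused stream or `None`, meaning normal coding (S15 and S19) is required. The function name and flags are hypothetical. The two extra-slot flags are assumed to be known from the encoder's padding schedule. Following the text, the sketch only appends a 0x00 byte in step S18. The text does not say whether the frame header's padding indication is also updated, and the sketch does not attempt it.

```python
from typing import Optional, Sequence


def reuse_previous_stream(
    cur_pcm: Sequence[int],
    prev_pcm: Sequence[int],
    prev_stream: bytes,
    prev_has_extra_slot: bool,
    cur_has_extra_slot: bool,
) -> Optional[bytes]:
    """Steps S13 to S18: return the reused stream, or None when normal coding is needed."""
    if prev_has_extra_slot:              # S13: previous stream carries an extra slot
        return None                      # -> S15/S19
    if list(cur_pcm) != list(prev_pcm):  # S14: Y(i) != 0 at some sampling point
        return None                      # -> S15/S19
    stream = prev_stream                 # S16: reuse the previous frame's stream
    if cur_has_extra_slot:               # S17: current frame needs an extra slot
        stream = stream + b"\x00"        # S18: append one 0x00 byte
    return stream
```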

[0015]

Thus, when silence continues frame after frame, or when the same pure tone persists, the audio data of the previous and current frames are regarded as identical. The computation in the auditory model weighting unit 24 in particular then becomes unnecessary, as does the processing of the quantization/coding unit 22.
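Consider a steady pure tone. If the tone is noiseless and digitally generated, consecutive frames are sample-for-sample identical only when a whole number of cycles fits in one 1152-sample frame, that is, when 1152 × f / fs is an integer. The sketch below checks this condition. Its tone generator reduces the phase with integer arithmetic, so that equal phases yield bit-identical samples. The amplitude and the generator itself are illustrative assumptions.

```python
import math
from typing import List

FRAME_SAMPLES = 1152


def tone_repeats_every_frame(freq_hz: int, fs_hz: int) -> bool:
    """True when 1152 * f / fs is an integer, i.e. each frame holds whole cycles."""
    return (FRAME_SAMPLES * freq_hz) % fs_hz == 0


def pcm_tone(freq_hz: int, fs_hz: int, n: int, amplitude: int = 10000) -> List[int]:
    """Integer sine samples; the phase index is reduced mod fs so equal phases give equal samples."""
    return [
        round(amplitude * math.sin(2 * math.pi * ((freq_hz * i) % fs_hz) / fs_hz))
        for i in range(n)
    ]


tone_48k = pcm_tone(1000, 48_000, 2 * FRAME_SAMPLES)
tone_44k = pcm_tone(1000, 44_100, 2 * FRAME_SAMPLES)
```

A 1 kHz tone gives 24 whole cycles per frame at 48 kHz, so its frames repeat exactly. At 44.1 kHz it gives about 26.12 cycles per frame, so each frame starts at a different phase and the Y(i) = 0 test would fail.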

[0016]

As described above, the audio compression coding method of the present invention compares the audio data of the previous frame with those of the current frame. If the audio data of the current frame match those of the previous frame, the previous stream is simply reused without performing the sequence of coding processes. The amount of computation can therefore be greatly reduced.

[Brief description of the drawings]

FIG. 1 is a flowchart showing the signal-processing procedure of the audio compression coding method according to the embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of an audio compression coding apparatus.

FIG. 3 is an explanatory diagram of the signal-processing method of the auditory model weighting unit in the audio compression coding apparatus (MPEG audio encoder).

[Explanation of symbols]

Reference Signs List
21 Subband analysis unit
22 Quantization/coding unit
23 Bit multiplexing unit
24 Auditory model weighting unit

Claims


An audio compression coding method in which an audio signal is sampled at a specified frequency, the obtained audio data are grouped into frames of a predetermined number of samples, subband analysis is performed on each frame, the audio data of each subband are weighted based on an auditory model, and quantization and coding are performed based on the obtained weights to generate an audio bitstream, the method comprising: storing in a memory the audio data of the previous frame and the audio bitstream obtained by quantizing and coding the audio data of the previous frame; when the audio data of the current frame are input, comparing the audio data of the current frame with the audio data of the previous frame held in the memory; and, when the comparison determines that the audio data of the current frame and of the previous frame consist of the same audio waveform, reading the audio bitstream of the previous frame from the memory and outputting it as the audio bitstream of the current frame.