JP3348759B2

JP3348759B2 - Transform coding method and transform decoding method

Info

Publication number: JP3348759B2
Application number: JP24814595A
Authority: JP
Inventors: 直樹岩上; 健弘守谷; 聡三樹
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-09-26
Filing date: 1995-09-26
Publication date: 2002-11-20
Anticipated expiration: 2015-09-26
Also published as: JPH0990989A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a transform coding method and a transform decoding method for signals that contain pitch components, such as musical tone signals or speech signals.

[0002]

[Prior art] At present, the following approach has been proposed for encoding audio signals such as musical tone signals or speech signals with high efficiency. The audio signal is divided into fixed-length segments of about 5 to 50 ms called frames, and a time-frequency transform is applied to the signal of each frame. The resulting frequency-domain signal is separated into two pieces of information: the envelope shape of its frequency characteristic (the outline of the frequency characteristic), and the residual signal obtained by flattening the frequency-domain signal with that outline. Each piece is then encoded.

[0003] Specific coding methods of this kind that have been proposed include Adaptive Spectral Perceptual Entropy Coding (ASPEC), Transform Coding with Weighted Vector Quantization (TCWVQ), and MPEG-Audio Layer 3.

[0004] These techniques are described in detail in K. Brandenburg, J. Herre, J. D. Johnston et al., "ASPEC: Adaptive spectral entropy coding of high quality music signals", Proc. AES '91; T. Moriya, H. Suda, "An 8 Kbit/s transform coder for noisy channels", Proc. ICASSP '89, pp. 196--199; and ISO/IEC standard IS-11172-3.

[0005] To achieve highly efficient coding with these methods, the residual signal should have as flat a frequency characteristic as possible. For this reason, in the Adaptive Spectral Perceptual Entropy Coding (ASPEC) and MPEG-Audio Layer 3 methods mentioned above, the frequency-domain signal is divided into several sub-bands as shown in FIG. 5, and the signal in each sub-band is normalized by dividing it by a value called a scaling factor, which represents the strength of the band. This flattens the frequency characteristic of the residual signal.

[0006] On the other hand, a method using linear prediction analysis, as shown in FIG. 6, flattens the frequency-domain signal more efficiently than these methods. In this method, the frequency characteristic is flattened by driving a linear prediction analysis filter with linear prediction coefficients obtained by linear prediction of the input signal. This is the technique used in the Transform Coding with Weighted Vector Quantization (TCWVQ) method mentioned above.
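By way of illustration only, the linear-prediction flattening described above can be sketched in Python as follows. The sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion and then runs the analysis (inverse) filter in the time domain. The function names, the predictor order, and the test signal are assumptions made for this example and are not taken from the source.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[k] = sum_n x[n] * x[n + k]."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The prediction is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def analysis_filter(x, coeffs):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n - k]."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out


# Hypothetical test signal: the impulse response of a one-pole filter.
signal = [0.9 ** n for n in range(200)]
r = autocorrelation(signal, 2)
coeffs, err = levinson_durbin(r, 2)
residual = analysis_filter(signal, coeffs)  # close to a single impulse
```

For this toy signal the first coefficient comes out close to 0.9 and the residual is close to a unit impulse, whose spectrum is flat. A codec along the lines of TCWVQ would more likely apply the envelope in the frequency domain; the time-domain filter here is only the simplest way to show the whitening effect.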

[0007] Related techniques such as linear prediction analysis, the discrete cosine transform (DCT), and the modified discrete cosine transform (MDCT) are described in Saito and Nakata, "Fundamentals of Speech Information Processing" (Ohmsha), Chapter 6; K. R. Rao and P. Yip, translated by Yasuda and Fujiwara, "Image Coding Technology: DCT and Its International Standards" (Ohmsha), Chapter 2; H. S. Malvar, "Signal Processing with Lapped Transforms," Artech House; and ISO/IEC standard IS-11172-3.

[0008]

[Problems to be solved by the invention] However, these coding methods only normalize the global outline of the frequency characteristic, and cannot efficiently remove the fine-scale irregularities in the frequency characteristic caused by the pitch components of musical tones and speech. This has been an obstacle that makes it difficult for the conventional coding methods described above to achieve high efficiency when encoding audio signals with strong pitch components.

[0009] The present invention has been made in view of the above problems, and its object is to provide a transform coding method and a transform decoding method capable of efficiently encoding an audio signal that contains pitch components.

[0010]

[Means for solving the problems] The invention according to claim 1 is characterized by comprising: a time-frequency transform step of dividing a musical tone signal or speech signal into frames of a fixed time interval and applying a time-frequency transform to each frame to generate a frequency-domain signal; an outline calculation and quantization step of generating an outline signal of the frequency-domain signal, quantizing the outline signal, and outputting a quantized outline index; a flattening step of dividing the frequency-domain signal by the quantized outline signal to generate a flattened signal; a pitch coding step of detecting and quantizing a pitch component from the flattened signal and outputting a quantized pitch component index; and a flattened signal quantization step of outputting a quantized flattened signal index of the flattened signal obtained by removing the quantized pitch component from the flattened signal.

[0011] The invention according to claim 2 is characterized in that, in the invention according to claim 1, in the pitch coding step, the fundamental frequency of the pitch component is determined and quantized; samples at frequencies that are natural-number multiples of the fundamental frequency, or at the frequencies closest to them, are extracted from the frequency-domain signal as pitch component samples and quantized; and the quantized pitch fundamental frequency index and the quantized pitch component sample index thus obtained are output as the quantized pitch component index.

[0012] The invention according to claim 3 is characterized in that, in the invention according to claim 2, in the pitch coding step, the pitch component is extracted from the frequency-domain signal and quantized using, as one unit, a plurality of consecutive samples that include the sample at a frequency that is a natural-number multiple of the fundamental frequency or at the frequency closest to it.

[0013] The invention according to claim 4 is characterized in that, in the invention according to claim 2 or 3, in the pitch coding step, the pitch component samples are vector-quantized either all together or unit by unit.

[0014] The invention according to claim 5 is characterized by comprising: a flattened signal reproduction step of reproducing a flattened signal from a quantized flattened signal index; a pitch reproduction step of reproducing a pitch component from a quantized pitch component index; an outline signal reproduction step of reproducing an outline signal from a quantized outline index; an inverse flattening step of reproducing a frequency-domain signal by inverse-flattening, with the outline signal, the signal obtained by adding the pitch component to the flattened signal; and an inverse time-frequency transform step of applying an inverse time-frequency transform to the frequency-domain signal to generate a time-domain musical tone signal or speech signal.

[0015] The invention according to claim 6 is characterized in that, in the invention according to claim 5, in the pitch reproduction step, the fundamental frequency of the pitch component is decoded, and the pitch component samples decoded from the quantized pitch component sample index are placed, as the pitch component, in the frequency-domain signal at frequencies that are natural-number multiples of the fundamental frequency or at the frequencies closest to them.

[0016] The invention according to claim 7 is characterized in that, in the invention according to claim 6, in the pitch reproduction step, the pitch component samples decoded from the quantized pitch component sample index are placed, as the pitch component, in the frequency-domain signal using, as one unit, a plurality of consecutive samples that include the sample at a frequency that is a natural-number multiple of the fundamental frequency or at the frequency closest to it.

[0017] The invention according to claim 8 is characterized in that, in the invention according to claim 6 or 7, in the pitch reproduction step, the pitch component samples are decoded either all together or unit by unit.

[0018]

[Operation] Musical tones and speech have pitch, that is, a sound that is perceived as high or low. The frequency-domain signal obtained by frequency-transforming such a tone or speech contains pitch components lined up at a constant frequency interval. Consequently, the residual signal obtained by normalizing the frequency-domain signal by the outline of its own frequency characteristic also contains these pitch components. Because the pitch components appear as spikes whose energy is large relative to the overall power, they lower the flatness of the residual signal and degrade quantization efficiency. The present invention, however, exploits the fact that the pitch components are equally spaced on the frequency axis and subtracts them from the residual signal, thereby raising the flatness of the residual coefficients with only a small amount of side information.
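One common way to put a number on "flatness" is the spectral flatness measure, the ratio of the geometric mean to the arithmetic mean of the power spectrum. The source does not name a specific measure, so the choice of measure, the function name, and the toy residual in the following Python sketch are all assumptions; the sketch only illustrates that removing equally spaced spikes raises flatness.

```python
import math


def spectral_flatness(coeffs):
    """Geometric mean / arithmetic mean of the power; 1.0 means perfectly flat."""
    power = [c * c + 1e-12 for c in coeffs]  # small floor avoids log(0)
    geo = math.exp(sum(math.log(p) for p in power) / len(power))
    arith = sum(power) / len(power)
    return geo / arith


# Toy residual: unit-magnitude coefficients with a spike every 8 bins.
residual = [1.0] * 64
for k in range(8, 64, 8):
    residual[k] = 10.0

# Subtract the pitch component, assumed here to be known exactly.
without_pitch = residual[:]
for k in range(8, 64, 8):
    without_pitch[k] -= 9.0

flat_before = spectral_flatness(residual)
flat_after = spectral_flatness(without_pitch)
```

In this toy case `flat_before` is well below 1 and `flat_after` is essentially 1, because only the spike positions and amplitudes had to be described to remove the peaks.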

[0019]

[Embodiments of the invention] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 illustrates the transform coding method and the transform decoding method according to this embodiment; reference sign A denotes the encoder and B the decoder. As shown, encoder A comprises a time-frequency transformer 1, a global outline calculator/quantizer 2, a first flattener 3, a pitch encoder 4, an adder 5, a fine spectral outline calculator/quantizer 6, a second flattener 7, and a quantizer 8.

[0020] The time-frequency transformer 1 divides the time-domain input signal (an audio signal such as a musical tone signal or speech signal) into frames of a fixed time interval and applies a time-frequency transform to each frame to generate a frequency-domain signal.

[0021] FIG. 2 shows the frequency characteristic of this frequency-domain signal. As shown in the figure, the frequency-domain signal of a musical tone signal or speech signal contains pitch components arranged at a constant frequency interval p. The discrete cosine transform (DCT) or the modified discrete cosine transform (MDCT) can be used as the transform.
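The framing and transform step can be sketched as follows, assuming non-overlapping frames and an orthonormal DCT-II written out directly. The frame length, the test signal, and the function names are illustrative assumptions; an MDCT-based implementation would instead use overlapping, windowed frames.

```python
import math


def split_frames(samples, frame_len):
    """Split into consecutive non-overlapping frames, dropping a short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dct_ii(frame):
    """Orthonormal DCT-II of one frame."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out


# Hypothetical input: a tone with one overtone, 100 samples long.
signal = [math.sin(2 * math.pi * 5 * n / 100) + 0.5 * math.sin(2 * math.pi * 10 * n / 100)
          for n in range(100)]
frames = split_frames(signal, 32)
spectra = [dct_ii(f) for f in frames]
restored = idct_ii(spectra[0])  # matches frames[0] up to rounding error
```

The direct O(N²) form is used only to keep the sketch self-contained; a practical coder would use a fast transform.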

[0022] The global outline calculator/quantizer 2 generates a signal representing the global outline of the frequency-domain signal output from the time-frequency transformer 1, and quantizes it. It outputs this signal to the first flattener 3 and also outputs it externally as the quantized global outline index. The global outline may be computed as a linear prediction spectrum, or by using scale factors, in which the frequency-domain signal is divided into a plurality of sub-bands and the outline of the entire frequency-domain signal is represented by a representative value for each band.

[0023] When the linear prediction spectrum is quantized, the linear prediction parameters are converted to LSP parameters and then quantized. Alternatively, they are converted to K parameters and then quantized.

[0024] The first flattener 3 flattens the frequency-domain signal output from the time-frequency transformer 1 by dividing it by the global outline signal output from the global outline calculator/quantizer 2, and outputs a first flattened signal.
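As a sketch of the scale-factor variant of the global outline, the following Python code computes one value per sub-band, quantizes it, and divides the spectrum by the decoded (quantized) outline so that a decoder holding only the indices can undo the division. The use of the band RMS as the representative value, the 1.5 dB step, the band size, and the function names are assumptions made for this example.

```python
import math


def band_outline_indices(spectrum, band_size, step_db=1.5):
    """One quantized index per band: the band RMS on a uniform dB grid."""
    indices = []
    for start in range(0, len(spectrum), band_size):
        band = spectrum[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band))
        level_db = 20.0 * math.log10(max(rms, 1e-9))
        indices.append(round(level_db / step_db))
    return indices


def decode_outline(indices, band_size, length, step_db=1.5):
    """Expand the indices back to one outline value per coefficient."""
    outline = []
    for idx in indices:
        outline.extend([10.0 ** (idx * step_db / 20.0)] * band_size)
    return outline[:length]


def flatten(spectrum, outline):
    """Divide each coefficient by the outline value at the same position."""
    return [c / e for c, e in zip(spectrum, outline)]


# Hypothetical 8-coefficient spectrum: a strong band followed by a weak band.
spectrum = [8.0, -6.0, 7.0, -9.0, 0.5, -0.4, 0.6, -0.5]
indices = band_outline_indices(spectrum, band_size=4)
outline = decode_outline(indices, band_size=4, length=len(spectrum))
first_flat = flatten(spectrum, outline)
restored = [f * e for f, e in zip(first_flat, outline)]  # decoder-side inverse
```

Dividing by the decoded outline rather than the unquantized one keeps encoder and decoder in step, which matches the wording of claim 1 ("the quantized outline signal").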

[0025] Next, the pitch encoder 4 detects and encodes the pitch component from the first flattened signal. FIG. 3 shows the pitch encoder 4 in detail; the first flattened signal is input to the pitch fundamental frequency extractor 4a and the pitch sample extractor 4b shown there.

[0026] The pitch fundamental frequency extractor 4a determines the fundamental frequency of the pitch component (the pitch fundamental frequency) by analyzing the first flattened signal. Specifically, it computes the cepstrum of the first flattened coefficients and takes its maximum peak as the fundamental period of the pitch component. It then obtains the pitch fundamental frequency by computing the reciprocal of the fundamental period, and outputs it to the pitch fundamental frequency quantizer 4c.
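The cepstrum-based estimate can be sketched roughly as follows. Here a plain DFT of the log magnitude of the flattened coefficients stands in for the cepstrum, and the result is expressed as a harmonic spacing in frequency bins; both choices, the search range, and the toy signal are assumptions for illustration, not details given in the source.

```python
import math


def pitch_spacing_from_cepstrum(flat, q_min=2):
    """Estimate the harmonic spacing (in bins) of a flattened spectrum."""
    n = len(flat)
    log_mag = [math.log(abs(v) + 1e-9) for v in flat]
    best_q, best_val = q_min, -1.0
    for q in range(q_min, n // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * q * k / n)
                 for k, v in enumerate(log_mag))
        im = sum(v * math.sin(2.0 * math.pi * q * k / n)
                 for k, v in enumerate(log_mag))
        mag = math.hypot(re, im)
        if mag > best_val:  # strict comparison keeps the lowest peak on ties
            best_q, best_val = q, mag
    # best_q plays the role of the fundamental period; its reciprocal,
    # scaled by the frame length, is the spacing between harmonics in bins.
    return n / best_q


# Toy flattened signal: a three-sample bump at every multiple of 8 bins.
flat = [1.0] * 128
for k in range(8, 128, 8):
    flat[k - 1], flat[k], flat[k + 1] = 3.0, 8.0, 3.0

spacing = pitch_spacing_from_cepstrum(flat)
```

For this toy input the strongest cepstral peak lies at index 16, so the estimated spacing is 128 / 16 = 8 bins.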

[0027] To make the pitch fundamental frequency more accurate, one may search in the neighborhood of the pitch fundamental frequency thus obtained for the fundamental frequency that maximizes the total power of the samples of the first flattened signal taken at each multiple of the fundamental frequency, and adopt this as the new pitch fundamental frequency.
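A minimal sketch of this refinement search, assuming the fundamental frequency is expressed as a (possibly fractional) spacing in bins and that candidates are tried on a fixed grid around the initial estimate. The search range, the step, and the function names are hypothetical.

```python
def harmonic_bins(f0_bins, length):
    """Bins nearest to each natural-number multiple of f0_bins (f0_bins > 0)."""
    bins = []
    m = 1
    while True:
        b = int(m * f0_bins + 0.5)
        if b >= length:
            return bins
        bins.append(b)
        m += 1


def refine_f0(flat, f0_bins, half_range=0.5, step=0.1):
    """Pick the candidate spacing whose harmonic bins carry the most power."""
    best_f0, best_power = f0_bins, -1.0
    steps = int(round(half_range / step))
    for i in range(-steps, steps + 1):
        cand = f0_bins + i * step
        power = sum(flat[b] ** 2 for b in harmonic_bins(cand, len(flat)))
        if power > best_power:
            best_f0, best_power = cand, power
    return best_f0


# Toy flattened signal whose spikes follow a true spacing of 8.4 bins.
flat = [1.0] * 64
for b in (8, 17, 25, 34, 42, 50, 59):
    flat[b] = 10.0

refined = refine_f0(flat, 8.0)  # the initial estimate 8.0 is slightly off
```

Because lower candidates fit more harmonics into the frame, a raw power sum can be biased toward them; a practical search might normalize by the number of harmonics, which the source does not discuss.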

[0028] The pitch fundamental frequency quantizer 4c quantizes the pitch fundamental frequency thus obtained. Specifically, it scalar-quantizes the logarithm of the pitch fundamental frequency, outputs the result externally as the quantized pitch fundamental frequency index, and also outputs the scalar-quantized signal to the pitch sample extractor 4b.
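Scalar quantization of the logarithm can be sketched as a uniform quantizer in the log domain, which gives a roughly constant relative error across the range. The range limits, the 7-bit index width, and the function names below are assumptions; the source does not state them.

```python
import math

F0_MIN, F0_MAX, BITS = 2.0, 64.0, 7  # hypothetical range (in bins) and index width
LEVELS = 2 ** BITS
STEP = (math.log(F0_MAX) - math.log(F0_MIN)) / (LEVELS - 1)


def quantize_f0(f0):
    """Return the index of the nearest level on a uniform log-frequency grid."""
    idx = round((math.log(f0) - math.log(F0_MIN)) / STEP)
    return max(0, min(LEVELS - 1, idx))


def dequantize_f0(idx):
    """Map an index back to the fundamental frequency it represents."""
    return math.exp(math.log(F0_MIN) + idx * STEP)


index = quantize_f0(8.4)
f0_q = dequantize_f0(index)  # value passed on to the pitch sample extractor
```

Within the range, the decoded value differs from the input by at most half a step in the log domain, that is, by a factor of at most exp(STEP / 2).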

[0029] The pitch sample extractor 4b operates on the first flattened signal input from the first flattener 3. For each natural-number multiple of the quantized pitch fundamental frequency input from the pitch fundamental frequency quantizer 4c, it extracts the sample closest to that frequency together with one sample on each side of it, and outputs each such set of three samples to the pitch sample quantizer 4d as the sample group of one pitch component. The number of pitch component sample groups may be fixed or variable.
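The extraction of three-sample groups can be sketched as follows, again assuming the quantized fundamental frequency is given as a spacing in bins. The function name, the `max_groups` parameter (modeling the fixed-or-variable group count), and the toy data are hypothetical.

```python
def extract_pitch_groups(flat, f0_bins, max_groups=None):
    """Return (center_bin, [left, center, right]) for each harmonic."""
    if f0_bins < 1.0:
        raise ValueError("f0_bins must be at least one bin")
    groups = []
    m = 1
    while max_groups is None or len(groups) < max_groups:
        center = int(m * f0_bins + 0.5)  # bin nearest to the m-th multiple
        if center + 1 >= len(flat):
            break  # the group would run past the end of the frame
        groups.append((center, [flat[center - 1], flat[center], flat[center + 1]]))
        m += 1
    return groups


# Toy flattened signal whose value equals its bin number, for easy checking.
flat = [float(k) for k in range(20)]
groups = extract_pitch_groups(flat, 6.0)
centers = [c for c, _ in groups]
limited = extract_pitch_groups(flat, 6.0, max_groups=2)
```

With a spacing of 6 bins in a 20-bin frame, the groups are centered on bins 6, 12, and 18.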

[0030] The pitch sample quantizer 4d quantizes the pitch component sample groups and outputs the result externally as the quantized pitch component index. It also outputs the quantized pitch component, obtained by decoding this index, to the adder 5. The sample groups may be scalar-quantized, or they may be vector-quantized group by group, each group consisting of three samples. Alternatively, all the sample groups may be vector-quantized together. This completes the processing performed in the pitch encoder 4.
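Group-by-group vector quantization amounts to a nearest-neighbor search in a codebook of three-sample shapes. The tiny 2-bit codebook and the sample values below are invented for illustration; a real codebook would be trained and much larger.

```python
# Hypothetical 2-bit codebook of three-sample pitch shapes.
CODEBOOK = [
    [0.0, 0.0, 0.0],
    [2.0, 8.0, 2.0],
    [-2.0, -8.0, -2.0],
    [4.0, 4.0, 0.0],
]


def vq_encode(group, codebook=CODEBOOK):
    """Return the index of the code vector with the smallest squared error."""
    def dist(code):
        return sum((g - c) ** 2 for g, c in zip(group, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def vq_decode(index, codebook=CODEBOOK):
    """Look up the code vector for an index."""
    return list(codebook[index])


groups = [[1.8, 7.5, 2.4], [-2.2, -8.3, -1.9]]
indices = [vq_encode(g) for g in groups]       # sent as side information
quantized = [vq_decode(i) for i in indices]    # fed to the adder
```

The encoder passes the decoded vectors, not the original samples, to the adder, so the subtraction it performs can be reproduced exactly at the decoder.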

[0031] Next, the adder 5 uses the quantized pitch component signal input from the pitch encoder 4 to subtract only the pitch component from the first flattened signal input from the first flattener 3. This generates a second flattened signal, which the adder outputs to the fine spectral outline calculator/quantizer 6 and the second flattener 7.
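The subtraction can be sketched by first placing the quantized three-sample groups at their harmonic positions and then subtracting the resulting pitch signal bin by bin. The function name and the sample values are assumptions made for the example.

```python
def subtract_pitch(flat, centers, quantized_groups):
    """Return (second flattened signal, pitch signal) for one frame."""
    pitch = [0.0] * len(flat)
    for center, group in zip(centers, quantized_groups):
        for offset, value in zip((-1, 0, 1), group):
            pitch[center + offset] += value
    second = [x - p for x, p in zip(flat, pitch)]
    return second, pitch


# Toy first flattened signal with one pitch spike centered on bin 4.
first_flat = [0.5, -0.4, 0.6, 2.1, 7.8, 1.7, -0.5, 0.3]
second_flat, pitch = subtract_pitch(first_flat, [4], [[2.0, 8.0, 2.0]])
```

After the subtraction the spike around bin 4 is reduced to a small quantization error, so `second_flat` has no value larger than the surrounding coefficients.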

[0032] FIG. 4 shows the frequency characteristic of this second flattened signal. As a comparison with FIG. 2 shows, the second flattened signal corresponds to the frequency-domain signal output from the time-frequency transformer 1 with the pitch components removed.

[0033] The fine spectral outline calculator/quantizer 6 computes a fine outline of the spectrum (the fine spectral outline) from the second flattened signal and quantizes it. It outputs this quantized signal externally as the quantized fine spectral outline index and also outputs it to the second flattener 7.

[0034] The fine spectral outline may be obtained by quantizing the fine spectral outline directly, or by linearly combining the fine spectral outlines of past frames. It may also be obtained by linearly combining the quantized fine spectral outline information of past and current frames. Furthermore, the fine spectral outline may be, for example, the absolute value of the second flattened signal convolved with a window function of width about 3 to 5, or a representative amplitude value of the second flattened signal prepared for each band after sub-band division.
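The windowed variant can be sketched as a short smoothing filter over the absolute values. The triangular window of width 3, its normalization to unit sum, and the edge handling are assumptions chosen for this example; the text above fixes only the approximate width.

```python
def fine_outline(second_flat, window=(1.0, 2.0, 1.0)):
    """Convolve |x| with a short symmetric window normalized to unit sum."""
    total = sum(window)
    half = len(window) // 2
    n = len(second_flat)
    outline = []
    for k in range(n):
        acc = 0.0
        for j, w in enumerate(window):
            idx = min(max(k + j - half, 0), n - 1)  # repeat the edge samples
            acc += w * abs(second_flat[idx])
        outline.append(acc / total)
    return outline


second_flat = [1.0, -1.0, 1.0, -3.0, 1.0]
outline = fine_outline(second_flat)  # smooth, strictly positive envelope
```

Because the window is symmetric, convolution and correlation coincide here. Dividing `second_flat` by an outline of this kind (after quantization) would yield the third flattened signal handled by the second flattener.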

[0035] The second flattener 7 flattens the second flattened signal input from the adder 5 by dividing it by the fine spectral outline obtained in the fine spectral outline calculator/quantizer 6, and outputs the result to the quantizer 8 as a third flattened signal. The quantizer 8 scalar-quantizes or vector-quantizes the third flattened signal and outputs the result externally as a quantization index.

[0036] When vector quantization is used, all the samples of a frame may be quantized together, but dividing the sample sequence of the frame into a plurality of sub-vectors and quantizing each sub-vector is more practical in terms of computation. The division may be a simple sub-band division, or an interleaved division in which the samples are interleaved before being divided. Adaptive bit allocation may also be performed according to the amount of information required for quantization.
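The two ways of dividing a frame into sub-vectors can be contrasted with a short sketch. The function names and the choice of three sub-vectors are illustrative only.

```python
def band_split(samples, size):
    """Simple sub-band division: consecutive runs of `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def interleave_split(samples, num_subvectors):
    """Interleaved division: sub-vector i takes samples i, i + m, i + 2m, ..."""
    return [samples[i::num_subvectors] for i in range(num_subvectors)]


def interleave_merge(subvectors):
    """Inverse of interleave_split."""
    m = len(subvectors)
    out = [None] * sum(len(s) for s in subvectors)
    for i, sub in enumerate(subvectors):
        out[i::m] = sub
    return out


frame = list(range(10))
bands = band_split(frame, 4)          # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
subs = interleave_split(frame, 3)     # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
merged = interleave_merge(subs)       # back to the original order
```

With interleaving, every sub-vector draws samples from across the whole spectrum, so the sub-vectors tend to have similar statistics; this is a commonly cited motivation, though the source does not spell it out.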

[0037] Next, decoder B is described. As shown in FIG. 1, decoder B comprises a reproducer 9, a fine spectral outline reproducer 10, a first inverse flattener 11, a pitch reproducer 12, an adder 13, a global outline reproducer 14, a second inverse flattener 15, and an inverse time-frequency transformer 16.

[0038] Of these, the reproducer 9 reproduces the third flattened signal from the quantization index transmitted from encoder A. It does so by performing the inverse of the processing of the quantizer 8, and outputs the third flattened signal to the first inverse flattener 11. The fine spectral outline reproducer 10 reproduces the fine spectral outline from the quantized fine spectral outline index transmitted from encoder A.

[0039] The first inverse flattener 11 applies the fine spectral outline to the third flattened signal input from the reproducer 9, thereby reproducing the second flattened signal, and outputs it to the adder 13. The pitch reproducer 12 reproduces the pitch component from the quantized pitch component index and the quantized pitch fundamental frequency index transmitted from encoder A, and outputs it to the adder 13.

[0040] The adder 13 adds the pitch component input from the pitch reproducer 12 to the second flattened signal input from the first inverse flattener 11, thereby reproducing the first flattened signal, and outputs it to the second inverse flattener 15. The global outline reproducer 14 reproduces the global outline from the quantized global outline index transmitted from encoder A and outputs it to the second inverse flattener 15.

[0041] The second inverse flattener 15 applies the global outline input from the global outline reproducer 14 to the first flattened signal input from the adder 13, generating the frequency-domain signal. The inverse time-frequency transformer 16 then decodes the frequency-domain signal input from the second inverse flattener 15 by applying an inverse time-frequency transform, and outputs a time-domain speech signal or musical tone signal.
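The frequency-domain part of the decoder can be sketched as the mirror image of the encoder. Since both flatteners divide by an outline, the sketch assumes that each inverse flattener multiplies by the corresponding outline; that reading, the function name, and the sample values are assumptions. The inverse time-frequency transform (for example an inverse DCT) would follow and is omitted.

```python
def decode_frame(third_flat, fine_outline, pitch, global_outline):
    """Rebuild the frequency-domain signal of one frame from decoded parts."""
    # First inverse flattener 11: restore the fine spectral outline.
    second = [x * f for x, f in zip(third_flat, fine_outline)]
    # Adder 13: put the pitch component back.
    first = [x + p for x, p in zip(second, pitch)]
    # Second inverse flattener 15: restore the global outline.
    return [x * g for x, g in zip(first, global_outline)]


# Hypothetical decoded inputs for a four-coefficient frame.
third_flat = [0.5, -1.0, 1.0, 0.8]
fine = [1.0, 1.0, 0.5, 0.5]
pitch = [0.0, 8.0, 0.0, 0.0]
global_outline = [2.0, 2.0, 1.0, 1.0]

spectrum = decode_frame(third_flat, fine, pitch, global_outline)
```

Note the order: the pitch component is added after the fine outline has been restored and before the global outline is applied, which mirrors the order of operations in encoder A.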

[0042]

[Effects of the invention] As described above, according to the present invention, when a musical tone signal or speech signal having pitch components is encoded, the spike-like pitch components that appear in the frequency-domain signal obtained by transforming the signal into the frequency domain are encoded with high efficiency by exploiting their regularity. A more flattened residual can therefore be obtained, and the efficiency of the encoder as a whole can be improved.

[Brief description of the drawings]

[FIG. 1] A diagram illustrating an encoder and a decoder according to an embodiment of the present invention.

[FIG. 2] A diagram showing the frequency characteristic of the output signal of the time-frequency transformer in the present invention.

[FIG. 3] A diagram showing the detailed configuration of the pitch encoder in the present invention.

[FIG. 4] A diagram showing the frequency characteristic of the second flattened signal in the present invention.

[FIG. 5] A first diagram illustrating a conventional transform coding method.

[FIG. 6] A second diagram illustrating a conventional transform coding method.

[Explanation of symbols]

1 Time-frequency transformer
2 Global outline calculator/quantizer
3 First flattener
4 Pitch encoder
5, 13 Adder
6 Fine spectral outline calculator/quantizer
7 Second flattener
8 Quantizer
9 Reproducer
10 Fine spectral outline reproducer
11 First inverse flattener
12 Pitch reproducer
14 Global outline reproducer
15 Second inverse flattener
16 Inverse time-frequency transformer

Continuation of the front page

(56) References: JP-A-1-239597 (JP, A); JP-A-57-62096 (JP, A); JP-A-57-161795 (JP, A); JP-A-63-37400 (JP, A); JP-A-7-261800 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/00 - 11/04; G10L 19/00 - 19/02; G11B 20/10; H03M 7/30

Claims

(57) [Claims]

1. A transform coding method comprising: a time-frequency transform step of dividing a musical tone signal or speech signal into frames of a fixed time interval and applying a time-frequency transform to each frame to generate a frequency-domain signal; an outline calculation and quantization step of generating an outline signal of the frequency-domain signal, quantizing the outline signal, and outputting a quantized outline index; a flattening step of dividing the frequency-domain signal by the quantized outline signal to generate a flattened signal; a pitch coding step of detecting and quantizing a pitch component from the flattened signal and outputting a quantized pitch component index; and a flattened signal quantization step of outputting a quantized flattened signal index of the flattened signal obtained by removing the quantized pitch component from the flattened signal.

2. The transform coding method according to claim 1, wherein, in the pitch coding step, the fundamental frequency of the pitch component is determined and quantized; samples at frequencies that are natural-number multiples of the fundamental frequency, or at the frequencies closest to them, are extracted from the frequency-domain signal as pitch component samples and quantized; and the quantized pitch fundamental frequency index and the quantized pitch component sample index thus obtained are output as the quantized pitch component index.

3. The transform coding method according to claim 2, wherein, in the pitch coding step, the pitch component is extracted from the frequency-domain signal and quantized using, as one unit, a plurality of consecutive samples that include the sample at a frequency that is a natural-number multiple of the fundamental frequency or at the frequency closest to it.

4. The transform coding method according to claim 2 or 3, wherein, in the pitch coding step, the pitch component samples are vector-quantized either all together or unit by unit.

5. A transform decoding method comprising: a flattened signal reproduction step of reproducing a flattened signal from a quantized flattened signal index; a pitch reproduction step of reproducing a pitch component from a quantized pitch component index; an outline signal reproduction step of reproducing an outline signal from a quantized outline index; an inverse flattening step of reproducing a frequency-domain signal by inverse-flattening, with the outline signal, the signal obtained by adding the pitch component to the flattened signal; and an inverse time-frequency transform step of applying an inverse time-frequency transform to the frequency-domain signal to generate a time-domain musical tone signal or speech signal.

6. The transform decoding method according to claim 5, wherein, in the pitch reproduction step, the fundamental frequency of the pitch component is decoded, and the pitch component samples decoded from the quantized pitch component sample index are placed, as the pitch component, in the frequency-domain signal at frequencies that are natural-number multiples of the fundamental frequency or at the frequencies closest to them.

7. The transform decoding method according to claim 6, wherein, in the pitch reproduction step, the pitch component samples decoded from the quantized pitch component sample index are placed, as the pitch component, in the frequency-domain signal using, as one unit, a plurality of consecutive samples that include the sample at a frequency that is a natural-number multiple of the fundamental frequency or at the frequency closest to it.

8. The transform decoding method according to claim 6 or 7, wherein, in the pitch reproduction step, the pitch component samples are decoded either all together or unit by unit.