JP3332132B2

JP3332132B2 - Voice coding method and apparatus

Info

Publication number: JP3332132B2
Application number: JP23999395A
Authority: JP
Inventors: 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-09-19
Filing date: 1995-09-19
Publication date: 2002-10-07
Anticipated expiration: 2015-09-19
Also published as: JPH0981194A

Abstract

PROBLEM TO BE SOLVED: To provide a speech coding method that can reproduce a high-quality speech signal with a small amount of computation even at a low coding bit rate of about 4 kbps or less. SOLUTION: In this speech coding method, a drive signal generated from drive vectors obtained from drive vector codebooks is supplied to a weighted synthesis filter 113 to generate a synthesized speech signal, the drive vector that minimizes the distortion of the synthesized speech signal is searched for, and coding parameters representing the drive vector and the filter coefficients of the synthesis filter 113 are output. The method uses a plurality of adaptive codebooks 110 and 120, a noise codebook 121, and a signal classifier 130. When the signal classifier 130 classifies the input speech signal as periodic, or as periodic and stationary, the drive signal is generated from drive vectors obtained from the adaptive codebooks 110 and 120. Otherwise, the drive signal is generated from drive vectors obtained from the noise codebook 121.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech coding method and apparatus capable of reproducing high-quality speech at a low bit rate.

[0002]

2. Description of the Related Art

CELP (Code Excited Linear Prediction) is known as an effective method for coding telephone-band speech at a transmission rate of about 4 kbps. This method consists of two main processes: obtaining a speech synthesis filter that models the vocal tract from input speech divided into frames, and obtaining the drive vectors that form the drive signal, that is, the input signal of this filter. The latter process passes the drive vectors stored in a codebook one by one through the speech synthesis filter, calculates the distortion of each resulting synthesized speech signal vector (its error with respect to the input speech signal), and searches for the drive vector that minimizes this distortion. This process is called a closed-loop search, and it is a very effective way to reproduce good sound quality even at a coding bit rate of about 8 kbps.

[0003] However, when the coding bit rate falls to about 4 kbps or less, this method can no longer reproduce speech of sufficient quality. Another problem with this method is that the closed-loop search requires a large amount of computation. These problems are described in detail below.

[0004] The CELP method is described in detail in M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", Proc. ICASSP, pp. 937-940, 1985, and in W. S. Kleijin, D. J. Krasinski et al., "Improved Speech Quality and Efficient Vector Quantization in SELP", Proc. ICASSP, pp. 155-158, 1988.

[0005] An outline of the CELP method is described with reference to FIG. 7. A speech signal is input to the input terminal 460 in units of frames. This input speech signal is analyzed by the linear prediction analysis unit 450 to obtain the filter coefficients of the weighted synthesis filter 430. At the same time, it is input to the perceptual weighting unit 440 to obtain a weighted input speech signal. The zero-state response of the weighted synthesis filter 430 is subtracted from this weighted input speech signal to generate the target vector 480.

[0006] Next, drive vectors are read out one by one from the adaptive codebook 411, multiplied by a gain in the gain circuit 421, and input to the weighted synthesis filter 430 as a drive signal, which generates a synthesized speech signal vector. The evaluation unit 470 evaluates the distortion of this synthesized speech signal vector, that is, its difference from the target vector 480. The adaptive codebook 411 is searched for the drive vector that makes this distortion smaller, and the best one becomes the first drive vector. Next, taking the effect of the first drive vector into account, a second drive vector is searched for in the noise codebook 412 in the same way. Finally, the first and second drive vectors are multiplied by their optimum gains in the gain circuits 421 and 422, respectively, to generate the drive signal. This drive signal is used to update the contents of the adaptive codebook 411 in preparation for the input of the next frame of the speech signal.
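The closed-loop search described for FIG. 7 can be illustrated with a minimal sketch. It assumes that the weighted synthesis filter is represented by a truncated impulse response `h` and that the gain is optimized separately for each candidate; the function names `synthesize` and `closed_loop_search` are hypothetical and do not come from the patent.

```python
def synthesize(excitation, h):
    """Zero-state response: convolve the excitation with impulse response h,
    truncated to the length of the excitation (one subframe)."""
    return [
        sum(h[k] * excitation[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(excitation))
    ]


def closed_loop_search(codebook, h, target):
    """Return (index, gain, distortion) of the codebook entry that minimizes
    ||target - gain * synthesize(entry)||^2, with the gain optimized per entry."""
    target_energy = sum(t * t for t in target)
    best = None
    for idx, vec in enumerate(codebook):
        y = synthesize(vec, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero synthesized vector cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain for this entry
        distortion = target_energy - corr * corr / energy
        if best is None or distortion < best[2]:
            best = (idx, gain, distortion)
    return best
```

Every candidate is filtered and scored, which is why the computation grows with the codebook size, as the following paragraphs discuss.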

[0007] As described above, the CELP method performs a closed-loop search: for every drive vector stored in the adaptive codebook 411 and the noise codebook 412, it performs filtering with the weighted synthesis filter 430 and calculates the distortion of the synthesized speech signal, searching for the drive vector that minimizes this distortion. The CELP method can reproduce a speech signal of relatively good quality at a coding bit rate of about 8 kbps. At a low coding bit rate of about 4 kbps or less, however, the interval at which the contents of the adaptive codebook 411 and the noise codebook 412 are updated becomes longer and fewer bits are allocated, so the quality of the reproduced speech signal deteriorates.

[0008] To describe this problem in more detail: in the configuration of FIG. 7, the drive signal source that drives the weighted synthesis filter 430 has a two-stage structure consisting of the adaptive codebook 411 as the first stage and the noise codebook 412 as the second stage. When the input speech signal is periodic, the adaptive codebook 411 represents the periodic component of the drive signal, and the noise codebook 412 represents the residual component of the drive signal that the adaptive codebook 411 alone could not represent. In this way, the adaptive codebook 411 and the noise codebook 412 share roles.

[0009] This configuration worked well at a coding bit rate of about 8 kbps and yielded a high-quality reproduced speech signal. At a low coding bit rate of about 4 kbps or less, however, the interval at which the contents of the adaptive codebook 411 and the noise codebook 412 are updated (generally called a subframe) becomes longer and the number of allocated bits decreases, so the adaptive codebook 411 can no longer represent the periodic component of the drive signal with sufficient accuracy. As a result, a periodic component remains in the residual component of the drive signal that the adaptive codebook 411 cannot represent, and because the noise codebook 412 cannot represent this periodic component, the quality of the reproduced speech deteriorates.

[0010] The CELP method also has the problem that the closed-loop search requires an enormous amount of computation. To address this, R. C. Rose and T. P. Barnwell III devised a method that reduces the computation required for the closed-loop search of drive vectors. This method is called SEV (Self Excited Vocoder), and its operation and performance are described in R. C. Rose and T. P. Barnwell III, "Quality Comparison of Low Complexity 4800 bps Self Excited and Code Excited Vocoders", Proc. of ICASSP'87, pp. 1637-1640, 1987. The SEV method is briefly described below.

[0011] FIG. 8 is a block diagram of the drive signal search section of an SEV speech coding apparatus. As a comparison with FIG. 7 shows, the major difference between the CELP method and the SEV method lies in how the drive signal of the synthesis filter is constructed. The CELP method constructs the drive signal from an adaptive codebook and a noise codebook, whereas the SEV method has a simple structure that constructs the drive signal from only one or two adaptive codebooks 500 and 510 (called long-term predictors in the literature), and it exploits the special structure of the adaptive codebook to reduce the amount of computation. Specifically, each drive vector in the adaptive codebooks 500 and 510 is formed from the past drive signal by shifting it by a delay corresponding to the pitch period, so the elements of successive drive vectors overlap. By exploiting this overlap, the output of the weighted synthesis filter 520 for each drive vector is calculated recursively, which reduces the amount of computation to about 1/20.
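The recursive calculation mentioned above can be sketched as follows. The sketch assumes integer lags no shorter than the subframe, a truncated impulse response `h` of at least subframe length, and zero filter state. Under those assumptions the filtered vector for lag L+1 equals the filtered vector for lag L delayed by one sample, plus the new leading sample times `h`. All function names are hypothetical, and the sketch does not reproduce the exact procedure of the cited SEV paper.

```python
def zero_state_response(vec, h):
    """Direct convolution of vec with h, truncated to len(vec)."""
    return [sum(h[k] * vec[n - k] for k in range(n + 1)) for n in range(len(vec))]


def adaptive_vector(past, lag, n_sub):
    """Vector e(n - lag), n = 0..n_sub-1, where past[-1] is the most recent
    past drive sample. Assumes n_sub <= lag <= len(past)."""
    start = len(past) - lag
    return past[start:start + n_sub]


def recursive_responses(past, h, min_lag, max_lag, n_sub):
    """Filtered vectors for lags min_lag..max_lag. Only the first lag is
    filtered directly; each later lag reuses the previous result."""
    y = zero_state_response(adaptive_vector(past, min_lag, n_sub), h)
    out = {min_lag: y}
    for lag in range(min_lag + 1, max_lag + 1):
        first = past[len(past) - lag]  # new leading sample of the shifted vector
        # y_new(0) = first * h(0); y_new(n) = y_old(n - 1) + first * h(n)
        y = [first * h[0]] + [y[n - 1] + first * h[n] for n in range(1, n_sub)]
        out[lag] = y
    return out
```

Each additional lag costs on the order of N multiply-adds instead of a full convolution, which is the kind of saving the overlap makes possible.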

[0012] However, the SEV method always creates the drive signal from the past history using the adaptive codebooks, regardless of the nature of the input speech signal. It therefore cannot generate an appropriate drive signal when the input speech signal has little correlation with past signals, as in unvoiced sections, or in portions where the nature of the speech changes, such as the rising and falling portions of the signal, transitions from unvoiced to voiced sections, and transitions from voiced to unvoiced sections. As a result, the sound quality deteriorates even further than with the CELP method.

[0013]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, among conventional speech coding methods, the CELP method cannot reproduce a high-quality speech signal at a low coding bit rate of about 4 kbps or less and also requires a large amount of computation, while the SEV method reduces the amount of computation but degrades the quality of the reproduced speech signal even further than the CELP method.

[0014] An object of the present invention is to provide a speech coding method and apparatus capable of reproducing a high-quality speech signal with a small amount of computation even at a low coding bit rate of about 4 kbps or less.

[0015]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problems, the present invention provides a speech coding method that generates a drive signal using drive vectors obtained from a plurality of drive vector codebooks, supplies the drive signal to a synthesis filter whose filter coefficients are determined from the result of analyzing the input speech signal, searches the drive vector codebooks for the drive vector that makes the distortion of the synthesized speech signal output from the synthesis filter smaller, and outputs coding parameters representing at least the drive vector and the filter coefficients of the synthesis filter. The method is characterized in that at least two adaptive codebooks, at least one fixed codebook, and at least one noise codebook are prepared as the drive vector codebooks; in a first stage, whichever of the first adaptive codebook and the fixed codebook gives the smaller distortion of the synthesized speech signal is selected; in a second stage, it is determined whether the first adaptive codebook or the fixed codebook was selected in the first stage, and the nature of the input speech signal is thereby classified as periodic or not, or as periodic and stationary or not; when the input speech signal is periodic, or periodic and stationary, the drive signal is generated using drive vectors obtained from at least two adaptive codebooks; and when the input speech signal is non-periodic, or periodic and non-stationary, the drive signal is generated using at least a drive vector obtained from the noise codebook.

[0016] A speech coding apparatus according to the present invention comprises: at least two adaptive codebooks, at least one fixed codebook, and at least one noise codebook prepared as drive vector codebooks; classification means that, in a first stage, selects whichever of the first adaptive codebook and the fixed codebook gives the smaller distortion of the synthesized speech signal and, in a second stage, determines whether the first adaptive codebook or the fixed codebook was selected and thereby classifies the nature of the input speech signal as periodic, or periodic and stationary, or not; and drive signal generating means that generates the drive signal using drive vectors obtained from at least two adaptive codebooks when the classification means classifies the input speech signal as periodic, or periodic and stationary, and generates the drive signal using at least a drive vector obtained from the noise codebook when the input speech signal is classified as non-periodic, or periodic and non-stationary.

[0017] Thus, in the present invention, the drive signal of the synthesis filter is formed as a linear combination of drive vectors stored in multi-stage drive vector codebooks. The nature of the input speech signal is classified as periodic or not, or as periodic and stationary or not. When it is classified as periodic, or periodic and stationary, the drive signal is generated using drive vectors obtained from at least two stages of adaptive codebooks; when it is classified as non-periodic, or periodic and non-stationary, the drive signal is generated using at least a drive vector obtained from the noise codebook.

[0018] As described above, in the conventional CELP method, when the input speech signal is periodic, the first-stage adaptive codebook represents the periodic component of the drive signal and the second-stage noise codebook represents the residual component of the drive signal that the adaptive codebook alone could not represent. At a low coding bit rate of about 4 kbps or less, the interval at which the codes of the adaptive codebook and the noise codebook are updated becomes longer and the number of allocated bits decreases. The adaptive codebook therefore can no longer represent a periodic signal with sufficient accuracy, a periodic component remains in the residual of the adaptive codebook, and the noise codebook cannot represent this remaining periodic component, which degraded the quality of the reproduced speech signal.

[0019] In the present invention, by contrast, when the input speech signal is periodic, or periodic and stationary, the first adaptive codebook in the first stage represents the periodic component of the drive signal, and even if a periodic component remains in the first-stage residual because of the lower coding bit rate, the second adaptive codebook in the second stage can represent it. A high-quality reproduced speech signal can therefore be obtained even at a low coding bit rate.

[0020] When the input speech signal is non-periodic, or periodic and non-stationary, the present invention generates the drive signal using at least the noise codebook and at most one adaptive codebook. Thus, when the input speech signal is not periodic, or is periodic but not stationary, and so has little correlation with the past drive signal, the configuration relies on the noise codebook rather than making heavy use of adaptive codebooks, which exploit correlation with the past drive signal. A good reproduced speech signal can therefore be obtained.

[0021] Thus, in the present invention, the drive signal is constructed either from multiple stages of adaptive codebooks or from the noise codebook, switching between the two according to whether the input speech signal is periodic, or periodic and stationary. Because this produces a drive signal suited to the nature of the input speech signal, a reproduced speech signal of good quality can be obtained even at a low coding bit rate.

[0022]

[0023]

[0024]

BEST MODE FOR CARRYING OUT THE INVENTION

(First Embodiment) FIG. 1 is a block diagram showing the configuration of a speech coding apparatus according to a first embodiment of the present invention. The apparatus has a speech signal input terminal 100, a buffer 101, an LPC analysis unit 102, an LSP quantizer 103, a subframe division circuit 104, a weighting filter 105, a first adaptive codebook 110 constituting the first-stage drive signal source, a gain circuit 111, an adder 112, a weighted synthesis filter 113, a subtractor 114, a second adaptive codebook 120 and a noise codebook 121 constituting the second-stage drive signal source, a switch 122, a gain circuit 123, a gain codebook 124, a signal classifier 130, search units 140, 141, and 142 for searching the codebooks 110, 120, 121, and 124, a multiplexer 150, and an output terminal 151 for the coding parameters.

[0025] The signal classifier 130 classifies the nature of the input speech signal according to whether it is periodic, or periodic and stationary, and outputs a classification decision signal. The switch 122 selects either the second adaptive codebook 120 or the noise codebook 121 as the second-stage drive signal source, based on the classification decision signal from the signal classifier 130.

[0026] Next, the operation of the speech coding apparatus of this embodiment is described. First, a digitized speech signal is input from the input terminal 100, divided into fixed-length sections called frames (for example, 20 ms), and stored in the buffer 101. The input speech signal read from the buffer 101 frame by frame undergoes linear prediction analysis in the LPC analysis unit 102, which calculates the linear prediction coefficients a_i (i = 1, ..., p), parameters representing the spectral envelope of the input speech signal, and passes them to the LSP quantizer 103.
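The text leaves the linear prediction analysis to well-known methods. One such method, shown here only as an illustrative assumption, is the autocorrelation method with the Levinson-Durbin recursion. The sketch uses the sign convention A(z) = 1 + Σ a_i z^-i used elsewhere in this description, omits windowing and lag weighting, and uses hypothetical function names.

```python
def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame (no window applied)."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + sum_i a_i z^-i.
    Returns (a, err) where a[0] == 1.0 and err is the final prediction error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

The resulting coefficients `a[1:]` play the role of a_i (i = 1, ..., p); the conversion to LSP parameters and their quantization are not covered by this sketch.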

[0027] The LSP quantizer 103 converts the linear prediction coefficients into LSP (line spectrum pair) parameters and quantizes them with a predetermined number of bits. The LSP parameters quantized by the LSP quantizer 103 are input to the multiplexer 150. They are also decoded and converted back into linear prediction coefficients, giving the quantized linear prediction coefficients aq_i (i = 1, ..., p). Well-known methods can be used for the linear prediction analysis, for obtaining the LSP parameters, and for quantizing the LSP parameters. The quantized linear prediction coefficients are supplied to the weighting filter 105 and the weighted synthesis filter 113.

[0028] Meanwhile, the input speech signal read from the buffer 101 frame by frame is divided by the subframe division circuit 104 into a plurality of sections per frame, called subframes, and is input subframe by subframe to the weighting filter 105 to become the target vector for the drive signal search. The search for the drive signal that drives the weighted synthesis filter 113 is then performed subframe by subframe as follows.
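As a small illustration of the subframe division, the sketch below splits one frame into equal-length subframes. The frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate) and the choice of four subframes in the usage note are assumptions for illustration only; the text does not specify them.

```python
def split_into_subframes(frame, num_subframes):
    """Split a frame into num_subframes consecutive, equal-length subframes."""
    if num_subframes <= 0 or len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a positive multiple of num_subframes")
    n_sub = len(frame) // num_subframes
    return [frame[i * n_sub:(i + 1) * n_sub] for i in range(num_subframes)]
```

For example, `split_into_subframes(list(range(160)), 4)` yields four subframes of 40 samples each, and each subframe would then be weighted and searched in turn.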

[0029] The speech coding apparatus of this embodiment has a two-stage structure in which the first adaptive codebook 110 is the first-stage drive signal source and either the second adaptive codebook 120 or the noise codebook 121 is the second-stage drive signal source. Drive vectors are searched for in the codebook of each stage in turn. This search procedure is described with reference to the flowchart shown in FIG. 3.

[0030] First, the first adaptive codebook 110 in the first stage is searched for a drive vector. When a speech signal is input from the input terminal 100 (step S1), then, before the adaptive codebook 110 is searched, the weighting filter 105 applies perceptual weighting to the input speech signal and the influence of the weighting synthesis filter 105 from the previous subframe is subtracted, which generates the target vector X (step S2). The first adaptive codebook 110 is then searched (step S3). The first adaptive codebook 110 is used to represent the periodic component (pitch) of the speech signal, and the drive vector e(n) stored in this adaptive codebook 110 is created by cutting out a subframe-length segment of the past drive signal, as expressed by the following equation.

[0031]

$$e(n) = e(n - L), \quad n = 1, \ldots, N \tag{1}$$

Here, L is the lag and N is the subframe length. The search unit 140 searches the first adaptive codebook 110 by finding, with a well-known method, the lag that minimizes the distortion of the synthesized speech signal vector obtained by passing the drive vector e through the weighted synthesis filter 113, that is, the error of the synthesized speech signal vector with respect to the target vector X. The lag can be expressed in units of integer samples or fractional samples.
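Equation (1) can be sketched for integer lags as follows. The handling of lags shorter than the subframe, where the recursion reuses samples it has just generated and so repeats the last L samples periodically, is an assumption about how equation (1) is applied and is not stated in the text. Fractional lags are not covered, and the function name is hypothetical.

```python
def adaptive_codebook_vector(past_excitation, lag, n_sub):
    """Build e(n) = e(n - L), n = 1..N, from the past drive signal.
    past_excitation[-1] is the most recent past sample; lag is an integer
    with 1 <= lag <= len(past_excitation)."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag out of range")
    buf = list(past_excitation)
    for _ in range(n_sub):
        # buf[-lag] is the sample exactly `lag` positions before the new one
        buf.append(buf[-lag])
    return buf[len(past_excitation):]
```

With `past_excitation = [1, 2, 3, 4, 5]`, a lag of 5 and a subframe of 3 gives `[1, 2, 3]`, while a lag of 2 and a subframe of 5 gives the periodic extension `[4, 5, 4, 5, 4]`.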

[0032] The weighting filter 105 and the weighted synthesis filter 113 can be constructed by well-known methods. As one example, with $W(z)$ and $H_w(z)$ as their respective transfer functions, they can be expressed as

$$W(z) = \frac{A(z/r)}{A(z/r')} \tag{2}$$

$$H_w(z) = \frac{W(z)}{A_q(z)} \tag{3}$$

$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i} \tag{4}$$

$$A_q(z) = 1 + \sum_{i=1}^{p} \mathit{aq}_i z^{-i} \tag{5}$$
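Equations (2) to (5) can be turned into a short sketch that computes the impulse response of the weighted synthesis filter. It relies on the fact that A(z/r) has coefficients a_i r^i. The values of r and r' are not given in the text, so they are left as parameters, and all function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma**i (a[0] == 1)."""
    return [c * gamma ** i for i, c in enumerate(a)]


def iir_filter(num, den, x):
    """Filter x through num(z)/den(z) with den[0] == 1 and zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


def weighted_synthesis_impulse_response(a, aq, r, r_prime, length):
    """First `length` samples of the impulse response of
    Hw(z) = W(z) / Aq(z), with W(z) = A(z/r) / A(z/r')."""
    impulse = [1.0] + [0.0] * (length - 1)
    w = iir_filter(bandwidth_expand(a, r), bandwidth_expand(a, r_prime), impulse)
    return iir_filter([1.0], aq, w)
```

Here `a` and `aq` are the lists [1, a_1, ..., a_p] and [1, aq_1, ..., aq_p]. The returned response is the kind of truncated `h` that a codebook search can convolve with each candidate drive vector.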

[0033] Next, the search unit 141 searches the second-stage codebook. First, the signal classifier 130 classifies the input speech signal to be coded according to whether it is periodic, or whether it is periodic and stationary (step S4). Specifically, the classification uses a known voiced/unvoiced decision method, a method that combines this with a decision on the continuity of the pitch period, or an autocorrelation method.

[0034] FIG. 2 shows an example of the waveform of a speech signal in voiced and unvoiced sections. As the figure shows, in the transition from an unvoiced section to a voiced section and in the transition from a voiced section to an unvoiced section, the signal is periodic but non-stationary: its pitch period and amplitude do not change uniformly. Outside these transitions, it is periodic and stationary. In the simplest case, the signal classifier 130 may classify the input speech signal as periodic or not by deciding whether it is in a voiced or an unvoiced section. Alternatively, it may further decide whether a voiced section is stationary and classify the signal as periodic and stationary or not.
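One possible classifier along these lines can be sketched with a normalized autocorrelation for the periodicity decision and pitch-lag continuity for the stationarity decision. The thresholds, the lag range, and the function names are hypothetical choices made for illustration; the text only names the general families of methods.

```python
import math


def best_pitch(signal, min_lag, max_lag):
    """Return (lag, normalized correlation) maximizing the correlation
    between s(n) and s(n - lag) over the given lag range."""
    best_lag, best_corr = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        x, y = signal[lag:], signal[:-lag]
        num = sum(p * q for p, q in zip(x, y))
        den = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
        corr = num / den if den > 0.0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def classify(signal, prev_lag, min_lag=20, max_lag=60,
             corr_threshold=0.7, lag_tolerance=3):
    """Classify a block as periodic, and as periodic and stationary when the
    pitch lag is also close to the lag of the previous block."""
    lag, corr = best_pitch(signal, min_lag, max_lag)
    periodic = corr >= corr_threshold
    stationary = (periodic and prev_lag is not None
                  and abs(lag - prev_lag) <= lag_tolerance)
    return {"lag": lag, "periodic": periodic,
            "periodic_and_stationary": stationary}
```

A block whose pitch lag jumps relative to the previous block is reported as periodic but not stationary, which corresponds to the transitional portions described for FIG. 2.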

[0035] If the classification in step S4 finds that the input speech signal is periodic, or periodic and stationary, the second adaptive codebook 120 is searched as the second-stage codebook (step S5); otherwise, the noise codebook 121 is searched (step S6). The second adaptive codebook 120 is searched in the same way as the first adaptive codebook 110. The noise codebook 121 is searched in the same way as in the conventional CELP method. Well-known techniques such as overlapping codebooks, backward filtering, and preselection can be used here to reduce the amount of computation.
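Steps S3 to S6 can be sketched as a two-stage search with a switch on the classification result. The sketch assumes that the codebooks are given as plain lists of candidate vectors, that the weighted synthesis filter is a truncated impulse response `h`, and that the second stage is searched against the residual left by the first stage. That last point is a common sequential-search simplification rather than something the text spells out. All names are hypothetical.

```python
def synthesize(vec, h):
    """Zero-state response of the filter with impulse response h."""
    return [
        sum(h[k] * vec[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(vec))
    ]


def search(codebook, h, target):
    """Return (index, gain, filtered vector) of the best entry, using the
    per-entry optimal gain under a squared-error criterion."""
    best, best_score = (None, 0.0, [0.0] * len(target)), 0.0
    for idx, vec in enumerate(codebook):
        y = synthesize(vec, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy  # maximizing this minimizes distortion
        if score > best_score:
            best, best_score = (idx, corr / energy, y), score
    return best


def two_stage_search(target, h, adaptive1, adaptive2, noise, periodic):
    """Stage 1: first adaptive codebook (S3). Stage 2: second adaptive
    codebook if `periodic` is true (S5), otherwise the noise codebook (S6)."""
    idx1, g1, y1 = search(adaptive1, h, target)
    residual = [t - g1 * v for t, v in zip(target, y1)]
    name, book = ("adaptive2", adaptive2) if periodic else ("noise", noise)
    idx2, _, _ = search(book, h, residual)
    return {"index1": idx1, "stage2_codebook": name, "index2": idx2}
```

The `periodic` flag stands in for the classification decision signal that drives the switch 122.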

【００３６】Finally, the gain codebook 124 is searched (step S7). The gain codebook 124 holds, as representative vectors, vectors whose elements are the gains by which the drive vectors found in the first-stage and second-stage codebooks are multiplied. The drive vector found in the first-stage first adaptive codebook 110, and the drive vector found in the second-stage second adaptive codebook 120 or noise codebook 121 and taken out through the switch 122, are multiplied in the gain circuits 111 and 123, respectively, by the gains found in the gain codebook 124, and are then summed by the adder 112. The search unit 142 searches the gain codebook 124 in a closed loop, by a well-known method, so as to minimize the distortion (the error relative to the target vector) of the synthesized speech signal vector obtained by passing the resulting drive vector through the weighted synthesis filter 113.
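The closed-loop gain search of step S7 can be sketched as an exhaustive search over gain pairs. This is only an illustration, and it makes several assumptions not stated in the source: each gain codebook entry is a pair `(g1, g2)`, the two drive vectors have already been passed through the weighted synthesis filter (valid because the filter is linear, so scaling and summing can be done after filtering), and distortion is the squared error against the target vector. All names are hypothetical.

```python
def search_gain_codebook(target, y1, y2, gain_codebook):
    """Return (index, error) of the gain pair minimizing the squared error.

    target        -- target vector
    y1, y2        -- filtered first-stage and second-stage drive vectors
    gain_codebook -- sequence of (g1, g2) pairs (assumed layout)
    """
    best_index, best_err = -1, float("inf")
    for idx, (g1, g2) in enumerate(gain_codebook):
        # Squared error between the target and the gain-scaled sum.
        err = sum((t - g1 * a - g2 * b) ** 2 for t, a, b in zip(target, y1, y2))
        if err < best_err:
            best_index, best_err = idx, err
    return best_index, best_err
```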

【００３７】The index of the first adaptive codebook 110 (first index), the index of the second adaptive codebook 120 or the noise codebook 121 (second index), and the index of the gain codebook 124 (third index), found by the search units 140, 141, and 142 as described above, are output through the multiplexer 150 as coding parameters, together with the quantized parameters from the LSP quantizer 103 and the classification decision signal from the signal classifier 130. The coding parameters output from the multiplexer 150 are sent to a transmission path or a storage medium.

【００３８】Thus, in the speech coding apparatus of this embodiment, when the signal classifier 130 classifies the input speech signal as periodic, or as periodic and stationary, the first adaptive codebook 110 represents the periodic component of the drive signal of the weighted synthesis filter 113. Even if a periodic component remains in the residual obtained from the subtractor 114 because the coding bit rate has been lowered, that component can be represented by the second adaptive codebook 120. A high-quality reproduced speech signal can therefore be obtained in the speech decoding apparatus even at a low coding bit rate.

【００３９】When the signal classifier 130 classifies the input speech signal as non-periodic, or as periodic and non-stationary, the drive signal is generated using the noise codebook 121, or the adaptive codebook 110 together with the noise codebook 121. A good reproduced speech signal can thus be obtained in the speech decoding apparatus without relying heavily on adaptive codebooks, which exploit correlation with past drive signals.

【００４０】Next, the speech decoding apparatus according to this embodiment is described. FIG. 4 is a block diagram showing its configuration. The speech decoding apparatus comprises a coded-data input terminal 160, a demultiplexer 161, an LSP inverse quantizer 162, a first adaptive codebook 170, a gain circuit 171, an adder 172, a synthesis filter 173, a post filter 174, a second adaptive codebook 180, a noise codebook 181, a switch 182, a gain circuit 183, a gain codebook 184, and a reproduced speech signal output terminal 190.

【００４１】The coding parameters output from the speech coding apparatus shown in FIG. 1 are input to the input terminal 160 via a transmission path or a storage medium. The coding parameters are passed to the demultiplexer 161, which separates and decodes the LSP parameters quantized by the LSP quantizer 103 of FIG. 1, the index of the first adaptive codebook 110 (first index), the index of the second adaptive codebook 120 or the noise codebook 121 (second index), the index of the gain codebook 124 (third index), and the classification decision signal from the signal classifier 130.

【００４２】Of the outputs of the demultiplexer 161, the quantized LSP parameters are input to the LSP inverse quantizer 162, which restores the LSP parameters. The first index is input to the first adaptive codebook 170, the second index to the second adaptive codebook 180 and the noise codebook 181, the third index to the gain codebook 184, and the classification decision signal to the switch 182.

【００４３】The first adaptive codebook 170, the second adaptive codebook 180, the noise codebook 181, and the gain codebook 184 are configured in the same way as the first adaptive codebook 110, the second adaptive codebook 120, the noise codebook 121, and the gain codebook 124 in the speech coding apparatus shown in FIG. 1.

【００４４】The speech decoding apparatus operates as follows. The first drive vector, obtained by looking up the first adaptive codebook 170 with the first index output from the demultiplexer 161, is multiplied in the gain circuit 171 by the gain looked up in the gain codebook 184 with the third index, and is then input to the adder 172. The second drive vector, obtained by looking up the second adaptive codebook 180 or the noise codebook 181 with the second index, is selected by the switch 182 according to the classification decision signal. It is likewise multiplied in the gain circuit 183 by the gain looked up in the gain codebook 184 with the third index, and is then input to the adder 172.
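The decoder-side reconstruction of the drive vector can be sketched as follows. This is a simplified illustration, not the apparatus itself: codebooks are modeled as plain Python lists of vectors, each gain codebook entry is assumed to be a pair `(g1, g2)`, and the classification decision signal is assumed to be a boolean. All names are hypothetical.

```python
def decode_excitation(idx1, idx2, idx3, is_periodic,
                      adaptive1, adaptive2, noise, gain_codebook):
    """Rebuild the combined drive vector from the three decoded indices.

    is_periodic plays the role of the classification decision signal that
    drives the switch between the second adaptive codebook and the noise
    codebook.
    """
    v1 = adaptive1[idx1]                          # first drive vector
    second_stage = adaptive2 if is_periodic else noise
    v2 = second_stage[idx2]                       # second drive vector
    g1, g2 = gain_codebook[idx3]                  # gains from the third index
    # Scale each vector by its gain and add them (gain circuits and adder).
    return [g1 * a + g2 * b for a, b in zip(v1, v2)]
```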

【００４５】The drive vector combined by the adder 172 is supplied as the drive signal to the synthesis filter 173, whose transfer characteristic is controlled by the LSP parameters from the LSP inverse quantizer 162, and the synthesis filter 173 generates a synthesized speech signal vector. This vector is input to the post filter 174, whose transfer characteristic is likewise controlled by the LSP parameters. There it undergoes well-known processing for improving the subjective quality of the reproduced speech, such as pitch emphasis, formant emphasis, high-frequency emphasis, and gain adjustment, and is then output from the output terminal 190 as the reproduced speech signal. As described above, this reproduced speech signal is of high quality.
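For illustration, a synthesis filter of the kind commonly used in CELP decoders can be sketched as an all-pole recursion. The source does not give the filter equations, so this is an assumption-laden sketch: the filter is taken to be 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the coefficients `lpc` are assumed to have already been derived from the decoded LSP parameters (that conversion is not shown), and the post filter is omitted.

```python
def synthesis_filter(excitation, lpc, memory=None):
    """All-pole filtering: s[n] = e[n] - sum_k lpc[k-1] * s[n-k].

    memory, if given, holds the previous outputs [s[-1], s[-2], ...];
    otherwise the filter starts from a zero state.
    """
    p = len(lpc)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in excitation:
        s = e - sum(a * h for a, h in zip(lpc, hist))
        out.append(s)
        hist = ([s] + hist)[:p]  # keep only the p most recent outputs
    return out
```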

【００４６】(Second Embodiment) FIG. 5 is a block diagram of a speech coding apparatus according to a second embodiment. Parts corresponding to those in FIG. 1 are given the same reference numerals, and the description focuses on the differences from the first embodiment. The second embodiment differs from the first in how the codebook serving as the second-stage drive signal source is selected. In the first embodiment, the input speech signal is analyzed to classify its property, and the codebook used as the second-stage drive signal source is switched between the second adaptive codebook 120 and the noise codebook 121 accordingly.

【００４７】In the second embodiment, by contrast, two codebooks serve as the first-stage drive signal source: a first adaptive codebook 210 and a fixed codebook 211, whose stored drive vectors are always fixed. One of the codebooks 210 and 211 is selected on the criterion of minimum distortion at the synthesized speech signal level. From the result, the property of the input speech signal is classified, as before, by whether or not it is periodic, or by whether or not it is periodic and stationary, and either the second adaptive codebook 120 or the noise codebook 121 is selected accordingly as the codebook used for the second-stage drive signal source. In other words, the first embodiment selects the second-stage codebook in an open loop, whereas the second embodiment selects it in a closed loop.

【００４８】The operation of this embodiment is as follows. First, LPC analysis and quantization of the LSP parameters are performed frame by frame, exactly as in the first embodiment. Next, the search unit 140 searches the first adaptive codebook 210 in the same way as in the first embodiment, and searches the fixed codebook 211 in the same way as the noise codebook 121 is searched in the first embodiment. The drive vectors found in the adaptive codebook 210 and the fixed codebook 211 are taken out through the switch 212.

【００４９】The results of searching the first adaptive codebook 210 and the fixed codebook 211 are then compared. Specifically, the drive vector found in the first adaptive codebook 210 is multiplied in the gain circuit 111 by the gain found in the gain codebook 124 and passed through the weighted synthesis filter 113, and the distortion of the resulting synthesized speech vector is obtained. The drive vector found in the fixed codebook 211 is processed in the same way, and the two distortions are compared. The drive vector giving the smaller distortion is taken as the drive vector from the first-stage drive signal source. The index of the first adaptive codebook 210 or the fixed codebook 211 obtained by the search unit 140 in this search is supplied to the multiplexer 150 and the signal classifier 230.
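The first-stage comparison can be sketched as below. This is a simplified illustration rather than the described apparatus: instead of searching the gain codebook 124, it assumes the optimal unquantized gain for each candidate, and it takes the candidates as already filtered by the weighted synthesis filter. Function names and labels are hypothetical.

```python
def best_scaled_error(target, y):
    """Smallest squared error ||target - g*y||^2 over a scalar gain g."""
    target_energy = sum(t * t for t in target)
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return target_energy  # a zero vector cannot reduce the error
    corr = sum(t * v for t, v in zip(target, y))
    return target_energy - corr * corr / energy


def select_first_stage(target, y_adaptive, y_fixed):
    """Return which first-stage codebook gives the smaller distortion."""
    err_adaptive = best_scaled_error(target, y_adaptive)
    err_fixed = best_scaled_error(target, y_fixed)
    if err_adaptive <= err_fixed:
        return "adaptive", err_adaptive
    return "fixed", err_fixed
```

The returned label is the kind of information a classifier could use to infer periodicity indirectly, since the adaptive codebook tends to win when the frame resembles past excitation.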

【００５０】The signal classifier 230 determines, from the index input from the search unit 140, whether the first adaptive codebook 210 or the fixed codebook 211 was selected as the first-stage drive signal source, and thereby classifies the property of the input speech signal by whether or not it is periodic, or by whether or not it is periodic and stationary. If the adaptive codebook 210 was selected, that is, if the input speech signal is periodic, or periodic and stationary, the signal classifier 230 controls the switch 122 so that the second adaptive codebook 120 is selected as the second-stage drive signal source. Otherwise, it controls the switch 122 so that the noise codebook 121 is selected.

【００５１】The codebook serving as the second-stage drive signal source is searched in the same way as in the first embodiment. Here, the signal classifier 230 may also be configured to select the second-stage codebook by taking into account not only whether the first adaptive codebook 210 was selected as the first-stage drive signal source but also the continuity of the pitch.
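The indirect classification, including the optional pitch-continuity check, might be sketched as follows. The source does not say how pitch continuity is measured, so the lag-difference test and the `max_lag_jump` parameter are purely illustrative assumptions, as are the function name and labels.

```python
def choose_second_stage(first_stage_choice, pitch_lag=None,
                        prev_pitch_lag=None, max_lag_jump=None):
    """Select the second-stage codebook from the first-stage outcome.

    first_stage_choice -- "adaptive" or "fixed" (hypothetical labels)
    The pitch-continuity check runs only when both lags and a limit are
    supplied; it is an assumed criterion, not one given in the source.
    """
    if first_stage_choice != "adaptive":
        return "noise_codebook"
    if None not in (pitch_lag, prev_pitch_lag, max_lag_jump):
        if abs(pitch_lag - prev_pitch_lag) > max_lag_jump:
            return "noise_codebook"  # pitch judged discontinuous
    return "second_adaptive_codebook"
```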

【００５２】The speech decoding apparatus corresponding to the speech coding apparatus of FIG. 5 is the same as the speech decoding apparatus of FIG. 4, so its description is omitted.

(Third Embodiment) In the first and second embodiments, the drive signal source consists of two codebook stages, but three or more stages may also be used. FIG. 6 is a block diagram showing the configuration of a speech coding apparatus according to a third embodiment, in which the drive signal source has three stages. As shown in the figure, this embodiment adds to the configuration of the first embodiment shown in FIG. 1 another noise codebook 330 as a third-stage drive signal source, a gain circuit 331 for multiplying its output by a gain found in the gain codebook 124, and a search unit 143 for searching the noise codebook 330.

【００５３】In this embodiment, the codebooks are searched as follows. First, the search unit 140 searches the first adaptive codebook 110, which is the first-stage drive signal source. Next, the search unit 141 searches the second adaptive codebook 120 or the noise codebook 121, which is the second-stage drive signal source. Finally, the search unit 143 searches the noise codebook 330, which is the third-stage drive signal source.
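A stage-by-stage search of this kind can be sketched as a sequential loop in which each stage is matched against what the earlier stages left unexplained. This is a generic illustration under assumptions not found in the source: candidates are given already filtered, each stage uses its optimal unquantized gain, and the target is updated by subtracting the scaled winner. The function names are hypothetical.

```python
def search_stage(target, filtered_vectors):
    """Return (index, gain, error) of the best candidate for one stage."""
    target_energy = sum(t * t for t in target)
    best = (None, 0.0, target_energy)
    for idx, y in enumerate(filtered_vectors):
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a zero vector cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        err = target_energy - corr * corr / energy
        if err < best[2]:
            best = (idx, corr / energy, err)
    return best


def multistage_search(target, stages):
    """Search each stage in order against the remaining residual."""
    residual = list(target)
    picks = []
    for filtered_vectors in stages:
        idx, gain, _ = search_stage(residual, filtered_vectors)
        picks.append(idx)
        if idx is not None:
            chosen = filtered_vectors[idx]
            residual = [r - gain * v for r, v in zip(residual, chosen)]
    return picks, residual
```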

【００５４】In the embodiments above, the property of the input speech signal is classified, by whether it is periodic or whether it is periodic and stationary, in one of the following ways. The input speech signal is fed to the signal classifier 130 and classified by a voiced/unvoiced decision or the like, as shown in FIG. 1. Alternatively, it is classified indirectly by whether the first adaptive codebook 210 was selected in the first stage, as shown in FIG. 5, or by whether the second adaptive codebook 120 was selected in the second stage, as shown in FIG. 6. The classification is not limited to these methods.

【００５５】For example, the power of the input speech signal may be evaluated to classify the signal by whether it is periodic, or periodic and stationary. Alternatively, it may be determined whether the gain by which the gain circuit 111 multiplies the drive vector read from the first adaptive codebook 110 exceeds a predetermined threshold, and the input speech signal may be classified as periodic, or as periodic and stationary, when the threshold is exceeded. Various such modifications are possible.
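The gain-threshold variant reduces to a single comparison, sketched here. The function name is hypothetical, and the threshold value is left to the caller because the source does not specify one.

```python
def is_periodic_by_gain(adaptive_gain, threshold):
    """Classify as periodic (or periodic and stationary) when the gain
    applied to the first adaptive codebook vector exceeds the threshold."""
    return adaptive_gain > threshold
```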

【００５６】[0056]

【発明の効果】[Effects of the Invention] As described above, according to the present invention, the property of the input speech signal is classified by whether or not it is periodic, or by whether or not it is periodic and stationary. When the signal is classified as periodic, or as periodic and stationary, the drive signal is generated with a two-stage arrangement of first and second adaptive codebooks, so that a periodic component that the first adaptive codebook could not sufficiently remove is represented by the second adaptive codebook. When the signal is classified as non-periodic, or as periodic and non-stationary, the drive signal is generated using the noise codebook, so that a signal with little correlation is represented. As a result, a drive signal suited to the property of the input speech signal can always be constructed, and speech of good quality can be reproduced even at a bit rate as low as about 4 kbps.

【００５７】In addition, because the drive signal in the present invention is built from adaptive codebooks, the special structure of the adaptive codebook can be used to reduce the amount of computation, which has been a problem in the CELP method. As in the SEV method, each drive vector in an adaptive codebook is formed by shifting the past drive signal by a delay corresponding to the pitch period, so the elements of the drive vectors overlap one another. By exploiting this overlap, the synthesis filter output for each drive vector can be computed recursively, which greatly reduces the amount of computation.
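One common way to write such a recursion is sketched below; it is not necessarily the exact procedure the patent has in mind. The sketch assumes the filter is represented by a truncated impulse response `h` with at least `n_sub` samples, that filtering starts from a zero state, and that every lag is at least the subframe length `n_sub`, so each candidate lies entirely in the past excitation. Because the candidate for lag L+1 is the candidate for lag L shifted by one sample with one new sample in front, its filtered version follows from the previous one with one multiply-add per sample. All names are hypothetical.

```python
def convolve_truncated(h, c):
    """Zero-state filter output y[n] = sum_{k<=n} h[k] * c[n-k]."""
    return [sum(h[k] * c[n - k] for k in range(n + 1)) for n in range(len(c))]


def filtered_candidates(past, h, n_sub, min_lag, max_lag):
    """Filtered adaptive-codebook vectors for lags min_lag..max_lag.

    Requires n_sub <= min_lag <= max_lag <= len(past) and len(h) >= n_sub.
    The candidate for a lag is past[len(past)-lag : len(past)-lag+n_sub].
    """
    m = len(past)
    first = past[m - min_lag:m - min_lag + n_sub]
    y = convolve_truncated(h, first)  # direct computation, done once
    out = {min_lag: y}
    for lag in range(min_lag + 1, max_lag + 1):
        new_sample = past[m - lag]  # sample entering at the front
        # Shift the previous output by one and add the new sample's response.
        y = [h[0] * new_sample] + [
            y[n - 1] + h[n] * new_sample for n in range(1, n_sub)
        ]
        out[lag] = y
    return out
```

Each additional lag costs on the order of `n_sub` operations here, compared with roughly `n_sub` squared over two for a direct truncated convolution.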

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram showing the configuration of a speech coding apparatus according to a first embodiment of the present invention.

【図２】FIG. 2 is a diagram showing an example of the waveforms of a speech signal in a voiced section and an unvoiced section.

【図３】FIG. 3 is a flowchart showing the flow of the main processing in the first embodiment.

【図４】FIG. 4 is a block diagram showing the configuration of a speech decoding apparatus according to the first embodiment.

【図５】FIG. 5 is a block diagram showing the configuration of a speech coding apparatus according to a second embodiment of the present invention.

【図６】FIG. 6 is a block diagram showing the configuration of a speech coding apparatus according to a third embodiment of the present invention.

【図７】FIG. 7 is a block diagram showing a typical configuration of a conventional CELP speech coding apparatus.

【図８】FIG. 8 is a block diagram showing the configuration of the main part of a conventional SEV speech coding apparatus.

[Explanation of symbols]

100: speech signal input terminal
101: buffer
102: LPC analysis unit
103: LSP quantizer
104: subframe division circuit
105: weighting filter
110: first adaptive codebook
111: gain circuit
112: adder
113: weighted synthesis filter
114: subtractor
120: second adaptive codebook
121: noise codebook
122: switch
123: gain circuit
124: gain codebook
130: signal classifier
140 to 143: search units
150: multiplexer
151: coding parameter output terminal
160: coding parameter input terminal
161: demultiplexer
162: LSP inverse quantizer
170: first adaptive codebook
171: gain circuit
173: synthesis filter
174: post filter
180: second adaptive codebook
181: noise codebook
182: switch
183: gain circuit
210: first adaptive codebook
211: fixed codebook
212: switch
230: signal classifier
330: noise codebook
331: gain circuit

Claims

(57) [Claims]

1. A speech coding method in which a drive signal is generated using drive vectors obtained from a plurality of drive vector codebooks, the drive signal is supplied to a synthesis filter whose filter coefficients are determined based on an analysis result of an input speech signal, a drive vector that reduces the distortion of the synthesized speech signal output from the synthesis filter is searched for in the drive vector codebooks, and coding parameters representing at least information on the searched drive vector and the filter coefficients of the synthesis filter are output, wherein: at least two adaptive codebooks, at least one fixed codebook, and at least one noise codebook are prepared as the drive vector codebooks; in a first stage, whichever of the first adaptive codebook and the fixed codebook gives the smaller distortion of the synthesized speech signal is selected; in a second stage, it is determined whether the first adaptive codebook or the fixed codebook was selected in the first stage, and the property of the input speech signal is thereby classified by whether or not it is periodic, or by whether or not it is periodic and stationary; when the property of the input speech signal is periodic, or periodic and stationary, the drive signal is generated using drive vectors obtained from at least the two adaptive codebooks; and when the property of the input speech signal is non-periodic, or periodic and non-stationary, the drive signal is generated using a drive vector obtained from at least the noise codebook.

2. A speech coding apparatus in which a drive signal is generated using drive vectors obtained from a plurality of drive vector codebooks, the drive signal is supplied to a synthesis filter whose filter coefficients are determined based on an analysis result of an input speech signal, a drive vector that reduces the distortion of the synthesized speech signal output from the synthesis filter is searched for in the drive vector codebooks, and coding parameters representing at least information on the searched drive vector and the filter coefficients of the synthesis filter are output, the apparatus comprising: at least two adaptive codebooks, at least one fixed codebook, and at least one noise codebook provided as the drive vector codebooks, wherein in a first stage whichever of the first adaptive codebook and the fixed codebook gives the smaller distortion of the synthesized speech signal is selected; classification means for determining, in a second stage, whether the first adaptive codebook or the fixed codebook was selected, and classifying the property of the input speech signal by whether or not it is periodic, or by whether or not it is periodic and stationary; and drive signal generating means for generating the drive signal using drive vectors obtained from at least the two adaptive codebooks when the classification means classifies the property of the input speech signal as periodic, or as periodic and stationary, and for generating the drive signal using a drive vector obtained from at least the noise codebook when the property of the input speech signal is classified as non-periodic, or as periodic and non-stationary.