JPH06208398A

JPH06208398A - Generation method for sound source waveform

Info

Publication number: JPH06208398A
Application number: JP5001504A
Authority: JP
Inventors: Hideo Osawa; 英男大沢
Original assignee: Japan Radio Co Ltd
Current assignee: Japan Radio Co Ltd
Priority date: 1993-01-08
Filing date: 1993-01-08
Publication date: 1994-07-26

Abstract

PURPOSE: To improve the quality of the reproduced speech at the beginning and end of a word by changing the amplitude of the sound source codebook output in the same way as the amplitude of the input speech changes, when the frame power of the input speech tends to rise or fall from one frame to the next. CONSTITUTION: A rise/fall detection circuit 28 detects a rise of the input speech by threshold discrimination and also detects the slope of the rise. A sound source waveform generation circuit 30 makes its output amplitude transition within the frame according to this slope. The level of the signal output from a sound source codebook 10 is constant within a frame, and the gain β of a coefficient multiplier 14 is set to its optimum value by an error power minimization unit 26. Because the sound source waveform is thus weighted by the slope, an appropriate sound source waveform is obtained even when the input speech rises or falls with a lag shorter than the frame length.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method of generating the sound source waveform of a sound source codebook in the CELP (code excited linear prediction) coding scheme.

[0002]

[Prior Art] Speech codecs used, for example, in digital car telephones are required to reduce the coding rate of the digital speech signal to roughly 4 to 8 kbps. FIG. 6 shows an example of a sound source waveform generation method used for this purpose.

[0003] The method shown in this figure generates the sound source waveform with a sound source codebook in the CELP coding scheme. Reference numeral 10 in the figure denotes the sound source codebook, which stores the signals used as sound source signals in past frames. That is, the output of the linear adding means, which is provided downstream of the sound source codebook 10 and the noise codebook 12 and consists of a coefficient multiplier 14 with gain β (scalar), a coefficient multiplier 16 with gain γ (scalar), and an adder 18 that performs vector addition, is used as the sound source signal of the current frame and is also fed back to the sound source codebook 10.
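As an illustration only, the linear adding means of FIG. 6 can be sketched in Python as follows. The function name `linear_combination` and the use of plain lists for vectors are assumptions made for this sketch and do not come from the patent text.

```python
def linear_combination(source_vec, noise_vec, beta, gamma):
    """Sketch of multipliers 14 and 16 plus adder 18: beta*source + gamma*noise.

    Hypothetical helper for illustration; vectors are plain Python lists.
    """
    if len(source_vec) != len(noise_vec):
        raise ValueError("both code vectors must have the same length")
    return [beta * s + gamma * c for s, c in zip(source_vec, noise_vec)]


# The result would serve as the sound source signal of the current
# (sub)frame and would also be fed back into the sound source codebook.
excitation = linear_combination([1.0, 2.0], [4.0, 0.0], beta=0.5, gamma=0.25)
```

Setting `gamma` to zero in this sketch corresponds to searching the sound source codebook with the noise codebook input held at zero.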

[0004] A frame here is a segment of about 20 msec cut from the speech signal used as the sound source signal. With a sampling frequency of 8 kHz, one frame contains 8 kHz × 20 msec = 160 samples. The frame length must be set to a length over which the speech signal can be regarded as stationary, that is, over which its properties do not change significantly; it is usually set to about 20 to 30 msec. The sound source codebook 10 successively stores the new sound source signals that are fed back while discarding the oldest ones (update of the sound source codebook 10). As a result, the sound source codebook 10 holds the sound source signals of past frames: the signal used in the previous frame, the signal used in the frame before that, and so on. Each frame is handled as four subframes. That is, the sound source codebook 10 is updated in units of one subframe, i.e. 160/4 = 40 samples, so that the newest subframe is located at the head.
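The frame arithmetic and the subframe-wise update can be sketched as below. This is a minimal sketch under stated assumptions: the names `make_codebook` and `update_codebook` are hypothetical, the history length is left as a parameter, and the samples are stored oldest-first (the text places the newest subframe at the head, which differs only in indexing direction).

```python
from collections import deque

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000      # 8 kHz x 20 msec
SUBFRAME_LEN = FRAME_LEN // SUBFRAMES_PER_FRAME    # one vector


def make_codebook(history_len):
    """Buffer of past sound source samples, oldest first (assumed layout)."""
    return deque([0.0] * history_len, maxlen=history_len)


def update_codebook(codebook, subframe):
    """Store one new subframe; the bounded deque discards the oldest samples."""
    if len(subframe) != SUBFRAME_LEN:
        raise ValueError("expected exactly one subframe of samples")
    codebook.extend(subframe)


codebook = make_codebook(160)            # history length chosen for illustration
update_codebook(codebook, [1.0] * SUBFRAME_LEN)
```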

[0005] The speech signal of, for example, 160 samples stored in the sound source codebook 10 is output to the linear adding means in groups of several tens of samples (40 samples in the example above). Such a group of several tens of samples is called a vector. In particular, a vector output from the sound source codebook 10 is called a sound source code vector.

[0006] The sound source code vector needed to generate the optimum sound source signal is obtained by the A-b-S (analysis by synthesis) method. The A-b-S method actually performs the synthesis and applies feedback through a closed loop in order to generate the signal with the smallest error power. In FIG. 6, the A-b-S method is realized mainly by determining the optimum lag in the sound source codebook 10 and the optimum index in the noise codebook 12.

[0007] First, the optimum lag in the sound source codebook 10 is determined by searching over lags in the sound source codebook 10 on the basis of the error power of the output of the weighting synthesis filter 20 relative to the output of the auditory weighting filter 22. The vectors of the sound source codebook 10 and of the noise codebook 12 are searched independently.

[0008] As described above, the sound source code vector output from the sound source codebook 10 is linearly added to the output of the noise codebook 12 by the coefficient multipliers 14 and 16 and the adder 18, and is then input to the weighting synthesis filter 20 as the sound source signal. During the search for the sound source code vector, the input from the noise codebook 12 is set to zero. The weighting synthesis filter 20 converts the sound source signal into a form that can be compared with the output of the auditory weighting filter 22, and applies a predetermined weighting to the output of the adder 18. Meanwhile, the input speech (the speech to be encoded) is weighted by the auditory weighting filter 22. The subtractor 24 obtains, by vector subtraction, the error power of the output of the weighting synthesis filter 20 relative to the output of the auditory weighting filter 22. The error power minimization unit 26, provided downstream of the subtractor 24, executes the search in the sound source codebook 10 so that the resulting error power is minimized. It searches a predetermined search range so that the output of the weighting synthesis filter 20 becomes the vector most similar to the output of the auditory weighting filter 22. The lag that minimizes the error power is called the optimum lag.

[0009] Here, the lag is the age of a sample as seen from the newest sample in the sound source codebook 10. As described above, the sound source codebook 10 stores the signals that were used as sound source signals in the past. If the search range is, for example, −20 to −146 samples (from 20 samples to 146 samples before the newest sample in the sound source codebook 10), the error power minimization unit 26 successively takes out 40 samples (one vector) starting at each sample in this range and searches the whole range for the vector that minimizes the error power (search for the optimum lag). The position of the starting sample is called the lag, and the lag of the vector that minimizes the error power is called the optimum lag. The lag corresponds to the period of vocal cord vibration: the optimum lag is long when the input speech is a male voice and short when it is a female voice. The search range is set narrower when the coding rate (bit rate) is lowered.
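The lag search described above can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: the function names are hypothetical, the weighting synthesis filter 20 is modelled as a zero-state convolution with an impulse response `h` (identity by default), the gain is applied in closed form by least squares, and lags shorter than one vector are handled by repeating the most recent samples, as the description specifies for short lags.

```python
def synthesize(vec, h=(1.0,)):
    """Zero-state convolution of vec with impulse response h, cut to len(vec)."""
    return [
        sum(h[k] * vec[n - k] for k in range(min(n + 1, len(h))))
        for n in range(len(vec))
    ]


def candidate_vector(history, lag, length):
    """Take `length` samples starting `lag` samples before the newest one.

    history is a list, oldest sample first. If lag < length, only `lag`
    samples exist, so they are repeated periodically to fill the vector.
    """
    start = len(history) - lag
    segment = history[start:start + length]
    return [segment[i % len(segment)] for i in range(length)]


def error_power(target, filtered):
    """Error power left after scaling `filtered` by its least-squares gain."""
    target_energy = sum(t * t for t in target)
    energy = sum(y * y for y in filtered)
    if energy == 0.0:
        return target_energy
    corr = sum(t * y for t, y in zip(target, filtered))
    return target_energy - corr * corr / energy


def search_optimum_lag(history, target, h=(1.0,), min_lag=20, max_lag=146):
    """Return the lag in [min_lag, max_lag] with the smallest error power."""
    if max_lag > len(history):
        raise ValueError("history is shorter than the largest lag")
    best_lag, best_err = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        filtered = synthesize(candidate_vector(history, lag, len(target)), h)
        err = error_power(target, filtered)
        if err < best_err:  # ties keep the shorter lag
            best_lag, best_err = lag, err
    return best_lag
```

In this sketch `target` stands for the perceptually weighted input speech of one subframe; a real coder would also account for the filter's memory from previous subframes, which is omitted here.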

[0010] The optimum lag obtained by this search can also be used for decoding. If the sound source codebook 10 is updated by the same procedure on the encoding side and the decoding side, the information concerning the sound source codebook 10 can be conveyed merely by transmitting the optimum lag from the encoding side to the decoding side. Furthermore, with the search range of the example above, the lag can be expressed in 7 bits.

[0011] In FIG. 6, the A-b-S method is realized by determining the optimum index in the noise codebook 12 in addition to determining the optimum lag as described. The noise codebook 12 stores a predetermined number (for example, 128) of mutually different noise signals of several tens of samples each (40 samples = 1 vector in the example above). The vectors stored in the noise codebook 12 are called noise code vectors. When the noise codebook 12 holds 128 noise code vectors, each of them can be identified by a 7-bit index. The contents of the noise codebook 12 are made identical on the encoding side and the decoding side. In this way, the information concerning the noise codebook 12 can be conveyed merely by transmitting the optimum index from the encoding side to the decoding side.
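The 7-bit figures quoted for the lag (search range of 20 to 146 samples) and for the noise codebook index (128 vectors) can be checked with a short sketch. The offset coding in `encode_lag` is an assumption for illustration; the text does not say how the lag is mapped to a bit pattern.

```python
MIN_LAG, MAX_LAG = 20, 146
NUM_LAGS = MAX_LAG - MIN_LAG + 1    # number of candidate lags in the range
NUM_NOISE_VECTORS = 128


def bits_needed(count):
    """Smallest number of bits that can label `count` distinct values."""
    if count < 1:
        raise ValueError("count must be positive")
    return max(1, (count - 1).bit_length())


def encode_lag(lag):
    """Map a lag to a non-negative code by subtracting MIN_LAG (assumed scheme)."""
    if not MIN_LAG <= lag <= MAX_LAG:
        raise ValueError("lag outside the search range")
    return lag - MIN_LAG


def decode_lag(code):
    """Inverse of encode_lag."""
    return code + MIN_LAG


lag_bits = bits_needed(NUM_LAGS)
index_bits = bits_needed(NUM_NOISE_VECTORS)
```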

[0012] The error power minimization unit 26 first carries out the optimum lag search described above on the sound source codebook 10 and then optimizes the index for the noise codebook 12.

[0013] The optimum lag and optimum index obtained in this way are transmitted from the encoding side to the decoding side. That is, the output of the encoding-side apparatus shown in FIG. 6 is the optimum lag and the optimum index obtained by the error power minimization unit 26. In other words, in the CELP coding scheme the input speech is represented by the output obtained by driving the weighting synthesis filter 20 with the sum of the sound source code vector and the noise code vector. Quantization carried out in units of vectors in this way is called vector quantization.

[0014] In addition, the information that constitutes the driving conditions of the weighting synthesis filter 20 is also transmitted, namely the gains β and γ by which the code vectors are multiplied and the various parameters of the weighting synthesis filter 20. The transmitted gains β and γ are optimum gains, each optimized using the error power. For example, the optimum value $\gamma_{\mathrm{opt}}$ of the gain γ is the value of γ at which the error power $E_i$ given by the following equation satisfies $dE_i/d\gamma = 0$.

[0015]

$$E_i = \sum_{n} \bigl\{\, p(n) - \gamma \, e_i(n) * h(n-t) \,\bigr\}^2$$

Here $E_i$ is the error power of the i-th noise code vector, $e_i(n)$ is the n-th sample of the i-th noise code vector, $p(n)$ is the signal obtained by subtracting, from the perceptually weighted input signal, the signal synthesized by weighting the optimum sound source vector that has already been computed, and $h(n-t)$ is the impulse response of the weighting synthesis filter 20. The symbol $*$ denotes convolution and $\sum$ denotes the sum over the 40 samples. Finding the optimum gain $\gamma_{\mathrm{opt}}$ for a given i and substituting it into the equation above yields the true error power $E_i(\mathrm{opt})$ for each i. As this shows, the optimum gain $\gamma_{\mathrm{opt}}$ differs for different i. The same applies to the optimum value $\beta_{\mathrm{opt}}$ of the gain β.
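The condition $dE_i/d\gamma = 0$ can be worked out explicitly. The following is a standard least-squares derivation added for illustration; the shorthand $y_i(n) = e_i(n) * h(n-t)$ for the filtered noise code vector is introduced here and does not appear in the original, and $\sum_n y_i(n)^2 \neq 0$ is assumed.

```latex
\begin{aligned}
E_i &= \sum_{n} \bigl\{ p(n) - \gamma\, y_i(n) \bigr\}^2 ,
\qquad y_i(n) = e_i(n) * h(n-t) \\[4pt]
\frac{dE_i}{d\gamma} &= -2 \sum_{n} y_i(n) \bigl\{ p(n) - \gamma\, y_i(n) \bigr\} = 0
\;\;\Longrightarrow\;\;
\gamma_{\mathrm{opt}} = \frac{\sum_{n} p(n)\, y_i(n)}{\sum_{n} y_i(n)^2} \\[4pt]
E_i(\mathrm{opt}) &= \sum_{n} p(n)^2 - \frac{\bigl( \sum_{n} p(n)\, y_i(n) \bigr)^2}{\sum_{n} y_i(n)^2}
\end{aligned}
```

Under these assumptions, minimizing $E_i(\mathrm{opt})$ over i amounts to maximizing the second term, which is one way to see why the optimum gain depends on which code vector is chosen.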

[0016] On the decoding side, each code vector is obtained from the transmitted information using the same codebooks as on the encoding side, and the synthesis filter is driven with them to output the reproduced signal.

[0017] Note that the frame length or subframe length is fixed regardless of the period of the sound source signal. When the search range for the optimum lag is set to −20 to −146 samples as above and the lag is, for example, 20 samples, 40 samples starting at this lag cannot be taken out. In such a case, the error power minimization unit 26 generates 40 samples by repeating the latest 20 samples twice and supplies them to the linear addition and weighted synthesis. Likewise, when speech with a high pitch frequency, such as a female voice, is input, the optimum lag becomes short. In such a case, a lag for which the sound source signal is shorter than the frame length may be optimum. That sound source signal is then repeated to form the sound source signal of the frame.

[0018]

[Problems to Be Solved by the Invention] In regions where the level of the input speech is steady, the processing described above encodes the speech well. At the beginning and end of a word, however, the level differs from that of the steady region even if the waveform is the same. That is, because the input speech rises or falls at the beginning and end of a word, the waveform generated by the sound source codebook should also rise or fall. If this region is handled in the same way as a steady region, quantization noise arises in the signal that the decoder reproduces from the transmitted information, particularly at the beginning and end of words, and the reproduced speech quality can deteriorate. One possible countermeasure is to shorten the frame length or subframe length, but this increases the number of bits per unit time and lowers the coding efficiency.

[0019] The present invention has been made to solve these problems, and its object is to improve the reproduced speech quality at the beginning and end of words when input speech with a short pitch period, such as a female voice, is encoded by the CELP coding scheme.

[0020]

[Means for Solving the Problems] To achieve this object, the present invention concerns a CELP coding scheme using the A-b-S method in which: a linear combination of a sound source code vector, which has a predetermined frame or subframe length and is obtained from a sound source codebook, and a noise code vector obtained from a noise codebook is formed and input to a weighting synthesis filter; an optimum lag and an optimum gain are set in the search of the sound source codebook so as to minimize the error power obtained by comparing the filter output with the perceptually weighted input speech; information including the optimum lag and optimum gain thus set is transmitted; and, when the optimum lag is shorter than the frame or subframe length, the sound source code vector is generated by repeating the sound source waveform of that optimum lag. The invention is characterized in that, when the frame or subframe power of the input speech rises or falls from one frame or subframe to the next, the amplitude of the sound source waveform contained in the signal output from the sound source codebook is made to transition in the same way as the amplitude transition of the rise or fall of the input speech.

[0021]

[Operation] In the present invention, when the frame or subframe power of the input speech rises or falls from one frame or subframe to the next, the amplitude of the sound source waveform contained in the signal output from the sound source codebook transitions in the same way as the amplitude transition of the rise or fall of the input speech. Therefore, even when input speech with a short pitch period, such as a female voice, is encoded, the slope of the rise or fall is applied as a weighting to the sound source waveform, so that an appropriate sound source waveform is obtained and the pitch can be extracted properly. Moreover, when a lag shorter than the frame or subframe length is optimum, the quantization noise of the frames or subframes containing the rise or fall can be reduced; the quantization noise of the frames or subframes at the beginning and end of words in the input speech is therefore reduced and the reproduced speech quality is improved.

[0022]

[Embodiment] A preferred embodiment of the present invention is described below with reference to the drawings. Components identical to those of the conventional example shown in FIG. 6 are given the same reference numerals, and their description is omitted.

[0023] FIG. 1 is a block diagram showing a method according to one embodiment of the present invention. The apparatus shown in this figure is the conventional apparatus of FIG. 6 with a rise/fall detection circuit 28 and a sound source waveform generation circuit 30 added. The output of the sound source codebook 10 is not input directly to the coefficient multiplier 14 but reaches it through the sound source waveform generation circuit 30.

[0024] The rise/fall detection circuit 28 computes the frame (or subframe) power of the input speech and, by threshold discrimination of this power, detects rises and falls of the input speech and determines their slope. When the rise/fall detection circuit 28 detects a rise or fall in the frame (or subframe) power of the input speech, the sound source waveform generation circuit 30 changes the amplitude of the sound source code vector linearly according to the slope of that rise or fall.
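One possible reading of the rise/fall detection can be sketched as follows. The text does not specify the threshold rule or how the slope is computed, so everything concrete here is an assumption: the function names, the power-ratio threshold of 2.0, the use of the square root of the frame power as the amplitude level, and the slope expressed as amplitude change per sample.

```python
import math


def frame_power(frame):
    """Mean squared value of one frame (or subframe) of samples."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def detect_transition(current, nxt, ratio_threshold=2.0):
    """Classify the power change from `current` to `nxt` and return a slope.

    Returns (kind, slope) with kind in {"rise", "fall", "steady"}.
    The slope is the assumed amplitude change per sample across `current`;
    it is reported as 0.0 when no transition is detected.
    """
    p_cur, p_next = frame_power(current), frame_power(nxt)
    slope = (math.sqrt(p_next) - math.sqrt(p_cur)) / len(current)
    if p_next > ratio_threshold * p_cur:
        return "rise", slope
    if p_cur > ratio_threshold * p_next:
        return "fall", slope
    return "steady", 0.0
```

For example, a frame of constant amplitude 1 followed by a frame of constant amplitude 3 would be classified as a rise under these assumptions, with a positive slope.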

[0025] Consider now the case where the frame (or subframe) power of the input speech rises. Suppose that the input speech is silent up to the previous frame (or subframe), becomes voiced from the current frame (or subframe), that is, the frame (or subframe) in question, and that the power of the next frame (or subframe) is greater than that of the frame (or subframe) in question, as shown in FIG. 2.

[0026] The rise/fall detection circuit 28 detects this rise by threshold discrimination and also detects the slope of the rise as shown in FIG. 3. Following this slope, the sound source waveform generation circuit 30 makes its output amplitude transition within the frame (or subframe) in question, as shown in FIGS. 4 and 5. The level of the signal output from the sound source codebook 10 is constant within a frame (or subframe), and the gain β of the coefficient multiplier 14 is set to its optimum value by the error power minimization unit 26.

[0027] Accordingly, with this processing, even when the input speech rises or falls with a lag shorter than the frame length, the slope is applied as a weighting to the sound source waveform, so that an appropriate sound source waveform is obtained and the pitch can be extracted properly. Moreover, when a lag shorter than the frame length is optimum, the quantization noise of the frames containing the rise or fall can be reduced, so quantization noise at the beginning and end of words in the input speech can be prevented and the reproduced speech quality improved. Furthermore, because the CELP coding scheme normally transmits the frame power as a transmission parameter anyway, no new transmission parameter is needed to implement this embodiment.

[0028] In this example the lag is 1/2 of the frame length. When, unlike this example, the sound source waveform of the lag being searched does not fit an integer number of times into one frame, the level of the last repetition (which is shorter than one full waveform) is made equal to the level of the preceding sound source waveform.
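The amplitude shaping of the repeated waveform, including the rule for a trailing partial repetition, might be sketched as below. This is one interpretation rather than the circuit of the embodiment: it assumes that each full repetition receives a single level, that the levels step linearly from `start_level` to `end_level`, and that the function name and parameters are free choices made for this sketch.

```python
def shaped_excitation(pitch_waveform, frame_len, start_level, end_level):
    """Repeat `pitch_waveform` over one frame with a level per repetition.

    Full repetitions get levels stepping linearly from start_level to
    end_level (assumed rule). A trailing partial repetition reuses the
    level of the last full repetition, following the rule in the text.
    """
    lag = len(pitch_waveform)
    if lag == 0 or lag > frame_len:
        raise ValueError("lag must be between 1 and frame_len")
    full = frame_len // lag  # number of complete repetitions in the frame
    step = (end_level - start_level) / (full - 1) if full > 1 else 0.0
    out = []
    for n in range(frame_len):
        k = min(n // lag, full - 1)  # partial tail keeps the previous level
        out.append((start_level + step * k) * pitch_waveform[n % lag])
    return out


# Lag of 2 samples in a 5-sample frame: two full repetitions and a partial tail.
shaped = shaped_excitation([1.0, -1.0], 5, start_level=1.0, end_level=2.0)
```

With the lag equal to half the frame length, as in the example of the embodiment, this sketch yields exactly two repetitions at the start and end levels.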

[0029]

[Effects of the Invention] As described above, according to the present invention, when the frame or subframe power of the input speech rises or falls from one frame or subframe to the next, the amplitude of the sound source waveform contained in the signal output from the sound source codebook is made to transition in the same way as the amplitude transition of the rise or fall of the input speech. Therefore, even when input speech with a short pitch period, such as a female voice, is encoded, the slope of the rise or fall is applied as a weighting to the sound source waveform, an appropriate sound source waveform is obtained, and the pitch can be extracted properly. Moreover, when a lag shorter than the frame or subframe length is optimum, the quantization noise of the frames or subframes containing the rise or fall can be reduced; the quantization noise of the frames or subframes at the beginning and end of words in the input speech is therefore reduced and the reproduced speech quality is improved. In addition, when the frame or subframe power is already transmitted to the decoding side, no new transmission parameter is required.

[Brief description of drawings]

FIG. 1 is a block diagram showing a method according to an embodiment of the present invention as the block configuration of an apparatus.

FIG. 2 is a diagram showing the transition of frame power from the frame in question to the next frame at a rise.

FIG. 3 is a diagram showing the amplitude transition that accompanies this power transition.

FIG. 4 is a diagram showing the slope of the amplitude level applied to the sound source waveform in the frame in question.

FIG. 5 is a diagram showing the sound source waveform of the frame in question.

FIG. 6 is a block diagram showing a method according to the conventional example as the block configuration of an apparatus.

[Explanation of symbols]

10: sound source codebook
12: noise codebook
14, 16: coefficient multipliers
18: adder
20: weighting synthesis filter
22: auditory weighting filter
24: subtractor
26: error power minimization unit
28: rise/fall detection circuit
30: sound source waveform generation circuit

Claims

1. A sound source waveform generation method for a CELP coding scheme using the A-b-S method, in which a linear combination of a sound source code vector, which has a predetermined frame or subframe length and is obtained from a sound source codebook, and a noise code vector obtained from a noise codebook is formed and input to a weighting synthesis filter; an optimum lag and an optimum gain are set in the search of the sound source codebook so as to minimize the error power obtained by comparing the output signal of the filter with the perceptually weighted input speech; information including the set optimum lag and optimum gain is transmitted; and, when the optimum lag is shorter than the frame or subframe length, the sound source code vector is generated by repeating the sound source waveform of that optimum lag; the method being characterized in that, when the frame or subframe power of the input speech rises or falls from one frame or subframe to the next frame or subframe, the amplitude of the sound source waveform contained in the signal output from the sound source codebook is made to transition in the same way as the amplitude transition of the rise or fall of the input speech.