JPH05108098A

JPH05108098A - Speech encoding device

Info

Publication number: JPH05108098A
Application number: JP3267840A
Authority: JP
Inventors: Koji Yoshida; Tadashi Yoshida
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1991-10-16
Filing date: 1991-10-16
Publication date: 1993-04-30
Anticipated expiration: 2016-06-25
Also published as: JP3178732B2

Abstract

PURPOSE: To obtain high-quality speech even at a low bit rate. CONSTITUTION: To generate a driving sound source that yields synthesized speech with minimum distortion relative to the weighted input speech obtained by a perceptual weighting filter 11, a sound source switch 15 selects whichever of the pulse sound source output by a pulse sound source generator 13 and the noise sound source output by a stochastic codebook 14 gives the smaller distortion, and a driving sound source generator 16 generates the driving sound source from the selected sound source and the output of an adaptive codebook 12. The pulse component and noise component of the adaptive codebook 12 may also be separated and stored in an adaptive codebook pulse component store and an adaptive codebook noise component store, with a pulse/noise component gain controller setting the gains of the pulse component and noise component to their optimum values.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech coding apparatus used for digital communication, voice mail, and the like.

[0002]

[Prior Art] In recent years, speech coding apparatuses of the type called CELP (Code Excited Linear Prediction) coders, as shown in FIG. 3, have been widely used at low bit rates of about 4.8 to 8.0 kb/s.

[0003] A conventional speech coding apparatus of this type is described below. In FIG. 3, reference numeral 31 denotes a perceptual weighting filter that perceptually weights the input speech and outputs weighted input speech. 32 is an adaptive codebook that stores past driving sound sources. 33 is a stochastic codebook that holds a plurality of noise sound sources in advance. 34 is a driving sound source generator that generates a driving sound source signal from the adaptive codebook 32 and the stochastic codebook 33. 35 is a weighted synthesis filter that generates weighted synthesized speech with the driving sound source as input. 36 is a distortion minimizer that calculates the distortion of the weighted synthesized speech with respect to the weighted input speech and outputs the long-term prediction delay and gain, and the representative vector in the stochastic codebook and its gain, that minimize this distortion.

[0004] The operation of the speech coding apparatus configured as above is described below. First, the perceptual weighting filter 31 obtains the weighted input speech v[n]. A CELP coder encodes the driving sound source e[n] that generates the weighted synthesized speech closest to v[n]. Here, the driving sound source e[n] consists of the long-term prediction signal a[n-L], which is the output of the adaptive codebook 32, and a vector c_I[n] from the stochastic codebook 33, as in the following equation.

[0005]

$$e[n] = \beta \cdot a[n-L] + \gamma \cdot c_I[n] \tag{1}$$

[0006] In practice, it is difficult to determine both components simultaneously. Usually, only the long-term prediction component from the adaptive codebook 32 is determined first by the distortion minimizer 36 (adaptive codebook search), which outputs the long-term prediction delay L, indicating how far back into the past driving signal to go, and the optimum gain β. Next, distortion minimization is performed with the stochastic codebook 33 so that the remaining distortion is minimized (stochastic codebook search), and the number I of the selected representative codebook entry and the optimum gain γ are output.
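The two-stage search described in [0006] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the weighted synthesis filter is taken to be the identity (so distortion is measured directly on the excitation), every lag L is assumed to be at least one frame long, and all function names (`fit`, `best_candidate`, `sequential_search`) are hypothetical.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def fit(target, candidate):
    """Least-squares gain for one candidate and the squared error it leaves."""
    energy = dot(candidate, candidate)
    if energy == 0.0:
        return 0.0, dot(target, target)
    corr = dot(target, candidate)
    gain = corr / energy
    return gain, dot(target, target) - gain * corr


def best_candidate(target, candidates):
    """Index, gain and error of the candidate with the smallest error."""
    results = [(i,) + fit(target, c) for i, c in enumerate(candidates)]
    return min(results, key=lambda r: r[2])


def sequential_search(target, past, lags, codebook):
    """Return (L, beta, I, gamma, error) for one frame of equation (1)."""
    n = len(target)
    start = len(past)
    # Stage 1 (adaptive codebook search): a[n-L] is the past excitation
    # taken L samples back. Assumes n <= L <= len(past) for every lag.
    segments = [past[start - L:start - L + n] for L in lags]
    k, beta, _ = best_candidate(target, segments)
    # Stage 2 (stochastic codebook search): fit what stage 1 left over.
    residual = [t - beta * a for t, a in zip(target, segments[k])]
    I, gamma, err = best_candidate(residual, codebook)
    return lags[k], beta, I, gamma, err
```

Because β is fixed before γ is searched, the result is generally not the joint optimum of equation (1); that is the trade-off the paragraph describes.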

[0007]

[Problems to Be Solved by the Invention] However, in the conventional speech coding apparatus described above, at low bit rates of about 4.8 kb/s or below, the performance of driving sound source generation drops, and with it the generation of the long-term prediction component from the adaptive codebook, which stores past driving sound source signals, also degrades. As a result, speech quality deteriorates noticeably, particularly for voiced sounds having a pulse-like pitch-periodic sound source.

[0008] The present invention solves the above conventional problem, and its object is to provide a speech coding apparatus that suppresses the deterioration of speech quality for voiced sounds having a pulse-like pitch-periodic sound source, even at a low bit rate.

[0009] Another object of the present invention is to provide a speech coding apparatus having high speech quality by controlling the gains of the pulse and noise components of the long-term prediction signal.

[0010]

[Means for Solving the Problems] To achieve the above object, the present invention comprises, in addition to a stochastic codebook, a pulse sound source generator that generates a pulse sound source, and a sound source switch that selects either the pulse sound source generated by the pulse sound source generator or the noise sound source generated from the stochastic codebook, so that the deterioration of speech quality for voiced sounds having a pulse-like pitch-periodic sound source is suppressed even at a low bit rate.

[0011] To achieve the other object, the present invention comprises, in place of the adaptive codebook, an adaptive codebook pulse component store that generates the pulse component of the long-term prediction signal, an adaptive codebook noise component store that generates the noise component, and a pulse/noise component gain controller that controls the gains of the pulse component and noise component of the long-term prediction, so that high speech quality is achieved by controlling the gains of the pulse and noise components of the long-term prediction signal.

[0012]

[Operation] According to the present invention, therefore, the sound source switch selects whichever is the better of the pulse sound source output by the pulse sound source generator and the noise sound source generated from the stochastic codebook. This makes it possible to generate a pulse component for voiced sounds having a pulse-like pitch-periodic sound source even at a low bit rate, so that coding can be performed with little deterioration of speech quality.

[0013] Further, according to the present invention, the pulse/noise component gain controller provided in the long-term predictor optimally controls the gains of the pulse and noise components of the long-term prediction signal, so that speech coding with high speech quality can be performed.

[0014]

[Embodiments] FIG. 1 shows the configuration of a first embodiment of the present invention. In FIG. 1, 11 is a perceptual weighting filter that perceptually weights the input speech; 12 is an adaptive codebook that stores past driving sound sources; 13 is a pulse sound source generator that generates a pulse sound source; 14 is a stochastic codebook that holds a plurality of noise sound sources; 15 is a sound source switch that selects either the sound source generated by the pulse sound source generator 13 or the one generated from the stochastic codebook 14; 16 is a driving sound source generator that generates a driving sound source from the selected sound source and the output of the adaptive codebook 12; 17 is a weighted synthesis filter that generates weighted synthesized speech from the driving sound source; and 18 is a distortion minimizer that calculates the distortion between the weighted input speech and the weighted synthesized speech and outputs the long-term prediction delay, the pulse position or stochastic codebook code, and their gains that minimize the distortion.

[0015] Next, the operation of the first embodiment is described. First, the perceptual weighting filter 11 obtains the weighted input speech v[n], and the driving sound source e[n] that generates the weighted synthesized speech closest to it is then encoded. The driving sound source e[n] generated by the driving sound source generator 16 consists of the long-term prediction signal a[n-L] output by the adaptive codebook 12 and either the pulse component p_M[n] generated by the pulse sound source generator 13 or the noise component c_I[n] from the stochastic codebook 14, and is expressed by the following equations:

$$e[n] = \beta \cdot a[n-L] + \gamma_P \cdot p_M[n] \tag{2}$$

or

$$e[n] = \beta \cdot a[n-L] + \gamma_N \cdot c_I[n] \tag{3}$$

Here, p_M[n] is a pulse train having a single impulse at each pitch period interval starting from position M, and β, γ_P, and γ_N are the gains of the long-term prediction, pulse, and white noise components, respectively.
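As an illustration of equations (2) and (3), the sketch below builds a pulse train p_M[n] and combines a secondary source with the long-term prediction signal. It is a minimal sketch, not the patent's implementation: the function names are hypothetical, the impulses are given unit amplitude, and the pitch period is passed as a separate `pitch` argument because the text does not state how it is obtained.

```python
def pulse_train(length, start, pitch):
    """p_M[n]: unit impulses at start, start + pitch, ... inside one frame.

    `start` plays the role of the position M; `pitch` is the pitch period in
    samples (an assumed input; the text does not say where it comes from).
    """
    if pitch <= 0:
        raise ValueError("pitch period must be positive")
    p = [0.0] * length
    for n in range(start, length, pitch):
        p[n] = 1.0
    return p


def excitation(beta, a_delayed, gain, source):
    """e[n] = beta * a[n-L] + gain * source[n], as in equations (2) and (3).

    `source` is either a pulse train p_M[n] (gain = gamma_P) or a noise
    vector c_I[n] (gain = gamma_N).
    """
    return [beta * a + gain * s for a, s in zip(a_delayed, source)]
```

For example, `pulse_train(8, 1, 3)` places impulses at samples 1, 4, and 7 of an 8-sample frame.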

[0016] In both equations (2) and (3), it is difficult to determine the two components simultaneously. First, the long-term prediction signal of the adaptive codebook 12 is determined by minimizing the distortion of its perceptually weighted synthesized speech, and the long-term prediction delay L, indicating which part of the past driving signal is used, and the optimum gain β are output. Next, for the distortion remaining after the long-term prediction signal has been determined, distortion minimization is performed with the pulse sound source and with the noise sound source, corresponding to equations (2) and (3) respectively. The sound source giving the smaller distortion is selected as the sound source for that coding section. If it is the pulse sound source, the pulse position M and the optimum gain γ_P are output; if it is the noise sound source, the number I of the selected noise vector and the optimum gain γ_N are output.
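The selection between the pulse and noise sound sources in [0016] might look like the sketch below. It is illustrative only: the candidates are assumed to have already passed through the weighted synthesis filter, the list index of a pulse candidate stands in for the position M, ties go to the first candidate examined (the text does not specify tie-breaking), and the names `fit` and `select_source` are hypothetical.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def fit(target, candidate):
    """Least-squares gain for one candidate and the squared error it leaves."""
    energy = dot(candidate, candidate)
    if energy == 0.0:
        return 0.0, dot(target, target)
    corr = dot(target, candidate)
    gain = corr / energy
    return gain, dot(target, target) - gain * corr


def select_source(residual, pulse_trains, noise_vectors):
    """Pick the source type, index and gain that leave the least distortion.

    residual      : distortion target left after the long-term prediction
    pulse_trains  : candidate pulse trains, indexed by position M
    noise_vectors : stochastic codebook entries, indexed by I
    Returns (kind, index, gain, error) with kind "pulse" or "noise".
    """
    best = None
    for kind, candidates in (("pulse", pulse_trains), ("noise", noise_vectors)):
        for index, cand in enumerate(candidates):
            gain, err = fit(residual, cand)
            if best is None or err < best[3]:
                best = (kind, index, gain, err)
    return best
```

The returned `kind` corresponds to the switch decision that the decoder must also be told, alongside either (M, γ_P) or (I, γ_N).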

[0017] Thus, according to the first embodiment, selecting whichever of the pulse sound source and the noise sound source gives the smaller distortion, for the distortion remaining after the long-term prediction signal has been determined, makes it possible to generate a pulse component for voiced sounds having a pulse-like pitch-periodic sound source, so that coding can be performed with little deterioration of speech quality even at a low bit rate.

[0018] FIG. 2 shows the configuration of a second embodiment of the present invention. In FIG. 2, 21 is an adaptive codebook pulse component store that holds the pulse component of the adaptive codebook; 22 is an adaptive codebook noise component store that holds the noise component of the adaptive codebook; and 23 is a pulse/noise component gain controller that controls the gains of the pulse and noise components of the long-term prediction signal. The rest of the configuration is the same as in the first embodiment. That is, 24 is a pulse sound source generator that generates a pulse sound source; 25 is a stochastic codebook that holds a plurality of noise sound sources; 26 is a sound source switch that selects either the sound source generated by the pulse sound source generator 24 or the one generated from the stochastic codebook 25; 27 is a driving sound source generator that generates a driving sound source from the selected sound source and the output of the pulse/noise component gain controller 23; 28 is a weighted synthesis filter that generates weighted synthesized speech from the driving sound source; 29 is a distortion minimizer that calculates the distortion between the weighted input speech and the weighted synthesized speech and outputs the long-term prediction delay, the pulse position or stochastic codebook code, and their gains that minimize the distortion; and 30 is a perceptual weighting filter that perceptually weights the input speech.

[0019] Next, the operation of the second embodiment is described. As in the first embodiment, in order to encode the driving sound source e[n] that generates the weighted synthesized speech closest to the weighted input speech, distortion minimization is first performed for the long-term prediction signal. In the second embodiment, the long-term prediction signal a_S[n] is expressed as the sum of a pulse component a_P[n-L] and a noise component a_N[n-L] by the following equation, and the long-term prediction delay L and the gains β_P and β_N of the respective components are determined.

$$a_S[n] = \beta_P \cdot a_P[n-L] + \beta_N \cdot a_N[n-L] \tag{4}$$

[0020] As one example of how to determine the long-term prediction delay L and the gains β_P and β_N of the respective components, the long-term prediction delay L is first obtained with β_P = β_N. For that L, the optimum pulse component gain β_P and noise component gain β_N are then determined so as to minimize the weighted squared error E with respect to the input speech given by equation (5):

$$E = \sum_n \{\, p[n] - \beta_P \cdot b_P[n] - \beta_N \cdot b_N[n] \,\}^2 \rightarrow \min \tag{5}$$

where
p[n]: perceptually weighted input speech (written v[n] in the preceding paragraphs)
b_P[n]: perceptually weighted synthesized speech of the adaptive codebook pulse component output
β_P: pulse component gain
b_N[n]: weighted synthesized speech of the adaptive codebook noise component output
β_N: noise component gain
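The two-step procedure in [0020] can be sketched as below. This is a hedged illustration rather than the patent's method in detail: the candidate components b_P and b_N are assumed to be supplied already filtered for each lag, the joint gains are obtained by solving the 2x2 normal equations of equation (5) with Cramer's rule, the fallback for a near-singular system is my own assumption, and all names are hypothetical.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def fit(target, candidate):
    """Least-squares gain for one candidate and the squared error it leaves."""
    energy = dot(candidate, candidate)
    if energy == 0.0:
        return 0.0, dot(target, target)
    corr = dot(target, candidate)
    gain = corr / energy
    return gain, dot(target, target) - gain * corr


def joint_gains(p, b_p, b_n):
    """Gains (beta_P, beta_N) minimising E in equation (5)."""
    rpp, rnn, rpn = dot(b_p, b_p), dot(b_n, b_n), dot(b_p, b_n)
    cp, cn = dot(p, b_p), dot(p, b_n)
    det = rpp * rnn - rpn * rpn
    if abs(det) < 1e-12:
        # Assumed fallback: components collinear or empty, so use one
        # common gain (crude absolute threshold, for illustration only).
        g, _ = fit(p, [x + y for x, y in zip(b_p, b_n)])
        return g, g
    return (cp * rnn - cn * rpn) / det, (cn * rpp - cp * rpn) / det


def search_split_adaptive(p, candidates):
    """candidates: list of (L, b_p, b_n) tuples, one per candidate lag."""
    # Step 1: tie the gains (beta_P = beta_N) and pick the lag with least error.
    def common_gain_error(entry):
        _, b_p, b_n = entry
        return fit(p, [x + y for x, y in zip(b_p, b_n)])[1]

    L, b_p, b_n = min(candidates, key=common_gain_error)
    # Step 2: for that lag, free the two gains and minimise E jointly.
    beta_p, beta_n = joint_gains(p, b_p, b_n)
    return L, beta_p, beta_n
```

Tying the gains in step 1 keeps the lag search as cheap as in a conventional adaptive codebook search; only the chosen lag pays for the joint solve.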

[0021] Solving the above equation for β_P and β_N so that E is minimized gives the optimum β_P and β_N as follows.

[0022]

[Equation 1] (equation image not reproduced in this text)
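Since the equation image is not available here, the following is the standard least-squares solution of equation (5), offered as a reconstruction under that assumption rather than as the patent's exact expression. Setting the partial derivatives of E with respect to β_P and β_N to zero gives the normal equations

$$\beta_P \sum_n b_P[n]^2 + \beta_N \sum_n b_P[n]\, b_N[n] = \sum_n p[n]\, b_P[n]$$

$$\beta_P \sum_n b_P[n]\, b_N[n] + \beta_N \sum_n b_N[n]^2 = \sum_n p[n]\, b_N[n]$$

and, provided the determinant $D = \sum_n b_P[n]^2 \sum_n b_N[n]^2 - \left(\sum_n b_P[n]\, b_N[n]\right)^2$ is nonzero,

$$\beta_P = \frac{\sum_n p[n]\, b_P[n] \sum_n b_N[n]^2 - \sum_n p[n]\, b_N[n] \sum_n b_P[n]\, b_N[n]}{D}$$

$$\beta_N = \frac{\sum_n p[n]\, b_N[n] \sum_n b_P[n]^2 - \sum_n p[n]\, b_P[n] \sum_n b_P[n]\, b_N[n]}{D}$$

When b_P and b_N are orthogonal, the cross term vanishes and each gain reduces to the usual single-component ratio, for example $\beta_P = \sum_n p[n]\, b_P[n] / \sum_n b_P[n]^2$.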

[0023] After distortion minimization for the long-term prediction component of the adaptive codebook, a pulse sound source or a noise sound source, whichever gives the smaller distortion, is selected for the distortion remaining after the long-term prediction component has been determined, as in the first embodiment. The adaptive codebook is updated separately for the pulse component and the noise component: if the selected sound source is the pulse sound source, it is used to update the pulse component of the adaptive codebook; if it is the noise sound source, it is used to update the noise component.
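One possible reading of the update rule in [0023] is sketched below. The text only says that the selected sound source updates the matching component; the treatment of the long-term prediction part (each scaled component fed back into its own store) and the shift-register buffer layout are my assumptions, and the function name and arguments are hypothetical.

```python
def update_split_codebook(pulse_buf, noise_buf, a_p, a_n,
                          beta_p, beta_n, kind, gain, source):
    """Return the updated (pulse_buf, noise_buf) after coding one frame.

    a_p, a_n : delayed pulse / noise components a_P[n-L], a_N[n-L]
    kind     : "pulse" or "noise", the source chosen by the switch
    gain     : gamma_P or gamma_N for the chosen source
    source   : the chosen pulse train p_M[n] or noise vector c_I[n]
    """
    n = len(source)
    # Assumption: each long-term component is fed back into its own store.
    new_pulse = [beta_p * x for x in a_p]
    new_noise = [beta_n * x for x in a_n]
    # The selected secondary source updates only the matching component.
    if kind == "pulse":
        new_pulse = [x + gain * s for x, s in zip(new_pulse, source)]
    elif kind == "noise":
        new_noise = [x + gain * s for x, s in zip(new_noise, source)]
    else:
        raise ValueError("kind must be 'pulse' or 'noise'")
    # Drop the oldest n samples and append the new frame (fixed-length buffers).
    return pulse_buf[n:] + new_pulse, noise_buf[n:] + new_noise
```

Under this reading, the sample-wise sum of the two stores equals the total driving sound source, so the pair carries the same information as a conventional adaptive codebook while keeping the two components separable.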

[0024] Thus, according to the second embodiment, storing the adaptive codebook separated into a pulse component and a noise component makes it possible to optimally control the gains of the pulse component and the noise component of the long-term prediction signal, so that speech coding with high speech quality can be performed.

[0025]

[Effects of the Invention] As is clear from the above embodiments, the present invention selects whichever of the pulse component and the white noise component gives the smaller distortion, for the distortion remaining after the long-term prediction component has been determined. This makes it possible to generate a pulse component for voiced sounds having a pulse-like pitch-periodic sound source, so that coding can be performed with little deterioration of speech quality even at a low bit rate.

[0026] Further, according to the present invention, storing the adaptive codebook separated into a pulse component and a noise component makes it possible to optimally control the gains of the pulse component and the noise component of the long-term prediction signal, so that speech coding with high speech quality can be performed.

[Brief description of drawings]

FIG. 1 is a schematic block diagram of a speech coding apparatus according to a first embodiment of the present invention.

FIG. 2 is a schematic block diagram of a speech coding apparatus according to a second embodiment of the present invention.

FIG. 3 is a schematic block diagram of a conventional CELP speech coding apparatus.

[Explanation of symbols]

11 Perceptual weighting filter
12 Adaptive codebook
13 Pulse sound source generator
14 Stochastic codebook
15 Sound source switch
16 Driving sound source generator
17 Weighted synthesis filter
18 Distortion minimizer
21 Adaptive codebook pulse component store
22 Adaptive codebook noise component store
23 Pulse/noise component gain controller
24 Pulse sound source generator
25 Stochastic codebook
26 Sound source switch
27 Driving sound source generator
28 Weighted synthesis filter
29 Distortion minimizer
30 Perceptual weighting filter

Claims


1. A speech coding apparatus comprising: a perceptual weighting filter that perceptually weights input speech of a given section to generate weighted input speech; an adaptive codebook that stores past driving sound sources; a pulse sound source generator that generates a pulse sound source; a stochastic codebook that stores a plurality of noise sound sources in advance; a sound source switch that selects either the sound source generated by the pulse sound source generator or the sound source generated from the stochastic codebook; a driving sound source generator that generates a driving sound source from the selected sound source and the output of the adaptive codebook; a weighted synthesis filter that synthesizes speech using the driving sound source generated by the driving sound source generator as input; and a distortion minimizer that calculates the sum of squared errors of the weighted synthesized speech with respect to the weighted input speech, selects the long-term prediction delay, the pulse position or stochastic codebook code, and the respective gains that minimize it, and outputs their codes.

2. The speech coding apparatus according to claim 1, comprising, in place of the adaptive codebook, an adaptive codebook pulse component store that stores the pulse component of the long-term prediction signal, an adaptive codebook noise component store that stores the noise component, and a pulse/noise component gain controller that controls the gains of the pulse component and the noise component of the long-term prediction, whereby the gains of the pulse component and the noise component of the long-term prediction signal can be controlled independently.

3. The speech coding apparatus according to claim 2, wherein the pulse/noise component gain controller determines the gains of the pulse component and the noise component of the long-term prediction signal so as to minimize the perceptually weighted squared error with respect to the input speech.