JP3183944B2 - Audio coding device

Info

Publication number: JP3183944B2
Application number: JP10672792A
Authority: JP
Inventors: 隆史吉原
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1992-04-24
Filing date: 1992-04-24
Publication date: 2001-07-09
Anticipated expiration: 2016-07-09
Also published as: JPH05303398A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

Field of industrial application: The present invention relates to a speech coding apparatus, and in particular to a speech coding apparatus that uses an analysis-synthesis method.

【０００２】[0002]

Description of the related art: Linear prediction analysis is a representative example of the analysis-synthesis methods conventionally used in speech coding apparatuses. Let x_t (t: an integer representing time) be the discrete-time signal obtained by sampling a speech waveform at fixed time intervals. Linear prediction analysis exploits the correlation between the current sample value x_t and the P past sample values adjacent to it, and obtains a linear prediction value of x_t from the linear equation shown in equation (1) below.

【０００３】[0003]

【数１】 (Equation 1)
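
The equation image is not reproduced in this text. Judging from the surrounding description, equation (1) presumably has the standard linear-prediction form sketched below. The hat notation for the prediction value and the positive sign convention are assumptions, not taken from the original drawing.

```latex
\hat{x}_t = \sum_{i=1}^{P} \alpha_i \, x_{t-i}
```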

[0004] In equation (1) above, the coefficients α_i are determined so as to minimize the sum of squares of the prediction error e_t between the sample value x_t and the linear prediction value. The covariance method and the autocorrelation method are known methods for solving for α_i. CELP (Code-Excited Linear Prediction) coding is one scheme that builds on this linear prediction analysis.
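
The autocorrelation method named above can be illustrated with a short sketch. It assumes the predictor form x̂_t = Σ α_i x_{t-i} and uses the Levinson-Durbin recursion, a common way to solve the resulting normal equations; the patent does not say which solver is used, and the function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[k] = sum over t of x[t] * x[t-k]."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) for k in range(max_lag + 1)]


def lpc_autocorrelation(x, order):
    """Estimate alpha[0..order-1] such that x_hat[t] = sum_i alpha[i] * x[t-1-i]
    minimizes the summed squared prediction error (Levinson-Durbin recursion)."""
    r = autocorrelation(x, order)
    alpha = [0.0] * order
    err = r[0]  # prediction error energy of the order-0 model
    for m in range(order):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: nothing left to fit
        # Reflection coefficient for model order m + 1.
        acc = r[m + 1] - sum(alpha[i] * r[m - i] for i in range(m))
        k = acc / err
        prev = alpha[:]
        alpha[m] = k
        for i in range(m):
            alpha[i] = prev[i] - k * prev[m - 1 - i]
        err *= 1.0 - k * k
    return alpha
```

For a decaying first-order signal such as x_t = 0.9^t, the sketch returns a first coefficient close to 0.9 and a second coefficient close to zero.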

[0005] FIG. 9 shows a configuration example of an encoder that employs CELP coding. As shown in the figure, speech input from the speech input unit 5 is supplied to the linear prediction analysis unit 15, which obtains the linear prediction coefficients α_i. The coefficients α_i are scalar-quantized by the linear prediction coefficient quantization unit 16 and supplied to the linear predictor 17 in the subsequent stage.

[0006] The linear predictor 17 simultaneously receives an excitation vector b_j from the codebook 22. The difference between the input speech x and the linear predicted speech is taken to obtain the prediction error e_j. This prediction error e_j is passed through the auditory weighting filter 18, which reduces the perceived noise, and is supplied to the mean square error calculation unit 19 in the subsequent stage. The mean square error calculation unit 19 computes the mean square error of the weighted prediction error e'_j and holds the minimum mean square error together with the excitation vector b_j that produced it.

[0007] After this processing has been performed for all the excitation vectors in the codebook 22, the quantized linear prediction coefficients α_i and the excitation vector b_j are sent to the speech decoder 21.
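
The exhaustive codebook search described for the conventional CELP encoder can be sketched as follows. This is a simplified illustration under stated assumptions: the linear predictor is modeled as an all-pole synthesis filter with zero initial memory, the auditory weighting filter 18 is omitted, and no gain is applied. The function names are hypothetical.

```python
def synthesize(alpha, excitation):
    """All-pole synthesis: s[t] = excitation[t] + sum_i alpha[i] * s[t-1-i],
    starting from an all-zero filter memory."""
    s = []
    for t, b in enumerate(excitation):
        past = sum(a * s[t - 1 - i] for i, a in enumerate(alpha) if t - 1 - i >= 0)
        s.append(b + past)
    return s


def search_codebook(x, alpha, codebook):
    """Return (index, mse) of the excitation vector whose synthesized speech
    is closest to the input frame x in the mean-square sense."""
    best_index, best_mse = -1, float("inf")
    for j, b in enumerate(codebook):
        s = synthesize(alpha, b)
        # Unweighted mean square error; a real encoder would weight it first.
        mse = sum((xt - st) ** 2 for xt, st in zip(x, s)) / len(x)
        if mse < best_mse:
            best_index, best_mse = j, mse
    return best_index, best_mse
```

Because every candidate is synthesized and compared, the cost grows linearly with the codebook size, and the best achievable error is limited by the candidates the codebook contains.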

【０００８】[0008]

Problems to be solved by the invention: However, the linear prediction analysis described above is a predictive analysis based on a linear model, and because the prediction error e_t is minimized by the linear least-squares method, it has the drawback that accuracy is degraded by cancellation (loss of significant digits).

[0009] Furthermore, the linear model assumes that a speech signal is stationary over a short interval, but because speech is not actually stationary, the linear prediction error does not become very small. In addition, CELP coding merely searches the codebook for the excitation vector that minimizes the mean square error of the prediction error processed by the auditory weighting filter, so quantization error prevents the prediction error from becoming very small. The present invention has been made in view of these problems, and its object is to use the learning capability of a neural network to prevent the loss of accuracy caused by cancellation and quantization error.

【００１０】[0010]

Means for solving the problems: To achieve the above object, the speech coding apparatus of the present invention comprises: linear prediction analysis means for calculating, from input speech sampled at fixed time intervals, as many linear prediction coefficients as the analysis order; linear prediction coefficient quantization means for quantizing the linear prediction coefficients calculated by the linear prediction analysis means; linear prediction means for calculating linear predicted speech based on the signal quantized by the linear prediction coefficient quantization means and information from a codebook; filter means for reducing perceived noise in the linear prediction error, which is the difference between the input speech and the linear predicted speech; mean square error calculation means for calculating the mean square error from the output signal of the filter means and holding the minimum mean square error and the excitation vector that produced it; two-layer hierarchical neural network means consisting of an input layer and an output layer; and zero-state response calculation means for receiving the excitation vector and the linear prediction coefficients, calculating the response due to the excitation vector alone, and outputting the difference between the input speech and this response as teacher data for the hierarchical neural network means. The apparatus is characterized in that the previously calculated linear prediction coefficients of the input speech are used as the initial values of the synaptic coupling coefficients of the hierarchical neural network means, which are then learned and updated.

【００１１】[0011]

Operation: In the speech coding apparatus of the present invention, the linear prediction analysis means calculates as many linear prediction coefficients as the analysis order from input speech sampled at fixed time intervals; the linear prediction coefficient quantization means quantizes these coefficients; and the linear prediction means calculates linear predicted speech based on the quantized signal and information from the codebook. The filter means then reduces perceived noise in the linear prediction error, which is the difference between the input speech and the linear predicted speech, and the mean square error calculation means calculates the mean square error from the output signal of the filter means and holds the minimum mean square error and the excitation vector that produced it. The zero-state response calculation means receives the excitation vector and the linear prediction coefficients, calculates the response due to the excitation vector alone, and outputs the difference between the input speech and this response as teacher data for the two-layer hierarchical neural network means. The previously calculated linear prediction coefficients of the input speech are used as the initial values of the synaptic coupling coefficients of the hierarchical neural network means, which are then learned and updated.

【００１２】[0012]

Embodiments: The principle of the present invention is described first. The linear prediction value can be expressed from the linear prediction coefficients α_i and the past sample values x_{t-i} as in equation (1) above, and the prediction error e_t is given by equation (2).

【００１３】[0013]

【数２】 (Equation 2)
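
The equation image is missing here as well. Given the definition of the prediction error as the difference between the sample value and its linear prediction, equation (2) presumably reads as follows, assuming the same sign convention as the sketch of equation (1).

```latex
e_t = x_t - \hat{x}_t = x_t - \sum_{i=1}^{P} \alpha_i \, x_{t-i}
```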

[0014] Now consider a two-layer hierarchical linear neural network 1 as shown in FIG. 2. The past sample values x_{t-i} can be regarded as the input values to the neuron units of the input layer 2, the linear prediction coefficients α_i as the synaptic coupling coefficients between the input and output layers, and the linear prediction value as the output value of the neuron unit of the output layer 3.

[0015] The current sample value x_t is therefore used as the teacher signal, and the synaptic coupling coefficients, that is, the linear prediction coefficients α_i, are learned so as to minimize the sum of squares of the prediction error e_t.
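
The idea of treating the linear predictor as a two-layer network and learning its weights can be sketched as follows. This is a minimal sketch, assuming per-sample gradient descent on the squared error with a fixed learning rate; the learning rate, epoch count, and function names are illustrative assumptions rather than values from the patent.

```python
def predict(alpha, history):
    """history[i] holds x[t-1-i]; the output neuron forms the weighted sum."""
    return sum(a * h for a, h in zip(alpha, history))


def sum_squared_error(x, alpha):
    """Sum of e_t^2 over the frame, with e_t = x_t - x_hat_t."""
    p = len(alpha)
    return sum(
        (x[t] - predict(alpha, [x[t - 1 - i] for i in range(p)])) ** 2
        for t in range(p, len(x))
    )


def train_linear_predictor(x, alpha_init, rate=0.01, epochs=200):
    """Refine the synaptic weights alpha, with x_t as the teacher signal."""
    alpha = list(alpha_init)
    p = len(alpha)
    for _ in range(epochs):
        for t in range(p, len(x)):
            history = [x[t - 1 - i] for i in range(p)]
            e = x[t] - predict(alpha, history)
            # For E = e^2 / 2, dE/dalpha_i = -e * x[t-1-i]; step against it.
            alpha = [a + rate * e * h for a, h in zip(alpha, history)]
    return alpha
```

In the apparatus described here, `alpha_init` would hold the coefficients obtained by the covariance or autocorrelation method, so learning starts near the least-squares solution rather than from zero.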

[0016] In the hierarchical linear neural network 1 shown in FIG. 2, when as many past sample values x_{t-i} as the order of the linear prediction analysis are input to the input layer 2, the sum of their products with the synaptic coupling coefficients, which correspond to the linear prediction coefficients, is taken to obtain the linear prediction value. For learning, the error E can be defined as in equation (3).

【００１７】[0017]

【数３】 (Equation 3)

A method called backpropagation learning, as given by equation (4), is then adopted.

【００１８】[0018]

【数４】 (Equation 4)
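
Equations (3) and (4) are not reproduced in this text. A plausible reconstruction, assuming the usual sum-of-squares error and a gradient-descent weight update, is sketched below. The factor 1/2, the summation range, and the learning-rate symbol η are assumptions.

```latex
E = \frac{1}{2} \sum_{t} e_t^{2},
\qquad
\Delta \alpha_i = -\eta \, \frac{\partial E}{\partial \alpha_i}
              = \eta \sum_{t} e_t \, x_{t-i}
```

The second equality follows from e_t = x_t - Σ α_i x_{t-i}, whose derivative with respect to α_i is -x_{t-i}.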

[0019] FIG. 3 shows a configuration in which nonlinear neuron units 4 are added between the input and output layers of the hierarchical linear neural network 1 shown in FIG. 2. Because speech is inherently nonlinear, some of its features are difficult to predict by linear prediction alone; the added units make these features predictable. Each nonlinear neuron unit 4 shown in the figure transforms the sum of products of the input values from the input layer 2 and the synaptic coupling coefficients by a nonlinear function f(x) and outputs the result. The output value Y_tk of nonlinear neuron unit k is given by equation (5).

【００２０】[0020]

【数５】 (Equation 5)
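
Equation (5) is not reproduced in this text. From the description of the nonlinear neuron unit, it presumably has the form below. The weight symbol β_ik is borrowed from the description of the second embodiment and is an assumption here.

```latex
Y_{tk} = f\!\left( \sum_{i=1}^{P'} \beta_{ik} \, x_{t-i} \right)
```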

[0021] A sigmoid function such as f(x) = 1/(1 + exp(-x)) is used, and the backpropagation learning method is used as described above. P' is the number of synaptic connections between the neuron units of the input layer and a nonlinear neuron unit. Embodiments of the present invention based on the principle described above are explained below. FIG. 4 shows the configuration of the first embodiment of the present invention.

[0022] As shown in FIG. 4, in the encoder 20 the speech input unit 5 is connected to the input layer 2 of the two-layer hierarchical linear neural network 1, and the output layer 3 of the neural network 1 is connected to the synaptic coupling coefficient learning unit 8. The speech input unit 5 is further connected to the linear prediction coefficient calculation unit 6, which obtains as many linear prediction coefficients from the input speech as the analysis order; to the synaptic coupling coefficient learning unit 8, which performs the learning computation of the synaptic coupling coefficients by backpropagation learning; and to the prediction error calculation unit 10, which obtains the prediction error e_t.

[0023] The linear prediction coefficient calculation unit 6 and the synaptic coupling coefficient learning unit 8 are connected to the synaptic coupling coefficient setting unit 7, which in turn is connected to the hierarchical linear neural network 1. The hierarchical linear neural network 1 is also connected to the synaptic coupling coefficient quantization unit 9, which quantizes its synaptic coupling coefficients. The synaptic coupling coefficient quantization unit 9 is connected to the prediction error calculation unit 10 and to the speech decoder 21, which synthesizes a speech waveform from the quantized synaptic coupling coefficients and quantized prediction error of the input speech.

[0024] The prediction error calculation unit 10 is connected to the prediction error quantization unit 11, which quantizes the prediction error, and the prediction error quantization unit 11 is connected to the speech decoder 21.

[0025] In this configuration, when a predetermined number of speech samples taken at fixed time intervals are input from the speech input unit 5 to the linear prediction coefficient calculation unit 6, as many linear prediction coefficients as the analysis order are calculated by a known technique, either the covariance method or the autocorrelation method. The analysis order P is usually about 10. The result is supplied to the synaptic coupling coefficient setting unit 7 and set as the initial values of the synaptic coupling coefficients α_i of the hierarchical linear neural network 1.

[0026] Once the initial values have been set, the hierarchical linear neural network 1 is operated while P input values x_{t-i} (P being the analysis order) are input in sequence, and the linear prediction value of the current speech waveform is output to the synaptic coupling coefficient learning unit 8.

[0027] The synaptic coupling coefficient learning unit 8 updates the synaptic coupling coefficients α_i by backpropagation learning, using the linear prediction value, the synaptic coupling coefficients α_i, the current sample value x_t, and the input values x_{t-i} to the input layer 2. The updated synaptic coupling coefficients α_i are supplied to the synaptic coupling coefficient setting unit 7 and set as the new synaptic coupling coefficients of the hierarchical linear neural network 1.

[0028] Backpropagation learning is performed until the error E stops decreasing. Alternatively, learning can be performed only when the prediction error e_t is at or above a certain threshold, continuing until the error falls within that threshold. Conventionally, the pitch was extracted from the prediction error as sound source information; this approach makes it possible to omit that processing.

[0029] Conversely, if backpropagation learning is performed only when the prediction error e_t is at or below a certain threshold, the prediction error becomes pulse-like, that is, its power is concentrated, which enables efficient coding.

[0030] The pitch component normally remains in the prediction error as periodic impulses, but the threshold processing removes this error effectively. Moreover, because the prediction error is kept at or below the threshold, its dynamic range becomes smaller, which helps reduce the code amount. When backpropagation learning finishes, the synaptic coupling coefficient quantization unit 9 reads the synaptic coupling coefficients of the hierarchical linear neural network 1 and quantizes them with a predetermined number of quantization bits.
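
Quantizing a coefficient with a predetermined number of bits can be illustrated with a nearest-level scalar quantizer. The patent does not specify the quantization table, so the uniform table below, its range, and the function names are assumptions made purely for illustration.

```python
def uniform_table(low, high, bits):
    """Build 2**bits evenly spaced reconstruction levels on [low, high]."""
    levels = 2 ** bits
    step = (high - low) / (levels - 1)
    return [low + m * step for m in range(levels)]


def quantize_scalar(value, table):
    """Return (index, level) of the table entry closest to value."""
    index = min(range(len(table)), key=lambda m: abs(table[m] - value))
    return index, table[index]
```

Only the index needs to be transmitted; the decoder looks up the same table to recover the reconstruction level.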

[0031] The prediction error calculation unit 10 calculates the prediction error e_t between the prediction value obtained from the quantized synaptic coupling coefficients and the current sample value x_t, and the prediction error quantization unit 11 quantizes this prediction error. The quantized synaptic coupling coefficients and prediction error are then supplied to the speech decoder 21, where speech synthesis is performed. FIG. 5 shows the configuration of the second embodiment of the present invention.

[0032] This embodiment is characterized in that a random number generation unit 12 is provided in addition to the configuration of the first embodiment, and nonlinear neuron units 4 are provided between the input and output layers of the hierarchical linear neural network 1.

[0033] In this configuration, the synaptic coupling coefficient setting unit 7 receives the initial values of the synaptic coupling coefficients α_i from the linear prediction coefficient calculation unit 6 and, at the same time, receives small random numbers from the random number generation unit 12 as the initial values of the synaptic coupling coefficients β_ik between the input layer and the nonlinear neuron units and of the synaptic coupling coefficients γ_k between the nonlinear neuron units and the output layer. It then sets these values in the hierarchical neural network 1'. Once the initial values have been set, the same processing as in the first embodiment is performed, except that the prediction value of the current speech waveform is given by equation (6).

【００３４】[0034]

【数６】 (Equation 6)

Here, K is the number of nonlinear neuron units and J is the number of synaptic connections between the input layer and a nonlinear neuron unit, with P ≥ J. In this embodiment, adding the nonlinear neuron units 4 enables nonlinear prediction of the speech waveform, so the prediction error can be made still smaller.
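
The prediction of the second embodiment can be sketched as a forward pass that adds the outputs of the sigmoid units to the direct linear path. Since equation (6) is not reproduced in this text, the exact form below is an assumption: the linear term uses α_i, each nonlinear unit k sees the first J inputs through weights β_ik, and its output is weighted by γ_k. The function names are hypothetical.

```python
import math


def sigmoid(u):
    """Sigmoid nonlinearity f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def predict_nonlinear(history, alpha, beta, gamma):
    """history[i] = x[t-1-i]; alpha has P entries (direct linear path),
    beta[k] has J <= P entries (input layer -> nonlinear unit k),
    gamma[k] weights nonlinear unit k at the output."""
    linear = sum(a * h for a, h in zip(alpha, history))
    # zip() stops at the shorter list, so each unit sees only its J inputs.
    hidden = [
        sigmoid(sum(b * h for b, h in zip(beta_k, history))) for beta_k in beta
    ]
    return linear + sum(g * y for g, y in zip(gamma, hidden))
```

With every γ_k set to zero the function reduces to the ordinary linear prediction, which matches the idea of starting from the linear prediction coefficients and letting small random β and γ values add a nonlinear correction during learning.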

[0035] To keep the linear prediction coefficients α_i from changing greatly early in learning, it is also possible to fix α_i at first and update only the coefficients β_ik and γ_k associated with the nonlinear neuron units, and then learn and update all the synaptic coupling coefficients in the next stage.

[0036] The embodiments above apply the present invention to linear prediction analysis. Embodiments that apply the present invention to CELP coding, which uses linear prediction analysis, are described next. First, an outline of a speech coding apparatus in which CELP coding is incorporated into the present invention is given with reference to FIG. 6.

[0037] As shown in the figure, the zero-state response calculation unit 13 is connected to the encoder 20, and the zero-state response calculation unit 13 and the speech input unit 5 are connected to the hierarchical neural network 1 via the subtractor 14. The encoder 20 is also connected directly to the hierarchical neural network 1, and the hierarchical neural network 1 is connected to the decoder 21.

[0038] In this configuration, the optimal excitation vector b_j output from the encoder 20 is supplied to the zero-state response calculation unit 13, which calculates and outputs the zero-state response S_t. As with the linear predictor, the zero-state response S_t can be expressed using the linear prediction coefficients α_i and the excitation vector b_j, as in equation (7).

【００３９】[0039]

【数７】 (Equation 7)

[0040] It differs from the linear predictor, however, in that the initial state values S_{t-i} at the time of calculation are all zero. The subtractor 14 takes the difference x' (= x - S) between the input speech x and the zero-state response S of the excitation vector b_j and supplies it to the hierarchical neural network 1.
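
The zero-state response and the resulting teacher signal can be sketched as follows. Equation (7) is not reproduced in this text, so the recursion S_t = gain * b_jt + Σ α_i S_{t-i} with all initial state values set to zero is an assumed form based on the description; the `gain` parameter and the function names are hypothetical.

```python
def zero_state_response(alpha, excitation, gain=1.0):
    """Response of the synthesis filter to the excitation alone:
    S[t] = gain * b[t] + sum_i alpha[i] * S[t-1-i], with zero initial state."""
    s = []
    for t, b in enumerate(excitation):
        # Terms that would reach before the frame start are zero by definition.
        past = sum(a * s[t - 1 - i] for i, a in enumerate(alpha) if t - 1 - i >= 0)
        s.append(gain * b + past)
    return s


def teacher_signal(x, alpha, excitation, gain=1.0):
    """Difference x' = x - S that is handed to the network as teacher data."""
    s = zero_state_response(alpha, excitation, gain)
    return [xt - st for xt, st in zip(x, s)]
```

Subtracting the zero-state response leaves the part of the input that the excitation does not explain, which is what the network's weights are then trained to predict.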

[0041] The hierarchical neural network 1 is a two-layer linear neural network consisting of the input layer 2 and the output layer 3, which are connected to each other by synaptic connections. The linear prediction coefficients α_i obtained by the encoder 20 are used as the initial values of its synaptic coupling coefficients.

[0042] When the past output values x_{t-i} are input to the input layer 2 of the hierarchical neural network 1, the error E is calculated, for example by equation (8), and backpropagation learning as in equation (4) above is performed so as to minimize this error E.

【００４３】[0043]

【数８】 (Equation 8)

[0044] In equation (8), the first term is the usual output-error minimization term that uses the output value x' from the subtractor 14 as teacher data, while the second term becomes small when the linear prediction coefficient α_i is close to any element v_im of its quantization table v_i. Here, ε is a positive constant close to 0.
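
The two-term error can be illustrated with a small sketch. Equation (8) itself is not reproduced in this text, so the form below is an assumption chosen only to match the description: a squared output error plus ε times, for each coefficient, the squared distance to the nearest entry of its quantization table. The patent may use a different penalty, and the function name is hypothetical.

```python
def penalized_error(targets, predictions, alpha, tables, eps=1e-3):
    """Output error plus a small penalty that pulls each coefficient alpha[i]
    toward the nearest level in its own quantization table tables[i]."""
    # First term: ordinary squared output error against the teacher data x'.
    output_term = 0.5 * sum((x - y) ** 2 for x, y in zip(targets, predictions))
    # Second term (assumed form): distance to the closest quantization level.
    penalty = sum(
        min((a - v) ** 2 for v in table) for a, table in zip(alpha, tables)
    )
    return output_term + eps * penalty
```

Because the penalty vanishes when every coefficient sits exactly on a table entry, minimizing this error tends to leave coefficients that lose little accuracy when they are subsequently scalar-quantized.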

[0045] Backpropagation learning may use a sequential learning method that updates the synaptic coupling coefficients for every speech sample x'_t, but here a batch learning method that updates the synaptic coupling coefficients once per analysis frame interval T is used. Each time the synaptic coupling coefficients are updated, the linear prediction coefficients α_i of the zero-state response calculation unit are updated with the synaptic coupling coefficients α_i of the hierarchical neural network 1.

[0046] This operation and the recomputation of the zero-state response are repeated until the error E becomes sufficiently small. Once it is, the synaptic coupling coefficients α_i are quantized and output as more optimal linear prediction coefficients. FIG. 1 shows the configuration of the third embodiment of the present invention.

[0047] As shown in the figure, the speech input unit 5 is connected to the linear prediction analysis unit 15, which is connected to the linear prediction coefficient quantization unit 16. The linear prediction coefficient quantization unit 16 is connected to the linear predictor 17, to which is also connected the gain adder 23 that applies a gain γ to the excitation vector b_j obtained from the codebook 22.

[0048] The speech input unit 5 and the linear predictor 17 are connected via the subtractor 14a to the auditory weighting filter 18, which is connected to the mean square error calculation unit 19. The mean square error calculation unit 19 is connected to the synaptic coupling coefficient setting unit 7 and the zero-state response calculation unit 13.

[0049] The zero-state response calculation unit 13 and the speech input unit 5 are connected via the subtractor 14b to the synaptic coupling coefficient learning unit 8, which is connected to the synaptic coupling coefficient setting unit 7.

[0050] The synaptic coupling coefficient setting unit 7 is connected to the hierarchical neural network 1, which is connected to both the synaptic coupling coefficient learning unit 8 and the synaptic coupling coefficient quantization unit 9. The synaptic coupling coefficient quantization unit 9 is connected to the speech decoder 21, which is connected to the mean square error calculation unit 19.

[0051] In this configuration, when a predetermined number of input speech samples taken at fixed time intervals are input to the linear prediction analysis unit 15, as many linear prediction coefficients as the analysis order are calculated by a known technique, either the covariance method or the autocorrelation method. The analysis order P is usually about 10. The result is supplied to the linear prediction coefficient quantization unit 16, scalar-quantized with reference to a quantization table (not shown), and supplied to the linear predictor 17. At the same time, the excitation vector b_j from the codebook 22 is multiplied by γ in the gain adder 23 and supplied to the linear predictor 17, which produces the linear predicted speech.

[0052] Next, the difference between the input speech and the linear predicted speech, that is, the linear prediction error e_j, is supplied to the auditory weighting filter 18, which reduces perceived noise on the basis of human auditory characteristics. The mean square error calculation unit 19 calculates the mean square error of the filter output and holds the minimum mean square error together with the excitation vector γb_j that produced it.

[0053] This operation is performed for all the excitation vectors in the codebook 22, and the resulting minimum-error excitation vector γb_j and the linear prediction coefficients α_i are supplied to the zero-state response calculation unit 13.

[0054] There, the response due to the excitation vector γb_j alone, that is, the zero-state response, is calculated, and the difference x' between the input speech x and this zero-state response S is supplied to the synaptic coupling coefficient learning unit 8 as teacher data for the hierarchical neural network 1.

[0055] The linear prediction coefficients α_i sent from the mean square error calculation unit 19 are set, via the synaptic coupling coefficient setting unit 7, as the initial values of the synaptic coupling coefficients of the hierarchical neural network 1.

[0056] While the hierarchical neural network 1 is operated according to equation (1) above, the synaptic coupling coefficient learning unit 8 performs backpropagation learning. The error expression minimized by backpropagation learning is defined, for example, as in equation (8) above. This minimizes the error given by equation (9) below while bringing each linear prediction coefficient α_i close to one of the elements v_im of the linear prediction coefficient quantization table (not shown).

【００５７】[0057]

【数９】 (Equation 9)

[0058] In other words, the scalar quantization of the linear prediction coefficients and the minimization of the output error are optimized simultaneously. The backpropagation learning here uses a batch learning method in which the synaptic coupling coefficients are updated once per analysis frame interval T, and at each update the linear prediction coefficients αi of the zero-state response calculation unit 13 are updated as well.
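The batch learning described above can be sketched as follows. Because equations (8) and (9) are not reproduced in this text, the error function used here is an assumption: the squared output error over one frame plus a penalty, weighted by a hypothetical factor `lam`, that pulls each coefficient toward its nearest quantization table entry. The names `nearest`, `batch_update`, `train`, `lr`, and `lam` are illustrative only, and the sketch again assumes the predictor form x̂t = Σ αi x(t−i).

```python
def nearest(value, table):
    """Entry of `table` closest to `value` (scalar quantization)."""
    return min(table, key=lambda v: abs(v - value))


def batch_update(alpha, x, target, table, lr=0.01, lam=0.1):
    """One per-frame gradient step on the ASSUMED error
    E = sum_t (target[t] - sum_i alpha[i] * x[t-1-i])**2
        + lam * sum_i (alpha[i] - nearest(alpha[i], table))**2
    """
    p = len(alpha)
    grad = [0.0] * p
    # Accumulate the gradient over the whole frame before updating (batch mode).
    for t in range(p, len(x)):
        pred = sum(alpha[i] * x[t - 1 - i] for i in range(p))
        err = target[t] - pred
        for i in range(p):
            grad[i] += -2.0 * err * x[t - 1 - i]
    # Penalty gradient: pull each coefficient toward its nearest table entry.
    for i in range(p):
        grad[i] += 2.0 * lam * (alpha[i] - nearest(alpha[i], table))
    return [a - lr * g for a, g in zip(alpha, grad)]


def train(alpha0, x, target, table, steps=200, lr=0.01, lam=0.1):
    """Repeat the batch update, then scalar-quantize the learned coefficients."""
    alpha = list(alpha0)
    for _ in range(steps):
        alpha = batch_update(alpha, x, target, table, lr, lam)
    return [nearest(a, table) for a in alpha], alpha
```

In this sketch the weights of the two-layer linear network are the coefficients αi themselves, so the learned weights can be copied back to the zero-state response calculation after each update. A fixed step count stands in for the stopping test on the error E.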

[0059] Further, the learning of the hierarchical neural network 1 is repeated via the synaptic coupling coefficient setting unit 7 until the error E becomes sufficiently small, after which the synaptic coupling coefficients are scalar-quantized by the synaptic coupling coefficient quantization unit 9 and output to the speech decoder 21. The speech decoder 21 also receives the optimum excitation vector γbj from the mean square error calculation unit 19 and synthesizes the speech. Next, FIG. 7 is a diagram showing the configuration of a fourth embodiment of the present invention.

[0060] As shown in the figure, this embodiment differs from the configuration of the first embodiment in that the zero-state response calculation unit 13 is removed; instead, an input unit for the excitation vector element bjt is added to the hierarchical neural network 1, and the gain γ is set as the initial value of its synaptic coupling coefficient.

[0061] In this configuration, the gain γ of the excitation vector bj sent from the mean square error calculation unit 19 is initially set in the hierarchical neural network 1 via the synaptic coupling coefficient setting unit 7.

[0062] Then, the learning operation starts when the element bjt of the excitation vector bj at time t is input to the hierarchical neural network 1. Like the linear prediction coefficients α, the gain γ is learned so as to approach an entry of a quantization table (not shown) or a quantization step. That is, equation (10) is added to the expression for the error E described above.

[0063]

(Equation 10)

Here, Un is one element of the quantization table U for the gain γ, and n is the number of elements in the table.
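The joint learning of the coefficients and the gain in the fourth embodiment can be illustrated with the sketch below. It assumes the network output is Σ αi x(t−i) + γ·bjt with the input speech xt as the target, and it assumes that the term added by equation (10) is a squared distance between γ and its nearest table entry, since the equation itself is not reproduced in this text. The names `train_with_gain`, `v_table`, `u_table`, `lr`, and `lam` are hypothetical.

```python
def nearest(value, table):
    """Entry of `table` closest to `value`."""
    return min(table, key=lambda v: abs(v - value))


def train_with_gain(alpha, gamma, x, b, v_table, u_table,
                    steps=500, lr=0.01, lam=0.1):
    """Batch gradient descent on the ASSUMED error
    E = sum_t (x[t] - sum_i alpha[i]*x[t-1-i] - gamma*b[t])**2
        + lam * sum_i (alpha[i] - nearest V)**2 + lam * (gamma - nearest U)**2
    """
    p = len(alpha)
    alpha = list(alpha)
    for _ in range(steps):
        g_alpha = [0.0] * p
        g_gamma = 0.0
        for t in range(p, len(x)):
            pred = sum(alpha[i] * x[t - 1 - i] for i in range(p)) + gamma * b[t]
            err = x[t] - pred
            for i in range(p):
                g_alpha[i] -= 2.0 * err * x[t - 1 - i]
            g_gamma -= 2.0 * err * b[t]  # the gain is just one more weight
        for i in range(p):
            g_alpha[i] += 2.0 * lam * (alpha[i] - nearest(alpha[i], v_table))
        g_gamma += 2.0 * lam * (gamma - nearest(gamma, u_table))
        alpha = [a - lr * g for a, g in zip(alpha, g_alpha)]
        gamma -= lr * g_gamma
    return alpha, gamma
```

Treating γ as an ordinary weight on an extra input unit is what lets the same learning rule optimize it together with αi. The step size `lr` must be small relative to the signal energy in the frame, otherwise the plain gradient iteration used here diverges.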

[0064] In this way, the speech decoder 21 receives the optimized linear prediction coefficients αi and the gain γ of the excitation vector from the synaptic coupling coefficient quantization unit 9, and the excitation vector bj from the mean square error calculation unit 19, and synthesizes the speech. FIG. 8 is a diagram showing the configuration of a fifth embodiment of the present invention.

[0065] This embodiment differs from the conventional example shown in FIG. 9 in that the zero-state response calculation unit 13 is provided so as to feed the quantization error caused by the codebook 22 back to the linear prediction analysis unit 15.

[0066] In this configuration, once the mean square error calculation unit 19 has obtained the optimum excitation vector γbj, it is sent to the zero-state response calculation unit 13, where the zero-state response S of the optimum excitation vector γbj is calculated. Based on the difference x′ between this response and the input speech x, the linear prediction analysis unit 15 obtains new linear prediction coefficients αi.
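The re-analysis step described above can be sketched as follows. The patent states earlier that αi can be solved by the covariance method or the autocorrelation method; this sketch uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard choice and not necessarily what the linear prediction analysis unit 15 implements. The predictor form x̂t = Σ αi x(t−i) and the names `autocorrelation`, `levinson_durbin`, and `reestimate_lpc` are assumptions made for illustration.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_t x[t] * x[t-k] for k = 0..max_lag."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients alpha for the predictor x_hat[t] = sum_i alpha[i] * x[t-1-i]."""
    alpha = [0.0] * order
    energy = r[0]
    for m in range(order):
        if energy <= 0.0:  # silent or degenerate frame: stop early
            break
        k = (r[m + 1] - sum(alpha[i] * r[m - i] for i in range(m))) / energy
        updated = alpha[:]
        updated[m] = k
        for i in range(m):
            updated[i] = alpha[i] - k * alpha[m - 1 - i]
        alpha = updated
        energy *= 1.0 - k * k
    return alpha


def reestimate_lpc(x, zero_state, order):
    """Re-run LPC analysis on the difference x' = x - S (assumed reading)."""
    x_prime = [xt - st for xt, st in zip(x, zero_state)]
    return levinson_durbin(autocorrelation(x_prime, order), order)
```

A caller would pass the current frame, the zero-state response S of the chosen excitation, and the analysis order, then quantize the returned coefficients before the next codebook search.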

[0067] The quantized version of these linear prediction coefficients could be sent to the speech decoder 21 immediately; however, to further improve the coding accuracy, the optimum excitation vector is determined again. This processing is then repeated until the quantized data of the linear prediction coefficients no longer changes. In this embodiment, this operation makes it possible to optimize both the linear prediction coefficients and the excitation vector. Embodiments of the present invention have been described above; however, the present invention is of course not limited to them, and various improvements and modifications are possible.

[0068] For example, the hierarchical neural network 1 used in the third and fourth embodiments described above is a two-layer linear network, but a nonlinear neural network may also be added between the input and output layers.

[0069]

[Effects of the Invention] According to the present invention, the loss of precision in the linear prediction coefficients caused by cancellation of significant digits in conventional numerical computation can be prevented by the learning process of the hierarchical neural network. Furthermore, adding nonlinear neuron units enables nonlinear prediction of non-stationary speech waveforms, so that the prediction error can be made smaller.

[0070] Moreover, because the linear prediction coefficients are optimized by training the hierarchical neural network with the linear prediction error of the input speech and its quantization error, the input speech can be encoded with high efficiency.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a speech encoding device according to a third embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the hierarchical linear neural network 1.

FIG. 3 is a diagram showing the configuration of the hierarchical nonlinear neural network 1′.

FIG. 4 is a diagram showing the configuration of a speech encoding device according to a first embodiment of the present invention.

FIG. 5 is a diagram showing the configuration of a speech encoding device according to a second embodiment of the present invention.

FIG. 6 is a conceptual diagram of the present invention applied to CELP coding.

FIG. 7 is a diagram showing the configuration of a speech encoding device according to a fourth embodiment of the present invention.

FIG. 8 is a diagram showing the configuration of a speech encoding device according to a fifth embodiment of the present invention.

FIG. 9 is a diagram showing the configuration of a conventional speech encoding device.

[Explanation of symbols]

1: hierarchical neural network, 2: input layer, 3: output layer, 4: intermediate layer, 5: speech input unit, 6: linear prediction coefficient calculation unit, 7: synaptic coupling coefficient setting unit, 8: synaptic coupling coefficient learning unit, 9: synaptic coupling coefficient quantization unit, 10: prediction error calculation unit, 11: prediction error quantization unit, 12: random number generation unit, 13: zero-state response calculation unit, 14: subtractor, 15: linear prediction analysis unit, 16: linear prediction coefficient quantization unit, 17: linear predictor, 18: perceptual weighting filter, 19: mean square error calculation unit, 20: encoder, 21: decoder, 22: codebook, 23: gain adder.

Claims

(57) [Claims]

1. A speech encoding apparatus comprising: linear prediction analysis means for calculating linear prediction coefficients, as many as the analysis order, from input speech sampled at fixed time intervals; linear prediction coefficient quantization means for quantizing the linear prediction coefficients calculated by the linear prediction analysis means; linear prediction means for calculating linearly predicted speech based on the signal quantized by the linear prediction coefficient quantization means and information from a codebook; filter means for reducing the perceived noise of a linear prediction error, which is the difference between the input speech and the linearly predicted speech; mean square error calculation means for calculating a mean square error from the output signal of the filter means and holding the mean square error and the excitation vector at that time; two-layer hierarchical neural network means consisting of an input layer and an output layer; and zero-state response calculation means for receiving the excitation vector and the linear prediction coefficients and calculating a response value due to the excitation vector alone, wherein the difference between the input speech and the response value is used as teacher data for the hierarchical neural network means, and learning and updating are performed using linear prediction coefficients of the input speech calculated in advance as the initial values of the synaptic coupling coefficients of the hierarchical neural network means.