JPH0786952A

JPH0786952A - Predictive encoding method for voice

Info

Publication number: JPH0786952A
Application number: JP5227577A
Authority: JP
Inventors: Akitoshi Kataoka; Takehiro Moriya; Shinji Hayashi
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1993-09-13
Filing date: 1993-09-13
Publication date: 1995-03-31

Abstract

PURPOSE: To provide high-quality decoded speech at the receiving end even when speech signals having different frequency characteristics are encoded and transmitted. CONSTITUTION: A prediction coefficient determination part 2 and a prediction coefficient quantization part 4 set prediction coefficients in a synthesis filter 3. A pitch period vector and a noise waveform vector are output from an adaptive codebook 5 and a noise codebook 7, respectively, and are multiplied by gains in gain parts 6 and 8. The outputs of the gain parts 6 and 8 are added and supplied to the synthesis filter 3, which synthesizes a synthesized speech vector. The distortion obtained by subtracting the synthesized speech vector from the power-normalized input speech vector is perceptually weighted, with the degree of weighting adaptively controlled according to the frequency characteristic of the input speech vector. The power of the weighted distortion is then calculated, the pitch period vector and the noise waveform vector are selected from the adaptive codebook 5 and the noise codebook 7 so as to minimize this power, and the gains of the gain parts 6 and 8 are set.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a predictive speech coding method that is used in digital mobile communication, such as car telephones, and that encodes speech with high efficiency.

[0002]

2. Description of the Related Art

In recent years, various high-efficiency coding methods have been used in digital mobile communication and related fields, for example to make effective use of radio spectrum. Among them, methods that encode speech at a rate of about 8 kbit/s include code-excited linear prediction (CELP) coding, vector sum excited linear prediction (VSELP) coding, and multipulse coding.

[0003] For details of CELP coding, see, for example, M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Rates," Proc. ICASSP '85, 25.1.1, pp. 937-940, 1985. For details of VSELP coding, see, for example, I. A. Gerson and M. A. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps," Proc. ICASSP '90, S9.3, pp. 461-464, 1990. For details of multipulse coding, see, for example, K. Ozawa and T. Araseki, "9.6-4.8 kbit/s Multipulse Speech Coding Using Pitch Information," Trans. IEICE (D-II), vol. J72-D-II, no. 8, pp. 1125-1132, 1989.

[0004] FIG. 1 is a block diagram showing an example configuration of a speech coding apparatus that uses the conventional CELP coding method. Input speech data, generated by sampling an analog speech signal at a sampling frequency of 8 kHz, is input at the input terminal 1. In the prediction coefficient determination unit 2, a plurality of samples of the input speech data from the input terminal 1 are grouped into one vector per frame (hereinafter, the input speech vector), linear prediction analysis is performed on this input speech vector, and the prediction coefficients (linear predictive coding (LPC) coefficients or line spectrum pair (LSP) coefficients) of the synthesis filter 3, whose transfer function is 1/A(z), are calculated and determined. The prediction coefficient quantization unit 4 then quantizes the prediction coefficients and sets them in the synthesis filter 3.
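The linear prediction analysis in the prediction coefficient determination unit 2 is not specified further in the text. A minimal sketch, assuming the autocorrelation method solved with the Levinson-Durbin recursion (a common choice, not necessarily the one used here), shows how one frame yields both the LPC coefficients α_i and the PARCOR coefficients k_i that the embodiments use later. The function names are hypothetical, no analysis window is applied, and the sign convention A(z) = 1 − Σ α_i z^(−i) is assumed, under which k₁ = r(1)/r(0).

```python
def autocorrelation(frame, order):
    """Return r[0..order] of one analysis frame (no window, for simplicity)."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor x_hat[n] = sum_i alpha_i x[n-i].

    Returns (alpha, parcor, err): alpha[1..order] are the LPC coefficients
    (alpha[0] is unused), parcor[0..order-1] are k_1..k_order, and err is the
    final prediction-error power.
    """
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    alpha = [0.0] * (order + 1)
    parcor = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient of stage i
        acc = r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Order-update of the predictor coefficients
        new_alpha = alpha[:]
        new_alpha[i] = k
        for j in range(1, i):
            new_alpha[j] = alpha[j] - k * alpha[i - j]
        alpha = new_alpha
        parcor.append(k)
        err *= 1.0 - k * k
    return alpha, parcor, err


# Usage with a toy deterministic frame of 160 samples (20 ms at 8 kHz).
frame = [float(((t * 37) % 17) - 8) for t in range(160)]
alpha, parcor, err = levinson_durbin(autocorrelation(frame, 10), 10)
k1 = parcor[0]  # first-order PARCOR coefficient used by the embodiments
```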

[0005] The adaptive codebook 5 is configured to store a plurality of pitch period vectors corresponding to the pitch periods of voiced sections of speech. A pitch period vector selected and retrieved from the adaptive codebook 5 by the distortion power calculation unit 12 (described later) is multiplied in the gain unit 6 by a gain that is likewise set by the distortion power calculation unit 12, and the result is output from the gain unit 6.

[0006] The noise codebook 7, on the other hand, stores in advance a plurality of noise waveform vectors (for example, random-number vectors) corresponding to unvoiced sections of speech. A noise waveform vector selected and retrieved from the noise codebook 7 by the distortion power calculation unit 12 is multiplied in the gain unit 8 by a gain set by the distortion power calculation unit 12, and the result is output from the gain unit 8. The output vectors of the gain units 6 and 8 are added in the adder 9, and the output vector of the adder 9 is supplied to the synthesis filter 3 as an excitation vector. The synthesis filter 3 then synthesizes a speech vector (hereinafter, the synthesized speech vector) based on the prediction coefficients that have been set.
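The excitation path just described can be sketched as follows, assuming the same A(z) = 1 − Σ α_i z^(−i) convention. Building the pitch period vector by repeating the most recent `lag` samples of past excitation is a common adaptive-codebook construction and is an assumption here; all helper names are hypothetical. The filter memory is returned so that it can be carried from one frame to the next.

```python
def adaptive_codebook_vector(past_excitation, lag, length):
    """Pitch period vector taken `lag` samples back in the past excitation.

    When lag < length the segment is repeated periodically (assumed rule).
    Requires 1 <= lag <= len(past_excitation).
    """
    segment = past_excitation[len(past_excitation) - lag:]
    return [segment[n % lag] for n in range(length)]


def excitation_vector(pitch_vec, noise_vec, pitch_gain, noise_gain):
    """Adder 9: sum of the two gain-scaled codebook vectors (gain units 6 and 8)."""
    return [pitch_gain * p + noise_gain * c for p, c in zip(pitch_vec, noise_vec)]


def synthesis_filter(alpha, excitation, memory=None):
    """All-pole synthesis filter 1/A(z): s[n] = e[n] + sum_i alpha[i] * s[n-i].

    alpha[0] is unused. `memory` holds the last outputs, most recent first,
    and the updated memory is returned for use in the next frame.
    """
    order = len(alpha) - 1
    hist = list(memory) if memory is not None else [0.0] * order
    out = []
    for e in excitation:
        s = e + sum(alpha[i] * hist[i - 1] for i in range(1, order + 1))
        out.append(s)
        hist = ([s] + hist)[:order]
    return out, hist


# Usage with toy values: one frame of 4 samples and a first-order filter.
past = [0.0, 0.0, 1.0, -1.0]
pitch = adaptive_codebook_vector(past, lag=2, length=4)   # [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.0, -0.5, 0.0]
exc = excitation_vector(pitch, noise, pitch_gain=0.8, noise_gain=0.2)
speech, state = synthesis_filter([0.0, 0.5], exc)
```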

[0007] In the power quantization unit 10, the power of the input speech vector is calculated and then quantized, and the quantized power is used to normalize the input speech vector and the pitch period vector. The subtractor 11 then subtracts the synthesized speech vector from the normalized input speech vector output from the power quantization unit 10 to obtain the distortion data.
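The text does not specify the power quantizer. A minimal sketch, assuming a uniform quantizer on the frame's mean power in decibels (the step size, level count, and function names are hypothetical), shows the calculate, quantize, and normalize sequence.

```python
import math


def quantize_power(frame, step_db=2.0, levels=64, floor_db=0.0):
    """Quantize the mean power of one frame on a uniform dB grid (hypothetical).

    Returns (index, quantized_power); the index is what would be transmitted.
    """
    power = sum(x * x for x in frame) / len(frame)
    power_db = 10.0 * math.log10(max(power, 1e-10))
    index = round((power_db - floor_db) / step_db)
    index = min(levels - 1, max(0, index))
    quantized_power = 10.0 ** ((floor_db + index * step_db) / 10.0)
    return index, quantized_power


def normalize(vector, quantized_power):
    """Scale a vector to unit RMS using the quantized (not the exact) power."""
    scale = 1.0 / math.sqrt(quantized_power)
    return [x * scale for x in vector]


frame = [10.0, -10.0, 10.0, -10.0]          # mean power 100, i.e. 20 dB
index, q_power = quantize_power(frame)      # index 10, q_power 100.0
normalized = normalize(frame, q_power)      # [1.0, -1.0, 1.0, -1.0]
```

Normalizing with the quantized rather than the exact power keeps the encoder consistent with what a decoder could reconstruct from the transmitted index.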

[0008] Next, the distortion power calculation unit 12 calculates the power of the distortion data, selects a pitch period vector from the adaptive codebook 5 and a noise waveform vector from the noise codebook 7 so that this power is minimized, and sets the gains of the gain units 6 and 8. The code output unit 13 then converts the information (codes) selected for the prediction coefficients, the power of the input speech vector, the pitch period vector, and the noise waveform vector, together with the gains and other parameters, into bit-sequence codes, and these codes are transmitted.
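The selection step can be illustrated with an analysis-by-synthesis search. The sketch below assumes a common simplification that the text does not state: the adaptive codebook is searched first with its optimal gain, and the noise codebook is then searched against the remaining error, each candidate being passed through a zero-state synthesis filter. For brevity the perceptual weighting is omitted, and all names are hypothetical. For a filtered candidate y and target t, the optimal gain is g = ⟨t, y⟩ / ⟨y, y⟩ and the resulting distortion power is ⟨t, t⟩ − g⟨t, y⟩.

```python
def zero_state_synthesis(alpha, x):
    """Filter x through 1/A(z) starting from zero memory (alpha[0] unused)."""
    order = len(alpha) - 1
    out = []
    for n, e in enumerate(x):
        out.append(e + sum(alpha[i] * out[n - i]
                           for i in range(1, order + 1) if n - i >= 0))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_codebook(alpha, target, codebook):
    """Return (index, gain, distortion_power) of the best codebook entry."""
    best = (-1, 0.0, float("inf"))
    for index, vec in enumerate(codebook):
        y = zero_state_synthesis(alpha, vec)
        energy = dot(y, y)
        if energy <= 0.0:
            continue                      # an all-zero entry cannot help
        gain = dot(target, y) / energy    # least-squares optimal gain
        distortion = dot(target, target) - gain * dot(target, y)
        if distortion < best[2]:
            best = (index, gain, distortion)
    if best[0] < 0:
        raise ValueError("codebook has no non-zero entry")
    return best


def two_stage_search(alpha, target, adaptive_cb, noise_cb):
    """Adaptive codebook first, then noise codebook on the residual error."""
    ia, ga, _ = search_codebook(alpha, target, adaptive_cb)
    ya = zero_state_synthesis(alpha, adaptive_cb[ia])
    residual = [t - ga * y for t, y in zip(target, ya)]
    inz, gn, distortion = search_codebook(alpha, residual, noise_cb)
    return ia, ga, inz, gn, distortion
```

The sequential search is cheaper than a joint search over all vector pairs and gains, at the cost of not always finding the joint minimum.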

[0009] When the distortion power calculation unit 12 evaluates the distortion data, that is, the difference between the synthesized speech vector and the input speech vector, using only the criterion that the distortion be minimized (in other words, that the SN ratio be maximized), the quantization noise ends up uniformly distributed along the frequency axis. Speech signals, however, have much of their power at low frequencies, and their power decreases as frequency increases. If the quantization noise is uniformly distributed along the frequency axis, its level is therefore relatively higher than the speech level in the high-frequency range, which degrades the coded speech.

[0010] Conventionally, therefore, as shown in FIG. 2, the distortion power calculation unit 12 uses a perceptual weighting filter 14 to weight the distortion data according to the spectrum of the input speech vector, and the power calculation unit 15 then evaluates the weighted distortion data. In the low-frequency range, where speech power is high, the quantization noise is masked by the speech and remains inaudible even if its level is somewhat higher than with a uniform distribution. Conversely, in the high-frequency range, the weighting is performed so that the noise falls below the uniform distribution. In FIG. 2, e is the distortion data output from the subtractor 11, and e′ is the weighted distortion data.

[0011] The transfer function W(z) of the perceptual weighting filter 14 is given by equation (1):

[Equation 1]

where

[Equation 2]

[Equation 3]

In equations (1) to (3), the coefficients α_i are the unquantized LPC coefficients obtained by the prediction coefficient determination unit 2. The coefficients γ₁ and γ₂ take values satisfying 0 < γ₂ < γ₁ < 1. Because γ₁ and γ₂ govern the characteristics of the perceptual weighting filter 14, their values are determined empirically through listening tests.
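Equations (1) to (3) are not reproduced in this text. A form widely used in CELP coders, assumed here rather than taken from the patent, is W(z) = A(z/γ₁) / A(z/γ₂) with A(z) = 1 − Σ α_i z^(−i), so that A(z/γ) simply scales each α_i by γ^i. The sketch below applies such a filter to distortion data e to obtain e′ and its power, with defaults γ₁ = 0.9 and γ₂ = 0.6 (the values reported below for IRS speech). Function names are hypothetical and the filter state is reset on every call.

```python
def bandwidth_expand(alpha, gamma):
    """Coefficients of A(z/gamma): alpha_i becomes alpha_i * gamma**i (alpha[0] unused)."""
    return [0.0] + [a * gamma ** i for i, a in enumerate(alpha[1:], start=1)]


def perceptual_weighting(alpha, e, gamma1=0.9, gamma2=0.6):
    """Assumed W(z) = A(z/gamma1) / A(z/gamma2), applied with zero initial state."""
    num = bandwidth_expand(alpha, gamma1)
    den = bandwidth_expand(alpha, gamma2)
    order = len(alpha) - 1
    weighted = []
    for n, x in enumerate(e):
        # FIR numerator A(z/gamma1): v[n] = e[n] - sum_i num[i] * e[n-i]
        v = x - sum(num[i] * e[n - i] for i in range(1, order + 1) if n - i >= 0)
        # All-pole denominator 1/A(z/gamma2): e'[n] = v[n] + sum_i den[i] * e'[n-i]
        weighted.append(v + sum(den[i] * weighted[n - i]
                                for i in range(1, order + 1) if n - i >= 0))
    return weighted


def weighted_distortion_power(alpha, e, gamma1=0.9, gamma2=0.6):
    """Power of e', the quantity minimized in place of the plain distortion power."""
    return sum(x * x for x in perceptual_weighting(alpha, e, gamma1, gamma2))
```

Lowering γ₂ toward 0.4 flattens the denominator's spectral peaks, which changes how much noise is allowed under the speech formants; this is the knob the embodiments below adjust.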

[0012]

[Problems to be Solved by the Invention] Telephone speech has conventionally had the IRS characteristic standardized by the International Telegraph and Telephone Consultative Committee (CCITT). Recently, however, with the spread of small electret microphones, speech with frequency characteristics different from the IRS characteristic (hereinafter, the NON-IRS characteristic) has also come into use. When speech with such different frequency characteristics is encoded, the optimal values of the coefficients γ₁ and γ₂ of the perceptual weighting filter 14 described above naturally differ as well.

[0013] In the conventional predictive speech coding method described above, however, the values of γ₁ and γ₂ of the perceptual weighting filter 14 were fixed at constant values determined empirically through listening tests, regardless of the frequency characteristic of the input speech. Consequently, when speech whose frequency characteristic does not match those values is encoded and transmitted, good-quality decoded speech cannot be obtained at the receiving end. The present invention was made against this background, and its object is to provide a predictive speech coding method that yields good-quality decoded speech at the receiving end even when speech with different frequency characteristics is encoded and transmitted.

[0014]

[Means for Solving the Problems] The invention according to claim 1 is a predictive speech coding method in which input speech is subjected to linear prediction analysis to calculate prediction coefficients, the prediction coefficients are set in a synthesis filter, and speech is encoded by driving the synthesis filter, in units of frames each consisting of a plurality of samples of the input speech, with a pitch period vector and a noise waveform vector selected respectively from an adaptive codebook storing a plurality of pitch period vectors and a noise codebook storing a plurality of noise waveform vectors, so as to synthesize synthesized speech. The method is characterized in that, when the distortion between the synthesized speech and the input speech is perceptually weighted in order to select the pitch period vector and the noise waveform vector from the adaptive codebook and the noise codebook so that the distortion is minimized, the degree of the weighting is adaptively controlled based on the frequency characteristic of the input speech. The invention according to claim 2 is characterized in that, in the invention according to claim 1, the degree of weighting is adaptively controlled using PARCOR (partial correlation) coefficients.

[0015]

[Operation] According to the present invention, the degree of perceptual weighting applied to the distortion between the synthesized speech and the input speech is adaptively controlled based on the frequency characteristic of the input speech, so good-quality decoded speech is obtained at the receiving end even when speech with different frequency characteristics is encoded and transmitted.

[0016]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. In the present invention, the configuration of the speech coding apparatus is almost the same as in FIGS. 1 and 2, except that the value of the coefficient γ₂ of the perceptual weighting filter 14 in the distortion power calculation unit 12 is adaptively controlled, as described in the first to third embodiments below.

[0017] (1) First Embodiment (Intra-frame Processing)

Listening tests conducted to determine γ₁ and γ₂ for the perceptual weighting filter 14 showed that γ₁ = 0.9 and γ₂ = 0.6 are preferable for speech with the IRS characteristic, whereas γ₁ = 0.9 and γ₂ = 0.4 are preferable for speech with the NON-IRS characteristic. To be optimal for both characteristics, the value of γ₂ may therefore be controlled according to the input speech: γ₂ = 0.6 when the input speech has the IRS characteristic, and γ₂ = 0.4 when it has the NON-IRS characteristic.

[0018] Analysis of speech with the IRS characteristic and speech with the NON-IRS characteristic further revealed a large difference between the two in the probability distribution of the first-order PARCOR coefficient k₁. The PARCOR coefficient k₁ always lies in the range −1 < k₁ < 1; values of k₁ obtained from speech with the NON-IRS characteristic tend to concentrate near +1, whereas values obtained from speech with the IRS characteristic show no such tendency.

[0019] FIG. 3 shows, in the region near +1, the probability distributions of k₁ obtained by processing actual speech data: curve a for speech with the IRS characteristic and curve b for speech with the NON-IRS characteristic. As FIG. 3 shows, speech with the NON-IRS characteristic frequently yields k₁ > 0.9, whereas for speech with the IRS characteristic the occurrence of k₁ > 0.9 decreases.

[0020] This difference in k₁ is therefore used to control the value of γ₂ of the perceptual weighting filter 14 adaptively according to the input speech. Specifically, when the PARCOR coefficient k₁ obtained by analyzing the input speech is greater than or equal to a threshold Th (for example, Th = 0.9), γ₂ is set to 0.4; when k₁ is smaller than Th, γ₂ is set to 0.6. The PARCOR coefficient k₁ can be obtained during the linear prediction analysis in the prediction coefficient determination unit 2. Because speech is coded frame by frame, as described for the prior art, the adaptive control of γ₂ in this embodiment is also performed frame by frame.
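The per-frame rule of this embodiment is small enough to state directly in code. The sketch assumes the sign convention under which k₁ = r(1)/r(0), the normalized lag-1 autocorrelation of the frame, which is positive and close to +1 for spectra concentrated at low frequencies; in a real coder k₁ would come out of the existing linear prediction analysis rather than a separate computation. Function names are hypothetical.

```python
def first_parcor(frame):
    """k1 = r(1) / r(0) for one frame (no window applied)."""
    r0 = sum(x * x for x in frame)
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))
    if r0 == 0.0:
        return 0.0                     # silent frame: treat as no tilt (assumption)
    return r1 / r0


def gamma2_intra_frame(k1, threshold=0.9, gamma2_non_irs=0.4, gamma2_irs=0.6):
    """First embodiment: gamma2 = 0.4 if k1 >= Th, otherwise 0.6."""
    return gamma2_non_irs if k1 >= threshold else gamma2_irs


smooth = [1.0] * 20                              # low-pass frame: k1 = 19/20 = 0.95
alternating = [(-1.0) ** n for n in range(20)]   # high-frequency frame: k1 = -0.95
print(gamma2_intra_frame(first_parcor(smooth)))       # 0.4
print(gamma2_intra_frame(first_parcor(alternating)))  # 0.6
```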

[0021] (2) Second Embodiment (Inter-frame Processing)

The prediction coefficient determination unit 2 shown in FIG. 1 performs linear prediction analysis on the input speech data frame by frame to calculate the prediction coefficients of the synthesis filter 3. In consonant portions and unvoiced sections of the input speech data, however, this frame-by-frame linear prediction analysis is not always effective. Therefore, even for speech with the NON-IRS characteristic, the value of k₁ does not necessarily concentrate near +1 in every frame. Moreover, if the value of γ₂ of the perceptual weighting filter 14 changes greatly from frame to frame, the continuity of the decoded speech is lost, which is undesirable.

[0022] In this embodiment, therefore, the value of γ₂ is expressed, as in equation (4), as a sum of the coefficient γ₂ for the input speech data of the frame currently being processed (the current frame) and the coefficients γ₂ for the input speech data of each of the M previously processed frames.

[Equation 4]

In equation (4), n is the frame number (the current frame is frame n), γ₂(n) is the coefficient γ₂ determined from the PARCOR coefficient k₁ obtained by analyzing the input speech data of frame n, M is the order (for example, M = 3), and W_i are weighting coefficients. Expressing γ₂ as a sum in this way avoids abrupt frame-to-frame changes in γ₂, so the continuity of the decoded speech is not lost.
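Because equation (4) itself is not reproduced here, the sketch below assumes the natural reading of the definitions above, a weighted sum γ₂ = Σ_{i=0}^{M} W_i γ₂(n − i), with M = 3 and equal weights W_i = 1/(M + 1) chosen for illustration so that the result stays between 0.4 and 0.6. The per-frame γ₂(n) comes from the threshold rule of the first embodiment. The weights, the start-up value of 0.6 for frames before the first, and the class name are all hypothetical.

```python
from collections import deque


class InterFrameGamma2:
    """Second embodiment, assumed form: weighted sum of current and M past gamma2(n)."""

    def __init__(self, weights=(0.25, 0.25, 0.25, 0.25), threshold=0.9, initial=0.6):
        self.weights = weights                  # W_0 .. W_M (hypothetical values)
        self.threshold = threshold              # Th from the first embodiment
        # history[0] is gamma2(n), history[i] is gamma2(n - i)
        self.history = deque([initial] * len(weights), maxlen=len(weights))

    def update(self, k1):
        """Feed the k1 of the new frame and return the smoothed gamma2 for it."""
        per_frame = 0.4 if k1 >= self.threshold else 0.6
        self.history.appendleft(per_frame)      # oldest value drops off the right
        return sum(w * g for w, g in zip(self.weights, self.history))


smoother = InterFrameGamma2()
print([round(smoother.update(k1), 3) for k1 in (0.95, 0.95, 0.95, 0.95)])
# [0.55, 0.5, 0.45, 0.4]
```

With these weights a single frame whose k₁ dips below Th moves γ₂ by only W₀ × 0.2 = 0.05 instead of jumping by 0.2.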

[0023] (3) Third Embodiment

In actual use, it is not known in advance whether the input speech data supplied to the speech coding apparatus has the IRS characteristic or the NON-IRS characteristic. However, the frequency characteristic of the speech is determined, for example, by the telephone used, so once the telephone model is fixed, it is determined whether the input speech data has the IRS or the NON-IRS characteristic, and it is unlikely that the characteristic will switch thereafter.

[0024] In this embodiment, therefore, the value of γ₂ of the perceptual weighting filter 14 is adaptively controlled as expressed in equation (5), based on the probability distribution of the PARCOR coefficient k₁ shown in FIG. 3.

[Equation 5]

In equation (5), γ₂min = 0.4, γ₂max = 0.6, α₁ = 0.05, α₂ = 0.01, β = 0.001, a₁ = 0.97, a₂ = 0.95, a₃ = 0.90, and the initial value is γ₂(0) = 0.6; n is the frame number.

[0025] When the input speech data has the IRS characteristic, the probability that k₁ takes a value of a₁ = 0.97 or more is zero, and the probability that it takes a value of a₂ = 0.95 or more is also very low. The coefficient γ₂(n) therefore hardly changes; that is, for input speech data with the IRS characteristic, γ₂(n) is kept at 0.6. For input speech data with the NON-IRS characteristic, on the other hand, k₁ takes values of a₂ = 0.95 or more with very high probability, and with α₁ = 0.05, γ₂(n) reaches 0.4 after four frames. Even when k₁ is 0.9 or less in some speech sections, γ₂ does not change abruptly, because each increase is only β = 0.001. Furthermore, if frames in which k₁ exceeds a₂ = 0.95 occur, γ₂(n) converges to 0.4.
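Equation (5) is likewise not reproduced. The sketch below is one reading that is consistent with the constants and the behaviour described above, not the patent's actual equation: γ₂ decreases by α₁ when k₁ ≥ a₁, by α₂ when a₂ ≤ k₁ < a₁, increases by β when k₁ < a₃, stays unchanged otherwise, and is clamped to [γ₂min, γ₂max]. Under this reading, four consecutive frames with k₁ ≥ 0.97 take γ₂ from 0.6 to 0.4, matching the four-frame figure in the text. The function and parameter names are hypothetical.

```python
def update_gamma2(gamma2_prev, k1,
                  gamma2_min=0.4, gamma2_max=0.6,
                  alpha1=0.05, alpha2=0.01, beta=0.001,
                  a1=0.97, a2=0.95, a3=0.90):
    """One frame of an assumed form of equation (5): gamma2(n) from gamma2(n-1) and k1."""
    if k1 >= a1:
        gamma2 = gamma2_prev - alpha1      # strong NON-IRS evidence: large step down
    elif k1 >= a2:
        gamma2 = gamma2_prev - alpha2      # weaker evidence: small step down
    elif k1 < a3:
        gamma2 = gamma2_prev + beta        # drift slowly back toward the IRS value
    else:
        gamma2 = gamma2_prev               # a3 <= k1 < a2: hold
    return min(gamma2_max, max(gamma2_min, gamma2))


def track_gamma2(k1_per_frame, gamma2_initial=0.6):
    """Run the update over a sequence of frames, starting from gamma2(0) = 0.6."""
    gamma2, trace = gamma2_initial, []
    for k1 in k1_per_frame:
        gamma2 = update_gamma2(gamma2, k1)
        trace.append(gamma2)
    return trace


non_irs = track_gamma2([0.98] * 4 + [0.5])   # falls to 0.4, then rises by only 0.001
irs = track_gamma2([0.80, 0.93, 0.60])       # stays at 0.6
```

The asymmetry between the fast decrease (α₁) and the slow increase (β) reflects the observation in [0023] that a telephone's characteristic, once established, is unlikely to reverse.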
Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments, and design changes and the like that do not depart from the gist of the present invention are also included in the present invention.

[0026]

[Effects of the Invention] As described above, according to the present invention, the coefficients of the perceptual weighting filter can be adaptively controlled according to the frequency characteristic of the input speech, so good-quality decoded speech can be obtained at the receiving end even when speech with different frequency characteristics is encoded and transmitted.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a speech coding apparatus to which the predictive speech coding methods according to the first to third embodiments of the present invention and the conventional example are applied.

FIG. 2 is a block diagram showing an example configuration of the distortion power calculation unit 12.

FIG. 3 is a diagram showing an example of the probability distribution of the PARCOR coefficient k₁ for input speech with different frequency characteristics.

[Explanation of symbols]

1 Input terminal
2 Prediction coefficient determination unit
3 Synthesis filter
4 Prediction coefficient quantization unit
5 Adaptive codebook
6, 8 Gain units
7 Noise codebook
9 Adder
10 Power quantization unit
11 Subtractor
12 Distortion power calculation unit
13 Code output unit
14 Perceptual weighting filter
15 Power calculation unit

Claims


1. A predictive speech coding method in which input speech is subjected to linear prediction analysis to calculate prediction coefficients, the prediction coefficients are set in a synthesis filter, and speech is encoded by driving the synthesis filter, in units of frames each consisting of a plurality of samples of the input speech, with a pitch period vector and a noise waveform vector selected respectively from an adaptive codebook storing a plurality of pitch period vectors and a noise codebook storing a plurality of noise waveform vectors, so as to synthesize synthesized speech, characterized in that, when the distortion between the synthesized speech and the input speech is perceptually weighted in order to select the pitch period vector and the noise waveform vector from the adaptive codebook and the noise codebook so that the distortion is minimized, the degree of the weighting is adaptively controlled based on the frequency characteristic of the input speech.

2. The predictive speech coding method according to claim 1, wherein the degree of weighting is adaptively controlled using a PARCOR coefficient.