JPH02160300A

JPH02160300A - Voice encoding system

Info

Publication number: JPH02160300A
Application number: JP63316040A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-12-13
Filing date: 1988-12-13
Publication date: 1990-06-20

Abstract

PURPOSE: To obtain a speech encoding and decoding device with good speech quality and a relatively small amount of computation by weighting the spectral parameters when the impulse response of the synthesis filter is strongly periodic. CONSTITUTION: A speech signal is classified according to its phonetic features, and a sound source signal matched to each class is used. In vowel-like sections, the sound source signal is represented by multipulses in one representative pitch section, among the pitch sections obtained by dividing the frame, and by at least one of an amplitude correction coefficient and a phase correction coefficient in the other pitch sections; in fricative sections it is represented by a combination of a small number of multipulses and a noise signal. In addition, the strength of the periodicity of the impulse response of the synthesis filter is evaluated, and when the periodicity is strong the spectral parameters are weighted so as to widen the bandwidth of the synthesis filter. This prevents the bandwidth of the synthesis filter from being underestimated even for female voices, particularly for the vowels 'i' and 'u'. Consequently, high-quality synthesized speech is obtained.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a speech encoding system, and in particular to a speech encoding system for encoding speech signals with high quality at a low bit rate of about 4.8 kb/s using a relatively small amount of computation.

[Conventional technology]

A known method for encoding speech signals at a low bit rate of about 4.8 kb/s is the pitch interpolation multipulse method described, for example, in JP-A-61-150000 (Document 1) and JP-A-62-038500 (Document 2). In these methods, the transmitting side extracts, from the speech signal of each frame, spectral parameters representing the spectral characteristics of the speech signal and a pitch parameter representing the pitch, and classifies the speech signal into two types, voiced and unvoiced. In a voiced section, the sound source signal of one frame is represented by multipulses for a single pitch section (the representative section) among the pitch sections obtained by dividing the frame at each pitch period, and the amplitudes and positions of the multipulses in the representative section are transmitted together with the spectral and pitch parameters. In an unvoiced section, the sound source of one frame is represented by a small number of multipulses and a noise signal, and the amplitudes and positions of the multipulses and the gain and index of the noise signal are transmitted.

On the receiving side, in a voiced section, the amplitudes and positions of the multipulses are interpolated between the multipulses of the representative section of the current frame and those of the representative section of the adjacent frame. This restores the multipulses of the pitch sections other than the representative section, and thus the driving sound source signal of the frame. In an unvoiced section, the driving sound source signal of the frame is restored using the multipulses and the index and gain of the noise signal. The restored driving sound source signal is then input to a synthesis filter that uses the spectral parameters, and the synthesized speech signal is output.

[Problem to be solved by the invention]

However, in the conventional methods described above, the driving sound source signal of a frame in a voiced section is restored by interpolating between the multipulses of representative sections. In frames where the characteristics of the sound source signal are changing, such as vowel-to-vowel transitions in vowel sequences or voiced transients, the driving sound source signal restored by interpolation therefore differs greatly from the actual sound source signal, and the quality of the synthesized speech is degraded as a result. Moreover, in nasal portions of voiced sections no clear periodicity appears in the sound source signal, so the pitch interpolation method cannot represent the sound source signal adequately. Perceptual experiments have shown that such portions, where the characteristics of speech change greatly, are very important for the perception of phonemes and of naturalness, yet the conventional methods cannot restore the information in these portions sufficiently, which is a serious cause of quality degradation. Furthermore, in unvoiced sections the sound source signal is represented by multipulses and noise; among consonants, the sound source of a fricative is noise-like, whereas a plosive contains many pulse-like portions. Good synthesized speech therefore cannot be obtained by simply classifying the speech signal into the two types voiced and unvoiced, as the conventional methods do.

Linear predictive (LPC) analysis is well known as a method for analyzing the spectral parameters that represent the spectral envelope of speech. However, for female voices, particularly the vowels 'i' and 'u', LPC analysis is affected by the fundamental and harmonic components of the pitch frequency. As a result, the synthesis filter built from the spectral parameters obtained by LPC analysis has a bandwidth that is extremely narrow compared with the actual spectral envelope of the speech, especially at the frequency corresponding to the first formant. If such spectral parameters are used when the sound source signal is computed, pitch periodicity does not appear in the sound source signal. When the sound source signal is then represented by multipulses using the pitch interpolation described above, which assumes a periodic sound source, the quality of the synthesized speech is greatly degraded.

An object of the present invention is to solve the problems described above and to provide a speech encoding and decoding device that achieves good speech quality at about 4.8 kb/s with a relatively small amount of computation.

[Means to solve the problem]

The speech encoding system according to the present invention obtains, for each frame, spectral parameters representing the spectral envelope and a pitch parameter representing the pitch from an input discrete speech signal; applies weighting to the spectral parameters when the impulse response of a filter constructed from the spectral parameters is strongly periodic; extracts discrimination parameters representing the features of the speech signal and classifies the speech signal into a plurality of types; and, according to the type, divides the frame into subsections according to the pitch parameter and obtains and outputs, as the sound source signal of the speech signal of each frame, at least one of (a) multipulses obtained in one of the subsections together with correction information for correcting at least one of the amplitude and the phase of those multipulses, and (b) a codebook and multipulses.

[Operation]

The first feature of the speech encoding system according to the present invention is that the speech signal of each frame is classified into predetermined types. The following describes, as an example, classification into four types: vowel-like, nasal, fricative, and plosive. The types need to be chosen appropriately so that the sound source signal can be represented well according to differences in how the sound source is produced. Configurations with other than four types are also possible.

Speech is classified by extracting discrimination parameters that represent the features of the speech signal and assigning the signal to one of the four types above, as shown in FIG. 2. Parameters that can be used include, for example, the signal power or its RMS (root mean square) value, the change or rate of change of the power over short intervals (for example, 5 ms), the change or rate of change of the spectrum over short intervals, and the pitch gain.
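Two of the discrimination parameters listed above, the RMS value and the short-time change of power, can be sketched in Python as follows. This is only an illustrative sketch: the function names, the 8 kHz sampling rate, the non-overlapping 5 ms blocks, and the use of a ratio between adjacent blocks as the "change" measure are assumptions, not details given in the source.

```python
import math

def rms(samples):
    """Root mean square value of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def short_time_powers(frame, fs=8000, win_ms=5):
    """Mean power of consecutive, non-overlapping win_ms blocks (assumed layout)."""
    n = int(fs * win_ms / 1000)
    return [sum(s * s for s in frame[i:i + n]) / n
            for i in range(0, len(frame) - n + 1, n)]

def max_power_change_ratio(powers, eps=1e-12):
    """Largest power ratio between adjacent blocks; close to 1 for a steady signal."""
    ratios = [max(p, q) / (min(p, q) + eps) for p, q in zip(powers, powers[1:])]
    return max(ratios) if ratios else 1.0
```

A classifier would compare such values with thresholds chosen by the designer; the source does not state any threshold values.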

The second feature concerns the spectral parameters representing the spectral envelope of the speech signal, which are extracted for each frame by LPC analysis. When the impulse response of the synthesis filter constructed from the spectral parameters is strongly periodic, the bandwidth of the filter is judged to be underestimated in the band corresponding to the first formant, and appropriate weighting is applied to the spectral parameters. The transfer characteristic H(z) of the synthesis filter constructed from the spectral parameters can be written as follows.

$$H(z) = \frac{G}{1 - \sum_{i=1}^{P} a_i z^{-i}} \qquad (1)$$

Here a_i are the spectral parameters and P is the order of the filter. The impulse response h(n) of this synthesis filter is obtained from the following equation.

$$h(n) = \sum_{i=1}^{P} a_i\, h(n-i) + G\,\delta(n) \quad (n \ge 0) \qquad (2)$$

Here G is the amplitude of the excitation source. If the pitch gain obtained from h(n) is larger than a predetermined threshold, the impulse response is judged to be strongly periodic. The pitch gain can be obtained by computing the autocorrelation function of h(n) up to a predetermined time lag and taking the value of the autocorrelation coefficient at the lag where it is maximum. When the impulse response is strongly periodic, the spectral parameters are weighted as in the following equation.
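As a rough illustration of this periodicity test, the following Python sketch computes h(n) with the recursion of equation (2) and takes the maximum normalized autocorrelation over a lag range as the pitch gain. The function names, the response length, and the lag range are assumptions made for illustration; the source gives neither a lag range nor a threshold value.

```python
def impulse_response(a, G=1.0, length=400):
    """h(n) = sum_i a_i * h(n - i) + G * delta(n) for n >= 0, as in equation (2)."""
    h = []
    for n in range(length):
        acc = G if n == 0 else 0.0
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * h[n - i]
        h.append(acc)
    return h

def pitch_gain(h, min_lag, max_lag):
    """Return (gain, lag): the largest normalized autocorrelation of h and its lag."""
    r0 = sum(v * v for v in h)
    best_gain, best_lag = float("-inf"), min_lag
    for lag in range(min_lag, max_lag + 1):
        r = sum(h[n] * h[n - lag] for n in range(lag, len(h)))
        if r / r0 > best_gain:
            best_gain, best_lag = r / r0, lag
    return best_gain, best_lag

def is_strongly_periodic(a, threshold, min_lag=20, max_lag=147):
    """Compare the pitch gain with a designer-chosen threshold (hypothetical values)."""
    gain, _ = pitch_gain(impulse_response(a), min_lag, max_lag)
    return gain > threshold
```

A narrow-bandwidth resonance at the pitch frequency makes h(n) ring for many samples, which is what drives the normalized autocorrelation toward 1.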

$$a_i \leftarrow a_i\, r^{i} \quad (1 \le i \le P) \qquad (3)$$

Here r takes a positive value smaller than 1. Depending on the value of r, the bandwidth of the synthesis filter widens by the amount B (Hz) given by the following equation.

$$B = -\frac{F_s}{\pi}\,\ln(r) \quad \text{(Hz)} \qquad (4)$$

Here F_s is the sampling frequency. As an example, choosing r = 0.98 and F_s = 8 kHz gives a value of B of about 50 Hz.
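The weighting of equation (3) and the bandwidth figure quoted above can be checked with a few lines of Python. The function names are hypothetical; the formulas are equations (3) and (4) as written in this section.

```python
import math

def expand_bandwidth(a, r):
    """Equation (3): replace a_i by a_i * r**i for i = 1..P."""
    return [ai * r ** i for i, ai in enumerate(a, start=1)]

def bandwidth_increase_hz(r, fs):
    """Equation (4): B = -(Fs / pi) * ln(r), in Hz."""
    return -fs / math.pi * math.log(r)

# With r = 0.98 and Fs = 8000 Hz this evaluates to roughly 51 Hz,
# in line with the "about 50 Hz" stated in the text.
B = bandwidth_increase_hz(0.98, 8000)
```

Multiplying a_i by r^i moves every pole of the filter toward the origin by the factor r, which is why all resonances are widened by the same amount.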

In the multipulse computation described below, the spectral parameters weighted by equation (3) are used when the impulse response of the synthesis filter is judged to be strongly periodic. When the periodicity is not strong, the weighting of equation (3) is not applied. This is the second feature of the present invention.

Next, the method of obtaining the sound source signal is described.

Whether a frame is a vowel section is determined using the signal power or its RMS, the pitch gain, and so on. In a vowel-like section, as shown in FIG. 3, the frame is divided into a plurality of pitch sections, one per pitch period obtained in advance, and multipulses are obtained for one of these pitch sections (the representative section). For the other pitch sections in the same frame, an amplitude correction coefficient c_k and a phase correction coefficient d_k relative to those multipulses are obtained.

For each frame, the sound source information transmitted consists of the position of the representative section within the frame, the amplitudes and positions of the multipulses of the representative section, and, as correction information, the amplitude correction coefficients c_k and phase correction coefficients d_k of the other pitch sections of the same frame. The representative section may be found by searching for the section that gives the best synthesized speech signal, or it may be fixed within the frame. The former gives better speech quality but requires more computation.

The following describes how to obtain the amplitude correction coefficient c_k and the phase correction coefficient d_k, and how to search for the representative section. Let T be the average pitch period obtained for the frame. FIG. 3(b) shows the frame divided into subframe sections of length T. The case where the representative section is searched for is described here. Take, for example, subframe 2 as a candidate for the representative section. The amplitudes and positions of a predetermined number L of multipulses are obtained for this subframe. A known method obtains the multipulses using the cross-correlation function Φ_xh and the autocorrelation function R_hh. It is described, for example, in Documents 1 and 2 above and in Araseki, Ozawa, Ono, and Ochiai, "Multi-pulse Excited Speech Coder Based on Maximum Cross-correlation Search Algorithm," GLOBECOM 83, IEEE Global Telecommunications Conference, paper 23.3, 1983 (Document 3), so the explanation is omitted here.

Let the amplitudes and positions of the multipulses in the representative section be g_i and m_i (i = 1 to L), respectively. These are shown in FIG. 3(c). The amplitude correction coefficient c_k and the phase correction coefficient d_k for a section k other than the representative section can be determined so as to minimize the weighted error power E_k between the speech x'_k(n) synthesized for section k using these pulses and the synthesis filter, and the speech x_k(n) of that section. The weighted error power E_k is

$$E_k = \sum_n \left\{ \left[ x_k(n) - x'_k(n) \right] * w(n) \right\}^2 \qquad (5)$$

where

$$x'_k(n) = c_k \sum_{i=1}^{L} g_i\, h(n - m_i - T - d_k) \qquad (6)$$

Here w(n) is the impulse response of the perceptual weighting filter and * denotes convolution; this filter may be omitted. h(n) is the impulse response of the synthesis filter used to synthesize speech. c_k and d_k are determined so as to minimize equation (5). To do this, for example, d_k is first fixed, equation (5) is partially differentiated with respect to c_k, and the result is set to 0, which gives the following equation.

$$c_k = \frac{\sum_n x_{wk}(n)\, x'_{wk}(n)}{\sum_n x'_{wk}(n)\, x'_{wk}(n)} \qquad (7)$$

where

$$x_{wk}(n) = x_k(n) * w(n) \qquad (8a)$$

$$x'_{wk}(n) = \sum_{i=1}^{L} g_i\, h(n - m_i - T - d_k) * w(n) \qquad (8b)$$

Accordingly, the value of equation (7) is computed for various values of d_k, and E_k of equation (5) is minimized by selecting the combination of d_k and c_k that gives the smallest E_k. In this way c_k and d_k are obtained for each pitch section other than the representative section, and the weighted error power E defined by the following equation is obtained for the whole frame.
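The search over d_k with the closed-form c_k of equation (7) can be sketched as follows. This is a simplified sketch under stated assumptions: the perceptual weighting filter is omitted (which the text allows), indices are absolute positions within the frame, and the function names and the candidate range for d_k are hypothetical.

```python
def synthesize(pulses, h, shift, start, end):
    """y(n) = sum_i g_i * h(n - m_i - shift) for start <= n < end."""
    out = []
    for n in range(start, end):
        acc = 0.0
        for g, m in pulses:
            j = n - m - shift
            if 0 <= j < len(h):
                acc += g * h[j]
        out.append(acc)
    return out

def best_correction(x, pulses, h, T, start, end, d_candidates):
    """Try each d_k, compute c_k by equation (7), keep the pair with the smallest E_k.

    x holds the target samples of section k (indices start..end-1).
    Returns (E_k, c_k, d_k), or None if no candidate produced any output energy.
    """
    best = None
    for d in d_candidates:
        y = synthesize(pulses, h, T + d, start, end)
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue
        c = sum(p * q for p, q in zip(x, y)) / yy            # equation (7)
        err = sum((p - c * q) ** 2 for p, q in zip(x, y))   # equation (5), w(n) omitted
        if best is None or err < best[0]:
            best = (err, c, d)
    return best
```

Because c_k has a closed form once d_k is fixed, only d_k needs an exhaustive search, which keeps the cost proportional to the number of candidate phase shifts.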

$$E = \sum_{k=1}^{N} E_k \qquad (9)$$

Here N is the number of subframes contained in the frame. For the representative pitch section (subframe section 2 in the example of FIG. 3), however, the weighted error power E_2 is obtained from the following equation.

$$E_2 = \sum_n \left\{ \left[ x(n) - \sum_{i=1}^{L} g_i\, h(n - m_i) \right] * w(n) \right\}^2 \qquad (10)$$

To search for the representative pitch section, the values of equations (5) to (10) are computed for every candidate representative pitch section, and the section that gives the smallest error power in equation (9) is taken as the representative pitch section. FIG. 3(c) shows an example, for the representative pitch section found by the search, of the multipulses of the representative section and of the sound source v_k(n) of each k-th section other than the representative section, generated from the amplitude and phase correction coefficients according to the following equation.

$$v_k(n) = c_k \sum_{i=1}^{L} g_i\, \delta(n - m_i - T - d_k) \qquad (11)$$

Next, in nasal sections the pitch periodicity of the sound source is expected to be weaker than in vowel sections, so the sound source is represented not by the method described above but by pitch prediction multipulses or ordinary multipulses. For the method of obtaining pitch prediction multipulses, see JP-A-60-051900 (Document 4); for the method of obtaining multipulses, see Document 3 above. Nasal sections can be identified using, for example, the power or its RMS, the pitch gain, and the first-order log area ratio r_1 defined by the following equation. In nasal sections in particular, r_1 tends to be large.

Here K_1 is the first-order K parameter (also called the PARCOR coefficient).

In consonant sections, on the other hand, the sound source is represented by multipulses or by a combination of multipulses and noise.

In a consonant section it is determined whether the section is fricative or plosive. In the fricative case, the sound source is represented by multipulses together with noise or a codebook; for the specific method, see Document 2 above and similar references. In the plosive case, the sound source is represented by multipulses. Fricative and plosive sections can be distinguished using parameters such as the change or rate of change of the power, or of its RMS, over short intervals (for example, 5 ms).

[Example]

Next, an embodiment of the present invention is described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the speech encoding system according to the present invention. In the figure, a speech signal is input from input terminal 100, and one frame (for example, 20 ms) of the speech signal is stored in buffer memory 110. Pitch analysis circuit 130 calculates the average pitch period T from the speech signal of the frame.

A method based on the autocorrelation method, for example, is known for this purpose; for details, see the pitch extraction circuits of Documents 1 and 2 above. Other well-known methods (for example, the cepstrum method, the SIFT method, or the modified correlation method) can also be used. Pitch encoding circuit 150 quantizes the average pitch period T with a predetermined number of bits and outputs the resulting code to multiplexer 260. It also decodes this code and outputs the decoded pitch period T' to sound source signal calculation circuit 220, interpolation circuit 282, and drive signal restoration circuit 283.

K parameter calculation circuit 140 computes K parameters, up to a predetermined order M, as the spectral parameters representing the spectral characteristics of the speech signal of the frame, by applying well-known LPC analysis to the speech signal of the frame. For the specific computation method, see the K parameter calculation circuits of Documents 1 and 2 above. The K parameters are identical to the PARCOR coefficients.

Periodicity determination circuit 145 first converts the K parameters into linear prediction coefficients a_i by a well-known method and computes the impulse response of the synthesis filter constructed from the linear prediction coefficients. The transfer characteristic of the synthesis filter is given by equation (1) above, and the impulse response is computed using equation (2) above. The periodicity of the impulse response is then determined.

Specifically, the pitch gain of the impulse response is computed and compared with a predetermined threshold; if it is larger than the threshold, the periodicity is judged to be strong. The pitch gain can be obtained by computing the autocorrelation function of the impulse response up to a predetermined lag and taking the value of the autocorrelation coefficient at the lag where it is maximum. For details, see the pitch extraction circuits of Documents 1 and 2 above. When the pitch gain is larger than the predetermined threshold, the linear prediction coefficients a_i are weighted with a predetermined value r according to equation (3) above, where r is a positive value smaller than 1. The weighted linear prediction coefficients are converted back into K parameters and output to K parameter encoding circuit 160. For the conversion between K parameters and linear prediction coefficients, see, for example, the book "Linear Prediction of Speech" by J. Makhoul et al. (Document 5).
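The chain performed by circuit 145 (K parameters to linear prediction coefficients, weighting by equation (3), and back to K parameters) can be sketched with the standard step-up and step-down recursions. This is a sketch under an assumed sign convention, namely a predictor of the form used in equation (2) with a_M equal to K_M at the final order; the source does not state its convention, and some references use the opposite sign. All function names are hypothetical.

```python
def k_to_lpc(k):
    """Step-up recursion: K parameters k_1..k_M to predictor coefficients a_1..a_M."""
    a = []
    for m, km in enumerate(k, start=1):
        # a_i(m) = a_i(m-1) - k_m * a_{m-i}(m-1) for i < m, and a_m(m) = k_m
        a = [a[i] - km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a

def lpc_to_k(a):
    """Step-down recursion: predictor coefficients a_1..a_M back to k_1..k_M."""
    a = list(a)
    k = [0.0] * len(a)
    for m in range(len(a), 0, -1):
        km = a[m - 1]
        k[m - 1] = km
        a = [(a[i] + km * a[m - 2 - i]) / (1.0 - km * km) for i in range(m - 1)]
    return k

def weight_k_parameters(k, r):
    """Convert to a_i, apply a_i * r**i as in equation (3), convert back to K parameters."""
    a = k_to_lpc(k)
    weighted = [ai * r ** i for i, ai in enumerate(a, start=1)]
    return lpc_to_k(weighted)
```

The step-down recursion divides by 1 - k_m^2, so it assumes every K parameter has magnitude strictly less than 1, which holds for a stable synthesis filter.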

K parameter encoding circuit 160 quantizes the K parameters with a predetermined number of quantization bits and outputs the resulting code to multiplexer 260. It also decodes this code, converts the result into linear prediction coefficients a_i' (i = 1 to M), and outputs them to weighting circuit 200 and interpolation circuit 282. For the methods of encoding the K parameters and of converting K parameters into linear prediction coefficients, see Documents 1 and 2 above and similar references.

Impulse response calculation circuit 170 uses the linear prediction coefficients a_i' to compute the impulse response h_w(n) of the perceptually weighted synthesis filter and outputs it to autocorrelation function calculation circuit 180. Autocorrelation function calculation circuit 180 computes the autocorrelation function R_hh(n) of the impulse response up to a predetermined lag and outputs it. For the operation of impulse response calculation circuit 170 and autocorrelation function calculation circuit 180, see Documents 1 and 2 above and similar references.

Subtracter 190 subtracts one frame of the output of synthesis filter 281 from the speech signal x(n) of the frame and outputs the result to weighting circuit 200. Weighting circuit 200 passes the subtraction result through a perceptual weighting filter whose impulse response is w(n) and outputs the resulting weighted signal x_w(n). For the weighting method, see Documents 1 and 2 above and similar references.

Cross-correlation function calculation circuit 210 receives the weighted signal x_w(n) and the impulse response h_w(n), computes the cross-correlation function Φ_xh up to a predetermined lag, and outputs it.

For this computation method, see Documents 1 and 2 above and similar references.
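For orientation, a greedy multipulse search driven by the cross-correlation and autocorrelation functions can be sketched as below. It follows the general idea of a maximum cross-correlation search, not necessarily the exact procedure of Documents 1 to 3: each pulse is placed where the magnitude of the cross-correlation is largest, and its contribution is then removed using the autocorrelation. Frame-edge effects are ignored, and all function names are assumptions.

```python
def cross_correlation(x, h, max_lag):
    """phi(m) = sum_n x(n) * h(n - m) for 0 <= m <= max_lag."""
    return [sum(x[n] * h[n - m] for n in range(m, min(len(x), m + len(h))))
            for m in range(max_lag + 1)]

def autocorrelation(h, max_lag):
    """R(m) = sum_n h(n) * h(n - m); zero once m reaches len(h)."""
    return [sum(h[n] * h[n - m] for n in range(m, len(h)))
            for m in range(max_lag + 1)]

def multipulse_search(x, h, num_pulses):
    """Greedy search returning a list of (amplitude, position) pairs."""
    N = len(x)
    phi = cross_correlation(x, h, N - 1)
    R = autocorrelation(h, N - 1)
    pulses = []
    for _ in range(num_pulses):
        m = max(range(N), key=lambda j: abs(phi[j]))  # position of largest correlation
        g = phi[m] / R[0]                             # amplitude of the pulse at m
        pulses.append((g, m))
        # Remove this pulse's contribution before searching for the next one.
        phi = [phi[j] - g * R[abs(j - m)] for j in range(N)]
    return pulses
```

Working on the correlation functions rather than on the waveform avoids resynthesizing the frame after every pulse, which is the source of the method's low computational cost.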

Discrimination circuit 215 determines the type of the speech signal of the frame. Here, as described in the Operation section, the signal is classified as an example into four types (vowel-like, nasal, fricative, and plosive), but the number of classes is not limited to four and other classifications can be used. As described in the Operation section, the classification can use the power of the speech signal of the frame or its RMS, the pitch gain, the short-time change of the power or of its RMS, the spectral change between frames, and so on. The type determined from these parameters is output to sound source signal calculation circuit 220 and multiplexer 260.

In sound source signal calculation circuit 220, a frame is judged to be vowel-like when the power or its RMS is at or above a predetermined threshold and the pitch gain is at or above a predetermined threshold. In this case, as explained in the Operation section, the frame is first divided into subframes (pitch sections) of one pitch period each using the decoded average pitch period T', and the positions m_i and amplitudes g_i of the multipulses are obtained, as the sound source signal, for a pitch section that is a candidate for the single representative pitch section (the representative section).

Next, amplitude/phase correction circuit 270 computes, according to equations (7) and (8) of the Operation section, the multipulse amplitude correction coefficient c_k and phase correction coefficient d_k used to generate the sound source signal in each other pitch section k, and outputs these values to sound source signal calculation circuit 220. Based on equations (5), (9), and (10) of the Operation section, sound source signal calculation circuit 220 computes the error power E of the whole frame for several candidate sections, selects as the representative section the pitch section that minimizes E, and outputs information PI indicating the subframe number of the representative section, the amplitudes g_i and positions m_i (i = 1 to L) of the multipulses of the representative section, and the amplitude correction coefficients c_k and phase correction coefficients d_k of the other pitch sections.

次に鼻音性の判別は、ピッチゲインがあらがしめ定めら
れなしきい値よりも大きく、１吹口の対数断面積比があ
らかじめ定められたしきい値よりも大きいことで判別す
る。この場合は、フレーム区間全体に対して、例えばマ
ルチパルスを求める。Next, a frame is judged to be nasal when the pitch gain is larger than a predetermined threshold and the first-order log area ratio is larger than a predetermined threshold. In this case, multipulses, for example, are obtained over the entire frame section.

一方、子音区間では、摩擦性と破裂性の判別は例えば、
短時間（例えば５ｍｓ　）毎のスペクトルの変化や短時
間（例えば５ｍｓ程度）毎のパワーあるいはそのＲＭＳ
の変化が予め定められたしきい値よりも大きければ破裂
性、そうでなければ摩擦性と判別する。摩擦性の判別に
は、低域（例えば１　ｋＨｚ以下）と高域（例えば２ｋ
Ｈｚ以上）とのパワーあるいはそのＲＭＳの比を用いる
こともできる。On the other hand, in consonant sections, fricatives and plosives are distinguished as follows, for example. If the change in spectrum over short intervals (for example, 5 ms) or the change in power or its RMS over short intervals (for example, about 5 ms) is larger than a predetermined threshold, the frame is judged to be plosive; otherwise it is judged to be fricative. The ratio of the power (or its RMS) in a low band (for example, 1 kHz and below) to that in a high band (for example, 2 kHz and above) can also be used to identify fricatives.
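The decision rules described above for the four classes can be summarized in a short sketch. All names and threshold values below are hypothetical; the text only says the thresholds are predetermined. The order of the tests (vowel-like first, then nasal, then the consonant split) and the use of separate pitch-gain thresholds for the vowel-like and nasal tests are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    rms: float                # RMS of the frame's speech signal
    pitch_gain: float         # pitch gain from pitch analysis
    lar1: float               # first-order log area ratio
    short_time_change: float  # largest change in short-time RMS or spectrum


@dataclass
class Thresholds:             # illustrative values only
    rms: float = 0.1
    vowel_pitch_gain: float = 0.7
    nasal_pitch_gain: float = 0.4
    lar1: float = 0.0
    change: float = 0.3


def classify_frame(f, th=Thresholds()):
    """Return 'vowel', 'nasal', 'plosive', or 'fricative' for one frame."""
    if f.rms >= th.rms and f.pitch_gain >= th.vowel_pitch_gain:
        return "vowel"
    if f.pitch_gain > th.nasal_pitch_gain and f.lar1 > th.lar1:
        return "nasal"
    # Consonant section: a large short-time change indicates a plosive.
    if f.short_time_change > th.change:
        return "plosive"
    return "fricative"
```

The returned label plays the role of the type code that is passed to the sound source signal calculation circuit 220 and the multiplexer 260.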

摩擦性の場合は、予め定められた個数のマルチパルスと
雑音信号あるいはコードブックとで音源信号を表す。具
体的な方法は前記文献１．２を参照することができる。In the fricative case, the sound source signal is represented by a predetermined number of multipulses together with a noise signal or a codebook. For the specific method, see References 1 and 2 cited above.

まずあらかじめ定められた個数のマルチパルスを求めた
後に、雑音メモリに複数種類格納されている雑音信号あ
るいはコードブックの種類を表すインデックスとゲイン
とを求める。これらの計算はフレームを予め定められた
区間長に分割したサブフレーム毎に行う。この場合音源
信号として伝送するのは、マルチパルスの振幅および位
置と雑音信号のインデックスおよびゲインとである。First, a predetermined number of multipulses are obtained. Then the index identifying one of the several noise signals (or codebook entries) stored in the noise memory, and its gain, are obtained. These calculations are performed for each subframe obtained by dividing the frame into sections of a predetermined length. In this case, what is transmitted as the sound source signal is the amplitudes and positions of the multipulses and the index and gain of the noise signal.
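The search for the noise signal index and gain in one subframe can be illustrated with a plain least-squares match. This is a simplified sketch under stated assumptions: the target is taken to be whatever remains after the multipulses have been accounted for, the error is an unweighted squared error, and the function names are hypothetical. The procedure in the cited references may include perceptual weighting and synthesis filtering that are omitted here.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def search_noise_codebook(target, codebook):
    """Return (index, gain) of the entry that best matches target.

    For each entry c, the optimal gain is <t, c> / <c, c>, and the
    resulting squared error is <t, t> - <t, c>^2 / <c, c>.
    """
    best_index, best_gain, best_error = None, 0.0, float("inf")
    target_energy = dot(target, target)
    for index, entry in enumerate(codebook):
        energy = dot(entry, entry)
        if energy == 0.0:
            continue  # an all-zero entry cannot match anything
        corr = dot(target, entry)
        error = target_energy - corr * corr / energy
        if error < best_error:
            best_index, best_gain, best_error = index, corr / energy, error
    return best_index, best_gain
```

Only the index and the (quantized) gain would then be transmitted, alongside the multipulse amplitudes and positions.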

また、破裂性の場合は、フレーム全体で予め定められた
個数のマルチパルスの振幅と位置とを求める。In the plosive case, the amplitudes and positions of a predetermined number of multipulses are obtained over the entire frame.

符号化回路２３０は、母音性の場合は、代表区間のマル
チパルスの振幅ｇ１および位置ｍ１を予め定められたビ
ット数で符号化して出力する。また、代表区間のサブフ
レームを示す情報Ｐ＋、振幅補正係数Ｃｋおよび位相補
正係数ｄｋを予め定められたビット数で符号化してマル
チプレクサ２６０へ出力する。さらに、これらを復号化
して駆動信号復元回路２８３へ出力する。鼻音性、破裂
性の場合はマルチパルスの振幅および位置を符号化して
マルチプレクサ２６０へ出力するとともに復号化して駆
動音源復元回路２８３へ出力する。また摩擦性の場合は
、マルチパルスの振幅および位置を符号化し雑音信号の
ゲインおよびインデックスを符号化してマルチプレクサ
２６０へ出力すると共にこれらを復号化して駆動音源復
元回路２８３へ出力する。In the vowel-like case, the encoding circuit 230 encodes the multipulse amplitudes g_i and positions m_i of the representative section with a predetermined number of bits and outputs them. It also encodes the information P indicating the subframe of the representative section, the amplitude correction coefficients c_k, and the phase correction coefficients d_k with a predetermined number of bits and outputs them to the multiplexer 260. It further decodes these and outputs them to the drive signal restoration circuit 283. In the nasal and plosive cases, it encodes the multipulse amplitudes and positions, outputs them to the multiplexer 260, and also decodes them and outputs them to the drive signal restoration circuit 283. In the fricative case, it encodes the multipulse amplitudes and positions together with the gain and index of the noise signal, outputs them to the multiplexer 260, and also decodes them and outputs them to the drive signal restoration circuit 283.

駆動音源復元回路２８３は、母音性区間では、平均ピッ
チ周期Ｔ″を用いてフレームを前記音源信号計算回路２
２０と同様な方法で分割し、代表区間のサブフレームを
示す情報Ｐ、と代表区間のマルチパルスの復号化された
振幅および位置とを用いて、代表区間にはマルチパルス
を発生し、代表区間以外のピッチ区間では、前記代表区
間のマルチパルスと復号化された振幅補正係数と復号化
された位相補正係数とを用いて、前記（７）式に従い音
源信号Ｖｋ（ｎ）を復元する。In vowel-like sections, the drive signal restoration circuit 283 divides the frame using the average pitch period T' in the same way as the sound source signal calculation circuit 220. Using the information P indicating the subframe of the representative section and the decoded amplitudes and positions of the representative section's multipulses, it generates the multipulses in the representative section. In the pitch sections other than the representative section, it restores the sound source signal v_k(n) according to equation (7), using the multipulses of the representative section, the decoded amplitude correction coefficients, and the decoded phase correction coefficients.

一方、鼻音性、破裂性、摩擦性区間では、マルチパルス
を発生させる。摩擦性区間ではさらに雑音信号のインデ
ックスを用いて雑音メモリ２２５から雑音信号をアクセ
スして、それにゲインを乗じて駆動音源信号を復元する
。摩擦性区間での駆動音源信号復元の詳細は前記文献２
を参照することができる。On the other hand, in nasal, plosive, and fricative sections, multipulses are generated. In fricative sections, the noise signal is additionally read from the noise memory 225 using the noise signal index and multiplied by the gain to restore the drive sound source signal. For details of restoring the drive sound source signal in fricative sections, see Reference 2 cited above.
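The restoration of the drive sound source signal in a fricative subframe can be sketched as below. The function name and argument layout are hypothetical, and summing the multipulses with the scaled noise signal is an assumption: the text says only that the two are combined, and Reference 2 holds the details.

```python
def restore_fricative_excitation(length, pulses, noise, gain):
    """Rebuild one subframe of excitation from decoded parameters.

    pulses: list of (position, amplitude) pairs for the decoded multipulses
    noise:  noise signal read from the noise memory by its decoded index
    gain:   decoded gain applied to the noise signal
    Assumption: the excitation is the sum of the pulses and the scaled noise.
    """
    excitation = [gain * noise[n] for n in range(length)]
    for position, amplitude in pulses:
        excitation[position] += amplitude
    return excitation
```

For nasal and plosive frames the same routine would apply with a zero gain, leaving only the multipulses.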

補間回路２８２は、母音性区間では、線形予測係数を一
旦にパラメータに変換してにパラメータ上でピッチ周期
Ｔ゛のサブフレーム区間毎に補間し、線形予測係数に逆
変換し出力する。なお、補間はにパラメータ上のみなら
ず他の衆知なパラメータ、例えば対数断面積比などを用
いることができる。鼻音性や子音区間では補間はおこな
わない。In vowel-like sections, the interpolation circuit 282 first converts the linear prediction coefficients into K parameters, interpolates them in the K-parameter domain for each subframe section of pitch period T', converts them back into linear prediction coefficients, and outputs the result. The interpolation need not be performed on K parameters; other well-known parameters, such as log area ratios, can also be used. No interpolation is performed in nasal or consonant sections.
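The interpolate-then-convert step can be illustrated as follows. This is a sketch under stated assumptions: interpolation is taken to be linear between the previous frame's and the current frame's K parameters (the text does not name the endpoints or the interpolation rule), the function names are hypothetical, and the conversion uses the standard step-up recursion with the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, which may differ in sign from the patent's convention.

```python
def interpolate_k(k_prev, k_curr, num_sections):
    """Linearly interpolate K parameters, one set per pitch section.

    Assumption: section j (0-based) uses weight (j + 1) / num_sections on
    the current frame's parameters, so the last section equals k_curr.
    """
    result = []
    for j in range(num_sections):
        w = (j + 1) / num_sections
        result.append([(1.0 - w) * a + w * b for a, b in zip(k_prev, k_curr)])
    return result


def k_to_lpc(k):
    """Step-up recursion: K parameters -> coefficients a_1..a_p of A(z)."""
    a = []
    for m, km in enumerate(k, start=1):
        # a_i(m) = a_i(m-1) + k_m * a_(m-i)(m-1) for i < m, and a_m(m) = k_m
        a = [a[i] + km * a[m - 2 - i] for i in range(m - 1)] + [km]
    return a
```

Interpolating in the K-parameter domain keeps every intermediate filter stable as long as each interpolated |k| stays below 1, which is one reason to avoid interpolating the linear prediction coefficients directly.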

合成フィルタ２８１は、前記復元された駆動音源信号を
入力し、前記線形予測係数ａｌ’　を入力して１フレ一
ム分の合成音声信号を求めるとともに、次のフレームへ
の影響信号を１フレーム分計算しこれを減算器１９０へ
出力する。なお、影響信号の計算法は特開昭５９−１１
６７９４　　（文献７）等を参照できる。The synthesis filter 281 receives the restored drive sound source signal and the linear prediction coefficients a_i', computes one frame of synthesized speech, and also computes one frame of the influence signal carried over into the next frame, which it outputs to the subtracter 190. For the method of calculating the influence signal, see, for example, Japanese Unexamined Patent Publication No. 59-116794 (Reference 7).
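The synthesis filter and the influence signal can be sketched as an all-pole filter and its zero-input response. This is an illustration under assumptions, not the method of Reference 7: the filter is taken as 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p (p >= 1), the influence signal is taken to be the filter's ringing into the next frame when the input is zero, and the function names are hypothetical.

```python
def synthesize(excitation, a, state=None):
    """All-pole synthesis: s(n) = e(n) - sum_i a_i * s(n - i).

    state holds the most recent outputs, state[0] = s(n-1), ..., and the
    updated state is returned so that filtering can continue across frames.
    Assumes len(a) >= 1.
    """
    history = list(state) if state is not None else [0.0] * len(a)
    output = []
    for e in excitation:
        s = e - sum(ai * h for ai, h in zip(a, history))
        output.append(s)
        history = [s] + history[:-1]
    return output, history


def influence_signal(a, state, length):
    """Zero-input response of the filter over the next `length` samples."""
    output, _ = synthesize([0.0] * length, a, state)
    return output
```

Subtracting this ringing from the next frame's input lets the encoder search that frame's excitation against the part of the signal that the new excitation actually has to explain.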

マルチプレクサ２６０は、音源信号を表す符号、フレー
ムの音声の種類を表す符号、母音性区間では代表区間の
サブフレーム位置を表す符号、平均ピッチ周期の符号、
およびにパラメータを表す符号を組み合わせて出力する
。The multiplexer 260 combines and outputs the code representing the sound source signal, the code representing the speech type of the frame, the code representing the subframe position of the representative section (in vowel-like sections), the code for the average pitch period, and the code representing the K parameters.

以上は本発明の一実施例の説明であるが、本発明の一構
成に過ぎずその変形例も種々考えられる。The above describes one embodiment of the present invention; it is only one configuration, and various modifications are conceivable.

例えば、合成フィルタのインパルス応答の周期性の強さ
の判別は、音声の母音区間のみにおいて行うようにして
もよい。また、前記実施例では、摩擦性区間は音源信号
を少数のマルチパルスと雑音信号とで表したが、これは
衆知の５ｔｏｃｈａｓｊｉｃ　ｃｏｄｉＢの方法により
表すこともできる。この方法の詳細については、例えば
５ｃｈｒｏｅｄｅｒ、Ａｔａ１氏による“Ｃｏｄｅ−ｅ
ｘｃｉｔｅｄ　１ｉｎｅａｒ　ｐｒｅｄｉｃｔｉｏｎ（
ＣＥＬＰ）：　Ｈｉｇｈｑｕａｌｉｔｙ　　５ｐｅｅｃ
ｈ　　ａｔ　　ｖｅｒｙ　　ｌｏｗ　　ｂｉｔ、　　ｒ
ａしｅｓ、”　　（ＩＣＡＳＳＰ、９３７−９４０．１
９８５）　（文献８）等を参照できる。For example, the strength of the periodicity of the synthesis filter's impulse response may be determined only in vowel sections of the speech. Also, in the above embodiment the sound source signal in fricative sections is represented by a small number of multipulses and a noise signal, but it can also be represented by the well-known stochastic coding method. For details of this method, see, for example, Schroeder and Atal, "Code-excited linear prediction (CELP): High quality speech at very low bit rates" (ICASSP, pp. 937-940, 1985) (Reference 8).

さらに、雑音メモリ２２５に格納されている雑音信号と
しては、あらかじめ定められた確率密度特性（例えばガ
ウス分布など）を有する白色雑音信号を格納しておいて
もよいし、予め多量の音声信号を予測して求めた予測残
差信号から学習により計算した値によってもよい。前者
の方法は前記文献６を参照できる。また後者の方法につ
いては、例えば、Ｍａｋｈｏｕ　１氏らによる°Ｖｅｃ
ｔｏｒ　Ｑｕａｎｔｉｚａｔｉｏｎ　ｉｎ　５ｐｅｅｃ
ｈ　Ｃｏｄｉｎｇ、”（Ｐｒｏｃ、ＩＥＥＥ、ｖｏｌ、
７３，１１．１５５１−１５８８．１９８５）　　（文
献９）等を参照できる。In addition, the noise signals stored in the noise memory 225 may be white noise signals with a predetermined probability density (for example, a Gaussian distribution), or may be values obtained by training on prediction residual signals computed in advance by predicting a large amount of speech. For the former method, see Reference 6 cited above. For the latter, see, for example, Makhoul et al., "Vector Quantization in Speech Coding" (Proc. IEEE, vol. 73, no. 11, pp. 1551-1588, 1985) (Reference 9).
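The first option, a noise memory filled with Gaussian white noise, can be sketched as below. The function name, the unit variance, and the fixed seed are assumptions made for illustration; the seed is there so that an encoder and a decoder could regenerate the same table, which the text does not discuss.

```python
import random


def make_gaussian_codebook(num_entries, length, seed=0):
    """Build num_entries noise signals of `length` zero-mean, unit-variance samples.

    A fixed seed makes the table reproducible, so both ends of the link
    could generate identical contents instead of storing them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)]
            for _ in range(num_entries)]
```

A trained codebook, the second option, would replace these random entries with vectors learned from prediction residuals.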

また、実施例ではフレームの音声信号を母音性、鼻音性
、摩擦性、破裂性の４種に分類して異なる音源信号を用
いたが、この分類数を変えてもよい。In the embodiment, the speech signal of each frame is classified into four types (vowel-like, nasal, fricative, and plosive) and a different sound source signal is used for each, but the number of classes may be changed.

また、実施例では、スペクトルパラメータとしてにパラ
メータを符号化し、その分析法としてＬＰＧ分析を用い
たが、スペクトルパラメータとしては池の衆知なパラメ
ータ、例えばＬＳＰ、ＬＰＣケプストラム、ケプストラ
ム、改良ケプストラム、一般化ケブストラム、メルケブ
ストラムなどを用いることもできる。また各パラメータ
に最適な分析法を用いることができる。In the embodiment, K parameters are encoded as the spectral parameters and LPC analysis is used as the analysis method, but other well-known spectral parameters, such as LSP, LPC cepstrum, cepstrum, improved cepstrum, generalized cepstrum, and mel cepstrum, can also be used. The analysis method best suited to each parameter can be used.

また補間回路２８２における補間ずべきパラメータおよ
びその補間法については、他の衆知な方法を用いること
ができる。具体的な補間法は、例えばＡシミ　１氏らに
よる’５ｐｅｅｃｈ　Ａｎａｌｙｓｉｓ　ａｎｄＳｙｎ
ｔｈｅｓｉｓ　ｂｙ　Ｌｉｎｅａｒ　Ｐｒｅｄｉｃｔｉ
ｏｎ　ｏｆ　５ｐｅｅｃｈＷａｖｅ”と題した論文（Ｊ
、Ａｃｏｕｓｔ、Ｓｏｃ、Ａｎ＋、　、　ｐｐ、６３７
−６５５．１９７１）　　（文献１０）等を参照できる
。Other well-known methods can also be used for the parameters to be interpolated in the interpolation circuit 282 and for the interpolation method. For a specific interpolation method, see, for example, the paper by Atal et al. entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. Am., pp. 637-655, 1971) (Reference 10).

さらに、母音区間では、代表区間以外のピッチ区間に振
幅補正係数ｃｋと位相補正係数ｄｋとを求めて伝送した
が、復号化した平均ピッチ周期Ｔ′を隣接のピッチ周期
を用いてピッチ区間毎に補間することにより位相補正係
数を伝送しない構成とすることもできる。また振幅補正
係数はピッチ区間毎に伝送するのではなくてピッチ区間
毎に求めた振幅補正係数の値を最小２乗曲線あるいは最
小２乗直線で蓮似して、前記曲線あるいは直線の係数を
伝送するような構成にしてもよい、これらの方法は任意
の組合せにより用いることができる。これらの構成より
補正情報の伝送のための情報量を低減することができる
。Furthermore, in vowel sections the embodiment obtains and transmits the amplitude correction coefficient c_k and phase correction coefficient d_k for each pitch section other than the representative section. Alternatively, the phase correction coefficients need not be transmitted if the decoded average pitch period T' is interpolated for each pitch section using the adjacent pitch periods. Likewise, instead of transmitting an amplitude correction coefficient for each pitch section, the values obtained for the pitch sections may be approximated by a least-squares curve or a least-squares straight line, and the coefficients of that curve or line transmitted. These methods can be used in any combination. These configurations reduce the amount of information needed to transmit the correction information.
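The least-squares straight-line variant can be illustrated as follows: the per-section amplitude correction coefficients are replaced by the two coefficients of a fitted line. The function names and the use of the section index k = 0, 1, 2, ... as the abscissa are assumptions.

```python
def fit_line(values):
    """Least-squares line through the points (k, values[k]); returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")
    mean_k = (n - 1) / 2.0
    mean_v = sum(values) / n
    sxx = sum((k - mean_k) ** 2 for k in range(n))
    sxy = sum((k - mean_k) * (v - mean_v) for k, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_k


def line_values(slope, intercept, n):
    """Decoder side: rebuild approximate coefficients from the line."""
    return [intercept + slope * k for k in range(n)]
```

With this variant only two numbers are sent per frame regardless of how many pitch sections the frame contains, at the cost of the approximation error of the fit.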

また位相補正係数として、例えばＯｎｏ、Ｏ□ａｗａ氏
らによる”２．４ｋｂｐｓ　Ｐｉｔｃｈ　Ｐｒｅｄｉｃ
ｔｉｏｎ　Ｍｕｌｔｉ−ｐ（１１ｓｅ　５ｐｅｅｃｈ　
Ｃｏｄｉｎｇ”と題した論文（Ｐｒｏｃ、　ＩＣＡＳＳ
ＰＳ４．９．１９８８）　　＜文献１１〉に記載されて
いるように、フレームの端で線形位相項τを求め、これ
を各ピッチ区間に分配し、ピッチ区間毎には位相補正係
数を求めない構成とすることもできる。Also, as described in the paper by Ono, Ozawa et al. entitled "2.4 kbps Pitch Prediction Multi-pulse Speech Coding" (Proc. ICASSP, S4.9, 1988) (Reference 11), a linear phase term τ may be obtained at the end of the frame and distributed among the pitch sections, so that no phase correction coefficient is obtained for each individual pitch section.

また、演算量を大幅に低減するために、母音区間では、
代表区間をフレーム内の予め定められた区間に固定しく
例えば、フレームのほぼ中央のピッチ区間や、フレーム
内でパワーの最も大きいピッチ区間など）、代表区間の
探索をしない構成としてもよい。この場合は、代表区間
の候補区間に対する（９）　、　（１０）式の計算が不
要となり大幅な演算量低減が可能となるが音質は低下す
る。Also, to greatly reduce the amount of computation, in vowel sections the representative section may be fixed to a predetermined section within the frame (for example, the pitch section nearest the center of the frame, or the pitch section with the largest power within the frame) so that no search for the representative section is performed. In this case, the calculation of equations (9) and (10) for the candidate sections becomes unnecessary and the amount of computation can be greatly reduced, but the sound quality deteriorates.
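The two fixed-choice rules mentioned above can be sketched as follows. The function names and the rule labels are hypothetical, and measuring power on the frame's speech samples as a plain sum of squares is an assumption.

```python
def section_power(samples, section):
    """Sum of squared samples inside one (start, end) pitch section."""
    start, end = section
    return sum(s * s for s in samples[start:end])


def fixed_representative(samples, sections, rule="center"):
    """Pick the representative section index without any error search.

    rule="center":    the section whose midpoint is nearest the frame center
    rule="max_power": the section with the largest power
    """
    indices = range(len(sections))
    if rule == "center":
        mid = len(samples) / 2.0
        return min(indices,
                   key=lambda k: abs((sections[k][0] + sections[k][1]) / 2.0 - mid))
    if rule == "max_power":
        return max(indices, key=lambda k: section_power(samples, sections[k]))
    raise ValueError("unknown rule: " + rule)
```

Either rule costs a single pass over the frame, in contrast to evaluating the frame error once per candidate section.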

また、さらに演算量を低減するために、影響信号の計算
を省略することもできる。これによって、駆動信号１夏
元回路２８３、補間回路２８２、合成フィルタ２８１、
減算器１９０は不要となり演算量低減が可能となるが、
やはり音質は低下する。To reduce the amount of computation further, the calculation of the influence signal can be omitted. The drive signal restoration circuit 283, the interpolation circuit 282, the synthesis filter 281, and the subtracter 190 then become unnecessary and the amount of computation is reduced, but again the sound quality deteriorates.

なお、ディジタル信号処理の分野でよく知られているよ
うに、自己相関関数は周波数軸上でパワースペクトルに
、相互相関関数はクロスパワースペクトルに対応してい
るので、これらから計算することもできる。これらの計
算法については、Ｏｐｐｅｎｈｅｉｍ氏らによる”Ｄｉ
ｇｉｔａｌ　ＳｉｇｎａｌＰｒｏｃｅｓｓｉｎｇ　”　
（Ｐｒｅｎｔｉｃｅ−Ｈａｌｌ、１９７５）と題した単
行本（文献１２）を参照できる。As is well known in the field of digital signal processing, the autocorrelation function corresponds to the power spectrum in the frequency domain and the cross-correlation function corresponds to the cross-power spectrum, so these functions can also be computed from the spectra. For these calculation methods, see the book by Oppenheim et al., "Digital Signal Processing" (Prentice-Hall, 1975) (Reference 12).
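The correspondence between the autocorrelation function and the power spectrum can be checked numerically. The sketch below uses a naive DFT from the standard library purely for illustration (a real implementation would use an FFT), and the function names are hypothetical. The signal is zero-padded to twice its length so that the circular correlation produced by the inverse transform equals the ordinary (linear) autocorrelation for the lags of interest.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def autocorr_direct(x, max_lag):
    """r(lag) = sum_n x(n) * x(n + lag), computed in the time domain."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def autocorr_via_spectrum(x, max_lag):
    """Same values, obtained as the inverse DFT of the power spectrum.

    Requires max_lag < len(x).
    """
    padded = list(x) + [0.0] * len(x)
    power = [abs(v) ** 2 for v in dft(padded)]
    r = dft(power, inverse=True)
    return [r[lag].real for lag in range(max_lag + 1)]
```

The cross-correlation case is analogous, with the product of one spectrum and the complex conjugate of the other in place of the squared magnitude.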

〔Effect of the invention〕

以上述べたように本発明によれば、音声信号を音声学的
な特徴に基づきいくつかの種類に分類し、その分類に適
しｆＳ音源信号を用いていること、特に、母音性区間で
は、フレームをピッチ周期に分割したピッチ区間のうち
、１つのピッチ区間（代表区間）のマルチパルスと他の
ピッチ区間では振幅補正係数および位相補正係数の少な
くとも一方を用いて表していること、牽擦性区間では、
少数のマルチパルスと雑音信号との組合せにより音源信
号を表していること、また、スペクトルパラメータによ
り構成される合成フィルタのインパルス応答の周期性の
強さを判別して、周期性が強いときは、合成フィルタの
バンド幅を広げるようにスペクトルパラメータに対して
重みずけを施しているので、女性音声の特にイやつなと
でも合成フィルタのバンド幅の過小推定を防ぐことがで
き、良好な合成音声を得ることができるという効果があ
る。これらにより、男性、女性の音声によらず、また母
音、子音の定常区間は勿論のこと、音韻知覚や自然性の
知覚に重要な音声の特性が変化している部分（有声の過
渡部や母音間の変化部分）でも音質の劣化のほとんどな
い合成音声を得ることができるという大きな効果がある
。As described above, according to the present invention, the speech signal is classified into several types based on phonetic features and a sound source signal suited to each class is used. In particular, in vowel-like sections the sound source signal is represented by the multipulses of one pitch section (the representative section) among the pitch sections obtained by dividing the frame by the pitch period, and by at least one of an amplitude correction coefficient and a phase correction coefficient in the other pitch sections. In fricative sections it is represented by a combination of a small number of multipulses and a noise signal. In addition, the strength of the periodicity of the impulse response of the synthesis filter formed from the spectral parameters is determined, and when the periodicity is strong the spectral parameters are weighted so as to widen the bandwidth of the synthesis filter. This prevents underestimation of the synthesis filter's bandwidth even for female speech, particularly the vowels "i" and "u", so that good synthesized speech is obtained. As a result, regardless of whether the speaker is male or female, synthesized speech with almost no degradation in quality is obtained not only in steady-state sections of vowels and consonants but also in the portions where the speech characteristics change (voiced transitions and transitions between vowels), which are important for phoneme perception and the perception of naturalness.

[Brief explanation of the drawing]

第１図は本発明の一実施例を示すブロック図、第２図は
音声の分類法の一例を示す図、第３図は有声フレームで
の代表区間と代表区間のマルチパルスを示した説明図で
ある。１１０・・・バッファメモリ、１３０・・・ピッチ分析
回路、１４０・・・Ｋパラメータ計算回路、１４５・・
・周期性判別回路、１５０・・・ピッチ符号化回路、１
６０・・・Ｋパラメータ符号化回路、１７０・・・イン
パルス応答計算回路、１８０・・・自己相関関数計算回
路、２００・・・重み付は回路、２１０・・・相互相関
関数計算回路、２１５・・・判別回路、２２０・・・音
源信号計算回路、２２５・・・雑音メモリ、２３０・・
・符号化回路、２６０・・・マルチプレクサ、２７０・
・・振幅・位相補正係数計算回路、２８１・・・合成フ
ィルタ、２８２・・・補間回路、２８３・・・駆動信号
復元回路。FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 is a diagram showing an example of a speech classification method, and FIG. 3 is an explanatory diagram showing the representative section of a voiced frame and the multipulses of the representative section. 110...buffer memory, 130...pitch analysis circuit, 140...K-parameter calculation circuit, 145...periodicity discrimination circuit, 150...pitch encoding circuit, 160...K-parameter encoding circuit, 170...impulse response calculation circuit, 180...autocorrelation function calculation circuit, 200...weighting circuit, 210...cross-correlation function calculation circuit, 215...discrimination circuit, 220...sound source signal calculation circuit, 225...noise memory, 230...encoding circuit, 260...multiplexer, 270...amplitude/phase correction coefficient calculation circuit, 281...synthesis filter, 282...interpolation circuit, 283...drive signal restoration circuit.

Claims

[Claims]

A speech encoding system characterized in that: a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are obtained for each frame from an input discrete speech signal; when the impulse response of a filter formed from the spectral parameters has strong periodicity, the spectral parameters are weighted; parameters representing features of the speech signal are extracted and the speech signal is classified into a plurality of types; and, according to the type, at least one of the following is obtained and output as the sound source signal of the speech signal for each frame: multipulses obtained for one of the small sections into which the frame section is divided according to the pitch parameter, together with correction information for correcting at least one of the amplitude and the phase of those multipulses; or a codebook and multipulses.