JPH10105200A

JPH10105200A - Voice coding/decoding method

Info

Publication number: JPH10105200A
Application number: JP8254904A
Authority: JP
Inventors: Kimio Miseki; 公生三関
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-09-26
Filing date: 1996-09-26
Publication date: 1998-04-24

Abstract

PROBLEM TO BE SOLVED: To obtain high-quality synthesized speech even at a low bit rate by encoding the drive signal with a synthesis filter, a plurality of drive signal candidates, and a formant emphasis filter whose characteristic is derived from the input speech signal. SOLUTION: A synthesis filter information coding section 100 extracts spectral envelope information from the input speech signal and supplies the synthesis filter parameters to a formant emphasis information generating section 102. A weighting filter 101 weights the input speech signal. A pitch component coding section 105 encodes the pitch drive signal of the synthesis filter. A formant emphasizing section 107 generates formant-emphasized noise drive signal candidates using the formant emphasis information from the formant emphasis information generating section 102. A noise drive signal searching section 108 searches for the candidate that minimizes distortion, and the selected formant-emphasized noise drive signal and the corresponding formant-emphasized weighted synthesized noise signal are output from a noise component coding section 109.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD: The present invention relates to a speech encoding/decoding method for compression encoding and decoding of telephone-band speech, wideband speech, audio signals, and the like.

[0002]

DESCRIPTION OF THE RELATED ART: CELP (Code-Excited Linear Prediction) is a known technique for compression-encoding speech signals. Details of the CELP method are disclosed, for example, in M.R. Schroeder and B.S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. ICASSP '85, 25.1.1, pp. 937-940, 1985 (Reference 1).

[0003] The CELP method is a representative example of the speech coding methods known as analysis-by-synthesis coding. In analysis-by-synthesis coding, the speech signal is modeled by a synthesis filter and a drive signal that drives it, and a synthesized speech signal is generated by passing the drive signal through the synthesis filter. The characteristics of the synthesis filter and the drive signal are then encoded and output. In speech coding and speech synthesis, a model that uses an all-pole filter as the synthesis filter is called an LPC (Linear Predictive Coding) synthesis model. The synthesis filter represents the resonance characteristics of the vocal tract, and the drive signal corresponds to the vocal-cord source signal. The resonances of the vocal tract are called formants.
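To make the analysis-by-synthesis model concrete, here is a minimal Python sketch of the synthesis side only: an excitation (drive signal) passed through an all-pole synthesis filter H(z) = 1/A(z). It assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_P z^-P; the function name `lpc_synthesize` is illustrative and does not come from the patent.

```python
from typing import List, Optional, Sequence


def lpc_synthesize(excitation: Sequence[float], a: Sequence[float],
                   history: Optional[Sequence[float]] = None) -> List[float]:
    """Drive the all-pole synthesis filter H(z) = 1/A(z) with an excitation.

    `a` holds [a_1, ..., a_P] for A(z) = 1 + sum_i a_i z^-i (assumed sign
    convention), so s(n) = e(n) - sum_i a_i * s(n - i). `history` holds
    output samples of earlier frames (most recent last), so consecutive
    frames can be chained without a discontinuity.
    """
    past = list(history) if history is not None else []
    out: List[float] = []
    for n, e in enumerate(excitation):
        s = e
        for i, ai in enumerate(a, start=1):
            if i <= n:                      # sample from the current frame
                s -= ai * out[n - i]
            elif i - n <= len(past):        # sample from earlier frames
                s -= ai * past[-(i - n)]
        out.append(s)
    return out


# Impulse response of 1 / (1 - 0.5 z^-1): a decaying exponential.
print(lpc_synthesize([1.0, 0.0, 0.0, 0.0], [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

Passing the previous frame's output as `history` makes frame-by-frame synthesis identical to filtering the whole signal at once.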

[0004] In conventional analysis-by-synthesis coding, a noise drive signal is selected from candidates prepared in advance as static, white noise signals, and a pitch drive signal whose pitch period corresponds to the pitch of the voice is combined with it to form the final drive signal. To generate or reproduce this static noise signal, the CELP method usually uses a noise codebook that stores noise signal candidates fixed in advance at design time.
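The static noise codebook can be pictured as follows. This sketch assumes Gaussian entries and replaces the stored design-time table with a seeded pseudo-random generator, which gives the same property (encoder and decoder hold identical candidates) in a few lines; the codebook size, vector length, seed, and function name are hypothetical.

```python
import random
from typing import List


def make_static_noise_codebook(size: int, length: int, seed: int = 1) -> List[List[float]]:
    """Build a fixed codebook of white Gaussian noise vectors.

    Seeding the generator makes the codebook "static": the encoder and the
    decoder rebuild identical candidates, so only the selected index (the
    noise code) has to be transmitted.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


encoder_book = make_static_noise_codebook(size=512, length=40)
decoder_book = make_static_noise_codebook(size=512, length=40)
noise_code = 137                      # index chosen by the encoder's search
assert encoder_book[noise_code] == decoder_book[noise_code]
```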

[0005] Within a short interval (5 to 10 msec), the sound source and vocal tract information of human speech changes relatively little. By updating the drive signal and the vocal tract information for each interval, speech can therefore be represented fairly efficiently with the LPC model, provided the bit rate is reasonably high. In addition, the CELP method evaluates the encoding of the drive signal by the distortion of the perceptually weighted speech signal, so decoded speech with little distortion can be obtained at bit rates down to about 8 kbit/s.

[0006] However, when the number of bits used for encoding is reduced and the bit rate drops to about 4 kbit/s, the coding distortion becomes clearly audible, mainly because too few bits are available to represent the drive signal, and degradations such as a loss of naturalness or audible noise become pronounced.

[0007]

PROBLEM TO BE SOLVED BY THE INVENTION: As described above, in conventional speech coding methods based on a speech synthesis model, such as the CELP method, coding distortion becomes clearly perceptible when the bit rate is reduced to about 4 kbit/s, which makes it difficult to obtain high-quality decoded speech. An object of the present invention is to provide a speech encoding/decoding method that produces high-quality synthesized speech even at a bit rate of about 4 kbit/s.

[0008]

MEANS FOR SOLVING THE PROBLEM: To solve the above problem, the present invention provides a speech encoding method that represents a synthesized speech signal by a synthesis filter and a drive signal that drives it, and encodes the drive signal and information representing the characteristics of the synthesis filter so as to reduce the distortion of the synthesized speech signal relative to the input speech signal. Its basic feature is that the drive signal is encoded using the synthesis filter, a plurality of drive signal candidates, and a formant emphasis filter whose characteristic is obtained from the input speech signal.

[0009] More specifically, a drive signal that reduces the distortion of the synthesized speech signal is selected from a plurality of drive signal candidates, including candidates that have been formant-emphasized with a formant emphasis characteristic obtained from the input speech signal, and the selected drive signal is encoded.

[0010] When the drive signal is expressed as a combination of a pitch drive signal and a noise drive signal, at least part of the pitch drive signal candidates, of the noise drive signal candidates, or of both is formant-emphasized.

[0011] In this way, a suitable drive signal can be selected from candidates that have been adapted by dynamic formant emphasis matched to the spectral characteristics of the speech. Compared with the conventional approach of selecting from static drive signal candidates, this produces a synthesized speech signal with much smaller coding distortion.

[0012] The present invention is further characterized in that the formant emphasis characteristic is derived from the information representing the characteristics of the synthesis filter, and the drive signal candidates are formant-emphasized with it. Because this synthesis filter information is already transmitted from the encoding side to the decoding side, both sides can apply the same formant emphasis without any dedicated side information being sent for it. In other words, the quality of the synthesized speech can be greatly improved without increasing the amount of transmitted information.

[0013] Furthermore, in the present invention the formant emphasis characteristic may include a characteristic that corrects the spectral tilt. Giving the formant emphasis characteristic such a tilt-correction characteristic prevents the formant emphasis process from imposing an unnecessary spectral tilt on the drive signal. As a result, the synthesized speech is kept from becoming muffled at times, and more stable synthesized speech is obtained.

[0014] There is a further reason why formant emphasis of the drive signal candidates is effective. If speech is analyzed, the synthesis filter information (which represents the phonemic information) is removed from the input speech signal, and the resulting drive signal (the drive signal before encoding) is played back as sound, the content of the speech can often be understood to some extent. This means that some of the phonemic information that should have been removed remains in the drive signal; that is, the drive signal is not completely whitened. Applying formant emphasis to the drive signal candidates during encoding therefore allows a drive signal closer to the actual one to be represented.

[0015] Formant emphasis also reduces the frequency components in the valleys between the formants of the speech. By using formant-emphasized search candidates for the drive signal on the encoding side, the method of the present invention selects, from candidates whose formant-unrelated frequency components have been reduced, a drive signal that better reduces the distortion of the synthesized speech signal. This improves the subjective quality of the synthesized speech that is finally generated. For these reasons, the present invention allows the decoding side to generate high-quality synthesized speech with a small number of bits, for example at low bit rates of about 4 kbit/s or less.

[0016] Further, in the present invention, when the noise drive signal is encoded, the characteristics of the perceptually weighted synthesis filter and of the formant emphasis filter used to encode the drive signal, together with the target vector, may be obtained from the input speech signal. A formant-emphasized impulse response is then obtained using the perceptually weighted synthesis filter and the formant emphasis filter, a modified target vector is obtained from the target vector and the formant-emphasized impulse response, and the noise drive signal is encoded using the noise drive signal candidates, the modified target vector, and the formant-emphasized impulse response. In this case, the characteristics of a pitch emphasis filter may also be obtained, and the formant-emphasized impulse response may be computed using the perceptually weighted synthesis filter, the pitch emphasis filter, and the formant emphasis filter.

[0017]

EMBODIMENTS OF THE INVENTION: Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment) FIG. 1 is a block diagram of a speech encoding/decoding apparatus according to a first embodiment of the present invention, in which an improved synthesis model is incorporated into a conventional CELP system. This embodiment differs from the prior art only in the processing needed to formant-emphasize the drive signal; the other parts have the same configuration as the conventional CELP system. This embodiment describes an example encoder configuration in which formant emphasis is applied specifically to the noise drive signal candidates.

[0018] [Encoding Side] First, the encoding side is described. The encoding side comprises a synthesis filter information encoding unit 100, a weighting filter 101, a formant emphasis information generation unit 102, a pitch component encoding unit 105, a noise component encoding unit 109, a gain component encoding unit 110, and a local decoding unit 111.

[0019] The synthesis filter information encoding unit 100 analyzes the speech signal input from an input terminal 11 in fixed-length units called frames (hereinafter, the input speech signal), extracts spectral envelope information representing the short-term correlation of the speech signal, and encodes it. The information encoded by the synthesis filter information encoding unit 100 is supplied to a multiplexing unit 112 as a code representing the characteristics of the synthesis filter; it is also converted into synthesis filter parameters and supplied to the formant emphasis information generation unit 102. Linear predictive analysis (LPC: Linear Predictive Coding), for example, can be used to analyze the speech signal.

[0020] The weighting filter 101 is used to improve subjective sound quality by exploiting the masking properties of human hearing. Its transfer function is set from the spectral envelope information analyzed by the synthesis filter information encoding unit 100. It weights the input speech signal and, after removing the influence of past encoding supplied by the local decoding unit 111, generates the target vector to be encoded.
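The patent does not give the weighting filter's transfer function. A common choice in CELP coders, assumed here, is W(z) = A(z/γ1)/A(z/γ2) with 0 < γ2 < γ1 ≤ 1, where A(z/γ) scales each coefficient a_i by γ^i. The sketch below computes a target vector for one frame; the γ values, the helper names, and the simplification of ignoring the weighting filter's own memory between frames are all assumptions.

```python
from typing import List, Sequence


def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma) from A(z) = [1, a_1, ..., a_P]: a_i -> a_i * gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def pole_zero_filter(num: Sequence[float], den: Sequence[float],
                     x: Sequence[float]) -> List[float]:
    """Zero-state filtering by num(z)/den(z), both in powers of z^-1, with den[0] == 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(min(len(num), n + 1)))
        acc -= sum(den[k] * y[n - k] for k in range(1, min(len(den), n + 1)))
        y.append(acc)
    return y


def weighted_target(speech: Sequence[float], a: Sequence[float],
                    zero_input_response: Sequence[float],
                    gamma1: float = 0.9, gamma2: float = 0.6) -> List[float]:
    """Weight one frame with W(z) = A(z/gamma1) / A(z/gamma2) and subtract the
    zero-input response left over from past frames (supplied by the local decoder).

    Simplification: the weighting filter's own memory across frames is ignored.
    """
    w_num = bandwidth_expand(a, gamma1)
    w_den = bandwidth_expand(a, gamma2)
    weighted = pole_zero_filter(w_num, w_den, speech)
    return [w - z for w, z in zip(weighted, zero_input_response)]
```

With γ1 = γ2 the weighting reduces to W(z) = 1, which is a quick sanity check; the gap between γ1 and γ2 controls how strongly noise is shaped under the formant peaks.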

[0021] The pitch component encoding unit 105 encodes the pitch period component (called the pitch drive signal) of the drive signal that drives the synthesis filter. It consists of an adaptive codebook 103 and a pitch drive signal search unit 104.

[0022] The adaptive codebook 103 stores previously encoded drive signals and generates pitch drive signal candidates representing the pitch period component contained in the target vector input from the weighting filter 101. For pitch period candidates determined by a predetermined method and the corresponding pitch drive signal candidates, the pitch drive signal search unit 104 searches for the candidate that minimizes the distortion of the weighted synthesized pitch signal, that is, the error between the target vector and the pitch drive signal synthesized by the weighted synthesis filter. The selected pitch period, the corresponding pitch drive signal e₁(n), and the weighted synthesized pitch signal are output from the pitch component encoding unit 105.
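The pitch drive signal search can be sketched as a closed-loop adaptive codebook search. Assumptions: the weighted synthesis filter is represented by a truncated impulse response `h_weighted`, lags shorter than the frame repeat the last `lag` samples, and the error measure is the usual one for an optimal scalar gain, so the search maximizes (x·y)²/(y·y). The function names and example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def adaptive_vector(past_excitation: Sequence[float], lag: int, length: int) -> List[float]:
    """Pitch drive signal candidate: the past excitation delayed by `lag` samples.
    If lag < length, the last `lag` samples are repeated periodically."""
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must be in 1..len(past_excitation)")
    segment = list(past_excitation[len(past_excitation) - lag:])
    return [segment[n % lag] for n in range(length)]


def convolve_truncated(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """First len(x) samples of x * h, i.e. zero-state filtering by the FIR h."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def search_pitch(target: Sequence[float], past_excitation: Sequence[float],
                 h_weighted: Sequence[float], lag_min: int, lag_max: int
                 ) -> Tuple[int, List[float], List[float]]:
    """Return (pitch period, e1, weighted synthesized pitch signal) maximizing
    (x.y)^2 / (y.y), which minimizes ||x - g*y||^2 at the optimal gain g."""
    best = None
    for lag in range(lag_min, lag_max + 1):
        e1 = adaptive_vector(past_excitation, lag, len(target))
        y = convolve_truncated(e1, h_weighted)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, lag, e1, y)
    if best is None:
        raise ValueError("no lag produced a non-zero candidate")
    return best[1], best[2], best[3]


past = [float(i * i) for i in range(40)]       # hypothetical past drive signal
h_w = [1.0, 0.5]                               # hypothetical weighted-synthesis response
target = convolve_truncated(adaptive_vector(past, 23, 10), h_w)
lag, e1, y1 = search_pitch(target, past, h_w, 20, 40)
print(lag)  # 23
```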

[0023] Next, the noise component encoding unit 109, to which formant emphasis is applied according to the present invention, is described in detail. The noise component encoding unit 109 comprises a noise drive signal candidate generation unit 106, a formant emphasis unit 107, and a noise drive signal search unit 108.

[0024] The noise drive signal candidate generation unit 106 generates noise drive signal candidates, each associated by a predetermined method with a noise code from which it can be uniquely decoded. This unit is usually implemented as a statically stored noise codebook, but it is not limited to that: a combination of several small noise codebooks, or pulses whose combinations are restricted in advance, can also represent the equivalent of a static noise codebook, so there is considerable freedom in how it is realized. The present invention applies to all drive signal candidates generated in these ways; by formant-emphasizing them, some or all of the noise drive signal candidates can be modified so that coding distortion is suppressed dynamically.
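As one hypothetical instance of "pulses whose combinations are restricted in advance", the sketch below places one signed unit pulse on each of two interleaved tracks in a 40-sample frame. The track layout, frame length, bit packing, and function names are assumptions chosen for illustration; the point is only that every noise code maps to exactly one candidate, so the decoder can regenerate it.

```python
from typing import List, Sequence, Tuple


def _layout(length: int, num_tracks: int) -> Tuple[int, int]:
    """Number of position slots per track and the bits needed to index them."""
    slots = length // num_tracks
    return slots, max(1, (slots - 1).bit_length())


def encode_pulse_code(pulses: Sequence[Tuple[int, int]], length: int = 40,
                      num_tracks: int = 2) -> int:
    """pulses[t] = (slot, sign) for track t; track t uses positions t + slot * num_tracks."""
    slots, pos_bits = _layout(length, num_tracks)
    code = 0
    for t, (slot, sign) in enumerate(pulses):
        if not 0 <= slot < slots:
            raise ValueError("slot out of range")
        field = (slot << 1) | (1 if sign < 0 else 0)   # position bits, then a sign bit
        code |= field << (t * (pos_bits + 1))
    return code


def decode_pulse_code(code: int, length: int = 40, num_tracks: int = 2) -> List[float]:
    """Rebuild the noise drive signal candidate that a pulse code stands for."""
    slots, pos_bits = _layout(length, num_tracks)
    e = [0.0] * length
    for t in range(num_tracks):
        field = (code >> (t * (pos_bits + 1))) & ((1 << (pos_bits + 1)) - 1)
        slot, negative = field >> 1, field & 1
        if slot >= slots:
            raise ValueError("slot out of range")
        e[t + slot * num_tracks] = -1.0 if negative else 1.0
    return e


code = encode_pulse_code([(3, +1), (7, -1)])
candidate = decode_pulse_code(code)
# Pulses at 0 + 3*2 = 6 (+1) and 1 + 7*2 = 15 (-1); 12 bits index every candidate.
```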

[0025] The formant emphasis unit 107 generates formant-emphasized noise drive signals using the formant emphasis information from the formant emphasis information generation unit 102. The formant emphasis information generation unit 102 uses the synthesis filter parameters from the synthesis filter information encoding unit 100 to obtain formant emphasis information that reflects the pattern of peaks and valleys in the spectral envelope of the synthesis filter.

[0026] FIG. 2 schematically shows the relationship between the synthesis filter and the formant emphasis characteristic according to the present invention. FIG. 2(a) shows an example of the spectral characteristic of the synthesis filter, and FIG. 2(b) shows an example of the corresponding formant emphasis characteristic according to the present invention. In FIG. 2(a), the power spectrum of the synthesis filter peaks at the (angular) frequencies ω1, ω2, and ω3, with valleys between them, and these peaks can be regarded as the formants. In this example the spectrum also slopes downward toward higher frequencies overall, giving a low-pass characteristic.

[0027] The formant emphasis characteristic in FIG. 2(b), on the other hand, peaks at ω1, ω2, and ω3, matching the spectral peaks of the synthesis filter. However, so as not to impose an unnecessary spectral tilt on the drive signal, it has no overall spectral tilt and is flat as a whole.

[0028] FIG. 2(c) shows an example of a drive signal candidate formant-emphasized with the characteristic of FIG. 2(b), and FIG. 2(d) shows, for comparison, a conventional drive signal candidate without formant emphasis. As a comparison of FIGS. 2(c) and 2(d) shows, searching the drive signal codes with candidates formant-emphasized according to the present invention makes it possible to select a drive signal that produces clear synthesized speech with distinct formants.

[0029] A specific example of how such a formant emphasis characteristic can be realized is described below. Here the formant emphasis characteristic Hc(z) is obtained from the information of the synthesis filter H(z) as

$$H_c(z) = H_f(z)\,\frac{H(z/\nu_1)}{H(z/\nu_2)} \qquad (1)$$

where Hf(z) is a filter that flattens the spectral tilt of Hc(z). The parameters ν1 and ν2 adjust the depth of the peaks and valleys of Hc(z). Because the characteristic should follow the peaks and valleys of the synthesis filter while its peaks appear only gently, they must satisfy 1 ≥ ν1 > ν2 ≥ 0. Hf(z) is obtained by flattening the overall spectral tilt of H(z/ν1)/H(z/ν2). As a concrete example, values R1, ..., Rq corresponding to the autocorrelation coefficients of the impulse response of H(z/ν1)/H(z/ν2) are computed, q-th order prediction coefficients are derived from them in the same way as in LPC analysis, and the prediction filter using these coefficients is taken as Hf(z). The prediction order q is 1 or more.
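Equations (1) to (3) can be turned into filter coefficients as sketched below. It assumes a = [1, a_1, ..., a_P] holds A(z) with H(z) = 1/A(z), takes Hf(z) to be the q-th order prediction-error filter obtained by the Levinson-Durbin recursion from the autocorrelation of the truncated impulse response of A(z/ν2)/A(z/ν1), and uses arbitrary default values for ν1, ν2, and the analysis length `n_imp`. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def bandwidth_expand(a: Sequence[float], nu: float) -> List[float]:
    """Coefficients of A(z/nu) from A(z) = [1, a_1, ..., a_P]."""
    return [c * nu ** i for i, c in enumerate(a)]


def poly_mul(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Product of two polynomials in z^-1 (cascading two FIR sections)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def impulse_response(num: Sequence[float], den: Sequence[float], length: int) -> List[float]:
    """First `length` samples of the impulse response of num(z)/den(z), den[0] == 1."""
    h: List[float] = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        acc -= sum(den[k] * h[n - k] for k in range(1, min(len(den), n + 1)))
        h.append(acc)
    return h


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Prediction-error filter [1, f_1, ..., f_q] from autocorrelations r[0..q]."""
    f = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(f[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = f[:]
        for j in range(1, i):
            f[j] = prev[j] + k * prev[i - j]
        f[i] = k
        err *= 1.0 - k * k
    return f


def formant_emphasis_filter(a: Sequence[float], nu1: float = 0.8, nu2: float = 0.5,
                            q: int = 1, n_imp: int = 64) -> Tuple[List[float], List[float]]:
    """(numerator, denominator) of Hc(z) = Hf(z) A(z/nu2) / A(z/nu1), eq. (3)."""
    if not 1.0 >= nu1 > nu2 >= 0.0:
        raise ValueError("require 1 >= nu1 > nu2 >= 0")
    if q < 1 or n_imp <= q:
        raise ValueError("require q >= 1 and n_imp > q")
    num0 = bandwidth_expand(a, nu2)          # A(z/nu2)
    den = bandwidth_expand(a, nu1)           # A(z/nu1)
    h = impulse_response(num0, den, n_imp)   # response of A(z/nu2)/A(z/nu1)
    r = [sum(h[n] * h[n + k] for n in range(n_imp - k)) for k in range(q + 1)]
    hf = levinson_durbin(r, q)               # Hf(z): q-th order tilt flattener
    return poly_mul(hf, num0), den


a = [1.0, -0.9]                              # hypothetical, strongly low-pass A(z)
num, den = formant_emphasis_filter(a, nu1=0.9, nu2=0.5, q=1)
hc = impulse_response(num, den, 200)         # hc[0] == 1.0, i.e. h_c(0) = 1
```

For this example A(z), the normalized lag-1 autocorrelation of the impulse response is roughly 0.48 for A(z/ν2)/A(z/ν1) and roughly -0.1 once Hf(z) is applied, which is the tilt flattening Hf(z) is meant to provide. Raising `q` to 4 or more would also flatten the low-order formants, as described in [0030].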

[0030] If the prediction order q is increased to about 2 to 4 or more, the low-order formants are flattened in addition to the overall spectral tilt, which enables a special kind of formant emphasis that emphasizes only the higher-order formants. Emphasizing the higher-order formants makes the phonemes more distinct and increases the intelligibility of the synthesized speech.

[0031] FIG. 3 shows an example of the relationship between the synthesis filter and the formant emphasis characteristic according to the present invention when q is 4 or more. FIG. 3(a) shows an example of the spectral characteristic of the synthesis filter, and FIG. 3(b) shows the corresponding formant emphasis characteristic according to the present invention. Unlike the case with a small q described above, the formant emphasis characteristic in FIG. 3(b) peaks only near ω3, which corresponds to the third formant among the spectral peaks ω1, ω2, and ω3 of the synthesis filter, and is flattened elsewhere. As shown in FIG. 3(c), the spectrum of a drive signal candidate formant-emphasized with this characteristic likewise often peaks only near ω3. Higher-order formants, which the conventional CELP method has difficulty representing adequately, are thus emphasized, and synthesized speech with distinct phonemes can be generated. When a synthesis filter obtained by LPC analysis is used, it can be expressed by the following equation (2).

[0032]

$$H(z) = \frac{1}{A(z)} \qquad (2)$$

[0033] Here P is the order of the synthesis filter, and A(z) is its P-th order linear prediction polynomial, so that H(z) = 1/A(z). In this case, from equations (1) and (2), the formant emphasis characteristic can be written as

$$H_c(z) = H_f(z)\,\frac{A(z/\nu_2)}{A(z/\nu_1)} \qquad (3)$$

[0034] The formant emphasis characteristic Hc(z) can be reproduced on the decoding side exactly as it is used in encoding, because it is computed from synthesis filter information that the decoding side can also reconstruct. Therefore, as long as the synthesis filter code and the noise code are transmitted from the encoding side, the decoding side can reproduce a drive signal with exactly the same formant emphasis as on the encoding side.

[0035] To reduce the computational cost of formant emphasis, an effective method is to window the impulse response of Hc(z), truncate it to M+1 samples, and use these samples as the coefficients of an M-th order FIR filter for formant emphasis. A rectangular window, a Hamming window, an exponential window, or the like can be used for the truncation.
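The truncation step might look like this. The text names rectangular, Hamming, and exponential windows but not their exact form; here the Hamming case is assumed to be the right half of a (2M+1)-point Hamming window and the exponential case w(n) = λ^n with a hypothetical default λ = 0.9. All three leave the first tap unchanged, so h_c(0) stays 1. The function name is illustrative.

```python
import math
from typing import List, Sequence


def truncate_to_fir(h: Sequence[float], M: int, window: str = "rect",
                    lam: float = 0.9) -> List[float]:
    """Window the impulse response of Hc(z) and keep M + 1 samples as FIR taps.

    window = "rect":    plain truncation
             "exp":     w(n) = lam**n
             "hamming": right half of a (2M + 1)-point Hamming window, so w(0) = 1
    """
    if M < 1 or len(h) < M + 1:
        raise ValueError("need M >= 1 and at least M + 1 impulse-response samples")
    if window == "rect":
        w = [1.0] * (M + 1)
    elif window == "exp":
        w = [lam ** n for n in range(M + 1)]
    elif window == "hamming":
        w = [0.54 + 0.46 * math.cos(math.pi * n / M) for n in range(M + 1)]
    else:
        raise ValueError(f"unknown window: {window!r}")
    return [hn * wn for hn, wn in zip(h, w)]


taps = truncate_to_fir([1.0, 0.5, 0.25, 0.125, 0.0625], M=2)   # [1.0, 0.5, 0.25]
```

A tapering window trades a slightly smoother emphasis characteristic for less exact reproduction of Hc(z); the rectangular window keeps the first M+1 samples exactly.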

[0036] Let e₂(n) be the noise drive signal before formant emphasis and h_c(n) the impulse response of the M-th order FIR filter used for formant emphasis. The formant-emphasized noise drive signal e₂′(n) can then be obtained, for example, by the following convolution:

[0037]

$$e_2'(n) = \sum_{i=0}^{M} h_c(i)\, e_2(n-i) \qquad (4)$$

[0038] Here h_c(0) = 1. Returning to FIG. 1, the noise drive signal search unit 108 searches for the candidate whose formant-emphasized noise drive signal, after synthesis by the weighted synthesis filter, has the smallest distortion relative to the target vector from which the pitch component has been removed. It outputs the selected noise code, the corresponding formant-emphasized noise drive signal, and the formant-emphasized weighted synthesized noise signal as the output of the noise component encoding unit 109.
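A direct version of this search, with the emphasis of equation (4) applied inside the loop, is sketched below. Both filters are assumed to be given as truncated impulse responses with zero initial state, `target2` is assumed to already exclude the pitch contribution, and the selection criterion (x·y)²/(y·y) assumes an optimal scalar gain. Function names and example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def convolve_truncated(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """First len(x) samples of x * h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def search_noise_direct(target2: Sequence[float], candidates: Sequence[Sequence[float]],
                        hc_taps: Sequence[float], h_weighted: Sequence[float]
                        ) -> Tuple[int, List[float], List[float]]:
    """Emphasize every candidate with eq. (4), synthesize it with the weighted
    synthesis filter, and keep the one maximizing (x2.y)^2 / (y.y).

    Returns (noise code, e2'(n), formant-emphasized weighted synthesized noise signal).
    """
    best = None
    for code, c in enumerate(candidates):
        e2_emph = convolve_truncated(c, hc_taps)       # eq. (4)
        y = convolve_truncated(e2_emph, h_weighted)    # weighted synthesis
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        corr = sum(t * v for t, v in zip(target2, y))
        score = corr * corr / energy
        if best is None or score > best[0]:
            best = (score, code, e2_emph, y)
    if best is None:
        raise ValueError("all candidates are zero")
    return best[1], best[2], best[3]


candidates = [[1.0 if n == p else 0.0 for n in range(8)] for p in range(8)]
hc_taps = [1.0, 0.5]            # hypothetical truncated Hc impulse response
h_w = [1.0, -0.3]               # hypothetical weighted-synthesis impulse response
target2 = convolve_truncated(convolve_truncated(candidates[3], hc_taps), h_w)
code, e2_emph, y2 = search_noise_direct(target2, candidates, hc_taps, h_w)
print(code)  # 3
```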

[0039] The noise code can be selected by the above method, but there is also another search method that rests on the same principle and proceeds differently. Instead of formant-emphasizing the noise drive signal candidates, the cascade of the weighted synthesis filter and the formant emphasis filter is treated as a new weighted synthesis filter for the search, and the noise code is searched using the noise drive signal candidates before formant emphasis. Because this is equivalent to formant-emphasizing the noise drive signal, the search selects the same noise code. With this method the candidates no longer need to be formant-emphasized one by one, so formant emphasis is removed from the noise drive signal search loop, which greatly reduces the computation required for the search.
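The equivalence claimed here follows from the commutativity and associativity of convolution: filtering a candidate by Hc and then by the weighted synthesis filter gives the same first N samples as filtering it once by the cascade, whose impulse response can be computed once per frame. The sketch below assumes zero-state filtering with truncated impulse responses and compares both searches; the function names and test data are hypothetical.

```python
from typing import List, Sequence, Tuple


def convolve_truncated(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """First len(x) samples of x * h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def score(target2: Sequence[float], y: Sequence[float]) -> float:
    """(x.y)^2 / (y.y): larger means smaller error at the optimal gain."""
    energy = sum(v * v for v in y)
    if energy <= 0.0:
        return float("-inf")
    corr = sum(t * v for t, v in zip(target2, y))
    return corr * corr / energy


def search_noise_combined(target2: Sequence[float], candidates: Sequence[Sequence[float]],
                          hc_taps: Sequence[float], h_weighted: Sequence[float]
                          ) -> Tuple[int, List[float]]:
    """Fold the formant emphasis into the search filter once per frame, then
    filter the un-emphasized candidates with it."""
    n = len(target2)
    h_w = (list(h_weighted) + [0.0] * n)[:n]
    h_search = convolve_truncated(h_w, hc_taps)          # cascade impulse response
    scores = [score(target2, convolve_truncated(c, h_search)) for c in candidates]
    code = max(range(len(candidates)), key=scores.__getitem__)
    return code, convolve_truncated(candidates[code], hc_taps)   # emphasize the winner only


def search_noise_direct(target2, candidates, hc_taps, h_weighted) -> int:
    """Reference: emphasize every candidate, then apply the weighted synthesis filter."""
    scores = [score(target2, convolve_truncated(convolve_truncated(c, hc_taps), h_weighted))
              for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


candidates = [[((i * 37 + n * 101 + i * n * 13) % 97) / 97.0 - 0.5 for n in range(10)]
              for i in range(16)]
target2 = [((n * n * 7 + 3) % 31) / 31.0 - 0.5 for n in range(10)]
hc_taps, h_w = [1.0, 0.4, -0.1], [1.0, 0.6, 0.3, 0.1]
combined_code, _ = search_noise_combined(target2, candidates, hc_taps, h_w)
assert combined_code == search_noise_direct(target2, candidates, hc_taps, h_w)
```

Only the winning candidate is emphasized, so the per-candidate cost inside the loop is one convolution instead of two.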

[0040] Next, the gain component encoding unit 110 is described. It encodes the gain values applied to the pitch drive signal and to the noise drive signal. The gains are encoded by selecting and outputting the gain code whose gains bring the combined signal, obtained by adding the gain-scaled weighted synthesized pitch signal and the gain-scaled formant-emphasized weighted synthesized noise signal, closest to the target signal. The selected gains, the pitch drive signal e₁(n), and the formant-emphasized noise drive signal e₂′(n) are then used to compute the drive signal that is input to the synthesis filter.

[0041]

$$ex(n) = g_1\,e_1(n) + g_2\,e_2'(n) \qquad (5)$$

where $g_1$ and $g_2$ are the gains for the pitch drive signal and the noise drive signal, respectively.

[0042] The drive signal obtained in this way is stored in the adaptive codebook 103 in preparation for encoding the next pitch component. Finally, the drive signal's influence on the encoding of the next frame must be determined. For this, the drive signal is passed through the weighted synthesis filter, and the resulting synthesized signal is input to the weight filter unit 101.

[0043] The encoded data is multiplexed by the multiplexing unit 112 and transmitted to the decoding side as a bit stream. [Decoding Side] Next, the decoding side will be described. In FIG. 1, the bit stream sent from the encoding side is first input to the demultiplexing unit 113 on the decoding side. The demultiplexing unit 113 separates the input bit stream into the synthesis filter code, the pitch period code, the noise code and the gain code. It then arranges these into per-frame codes and outputs them.
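The patent does not give a bit allocation, so the sketch below assumes a hypothetical one. It shows how a demultiplexer could split a frame into the four codes listed above. Each frame is packed most-significant-field first into one integer, using made-up field widths. `demultiplex` then reverses this.

```python
from typing import Dict, List, Tuple

# Hypothetical per-frame bit allocation: (field name, number of bits).
FRAME_LAYOUT: List[Tuple[str, int]] = [
    ("synthesis_filter", 18), ("pitch_period", 7), ("noise", 9), ("gain", 6)]

def multiplex(codes: Dict[str, int]) -> int:
    """Pack one frame's codes, first field in the most significant bits."""
    word = 0
    for name, bits in FRAME_LAYOUT:
        if not 0 <= codes[name] < (1 << bits):
            raise ValueError(f"{name} code does not fit in {bits} bits")
        word = (word << bits) | codes[name]
    return word

def demultiplex(word: int) -> Dict[str, int]:
    """Split a packed frame back into its synthesis filter, pitch, noise and gain codes."""
    codes: Dict[str, int] = {}
    for name, bits in reversed(FRAME_LAYOUT):
        codes[name] = word & ((1 << bits) - 1)
        word >>= bits
    return codes

frame = {"synthesis_filter": 123456, "pitch_period": 87, "noise": 300, "gain": 41}
print(demultiplex(multiplex(frame)) == frame)  # True
```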

[0044] The decoding side further comprises a synthesis filter information reproducing unit 114, a formant emphasis information generation unit 115, an adaptive codebook 116, a noise drive signal reproducing unit 117, a formant emphasis unit 119, a gain component reproducing unit 118, a drive signal generation unit 120 and a synthesis filter 121. Together they reproduce the synthesized speech signal frame by frame from the codes separated by the demultiplexing unit 113. Each unit is described in detail below.

[0045] The synthesis filter information reproducing unit 114 reproduces the synthesis filter information from the synthesis filter code. The adaptive codebook 116 stores past drive signals, and the pitch drive signal is extracted from it according to the pitch period code.
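The extraction of the pitch drive signal from the adaptive codebook can be sketched as follows. The sketch assumes an integer pitch lag. It also assumes that, when the lag is shorter than the frame, the last `lag` samples are repeated periodically. Both are common CELP conventions used here for illustration. Many coders also use fractional lags, which are omitted.

```python
from typing import List, Sequence

def pitch_drive_signal(past_excitation: Sequence[float], lag: int,
                       frame_len: int) -> List[float]:
    """Extract e1(n) for an integer pitch lag from the adaptive codebook.

    Samples are copied starting `lag` samples back; if lag < frame_len the
    copied segment is repeated periodically (an assumed convention).
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag out of range")
    start = len(past_excitation) - lag
    return [past_excitation[start + (n % lag)] for n in range(frame_len)]

history = [float(v) for v in range(1, 11)]   # 1.0 ... 10.0
print(pitch_drive_signal(history, lag=3, frame_len=5))  # [8.0, 9.0, 10.0, 8.0, 9.0]
```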

[0046] The formant emphasis information generation unit 115 has the same function as the formant emphasis information generation unit 102 described for the encoding side. From the decoded synthesis filter information, it obtains formant emphasis parameters that reflect the peaks and valleys of the synthesis filter's spectral envelope.
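This section does not define the form of the formant emphasis filter. One common way to derive formant-emphasis parameters from the synthesis filter is a pole-zero filter Hc(z) = A(z/γn)/A(z/γd), with γn < γd. Here A(z) = 1 + Σ aᵢ z⁻ⁱ is the linear-prediction inverse filter. The sketch below computes those coefficients. Both this form and the default γ values are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

def bandwidth_expand(lpc: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (a_0 = 1 is unchanged)."""
    return [coef * gamma ** i for i, coef in enumerate(lpc)]

def formant_emphasis_params(lpc: Sequence[float], gamma_num: float = 0.55,
                            gamma_den: float = 0.7) -> Tuple[List[float], List[float]]:
    """Numerator and denominator of an assumed Hc(z) = A(z/gamma_num) / A(z/gamma_den).

    With gamma_num < gamma_den, the poles of 1/A(z/gamma_den) lie closer to the
    formant resonances than the zeros of A(z/gamma_num), so Hc(z) emphasizes
    the spectral-envelope peaks relative to the valleys.
    """
    return bandwidth_expand(lpc, gamma_num), bandwidth_expand(lpc, gamma_den)

b, a = formant_emphasis_params([1.0, -0.9, 0.2], gamma_num=0.5, gamma_den=0.75)
print(b, a)
```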

[0047] The noise drive signal reproducing unit 117 reproduces the noise drive signal from the noise code in the same way as the encoding side. The formant emphasis unit 119 has the same function as the formant emphasis unit 107 on the encoding side. It applies formant emphasis to the noise drive signal using the parameters from the formant emphasis information generation unit 115. For the encoding and decoding sides to generate the same formant-emphasized drive signal, the formant emphasis unit 119 must perform formant emphasis identical or equivalent to that of the formant emphasis unit 107. It is also possible to make the formant emphasis stronger or weaker than on the encoding side. This can be done according to speech features available on the decoding side other than the synthesis filter information, such as the degree of voicing (voiced versus unvoiced).
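The sketch below illustrates the option of adjusting the emphasis strength on the decoding side. Two things are assumed and hypothetical: the Hc(z) = A(z/γn)/A(z/γd) form, and the rule that maps a voicing measure to a strength. Strength 0 makes Hc(z) = 1. When the voicing is neutral, the decoder uses exactly the encoder's strength, and the two sides then produce identical output, as the paragraph requires.

```python
from typing import List, Sequence

def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Zero-state filter y = (B(z)/A(z)) x, assuming a[0] == 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def emphasize(signal: Sequence[float], lpc: Sequence[float], strength: float,
              gamma_num: float = 0.55, gamma_max: float = 0.75) -> List[float]:
    """Apply an assumed Hc(z) = A(z/gamma_num) / A(z/gamma_den) of adjustable strength.

    strength = 0 gives gamma_den == gamma_num, i.e. Hc(z) = 1 (no emphasis).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    gamma_den = gamma_num + strength * (gamma_max - gamma_num)
    b = [c * gamma_num ** i for i, c in enumerate(lpc)]
    a = [c * gamma_den ** i for i, c in enumerate(lpc)]
    return iir_filter(b, a, signal)

def decoder_strength(encoder_strength: float, voicing: float) -> float:
    """Hypothetical rule, voicing in [0, 1]: 0.5 keeps the encoder's strength,
    more voiced frames get stronger emphasis, less voiced frames weaker."""
    return min(1.0, max(0.0, encoder_strength * (0.5 + voicing)))

LPC = [1.0, -1.2, 0.5]                      # hypothetical stable A(z)
x = [1.0, 0.0, -0.5, 0.25, 0.0, 0.0]
enc = emphasize(x, LPC, 0.6)
dec = emphasize(x, LPC, decoder_strength(0.6, 0.5))   # neutral voicing
```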

[0048] The gain component reproducing unit 118 obtains, from the gain code, the gains to be applied to the pitch drive signal and to the formant-emphasized noise drive signal. The drive signal generation unit 120 scales both signals by the gains reproduced by the gain component reproducing unit 118 and adds them. The result is the drive signal that is input to the synthesis filter. The generated drive signal is also stored in the adaptive codebook 116.

[0049] The synthesis filter 121 is a filter whose characteristics are determined by the synthesis filter information reproduced by the synthesis filter information reproducing unit 114. Driven by the drive signal from the drive signal generation unit 120, it generates the synthesized speech signal and outputs it to the output terminal 12. To improve the sound quality further, the synthesized speech signal may be passed through a post filter before output.
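Because the synthesis filter runs frame by frame, its internal memory has to carry over between frames. The sketch below assumes an all-pole filter 1/A(z) with A(z) = 1 + Σ aᵢ z⁻ⁱ. It shows that filtering two consecutive frames with preserved memory gives the same output as filtering the concatenated signal. The `SynthesisFilter` class is an illustrative name, and the post filter is not modeled.

```python
from typing import List, Sequence

class SynthesisFilter:
    """All-pole synthesis filter 1/A(z) whose memory persists across frames."""

    def __init__(self, order: int) -> None:
        self.memory: List[float] = [0.0] * order   # y[n-1], y[n-2], ...

    def process(self, lpc: Sequence[float], excitation: Sequence[float]) -> List[float]:
        out: List[float] = []
        for e in excitation:
            y = e - sum(a * m for a, m in zip(lpc[1:], self.memory))
            self.memory = [y] + self.memory[:-1]   # shift the filter state
            out.append(y)
        return out

LPC = [1.0, -0.9]
whole = SynthesisFilter(order=1).process(LPC, [1.0, 0.0, 0.0, 0.0, 0.0])
f = SynthesisFilter(order=1)
split = f.process(LPC, [1.0, 0.0, 0.0]) + f.process(LPC, [0.0, 0.0])
print(whole == split)  # True
```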

[0050] Next, the processing procedure on the encoding side in this embodiment will be described with reference to the flowchart shown in FIG. 4. In the CELP method, the synthesis filter information is usually encoded frame by frame. The pitch period, the noise drive signal and the gains, by contrast, are often encoded in subframes obtained by further dividing each frame. For simplicity, this example assumes that the pitch period, the noise drive signal and the gains are also encoded frame by frame.

[0051] First, initialization is performed (step S100). Next, speech for one frame is input (step S101), and the synthesis filter information is extracted and encoded (step S102). The input speech signal is then weighted by a weight filter derived from the analyzed spectral envelope information. This generates the target signal for encoding the drive signal (step S103). Next, the pitch drive signal is searched for and the selected pitch period is encoded (step S104). The synthesis filter information is then used to obtain formant emphasis information that reflects the peaks and valleys of the synthesis filter's spectral envelope (step S105). Next, formant-emphasized noise drive signal candidates are generated, and the noise drive signal is searched for and encoded using these candidates (step S106). Then the gain values for the pitch drive signal and the noise drive signal are encoded (step S107).

[0052] The encoded data is multiplexed and transmitted to the decoding side as a bit stream (step S108). In addition, the drive signal is generated by applying the gains, and local decoding is performed to generate a perceptually weighted synthesized speech signal (step S109). Finally, the internal data used for encoding is shifted in preparation for processing the next frame (step S110).

[0053] Next, the processing procedure on the decoding side in this embodiment will be described with reference to the flowchart shown in FIG. 5. First, the internal state of the decoder is initialized (step S200). Next, the bit stream sent from the encoding side is separated into the codes used for decoding (step S201). The synthesis filter code is used to reproduce the synthesis filter information (step S202), and the pitch period code is used to reproduce the pitch drive signal (step S203). The synthesis filter information is also used to obtain the formant emphasis information (step S204). That formant emphasis information is then used to formant-emphasize the noise drive signal reproduced from the noise code (step S205). Next, the gain information is reproduced from the gain code (step S206). The reproduced gains are used to combine the pitch drive signal and the formant-emphasized noise drive signal into the drive signal (step S207). This drive signal is then input to the synthesis filter to produce the synthesized speech signal (step S208). Finally, the internal data is shifted in preparation for processing the next section (step S209).

[0054] (Second Embodiment) FIG. 6 is a block diagram showing the configuration of a speech encoding/decoding apparatus according to the second embodiment.

[0055] The first embodiment described a configuration in which formant emphasis is applied to all noise drive signal candidates. The second embodiment describes a configuration in which formant emphasis is applied only to some of the noise drive signal candidates. Everything except the processing of the noise drive signal can therefore be realized essentially as in the first embodiment. In FIG. 6, components that perform the same processing as in FIG. 1 are given the same reference numerals, and their detailed description is omitted. Only the parts specific to the second embodiment are described in detail.

[0056] [Encoding Side] The encoding side comprises a synthesis filter information encoding unit 100, a weight filter 101, a formant emphasis information generation unit 102, a pitch component encoding unit 105, a noise component encoding unit 609, a gain component encoding unit 110 and a local decoding unit 111. The pitch component encoding unit 105 consists of the adaptive codebook 103 and the pitch drive signal search unit 104. The noise component encoding unit 609 consists of a noise drive signal candidate generation unit 606, a formant emphasis unit 607 and a noise drive signal search unit 608.

[0057] Next, the noise component encoding unit 609, which applies formant emphasis according to the present invention, will be described in detail. The noise drive signal candidate generation unit 606 generates noise drive signals that are associated with noise codes by a predetermined method and can be uniquely decoded. It can usually be implemented as a statically stored noise codebook, but it is not limited to this. It may instead represent the equivalent of a static noise codebook by combining several small noise codebooks, by pulses whose combinations are restricted in advance, and so on. There is therefore a great deal of freedom in how it is implemented. The present invention is applicable to all drive signal candidates generated in any of these ways. It can modify some or all of the noise drive signal candidates through formant emphasis so that the coding distortion is dynamically suppressed.
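As one example of "pulses whose combinations are restricted in advance", the sketch below assumes a hypothetical layout. Each 8-sample frame carries two signed unit pulses, and each pulse is confined to its own interleaved track of four positions. A 6-bit noise code then maps to exactly one noise drive signal. The track layout and the bit fields are invented for illustration and are not taken from the patent.

```python
from typing import List

FRAME_LEN = 8
TRACKS = [[0, 2, 4, 6], [1, 3, 5, 7]]   # hypothetical interleaved pulse tracks

def decode_noise_code(code: int) -> List[float]:
    """Map a 6-bit noise code to a sparse two-pulse noise drive signal."""
    if not 0 <= code < 64:
        raise ValueError("code must fit in 6 bits")
    vec = [0.0] * FRAME_LEN
    for t, track in enumerate(TRACKS):
        field = (code >> (3 * t)) & 0b111    # 2 position bits + 1 sign bit
        pos = track[field & 0b11]
        sign = -1.0 if field & 0b100 else 1.0
        vec[pos] += sign
    return vec

print(decode_noise_code(0))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The tracks do not share positions, so every code produces a distinct vector. This gives the unique decodability the paragraph requires.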

[0058] The following describes an example of a noise component encoder in which formant emphasis is applied only to a predetermined subset of the noise drive signal candidates and not to the others. The noise drive signal candidate generation unit 606 is broadly divided into two parts, a first noise drive signal candidate generation unit (1) and a second noise drive signal candidate generation unit (2). Formant emphasis is applied to the candidates generated by generation unit (1) and not to the candidates generated by generation unit (2).

[0059] The formant emphasis unit 607 applies formant emphasis only to the drive signal candidates from the first noise drive signal candidate generation unit (1). The noise drive signal search unit 608 then uses both the formant-emphasized and the non-emphasized candidates. From these it searches for a suitable noise drive signal that yields smaller coding distortion, together with the corresponding noise code.

[0060] Whether the finally selected noise drive signal has been formant-emphasized can be determined from its source. It was emphasized if it came from the first noise drive signal generation unit (1), and not if it came from the second unit (2). In other words, suppose it is fixed in advance, for each noise code, whether the noise drive signal is formant-emphasized. Then the finally selected noise code alone shows whether a formant-emphasized or a non-emphasized drive signal is used.

[0061] In this embodiment as well, the search computation can be reduced in the same way as in the first embodiment, by including the formant emphasis filter Hc(z) in the weighted synthesis filter in advance. For the candidates that are not formant-emphasized, the search simply treats Hc(z) as 1.
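The split search of this embodiment can be sketched with two precomputed impulse responses. The first folds in Hc(z) and is used for codes from generation unit (1). The second is the plain weighted synthesis filter, that is, Hc(z) = 1, and is used for codes from generation unit (2). The filter coefficients, the candidate groups and the rule "codes below `len(group1)` are emphasized" are hypothetical.

```python
from typing import List, Sequence, Tuple

Filter = Tuple[Sequence[float], Sequence[float]]   # (b, a) with a[0] == 1

def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def impulse_response(b: Sequence[float], a: Sequence[float], n: int) -> List[float]:
    return iir_filter(b, a, [1.0] + [0.0] * (n - 1))

def conv_first_n(x: Sequence[float], h: Sequence[float], n: int) -> List[float]:
    return [sum(x[j] * h[k - j] for j in range(min(k + 1, len(x)))) for k in range(n)]

def search_split_codebook(target: Sequence[float], group1: Sequence[Sequence[float]],
                          group2: Sequence[Sequence[float]],
                          w_filter: Filter, c_filter: Filter) -> int:
    """Codes 0..len(group1)-1 use Hc(z) folded in; the rest use Hc(z) = 1."""
    n = len(target)
    h_plain = impulse_response(*w_filter, n)
    h_emph = conv_first_n(impulse_response(*c_filter, n), h_plain, n)
    best_code, best_score = -1, -1.0
    for code, cand in enumerate(list(group1) + list(group2)):
        h = h_emph if code < len(group1) else h_plain
        y = conv_first_n(cand, h, n)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        if corr * corr / energy > best_score:
            best_code, best_score = code, corr * corr / energy
    return best_code

W = ([1.0], [1.0, -0.9])                 # hypothetical weighted synthesis filter
C = ([1.0, -0.5], [1.0, -0.8])           # hypothetical Hc(z)
GROUP1 = [[1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]]    # emphasized candidates
GROUP2 = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, -1]]   # non-emphasized candidates
target_e = iir_filter(*W, iir_filter(*C, GROUP1[1]))
print(search_split_codebook(target_e, GROUP1, GROUP2, W, C))  # 1
```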

[0062] [Decoding Side] Next, the decoding side will be described. In FIG. 6, the decoding side comprises the demultiplexing unit 113, the synthesis filter information reproducing unit 114, the formant emphasis information generation unit 115, the adaptive codebook 116, a noise drive signal reproducing unit 617, a formant emphasis unit 619, a switching unit 620, the gain component reproducing unit 118, the drive signal generation unit 120 and the synthesis filter 121. Together they reproduce the synthesized speech signal from the codes separated by the demultiplexing unit 113. Each unit is described in detail below.

[0063] The noise drive signal reproducing unit 617 corresponds to the noise drive signal candidate generation unit 606 on the encoding side. It consists of two parts, a first noise drive signal reproducing unit (1) and a second noise drive signal reproducing unit (2). It reproduces the noise drive signal corresponding to the noise code in the same way as the encoding side. As described above, the noise code also indicates whether the noise drive signal is to be formant-emphasized. When formant emphasis is required, the formant emphasis unit 619 applies it to the noise drive signal. In the example of FIG. 6, the switching unit 620 selects whether the drive signal on the decoding side is formant-emphasized, but the implementation is not limited to this.
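On the decoding side, the role of the switching unit can be sketched as a branch on the noise code. Codes in the first range are decoded from group (1) and passed through the formant emphasis filter. The remaining codes are decoded from group (2) and bypass it. The code-to-group rule and the Hc(z) coefficients below are hypothetical, and they must match whatever convention the encoder uses.

```python
from typing import List, Sequence, Tuple

def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Zero-state filter y = (B(z)/A(z)) x, assuming a[0] == 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

C_FILTER = ([1.0, -0.5], [1.0, -0.8])   # hypothetical Hc(z)

def reproduce_noise(code: int, group1: Sequence[Sequence[float]],
                    group2: Sequence[Sequence[float]]) -> Tuple[List[float], bool]:
    """Return (noise drive signal, emphasized?) for a received noise code."""
    if not 0 <= code < len(group1) + len(group2):
        raise ValueError("unknown noise code")
    if code < len(group1):                      # switch -> formant emphasis path
        return iir_filter(*C_FILTER, group1[code]), True
    return [float(v) for v in group2[code - len(group1)]], False   # bypass

G1 = [[1, 0, 0, 0], [0, 0, 1, 0]]
G2 = [[0, 1, 0, 0], [0, 0, 0, 1]]
signal, emphasized = reproduce_noise(0, G1, G2)
```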

[0064] Next, the processing procedure on the encoding side in this embodiment will be described with reference to the flowchart shown in FIG. 7. In the CELP method, the synthesis filter information is usually encoded frame by frame. The pitch period, the noise drive signal and the gains, by contrast, are often encoded in subframes obtained by further dividing each frame. For simplicity, this example assumes that the pitch period, the noise drive signal and the gains are also encoded frame by frame.

[0065] First, initialization is performed (step S300). Next, speech for one frame is input (step S301), and the synthesis filter information is extracted and encoded (step S302). The input speech signal is then weighted by a weight filter derived from the analyzed spectral envelope information. This generates the target signal for encoding the drive signal (step S303). Next, the pitch drive signal is searched for and the selected pitch period is encoded (step S304). The synthesis filter information is then used to obtain formant emphasis information that reflects the peaks and valleys of the synthesis filter's spectral envelope (step S305). Next, formant-emphasized candidates are generated from the candidates of noise drive signal candidate generation unit (1) (step S315). The candidates from noise drive signal candidate generation unit (2) are left without formant emphasis. The noise drive signal is then searched for and encoded using both the formant-emphasized and the non-emphasized candidates (step S306). Next, the gain values for the pitch drive signal and the noise drive signal are encoded (step S307). The encoded data is multiplexed and transmitted to the decoding side as a bit stream (step S308).

[0066] In addition, the drive signal is generated by applying the gains, and local decoding is performed to generate a perceptually weighted synthesized speech signal (step S309). Finally, the internal data used for encoding is shifted in preparation for processing the next frame (step S310).

[0067] Next, the processing procedure on the decoding side in this embodiment will be described with reference to the flowchart shown in FIG. 8. This example uses post-filter processing. First, the internal state of the decoder is initialized (step S400). Next, the bit stream sent from the encoding side is separated into the codes used for decoding (step S401). The synthesis filter code is used to reproduce the synthesis filter information (step S402), and the pitch period code is used to reproduce the pitch drive signal (step S403). Next, formant emphasis information is obtained from the synthesis filter information (step S404). Then, depending on the noise code, either a formant-emphasized or a non-emphasized noise drive signal is reproduced (step S405). Whether formant emphasis is applied can be determined from the value of the noise code. Next, the gain information is reproduced from the gain code (step S406). These gains are used to combine the pitch drive signal and the noise drive signal reproduced in step S405 into the drive signal (step S407). This drive signal is then input to the synthesis filter to produce synthesized speech (step S408). The synthesized speech is passed through a post filter to generate the final output speech (step S409). At the same time, the internal data is shifted in preparation for processing the next frame (step S410).

[0068] (Third Embodiment) FIG. 9 is a block diagram showing the configuration of a speech encoding/decoding apparatus according to the third embodiment. The first and second embodiments described configurations in which formant emphasis is applied only to the noise drive signal candidates. The third embodiment describes an example configuration in which formant emphasis is applied to both the pitch drive signal candidates and the noise drive signal candidates. Everything except the processing of the pitch drive signal can be realized essentially as in the first embodiment. In FIG. 9, parts realized by the same processing as in the first embodiment are given the same reference numerals as in FIG. 1, and their description is omitted. The description focuses on the parts related to formant emphasis of the pitch drive signal.

[0069] [Encoding Side] In FIG. 9, the encoding side comprises the synthesis filter information encoding unit 100, the weight filter 101, a formant emphasis information generation unit 702, a pitch component encoding unit 705, a noise component encoding unit 709, the gain component encoding unit 110 and the local decoding unit 111. The pitch component encoding unit 705 consists of the adaptive codebook 103, a formant emphasis unit A703 and the pitch drive signal search unit 104. The noise component encoding unit 709 consists of the noise drive signal candidate generation unit 106, a formant emphasis unit B707 and the noise drive signal search unit 108.

[0070] The formant emphasis information generation unit 702 uses the synthesis filter information to obtain formant emphasis parameters that reflect the peaks and valleys of the synthesis filter's spectral envelope. It outputs formant emphasis information A for the pitch drive signal and formant emphasis information B for the noise drive signal. This allows the degree and characteristics of formant emphasis to be optimized separately for the pitch drive signal and the noise drive signal, which improves the sound quality further.

[0071] Next, the pitch component encoding unit 705, which applies formant emphasis according to the present invention, will be described in detail. The adaptive codebook 103 stores past encoded drive signals. By a predetermined method, it generates pitch drive signal candidates that represent the pitch period component contained in the target vector.

[0072] The formant emphasis unit A703 applies formant emphasis to the generated pitch drive signal candidates using formant emphasis information A from the formant emphasis information generation unit 702. Each formant-emphasized candidate defines a formant-emphasized weighted synthesized pitch signal. The pitch drive signal search unit 104 searches for the candidate whose signal has small distortion with respect to the target vector. The pitch component encoding unit 705 then outputs three items: the selected formant-emphasized pitch drive signal, the corresponding pitch period code, and the formant-emphasized weighted synthesized pitch signal.
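The pitch search with formant emphasis A can be sketched as follows. For each candidate lag, the adaptive-codebook segment is extracted, emphasized with information A, and passed through the weighted synthesis filter. It is then scored with the usual CELP criterion (x·y)²/(y·y). Several details are assumptions: the filter coefficients, the lag range, the periodic-repetition convention for short lags and the synthetic history signal.

```python
import math
from typing import List, Sequence, Tuple

Filter = Tuple[Sequence[float], Sequence[float]]   # (b, a) with a[0] == 1

def iir_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def adaptive_candidate(history: Sequence[float], lag: int, n: int) -> List[float]:
    start = len(history) - lag
    return [history[start + (k % lag)] for k in range(n)]   # periodic repeat (assumed)

def search_pitch(target: Sequence[float], history: Sequence[float],
                 lags: Sequence[int], emph_a: Filter, w_filter: Filter) -> int:
    """Return the lag whose formant-emphasized, weighted synthesized candidate
    best matches the target at its optimal gain."""
    best_lag, best_score = lags[0], -1.0
    for lag in lags:
        e1 = iir_filter(*emph_a, adaptive_candidate(history, lag, len(target)))
        y = iir_filter(*w_filter, e1)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        if corr * corr / energy > best_score:
            best_lag, best_score = lag, corr * corr / energy
    return best_lag

EMPH_A = ([1.0, -0.45, 0.05], [1.0, -0.675, 0.1125])   # hypothetical info A
W = ([1.0], [1.0, -0.9])                               # hypothetical weighting
history = [math.sin(0.7 * k) + 0.3 * math.cos(1.9 * k) for k in range(40)]
target = iir_filter(*W, iir_filter(*EMPH_A, adaptive_candidate(history, 7, 8)))
print(search_pitch(target, history, list(range(3, 13)), EMPH_A, W))  # 7
```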

[0073] Next, the noise component encoding unit 709 will be described. The noise drive signal candidate generation unit 106 generates noise drive signals that are associated with noise codes by a predetermined method and can be uniquely decoded. It can usually be implemented as a statically stored noise codebook, but it is not limited to this. Methods are also known that represent the equivalent of a static noise codebook by combining several small noise codebooks, by pulses whose combinations are restricted in advance, and so on. There is therefore great freedom in how it is implemented. The present invention is applicable to all such drive signal candidates. It can modify some or all of the noise drive signal candidates through formant emphasis so that the coding distortion is dynamically suppressed.

[0074] The formant emphasis unit B707 generates formant-emphasized noise drive signals using formant emphasis information B from the formant emphasis information generation unit 702.

[0075] The noise drive signal search unit 108 passes the formant-emphasized noise drive signal candidates through the weighted synthesis filter. Using the resulting synthesized noise signals, it searches for the candidate with small distortion with respect to the target vector from which the pitch component has been removed. The noise component encoding unit 709 then outputs three items: the selected noise code, the corresponding formant-emphasized noise drive signal, and the weighted synthesized formant-emphasized noise signal.

[0076] [Decoding Side] In FIG. 9, the speech decoding section comprises the demultiplexing unit 113, the synthesis filter information reproducing unit 114, a formant emphasis information generation unit 715, the adaptive codebook 116, the noise drive signal reproducing unit 117, a formant emphasis unit A719, a formant emphasis unit B720, the gain component reproducing unit 118, the drive signal generation unit 120 and the synthesis filter 121. Together they reproduce the synthesized speech signal from the codes separated by the demultiplexing unit 113.

[0077] The formant emphasis information generation unit 715 has the same function as the formant emphasis information generation unit 702 described for the encoding side. From the decoded synthesis filter information, it outputs formant emphasis information A for the pitch drive signal and formant emphasis information B for the noise drive signal.

[0078] The adaptive codebook 116 stores past drive signals and reproduces the pitch drive signal from the pitch period code. The formant emphasis unit A719 then applies formant emphasis to the reproduced pitch drive signal using formant emphasis information A.

[0079] The noise drive signal reproducing section 117 reproduces the noise drive signal from the noise code, using the same method as the encoding side. Formant emphasis section B 720 applies formant emphasis to the reproduced noise drive signal using formant emphasis information B.

[0080] Next, the encoding procedure of this embodiment is described with reference to the flowchart in FIG. 10. In CELP coding, the synthesis filter information is normally encoded frame by frame. The pitch period, noise drive signal, and gains, however, are often encoded per subframe, each frame being further divided into subframes. For simplicity, the example described here encodes the pitch period, noise drive signal, and gains frame by frame as well.

[0081] The encoding procedure runs as follows:
1. Initialization is performed (step S500).
2. Speech for one frame is input (step S501).
3. The synthesis filter information is extracted and encoded (step S502).
4. The input speech signal is weighted by a weighting filter derived from the analyzed spectral envelope information. This produces the target signal for drive signal encoding (step S503).
5. For formant emphasis, the synthesis filter information is used to obtain formant emphasis information A and formant emphasis information B, which reflect the peaks and valleys of the synthesis filter's spectral envelope (step S504). Formant emphasis information A is for the pitch drive signal, and formant emphasis information B is for the noise drive signal.
6. The pitch drive signal is searched, using as candidates pitch drive signals formant-emphasized with formant emphasis information A. The selected pitch period is encoded (step S505).
7. The noise drive signal is searched and encoded, using noise drive signal candidates formant-emphasized with formant emphasis information B (step S506).
8. The gain values used for the pitch drive signal and the noise drive signal are encoded (step S507).
9. The coded data are multiplexed and transmitted to the decoding side as a bit stream (step S508).

[0082] The gains are then applied to generate the drive signal, and local decoding is performed to generate perceptually weighted synthesized speech (step S509). Finally, the internal data used for encoding are shifted in preparation for processing the next frame (step S510).

[0083] Next, the decoding procedure of this embodiment is described with reference to the flowchart in FIG. 11. First, the internal state of the decoding process is initialized (step S600). The bit stream (coded data) received from the encoding side is then separated into the codes used for decoding (step S601). The synthesis filter information is reproduced from the synthesis filter code (step S602). Formant emphasis information A and formant emphasis information B are then obtained from this synthesis filter information (step S603).

[0084] Next, a formant-emphasized pitch drive signal is reproduced from formant emphasis information A and the pitch period code (step S604). A formant-emphasized noise drive signal is reproduced from formant emphasis information B and the noise code (step S605). The gain information is then reproduced from the gain code (step S606). These gains are used to combine the formant-emphasized pitch drive signal and the formant-emphasized noise drive signal into the drive signal (step S607). The resulting drive signal is input to the synthesis filter to produce synthesized speech (step S608). Finally, the internal data are shifted in preparation for processing the next frame (step S609).
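Steps S604 to S609 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation:
- The formant emphasis filters for information A and B are taken to be short FIR responses `h_c_a` and `h_c_b`, applied from zero state within the frame.
- The synthesis filter is the all-pole filter $1/A(z)$ with $A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$. This sign convention is chosen here.
- `memory` holds the last $p$ synthesis outputs and stands for the internal data carried to the next frame.
- The pitch drive signal is passed in directly rather than fetched from an adaptive codebook.

All names are hypothetical.

```python
from typing import List, Sequence


def fir_filter(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Zero-state FIR filtering of x by h; output has len(x) samples."""
    return [
        sum(h[j] * x[n - j] for j in range(min(len(h), n + 1)))
        for n in range(len(x))
    ]


def decode_frame(
    a: Sequence[float],
    pitch_sig: Sequence[float],
    noise_sig: Sequence[float],
    h_c_a: Sequence[float],
    h_c_b: Sequence[float],
    g_p: float,
    g_n: float,
    memory: List[float],
) -> List[float]:
    """Sketch of S604-S609 for one frame.

    a: coefficients a_1..a_p of A(z) = 1 + sum a_i z^-i (assumed convention).
    memory: last p synthesis outputs, oldest first, len(memory) == len(a);
            updated in place so the filter state carries into the next frame.
    """
    p_emph = fir_filter(h_c_a, pitch_sig)  # S604: emphasis with information A
    n_emph = fir_filter(h_c_b, noise_sig)  # S605: emphasis with information B
    excitation = [g_p * u + g_n * v for u, v in zip(p_emph, n_emph)]  # S606-S607
    out = []
    for e in excitation:  # S608: all-pole synthesis 1/A(z)
        y = e - sum(a[i] * memory[-1 - i] for i in range(len(a)))
        out.append(y)
        memory.append(y)  # S609: shift the filter state
        memory.pop(0)
    return out
```

A real decoder would also carry the emphasis-filter state across frames. The sketch resets it each frame to stay short.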

[0085] (Fourth Embodiment) FIG. 12 is a block diagram showing the configuration of a speech coding apparatus according to a fourth embodiment of the present invention. This embodiment presents another method that performs a search equivalent to the formant-emphasized noise drive signal search in the coding section of the first embodiment. Specifically, the drive signal is encoded using two things: the noise drive signal before formant emphasis, and the impulse response of a filter that combines the perceptually weighted synthesis filter with the formant emphasis filter (the impulse response with formant emphasis). Everything except the noise component coding section is the same as in the first embodiment, so the description of those parts is omitted.

[0086] The coding section consists of:
- a synthesis filter information coding section 100
- a weighting filter 101
- a formant emphasis information generating section 102
- a pitch component coding section 105
- a pitch emphasis filter information generating section 805
- a noise component coding section 800
- a gain component coding section 110
- a local decoding section 111

[0087] Of these, the noise component coding section 800 consists of:
- a noise drive signal candidate generating section 804
- an impulse response with formant emphasis calculating section 801
- a modified target vector calculating section 802
- a noise drive signal search section 803

Each section is described in detail below.

[0088] The noise drive signal candidate generating section 804 generates noise drive signals that are associated with noise codes by a predetermined method and can be uniquely decoded from them. This section is normally realized as a statically stored noise codebook, but it is not limited to that. It may instead represent the equivalent of a static noise codebook by a combination of several small noise codebooks, by pulses whose combinations are restricted in advance, and so on. There is therefore considerable freedom in how it is realized, and the present invention can handle all drive signal candidates generated in these ways.

[0089] The impulse response with formant emphasis calculating section 801 calculates the impulse response with formant emphasis $h_{wc}(n)$. It uses the perceptually weighted synthesis filter information from the synthesis filter information coding section 100 and the formant emphasis information from the formant emphasis information generating section 102.

Specifically, it starts from the perceptually weighted synthesis filter characteristic $H_w(z)$ and the formant emphasis filter characteristic $H_c(z)$. From these it obtains the impulse response $h_{wc}(n)$ of the perceptually weighted synthesis filter with formant emphasis, whose characteristic is $H_{wc}(z) = H_w(z)H_c(z)$. The same response can be computed as $h_{wc}(n) = h_w(n) * h_c(n)$, where:
- $h_w(n)$ is the impulse response of the perceptually weighted synthesis filter;
- $h_c(n)$ is the impulse response of the formant emphasis filter;
- $*$ denotes convolution.

Naturally, the impulse response of the $M$-th order FIR filter obtained by the window truncation described earlier can be used as $h_c(n)$.
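The relation $h_{wc}(n) = h_w(n) * h_c(n)$ can be computed directly by truncated convolution, as in the Python sketch below. It assumes $h_w(n)$ and $h_c(n)$ are already available as finite sample lists, for example $h_c(n)$ as the windowed $M$-th order FIR response mentioned above. It keeps only the first $N$ samples needed for the search interval. The function names are illustrative.

```python
from typing import List, Sequence


def convolve_truncated(a: Sequence[float], b: Sequence[float], n: int) -> List[float]:
    """Linear convolution of a and b, keeping only the first n output samples."""
    out = [0.0] * n
    for i, ai in enumerate(a[:n]):
        for j, bj in enumerate(b[: n - i]):
            out[i + j] += ai * bj
    return out


def cascade_impulse_response(n: int, *responses: Sequence[float]) -> List[float]:
    """Impulse response of cascaded filters (e.g. h_w, h_c), truncated to n >= 1 samples."""
    h = [1.0] + [0.0] * (n - 1)  # start from a unit impulse
    for r in responses:
        h = convolve_truncated(h, r, n)  # pass it through each filter in turn
    return h
```

Because convolution is commutative, the order in which the responses are passed does not change the result. The test checks this, along with a small numeric case.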

[0090] The modified target vector calculating section 802 obtains the modified target vector $x_m$ from two inputs: the target vector $x$ (preferably with the pitch component removed) and the impulse response with formant emphasis $h_{wc}(n)$. An example of a concrete method of calculating $x_m$ follows.

[0091]

$$x_m(n) = \sum_{i=n}^{N-1} x(i)\, h_{wc}(i-n), \qquad n = 0, 1, \ldots, N-1$$

[0092] Here, $N$ is the number of samples in the interval over which the noise drive signal is encoded. In matrix-vector form this calculation is $x_m = H_{wc}^{T} x$. $H_{wc}$ is the $N \times N$ lower triangular matrix (all elements above the main diagonal are zero) that represents convolution with $h_{wc}(n)$.
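Because $H_{wc}$ is lower triangular, the product $x_m = H_{wc}^{T} x$ amounts to correlating the target with $h_{wc}(n)$ (backward filtering). The Python sketch below computes it this way without forming the matrix. It assumes `h_wc` holds the impulse response samples; a response shorter than $N$ is treated as zero-padded. The helper names are hypothetical. The example at the end checks the identity $x_m^{T} v = x^{T}(H_{wc} v)$, which is what lets a search loop avoid filtering each candidate.

```python
from typing import List, Sequence


def modified_target(x: Sequence[float], h_wc: Sequence[float]) -> List[float]:
    """x_m = H_wc^T x, where H_wc is the lower-triangular convolution matrix of h_wc."""
    n_len = len(x)
    return [
        sum(x[i] * h_wc[i - n] for i in range(n, min(n_len, n + len(h_wc))))
        for n in range(n_len)
    ]


def filtered(v: Sequence[float], h_wc: Sequence[float]) -> List[float]:
    """H_wc v: zero-state filtering of v by h_wc, truncated to len(v) samples."""
    return [
        sum(h_wc[j] * v[n - j] for j in range(min(len(h_wc), n + 1)))
        for n in range(len(v))
    ]


if __name__ == "__main__":
    x, h, v = [1.0, 2.0, 3.0], [1.0, 0.5, 0.25], [0.0, 1.0, -1.0]
    x_m = modified_target(x, h)
    lhs = sum(a * b for a, b in zip(x_m, v))             # x_m^T v
    rhs = sum(a * b for a, b in zip(x, filtered(v, h)))  # x^T (H_wc v)
    print(x_m, lhs, rhs)
```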

[0093] Besides the formant emphasis characteristic, a pitch-periodicity emphasis characteristic can also be applied. Two placements are effective: giving it to the noise drive signal, or giving it to the impulse response of the perceptually weighted synthesis filter. The latter method is described here. In this case, $H_{wc}(z)$ can be expressed as

$$H_{wc}(z) = H_w(z)\,H_c(z)\,H_p(z) \tag{8}$$

and the impulse response with formant emphasis $h_{wc}(n)$ can be obtained as

$$h_{wc}(n) = h_w(n) * h_c(n) * h_p(n) \tag{9}$$

where $H_p(z)$ and $h_p(n)$ are the pitch emphasis filter and its impulse response, respectively. One example of the pitch emphasis filter's transfer function is given below. It uses the pitch period $T$ and pitch gain $\beta$ supplied from the pitch emphasis filter information generating section 805.

[0094]

$$H_p(z) = \frac{1}{1 - \beta z^{-T}}$$

For $T$, either the pitch period found by the pitch component coding section 105 or a pitch period derived from it can be used. For example, suppose the pitch component coding section uses a pitch period $T_F$ with fractional-sample resolution. It is then also effective to use the integer pitch period $T = \mathrm{int}(T_F)$, which reduces the computation of the pitch emphasis filter. The filters and impulse responses in equations (8) and (9) may clearly be applied in any order.
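The impulse response $h_p(n)$ of $H_p(z) = 1/(1 - \beta z^{-T})$ satisfies $h_p(n) = \delta(n) + \beta\, h_p(n - T)$. Over an $N$-sample interval it is therefore nonzero only at multiples of $T$, where it equals $\beta^{n/T}$. The Python sketch below computes it from the integer pitch period $T = \mathrm{int}(T_F)$ mentioned above. The function name and the requirement $T \ge 1$ are assumptions of this sketch.

```python
from typing import List


def pitch_emphasis_response(t_f: float, beta: float, n: int) -> List[float]:
    """First n samples of the impulse response of 1 / (1 - beta * z^-T), T = int(t_f)."""
    t = int(t_f)  # integer pitch period, cheaper than fractional-delay filtering
    if t < 1:
        raise ValueError("pitch period must be at least one sample")
    h = [0.0] * n
    for k in range(n):
        impulse = 1.0 if k == 0 else 0.0
        feedback = beta * h[k - t] if k >= t else 0.0  # recursion h(k) = d(k) + beta h(k - T)
        h[k] = impulse + feedback
    return h
```

This response can be convolved with $h_w(n)$ and $h_c(n)$, in any order, to form $h_{wc}(n)$ as in equation (9).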

[0095] The noise drive signal search section 803 selects the noise drive signal using three inputs:
- the modified target vector $x_m$;
- the impulse response with formant emphasis $h_{wc}(n)$;
- the noise drive signal candidate vectors $v_k$ (candidates without formant emphasis) from the noise drive signal candidate generating section 804.

It searches for the noise code $k$ that makes the following expression as large as possible.

[0096]

$$\frac{A_k^2}{B_k} = \frac{(x_m^{T} v_k)^2}{v_k^{T} H_{wc}^{T} H_{wc}\, v_k} \tag{11}$$

Various methods can reduce the computation required for the noise drive signal search. For example, the many candidates can first be pruned to a few, using only the numerator of the search criterion (11) or only the absolute value $|x_m^{T} v_k|$. The drive signal candidate that maximizes (11) is then selected from the few that remain. This greatly reduces the search computation, with almost no loss of quality compared with a full search.

The reason is that $x_m$ can be computed once, before the search loop. Inside the loop, the numerator of (11) then requires only a direct inner product between $x_m$ and the candidate vector $v_k$, which has no formant emphasis applied. The formant emphasis processing is thus removed from the loop.

The pruning can be made still more selective by using $(x_m^{T} v_k)^2 f_k$ or $|x_m^{T} v_k|\, f_k^{1/2}$ as the pruning criterion, where $f_k = (v_k^{T} v_k)^{-1}$. This narrows the candidates down to an even smaller set.
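The two-stage search described above can be sketched in Python as follows. This is an illustrative sketch, not the patent's code, and it works in two stages:
1. Pre-selection keeps `n_keep` candidates ranked by $|x_m^{T} v_k|\, f_k^{1/2}$ with $f_k = (v_k^{T} v_k)^{-1}$. This stage uses only inner products with the unemphasized candidates.
2. The full criterion (11) is evaluated on the survivors, with $B_k = \|H_{wc} v_k\|^2$ obtained by zero-state filtering.

The names `preselect_and_search` and `n_keep` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def filtered(v: Sequence[float], h_wc: Sequence[float]) -> List[float]:
    """H_wc v: zero-state filtering of v by h_wc, truncated to len(v) samples."""
    return [
        sum(h_wc[j] * v[n - j] for j in range(min(len(h_wc), n + 1)))
        for n in range(len(v))
    ]


def preselect_and_search(
    x_m: Sequence[float],
    h_wc: Sequence[float],
    candidates: Sequence[Sequence[float]],
    n_keep: int,
) -> Tuple[int, float]:
    """Return (k, A_k^2 / B_k) for the best candidate among the n_keep pre-selected ones."""
    # Stage 1: cheap pruning score |x_m^T v_k| * f_k^(1/2); no filtering in this loop.
    scored = []
    for k, v in enumerate(candidates):
        a_k = sum(a * b for a, b in zip(x_m, v))  # A_k = x_m^T v_k
        energy = sum(b * b for b in v)            # v_k^T v_k = 1 / f_k
        if energy > 0.0:
            scored.append((abs(a_k) / math.sqrt(energy), k, a_k))
    scored.sort(reverse=True)

    # Stage 2: exact criterion (11) on the survivors only.
    best_k, best_score = -1, float("-inf")
    for _, k, a_k in scored[:n_keep]:
        s = filtered(candidates[k], h_wc)  # H_wc v_k
        b_k = sum(b * b for b in s)        # B_k = v_k^T H_wc^T H_wc v_k
        if b_k > 0.0 and a_k * a_k / b_k > best_score:
            best_k, best_score = k, a_k * a_k / b_k
    return best_k, best_score
```

Setting `n_keep` to the number of candidates reduces this to a full search over (11). Smaller values trade a little accuracy for fewer filtering operations.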

[0097] The present invention may also be applied to a multi-pulse coding scheme with a special structure. In such a scheme, each noise drive signal candidate consists of a small number of pulses whose combinations are restricted in advance. The absolute pulse amplitude is either fixed (for example, to 1) or confined to a narrow range (for example, 0.9 to 1.1). Drive signal candidates then need not actually be generated. Instead, the candidate pulse positions can be narrowed down, or candidates pruned, by searching for the positions where the elements of the modified target vector $x_m$ have the largest absolute values.

This is simplest when the drive signal candidates consist of only about one to three pulses. The modified target vector $x_m$ is computed from the target vector and the impulse response with formant emphasis $h_{wc}(n)$. The noise drive signal and noise code can then be determined simply by finding the position where the element of $x_m$ has the largest absolute value, and giving the pulse amplitude the same sign as $x_m$ at that position. The multi-pulse method is described in detail in "Digital Speech Processing" (Furui, Tokai University Press) (Reference 2), so its description is omitted here.
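For the pulse-based case, the position and sign selection can be sketched in Python as below. The sketch makes three hypothetical assumptions:
- each pulse is restricted to a predefined list of allowed positions (a "track");
- all pulse magnitudes are fixed at 1;
- tracks do not share positions.

For each track it picks the position with the largest $|x_m(n)|$ and takes the sign of $x_m$ there. Like the simplification described above, it ignores how the energy term $B_k$ in (11) depends on pulse position. It is therefore a fast approximation rather than an exhaustive maximization of (11).

```python
from typing import List, Sequence, Tuple


def select_pulses(
    x_m: Sequence[float], tracks: Sequence[Sequence[int]]
) -> List[Tuple[int, int]]:
    """Return one (position, sign) pair per track from the modified target x_m."""
    pulses = []
    for track in tracks:
        pos = max(track, key=lambda n: abs(x_m[n]))  # largest |x_m(n)| on this track
        sign = 1 if x_m[pos] >= 0.0 else -1          # pulse sign follows x_m
        pulses.append((pos, sign))
    return pulses


def pulses_to_vector(pulses: Sequence[Tuple[int, int]], n: int) -> List[float]:
    """Build the unemphasized drive signal candidate v_k from the chosen pulses."""
    v = [0.0] * n
    for pos, sign in pulses:
        v[pos] += float(sign)
    return v
```

With a single track covering every position, `select_pulses` reduces to the one-pulse rule in the text: the position of the largest $|x_m(n)|$, with its sign.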

[0098] Once a noise code is selected, the corresponding formant-emphasized noise drive signal and formant-emphasized weighted synthesized noise signal are generated. The noise code and these signals are output from the noise component coding section.

Some configurations search using the pitch-periodicity emphasis characteristic as well as the formant emphasis characteristic. In that case, the noise drive signal is given both formant emphasis and pitch-periodicity emphasis. The pitch-periodicity emphasis uses the pitch period and pitch gain from the pitch emphasis filter information generating section 805. The resulting signal is then passed through perceptually weighted synthesis to generate the weighted synthesized noise signal.

[0099] Next, the encoding procedure of this embodiment is described with reference to the flowchart in FIG. 13. First, initialization is performed (step S800). Speech for one frame is then input (step S801), and the synthesis filter information is extracted and encoded (step S802).

[0100] Next, the input speech signal is weighted by a weighting filter whose characteristic is derived from the analyzed spectral envelope information. This produces the target signal for drive signal encoding (step S803).

[0101] The pitch drive signal is then searched, and the selected pitch period is encoded (step S804). Next, for formant emphasis, the synthesis filter information is used to obtain formant emphasis information that reflects the peaks and valleys of the synthesis filter's spectral envelope.

[0102] When pitch periodicity is also emphasized, the pitch emphasis filter information is obtained as well (step S805). Next, the impulse response with formant emphasis and the modified target vector are calculated (step S806). These are then used to search for and encode the noise drive signal (step S807).

[0103] Next, the gain values used for the pitch drive signal and the noise drive signal are encoded (step S808). The coded data are multiplexed and transmitted to the decoding side as a bit stream (step S809).

[0104] The gains are then applied to generate the drive signal, and local decoding is performed to generate perceptually weighted synthesized speech (step S810). Finally, the internal data used for encoding are shifted in preparation for processing the next frame (step S811).

[0105] Several embodiments of the present invention have been described above, but they are merely examples. Formant emphasis of the drive signal according to the present invention can of course be varied in many ways, depending on the characteristics of the input speech and on the method used to analyze the synthesis filter. For example, formant emphasis may be applied only to the pitch drive signal. Alternatively, whether to apply formant emphasis may be decided by analyzing the speech.

[0106] Auxiliary information may also accompany the formant emphasis information. This can indicate the strength of the formant emphasis, or whether formant emphasis is applied at all. It may be encoded and transmitted from the encoding side to the decoding side.

[0107] The term "formant emphasis" is used in the present invention to keep the explanation simple, since it is framed around speech. The field of application, however, is not limited to speech such as so-called telephone-band speech. The invention can also be applied to coding wideband speech, musical signals, and the like.

[0108] When a postfilter is used on the decoding side with the present invention, subjective sound quality improves more if that postfilter differs in characteristics from the postfilter used in conventional CELP and similar schemes. Specifically, when the drive signal is formant-emphasized according to the present invention, the formant emphasis inside the postfilter should preferably be set weaker than usual. The speech at the final postfilter output then receives an appropriate overall amount of formant emphasis, and the synthesized speech sounds more natural.

[0109]

[Effects of the Invention] As described above, the speech coding/decoding method of the present invention works as follows. From a plurality of drive signal candidates that include formant-emphasized candidates, it selects the drive signal that further reduces the distortion of the synthesized speech signal, and encodes that drive signal. The decoding side, like the encoding side, then generates the synthesized speech signal using a formant-emphasized drive signal. As a result, high-quality synthesized speech can be generated even at low bit rates, for example about 4 kbit/s or less.

[0110] The formant emphasis characteristic applied to the drive signal candidates may also include a characteristic that corrects the candidates' spectral tilt. This keeps out of the drive signal the unwanted spectral tilt that the formant emphasis processing would otherwise introduce. It prevents the synthesized speech from sounding muffled at times and yields more stable synthesized speech.

[0111] The formant emphasis characteristic may also be obtained from the information representing the characteristics of the synthesis filter. The synthesis filter information is normally transmitted from the encoding side to the decoding side anyway. No special side information is therefore needed to convey the formant emphasis characteristic. The quality of the synthesized speech can thus be improved without increasing the amount of transmitted information.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a speech coding/decoding apparatus according to a first embodiment of the present invention.

FIG. 2 is a diagram for explaining the formant emphasis characteristic used in the present invention.

FIG. 3 is a diagram for explaining the formant emphasis characteristic used in the present invention.

FIG. 4 is a flowchart showing the speech coding procedure according to the first embodiment of the present invention.

FIG. 5 is a flowchart showing the speech decoding procedure according to the first embodiment of the present invention.

FIG. 6 is a block diagram showing the configuration of a speech coding/decoding apparatus according to a second embodiment of the present invention.

FIG. 7 is a flowchart showing the speech coding procedure according to the second embodiment of the present invention.

FIG. 8 is a flowchart showing the speech decoding procedure according to the second embodiment of the present invention.

FIG. 9 is a block diagram showing the configuration of a speech coding/decoding apparatus according to a third embodiment of the present invention.

FIG. 10 is a flowchart showing the speech coding procedure according to the third embodiment of the present invention.

FIG. 11 is a flowchart showing the speech decoding procedure according to the third embodiment of the present invention.

FIG. 12 is a block diagram showing the configuration of a speech coding apparatus according to a fourth embodiment of the present invention.

FIG. 13 is a flowchart showing the speech coding procedure according to the fourth embodiment of the present invention.

[Explanation of symbols]

11 … speech signal input terminal
12 … speech signal output terminal
100 … synthesis filter information coding section
101 … weighting filter
102, 702 … formant emphasis information generating section
103 … adaptive codebook
104 … pitch drive signal search section
105, 705 … pitch component coding section
106, 606, 804 … noise drive signal candidate generating section
107, 607, 703, 707 … formant emphasis section
108, 608, 803 … noise drive signal search section
109, 800 … noise component coding section
110 … gain component coding section
111 … local decoding section
112 … multiplexing section
113 … demultiplexing section
114 … synthesis filter information reproducing section
115, 715 … formant emphasis information generating section
116 … adaptive codebook
117, 617 … noise drive signal reproducing section
118 … gain component reproducing section
119, 619, 719, 720 … formant emphasis section
120 … drive signal generating section
121 … synthesis filter
801 … impulse response with formant emphasis calculating section
802 … modified target vector calculating section
805 … pitch emphasis filter information generating section

Claims


1. A speech coding method in which a synthesized speech signal is represented using a synthesis filter and a drive signal that drives the synthesis filter, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce the distortion of the synthesized speech signal with respect to an input speech signal, characterized in that the drive signal is encoded using the synthesis filter, a plurality of drive signal candidates, and a formant emphasis filter having characteristics obtained based on the input speech signal.

2. A speech coding method in which a synthesized speech signal is represented using a synthesis filter and a drive signal that drives the synthesis filter, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce the distortion of the synthesized speech signal with respect to an input speech signal, the method comprising: selecting a drive signal that further reduces the distortion of the synthesized speech signal, using a plurality of drive signal candidates that include a drive signal candidate formant-emphasized by a formant emphasis filter having characteristics obtained based on the input speech signal; and encoding the selected drive signal.

3. A speech coding method in which a synthesized speech signal is represented using a synthesis filter and a drive signal that drives the synthesis filter, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce the distortion of the synthesized speech signal with respect to an input speech signal, the method comprising: obtaining a formant emphasis characteristic from the information representing the characteristics of the synthesis filter; selecting a drive signal that reduces the distortion of the synthesized speech signal, using a plurality of drive signal candidates that include a drive signal candidate formant-emphasized by the formant emphasis characteristic; and encoding the selected drive signal.

4. A speech coding method in which a synthesized speech signal is represented using a synthesis filter and a drive signal that drives the synthesis filter and consists of a combination of a pitch drive signal and a noise drive signal, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce the distortion of the synthesized speech signal with respect to an input speech signal, characterized in that: at least some of the candidates of at least one of the pitch drive signal and the noise drive signal are formant-emphasized with a formant emphasis characteristic obtained by a predetermined method; a pitch drive signal and a noise drive signal that reduce the distortion of the synthesized speech signal are selected using these pitch drive signal candidates and noise drive signal candidates; and the drive signal consisting of the combination of the selected pitch drive signal and noise drive signal is encoded.

5. The speech encoding method according to any one of claims 1 to 4, wherein the formant emphasis characteristic includes a characteristic for correcting spectral tilt.
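
Claims 5 and 12 add spectral-tilt correction to the emphasis characteristic. A pole-zero emphasis filter usually leaves a residual low-pass tilt; one conventional way to offset it, assumed here, is a first-order section T(z) = 1 + γt·k1·z^-1, where k1 = -r(1)/r(0) is computed from the truncated impulse response of the emphasis filter. The value γt = 0.8, the truncation length of 40 samples and the function names are illustrative choices, not details from the patent.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter with zero initial state; a[0] must be nonzero."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y


def impulse_response(b, a, length):
    return lfilter(b, a, [1.0] + [0.0] * (length - 1))


def first_reflection(h):
    """k1 = -r(1) / r(0), from the autocorrelation of h."""
    r0 = sum(v * v for v in h)
    r1 = sum(h[n] * h[n + 1] for n in range(len(h) - 1))
    return -r1 / r0


def poly_mul(p, q):
    """Product of two polynomials in z^-1."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def with_tilt_correction(b, a, gamma_t=0.8, length=40):
    """Return (b', a) for F(z) * (1 + gamma_t * k1 * z^-1)."""
    k1 = first_reflection(impulse_response(b, a, length))
    return poly_mul(b, [1.0, gamma_t * k1]), a


b, a = [1.0, -0.45], [1.0, -0.72]   # assumed A(z/0.5)/A(z/0.8) for A(z) = 1 - 0.9 z^-1
tilt_before = first_reflection(impulse_response(b, a, 40))
b_corr, a_corr = with_tilt_correction(b, a)
tilt_after = first_reflection(impulse_response(b_corr, a_corr, 40))
```

For this example filter the normalized lag-1 correlation drops in magnitude from roughly 0.33 to roughly 0.03 once the correction is folded into the numerator.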

6. A speech decoding method comprising: reproducing, from input encoded data, information representing the characteristics of a synthesis filter and a drive signal; and obtaining a synthesized speech signal by driving the synthesis filter, whose characteristics are determined by the information representing the characteristics of the synthesis filter, with a formant-enhanced drive signal.

7. A speech decoding method comprising: reproducing, from input encoded data, information representing the characteristics of a synthesis filter and a drive signal; obtaining a formant emphasis characteristic from the information representing the characteristics of the synthesis filter; and obtaining a synthesized speech signal by driving the synthesis filter, whose characteristics are determined by that information, with the drive signal formant-enhanced by the formant emphasis characteristic.
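
A minimal decoder in the sense of claims 6 and 7: the emphasis characteristic is recomputed from the decoded synthesis-filter coefficients, applied to the decoded drive signal, and the result drives 1/A(z). The A(z/γn)/A(z/γd) form, the γ defaults and the function names are assumptions, and filter memories are reset every frame for brevity, which a real decoder would not do.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter with zero initial state; a[0] must be nonzero."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y


def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma)."""
    return [c * gamma ** k for k, c in enumerate(lpc)]


def decode_frame(lpc, excitation, gamma_num=0.5, gamma_den=0.8):
    """Formant-enhance the decoded drive signal, then drive 1/A(z).
    Zero filter state at the start of the frame (simplification)."""
    b = bandwidth_expand(lpc, gamma_num)
    a = bandwidth_expand(lpc, gamma_den)
    enhanced = lfilter(b, a, excitation)
    return lfilter([1.0], lpc, enhanced)


lpc = [1.0, -1.2, 0.5]                  # hypothetical decoded A(z), poles at radius ~0.71
excitation = [1.0, 0.0, -0.5, 0.25, 0.0, 0.0, 0.3, 0.0]
speech = decode_frame(lpc, excitation)
plain = lfilter([1.0], lpc, excitation)
flat = decode_frame(lpc, excitation, gamma_num=0.7, gamma_den=0.7)  # F(z) = 1
```

Setting gamma_num equal to gamma_den makes F(z) = 1, so the output falls back to plain synthesis-filter decoding.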

8. A speech encoding/decoding method in which, on the encoding side, a synthesized speech signal is expressed using a synthesis filter and a drive signal for driving the synthesis filter, and encoded data, obtained by encoding information representing the characteristics of the synthesis filter and the drive signal so as to further reduce distortion of the synthesized speech signal with respect to an input speech signal, are transmitted, and on the decoding side, a synthesized speech signal is obtained by driving the synthesis filter, whose characteristics are determined by the information representing the characteristics of the synthesis filter reproduced from the encoded data, with the drive signal reproduced from the encoded data, wherein: the encoding side encodes the drive signal using a formant emphasis filter having characteristics obtained based on the synthesis filter, a plurality of drive signal candidates, and the input speech signal; and the decoding side obtains the synthesized speech signal by driving the synthesis filter with the drive signal formant-enhanced by the formant emphasis characteristic.

9. A speech encoding/decoding method in which, on the encoding side, a synthesized speech signal is expressed using a synthesis filter and a drive signal for driving the synthesis filter, and encoded data, obtained by encoding information representing the characteristics of the synthesis filter and the drive signal so as to further reduce distortion of the synthesized speech signal with respect to an input speech signal, are transmitted, and on the decoding side, a synthesized speech signal is obtained by driving the synthesis filter, whose characteristics are determined by the information representing the characteristics of the synthesis filter reproduced from the encoded data, with the drive signal reproduced from the encoded data, wherein: the encoding side selects a drive signal that reduces the distortion of the synthesized speech signal using a plurality of drive signal candidates including a drive signal candidate formant-enhanced with a formant emphasis characteristic obtained by a predetermined method, and encodes the selected drive signal; and the decoding side obtains the synthesized speech signal by driving the synthesis filter with the drive signal formant-enhanced by the formant emphasis characteristic.

10. A speech encoding/decoding method in which, on the encoding side, a synthesized speech signal is expressed using a synthesis filter and a drive signal for driving the synthesis filter, and encoded data, obtained by encoding information representing the characteristics of the synthesis filter and the drive signal so as to reduce distortion of the synthesized speech signal with respect to an input speech signal, are transmitted, and on the decoding side, a synthesized speech signal is obtained by driving the synthesis filter, whose characteristics are determined by the information representing the characteristics of the synthesis filter reproduced from the encoded data, with the drive signal reproduced from the encoded data, wherein: the encoding side obtains a formant emphasis characteristic from the information representing the characteristics of the synthesis filter, selects a drive signal that reduces the distortion of the synthesized speech signal using a plurality of drive signal candidates including a drive signal candidate formant-enhanced by that formant emphasis characteristic, and encodes the selected drive signal; and the decoding side obtains a formant emphasis characteristic from the information representing the characteristics of the synthesis filter reproduced from the encoded data, and obtains the synthesized speech signal by driving the synthesis filter with the drive signal formant-enhanced by that characteristic.
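
Claim 10 has both sides derive the emphasis characteristic from the same transmitted synthesis-filter information, so the enhancement needs no extra side information. The sketch below shows that symmetry with a hypothetical unit-pulse codebook whose every entry is enhanced; the coefficients are assumed to be already quantized, i.e. identical at encoder and decoder, and all names and γ values are illustrative.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter with zero initial state; a[0] must be nonzero."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y


def emphasis_from_lpc(lpc, gn=0.5, gd=0.8):
    """Assumed F(z) = A(z/gn) / A(z/gd), derived only from the coefficients."""
    return ([c * gn ** k for k, c in enumerate(lpc)],
            [c * gd ** k for k, c in enumerate(lpc)])


def codebook(n):
    """Hypothetical codebook: single unit pulses at every position."""
    return [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]


def encode(target, lpc):
    """Encoder: enhance every codevector with F(z) from lpc, pick the best."""
    b, a = emphasis_from_lpc(lpc)
    best = (None, 0.0, -1.0)
    for idx, c in enumerate(codebook(len(target))):
        y = lfilter([1.0], lpc, lfilter(b, a, c))
        xy = sum(t * s for t, s in zip(target, y))
        yy = sum(s * s for s in y)
        if yy > 0 and xy * xy / yy > best[2]:
            best = (idx, xy / yy, xy * xy / yy)
    return best[0], best[1]          # only index and gain (plus lpc) are sent


def decode(index, gain, lpc, n):
    """Decoder: re-derives the same F(z) from the received lpc."""
    b, a = emphasis_from_lpc(lpc)
    c = [gain * v for v in codebook(n)[index]]
    return lfilter([1.0], lpc, lfilter(b, a, c))


lpc = [1.0, -1.2, 0.5]               # hypothetical quantized A(z)
target = [0.2, 1.0, 0.9, 0.4, 0.0, -0.2, -0.2, -0.1]
index, gain = encode(target, lpc)
received = decode(index, gain, lpc, len(target))
```

Only the index and gain (plus the coefficients) cross the channel, yet up to floating-point rounding the decoder reproduces the signal the encoder evaluated during its search.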

11. The speech encoding/decoding method according to claim 9 or 10, wherein the encoding side formant-enhances at least a part of at least one of the pitch drive signal candidates and the noise drive signal candidates with a formant emphasis characteristic obtained by a predetermined method, selects, using these pitch drive signal candidates and noise drive signal candidates, a pitch drive signal and a noise drive signal that further reduce the distortion of the synthesized speech signal, and encodes a drive signal composed of the combination of the selected pitch drive signal and noise drive signal.

12. The speech encoding/decoding method according to claim 8, wherein the formant emphasis characteristic includes a characteristic for correcting spectral tilt.

13. A speech encoding method in which a synthesized speech signal is expressed using a synthesis filter and a drive signal for driving the synthesis filter, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce distortion of the synthesized speech signal with respect to an input speech signal, the method comprising: determining, based on the input speech signal, the characteristics of a perceptual weighting synthesis filter and of a formant emphasis filter, and a target vector, all used for encoding the drive signal; obtaining a formant-enhanced impulse response using the perceptual weighting synthesis filter and the formant emphasis filter; obtaining a corrected target vector using the target vector and the formant-enhanced impulse response; and encoding the drive signal using the corrected target vector and the formant-enhanced impulse response.
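
For claim 13, the emphasis can be folded into the impulse response used by the codebook search instead of filtering each candidate. The sketch cascades an assumed formant emphasis filter A(z/γn)/A(z/γd), an assumed perceptual weighting filter A(z/γ1)/A(z/γ2) and 1/A(z), then scores single-pulse candidates. It reads the "corrected target vector" as the backward-filtered target d = H^T x, which is one plausible interpretation rather than something the claim states; the γ values and function names are hypothetical.

```python
def lfilter(b, a, x):
    """Direct-form IIR filter with zero initial state; a[0] must be nonzero."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc / a[0])
    return y


def expand(lpc, g):
    """Coefficients of A(z/g)."""
    return [c * g ** k for k, c in enumerate(lpc)]


def enhanced_impulse_response(lpc, n, g1=0.9, g2=0.6, gn=0.5, gd=0.8):
    """First n samples of the impulse response of F(z) * W(z) / A(z),
    with W(z) = A(z/g1)/A(z/g2) and F(z) = A(z/gn)/A(z/gd) (all assumed)."""
    x = [1.0] + [0.0] * (n - 1)
    x = lfilter(expand(lpc, gn), expand(lpc, gd), x)   # formant emphasis
    x = lfilter(expand(lpc, g1), expand(lpc, g2), x)   # perceptual weighting
    return lfilter([1.0], lpc, x)                      # synthesis 1/A(z)


def backward_filter(target, h):
    """d[k] = sum_{m>=k} target[m] * h[m-k], i.e. d = H^T target."""
    n = len(target)
    return [sum(target[m] * h[m - k] for m in range(k, n)) for k in range(n)]


def best_single_pulse(target, h):
    """Pulse position k maximizing d[k]^2 / (energy of h[:n-k]), and its gain."""
    d = backward_filter(target, h)
    n = len(target)
    energies = [sum(v * v for v in h[: n - k]) for k in range(n)]
    k = max(range(n), key=lambda i: d[i] ** 2 / energies[i])
    return k, d[k] / energies[k]


lpc = [1.0, -1.2, 0.5]                 # hypothetical A(z)
N = 10
h = enhanced_impulse_response(lpc, N)
target = [0.0] * 3 + [2.0 * v for v in h[: N - 3]]   # twice a pulse at position 3
k, g = best_single_pulse(target, h)
```

Because the target is exactly a pulse at position 3 with amplitude 2 passed through the cascade, the search should return k = 3 and g close to 2.0. The backward-filtered target is computed once per subframe, which is why it is convenient for searching many pulse positions.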

14. A speech encoding method in which a synthesized speech signal is expressed using a synthesis filter and a drive signal for driving the synthesis filter, and information representing the characteristics of the synthesis filter and the drive signal are encoded so as to reduce distortion of the synthesized speech signal with respect to an input speech signal, the method comprising: determining, based on the input speech signal, the characteristics of a perceptual weighting synthesis filter, a pitch emphasis filter and a formant emphasis filter, and a target vector, all used for encoding the drive signal; obtaining a formant-enhanced impulse response using the perceptual weighting synthesis filter, the pitch emphasis filter and the formant emphasis filter; obtaining a corrected target vector using the target vector and the formant-enhanced impulse response; and encoding a noise drive signal using the corrected target vector and the formant-enhanced impulse response.
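
Claim 14 also places a pitch emphasis filter in the cascade. A common form, assumed here rather than taken from the patent, is P(z) = 1/(1 - β z^-T) with pitch lag T and sharpening factor β; folding it into the truncated impulse response gives the recursion h_p[n] = h[n] + β·h_p[n - T]. The check at the end confirms that searching a noise codevector against h_p is equivalent to pitch-sharpening the codevector and filtering it with the original h. The values of h, T and β and the function names are illustrative.

```python
def add_pitch_emphasis(h, lag, beta):
    """Apply P(z) = 1 / (1 - beta z^-lag) to a finite sequence:
    out[n] = h[n] + beta * out[n - lag] (zero initial state)."""
    out = list(h)
    for n in range(lag, len(out)):
        out[n] += beta * out[n - lag]
    return out


def conv_trunc(c, h):
    """First len(c) samples of the convolution c * h (requires len(h) >= len(c))."""
    return [sum(c[k] * h[m - k] for k in range(m + 1)) for m in range(len(c))]


h = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]   # hypothetical formant-enhanced response
h_p = add_pitch_emphasis(h, lag=2, beta=0.5)   # [1.0, 0.5, 0.75, 0.375, 0.4375, 0.21875]
c = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]            # hypothetical noise codevector
via_response = conv_trunc(c, h_p)
via_codevector = conv_trunc(add_pitch_emphasis(c, 2, 0.5), h)
```

Folding P(z) into the impulse response keeps the noise search cost unchanged, since the sharpening is applied once per subframe rather than to every codevector.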