JPH09160595A

JPH09160595A - Voice synthesizing method

Info

Publication number: JPH09160595A
Application number: JP7315431A
Authority: JP
Inventors: Takehiko Kagojima; 岳彦籠嶋; Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-12-04
Filing date: 1995-12-04
Publication date: 1997-06-20

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis method suited to obtaining high-quality synthesized speech in text-to-speech synthesis. SOLUTION: In a method that generates a synthesized speech signal by concatenating information selected from a plurality of prestored pieces of speech synthesis unit information, a formant emphasis filter unit 17 is provided whose filter coefficients 112 are taken from the LPC coefficients, a speech spectrum parameter that also serves as the coefficients of a vocal tract filter unit 16. The filter unit 17 emphasizes the formants of the synthesized speech signal.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech synthesis method that generates a synthesized speech signal from information such as a phoneme symbol string, pitch, and phoneme duration in a text-to-speech synthesis system.

[0002]

2. Description of the Related Art

Artificially generating a speech signal from arbitrary text is called text-to-speech synthesis. A text-to-speech synthesis system usually consists of three elements: a language processing unit, a prosody processing unit, and a speech signal generation unit. The input text first undergoes morphological analysis, syntactic analysis, and the like in the language processing unit. Accent and intonation are then processed in the prosody processing unit, which outputs information such as a phoneme symbol string, pitch, and phoneme duration. Finally, the speech signal generation unit, that is, the speech synthesizer, synthesizes a speech signal from this information. The speech synthesis scheme used for text-to-speech synthesis must therefore be able to synthesize arbitrary phoneme symbol strings.

[0003] The principle of a speech synthesis scheme that can synthesize such arbitrary phoneme symbol strings is to concatenate feature parameters, which are the information of basic speech synthesis units such as syllables, phonemes, and one-pitch-period segments, while controlling pitch and duration. Vocoder schemes and formant synthesis schemes are conventionally known as schemes for speech synthesizers that can synthesize arbitrary phoneme symbol strings by controlling pitch and duration. These schemes synthesize speech by driving a vocal tract filter, which models the vocal tract characteristics, with a drive signal that models the vocal cord signal. Because the modeling accuracy is insufficient, however, the synthesized speech is unclear.

[0004] One method that improves sound quality by raising the modeling accuracy is disclosed, for example, in Japanese Patent Application Laid-Open No. 58-80699, "Voice Synthesis Method". In that method, the vocal tract filter is controlled based on spectral parameters obtained by analyzing natural speech, and the residual waveform obtained by processing the speech signal with the inverse of the vocal tract filter is used as the drive signal for the vocal tract filter.

[0005] FIG. 17 shows the configuration of a conventional speech synthesizer that uses the residual-driven LPC scheme, one example of this approach. The synthesizer consists of a residual waveform storage unit 11, a voiced sound source generation unit 12, an unvoiced sound source generation unit 13, an LPC coefficient storage unit 14, an LPC coefficient interpolation unit 15, and a vocal tract filter unit 16.

[0006] The residual waveform storage unit 11 stores in advance a plurality of residual waveforms as information on a plurality of speech synthesis units, and outputs the one-pitch-period residual waveform 102 selected from among them according to waveform selection information 101. The voiced sound source generation unit 12 repeats the one-pitch-period residual waveform 102 with the frame average pitch 103 as the period and multiplies the repeated waveform by the frame average power 102 to generate a voiced sound source signal 105. The voiced sound source signal 105 is output during voiced intervals, as identified by voiced/unvoiced discrimination information 107, and is input to the vocal tract filter unit 16. The unvoiced sound source generation unit 13 outputs, based on the frame average power 102, an unvoiced sound source signal 106 represented by white noise or the like. The unvoiced sound source signal 106 is output during unvoiced intervals, as identified by the voiced/unvoiced discrimination information 107, and is input to the vocal tract filter unit 16.

[0007] The LPC coefficient storage unit 14 stores a plurality of LPC coefficients as information on another set of speech synthesis units, and selectively outputs one LPC coefficient 109 according to LPC coefficient selection information 108. The LPC coefficient interpolation unit 15 interpolates between the LPC coefficient of the previous frame and the LPC coefficient 109 of the current frame so that the LPC coefficients are not discontinuous across frames, and outputs the LPC coefficient 110.
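The frame-to-frame interpolation performed by the LPC coefficient interpolation unit 15 can be illustrated with a small sketch. The text does not say which interpolation rule is used, so plain linear interpolation of the coefficient vectors is an assumption here, and the function name `interpolate_lpc` and the sample values are hypothetical.

```python
def interpolate_lpc(prev_coeffs, curr_coeffs, num_steps):
    """Linearly interpolate from prev_coeffs toward curr_coeffs.

    Returns num_steps coefficient vectors; the last one equals curr_coeffs.
    Linear interpolation is an assumption, not something the text specifies.
    """
    if len(prev_coeffs) != len(curr_coeffs):
        raise ValueError("coefficient vectors must have the same order")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    result = []
    for step in range(1, num_steps + 1):
        w = step / num_steps  # weight of the current frame, 1.0 at the last step
        result.append([(1.0 - w) * p + w * c
                       for p, c in zip(prev_coeffs, curr_coeffs)])
    return result


# Hypothetical second-order coefficients for two consecutive frames.
previous_frame = [-1.2, 0.8]
current_frame = [-0.4, 0.6]
for coeffs in interpolate_lpc(previous_frame, current_frame, 4):
    print(coeffs)
```

Interpolating direct-form coefficients is the simplest choice for a sketch; practical systems often interpolate in another parameter domain to keep the intermediate filters well behaved.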

[0008] The vocal tract filter unit 16 drives a vocal tract filter whose coefficients are the LPC coefficient 110 with the voiced sound source signal 105 or the unvoiced sound source signal 106, and outputs a synthesized speech signal 111.
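The signal flow of the conventional synthesizer in FIG. 17 can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the all-pole convention 1 / (1 + Σ a_i z^-i), Gaussian noise for the unvoiced source, and a plain gain standing in for the frame average power are all hypothetical, and filter state is not carried across frames.

```python
import random


def all_pole_filter(excitation, a):
    """Run excitation through 1 / (1 + sum_i a[i-1] z^-i) from zero state."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * out[n - i]
        out.append(acc)
    return out


def voiced_source(residual, pitch_period, frame_len, gain):
    """Repeat a one-pitch-period residual every pitch_period samples, scaled by gain."""
    src = [0.0] * frame_len
    for start in range(0, frame_len, pitch_period):
        for k, r in enumerate(residual):
            if start + k < frame_len:
                src[start + k] += gain * r
    return src


def unvoiced_source(frame_len, gain, seed=0):
    """White-noise source; Gaussian noise is an assumption."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(frame_len)]


def synthesize_frame(is_voiced, residual, pitch_period, gain, a, frame_len):
    """Pick the source by the voiced/unvoiced flag and drive the vocal tract filter."""
    if is_voiced:
        src = voiced_source(residual, pitch_period, frame_len, gain)
    else:
        src = unvoiced_source(frame_len, gain)
    return all_pole_filter(src, a)


# Hypothetical frame: 3-sample residual, pitch period 4, first-order filter.
frame = synthesize_frame(True, [1.0, 0.0, 0.0], 4, 2.0, [-0.5], 8)
print(frame)
```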

[0009] In this speech synthesizer, various LPC coefficients obtained in advance by linear prediction analysis of natural speech are stored in the LPC coefficient storage unit 14, and one-pitch-period waveforms cut out from the residual waveforms obtained by inverse filtering with these LPC coefficients are stored in the residual waveform storage unit 11. Because parameters obtained by analyzing natural speech, such as LPC coefficients, are applied to the vocal tract filter and the sound source signal, the modeling accuracy is high and synthesized speech relatively close to natural speech can be obtained.

[0010]

Problems to Be Solved by the Invention

However, even with such high-accuracy modeling, the conventional speech synthesizer described above cannot avoid spectral distortion when it synthesizes speech whose pitch period differs from that of the natural speech analyzed to obtain the LPC coefficients and residual waveforms.

[0011] For example, suppose the spectral envelope of the speech of a certain phoneme is as shown in FIG. 13(a). The power spectrum of the speech signal when that phoneme is uttered at fundamental frequency f is then a discrete spectrum obtained by sampling the spectral envelope at frequency intervals of f, as shown in FIG. 13(b). Similarly, the power spectrum when the phoneme is uttered at fundamental frequency f′ is a discrete spectrum obtained by sampling the envelope at intervals of f′, as shown in FIG. 13(c).
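The sampling effect described here can be made concrete with a short sketch. It samples a hypothetical single-formant envelope at the harmonics of two fundamental frequencies; the resonator model, the 8 kHz sampling rate, the 500 Hz formant, and the function names are all assumptions chosen for illustration.

```python
import cmath
import math


def resonator(formant_hz, radius, fs):
    """Second-order all-pole coefficients with a pole pair at formant_hz."""
    theta = 2.0 * math.pi * formant_hz / fs
    return [-2.0 * radius * math.cos(theta), radius * radius]


def envelope_db(freq_hz, a, fs):
    """Gain in dB of the envelope 1 / (1 + sum_i a[i-1] z^-i) at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs
    denom = 1.0 + sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a, start=1))
    return -20.0 * math.log10(abs(denom))


def harmonic_samples(f0, a, fs):
    """Sample the envelope at the harmonics k * f0 up to the Nyquist frequency."""
    count = int((fs / 2) // f0)
    return [(k * f0, envelope_db(k * f0, a, fs)) for k in range(1, count + 1)]


fs = 8000.0
a = resonator(500.0, 0.95, fs)           # hypothetical formant at 500 Hz
at_100 = harmonic_samples(100.0, a, fs)  # one harmonic lands on 500 Hz
at_150 = harmonic_samples(150.0, a, fs)  # nearest harmonics are 450 Hz and 600 Hz
print(max(db for _, db in at_100), max(db for _, db in at_150))
```

With a fundamental of 100 Hz a harmonic falls on the formant, while with 150 Hz the harmonics straddle it, so the two discrete spectra carry different information about the same envelope.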

[0012] Now consider obtaining the LPC coefficients to be stored in the LPC coefficient storage unit 14 by analyzing speech uttered at fundamental frequency f, which has the spectrum shown in FIG. 13(b), and estimating its spectral envelope. For speech signals it is, in general, impossible in principle to recover the true spectral envelope of FIG. 13(a) from a discrete spectrum such as that of FIG. 13(b). The spectral envelope obtained by analysis, shown by the broken line in FIG. 14(a), may therefore agree with the true envelope at the discrete sample points yet deviate from it at other frequencies. As a result, the estimated envelope can have blunted peaks (formants) compared with the true envelope, as shown in FIG. 14(b). In that case, the spectrum of speech synthesized at a fundamental frequency f′ different from f is blunted compared with the natural speech spectrum of FIG. 13(c), as shown in FIG. 14(c), and the clarity of the synthesized speech deteriorates.

[0013] A further problem is that interpolating parameters such as filter coefficients when speech synthesis units are concatenated averages out and blunts the peaks and valleys of the spectrum, making the synthesized speech unclear. For example, if the frequency characteristics of the LPC coefficients of two consecutive speech synthesis units are as shown in FIGS. 15(a) and 15(b), the frequency characteristic of the filter obtained by interpolating these two sets of filter coefficients may have its peaks and valleys averaged out and blunted, as shown in FIG. 15(c). This, too, can degrade the clarity of the synthesized speech.

[0014] In addition, if the position of the peak of the residual waveform differs from frame to frame, the pitch of the voiced sound source is disturbed. For example, even if the residual waveforms are placed at equal intervals T as shown in FIG. 16, differing peak positions within the individual residual waveforms disturb the pitch harmonics of the synthesized speech signal and degrade sound quality.

[0015] The present invention has been made to solve the above problems, and its object is to provide a speech synthesis method suited to obtaining high-quality synthesized speech in text-to-speech synthesis.

[0016]

Means for Solving the Problems

To achieve the above object, the gist of the present invention is as follows: in a speech synthesis method that generates a synthesized speech signal by concatenating information selected from prestored information on a plurality of speech synthesis units, a formant emphasis filter is provided whose filter coefficients are determined according to the speech spectrum parameters used as the filter coefficients of the vocal tract filter, and this filter emphasizes the formants of the synthesized speech signal.

[0017] That is, in the first speech synthesis method according to the present invention, which generates a synthesized speech signal by concatenating information selected from prestored information on a plurality of speech synthesis units, the prestored speech synthesis unit information includes at least speech spectrum parameters, and a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameters emphasizes the formants of the synthesized speech signal. The blunted spectrum is thereby reshaped, and clear synthesized speech is obtained.

[0018] In the second speech synthesis method according to the present invention, the prestored speech synthesis unit information includes at least speech spectrum parameters and one-pitch-period vocal tract filter drive signals, and a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameters emphasizes the formants of the synthesized speech signal. Clear synthesized speech is thereby obtained with less computation.

[0019] In the third speech synthesis method according to the present invention, the prestored speech synthesis unit information includes at least formant-emphasized versions of one-pitch-period speech waveforms. Clear synthesized speech is thereby obtained without performing formant emphasis processing at synthesis time.

[0020] In the fourth speech synthesis method according to the present invention, the prestored speech synthesis unit information includes at least speech spectrum parameters. A formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameters shapes the formants of the synthesized speech signal, and a pitch emphasis filter whose filter coefficients are determined according to the speech pitch parameter emphasizes the pitch of the synthesized speech signal. The blunted spectrum is thereby reshaped, and at the same time clear, high-quality synthesized speech free of disturbed pitch harmonics is obtained.

[0021] In the fifth speech synthesis method according to the present invention, the prestored speech synthesis unit information includes at least speech spectrum parameters and one-pitch-period vocal tract filter drive signals. A formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameters emphasizes the formants of the synthesized speech signal, and a pitch emphasis filter whose filter coefficients are determined according to the speech pitch parameter emphasizes the pitch of the synthesized speech signal. The blunted spectrum is thereby reshaped with less computation, and at the same time clear, high-quality synthesized speech free of disturbed pitch harmonics is obtained.

[0022] In the sixth speech synthesis method according to the present invention, the prestored speech synthesis unit information includes at least formant-emphasized versions of one-pitch-period speech waveforms, and a pitch emphasis filter whose filter coefficients are determined according to the speech pitch parameter emphasizes the pitch of the synthesized speech signal. High-quality synthesized speech that is clear and free of disturbed pitch harmonics is thereby obtained without performing formant emphasis processing at synthesis time.

[0023]

BEST MODE FOR CARRYING OUT THE INVENTION

(First Embodiment) FIG. 1 shows the configuration of a speech synthesizer according to a first embodiment, to which the first speech synthesis method of the present invention is applied. The synthesizer consists of a residual waveform storage unit 11, a voiced sound source generation unit 12, an unvoiced sound source generation unit 13, an LPC coefficient storage unit 14, an LPC coefficient interpolation unit 15, a vocal tract filter unit 16, and a formant emphasis filter unit 17 newly provided in the present invention.

[0024] The residual waveform storage unit 11 stores in advance, as information on a plurality of speech synthesis units, a plurality of one-pitch-period residual waveforms from which the vocal tract filter drive signal is derived, and outputs the single one-pitch-period residual waveform 102 selected from among them according to waveform selection information 101. The voiced sound source generation unit 12 repeats the one-pitch-period residual waveform 102 with the frame average pitch 103 as the period and multiplies the repeated waveform by the frame average power 102 to generate a voiced sound source signal 105. The voiced sound source signal 105 is output during voiced intervals, as identified by voiced/unvoiced discrimination information 107, and is input to the vocal tract filter unit 16. The unvoiced sound source generation unit 13 outputs, based on the frame average power 102, an unvoiced sound source signal 106 represented by white noise or the like. The unvoiced sound source signal 106 is output during unvoiced intervals, as identified by the voiced/unvoiced discrimination information 107, and is input to the vocal tract filter unit 16.

[0025] The LPC coefficient storage unit 14 stores, as information on another set of speech synthesis units, a plurality of LPC coefficients obtained in advance by linear prediction analysis (LPC analysis) of natural speech, and selectively outputs one LPC coefficient 109 according to LPC coefficient selection information 108. The residual waveform storage unit 11 stores one-pitch-period waveforms cut out from the residual waveforms obtained by inverse filtering with these LPC coefficients. The LPC coefficient interpolation unit 15 interpolates between the LPC coefficient of the previous frame and the LPC coefficient 109 of the current frame so that the LPC coefficients are not discontinuous across frames, and outputs the LPC coefficient 110. The vocal tract filter unit 16 drives a vocal tract filter whose filter coefficients are the LPC coefficient 110 with the input voiced sound source signal 105 or unvoiced sound source signal 106, and outputs a synthesized speech signal 111.

[0026] The formant emphasis filter unit 17 then filters the synthesized speech signal 111 with filter coefficients determined according to the LPC coefficient 112, emphasizing the formants (the peaks of the spectrum), and outputs the formant-emphasized synthesized speech signal 113. A formant emphasis filter needs filter coefficients that follow the speech spectrum parameters. Noting that in this type of speech synthesizer the filter coefficients of the vocal tract filter unit 16 are already set according to the LPC coefficients, which are spectrum parameters, the filter coefficients of the formant emphasis filter unit 17 are set according to the LPC coefficient 112 output from the LPC coefficient interpolation unit 15.

[0027] By emphasizing the formants of the synthesized speech signal 111 with the formant emphasis filter unit 17 in this way, the spectrum blunted by the causes described with reference to FIGS. 13 and 14 is reshaped, and clear synthesized speech can be obtained.

[0028] FIG. 2 shows another configuration example of the voiced sound source generation unit 12, different from the one described above. In FIG. 2, the pitch period storage unit 24 stores the frame average pitch 103 and outputs the frame average pitch 204 of the previous frame. The pitch period interpolation unit 25 interpolates the pitch period so that it changes smoothly from the frame average pitch 204 of the previous frame to the frame average pitch 103 of the current frame, and outputs waveform superposition position designation information 205. The multiplier 21 multiplies the one-pitch-period residual waveform 102 by the frame average power 102 and outputs a one-pitch-period residual waveform 201. The pitch waveform storage unit 22 stores the one-pitch-period residual waveform 201 and outputs the one-pitch-period residual waveform 202 of the previous frame. The waveform interpolation unit 23 interpolates between the one-pitch-period residual waveforms 201 and 202 with weights that follow the waveform superposition position designation information 205, and outputs the interpolated one-pitch-period residual waveform 203. The waveform superposition processing unit 26 superimposes the one-pitch-period residual waveform 203 at the superposition positions designated by the waveform superposition position designation information 205, thereby generating and outputting the voiced sound source signal 105.
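The FIG. 2 configuration can be sketched as follows for one frame. The text does not give the interpolation rules, so the linear glide of the pitch period, the linear cross-fade between the previous and current waveforms, and all function names are assumptions; the input waveforms are taken to be already scaled by the frame average power.

```python
def superposition_positions(prev_period, curr_period, frame_len):
    """Waveform superposition positions within one frame.

    The pitch period glides linearly from prev_period at the frame start
    toward curr_period at the frame end (an assumed interpolation rule).
    """
    if prev_period <= 0 or curr_period <= 0:
        raise ValueError("pitch periods must be positive")
    positions = []
    pos = 0.0
    while pos < frame_len:
        positions.append(int(round(pos)))
        w = pos / frame_len
        pos += (1.0 - w) * prev_period + w * curr_period
    return positions


def voiced_source_frame(prev_wave, curr_wave, prev_period, curr_period, frame_len):
    """Cross-fade the two one-pitch-period waveforms and overlap-add them."""
    n = max(len(prev_wave), len(curr_wave))
    prev = list(prev_wave) + [0.0] * (n - len(prev_wave))
    curr = list(curr_wave) + [0.0] * (n - len(curr_wave))
    out = [0.0] * frame_len
    for pos in superposition_positions(prev_period, curr_period, frame_len):
        w = pos / frame_len  # 0 at the frame start, close to 1 near the frame end
        for k in range(n):
            if pos + k < frame_len:
                out[pos + k] += (1.0 - w) * prev[k] + w * curr[k]
    return out


# Hypothetical frame of 40 samples with the pitch period lengthening from 10 to 20.
print(superposition_positions(10, 20, 40))
```

Placing waveforms at interpolated positions, rather than repeating one waveform at a fixed period, is what lets both the pitch and the waveform shape change gradually across the frame.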

[0029] Next, configuration examples of the formant emphasis filter unit 17 are described. In the first configuration example, the formant emphasis filter is an all-pole filter. Its transfer function is given by the following equation.

[0030]

[Equation 1]

[0031] Here α_i are the LPC coefficients, N is the filter order, and β is a constant with 0 < β < 1. If H(z) denotes the transfer function of the vocal tract filter, then Q₁(z) = H(z/β), so Q₁(z) is H(z) with each pole p_i (i = 1, …, N) replaced by βp_i. In other words, Q₁(z) moves every pole of H(z) toward the origin by the fixed ratio β, so the peaks and valleys of its frequency spectrum are blunted compared with those of H(z). Because Q₁(z) approaches H(z) as β approaches 1, the larger β is, the stronger the formant emphasis.
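As a hedged reconstruction of the relation Q₁(z) = H(z/β), assume the vocal tract filter is the all-pole filter written below. The sign convention of the LPC coefficients is an assumption, since the equation image is not reproduced in this text; with the opposite convention every plus sign in the denominators becomes a minus sign. Substituting z/β for z then gives:

```latex
\begin{align}
H(z)   &= \frac{1}{1 + \sum_{i=1}^{N} \alpha_i z^{-i}}
        = \prod_{i=1}^{N} \frac{1}{1 - p_i z^{-1}}, \\
Q_1(z) &= H(z/\beta)
        = \frac{1}{1 + \sum_{i=1}^{N} \alpha_i \beta^{i} z^{-i}}
        = \prod_{i=1}^{N} \frac{1}{1 - \beta p_i z^{-1}},
\qquad 0 < \beta < 1 .
\end{align}
```

Each coefficient α_i is scaled by β^i, which is why every pole p_i moves to βp_i.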

[0032] In the second configuration example of the formant emphasis filter unit 17, the formant emphasis filter is a cascade of a pole-zero filter and a first-order high-pass filter with fixed characteristics. Its transfer function is given by the following equation.

[0033]

[Equation 2]

[0034] Here γ is a constant with 0 < γ < β, and μ is a constant with 0 < μ < 1. In this case the pole-zero filter performs the formant emphasis, and the first-order high-pass filter compensates for the excess spectral tilt in the frequency characteristic of the pole-zero filter.
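The second equation is not reproduced in this text. One form consistent with the description (a pole-zero filter built from the same LPC coefficients, cascaded with a fixed first-order high-pass filter) is sketched below. It is an assumption rather than the exact expression of the original, and it uses the assumed convention that the vocal tract filter is 1 / (1 + Σ α_i z^{-i}):

```latex
\begin{equation}
Q_2(z) = \frac{1 + \sum_{i=1}^{N} \alpha_i \gamma^{i} z^{-i}}
              {1 + \sum_{i=1}^{N} \alpha_i \beta^{i} z^{-i}}
         \left( 1 - \mu z^{-1} \right),
\qquad 0 < \gamma < \beta < 1, \quad 0 < \mu < 1 .
\end{equation}
```

In this form the poles sit at βp_i and the zeros at γp_i, so the zeros lie closer to the origin than the poles, and the factor 1 − μz^{-1} is the first-order high-pass term.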

[0035] The configuration of the formant emphasis filter unit 17 is not limited to the two examples above. The positions of the vocal tract filter unit 16 and the formant emphasis filter unit 17 may also be swapped: because both are linear systems, the same effect is obtained when their order is exchanged.
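The claim that the two filters may be swapped can be checked numerically for fixed coefficients. The sketch below filters a pulse train in both orders and compares the results; the coefficient values, β = 0.8, the sign convention 1 / (1 + Σ a_i z^-i), and the function names are assumptions made for illustration.

```python
def all_pole_filter(x, a):
    """Run x through 1 / (1 + sum_i a[i-1] z^-i), starting from zero state."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * y[n - i]
        y.append(acc)
    return y


def scale_coefficients(a, beta):
    """Coefficients of the filter with z replaced by z / beta: a_i becomes a_i * beta**i."""
    return [coeff * beta ** i for i, coeff in enumerate(a, start=1)]


vocal_tract = [-1.2, 0.8]                        # hypothetical stable LPC coefficients
emphasis = scale_coefficients(vocal_tract, 0.8)  # hypothetical beta

excitation = [1.0 if n % 16 == 0 else 0.0 for n in range(64)]  # pulse train

tract_then_emphasis = all_pole_filter(all_pole_filter(excitation, vocal_tract), emphasis)
emphasis_then_tract = all_pole_filter(all_pole_filter(excitation, emphasis), vocal_tract)

# The two orders should agree up to floating-point rounding.
max_difference = max(abs(u - v)
                     for u, v in zip(tract_then_emphasis, emphasis_then_tract))
print(max_difference)
```

The check holds for coefficients that stay fixed over the signal; when coefficients change from frame to frame, the two orders are only approximately equivalent.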

[0036] As described above, in the speech synthesizer of this embodiment the formant emphasis filter unit 17 is placed in cascade with the vocal tract filter unit 16 and its filter coefficients are set according to the LPC coefficients. The spectrum of the synthesized speech signal, blunted by the causes described with reference to FIGS. 13 and 14, is thereby reshaped, and clear synthesized speech can be obtained.
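A minimal sketch of the cascade, assuming the all-pole form of the formant emphasis filter and the convention 1 / (1 + Σ a_i z^-i), is given below. It applies the emphasis filter to a frame and compares the peak-to-valley range of the frequency response with and without emphasis. The coefficient values, β = 0.8, and all function names are hypothetical.

```python
import cmath
import math


def scale_coefficients(a, beta):
    """Coefficients after replacing z by z / beta: a_i becomes a_i * beta**i."""
    return [coeff * beta ** i for i, coeff in enumerate(a, start=1)]


def all_pole_filter(x, a):
    """Run x through 1 / (1 + sum_i a[i-1] z^-i), starting from zero state."""
    y = []
    for n, sample in enumerate(x):
        acc = sample
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * y[n - i]
        y.append(acc)
    return y


def all_pole_db(a, omega):
    """Gain in dB of 1 / (1 + sum_i a[i-1] e^{-j*omega*i})."""
    denom = 1.0 + sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(a, start=1))
    return -20.0 * math.log10(abs(denom))


def formant_emphasis(speech, a, beta=0.8):
    """Filter a synthesized frame with the all-pole emphasis filter derived from a."""
    return all_pole_filter(speech, scale_coefficients(a, beta))


def dynamic_range_db(gains):
    """Difference between the highest peak and the deepest valley, in dB."""
    return max(gains) - min(gains)


lpc = [-1.2, 0.8]  # hypothetical vocal tract coefficients
grid = [math.pi * k / 256 for k in range(257)]
tract_only = [all_pole_db(lpc, w) for w in grid]
emphasized = [all_pole_db(lpc, w) + all_pole_db(scale_coefficients(lpc, 0.8), w)
              for w in grid]
print(dynamic_range_db(tract_only), dynamic_range_db(emphasized))
```

The cascade shows a larger peak-to-valley range than the vocal tract filter alone, which is the spectral sharpening this embodiment relies on.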

[0037] (Second Embodiment) FIG. 3 shows the configuration of a speech synthesizer according to a second embodiment of the present invention. In FIG. 3, components bearing the same reference numerals as in FIG. 1 have the same functions as in FIG. 1, and their description is omitted.

[0038] In this embodiment, during unvoiced intervals identified by the voiced/unvoiced discrimination information 107, the unvoiced speech signal 303 is output as in the first embodiment: it is synthesized by the vocal tract filter unit 16, which takes the unvoiced sound source signal generated by the unvoiced sound source generation unit 13 as its drive signal and the LPC coefficient 110 output from the LPC coefficient interpolation unit 15 as its filter coefficients. During voiced intervals identified by the voiced/unvoiced discrimination information 107, on the other hand, processing follows a procedure different from that of the first embodiment, as described below.

[0039] The vocal tract filter unit 31 synthesizes a one-pitch-period speech waveform 301, using the one-pitch-period residual waveform 102 output from the residual waveform storage unit 11 as the vocal tract filter drive signal and the LPC coefficient 109 output from the LPC coefficient storage unit 14 as the filter coefficients. The formant emphasis filter unit 17 filters the one-pitch-period speech waveform 301 with a formant emphasis filter whose filter coefficients 112 are the LPC coefficient 109, emphasizing the formants, and outputs a one-pitch-period speech waveform 302. The one-pitch-period speech waveform 302 is input to the voiced sound generation unit 32.

[0040] The voiced sound generation unit 32 can be realized with the same configuration as the voiced sound source generation unit 12 shown in FIG. 2. When the voiced sound generation unit 32 is realized with the configuration of FIG. 2, however, its input is the one-pitch-period speech waveform 302, whereas the input to the voiced sound source generation unit 12 is the one-pitch-period residual waveform 102. Its output is therefore the voiced speech signal 304 rather than the voiced sound source signal 105. The unvoiced speech signal 303 is then selected in the unvoiced sections determined by the voiced/unvoiced discrimination information 107, the voiced speech signal 304 is selected in the voiced sections, and the result is output as the synthesized speech signal 305.
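The selection step at the end of this paragraph can be sketched as follows. The per-frame boolean flags, the fixed frame length and the function name are assumptions made for illustration; the text does not specify how the voiced/unvoiced discrimination information 107 is represented.

```python
def select_output(voiced_flags, unvoiced_303, voiced_304, frame_len):
    """Build signal 305 frame by frame from the voiced or the unvoiced signal."""
    out = []
    for i, is_voiced in enumerate(voiced_flags):
        source = voiced_304 if is_voiced else unvoiced_303
        out.extend(source[i * frame_len:(i + 1) * frame_len])
    return out


# Toy example (hypothetical data): two frames of two samples each.
flags_107 = [True, False]
signal_305 = select_output(flags_107, [0.0] * 4, [1.0] * 4, frame_len=2)
```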

[0041] According to this embodiment, when a voiced speech signal is synthesized, the vocal tract filter unit 31 and the formant emphasis filter unit 17 need to filter only one pitch period per frame, and interpolation of the LPC coefficients is unnecessary. The same effect as in the first embodiment can therefore be obtained with less computation.

[0042] In this embodiment, formant emphasis is applied only to the voiced speech signal. A configuration is also possible in which a formant emphasis filter unit is provided for the unvoiced speech signal 303 as well, so that formant emphasis is applied to it in the same way as to the voiced speech signal.

[0043] In this embodiment as well, a configuration in which the positions of the formant emphasis filter unit 17 and the vocal tract filter unit 31 are interchanged is possible.

(Third Embodiment) Next, FIG. 4 shows the configuration of a speech synthesizer according to a third embodiment of the present invention. In FIG. 4, components given the same reference numbers as in FIG. 3 have the same functions as in FIG. 3, and their description is omitted.

[0044] Whereas the second embodiment described with reference to FIG. 3 applies formant emphasis to the one-pitch-period speech waveform 301, this embodiment differs in that formant emphasis is applied to the synthesized speech signal 305. This embodiment therefore also provides the same effect as the second embodiment.

[0045] (Fourth Embodiment) Next, FIG. 5 shows the configuration of a speech synthesizer according to a fourth embodiment of the present invention. In FIG. 5, components given the same reference numbers as in FIG. 3 have the same functions as in FIG. 3, and their description is omitted.

[0046] In this embodiment, one-pitch-period speech waveforms are stored in the pitch waveform storage unit 41, and the one-pitch-period speech waveform 302 is selected from the stored waveforms according to the waveform selection information 101 and output. The one-pitch-period speech waveforms stored in the pitch waveform storage unit 41 are waveforms whose formants have been emphasized in advance by the processing shown in FIG. 6.

[0047] That is, the processing performed online in the configuration of FIG. 3 is performed offline in advance in this embodiment, using the configuration of FIG. 6. The vocal tract filter unit 31 synthesizes the synthesized speech signal 301 from the residual waveform and the LPC coefficients output from the residual waveform storage unit 11 and the LPC coefficient storage unit 14, and the formant emphasis filter 112 applies formant emphasis to it. One-pitch-period speech waveforms are obtained in this way for all speech synthesis units and are stored in the pitch waveform storage unit 41. According to this embodiment, the computation required for synthesizing the one-pitch-period speech waveforms and for formant emphasis can therefore be reduced.

[0048] (Fifth Embodiment) Next, FIG. 7 shows the configuration of a speech synthesizer according to a fifth embodiment of the present invention. In FIG. 7, components given the same reference numbers as in FIG. 5 have the same functions as in FIG. 5, and their description is omitted. In this embodiment, the unvoiced speech 303 selected according to the unvoiced speech selection information 601 from the unvoiced speech stored in the unvoiced speech storage unit 42 is output. Compared with the fourth embodiment described with reference to FIG. 5, this embodiment does not need filtering by the vocal tract filter when synthesizing the unvoiced speech signal, so the amount of computation is reduced further.

[0049] (Sixth Embodiment) Next, FIG. 8 shows the configuration of a speech synthesizer according to a sixth embodiment of the present invention. In FIG. 8, components given the same reference numbers as in FIG. 17 have the same functions as in FIG. 17, and their description is omitted.

[0050] This embodiment adds a pitch emphasis filter unit 51 to the configuration of FIG. 17. The pitch emphasis filter unit 51 filters the synthesized speech signal 111 with a pitch emphasis filter whose coefficients are determined according to the frame average pitch 103, thereby emphasizing the pitch, and outputs the synthesized speech signal 501. The pitch emphasis filter unit 51 is realized, for example, by a filter having the following transfer function.

[0051]

(Equation 3)

Here, p is the pitch period, and γ and λ are calculated from the pitch gain according to the following equation.

[0052]

(Equation 4)

[0053] C_z and C_p are constants that control the degree of pitch emphasis and are determined empirically. f(x) is a control factor used to avoid unnecessary pitch emphasis when the signal being processed is an unvoiced speech signal with no periodicity. x corresponds to the pitch gain. When x is smaller than a threshold (typically 0.6), the signal is judged to be unvoiced and f(x) = 0. When x is at or above the threshold, f(x) = x. When x exceeds 1, f(x) = 1 in order to maintain stability. C_g absorbs the variation in filter gain between unvoiced and voiced sounds and is calculated according to the following equation.
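The piecewise definition of the control factor f(x) given in this paragraph can be written directly as a small function. The function and parameter names are chosen for this sketch; the default threshold of 0.6 is the typical value quoted in the text.

```python
def control_factor(x, threshold=0.6):
    """f(x): 0 below the threshold, x between the threshold and 1, and 1 above 1."""
    if x < threshold:
        return 0.0   # judged unvoiced: no pitch emphasis
    if x > 1.0:
        return 1.0   # clamped to keep the filter stable
    return x
```

For example, a pitch gain of 0.3 disables the emphasis entirely, a pitch gain of 0.8 is passed through unchanged, and a pitch gain of 1.5 is clamped to 1.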

[0054]

(Equation 5)
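Equations 3 to 5 are not reproduced in this text, so the following Python sketch assumes one transfer function that is consistent with the symbols described here: H(z) = C_g (1 + γ z^-p) / (1 - λ z^-p), with γ = C_z f(x), λ = C_p f(x) and C_g = (1 - λ/x) / (1 + γ/x). This form, the values chosen for C_z and C_p, and the function names are assumptions made for illustration and should not be read as the patent's own equations.

```python
def control_factor(x, threshold=0.6):
    """f(x) as described in the text: 0 below threshold, x up to 1, then 1."""
    return 0.0 if x < threshold else min(x, 1.0)


def pitch_emphasis(signal, pitch_period, pitch_gain, c_z=0.3, c_p=0.3):
    """Assumed filter: y[n] = C_g * (x[n] + gamma * x[n-p]) + lam * y[n-p]."""
    f = control_factor(pitch_gain)
    gamma, lam = c_z * f, c_p * f
    # Assumed gain compensation; equals 1 when the emphasis is switched off.
    c_g = (1.0 - lam / pitch_gain) / (1.0 + gamma / pitch_gain) if f > 0.0 else 1.0
    out = []
    for n, x_n in enumerate(signal):
        x_delayed = signal[n - pitch_period] if n >= pitch_period else 0.0
        y_delayed = out[n - pitch_period] if n >= pitch_period else 0.0
        out.append(c_g * (x_n + gamma * x_delayed) + lam * y_delayed)
    return out
```

With a pitch gain below the threshold, f(x) = 0 and the sketch passes the signal through unchanged, which matches the stated purpose of f(x) for unvoiced speech.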

[0055] According to this embodiment, the newly provided pitch emphasis filter unit 51 adds a further benefit to that of the embodiments described so far, in which formant emphasis reshapes the blunted spectrum and makes the synthesized speech clearer. The disturbance of the pitch harmonics of the synthesized speech signal, which arises from the cause described with reference to FIG. 15, is reduced, so that synthesized speech of higher quality can be obtained.

[0056] (Seventh Embodiment) Next, FIG. 9 shows the configuration of a speech synthesizer according to a seventh embodiment of the present invention. This embodiment adds the pitch emphasis filter unit 51 described in the sixth embodiment to the speech synthesizer of the first embodiment described with reference to FIG. 1.

[0057] (Eighth Embodiment) Next, FIG. 10 shows the configuration of a speech synthesizer according to an eighth embodiment of the present invention. In FIG. 10, components given the same reference numbers as in FIG. 9 have the same functions as in FIG. 9, and their description is omitted.

[0058] This embodiment adds a gain adjustment unit 61 to the speech synthesizer of the seventh embodiment described with reference to FIG. 9. The gain adjustment unit 61 corrects the overall gain of the formant emphasis filter unit 17 and the pitch emphasis filter unit 51. A multiplier 62 multiplies the output signal of the pitch emphasis filter unit 51 by a predetermined gain so that the power of the final output synthesized speech signal 601 equals the power of the synthesized speech signal 111 output from the vocal tract filter unit 16.
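The power-matching rule of the gain adjustment unit 61 can be sketched as below. Computing the gain over one whole block of samples, and the function name, are assumptions made for illustration; the text does not say over what interval the power is measured.

```python
import math


def adjust_gain(reference_111, filtered):
    """Scale `filtered` so that its power equals that of `reference_111`."""
    power_ref = sum(v * v for v in reference_111)
    power_out = sum(v * v for v in filtered)
    if power_out == 0.0:
        return list(filtered)  # nothing to scale
    gain = math.sqrt(power_ref / power_out)  # the factor applied by multiplier 62
    return [gain * v for v in filtered]


# Toy example (hypothetical data): the filters have doubled the amplitude.
signal_601 = adjust_gain([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0])
```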

[0059] (Ninth Embodiment) Next, FIG. 11 shows the configuration of a speech synthesizer according to a ninth embodiment of the present invention. This embodiment adds the pitch emphasis filter unit 51 to the speech synthesizer of the second embodiment described with reference to FIG. 3.

[0060] (Tenth Embodiment) Next, FIG. 12 shows the configuration of a speech synthesizer according to a tenth embodiment of the present invention. This embodiment adds the pitch emphasis filter unit 51 to the speech synthesizer of the fifth embodiment described with reference to FIG. 5.

[0061]

[Effects of the Invention] As described above, the present invention can generate a synthesized speech signal to which formant emphasis, and additionally pitch emphasis, has been applied, and can thereby provide a speech synthesis method that yields clear, high-quality reproduced speech.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a first embodiment according to the present invention.

FIG. 2 is a block diagram showing a configuration example of the voiced sound source generation unit according to the present invention.

FIG. 3 is a block diagram showing a second embodiment according to the present invention.

FIG. 4 is a block diagram showing a third embodiment according to the present invention.

FIG. 5 is a block diagram showing a fourth embodiment according to the present invention.

FIG. 6 is a block diagram showing an example of a method of generating one-pitch-period speech waveforms according to the present invention.

FIG. 7 is a block diagram showing a fifth embodiment according to the present invention.

FIG. 8 is a block diagram showing a sixth embodiment according to the present invention.

FIG. 9 is a block diagram showing a seventh embodiment according to the present invention.

FIG. 10 is a block diagram showing an eighth embodiment according to the present invention.

FIG. 11 is a block diagram showing a ninth embodiment according to the present invention.

FIG. 12 is a block diagram showing a tenth embodiment according to the present invention.

FIG. 13 is a diagram showing the relationship among the spectrum of a speech signal, its spectral envelope, and the fundamental frequency.

FIG. 14 is a diagram showing the relationship between the spectrum of an analyzed speech signal and the spectrum of speech synthesized with a changed fundamental frequency.

FIG. 15 is a diagram showing the relationship between the frequency characteristics of two synthesis filters and the frequency characteristic of the filter obtained by interpolating them.

FIG. 16 is a diagram showing pitch disturbance in a voiced sound source signal.

FIG. 17 is a block diagram of a conventional speech synthesizer.

[Explanation of symbols]

11 ... residual waveform storage unit
12 ... voiced sound source generation unit
13 ... unvoiced sound source generation unit
14 ... LPC coefficient storage unit
15 ... LPC coefficient interpolation unit
16 ... vocal tract filter unit
17 ... formant emphasis filter unit
41 ... pitch waveform storage unit
42 ... unvoiced speech storage unit
51 ... pitch emphasis filter unit
61 ... gain adjustment unit
62 ... multiplier

Claims

[Claims]

1. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a spectrum parameter of speech, and a formant of the synthesized speech signal is emphasized by a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameter.

2. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a spectrum parameter of speech and a vocal tract filter drive signal of one pitch period, and a formant of the synthesized speech signal is emphasized by a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameter.

3. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a waveform obtained by emphasizing the formants of a one-pitch-period waveform of speech.

4. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a spectrum parameter of speech, a formant of the synthesized speech signal is emphasized by a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameter, and the pitch of the synthesized speech signal is emphasized by a pitch emphasis filter whose filter coefficients are determined according to a pitch parameter of the speech.

5. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a spectrum parameter of speech and a vocal tract filter drive signal of one pitch period, a formant of the synthesized speech signal is emphasized by a formant emphasis filter whose filter coefficients are determined according to the selected spectrum parameter, and the pitch of the synthesized speech signal is emphasized by a pitch emphasis filter whose filter coefficients are determined according to a pitch parameter of the speech.

6. A speech synthesis method for generating a synthesized speech signal by connecting information selected from a plurality of pieces of speech synthesis unit information stored in advance, wherein the speech synthesis unit information includes at least a waveform obtained by emphasizing the formants of a one-pitch-period waveform of speech, and the pitch of the synthesized speech signal is further emphasized by a pitch emphasis filter whose filter coefficients are determined according to a pitch parameter of the speech.