JPH06308997A - Voice synthesizing method - Google Patents

Voice synthesizing method

Info

Publication number
JPH06308997A
Authority
JP
Japan
Prior art keywords
frequency
waveform
pitch
input
pitch frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP5094359A
Other languages
Japanese (ja)
Other versions
JP3317458B2 (en)
Inventor
Hideyuki Mizuno
秀之 水野
Masanobu Abe
匡伸 阿部
Tomohisa Hirokawa
智久 広川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP09435993A
Publication of JPH06308997A
Application granted
Publication of JP3317458B2
Anticipated expiration
Expired - Lifetime

Abstract

PURPOSE: To reduce the quality degradation caused by changing the pitch frequency when speech is synthesized by editing speech waveforms. CONSTITUTION: A speech waveform is selected from a waveform dictionary based on an input phoneme and an input pitch frequency (S1), and a target formant frequency is determined from the pitch frequency of the selected waveform and the input pitch frequency by referring to a conversion table (S2). The pitch frequency of the selected waveform is changed according to the input pitch frequency (S3). The formant frequency of the resulting pitch-modified waveform is then changed to the target formant frequency (S4).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesis method that is applied, for example, to converting arbitrary text into speech. The method selects a speech waveform according to an input phoneme and an input pitch frequency, controls the pitch frequency of the selected waveform according to the input pitch frequency, gives the selected waveform a length according to the input phoneme duration and a power according to the input magnitude, and obtains synthesized speech by successively overlapping and adding the waveforms.

[0002]

[Prior Art] The conventional approach to speech synthesis has generally been to synthesize speech from spectral parameters obtained by LPC analysis or the like and an excitation source signal such as a pulse train. This approach models speech and represents it with a finite number of parameters, so speech can be analyzed and synthesized easily by following a fixed procedure. There is also a method that selects appropriate waveforms from a waveform dictionary and generates synthesized speech while controlling pitch by a waveform overlap-add method (Japanese Patent Laid-Open No. 01-284898). Because this method uses natural speech waveforms as they are, the speech quality is good and speech comparable to natural speech can be obtained.

[0003]

[Problems to Be Solved by the Invention] The former method, which uses spectral parameters and an excitation source signal, suffers from problems such as poor speech quality. The latter, editing-based synthesis does not consider the correlation between pitch frequency and spectral structure, so changing the pitch frequency degrades the quality of part of the speech.

[0004] An object of the present invention is to provide a speech synthesis method that generates synthesized speech with pitch control by waveform overlap-add while keeping the quality degradation caused by changing the pitch frequency small.

[0005]

[Means for Solving the Problems] According to the present invention, in a method of generating synthesized speech with pitch control by waveform overlap-add, a target formant frequency is determined according to the input pitch frequency and the pitch frequency of the selected waveform, and the formant frequency of the pitch-controlled speech waveform is converted according to the target formant frequency.

[0006]

[Embodiment] FIG. 1 shows a flowchart of an embodiment of the method of the present invention. In this invention a phoneme (the input phoneme) and a pitch frequency (the input pitch frequency) are input. When text is converted into speech, for example, the text is analyzed into a phoneme sequence, and the following are then set: a pitch pattern that takes into account the pitch frequency of each phoneme and the continuity of pitch frequency between phonemes, a phoneme duration for each phoneme, and a power pattern that takes into account the power (magnitude) of each phoneme and its continuity between phonemes.

[0007] The essential part of the present invention mainly concerns each input phoneme of the phoneme sequence and the input pitch frequency for that phoneme. First, in the waveform selection step S1, a waveform is selected from the waveform dictionary based on the input phoneme and the input pitch frequency, using a suitable evaluation function. For the evaluation function and the selection procedure, one can use, for example, the evaluation-function-based waveform selection method of Hirokawa et al. (reference: "Waveform selection method in waveform-editing rule-based speech synthesis," Acoustical Society of Japan meeting proceedings 1-2-21 (1988-10)). This method takes parameters such as the target phoneme type, the pitch within the phoneme, the duration, and the phoneme match rate, and, based on an evaluation function W such as equation (1), selects from the waveform dictionary the optimal waveform, that is, the waveform with the smallest W.
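The selection rule described here (pick the dictionary waveform that minimizes a weighted sum of absolute parameter differences) can be sketched as follows. This is only an illustrative sketch: the function names, the dictionary layout, and the idea that every parameter is already encoded as a number are assumptions, not details given in the patent or in the cited reference.

```python
from typing import Dict, Sequence


def selection_cost(target: Sequence[float],
                   candidate: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted sum of absolute differences between target and candidate parameters (W)."""
    return sum(w * abs(t - s) for w, t, s in zip(weights, target, candidate))


def select_waveform(target: Sequence[float],
                    dictionary: Dict[str, Sequence[float]],
                    weights: Sequence[float]) -> str:
    """Return the key of the dictionary entry with the smallest cost W."""
    return min(dictionary,
               key=lambda name: selection_cost(target, dictionary[name], weights))


# Hypothetical entries: each waveform is described by (pitch in Hz, duration in ms).
example_dictionary = {"a_low": [100.0, 90.0], "a_high": [125.0, 85.0]}
best = select_waveform([120.0, 80.0], example_dictionary, weights=[1.0, 0.5])
```

In this toy dictionary the costs are 25.0 for `a_low` and 7.5 for `a_high`, so `a_high` is selected.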

[0008]

$$W = \sum_{i=1}^{n} \omega_i \left| \mathit{Pi}_t - \mathit{Pi}_s \right| \qquad (1)$$

where $\omega_i$ is the weighting coefficient of the i-th parameter, $\mathit{Pi}_t$ is the target (input) value of the i-th parameter, and $\mathit{Pi}_s$ is the actual value of the i-th parameter (the value for a waveform in the waveform dictionary).

Next, in the formant frequency setting step S2, the target formant frequency is determined from the input pitch frequency and the pitch frequency of the selected waveform, using a table such as that shown in FIG. 2. For example, if the pitch frequency of the selected waveform is P1 ($P_{i-1} \le P1 < P_i$) and it is to be changed to the input pitch frequency P2 ($P_{j-1} \le P2 < P_j$), the target first formant frequency $F1_t$ is set from FIG. 2 as

$$F1_t = F1_0 + f1_{ij} \ \text{[Hz]} \qquad (2)$$

where $F1_0$ is the first formant frequency of the selected waveform. Such tables are prepared, for example, for the first through third formants, which are thought to be strongly affected by pitch, and the target formant frequency of each formant is determined by consulting its table with the pitch frequency of the selected waveform and the input pitch frequency.
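The table lookup of step S2 can be sketched as below. The pitch bin edges and the offset values are made-up placeholders (the actual contents of FIG. 2 are not reproduced in this text), and the function names are hypothetical; only the rule "find bin i of the selected pitch, bin j of the input pitch, then add the offset f1_ij to the selected waveform's first formant" comes from the description.

```python
import bisect
from typing import List

# Hypothetical bin edges P_0..P_3 in Hz; bin i covers P_(i-1) <= pitch < P_i.
PITCH_EDGES_HZ: List[float] = [50.0, 100.0, 200.0, 400.0]

# Hypothetical first-formant offsets f1_ij in Hz.
# Row: bin i of the selected waveform's pitch. Column: bin j of the input pitch.
F1_OFFSETS_HZ: List[List[float]] = [
    [0.0, 20.0, 40.0],
    [-20.0, 0.0, 25.0],
    [-40.0, -25.0, 0.0],
]


def pitch_bin(pitch_hz: float) -> int:
    """Return the 1-based bin index i with P_(i-1) <= pitch_hz < P_i."""
    index = bisect.bisect_right(PITCH_EDGES_HZ, pitch_hz)
    if not 1 <= index <= len(PITCH_EDGES_HZ) - 1:
        raise ValueError(f"pitch {pitch_hz} Hz is outside the table range")
    return index


def target_first_formant(f1_selected_hz: float,
                         selected_pitch_hz: float,
                         input_pitch_hz: float) -> float:
    """Equation (2): target F1 = F1 of the selected waveform + table offset f1_ij."""
    i = pitch_bin(selected_pitch_hz)
    j = pitch_bin(input_pitch_hz)
    return f1_selected_hz + F1_OFFSETS_HZ[i - 1][j - 1]


f1_target = target_first_formant(700.0, selected_pitch_hz=120.0, input_pitch_hz=250.0)
```

With these placeholder values the selected pitch falls in bin 2 and the input pitch in bin 3, so the offset is 25 Hz and `f1_target` is 725.0. Separate tables of the same shape would be used for the second and third formants.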

[0009] In the pitch control step S3, the pitch frequency of the selected waveform is changed according to the input pitch frequency. As the pitch modification technique one can use, for example, the waveform overlap-add method of Hirokawa et al. (reference: "A study of pitch control in waveform-editing rule-based synthesis," Acoustical Society of Japan meeting proceedings 1-4-7 (1990-3)). In this technique, as shown in FIG. 3, with T denoting the input pitch period, a window 11 is used that has a length of 2T and tapers gradually toward its front and rear ends from its center. The selected waveform 12 is cut out one pitch at a time with window 11, that is, a segment is cut out with window 11 at every pitch period of waveform 12, centered on its large peak, and the segments are overlapped at the input pitch period T to obtain waveform 13, which has the input pitch period.
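The windowing and overlap step can be sketched in plain Python as follows. The window shape (a Hann taper), the use of integer sample positions, and the assumption that the pitch peak positions are already known are illustrative choices, not details taken from the patent or the cited reference.

```python
import math
from typing import List, Sequence


def tapered_window(length: int) -> List[float]:
    """Hann-shaped taper: 0 at the start, 1 at the centre sample length // 2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def overlap_add(waveform: Sequence[float],
                peak_positions: Sequence[int],
                target_period: int) -> List[float]:
    """Cut a 2T-long windowed segment around each pitch peak of `waveform`
    and lay the segments down again spaced by the target period T (samples)."""
    window = tapered_window(2 * target_period)
    output = [0.0] * (target_period * (len(peak_positions) + 1))
    for k, peak in enumerate(peak_positions):
        out_start = k * target_period
        for n in range(2 * target_period):
            src = peak - target_period + n  # window centre sits on the peak
            if 0 <= src < len(waveform):
                output[out_start + n] += window[n] * waveform[src]
    return output


# Toy input: unit impulses every 20 samples, re-spaced to a period of 10 samples.
source = [0.0] * 60
for position in (10, 30, 50):
    source[position] = 1.0
shortened = overlap_add(source, peak_positions=[10, 30, 50], target_period=10)
```

In the toy input the peaks are 20 samples apart; after the call they are 10 samples apart (at output samples 10, 20 and 30), which corresponds to doubling the pitch frequency.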

[0010] Next, in the formant conversion step S4, the pitch-controlled waveform obtained in step S3 is converted into a waveform whose formant frequencies equal the target formant frequencies determined in the formant frequency setting step S2. Here the formants are changed using, for example, the "formant control method" (Japanese Patent Application No. 4-261825). That is, as shown in FIG. 4, the spectral envelope (spectral density function) $P(w)$ of the selected waveform whose pitch was controlled in the pitch control step S3 is extracted ($S_1$), the speech waveform is converted into a spectrum $X(w)$ by the fast Fourier transform (FFT) ($S_2$), and the formants are extracted ($S_3$). The frequencies of the extracted formants are changed to the desired frequencies, that is, to the target formant frequencies determined in the formant frequency setting step S2 ($S_4$). For the pitch-controlled selected waveform, the spectral envelope corresponding to a speech waveform whose formant frequencies have been changed to the target frequencies is then obtained ($S_5$). The distortion $D$ between the spectral density $P'(2\pi\Delta T \cdot F'_i)$ at the converted formant frequency $F'_i$ and the desired spectral density $P_t(2\pi\Delta T \cdot F'_i)$ is computed ($S_6$); if $D$ is not sufficiently small ($S_7$), the bandwidth of the formant being changed is modified and the process returns to step $S_5$ ($S_8$). Once the distortion $D$ is judged sufficiently small in step $S_6$, the spectrum $X'(w)$ corresponding to the spectral envelope $P'(w)$ finally obtained in step $S_5$ is computed from $P(w)$ and $X(w)$ ($S_9$), and the spectral distortion $d$ between $X'(w)$ and $X(w)$ is computed ($S_{10}$). If $d$ is not yet sufficiently small ($S_{11}$), $X'(w)$ is taken as the new $X(w)$, the spectral envelope $P''(w)$ of $X'(w)$ is taken as the new $P'(w)$, and the process returns to step $S_9$ ($S_{12}$). This is repeated until $d$ becomes sufficiently small, at which point the $X'(w)$ last obtained in step $S_9$ is converted back into a speech waveform by the inverse FFT ($S_{13}$).
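The first inner loop of FIG. 4 (obtain the envelope, measure the distortion D at the formant, adjust the bandwidth, repeat) can be sketched as a small control-flow skeleton. Everything concrete here is an assumption made for illustration: the text does not define D or say how the bandwidth is updated, so the sketch uses the absolute difference for D, a multiplicative bandwidth update, and a toy envelope model in which the spectral density at the formant falls as the bandwidth grows.

```python
from typing import Callable, Tuple


def fit_bandwidth(density_at_formant: Callable[[float], float],
                  target_density: float,
                  bandwidth_hz: float,
                  tolerance: float = 1e-3,
                  max_iterations: int = 50) -> Tuple[float, float]:
    """Adjust the formant bandwidth until the distortion D is below `tolerance`.

    `density_at_formant(bandwidth)` stands in for recomputing the envelope
    and reading its value at the converted formant frequency.
    """
    distortion = float("inf")
    for _ in range(max_iterations):
        density = density_at_formant(bandwidth_hz)      # envelope value at the formant
        distortion = abs(density - target_density)      # assumed definition of D
        if distortion < tolerance:                      # D small enough: stop
            break
        # Assumed update: widen the bandwidth when the peak is too high,
        # narrow it when the peak is too low.
        bandwidth_hz *= density / target_density
    return bandwidth_hz, distortion


def toy_density(bandwidth_hz: float) -> float:
    """Toy model (assumption): peak density inversely proportional to bandwidth."""
    return 100.0 / bandwidth_hz


fitted_bandwidth, final_distortion = fit_bandwidth(toy_density,
                                                   target_density=2.0,
                                                   bandwidth_hz=80.0)
```

With the toy model the loop moves from 80 Hz to 50 Hz in one update and then stops with zero distortion. The second loop of FIG. 4 (iterating on the spectral distortion d) has the same shape, with the spectrum recomputation in place of the bandwidth update.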

[0011] In this way, the pitch frequency of the selected waveform is changed to the input pitch frequency, and the formant frequencies of the pitch-modified waveform are changed to the target formant frequencies, yielding a speech waveform. The length and power of each phoneme of this waveform are then controlled according to the phoneme durations set for the input text and the power pattern that takes into account the power of each phoneme and its continuity between phonemes, and the waveforms are successively overlapped and added to obtain the synthesized speech.

[0012] The present invention is not limited to converting text into speech; it is applicable whenever speech is synthesized by selecting speech waveforms according to an input phoneme and an input pitch frequency.

[0013]

[Effects of the Invention] As described above, according to the present invention, the pitch frequency of the speech waveform is set to the target frequency while the correlation between the pitch frequency and the spectral structure of natural speech is taken into account: the formant frequencies, and hence the spectrum, are changed according to both pitch frequencies. As a result, speech with a modified pitch frequency is obtained without impairing speech quality, and synthesized speech of near-natural quality can be obtained.

[Brief Description of the Drawings]

FIG. 1 is a flowchart showing an embodiment of the present invention.

FIG. 2 is a diagram showing a formant frequency change table.

FIG. 3 is a waveform diagram outlining the pitch frequency changing process.

FIG. 4 is a flowchart showing an example of the formant conversion process.

Claims (1)

[Claims]

1. A speech synthesis method in which a speech waveform is selected according to an input phoneme and an input pitch frequency, the pitch frequency of the selected speech waveform is controlled according to the input pitch frequency, the selected speech waveform is given a length according to an input phoneme duration and a power according to an input magnitude, and the waveforms are successively overlapped and added to obtain synthesized speech, the method comprising:
a formant frequency setting step of determining a target formant frequency according to the input pitch frequency and the pitch frequency of the selected waveform; and
a formant conversion step of converting the formant frequency of the pitch-controlled speech waveform according to the target formant frequency.
JP09435993A 1993-04-21 1993-04-21 Voice synthesis method Expired - Lifetime JP3317458B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP09435993A JP3317458B2 (en) 1993-04-21 1993-04-21 Voice synthesis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP09435993A JP3317458B2 (en) 1993-04-21 1993-04-21 Voice synthesis method

Publications (2)

Publication Number Publication Date
JPH06308997A (en) 1994-11-04
JP3317458B2 JP3317458B2 (en) 2002-08-26

Family

ID=14108110

Family Applications (1)

Application Number Title Priority Date Filing Date
JP09435993A Expired - Lifetime JP3317458B2 (en) 1993-04-21 1993-04-21 Voice synthesis method

Country Status (1)

Country Link
JP (1) JP3317458B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1239457A2 (en) * 2001-03-09 2002-09-11 Yamaha Corporation Voice synthesizing apparatus

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1239457A2 (en) * 2001-03-09 2002-09-11 Yamaha Corporation Voice synthesizing apparatus
EP1239457A3 (en) * 2001-03-09 2003-11-12 Yamaha Corporation Voice synthesizing apparatus
US7065489B2 (en) 2001-03-09 2006-06-20 Yamaha Corporation Voice synthesizing apparatus using database having different pitches for each phoneme represented by same phoneme symbol

Also Published As

Publication number Publication date
JP3317458B2 (en) 2002-08-26

Similar Documents

Publication Publication Date Title
US7792672B2 (en) Method and system for the quick conversion of a voice signal
US8996378B2 (en) Voice synthesis apparatus
US20130311189A1 (en) Voice processing apparatus
JP3450237B2 (en) Speech synthesis apparatus and method
JP3732793B2 (en) Speech synthesis method, speech synthesis apparatus, and recording medium
JP2002268658A (en) Device, method, and program for analyzing and synthesizing voice
JP2904279B2 (en) Voice synthesis method and apparatus
JP3317458B2 (en) Voice synthesis method
JP2600384B2 (en) Voice synthesis method
JP4451665B2 (en) How to synthesize speech
JPH09319391A (en) Speech synthesizing method
JP3197975B2 (en) Pitch control method and device
JP3727885B2 (en) Speech segment generation method, apparatus and program, and speech synthesis method and apparatus
JP4468506B2 (en) Voice data creation device and voice quality conversion method
US7130799B1 (en) Speech synthesis method
JPH09179576A (en) Voice synthesizing method
JP2615856B2 (en) Speech synthesis method and apparatus
JP2005024794A (en) Method, device, and program for speech synthesis
JPH07261798A (en) Voice analyzing and synthesizing device
JP2755478B2 (en) Text-to-speech synthesizer
JP3241582B2 (en) Prosody control device and method
JPH06175675A (en) Method for controlling continuance time length of voice synthesizing device
JPH09160595A (en) Voice synthesizing method
JP2016050994A (en) Acoustic processing device
JP2003255977A (en) Phoneme expanding and compressing method

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090614

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100614

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110614

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120614

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130614

Year of fee payment: 11

EXPY Cancellation because of completion of term