JP3437472B2

JP3437472B2 - Speech synthesis method and apparatus

Info

Publication number: JP3437472B2
Application number: JP37075098A
Authority: JP
Inventors: 亮望月; 洋文西村; 利光蓑輪
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-12-25
Filing date: 1998-12-25
Publication date: 2003-08-18
Anticipated expiration: 2018-12-25
Also published as: JP2000194390A

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field of the Invention]

The present invention relates to a speech synthesis method for high-quality speech synthesis, and more particularly to a speech synthesis method and apparatus in which the degradation of sound quality caused by changing the fundamental frequency is not noticeable.

【０００２】[0002]

[Prior Art]

As a conventional method for determining the fundamental frequency pattern in speech synthesis, a method is known, as described in JP-A-5-88690, in which the fundamental frequency of each syllable is set in consideration of the influence of the phonological environment, and these per-syllable fundamental frequencies are concatenated to generate the fundamental frequency pattern of a phrase or sentence.

【０００３】[0003]

[Problems to Be Solved by the Invention]

However, the conventional method determines the target fundamental frequency pattern without considering the fundamental frequencies of the speech units used for synthesis. Depending on the speech units used at synthesis time, the fundamental frequency change rate therefore becomes large, which causes degradation of sound quality.

【０００４】[0004]

The present invention solves this problem. Its object is to provide a speech synthesis method and apparatus that reduce the sound quality degradation caused by fundamental frequency changes, by determining the target fundamental frequency pattern so that the fundamental frequency change rate of the speech units is small, within a range in which the intonation of the speech to be synthesized does not become unnatural.

【０００５】[0005]

[Means for Solving the Problems]

To solve the above problem, the present invention first prepares a representative fundamental frequency pattern capable of expressing the intonation of the speech to be synthesized. Next, it sets a range within which this pattern can be varied without affecting the intonation. Within that range, it determines the target fundamental frequency pattern so that the speech units used for synthesis can be synthesized with as small a fundamental frequency change rate as possible. This yields a speech synthesis method that expresses the intonation of the target speech while reducing the sound quality degradation caused by fundamental frequency modification.
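The selection step described in this paragraph can be written as a small constrained minimization. The following is only one possible formalization: the patent gives no formula, so the symbols, the relative-difference definition of the change rate, and the per-position independence are all assumptions introduced here for illustration.

```latex
% Assumed notation (not from the patent):
%   r_j        representative fundamental frequency pattern at position j
%   [L_j, U_j] allowable range around r_j that leaves the intonation natural
%   f_j        natural fundamental frequency of the speech unit at position j
%   p_j        target fundamental frequency to be determined at position j
\min_{p_1,\dots,p_N} \; \sum_{j=1}^{N} \frac{\lvert p_j - f_j \rvert}{f_j}
\quad \text{subject to} \quad L_j \le p_j \le U_j , \qquad L_j \le r_j \le U_j .
% If the positions are treated independently, each term is minimized by
% clamping the unit's own value into the allowable range:
p_j^{*} = \min\bigl(\max(f_j,\, L_j),\, U_j\bigr)
```

Under these assumptions a unit whose natural fundamental frequency already lies inside the allowable range needs no modification at all, which is the case the method tries to make as common as possible.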

【０００６】[0006]

[Embodiments of the Invention]

The invention according to claim 1 is a waveform-superposition speech synthesis method that synthesizes speech by converting the fundamental frequencies of speech units to the fundamental frequency pattern of the target speech, characterized in that, when the fundamental frequency pattern is determined, an allowable range is provided for the pattern within a range in which the intonation does not become unnatural, the pattern is determined within the allowable range so that the fundamental frequencies of the speech units are equal at each connection position between speech units, and the speech is synthesized. By determining the target fundamental frequency pattern so that the fundamental frequency change rate of the speech units used for synthesis is small, the method suppresses the sound quality degradation caused by fundamental frequency changes.

【０００７】[0007]

The invention according to claim 2 is the speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is set only for the steady sections of vowel parts, and a fundamental frequency pattern is determined that makes the fundamental frequency change rate in those steady sections as small as possible. When, for example, VCV units are used as the synthesis units, determining the pattern from the vowel parts alone allows the target fundamental frequency pattern to be set with simple calculation.

【０００８】[0008]

The invention according to claim 3 is the speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is determined for each type of phoneme in the speech to be synthesized. Setting the allowable range of the target fundamental frequency for each phoneme type makes it possible to reproduce a fundamental frequency pattern that takes the influence of phoneme differences into account.

【０００９】[0009]

The invention according to claim 4 is the speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is determined for each syllable position in the speech to be synthesized. The allowable range is set separately for syllables in which variation of the fundamental frequency pattern strongly affects the intonation, such as the word-initial syllable and the syllable containing the accent nucleus, and for the other syllables. This makes sound quality degradation due to fundamental frequency changes less noticeable while yielding a fundamental frequency pattern with natural intonation.

【００１０】[0010]

The invention according to claim 5 is a waveform-superposition speech synthesis apparatus that synthesizes speech by converting the fundamental frequencies of speech units to the fundamental frequency pattern of the target speech, comprising: a speech unit database that stores speech units in units such as CV, VCV, and CVC; character string input means for inputting the reading of the speech to be synthesized as a character string; speech unit search means for searching the speech unit database for candidate speech units based on the character string; fundamental frequency pattern determining means for providing an allowable range for the fundamental frequency pattern within a range that does not affect the intonation of the target speech, and for determining the pattern within the allowable range so that the fundamental frequencies of the speech units are equal at the connection positions of the speech units; fundamental frequency changing means for changing the fundamental frequency of each speech unit according to the pattern; speech unit connecting means for connecting the speech units whose fundamental frequencies have been changed; and synthesized speech output means for outputting the synthesized speech obtained by connecting the speech units. Even when the speech units stored in the database do not cover a sufficient variety of fundamental frequency patterns, the target pattern can be set freely within the range in which the intonation does not become unnatural, which reduces the fundamental frequency change rate. Synthesis with little sound quality degradation is therefore possible without increasing the number of speech units in the database.

【００１１】[0011]

Embodiments of the present invention are described below with reference to FIGS. 1 to 5.

(Embodiment 1) First, Embodiment 1, which corresponds to the invention according to claim 1, is described in detail. FIG. 1 is a conceptual diagram of the speech synthesis method according to claim 1. In FIG. 1, 110 is a representative fundamental frequency pattern expressing the intonation of the speech to be synthesized; 111 and 112 are the upper and lower limits of the allowable range within which the fundamental frequency pattern can be set without making the intonation unnatural; 113 is the allowable range within which the pattern can be changed without making the intonation unnatural; 121 to 124 are the fundamental frequencies of the speech units used for synthesis; 130 is the fundamental frequency pattern set so that the fundamental frequency change rate of each speech unit is small; and 141 to 144 are the connection positions of the speech units.

【００１２】[0012]

Next, the operation is described with reference to FIG. 1. In waveform-superposition synthesis, the fundamental frequency patterns 121 to 124 of the speech units are changed to the fundamental frequency pattern 110 of the target speech in order to give the synthesized speech appropriate intonation. The representative pattern 110 may be a fundamental frequency pattern obtained from real speech, or one generated with a model based on the speech production process. However, if the patterns 121 to 124 of the speech units are changed to match the reference pattern 110 exactly, the fundamental frequency change rate can become large and cause sound quality degradation. The method of the present invention therefore provides an allowable range 113 within which the fundamental frequency pattern can be set without making the intonation unnatural. Within this allowable range 113, a fundamental frequency pattern 130 is determined that makes the fundamental frequency change rate of each speech unit as small as possible. In doing so, the pattern is set so that it is not discontinuous at the connection positions 141 to 144 of the speech units.
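The procedure of FIG. 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Junction` type, the relative-difference definition of the change rate, and the midpoint-then-clamp rule are assumptions chosen for illustration. At each connection position both adjoining units are moved to a single shared target value, which keeps the pattern continuous, and that value is kept inside the allowable range.

```python
from dataclasses import dataclass


@dataclass
class Junction:
    """One connection position (141 to 144 in FIG. 1). All values in Hz."""
    reference: float   # representative pattern 110 at this position
    lower: float       # lower limit 112 of the allowable range
    upper: float       # upper limit 111 of the allowable range
    left_unit: float   # natural F0 at the end of the preceding unit
    right_unit: float  # natural F0 at the start of the following unit


def change_rate(unit_f0: float, target_f0: float) -> float:
    """Relative F0 change needed to move a unit to the target (assumed definition)."""
    return abs(target_f0 - unit_f0) / unit_f0


def choose_target(j: Junction) -> float:
    """Pick one shared target F0 inside the allowable range.

    The midpoint of the two unit values minimises the summed squared
    deviation; clamping it to [lower, upper] enforces the intonation
    constraint. Both units are moved to this single value, so the
    resulting pattern is continuous at the junction.
    """
    midpoint = (j.left_unit + j.right_unit) / 2.0
    return min(max(midpoint, j.lower), j.upper)


def total_change(j: Junction, target: float) -> float:
    """Sum of the change rates of the two units that meet at the junction."""
    return change_rate(j.left_unit, target) + change_rate(j.right_unit, target)


if __name__ == "__main__":
    j = Junction(reference=200.0, lower=180.0, upper=220.0,
                 left_unit=150.0, right_unit=170.0)
    target = choose_target(j)
    # Compare the clamped target with forcing the reference pattern exactly.
    print(target, total_change(j, target), total_change(j, j.reference))
```

In this example both units lie below the allowable range, so the target settles on the lower limit, and the total change is smaller than it would be if the units were forced onto the reference value.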

【００１３】[0013]

As described above, according to Embodiment 1 of the present invention, the fundamental frequency pattern of the target speech is determined so that the fundamental frequency change rate of each speech unit is small, within the allowable range in which the intonation does not become unnatural. This reduces the sound quality degradation caused by fundamental frequency changes.

【００１４】[0014]

(Embodiment 2) Next, Embodiment 2, which corresponds to the invention according to claim 2, is described in detail. FIG. 2 is a conceptual diagram of the speech synthesis method according to claim 2. In FIG. 2, 201 is the fundamental frequency pattern of the target speech; 202 is the allowable range for setting the fundamental frequency pattern; 211 to 214 are the fundamental frequency patterns of the speech units used for synthesis; and 221 to 224 are the differences (fundamental frequency change rates) between the fundamental frequency at the vowel positions of the speech units and the fundamental frequency of the target speech.

【００１５】[0015]

Next, the operation is described with reference to FIG. 2, using speech units whose connection unit is VCV (vowel-consonant-vowel) as an example. As in Embodiment 1, the target fundamental frequency pattern is set so that the fundamental frequency change rate is as small as possible within the range in which the intonation does not become unnatural. However, when VCV speech units are used for synthesis, it is difficult to define the fundamental frequency change rate over the whole unit: an unvoiced consonant has no fundamental frequency (211 and 214 in FIG. 2), and the fundamental frequency of a voiced consonant is unstable (212 and 213 in FIG. 2). When VCV speech units are used for synthesis, the invention according to claim 2 therefore considers only the fundamental frequency change rates (221 to 224) in the steady sections of the preceding and following vowels, and sets the target fundamental frequency pattern so that the change rate in these vowel steady sections is small. As shown in FIG. 2, considering only one point per vowel makes the fundamental frequency pattern easy to set.

【００１６】[0016]

As described above, according to Embodiment 2 of the present invention, considering the fundamental frequency change rate at only one point each in the preceding and following vowels of a VCV unit makes the target fundamental frequency pattern easy to set. The above description covers the case where VCV is the connection unit and the change rate is determined at one point per vowel to synthesize word speech, but a natural fundamental frequency pattern can be set in the same way when synthesizing phrases or sentences.
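The vowel-only evaluation of Embodiment 2 might look like the following sketch. The frame-based contour representation, the use of `None` for frames without a fundamental frequency, the median as the "one point" per vowel, and the function names are all assumptions made for illustration; the patent does not specify how the steady-section value is measured.

```python
from statistics import median
from typing import Optional, Sequence


def vowel_steady_f0(contour: Sequence[Optional[float]], start: int, end: int) -> float:
    """Median F0 (Hz) of the voiced frames in contour[start:end].

    None marks a frame without F0, for example inside an unvoiced consonant.
    """
    voiced = [f for f in contour[start:end] if f is not None]
    if not voiced:
        raise ValueError("no voiced frames in the given vowel section")
    return median(voiced)


def vowel_change_rate(unit_f0: float, target_f0: float) -> float:
    """Relative difference between a unit's vowel F0 and the target (assumed definition)."""
    return abs(target_f0 - unit_f0) / unit_f0


if __name__ == "__main__":
    # Hypothetical VCV unit: vowel frames, an unvoiced consonant, vowel frames.
    contour = [120.0, 121.0, 119.0, None, None, 131.0, 130.0, 129.0]
    preceding = vowel_steady_f0(contour, 0, 3)
    following = vowel_steady_f0(contour, 5, 8)
    print(preceding, following, vowel_change_rate(preceding, 126.0))
```

Because only two numbers per unit enter the comparison, the consonant portion, where the fundamental frequency is missing or unstable, never has to be evaluated.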

【００１７】[0017]

(Embodiment 3) Next, Embodiment 3, which corresponds to the invention according to claim 3, is described in detail. FIG. 3 is a conceptual diagram of the speech synthesis method according to claim 3. In FIG. 3, 310 and 320 are representative fundamental frequency patterns of the target speech, and 311 to 313 and 321 to 323 are the ranges within which the fundamental frequency may be set.

【００１８】[0018]

Next, the operation is described with reference to FIG. 3. When the words "henka" (変化, "change") and "bunka" (文化, "culture") shown in FIG. 3 are synthesized, representative fundamental frequency patterns 310 and 320 of the same shape are used, because both words have the same three-mora type-1 accent. Even for words with the same accent type, however, the fundamental frequency pattern can differ because of differences in phonemes. (In general, the fundamental frequency tends to rise from a low frequency with a voiced consonant and to start from a high frequency with an unvoiced consonant.) For this reason, for "henka", whose first syllable "he" (へ) has an unvoiced consonant, and "bunka", whose first syllable "bu" (ぶ) has a voiced consonant, the influence of each consonant is taken into account: the allowable range 311 is set higher for "he", and the allowable range 321 is set lower for "bu".
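As a rough illustration of a phoneme-dependent allowable range, the sketch below shifts the range upward for a syllable with an unvoiced consonant and downward for one with a voiced consonant. The half-width, the shift amounts, and the names `HALF_WIDTH`, `SHIFT`, and `allowable_range` are hypothetical; the patent states only the direction of the adjustment, not its size.

```python
from typing import Tuple

# Hypothetical settings, not taken from the patent.
HALF_WIDTH = 0.10  # half-width of the range, relative to the reference F0
SHIFT = {"unvoiced": +0.05, "voiced": -0.05, "none": 0.0}


def allowable_range(reference_hz: float, consonant: str) -> Tuple[float, float]:
    """Return (lower, upper) limits in Hz for one syllable.

    The centre of the range is moved up for an unvoiced consonant and
    down for a voiced consonant; the width is the same in every case.
    """
    centre = reference_hz * (1.0 + SHIFT[consonant])
    half = reference_hz * HALF_WIDTH
    return centre - half, centre + half


if __name__ == "__main__":
    # First syllables of "henka" (unvoiced "he") and "bunka" (voiced "bu"),
    # with the same reference value because the accent type is the same.
    print(allowable_range(200.0, "unvoiced"))
    print(allowable_range(200.0, "voiced"))
```

With identical reference patterns, the two words still receive different ranges for their first syllable, which is the behavior FIG. 3 describes for ranges 311 and 321.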

【００１９】[0019]

As described above, according to Embodiment 3 of the present invention, the influence of phoneme differences on the fundamental frequency is taken into account, and an allowable correction range of the fundamental frequency pattern is provided for each phoneme. A fundamental frequency pattern closer to the original speech can thus be reproduced, and the fundamental frequency change rate becomes smaller. The above description uses word speech, but a more natural fundamental frequency pattern can also be set for phrases and sentences by considering the influence of the phonemes.

【００２０】[0020]

(Embodiment 4) Next, Embodiment 4, which corresponds to the invention according to claim 4, is described in detail. FIG. 4 is a conceptual diagram of the speech synthesis method according to claim 4. In FIG. 4, 401 is a representative fundamental frequency pattern of the target speech, and 411 to 416 are the ranges, set for each mora, within which the fundamental frequency pattern may be set.

【００２１】[0021]

Next, the operation is described with reference to FIG. 4. FIG. 4 is an example of the fundamental frequency pattern used to synthesize a word with a six-mora type-3 accent. In the target pattern 401, the second syllable, where the fundamental frequency reaches its maximum, and the third syllable, which contains the accent nucleus, are major factors in determining the accent type of the word, so the fundamental frequency must be set precisely there. The method of the present invention therefore does not give every syllable the same allowable setting range but sets the range for each syllable. Near the accent nucleus, where the effect on intonation is strong and the target pattern is important, the allowable ranges 412 and 413 are set narrow. For the syllables after the accent nucleus, where the pattern need not be set as strictly as at the nucleus, the allowable ranges 414 to 416 are set wide so that sound quality degradation due to fundamental frequency changes is reduced as much as possible.
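The position-dependent ranges of FIG. 4 can be sketched as a per-mora list of tolerance widths. The numeric widths and the rule for which morae count as "near the accent nucleus" are assumptions for illustration; only the six-mora type-3 case (narrow for morae 2 and 3, wide for morae 4 to 6) follows the text, and the width used for mora 1 is a guess because the source does not state it.

```python
from typing import List

# Hypothetical relative half-widths, not taken from the patent.
NARROW, DEFAULT, WIDE = 0.05, 0.10, 0.20


def tolerance_widths(n_morae: int, accent_nucleus: int) -> List[float]:
    """Return one relative tolerance width per mora (1-based positions).

    Assumed rule: morae from the second up to the accent nucleus, and the
    nucleus itself, get a narrow range; morae after the nucleus get a wide
    range; everything else gets the default. accent_nucleus = 0 means the
    word has no accent nucleus.
    """
    widths = []
    for mora in range(1, n_morae + 1):
        if mora == accent_nucleus or 2 <= mora < accent_nucleus:
            widths.append(NARROW)
        elif mora > accent_nucleus >= 1:
            widths.append(WIDE)
        else:
            widths.append(DEFAULT)
    return widths


if __name__ == "__main__":
    # Six-mora word with a type-3 accent, as in FIG. 4 (ranges 411 to 416).
    print(tolerance_widths(6, 3))
```

Widening the range only where the intonation is less sensitive lets the units after the accent nucleus stay close to their natural fundamental frequency without blurring the accent itself.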

【００２２】[0022]

As described above, according to Embodiment 4 of the present invention, providing a separate allowable setting range of the fundamental frequency for each syllable makes it possible to determine a fundamental frequency pattern that keeps the fundamental frequency change rate small while giving priority to intonation. The above description uses word speech, but a more natural fundamental frequency pattern can also be set for phrases and sentences by considering the position of each syllable.

【００２３】[0023]

(Embodiment 5) Next, Embodiment 5, which corresponds to the invention according to claim 5, is described in detail. FIG. 5 is a block diagram of the speech synthesis apparatus according to claim 5. In FIG. 5, 501 is a character string representing the reading of the speech to be synthesized; 502 is character string input means for inputting this character string; 503 is speech unit search means for searching the speech unit database for candidate speech units based on the input character string; 504 is fundamental frequency pattern determining means for determining the target fundamental frequency pattern and the speech units; 505 is fundamental frequency changing means for changing the fundamental frequency of each speech unit according to the fundamental frequency pattern; 506 is speech unit connecting means for connecting the speech units whose fundamental frequencies have been changed; 507 is synthesized speech output means for outputting the synthesized speech obtained by connecting the speech units; 508 is the output synthesized speech; and 509 is a speech unit database that stores speech units in units such as CV, VCV, and CVC.

【００２４】[0024]

Next, the operation is described with reference to FIG. 5. First, a character string consisting of the reading of the target speech together with information such as accent marks is input from the character string input means 502 and decomposed into CV or VCV units. Next, the speech unit search means 503 searches the speech unit database 509 for multiple speech units having the same reading as each decomposed string. From these candidate units, the units whose fundamental frequency change rate is as small as possible are selected by comparison with the range, obtained by the fundamental frequency pattern determining means 504, within which the fundamental frequency may be changed. At the same time as the units are selected, the fundamental frequency pattern that makes the change rate as small as possible, within the range in which the intonation does not become unnatural, is determined by the methods according to claims 1 to 4. The fundamental frequency changing means 505 then changes the fundamental frequency of each speech unit along the determined pattern. Finally, the speech unit connecting means 506 connects the speech units whose fundamental frequencies have been changed, and the synthesized speech is output.
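The candidate selection step in this paragraph could be sketched as follows. The tuple layout of a candidate, the unit identifiers, and the scoring by distance to the allowable range are assumptions for illustration; the patent says only that candidates are compared with the allowable range and that the unit needing the smallest change is chosen.

```python
from typing import Iterable, Tuple

Candidate = Tuple[str, float]  # (hypothetical unit id, natural F0 in Hz)


def required_change_rate(unit_f0: float, lower: float, upper: float) -> float:
    """Smallest relative F0 change that brings the unit into [lower, upper].

    A unit already inside the allowable range needs no change at all.
    """
    nearest = min(max(unit_f0, lower), upper)
    return abs(nearest - unit_f0) / unit_f0


def select_unit(candidates: Iterable[Candidate], lower: float, upper: float) -> Candidate:
    """Choose the candidate that needs the smallest change rate."""
    return min(candidates, key=lambda c: required_change_rate(c[1], lower, upper))


if __name__ == "__main__":
    # Three hypothetical database entries with the same reading.
    candidates = [("ka_01", 150.0), ("ka_02", 185.0), ("ka_03", 240.0)]
    print(select_unit(candidates, lower=180.0, upper=220.0))
```

Scoring against the range rather than against a single reference value is what allows a modest database to serve many target patterns: any candidate that falls inside the range is used without modification.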

【００２５】[0025]

As described above, according to Embodiment 5 of the present invention, even when the speech units stored in the speech unit database do not cover a sufficient variety of fundamental frequency patterns, the target fundamental frequency pattern is modified within the range in which the intonation does not become unnatural, which reduces the fundamental frequency change rate. A speech synthesis apparatus with little sound quality degradation can therefore be constructed without increasing the number of speech units in the speech unit database.

【００２６】[0026]

[Effects of the Invention]

As is clear from the above embodiments, the present invention focuses on the fundamental frequencies of the speech units used for synthesis and determines the target fundamental frequency pattern so that the fundamental frequency change rate of each speech unit is small, within the range in which the intonation does not become unnatural. A speech synthesis method and apparatus that reduce sound quality degradation are thereby obtained.

[Brief description of drawings]

FIG. 1 is a conceptual diagram showing the operation of the speech synthesis method in Embodiment 1 of the present invention.

FIG. 2 is a conceptual diagram showing the operation of the fundamental frequency pattern determination method that considers only vowel positions in Embodiment 2 of the present invention.

FIG. 3 is a conceptual diagram showing the method of determining the allowable range within which the fundamental frequency can be set in Embodiment 3 of the present invention.

FIG. 4 is a conceptual diagram showing the method of determining the allowable range within which the fundamental frequency can be set in Embodiment 4 of the present invention.

FIG. 5 is a block diagram showing the configuration of the speech synthesis apparatus in Embodiment 5 of the present invention.

[Explanation of symbols]

110, 310, 320, 401: representative fundamental frequency pattern
121 to 124, 211 to 214: fundamental frequency of a speech unit
130, 201: target fundamental frequency pattern
141 to 144: connection position of each speech unit
202: allowable range for setting the fundamental frequency pattern
502: character string input means
503: speech unit search means
504: fundamental frequency pattern determining means
505: fundamental frequency changing means
506: speech unit connecting means
507: synthesized speech output means
509: speech unit database

Continuation of the front page
(56) References: JP-A-1-284898 (JP, A); JP-A-4-281499 (JP, A); JP-A-8-254993 (JP, A); JP-A-10-97291 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): G10L 13/08

Claims

(57) [Claims]

1. A waveform-superposition speech synthesis method for synthesizing speech by converting the fundamental frequencies of speech units to the fundamental frequency pattern of the speech to be synthesized, characterized in that, when the fundamental frequency pattern is determined, an allowable range is provided for the fundamental frequency pattern within a range in which the intonation does not become unnatural, the fundamental frequency pattern is determined within the allowable range so that the fundamental frequencies of the speech units are equal at the connection position of each speech unit, and the speech is synthesized.

2. The speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is set only for the steady sections of vowel parts, and a fundamental frequency pattern is determined that makes the fundamental frequency change rate in the steady sections of the vowel parts as small as possible.

3. The speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is determined for each type of phoneme of the speech to be synthesized.

4. The speech synthesis method according to claim 1, characterized in that the allowable range of the fundamental frequency pattern is determined for each syllable position of the speech to be synthesized.

5. A waveform-superposition speech synthesis apparatus for synthesizing speech by converting the fundamental frequencies of speech units to the fundamental frequency pattern of the speech to be synthesized, comprising: a speech unit database that stores speech units in units such as CV, VCV, and CVC; character string input means for inputting the reading of the speech to be synthesized as a character string; speech unit search means for searching the speech unit database for candidate speech units based on the character string; fundamental frequency pattern determining means for providing an allowable range for the fundamental frequency pattern within a range that does not affect the intonation of the speech to be synthesized, and for determining the fundamental frequency pattern within the allowable range so that the fundamental frequencies of the speech units are equal at the connection positions of the speech units; fundamental frequency changing means for changing the fundamental frequency of each speech unit according to the fundamental frequency pattern; speech unit connecting means for connecting the speech units whose fundamental frequencies have been changed; and synthesized speech output means for outputting the synthesized speech obtained by connecting the speech units.