JP3113101B2 - Speech synthesizer

Info

Publication number: JP3113101B2
Application number: JP04298835A
Authority: JP
Inventors: 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-11-09
Filing date: 1992-11-09
Publication date: 2000-11-27
Anticipated expiration: 2015-11-27
Also published as: JPH06149283A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesizer that generates synthesized speech from a character code string, or from prosody information and a phoneme sequence.

[0002]

[Prior Art] In recent years, various speech synthesizers have been developed that analyze text written in a mixture of kanji and kana and synthesize the speech it represents by rule-based synthesis. Such synthesizers have come into wide use in telephone information services for banking, newspaper proofreading systems, text-to-speech reading devices, and the like.

[0003] A speech synthesizer using this kind of rule-based synthesis basically works as follows. Speech uttered by a human is analyzed in advance, unit by unit, for example per CV (consonant-vowel), CVC (consonant-vowel-consonant), VCV (vowel-consonant-vowel), or VC (vowel-consonant) unit, using a technique such as LSP (line spectrum pair) analysis or cepstrum analysis, and the resulting phoneme information is registered in a speech unit file. Synthesis parameters (phoneme parameters and prosody parameters) are then generated by referring to this speech unit file, and synthesized speech is produced by generating a sound source and applying synthesis filtering based on these parameters.

[0004] Such a speech synthesizer can also change the speech rate freely without changing the pitch. Two methods of changing the speech rate are conventionally known: increasing or decreasing the length of a single syllable (CV or C), that is, the mora length, when the synthesis parameters are generated; and increasing or decreasing the frame period at synthesis time.

[0005] Speech rate control by (variable) mora length models the fact that, when a person changes speaking rate, the consonants stay almost constant and only the vowel portions stretch or shrink. Consequently, when this method is used to synthesize speech faster than a human could utter it, the vowel portions become too short or disappear altogether, and the result is very hard to understand. One known remedy is to thin out the synthesis parameters uniformly according to the mora length when the speech rate is increased, but controlling which frames to drop is difficult, and the thinning reduces intelligibility and makes the transitions between sounds poor.

[0006] Speech rate control by (variable) frame period, by contrast, stretches and shrinks vowels and consonants uniformly. The vowel portions do not become inaudible when the speech rate is increased, and because no frames are thinned out, the transitions between sounds do not deteriorate. With this method, however, slowing the speech rate lengthens the consonants as well as the vowels, so the speech sounds unnatural.
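The contrast between the two conventional methods can be sketched numerically. The following is only an illustration, not part of the patent: it assumes a CV syllable analyzed as 6 consonant frames and 6 vowel frames at a 10 ms frame period (figures borrowed from the embodiment described later), and the function names are hypothetical.

```python
ANALYSIS_FRAME_MS = 10.0   # assumed frame period at analysis time
C_FRAMES, V_FRAMES = 6, 6  # assumed consonant/vowel split of one CV syllable


def mora_length_method(target_ms):
    """Vary the mora length: the consonant stays fixed, only the vowel changes."""
    total_frames = round(target_ms / ANALYSIS_FRAME_MS)
    v_frames = max(total_frames - C_FRAMES, 0)  # the vowel absorbs all of the change
    return C_FRAMES * ANALYSIS_FRAME_MS, v_frames * ANALYSIS_FRAME_MS


def frame_period_method(target_ms):
    """Vary the frame period: consonant and vowel scale by the same factor."""
    frame_ms = target_ms / (C_FRAMES + V_FRAMES)
    return C_FRAMES * frame_ms, V_FRAMES * frame_ms


# Print (consonant ms, vowel ms) under each method for three target durations.
for target in (60, 120, 200):
    print(target, mora_length_method(target), frame_period_method(target))
```

Under these assumptions, a 60 ms target leaves the mora-length method with no vowel at all, while a 200 ms target makes the frame-period method stretch the consonant from 60 ms to about 100 ms. These are the two failure modes described above.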

[0007]

[Problems to Be Solved by the Invention] As described above, in conventional speech synthesis, using either mora-length-based or frame-period-based speech rate control on its own yields speech that is hard to understand or unnatural at some speech rates.

[0008] The present invention has been made in view of these circumstances. Its object is to provide a speech synthesizer that can generate easily understandable synthesized speech even when the speech rate is made extremely fast, and natural synthesized speech even when it is made slow.

[0009]

[Means for Solving the Problems] A speech synthesizer according to the present invention comprises: an input unit to which speech rate information for the synthesized speech is input; determination means for determining whether the speech rate information input to the input unit indicates reading faster or slower than a predetermined value; first means for shortening the frame period used in speech synthesis when the determination means determines that the input speech rate information indicates reading faster than the predetermined value; and second means for generating the synthesis parameters with an extended mora length when the determination means determines that the input speech rate information indicates reading slower than the predetermined value.

[0010]

[Operation] In the above configuration, when it is determined that the input speech rate information specifies a faster speech rate, the frame period is shortened, so that consonants and vowels are shortened uniformly. Conversely, when it is determined that a slower speech rate is specified, the mora length is increased, so that only the vowel portions are lengthened. Changing the speech rate therefore causes neither a loss of intelligibility nor unnaturalness, and high-quality synthesized speech can be generated.

[0011]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a speech synthesizer according to the embodiment. The speech synthesizer shown in FIG. 1 has an input unit 1 that handles input of the mixed kanji-kana character code string to be synthesized and of speech rate information x for the synthesized speech.

[0012] The speech synthesizer shown in FIG. 1 also has a word dictionary 2, in which the accent type, reading, part-of-speech information, and so on of the words and phrases to be synthesized are registered in advance, and a language processing unit 3, which analyzes the character code string input through the input unit 1 using the word dictionary 2 and generates the corresponding phoneme sequence and prosody information.

[0013] The speech synthesizer shown in FIG. 1 also has a speech unit file 4, which stores a set of phoneme parameters obtained in advance by analyzing input speech for each speech unit of an arbitrary kind, and a synthesis parameter generation unit 5. The synthesis parameter generation unit 5 retrieves from the speech unit file 4 the phoneme parameters corresponding to the phoneme sequence generated by the language processing unit 3, sets the mora length of each phoneme with reference to the mora length determined by the speech rate control unit 10 described later (specifically, by its mora length setting unit 7), and generates the phoneme parameters. It also generates the prosody parameters according to the prosody information generated by the language processing unit 3.

[0014] The speech synthesizer shown in FIG. 1 further has a speech rate control unit 10, a speech synthesis unit 11, and a loudspeaker 12 for audio output. The speech rate control unit 10 controls the speech rate by setting the mora length and the frame period based on the speech rate information x input through the input unit 1. The speech synthesis unit 11 generates a sound source and performs digital filtering according to the synthesis parameters (phoneme parameters and prosody parameters) supplied by the synthesis parameter generation unit 5 and the frame period determined by the speech rate control unit 10 (specifically, by its frame period setting unit 9, described next), thereby generating synthesized speech.

[0015] The speech rate control unit 10 has a mora length table 6, in which the correspondence between speech rate information values and mora lengths is registered; a mora length setting unit 7, which determines the mora length from the speech rate information x supplied by the input unit 1 and the mora length table 6; a frame period table 8, in which the correspondence between speech rate information values and frame periods is registered; and a frame period setting unit 9, which determines the frame period from the speech rate information x supplied by the input unit 1 and the frame period table 8. Components such as the D/A converter that converts the synthesized speech into an analog signal in the speech synthesis unit 11 are omitted from FIG. 1.

[0016] Next, the operation of the speech synthesizer shown in FIG. 1, centering on the speech rate control unit 10, is described with reference to the flowchart of FIG. 2. Here the speech rate information is defined as "(length of one uttered syllable) / 10".

[0017] First, when speech rate information x is input through the input unit 1, it is taken into the speech rate control unit 10, which compares its value with a predetermined reference speed value, for example "12" (steps S1, S2).

[0018] If the comparison shows that x is less than 12, the mora length setting unit 7 in the speech rate control unit 10 fixes the mora length at 12 frames and sets it in the synthesis parameter generation unit 5 (step S3). The frame period setting unit 9 in the speech rate control unit 10 looks up the frame period table 8 with the input speech rate information x, determines the frame period corresponding to x, and sets it in the speech synthesis unit 11 (step S4).

[0019] If, on the other hand, the speech rate information x from the input unit 1 is 12 or more, the frame period setting unit 9 selects the normal frame period (here, the frame period used at analysis time, for example 10 ms) and sets it in the speech synthesis unit 11 (step S5). The mora length setting unit 7 looks up the mora length table 6 with the input speech rate information x, determines the mora length corresponding to x, and sets it in the synthesis parameter generation unit 5 (step S6). The operation of the speech synthesizer is described more concretely below.
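The decision flow of steps S1 to S6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation. The table contents are assumed, because FIG. 3 is not reproduced in the text; the worked examples only confirm the entries x = 12 → 12 frames, x = 20 → 20 frames, and x = 6 → 5 ms. The names `control_speech_rate`, `MORA_LENGTH_TABLE`, and `FRAME_PERIOD_TABLE` are hypothetical.

```python
THRESHOLD = 12            # reference speech rate value compared in step S2
NORMAL_MORA_FRAMES = 12   # mora length fixed in step S3
NORMAL_FRAME_MS = 10.0    # analysis-time frame period fixed in step S5

# Assumed table contents (hypothetical): a linear mapping that agrees with the
# three entries confirmed in the text. The upper bound of 30 is arbitrary.
MORA_LENGTH_TABLE = {x: x for x in range(12, 31)}              # x -> frames
FRAME_PERIOD_TABLE = {x: x * 10.0 / 12 for x in range(1, 12)}  # x -> ms


def control_speech_rate(x):
    """Return (mora_length_frames, frame_period_ms) for speech rate info x."""
    if x < THRESHOLD:                       # S2: faster than the reference
        mora_frames = NORMAL_MORA_FRAMES    # S3: keep the mora length at 12 frames
        frame_ms = FRAME_PERIOD_TABLE[x]    # S4: look up a shorter frame period
    else:                                   # S2: reference speed or slower
        frame_ms = NORMAL_FRAME_MS          # S5: keep the normal frame period
        mora_frames = MORA_LENGTH_TABLE[x]  # S6: look up a longer mora length
    return mora_frames, frame_ms
```

The mora duration in milliseconds is the product of the two returned values, so only one of the two factors ever deviates from its normal value for a given x.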

[0020] Suppose first that speech rate information x with the value "12" is input through the input unit 1. Assume that the sampling frequency at analysis time (when the phoneme information registered in the speech unit file 4 is derived from human speech) is 8 kHz (a sampling period of 125 μs) and the frame period is 10 ms. Assume also that the mora length table 6 has the contents shown in FIG. 3(a) and the frame period table 8 has the contents shown in FIG. 3(b).

[0021] The speech rate information x = 12 input through the input unit 1 is taken into the speech rate control unit 10 (step S1) and processed according to the flowchart of FIG. 2. In step S2 it is compared with the reference speech rate value "12". Since x = 12 here, steps S5 and S6 are executed. In step S5, the frame period setting unit 9 selects a frame period of 10 ms and sets it in the speech synthesis unit 11. In step S6, the mora length setting unit 7 refers to the mora length table 6 shown in FIG. 3(a), and the mora length "12" corresponding to x = 12 is set in the synthesis parameter generation unit 5. In this example, then, the frame period is 10 ms and the mora length is 12 frames, so one mora lasts 120 ms (10 ms × 12).

[0022] When the mixed kanji-kana character code string to be synthesized is input through the input unit 1, the language processing unit 3 matches the input string against the word dictionary 2 and obtains the accent type, reading, and part-of-speech information of the words and phrases to be synthesized. Based on the part-of-speech information it determines the accent types and boundaries, converts the mixed kanji-kana text into its reading form, and generates a phoneme sequence and prosody information.

[0023] The phoneme sequence and prosody information generated by the language processing unit 3 are passed to the synthesis parameter generation unit 5. The synthesis parameter generation unit 5 retrieves from the speech unit file 4 the phoneme parameters corresponding to the phoneme sequence, sets the mora length of each phoneme with reference to the mora length "12" (that is, 12 frames) set by the mora length setting unit 7 in the speech rate control unit 10, and generates the phoneme parameters. At the same time it generates the prosody parameters according to the prosody information supplied by the language processing unit 3. In other words, the synthesis parameter generation unit 5 generates synthesis parameters consisting of phoneme parameters and prosody parameters.

[0024] The synthesis parameters generated by the synthesis parameter generation unit 5 are passed to the speech synthesis unit 11. The speech synthesis unit 11 generates a sound source and performs digital filtering according to these synthesis parameters and the frame period set by the frame period setting unit 9 in the speech rate control unit 10, thereby generating synthesized speech.

[0025] Suppose that the phoneme parameters (speech unit) retrieved from the speech unit file 4 by the synthesis parameter generation unit 5 have a consonant (C) to vowel (V) length ratio of C:V = 6:6 and a mora length of 12 frames, as shown in FIG. 4(a). The frame period setting unit 9 has set a frame period of 10 ms in the speech synthesis unit 11. Since the frame period is 10 ms and the sampling period is 125 μs, the speech synthesis unit 11 produces 80 samples of speech waveform data (10 ms / 125 μs) for each frame of synthesis parameters.
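The per-frame sample count quoted above is the frame period divided by the sampling period. A minimal sketch of that arithmetic follows; the function name `samples_per_frame` is hypothetical, and the 8 kHz sampling frequency is the analysis-time value given in the text.

```python
SAMPLING_RATE_HZ = 8000  # analysis-time sampling frequency given in the text


def samples_per_frame(frame_period_ms):
    """Number of waveform samples synthesized from one frame of parameters."""
    sampling_period_us = 1_000_000 / SAMPLING_RATE_HZ  # 125 microseconds
    return round(frame_period_ms * 1000 / sampling_period_us)
```

With these values, `samples_per_frame(10)` gives 80, so a 12-frame mora corresponds to 960 samples, or 120 ms at 8 kHz.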

[0026] Next, consider the case where speech rate information x with the value "6" is input through the input unit 1. In this case steps S1, S2, S3, and S4 of FIG. 2 are executed: the mora length "12" is obtained, and the frame period "5 ms" corresponding to x = 6 is obtained from the frame period table 8 shown in FIG. 3(b). One mora therefore lasts 60 ms (5 ms × 12), and 40 samples of speech waveform data (5 ms / 125 μs) are generated for each frame of synthesis parameters. That is, as shown in FIG. 4(b), the mora is shortened from the 120 ms of FIG. 4(a) to 60 ms while C:V remains 6:6.
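To make the effect of a shorter frame period concrete, the sketch below records which parameter frame drives each output sample. It is a simplified stand-in for the synthesis loop: no sound source or filter is modeled, and the helper name `render_frame_indices` is hypothetical. The point it illustrates is that all 12 frames are still used in the fast case; each one simply yields fewer samples.

```python
def render_frame_indices(n_frames, frame_period_ms, sampling_period_us=125):
    """For each output sample, return the index of the frame that drives it."""
    per_frame = round(frame_period_ms * 1000 / sampling_period_us)
    return [frame for frame in range(n_frames) for _ in range(per_frame)]


normal = render_frame_indices(12, 10.0)  # x = 12: 80 samples per frame
fast = render_frame_indices(12, 5.0)     # x = 6: 40 samples per frame

# The output is half as long, but no frame is skipped, and frames 0-5
# (the consonant in the 6:6 example) still account for half of the samples.
print(len(normal), len(fast), sorted(set(fast)) == list(range(12)))
```

Because no frame is dropped, the consonant-to-vowel ratio and the frame-to-frame continuity of the parameters are preserved, in contrast with the frame-thinning approach criticized in paragraph [0005].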

[0027] Next, consider the case where speech rate information x with the value "20" is input through the input unit 1. In this case steps S1, S2, S5, and S6 of FIG. 2 are executed: the frame period "10 ms" is obtained, and the mora length "20" corresponding to x = 20 is obtained from the mora length table 6 shown in FIG. 3(a). One mora therefore lasts 200 ms (10 ms × 20), and 80 samples of speech waveform data (10 ms / 125 μs) are generated for each frame of synthesis parameters. That is, as shown in FIG. 4(c), only V is lengthened, and C:V becomes 6:14. V may be lengthened by repeatedly using the final frame of the speech unit or a frame in the middle portion of V.
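The vowel lengthening described above, repeating either the final frame or a frame from the middle of V, might be sketched as follows. The function `extend_vowel` and its `repeat` argument are hypothetical, frames are represented by opaque labels rather than real parameter vectors, and the choice of the exact middle frame is an assumption, since the text only says "a frame in the middle portion".

```python
def extend_vowel(c_frames, v_frames, target_mora_frames, repeat="last"):
    """Lengthen a CV unit to target_mora_frames by repeating one vowel frame."""
    extra = target_mora_frames - (len(c_frames) + len(v_frames))
    if extra <= 0:
        return list(c_frames) + list(v_frames)  # nothing to lengthen
    # Pick the vowel frame to repeat: the final one, or one from the middle.
    idx = len(v_frames) - 1 if repeat == "last" else len(v_frames) // 2
    stretched = (list(v_frames[:idx + 1])
                 + [v_frames[idx]] * extra
                 + list(v_frames[idx + 1:]))
    return list(c_frames) + stretched  # consonant frames are left untouched


c = [f"c{i}" for i in range(6)]
v = [f"v{i}" for i in range(6)]
slow = extend_vowel(c, v, 20)  # x = 20: 6 consonant + 14 vowel frames
```

Here `slow` has 20 frames, of which the first 6 are the unchanged consonant frames, which corresponds to the 6:14 ratio of FIG. 4(c).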

[0028] As described above, the apparatus of this embodiment synthesizes speech with a shortened frame period when the speech rate is to be faster than the predetermined speed "12", and with a lengthened mora length when it is to be slower than the predetermined speed "12". Easily understandable synthesized speech can thus be generated even when the speech rate is changed.

[0029] The present invention is not limited to the above embodiment. In the embodiment, switching between the two methods is based on the speech rate value "12", and the mora length corresponding to the speech rate value "12" is 12 frames, but neither choice is essential. Likewise, the mora length is fixed at "12" when the speech rate information is less than "12", but it may instead be set to shorten gradually; and the frame period is fixed at 10 ms when the speech rate information is "12" or more, but it may instead be set to lengthen gradually. In short, the present invention can be modified in various ways without departing from its gist.

[0030]

[Effects of the Invention] As described above, according to the present invention, speech rate control by variable frame period and speech rate control by variable mora length are selected, or used together, according to the value of the speech rate information, so that the weakness of each method is compensated by the other. This has the considerable practical benefit that natural, easily understandable synthesized speech can be generated even when the speech rate is changed greatly.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating the processing flow of the speech rate control unit 10 in the embodiment.

FIG. 3 shows example contents of the mora length table 6 and the frame period table 8 in the embodiment.

FIG. 4 shows how the length of a CV syllable changes with the value of the speech rate information.

[Explanation of symbols]

1: input unit, 2: word dictionary, 3: language processing unit, 4: speech unit file, 5: synthesis parameter generation unit, 6: mora length table, 7: mora length setting unit, 8: frame period table, 9: frame period setting unit, 10: speech rate control unit, 11: speech synthesis unit.

Continuation of the front page

(56) References cited: JP-A-63-223697 (JP, A); JP-A-4-170600 (JP, A); JP-A-64-79799 (JP, A); JP-A-3-203800 (JP, A); JP-A-3-206496 (JP, A); JP-A-62-233831 (JP, A); JP-A-61-122700 (JP, A); JP-A-59-200340 (JP, A); JP-A-58-16295 (JP, A); JP-B-56-2354 (JP, B2)

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 13/08; G10L 19/00 - 21/06

Claims

(57) [Claims]

1. A speech synthesizer that generates phoneme parameters according to a phoneme sequence, generates prosody parameters according to prosody information, and synthesizes speech according to synthesis parameters consisting of the phoneme parameters and the prosody parameters, the speech synthesizer comprising: an input unit to which speech rate information for the synthesized speech is input; determination means for determining whether the speech rate information input to the input unit indicates reading faster or slower than a predetermined value; first means for shortening the frame period used in speech synthesis when the determination means determines that the speech rate information input to the input unit indicates reading faster than the predetermined value; and second means for generating the synthesis parameters with an extended mora length when the determination means determines that the speech rate information input to the input unit indicates reading slower than the predetermined value.