JPS63253996A

JPS63253996A - Sentence-voice converter

Info

Publication number: JPS63253996A
Application number: JP62087100A
Authority: JP
Inventors: 達郎松本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-04-10
Filing date: 1987-04-10
Publication date: 1988-10-20
Anticipated expiration: 2012-07-09
Also published as: JP2628994B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] In a sentence-to-speech conversion device that converts text into speech and outputs it, fixing the time length of each phoneme of the synthesized speech means that the same content is uttered with exactly the same time lengths every time, which produces unnatural synthesized speech. To solve this problem, the present invention synthesizes speech after adding fluctuation to the time length assigned to the synthesis unit symbol corresponding to each uttered phoneme. With this configuration, the time lengths vary from one utterance to the next even when the content is identical, so the output is highly natural synthesized speech that is closer to human speech.

[Industrial field of application]

The present invention relates to a sentence-to-speech conversion device. The device adds fluctuation to the time length assigned to each synthesis unit symbol corresponding to a sentence, then synthesizes and outputs the speech.

A sentence-to-speech conversion device takes arbitrary text as input, converts it into speech and outputs the result. Such devices are used in many fields, including translation telephones, computer voice output, and reading machines for the blind.

[Conventional technology]

The conventional processing flow for converting text into speech and outputting it is described below with reference to FIGS. 10 and 11.

In FIG. 10, the synthesis unit symbol generation unit 11 applies conversion rules 12 to the input sentence. This converts the sentence into a synthesis unit symbol string representing the basic units of synthesis. The parameter time series generation unit 13 then applies parameter rules 14 to this symbol string and converts it into a parameter time series for the speech synthesis unit 15.

The speech synthesis unit 15 synthesizes speech from this parameter time series and outputs it.

Next, the processing flow of the parameter time series generation unit 13 is described in detail with reference to FIG. 11.

In FIG. 11, the time length setting unit 13-1 applies time length rules 13-2 to the received synthesis unit symbol string and assigns a time length to each synthesis unit symbol. It also adjusts these time lengths to account for differences caused by the phoneme environment. The parameter setting unit 13-3 then uses the adjusted time lengths to set the parameter values corresponding to the synthesis unit symbol string. Finally, it performs interpolation to generate the parameter time series.

At this stage, two kinds of processing make the speech more natural: the parameters are connected smoothly, and they are given "fluctuations" like those observed in real speech. The resulting parameter time series is input to the speech synthesis unit 15, which synthesizes and outputs the speech.
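The minimal Python sketch below shows the fixed-length behavior of the conventional time length setting unit 13-1. The table `TIME_LENGTH_RULES` and the function `conventional_segments` are hypothetical stand-ins for time length rules 13-2. Only the 200 ms for AE comes from this document, and the environment-dependent adjustment is omitted. Nothing in the lookup is random, so the same symbol string always yields the same segment boundaries.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for time length rules 13-2 (ms).
# Only "AE" = 200 ms appears in the document; "K" and "T" are placeholders.
TIME_LENGTH_RULES: Dict[str, float] = {"K": 80.0, "AE": 200.0, "T": 90.0}


def conventional_segments(symbols: List[str]) -> List[Tuple[str, float, float]]:
    """Lay out (symbol, start_ms, end_ms) back to back using fixed time lengths."""
    segments: List[Tuple[str, float, float]] = []
    t = 0.0
    for sym in symbols:
        length = TIME_LENGTH_RULES[sym]  # no randomness: same input, same length
        segments.append((sym, t, t + length))
        t += length
    return segments
```

Calling `conventional_segments(["K", "AE", "T"])` twice returns identical lists.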

[Problems to be solved by the invention]

When a person utters the same word (or sentence) repeatedly, the physical characteristics of the speech change from one utterance to the next. These characteristics include amplitude, pitch, formant frequencies and utterance duration.

In conventional sentence-to-speech conversion devices such as those in FIGS. 10 and 11, the parameter setting unit 13-3 can connect parameters smoothly or add fluctuation to parameter values. However, the time lengths remain fixed. As a result, whenever the same content is uttered, the device synthesizes and outputs unnatural speech with identical time lengths every time.

[Means for solving problems]

To solve the above problem, the present invention provides three units:

(a) a time length setting unit 2, which assigns time lengths by applying time length rules 3 to the synthesis unit symbols generated for a sentence;
(b) a time length fluctuation adding unit 4, which applies time length fluctuation rules 5 to add fluctuation to the time lengths assigned by unit 2;
(c) a parameter setting unit 6, which applies parameter rules 7 to set parameter values for the time lengths fluctuated by unit 4, and then generates a parameter time series.

The parameter time series generated by the parameter setting unit 6 is supplied to the speech synthesis unit, which synthesizes and outputs speech.

FIG. 1 shows the principle configuration of the present invention. In FIG. 1, the parameter time series generation unit 1 receives synthesis unit symbols as input and generates and outputs a parameter time series. It corresponds to the parameter time series generation unit 13 in FIG. 10. The difference is that unit 1 additionally applies fluctuation to the time lengths before generating the parameter time series.

The time length setting unit 2 assigns a time length to each input synthesis unit symbol by applying time length rules 3.

The time length fluctuation adding unit 4 adds fluctuation to the assigned time lengths by applying time length fluctuation rules 5.

The parameter setting unit 6 applies parameter rules 7 to set parameter values from the synthesis unit symbol string and the fluctuated time lengths. It then generates the parameter time series supplied to the speech synthesis unit.
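The Python sketch below roughly illustrates how units 2, 4 and 6 chain together. It rests on several assumptions:
- The rule tables are hypothetical. Only the 200 ms for AE, the 20% vowel width and the AE formant values appear in this document.
- RND is drawn from a uniform distribution.
- Parameter setting is reduced to attaching formant targets to each segment, and interpolation is left out.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-ins for time length rules 3, fluctuation rules 5 and parameter rules 7.
TIME_LENGTH_RULES: Dict[str, float] = {"K": 80.0, "AE": 200.0, "T": 90.0}  # ms
FLUCTUATION_WIDTH: Dict[str, float] = {"AE": 0.20}  # fraction of the time length
PARAMETER_RULES: Dict[str, Dict[str, float]] = {"AE": {"F1": 600.0, "F2": 1600.0, "F3": 2300.0}}


def set_time_lengths(symbols: List[str]) -> List[Tuple[str, float]]:
    """Time length setting unit 2: look up a base time length per symbol."""
    return [(sym, TIME_LENGTH_RULES[sym]) for sym in symbols]


def add_fluctuation(lengths: List[Tuple[str, float]], rng: random.Random) -> List[Tuple[str, float]]:
    """Time length fluctuation adding unit 4: length + length * width * RND."""
    out = []
    for sym, length in lengths:
        rnd = rng.uniform(-1.0, 1.0)  # a fresh RND in [-1, 1] on every call
        width = FLUCTUATION_WIDTH.get(sym, 0.0)  # symbols without a rule do not fluctuate
        out.append((sym, length + length * width * rnd))
    return out


def set_parameters(lengths: List[Tuple[str, float]]):
    """Parameter setting unit 6 (simplified): place segments end to end and
    attach each symbol's parameter targets; interpolation is omitted."""
    segments, t = [], 0.0
    for sym, length in lengths:
        segments.append((sym, t, t + length, PARAMETER_RULES.get(sym, {})))
        t += length
    return segments


def generate(symbols: List[str], rng: random.Random):
    """Parameter time series generation unit 1: units 2 -> 4 -> 6."""
    return set_parameters(add_fluctuation(set_time_lengths(symbols), rng))
```

Consecutive calls that share one `random.Random` give AE a different length each time, within 160 to 240 ms. K and T keep their fixed lengths because this sketch assigns them no fluctuation width.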

[Operation]

The operation is described next.

In FIG. 1, when synthesis unit symbols are input to the time length setting unit 2, time length rules 3 are applied and a time length is set for each synthesis unit symbol.

These time lengths are then input to the time length fluctuation adding unit 4. Unit 4 applies time length fluctuation rules 5 and adds a fluctuation to each time length, with a different value each time. The fluctuated time lengths and the synthesis unit symbols are then input to the parameter setting unit 6. Unit 6 applies parameter rules 7 to set the parameter values, and interpolation then produces the parameter time series.

This parameter time series is input to the speech synthesis unit, which synthesizes and outputs the speech.

In summary, the time lengths produced with time length rules 3 receive an additional fluctuation from time length fluctuation rules 5, generated for example from random numbers. A parameter time series is then generated from the fluctuated time lengths, and speech is synthesized and output. As a result, natural speech with different time lengths is output each time, even when the same content is uttered.

[Embodiment]

Next, the configuration and operation of one embodiment of the present invention are described in detail with reference to FIGS. 2 to 9.

In FIG. 2, the time length setting unit 2 assigns a time length to each input synthesis unit symbol by applying time length rules 3.

The time length adjustment unit 2-1 applies time length adjustment rules 2-2 to the assigned time lengths. This adjusts for differences in time length caused by the environment of each synthesis unit symbol.

The time length fluctuation adding unit 4 applies time length fluctuation rules 5 to the adjusted time lengths and adds fluctuation. The rules take into account that the width of the fluctuation differs for each synthesis unit symbol.

The suprasegmental parameter setting unit 4-1 applies suprasegmental parameter rules 4-2 to the fluctuated time lengths. It sets suprasegmental parameters such as amplitude and pitch. Suprasegmentals are speech features that extend beyond the segment, the smallest phonetically meaningful unit of speech, over wider spans such as syllables, phrases and sentences.

The parameter setting unit 6-1 applies parameter rules 7 to the symbols, which now carry suprasegmental parameters, and sets their parameter values.

The parameter interpolation unit 6-2 interpolates between the set parameter values, for example by linear or quadratic-curve interpolation. This generates the parameter time series supplied to the speech synthesis unit.
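A minimal Python sketch of the linear case follows. It assumes two things the document does not specify:
- the parameter values arrive as (time in ms, value) breakpoints;
- the speech synthesis unit wants one value every `frame_ms` milliseconds.

The function name `interpolate_linear` and the 10 ms frame used in the check are hypothetical, since the document does not give the parameter specification interval.

```python
from bisect import bisect_right
from typing import List, Tuple


def interpolate_linear(points: List[Tuple[float, float]], frame_ms: float) -> List[float]:
    """Sample a piecewise-linear curve through (time_ms, value) breakpoints every frame_ms."""
    points = sorted(points)
    times = [t for t, _ in points]
    n_frames = int(times[-1] // frame_ms) + 1
    out: List[float] = []
    for i in range(n_frames):
        t = i * frame_ms
        k = bisect_right(times, t)  # index of the first breakpoint strictly after t
        if k == 0:
            out.append(points[0][1])  # before the first breakpoint: hold its value
        elif k == len(points):
            out.append(points[-1][1])  # at or after the last breakpoint: hold its value
        else:
            (t0, v0), (t1, v1) = points[k - 1], points[k]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

Quadratic-curve interpolation would change only the formula inside the `else` branch. The frame loop stays the same.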

Next, the processing flow of the configuration in FIG. 2 is described step by step, using the rules shown in FIGS. 3 to 7.

FIG. 3 shows an example of a time length rule, which the time length setting unit 2 applies to the input synthesis unit symbols. Rule 1 states that "the time length of AE is 200 ms".

Here, "AE" denotes, for example, the vowel "æ" in [kæt], the phonetic transcription of the English word "cat"; the transcription corresponds to the synthesis unit symbols. DU denotes an interval, and start and end are the names of the time variables used during processing.

By applying Rule 1 to the synthesis unit symbol "æ", the time length setting unit 2 sets the time variable start = 0 and the time variable end = 200 (ms).
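The Python sketch below encodes Rule 1 as data and applies it to a single symbol. The `TimeLengthRule` class and `apply_time_length_rule` function are hypothetical. The sketch only mirrors the result stated in the text: start = 0 and end = 200 ms for AE.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TimeLengthRule:
    """Hypothetical encoding of a FIG. 3 style rule: 'the time length of <target> is <duration_ms>'."""
    target: str
    duration_ms: float


RULE_1 = TimeLengthRule(target="AE", duration_ms=200.0)


def apply_time_length_rule(rule: TimeLengthRule, symbol: str, start: float = 0.0) -> Dict[str, float]:
    """Return the time variables {'start', 'end'} for symbol; raise if the rule does not match."""
    if symbol != rule.target:
        raise ValueError(f"rule for {rule.target!r} does not apply to {symbol!r}")
    return {"start": start, "end": start + rule.duration_ms}
```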

FIG. 4 shows examples of time length adjustment rules. The time length adjustment unit 2-1 applies them to the time lengths set by a time length rule such as Rule 1 (time variables start = 0, end = 200). Rule 2 states that "the time length of a vowel preceding a voiced fricative is extended to 120%". Rule 3 states that "the time length of a vowel preceding a voiceless stop is shortened to 75%".

The items in brackets [ ] are the features (acoustic characteristics) of a synthesis unit symbol, and they must be defined in advance for each synthesis unit symbol:
- [vowel]: the synthesis unit is a vowel;
- [+voc]: voiced;
- [-voc]: voiceless;
- [fric]: a fricative;
- [stop]: a stop.

Each rule has three terms, read from the left:
1. The synthesis unit symbol or feature to which the rule applies, e.g. [vowel].
2. The result after the rule is applied, e.g. (DU 120%).
3. The environment (synthesis unit symbol or feature) of the target symbol, e.g. "- [+voc fric]", meaning the target precedes a voiced fricative.

The remaining rules use the same notation. By applying Rule 2 or Rule 3 to a synthesis unit symbol, the time length adjustment unit 2-1 multiplies the time variable end by 1.2 or 0.75, respectively, as shown on the right of the figure.
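The Python sketch below illustrates the matching logic of Rules 2 and 3: target features, right-hand environment and scale factor. It makes two assumptions:
- The feature sets for `AE`, `Z` and `T` are hypothetical entries of the per-symbol feature table that the text says must be prepared in advance.
- Only the first matching rule is applied to each symbol.

Because start = 0, scaling the length is the same as multiplying end by 1.2 or 0.75.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical per-symbol feature table (defined in advance, as the text requires).
FEATURES: Dict[str, FrozenSet[str]] = {
    "AE": frozenset({"vowel", "+voc"}),
    "Z": frozenset({"+voc", "fric"}),
    "T": frozenset({"-voc", "stop"}),
}

# (target features, right-hand environment features, factor) for Rules 2 and 3 of FIG. 4.
ADJUSTMENT_RULES: List[Tuple[FrozenSet[str], FrozenSet[str], float]] = [
    (frozenset({"vowel"}), frozenset({"+voc", "fric"}), 1.20),  # Rule 2
    (frozenset({"vowel"}), frozenset({"-voc", "stop"}), 0.75),  # Rule 3
]


def adjust_lengths(symbols: List[str], lengths: List[float]) -> List[float]:
    """Scale each symbol's time length when its features and its right neighbour match a rule."""
    adjusted = list(lengths)
    for i in range(len(symbols) - 1):  # the last symbol has no right-hand environment
        sym, nxt = symbols[i], symbols[i + 1]
        for target, environment, factor in ADJUSTMENT_RULES:
            if target <= FEATURES[sym] and environment <= FEATURES[nxt]:
                adjusted[i] = lengths[i] * factor
                break  # assumption: at most one rule per symbol
    return adjusted
```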

FIG. 5 shows examples of time length fluctuation rules. The time length fluctuation adding unit 4 applies them to the time lengths already adjusted by time length adjustment rules 2-2. Rule 4 states that "the time length of a vowel fluctuates within a range of 20% of its value". Rule 5 states that "the time length of a fricative fluctuates within a range of 10% of its value".

Here, RND is a random number with values from -1 to +1. It may be, for example, a uniform random number, or a random number filtered so that its spectrum is 1/f. When unit 4 applies these rules to, for example, the synthesis unit symbol AE, it executes the calculation shown in the right column of FIG. 5. For example, if the random number takes the value -0.3, the time variable end = 200 set by Rule 1 becomes end = 188, calculated by the following equation.
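The Python sketch below covers Rules 4 and 5 and two RND sources. The dictionary `FLUCTUATION_RULES` and the functions `fluctuate`, `uniform_rnd` and `pink_rnd` are hypothetical. The document names only "a uniform random number" and "a random number filtered so that its spectrum is 1/f". For the second, this sketch swaps in the Voss-McCartney generator, which approximates 1/f noise directly instead of filtering. With start = 0, the update applied to end is end + end × width × RND.

```python
import random
from typing import Dict, Iterator

# Rules 4 and 5 of FIG. 5: maximum relative fluctuation per feature.
FLUCTUATION_RULES: Dict[str, float] = {"vowel": 0.20, "fric": 0.10}


def fluctuate(length_ms: float, feature: str, rnd: float) -> float:
    """Return length + length * width * RND (start is assumed to be 0)."""
    if not -1.0 <= rnd <= 1.0:
        raise ValueError("RND must lie in [-1, 1]")
    width = FLUCTUATION_RULES.get(feature, 0.0)  # features without a rule do not fluctuate
    return length_ms + length_ms * width * rnd


def uniform_rnd(rng: random.Random) -> Iterator[float]:
    """Endless stream of uniform random numbers in [-1, 1]."""
    while True:
        yield rng.uniform(-1.0, 1.0)


def pink_rnd(rng: random.Random, n_rows: int = 8) -> Iterator[float]:
    """Approximate 1/f noise in [-1, 1] with the Voss-McCartney method."""
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    counter = 0
    while True:
        counter += 1
        # Refresh the row indexed by the number of trailing zero bits of the counter,
        # so row k changes about every 2**k samples.
        k = (counter & -counter).bit_length() - 1
        if k < n_rows:
            rows[k] = rng.uniform(-1.0, 1.0)
        yield sum(rows) / n_rows  # the mean of values in [-1, 1] stays in [-1, 1]
```

For example, `fluctuate(200.0, "vowel", next(pink_rnd(random.Random(1))))` gives a fluctuated AE length between 160 and 240 ms. Calling `fluctuate(200.0, "vowel", -0.3)` reproduces the 188 ms of the worked example.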

$$\mathit{end} = 200 + 200 \times 0.2 \times (-0.3) = 188 \qquad (1)$$

Applying the rules above completes the processing of the time lengths. The processing for setting parameter values is described next.

FIG. 6 shows examples of suprasegmental parameter rules. The suprasegmental parameter setting unit 4-1 applies them to the fluctuated time lengths.
- Rule 6: "the amplitude at the start and end points of a vowel is 0 dB".
- Rule 7: "the amplitude 30 ms after the start of a vowel is 60 dB, and the amplitude 30 ms before its end is 55 dB".
- Rule 8: "the amplitude 70 ms after the start of a vowel is 70 dB".

These three rules set the parameter values at the positions marked with black dots in FIG. 8(a).
- Rule 9: "the pitch at the start point of a vowel is 100 Hz".
- Rule 10: "the pitch at the end point of a vowel is 0.9 times the previously set pitch at the start point".

Rules 9 and 10 set the parameters at the positions marked with black dots in FIG. 8(b). The time variables start and end used here hold the values set in the preceding stage. AV denotes the amplitude value, F0 the pitch frequency, and $ a variable that holds a previously set parameter value for later use.
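Rules 6 to 10 can be read as placing breakpoints relative to the already fluctuated start and end of a vowel, and the Python sketch below assumes that reading. The function name and the (time_ms, value) breakpoint representation are hypothetical, and the `saved` variable only imitates the role of `$`. Every time is computed from start and end, so a fluctuated end such as 188 ms moves the late amplitude and pitch points along with it.

```python
from typing import Dict, List, Tuple

Breakpoints = List[Tuple[float, float]]  # (time_ms, value)


def suprasegmental_vowel(start: float, end: float) -> Dict[str, Breakpoints]:
    """Apply FIG. 6 Rules 6-10 to one vowel spanning [start, end] in ms."""
    av: Breakpoints = [
        (start, 0.0), (end, 0.0),                   # Rule 6: 0 dB at start and end
        (start + 30.0, 60.0), (end - 30.0, 55.0),   # Rule 7
        (start + 70.0, 70.0),                       # Rule 8
    ]
    f0_start = 100.0                                # Rule 9: 100 Hz at the start
    saved = f0_start                                # plays the role of the '$' variable
    f0: Breakpoints = [(start, f0_start), (end, 0.9 * saved)]  # Rule 10
    return {"AV": sorted(av), "F0": f0}             # AV points sorted by time
```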

FIG. 7 shows an example of a parameter rule, which the parameter setting unit 6 applies. Rule 11 states that "at the start and end points of AE, the first formant is 600 Hz, the second formant is 1600 Hz, and the third formant is 2300 Hz".
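Rule 11 pins the three formants to constant targets at both ends of AE. The short Python sketch below expresses that, with hypothetical names `FORMANT_RULES` and `set_formants`. It takes the already fluctuated start and end, so the formant breakpoints follow the varied time length.

```python
from typing import Dict, List, Tuple

# Rule 11 of FIG. 7: formant targets (Hz) for AE at its start and end points.
FORMANT_RULES: Dict[str, Tuple[float, float, float]] = {"AE": (600.0, 1600.0, 2300.0)}


def set_formants(symbol: str, start: float, end: float) -> Dict[str, List[Tuple[float, float]]]:
    """Return (time_ms, Hz) breakpoints for F1-F3 at the symbol's start and end."""
    f1, f2, f3 = FORMANT_RULES[symbol]  # KeyError if no rule exists for the symbol
    return {name: [(start, hz), (end, hz)] for name, hz in (("F1", f1), ("F2", f2), ("F3", f3))}
```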

In this way, parameter values such as amplitude, pitch frequency and the first, second and third formants are set for the fluctuated time lengths, as shown in FIG. 9. The parameter interpolation unit 6-2 of FIG. 2 then interpolates the discontinuously set parameter values of FIG. 9, so that there is a value at every parameter specification interval the speech synthesis unit requires. The interpolation uses methods such as linear interpolation, quadratic-curve interpolation, or interpolation by a critically damped second-order system. Fluctuation may also be added to the parameter values during interpolation. The interpolated parameter time series is input to the speech synthesis unit, which synthesizes and outputs speech. The result is highly natural speech whose time lengths change with every utterance.
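Interpolation by a critically damped second-order system can be sketched as a target-following filter. Each breakpoint becomes a step target u(t), and the output x obeys x'' + 2ωx' + ω²x = ω²u. The damping ratio is 1, so x approaches each new target without overshoot. The Python below is an illustration under assumptions, not the document's method: ω = 0.05 rad/ms, the 1 ms semi-implicit Euler steps and the name `critically_damped` are all hypothetical choices.

```python
from typing import List, Tuple


def critically_damped(points: List[Tuple[float, float]], frame_ms: float,
                      omega: float = 0.05) -> List[float]:
    """Follow step targets with x'' + 2*omega*x' + omega**2*x = omega**2*u(t).

    points: (time_ms, target) breakpoints; u(t) holds the latest target reached.
    omega: natural angular frequency in rad/ms (hypothetical default).
    Returns one value per frame, integrated with ~1 ms semi-implicit Euler steps.
    """
    points = sorted(points)
    t = points[0][0]
    x, v = points[0][1], 0.0                 # start at rest on the first target
    n_frames = int((points[-1][0] - t) // frame_ms) + 1
    substeps = max(1, round(frame_ms))       # about 1 ms per integration step
    dt = frame_ms / substeps
    k = 0
    out = [x]
    for _ in range(n_frames - 1):
        for _ in range(substeps):
            while k + 1 < len(points) and points[k + 1][0] <= t:
                k += 1                       # switch to the newest target
            u = points[k][1]
            acc = omega * omega * (u - x) - 2.0 * omega * v  # damping ratio 1
            v += acc * dt                    # update velocity first (semi-implicit)
            x += v * dt
            t += dt
        out.append(x)
    return out
```

A larger ω makes the output settle faster on each new target. A smaller ω gives smoother, slower transitions.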

As mentioned above, FIG. 8(a) schematically shows the parameter values set by Rules 6 to 8 of FIG. 6. FIG. 8(b) schematically shows the parameter values set by Rules 9 and 10 of FIG. 6.

FIG. 9 shows the parameter values set by Rules 1 to 3 and 6 to 11 described above. The figure does not show the fluctuation applied by Rules 4 and 5. When those rules are applied, the 200 ms of TIMER changes by the random number multiplied by 20% of its value (Rule 4) or 10% of its value (Rule 5).

[Effect of the invention]

As explained above, the present invention synthesizes speech after adding fluctuation to the time length assigned to the synthesis unit symbol for each uttered phoneme. The time lengths therefore change every time, even for identical utterance content. This produces highly natural synthesized speech that is closer to human speech.

[Brief explanation of drawings]

FIG. 1 is a diagram of the principle configuration of the present invention.
FIG. 2 is a configuration diagram of one embodiment of the present invention.
FIG. 3 shows an example of a time length rule.
FIG. 4 shows examples of time length adjustment rules.
FIG. 5 shows examples of time length fluctuation rules.
FIG. 6 shows examples of suprasegmental parameter rules.
FIG. 7 shows an example of a parameter rule.
FIG. 8 shows an example of applying the suprasegmental parameter rules.
FIG. 9 shows an example of parameter values.
FIG. 10 is a block diagram of a sentence-to-speech conversion device.
FIG. 11 is a block diagram of a conventional parameter time series generation unit.

In the figures, 1 is the parameter time series generation unit, 2 the time length setting unit, 3 the time length rules, 4 the time length fluctuation adding unit, 5 the time length fluctuation rules, 6 the parameter setting unit, and 7 the parameter rules.

Claims

[Claims] A sentence-to-speech conversion device that converts a sentence into speech and outputs it, comprising:
(a) a time length setting unit (2) that assigns time lengths by applying a time length rule (3) to synthesis unit symbols generated in correspondence with a sentence;
(b) a time length fluctuation adding unit (4) that adds fluctuation by applying a time length fluctuation rule (5) to the time lengths assigned by the time length setting unit (2);
(c) a parameter setting unit (6) that sets parameter values by applying a parameter rule (7) to the time lengths fluctuated by the time length fluctuation adding unit (4), and then generates a parameter time series;
wherein the parameter time series generated by the parameter setting unit (6) is supplied to a speech synthesis unit, which synthesizes and outputs speech.