JPH0876782A - Voice synthesizing device

Info

Publication number: JPH0876782A
Application number: JP6209537A
Authority: JP
Inventors: Hiroshi Sano; 洋佐野; Takaaki Arai; 孝章新居; Hiroyuki Tsuboi; 宏之坪井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-09-02
Filing date: 1994-09-02
Publication date: 1996-03-22

Abstract

PURPOSE: To improve the naturalness of synthesized speech by generating the phoneme/prosodic symbol string and the phoneme/prosodic control parameters while holding the phoneme duration of single sounds and the pause length at specified values. CONSTITUTION: In a text structure transformation section 8, the number of syllables in one tone unit is limited according to the instruction from a speed setting section 10. Next, the phoneme/prosodic control parameter values are determined so that the phoneme duration of single sounds and the pause length are lengthened or shortened at ratios that depend on the speed. This determination is made by a time length determining section 11 according to the instruction from the speed setting section 10. The phoneme/prosodic symbol string obtained from the text structure whose pause insertion positions were changed by the text structure transformation section 8, and the phoneme/prosodic control parameters generated by the time length determining section 11 in response to the instruction for fast speech, are produced together and output to a parameter generation section 6. Speech synthesis parameters are generated there, and the parameter sequence is sent to a synthesizer to obtain synthesized speech at the specified speed.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech synthesizer that outputs input text as speech according to fixed rules.

[0002]

[Prior Art] FIG. 14 is a block diagram of a conventional rule-based speech synthesizer that outputs synthesized speech from text. In FIG. 14, when text is input, the language processing unit 1 derives a chain of words from the character string. To this end, morphological analysis and syntactic analysis are performed; semantic analysis may also be performed. The phoneme processing unit 4 converts the phoneme sequence that makes up each word into phonetic symbols while referring to the context before and after the word. At the same time, it generates prosodic symbols, such as the intonation and pauses of the whole sentence, based on the result of the structural analysis of the sentence. From this phoneme/prosodic symbol string, the parameter generation unit 6 generates the time-varying parameter patterns needed to produce the speech waveform. This parameter sequence drives the speech waveform generation unit 7, which finally outputs the synthesized speech. In this way, rule-synthesized speech corresponding to the input text is obtained.

[0003] The conventional speech synthesizer described above converts input text into speech sequentially and outputs it, and generally synthesizes speech at the normal human speaking rate (approximately 6.0 to 8.0 morae per second).

[0004] In rule-based speech synthesis of this kind, which converts arbitrary text into speech, changing the speaking rate of the synthesized speech is useful. For example, slow synthesized speech can convey hard-to-hear or important portions clearly. Conversely, fast speech lets unnecessary portions be read through quickly, so more information can be conveyed in the same amount of time. Because this widens the range of applications of rule-based speech synthesis, realizing synthesized speech at varying speeds is important.

[0005] Accordingly, several methods have been devised: speaking faster by omitting unimportant phrases; changing the speaking rate by lengthening or shortening the phoneme durations of single sounds or syllables; and controlling the speaking rate by lengthening or shortening the utterance time segment by segment. Control that treats phonemes selectively has also been used, for example changing only the duration of vowels while leaving the duration of consonants unchanged.

[0006] In the conventional methods, taking the normal speaking rate as the reference, the phoneme durations of the portions subject to scaling, including pause intervals, were lengthened or shortened by a constant ratio. However, observation of speech that humans utter quickly or slowly shows that the scaling ratio of the pause length is not constant. The conventional methods, which scale the phoneme duration of single sounds and the pause length linearly, therefore had the problem that the synthesized speech sounds unnatural.
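As a rough illustration of the conventional approach described above, the following sketch scales every segment, pauses included, by one constant ratio. The segment labels, the durations in milliseconds, and the function name are hypothetical and are not taken from the patent.

```python
def scale_linearly(segments, normal_rate, target_rate):
    """Scale every (label, duration_ms) pair by one constant ratio.

    Rates are in morae per second; a faster target rate shortens durations.
    """
    ratio = normal_rate / target_rate
    return [(label, duration_ms * ratio) for label, duration_ms in segments]


# Hypothetical segments: two phoneme groups followed by a pause.
segments = [("kjo:", 200.0), ("wa", 100.0), ("pause", 300.0)]
fast = scale_linearly(segments, normal_rate=8.0, target_rate=10.0)
```

Because the pause is shortened by exactly the same ratio as the phonemes, the proportion between pause and phoneme durations never changes, which is the behavior the paragraph above identifies as unnatural.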

[0007] Moreover, when a person reads text faster, some of the pauses that were previously present drop out, and the number of morae contained in one phrase tends to increase. Conversely, it is well known that in slow speech pauses are inserted and the number of phrases increases. Therefore, to improve the naturalness of synthesized speech, it is necessary not only to scale the phoneme duration of single sounds and the pause length, but also to change the pause insertion positions according to the speaking rate.

[0008]

[Problem to Be Solved by the Invention] As described above, the conventional speech synthesizer cannot change the pause insertion positions according to the speaking rate and, moreover, simply scales the phoneme duration and the pause length linearly, so the synthesized speech is unnatural.

[0009] An object of the present invention is to provide a speech synthesizer with improved naturalness of synthesized speech, achieved by changing the pause insertion positions according to the speaking rate and by scaling the phoneme duration and the pause length nonlinearly.

[0010]

[Means for Solving the Problem] The present invention is characterized by comprising: language processing means for linguistically analyzing input text to create a text structure composed of a chain of tone units; speed setting means for setting the speaking rate of the synthesized speech; text structure transformation means for transforming the text structure created by the language processing means according to the speaking rate set by the speed setting means, thereby changing the pause insertion positions; phoneme processing means for generating a phoneme/prosodic symbol string from the text structure obtained from the text structure transformation means; time length determination means for determining the phoneme duration and the pause length according to the speaking rate set by the speed setting means and outputting the corresponding control parameters; parameter generation means for generating synthesized speech parameters from the phoneme/prosodic symbol string supplied by the phoneme processing means, using the control parameters output by the time length determination means; and speech waveform generation means for outputting synthesized speech using the synthesized speech parameters from the parameter generation means.

[0011]

[Operation] In the present invention, according to the specified speaking rate, the text structure composed of a chain of tone units obtained from linguistic analysis of the text is transformed so that the pause insertion positions are set appropriately. At the same time, the phoneme/prosodic control parameter values are determined so that the phoneme duration of single sounds and the pause length are scaled by a ratio that differs for each speed. Furthermore, for certain specified speed values, the phoneme duration of single sounds and the pause length are held constant regardless of the specified speed value. The phoneme/prosodic symbol string and the phoneme/prosodic control parameters are generated in this way. Synthesized speech parameters are generated from this information, so that synthesized speech for the text can be produced at the desired speaking rate with natural intonation.

[0012]

[Embodiment] FIG. 1 is a block diagram of a rule-based speech synthesizer according to one embodiment of the present invention. Components in FIG. 1 that are the same as in FIG. 14 bear the same reference numerals. FIG. 1 differs from the conventional device (FIG. 14) in the following respects: a speed setting unit for specifying the speed is provided; a text structure transformation unit refers to text structure transformation rules in accordance with the instruction from the speed setting unit, transforms the text structure, and changes the pause insertion positions; a time length determination unit 11 determines the phoneme/prosodic control parameters so that, when the transformed text structure is output as synthesized speech, the phoneme duration of single sounds and the pause length are scaled by a ratio that differs for each speed; and the time length determination unit, referring to the specified speed information, holds the phoneme duration of single sounds and the pause length constant regardless of the specified speed value when certain specific speaking rates are given. Compared with the conventional device, the speed setting unit 10, the text structure transformation unit 8, the rule file 9 holding the text structure transformation rules, and the time length determination unit 11 are added.

[0013] The configuration and operation of the rule-based speech synthesizer according to the present invention are described below with reference to FIG. 1. In FIG. 1, arbitrary text is input to the language analysis unit 1 through some input device (not shown) such as a keyboard, a communication line, a file, or a microphone. The language analysis unit 1 analyzes the structure of the input text and determines the structure of the text as a whole. Based on the analysis rules stored in the analysis rule file 3, it also looks up the words contained in the text in the dictionary 2 and determines sentence structure information such as the syntactic and semantic features of each word and the dependency relations between words. When the instruction from the speed setting unit 10 is the normal speed, the text structure consisting of this information is output unchanged to the phoneme processing unit 4.

[0014] Based on the generation rules stored in the generation rule file 5, the phoneme processing unit 4 examines the words contained in the text structure and determines the accent and phoneme symbol string of each word. It derives the chain of tone units from the text structure, determines the prosodic symbol string that represents the intonation pattern of the whole sentence, and generates the phoneme/prosodic symbol string required by the parameter generation unit 6. When the instruction from the speed setting unit 10 is the normal speed, the phoneme/prosodic symbol string and the phoneme/prosodic control parameter information prepared in advance are output to the parameter generation unit 6. The parameter generation unit 6 creates speech synthesis parameters and outputs them to the speech waveform generation unit 7. The speech waveform generation unit 7 is driven by the speech synthesis parameters and outputs the synthesized speech through some output device such as a loudspeaker, broadcasting equipment, a communication line, or a file. With this configuration, synthesized speech at the normal speed is produced exactly as in the conventional device.

[0015] Next, as one embodiment, the specific operation of the device is described for the case in which the speed setting unit specifies speech that is fast relative to the normal speed. The following input example sentence is used.

[0016] Example sentence: "今日はここでお弁当を広げて食べることにしましょう" ("Today, let's spread out our lunch boxes here and eat.")

The language processing unit analyzes the text structure of the input text. For the example, the text structure shown in FIG. 2 is obtained. Here, Head/Modify indicates the syntactic feature of a phrase, and MO indicates the number of morae (beats) in the phrase. NOW indicates the number of words that make up the phrase, and (ADJ) indicates the strength of the syntactic connection between phrases. NO indicates the position of the phrase counted from the beginning of the sentence, and Phonotype indicates the sequence of syllables. For example, the text 「今日は」 yields 「きょ−わ」. phrases indicates the syntactic features of the words.
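The relationship between the Phonotype string and the mora count MO can be sketched as follows. The patent does not state how MO is computed; this sketch assumes the usual kana-based counting rule, writes the long-vowel mark as "ー", and uses readings for the example phrases that are supplied here for illustration rather than taken from FIG. 2.

```python
# Small kana that attach to the preceding kana and do not add a mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎ")


def count_morae(kana):
    """Count morae in a hiragana string.

    The long-vowel mark, the moraic nasal "ん", and the small "っ" each
    count as one mora; small glide and vowel kana are absorbed by the
    preceding kana.
    """
    return sum(1 for ch in kana if ch not in SMALL_KANA)


# Assumed readings of the four phrases of the example sentence.
phonotypes = ["きょーわ", "ここで", "おべんとうをひろげて", "たべることにしましょう"]
mora_counts = [count_morae(p) for p in phonotypes]
```

Under these assumptions the four phrases have 3, 3, 10, and 10 morae respectively.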

[0017] The text structure is grouped into tone units on the basis of the individual phrases. From the above result, the pause insertion positions in normal speech can be written in simplified notation as follows, where "//" visually marks a pause insertion position in the text.

[0018] Pause insertion positions 1: "今日は // ここで // お弁当を広げて // 食べることにしましょう"

Based on the text structure of FIG. 2, the phoneme processing unit derives the phoneme sequence, which is shown in FIG. 3. For example, the phoneme symbol string for 「きょ−わ」 is "kjo : wa". The number written below the phoneme symbols indicates the accent type of the prosodic word. This processing is carried out by applying the rules stored in the generation rule file 5.

[0019] Intonation processing is then performed to generate the phrase control symbols (P0 to P3) and the pause control symbols (S0 to S4). Accent control symbols (DH to DHA) are generated from the accent information, and as a result the phoneme/control symbol string of FIG. 4 is obtained.

[0020] At the normal speaking rate, the phoneme/prosodic control parameter values for the phrase control symbols, pause control symbols, accent control symbols, and phoneme duration (Rate) are set according to preregistered default values (see FIG. 5). These values are held in the phoneme processing unit 4.

[0021] At the normal speaking rate, the parameter sequence that drives the synthesizer is created from the phoneme/prosodic symbol string and the phoneme/prosodic control parameter values described above, and the result is sent to the synthesizer to obtain synthesized speech.

[0022] Changing (increasing) the speaking rate

In general, when a sentence is spoken quickly, the tone units cannot be formed solely from the dependency relations determined from the sentence structure of the text. Compared with the tone units at the normal speed, the number of syllables contained in one tone unit increases and, at the same time, pauses drop out. However, observation of human speech shows that when the number of syllables per tone unit changes, pauses do not drop out uniformly across the whole text, nor does the number of syllables in each tone unit increase evenly; there is an allowable range.

[0023] The text structure transformation unit 8 therefore limits the number of syllables in one tone unit according to the instruction from the speed setting unit 10. Suppose, for example, that the speed setting unit specifies fast speech (10.0 morae per second). The text structure transformation unit 8 refers to the table shown in FIG. 6, which is held within the unit, and adjusts the number of syllables contained in one tone unit according to the specified speaking rate. In this example, the minimum number of syllables in one prosodic phrase is 4 and the maximum is 15.
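The table lookup described above might be sketched as follows. Only the entry for 10.0 morae per second (minimum 4, maximum 15) comes from the text; the other rows, the banding of rates by lower bound, and all names are hypothetical stand-ins for the contents of FIG. 6.

```python
from bisect import bisect_right

# (lower bound of speaking rate in morae/s, (min syllables, max syllables)).
# Only the 10.0 morae/s entry (4, 15) comes from the text; the other rows
# are hypothetical placeholders for the contents of FIG. 6.
SYLLABLE_LIMITS = [
    (0.0, (2, 8)),
    (6.0, (3, 12)),
    (10.0, (4, 15)),
]


def syllable_limits(rate):
    """Return (min, max) syllables per tone unit for a speaking rate."""
    bounds = [lower for lower, _ in SYLLABLE_LIMITS]
    index = bisect_right(bounds, rate) - 1
    if index < 0:
        raise ValueError("speaking rate must be non-negative")
    return SYLLABLE_LIMITS[index][1]
```

Keeping the limits in a table rather than in code matches the remark elsewhere in the description that the table values may be changed to suit the writing style or the purpose of the synthesis.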

[0024] Observation of sentences shows that some structures have syntactic features that readily join into a single prosodic phrase, while other phrases have syntactic features that resist joining. For example, an adverbial phrase of time or place adjacent to a topic phrase, or an object together with another case complement preceding it, forms a single tone unit. Conversely, subordinate phrases and conjunctive phrases almost never join prosodically with the neighboring phrases. Such transformation rules are stored in advance in the text structure transformation rule file 9. When the number of syllables per prosodic phrase changes as described above at the instruction of the speed setting unit 10, the text structure transformation unit 8 transforms the text structure using these rules.

[0025] In the example, the syntactic features of the words in the text structure are searched, and two consecutive prosodic phrases consisting of a topic phrase and a place adverbial phrase are found. After confirming that the total number of syllables of the two prosodic phrases is within 16 morae, the place adverbial phrase following the topic phrase is joined to it to form one tone unit, which changes the tone units. The pause that lay between the topic phrase and the place adverbial phrase is thereby deleted. Through such transformation operations, a text structure suited to the fast speaking rate is obtained. In the phoneme processing unit 4, the rules of the generation rule file 5 are applied to this text structure to generate the corresponding phoneme sequence. The result is shown in FIG. 7.
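The joining step described above can be sketched as follows, under stated assumptions: the class, the role labels, the single rule in `JOINABLE`, and the mora counts are all illustrative and are not taken from the patent's figures or rule file. Only the 16-mora limit and the topic-plus-place-adverbial rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class ToneUnit:
    text: str
    role: str   # syntactic feature, e.g. "topic" or "place_adverbial"
    morae: int


# Pairs of adjacent roles that may join into one tone unit (assumed rule set).
JOINABLE = {("topic", "place_adverbial")}


def merge_tone_units(units, max_morae):
    """Join adjacent tone units when a rule allows it and the total fits."""
    merged = []
    for unit in units:
        prev = merged[-1] if merged else None
        if (prev is not None
                and (prev.role, unit.role) in JOINABLE
                and prev.morae + unit.morae <= max_morae):
            # Joining the two units removes the pause between them.
            merged[-1] = ToneUnit(prev.text + unit.text, "merged",
                                  prev.morae + unit.morae)
        else:
            merged.append(unit)
    return merged


units = [
    ToneUnit("今日は", "topic", 3),
    ToneUnit("ここで", "place_adverbial", 3),
    ToneUnit("お弁当を広げて", "object_verb", 10),
    ToneUnit("食べることにしましょう", "predicate", 10),
]
fast = merge_tone_units(units, max_morae=16)
notation = " // ".join(u.text for u in fast)
```

With these inputs the four tone units become three, and the pause between 「今日は」 and 「ここで」 disappears from the notation.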

[0026] From the above result, the resulting pause insertion positions can be written in simplified notation as follows. The number of pause insertion positions has decreased.

Pause insertion positions 2: "今日はここで // お弁当を広げて // 食べることにしましょう"

Intonation processing is then performed to generate the phrase control symbols and the pause control symbols. Accent control symbols are generated from the accent information, and the phoneme/control symbol string of FIG. 8 is obtained.

[0027] Next, the phoneme/prosodic control parameter values are determined so that the phoneme duration of single sounds and the pause length are scaled by a ratio that depends on the speed. This is done by the time length determination unit 11 at the instruction of the speed setting unit 10. First, the phoneme duration (Rate) of single sounds is determined according to the procedure shown in FIG. 9.

[0028] Here, the minimum duration is the shortest duration at which speech synthesized with that duration is still comfortably audible. Conversely, the maximum duration is the longest duration at which speech synthesized with that duration still sounds natural. These values are specified according to the performance of the synthesizer. They may be stored in advance in the time length determination unit 11 or supplied externally by a control signal. F1 is a function whose return value varies with the speaking rate value (X) but, at the least, does not vary at a constant ratio. For example, it can be given by a function such as that shown in FIG. 10. A1 to A5 are mutually different constants. The function F1 is not limited to these; various functions with a nonlinear response can be used.
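A minimal sketch of the duration decision follows, assuming that the procedure of FIG. 9 evaluates F1 and then holds the result at the minimum or maximum duration when it falls outside that range. The power-law form of F1 is only a stand-in for the function of FIG. 10, and the default values and parameter names are hypothetical.

```python
def phoneme_duration(rate, normal_rate=7.0, normal_ms=120.0,
                     min_ms=60.0, max_ms=240.0, exponent=0.8):
    """Return a phoneme duration in ms for a speaking rate in morae/s.

    F1 here is a power law (an assumed form), so the duration does not
    shrink in direct proportion to the rate. Results outside
    [min_ms, max_ms] are replaced by the fixed limit.
    """
    if rate <= 0:
        raise ValueError("speaking rate must be positive")
    f1 = normal_ms * (normal_rate / rate) ** exponent
    return min(max(f1, min_ms), max_ms)
```

With these assumed values, a rate of 10.0 morae per second gives a duration longer than the 84 ms that constant-ratio scaling would produce, and extreme rates return the fixed limits regardless of the exact rate.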

[0029] Next, the pause length parameter value (Pn) is determined according to the procedure of FIG. 11. Here, the minimum pause length is the shortest pause at which speech synthesized with that pause length still sounds natural. Conversely, the maximum pause length is the longest pause at which speech synthesized with that pause length still sounds natural. These values are specified according to the performance of the synthesizer. They may be stored in advance in the time length determination unit 11 or supplied externally by a control signal. F2 is a function whose return value varies with the specified speaking rate but, at the least, does not vary at a constant ratio. In the example, as shown in FIG. 12, it is realized with linear functions that each cover a different range. B1 to B4 are the coefficients of these functions. The function F2 is not limited to these; various functions with a nonlinear response can be used.
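The piecewise-linear form of F2 might look like the following sketch. It assumes that B1 to B4 are the slope and intercept of two linear segments that meet at 7.0 morae per second; the coefficient values, the breakpoint, and the pause limits are hypothetical and do not come from FIG. 12.

```python
def pause_length(rate, min_ms=100.0, max_ms=600.0):
    """Return a pause length in ms for a speaking rate in morae/s."""
    # Hypothetical coefficients standing in for B1..B4 of FIG. 12.
    b1, b2 = -60.0, 720.0   # segment for rates up to 7.0 morae/s
    b3, b4 = -40.0, 580.0   # segment for rates above 7.0 morae/s
    if rate <= 7.0:
        f2 = b1 * rate + b2
    else:
        f2 = b3 * rate + b4
    # Hold the pause at a fixed limit outside the usable range.
    return min(max(f2, min_ms), max_ms)
```

Each segment is linear, but because the slopes differ the overall response to the speaking rate is not a constant ratio, which is the property the text requires of F2.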

[0030] The two procedures above determine the phoneme duration and the pause length, and thus the phoneme/prosodic control parameter values. As a result, the values are set as shown in FIG. 13. As FIG. 13 shows, the phoneme duration (Rate) and the pause lengths (S1 to S3) have become smaller.

[0031] The phoneme/prosodic symbol string obtained from the text structure whose pause insertion positions were changed by the text structure transformation unit 8, and the phoneme/prosodic control parameters created by the time length determination unit 11 in response to the fast-speech instruction, are produced together and output to the parameter generation unit. In parameter generation, the speech synthesis parameters are created, and the parameter sequence is sent to the synthesizer to obtain synthesized speech at the specified speed.

[0032] The foregoing has described, using the speech synthesizer of FIG. 1, how appropriate and natural synthesized speech is obtained even when the speaking rate is changed. The text structure transformation rules for speed control are not limited to those given in the embodiment; new rules can be added as needed by storing them in the rule file.

[0033] The values in the table that specifies the number of syllables per tone unit, stored in the text structure transformation unit 8, can be changed according to the writing style or the purpose of the synthesis and are not limited to those shown in the embodiment above. In addition, although a Japanese sentence is input to the language analysis unit in the above embodiment, the input may be supplied and processed word by word, or a longer passage of text may be input.

[0034]

[Effects of the Invention] According to the present invention, the text structure obtained by linguistic analysis of the input text is transformed according to the specified speaking rate, so the pause insertion positions can be changed. At the same time, the phoneme/prosodic control parameter values are determined so that the single-sound duration and the pause length vary with the speed without following a constant ratio. Because the pause insertion positions, the single-sound duration, and the pause length change in conjunction with one another, natural synthesized speech is obtained.

[Brief description of drawings]

FIG. 1 is a block diagram of an embodiment of the present invention.

FIG. 2 is a first diagram showing an example of a text structure.

FIG. 3 is a second diagram showing an example of a text structure.

FIG. 4 is a first diagram showing an example of phoneme/control symbols.

FIG. 5 is a first diagram showing the parameter values for each control symbol.

FIG. 6 is a diagram showing the table that specifies the number of syllables per tone unit.

FIG. 7 is a third diagram showing an example of a text structure.

FIG. 8 is a second diagram showing an example of phoneme/control symbols.

FIG. 9 is a flowchart for determining the phoneme duration.

FIG. 10 is a diagram showing a nonlinear function for determining the phoneme duration.

FIG. 11 is a flowchart for determining the pause length.

FIG. 12 is a diagram showing a nonlinear function for determining the pause length.

FIG. 13 is a second diagram showing the parameter values for each control symbol.

FIG. 14 is a block diagram of a conventional speech synthesizer.

[Explanation of symbols]

1: language processing unit
2: dictionary file
3: analysis rule file
4: phoneme processing unit
5: generation rule file
6: parameter generation unit
7: speech waveform generation unit
8: text structure transformation unit
9: text structure transformation rule file
10: speed setting unit
11: time length determination unit

Claims

[Claims]

1. A speech synthesizer comprising: language processing means for linguistically analyzing input text to create a text structure composed of a chain of tone units; speed setting means for setting the speaking rate of synthesized speech; text structure transformation means for transforming the text structure created by the language processing means according to the speaking rate set by the speed setting means, thereby changing the pause insertion positions; phoneme processing means for generating a phoneme/prosodic symbol string from the text structure obtained from the text structure transformation means; time length determination means for determining the phoneme duration and the pause length according to the speaking rate set by the speed setting means and outputting the corresponding control parameters; parameter generation means for generating synthesized speech parameters from the phoneme/prosodic symbol string supplied by the phoneme processing means, using the control parameters output by the time length determination means; and speech waveform generation means for outputting synthesized speech using the synthesized speech parameters from the parameter generation means.

2. The speech synthesizer according to claim 1, wherein the time length determination means scales the phoneme duration of single sounds and the pause length at a ratio that differs for each speed when the set speaking rate is within a predetermined range.

3. The speech synthesizer wherein the time length determination means sets the phoneme duration of single sounds and the pause length to respective fixed values when the set speaking rate is outside a predetermined range.