JPH09244680A - Device and method for prosody control - Google Patents

Device and method for prosody control

Info

Publication number
JPH09244680A
Authority
JP
Japan
Prior art keywords
word
prosody
voice
sentence
range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP5315996A
Other languages
Japanese (ja)
Other versions
JP3241582B2 (en)
Inventor
Takahiko Niimura
貴彦 新村
Keiji Hayashi
慶士 林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
N T T DATA TSUSHIN KK
NTT Data Corp
Original Assignee
N T T DATA TSUSHIN KK
NTT Data Communications Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by N T T DATA TSUSHIN KK, NTT Data Communications Systems Corp
Priority to JP05315996A
Publication of JPH09244680A
Application granted
Publication of JP3241582B2
Anticipated expiration
Current legal status: Expired - Fee Related


Abstract

PROBLEM TO BE SOLVED: To obtain natural-sounding speech as the synthesis result when speech is synthesized by combining the speech of a sentence and the speech of a word that were uttered in different speaking styles.
SOLUTION: In addition to a database 1 in which the speech of various words is registered and a database 5 in which the speech of the fixed parts of various sentences is registered, a table 3 holding the prosody of those words and a table 7 holding the prosody ranges that the word portions of those sentences should have are prepared. The speech and prosody of a desired word are read out, as are the speech of the fixed part of a desired sentence and the prosody range of its word portion. The prosody 23 of the word is compared with the prosody range 27 of the sentence's word portion to calculate the difference 29 between them. Prosody control 17 based on the difference 29 is then applied to the speech 21 of the word, converting it into speech 31 whose prosody falls within the prosody range 27. Finally, the converted word speech 31 is combined with the speech 25 of the fixed part of the sentence to synthesize the speech 33 of the complete sentence.

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention

The present invention relates generally to speech synthesis technology and, more particularly, to a prosody control technique suited to synthesizing natural-sounding speech for the kind of sentence that, like a system response sentence, is a fixed template in which only specific words are replaced.

[0002]

Description of the Related Art

The sentences used in the recorded prompts of widely deployed automatic voice response systems are generally called system response sentences, and they are often fixed templates in which only specific words can be replaced. For example, in the system response sentence "There was a transfer from ABC Bank," the bank-name word is replaced with different words such as "ABC" or "CDF".

[0003] When such a system response sentence is synthesized, the word and the sentence have conventionally been joined by an edit synthesis method. The edit synthesis method pastes an arbitrary word such as "ABC" or "CDF" into the word portion "XX" of a fixed sentence such as "There was a transfer from XX Bank" in the example above.

[0004]

Problems to be Solved by the Invention

In conventional speech synthesis processing using the edit synthesis method, the speech of a word such as "ABC" is simply concatenated with the speech of a fixed sentence part such as "There was a transfer from ... Bank." However, the same sentence or word uttered in different speaking styles has subtly different prosody and therefore gives a different auditory impression. For example, if the speech of a conversation among several people is compared with the speech of a passage read aloud by one person, the same sentence sounds different because the manner of speaking, that is, the speaking style, differs. Likewise, the speech of a word uttered in isolation sounds different from the speech of the same word uttered as part of a sentence. Because the individual words and sentences registered in a speech synthesis system are normally recorded in different speaking styles, simply joining the word speech to the sentence speech, as in the prior art, makes the speech of the whole sentence sound unnatural.

[0005] Under this prior art, making response sentences with diverse speaking styles sound natural would require registering, for every word, many speech variants whose prosody matches the various registered sentences. In small systems in particular, however, database capacity limits make it impossible to prepare such a variety of recordings for every registered word. From the standpoint of voice recording as well, it is difficult to utter a word with every conceivable prosody.

[0006] Accordingly, an object of the present invention is to make the synthesized result sound natural when the speech of a sentence and the speech of a word uttered in different speaking styles are combined.

[0007]

Means for Solving the Problems

According to the present invention, when the speech of the fixed part of a sentence and the speech of a word are joined to synthesize the speech of the complete sentence, the prosody of the word's speech is controlled so that it matches the sentence. In this prosody control, data indicating the prosody of the word's speech and data indicating the prosody range of the sentence's word portion are first obtained and compared. Based on the result of the comparison, the word's speech is then adjusted so that its prosody falls within the prosody range of the word portion. Joining the word speech whose prosody has been controlled in this way to the speech of the fixed part of the sentence yields natural-sounding speech for the complete sentence.

[0008] The prosody parameters manipulated in the prosody control can be of three kinds: pitch frequency, amplitude, and duration. A pitch-synchronous waveform overlap-add method can be used as the prosody control method.

[0009] According to the present invention, the speech of a word uttered in one speaking style can be converted into speech whose prosody matches each of various sentences uttered in various speaking styles, so there is no need to prepare multiple speech variants with different prosody for every word. Even a small system with limited database capacity can therefore synthesize sentences with natural-sounding speech.

[0010]

Embodiments of the Invention

FIG. 1 shows the configuration of a speech synthesizer according to one embodiment of the present invention. In practice, this device is a computer system on which an application program for speech synthesis processing according to the present invention is installed.

[0011] A word database 1, a word prosody table 3, a sentence database 5, and an in-sentence prosody table 7 are provided in suitable storage such as a magnetic disk device. The word database 1 stores the word numbers of various words together with speech data obtained from actual utterances of those words. The word prosody table 3 stores, together with the word numbers, prosody data measured from the actual utterances of those words. The prosody data of each word consists of the mean values of three prosody parameters measured over the word's speech interval: pitch frequency, amplitude, and duration.
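As an illustration of how the word database 1 and the word prosody table 3 might be laid out, the following Python sketch models one registered word as a waveform plus the three per-word prosody means. The class names, field names, and sample values are assumptions made for illustration and are not taken from the patent.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class WordProsody:
        """Mean prosody of one registered word, measured over its speech interval."""
        pitch_hz: float      # mean pitch frequency
        amplitude: float     # mean amplitude
        duration_s: float    # time length of the word

    @dataclass
    class WordEntry:
        """One record covering the word database 1 and the word prosody table 3."""
        word_id: int
        samples: List[float]     # recorded waveform of the isolated word
        prosody: WordProsody

    # Word database keyed by word number, as used by the word search process 11.
    # The 277 Hz pitch anticipates the numerical example in paragraph [0016].
    word_database: Dict[int, WordEntry] = {
        101: WordEntry(word_id=101, samples=[], prosody=WordProsody(277.0, 0.35, 0.48)),
    }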

[0012] The sentence database 5 stores the sentence numbers of various sentences together with speech data, obtained from actual utterances of those sentences, for the fixed part of each sentence (that is, the remainder left after removing the replaceable word portion). The in-sentence prosody table 7 stores data indicating the range of prosody of the replaceable word portion of each of those sentences. The prosody-range data of each sentence consists of the mean and standard deviation of the above three prosody parameters for the word portion of that sentence, and it is created as follows. For each sentence, a large number of versions in which only the word portion is replaced with various words are actually uttered, the prosody of those word portions (the values of the three prosody parameters) is measured, and the mean and standard deviation of the measured values are computed for each parameter. Experiments by the inventors showed that each individual sentence has a range of pitch frequency, amplitude, and duration that is characteristic of its word portion. The prosody-range data created in this way therefore indicates, for each sentence, the prosody range characteristic of that sentence's word portion. This means that when a sentence is uttered, if the word portion is spoken with a prosody within that range, the word is auditorily consistent with the fixed part of the sentence and the sentence as a whole sounds natural.
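The prosody-range data described above can be prepared offline by measuring the word portion in many recorded utterances of each template sentence and taking the mean and standard deviation of each parameter. A minimal Python sketch of that preparation, with invented measurement values, might look like this:

    from dataclasses import dataclass
    from statistics import mean, stdev
    from typing import List, Tuple

    @dataclass
    class ProsodyRange:
        """(mean, standard deviation) of each parameter for one sentence's word portion."""
        pitch_hz: Tuple[float, float]
        amplitude: Tuple[float, float]
        duration_s: Tuple[float, float]

    def build_prosody_range(measurements: List[Tuple[float, float, float]]) -> ProsodyRange:
        """measurements holds (pitch, amplitude, duration) of the word portion in each utterance."""
        pitches, amps, durs = zip(*measurements)
        stats = lambda values: (mean(values), stdev(values))
        return ProsodyRange(stats(pitches), stats(amps), stats(durs))

    # Invented measurements of the slot "XX" in recordings of
    # "There was a transfer from XX Bank" spoken with different bank names.
    slot_measurements = [(320.0, 0.40, 0.50), (331.0, 0.38, 0.55), (298.0, 0.42, 0.47),
                         (355.0, 0.36, 0.52), (321.0, 0.41, 0.49)]
    sentence_range = build_prosody_range(slot_measurements)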

[0013] By executing the application program, the CPU 9 carries out five processes: a word search process 11, a sentence search process 13, a prosody generation process 15, a prosody control process 17, and an edit synthesis process 19. The inputs to the CPU 9 are the word number and the sentence number of the word and the sentence that make up the desired system response sentence.

[0014] The word search process 11 is performed in response to the input word number. In this process 11, the speech data 21 and the prosody data 23 of the word identified by the input word number are retrieved from the word database 1 and the word prosody table 3. The retrieved speech data 21 of the word is passed to the prosody control process 17, and the prosody data 23 is passed to the prosody generation process 15.

[0015] The sentence search process 13 is performed in response to the input sentence number. In this process 13, the speech data 25 of the fixed part and the prosody-range data 27 of the word portion of the sentence identified by the input sentence number are retrieved from the sentence database 5 and the in-sentence prosody table 7. The retrieved speech data 25 of the fixed part is passed to the edit synthesis process 19, and the prosody-range data 27 of the word portion is passed to the prosody generation process 15.

[0016] In the prosody generation process 15, the prosody data 23 of the word is compared with the prosody-range data 27 of the sentence's word portion, and the difference 29 between them is calculated. For example, if the pitch frequency indicated by the prosody data 23 is 277 Hz, and the prosody-range data 27 indicates a mean pitch frequency of 325 Hz with a standard deviation of 34 Hz, then the pitch-frequency range indicated by the prosody-range data 27 is 291 to 359 Hz, so the pitch-frequency difference 29 is 291 - 277 = 14 Hz. The difference 29 is calculated in the same way for the other prosody parameters. The differences 29 for these three prosody parameters are passed to the prosody control process 17. If any of the three parameter values indicated by the word's prosody data 23 already lies within the range indicated by the prosody-range data 27, the difference 29 for that parameter is zero.
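The following small sketch reproduces the difference calculation of the prosody generation process 15 for the numerical example above, assuming, as the figures imply, that the range is taken as the mean plus or minus one standard deviation:

    def prosody_difference(word_value: float, slot_mean: float, slot_std: float) -> float:
        """Signed amount by which a word's parameter must change to enter the
        word portion's range [mean - std, mean + std]; zero if already inside."""
        lo, hi = slot_mean - slot_std, slot_mean + slot_std
        if word_value < lo:
            return lo - word_value    # positive: raise the parameter
        if word_value > hi:
            return hi - word_value    # negative: lower the parameter
        return 0.0

    # Pitch example from the text: range 291 to 359 Hz, word pitch 277 Hz, difference 14 Hz.
    print(prosody_difference(277.0, 325.0, 34.0))   # -> 14.0
    # The same calculation is applied to the amplitude and duration parameters.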

[0017] In the prosody control process 17, prosody control based on the prosody differences 29 is applied to the speech data 21 of the word. The pitch-synchronous waveform overlap-add method, for example, is used as the prosody control method. As a result of the prosody control process 17, the original speech data 21 is converted into speech data 31 having a prosody that lies within the range indicated by the prosody-range data 27. For any parameter whose difference 29 is zero, no control is performed and the value of the original speech data 21 is kept as it is. The speech data 31 with controlled prosody obtained by this process 17 is passed to the edit synthesis process 19.

[0018] In the edit synthesis process 19, the speech data 31 of the word with controlled prosody is inserted into the blank word portion of the speech data 25 of the sentence's fixed part, producing the speech data 33 of the complete system response sentence. The speech data 33 is played back as sound by an audio output device such as a loudspeaker.
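The edit synthesis step itself amounts to inserting the prosody-controlled word waveform into the blank slot of the fixed part. A minimal sketch, assuming the fixed part is stored as the audio before and after the slot (the function and variable names are illustrative):

    import numpy as np

    def splice_sentence(fixed_before: np.ndarray,
                        fixed_after: np.ndarray,
                        word_audio: np.ndarray) -> np.ndarray:
        """Insert the controlled word speech 31 into the fixed-part speech 25,
        producing the complete response sentence 33."""
        return np.concatenate([fixed_before, word_audio, fixed_after])

In practice a short crossfade at the two joins could be added to avoid clicks, although the patent does not describe one.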

[0019] Incidentally, the pitch-synchronous waveform overlap-add method used for the prosody control has the advantages that the synthesized speech resulting from the control is of high quality and that processing in units of pitch waveforms is easy. The method is described in detail in, for example, E. Moulines and F. Charpentier, "Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis using Diphones," Speech Communication, Vol. 9, pp. 453-467, Dec. 1990.

[0020] FIG. 2 shows the flow of prosody control by the pitch-synchronous waveform overlap-add method, and FIG. 3 shows the speech waveform at each stage of this prosody control.

[0021] First, a window function is applied to the original speech data 21, which represents the original speech waveform shown in FIG. 3A, to extract the individual pitch waveforms shown in FIG. 3B (S1). Next, a weighting function determined by the amplitude difference is applied to each pitch waveform to adjust its amplitude, as shown in FIG. 3C (S2). Next, the duration is adjusted as shown in FIG. 3D by increasing or decreasing the number of pitch waveforms in the speech interval according to the duration difference (S3). Finally, the spacing (period) of the pitch waveforms is changed according to the pitch-frequency difference to adjust the pitch frequency, and the pitch waveforms are recombined to create the speech data 31 representing a waveform with controlled prosody, as shown in FIG. 3E (S4).

[0022] Through this prosody control, the prosody of the word's original speech is corrected to lie within the prosody range suited to the sentence into which it is to be inserted, so the speech of the whole sentence containing that word sounds natural. A single speech recording per word can therefore be adapted for use with a variety of sentences having different speaking styles.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of prosody control by the pitch-synchronous waveform overlap-add method.

FIG. 3 is a waveform diagram showing the speech waveform at each step of the prosody control.

[Explanation of Reference Numerals]

1 Word database
3 Word prosody table
5 Sentence database
7 In-sentence prosody table
9 CPU
11 Word search process
13 Sentence search process
15 Prosody generation process
17 Prosody control process
19 Edit synthesis process
21 Speech data of a word
23 Prosody data of a word
25 Speech data of the fixed part of a sentence
27 Prosody-range data of the word portion of a sentence
29 Prosody difference
31 Speech data of a word with controlled prosody
33 Synthesized system response sentence

Claims (6)

[Claims]

1. In a system that synthesizes the complete speech of a sentence having a word portion and a fixed part by joining the speech of the fixed part with the speech of a word to be placed in the word portion, a device for controlling the prosody of the word so that it matches the sentence, the device comprising:
word acquisition means for acquiring the speech and the prosody of the word;
prosody-range acquisition means for acquiring the prosody range of the word portion of the sentence;
prosody comparison means for comparing the acquired prosody of the word with the prosody range of the word portion; and
prosody control means for performing, on the acquired speech of the word, prosody control based on the comparison result from the prosody comparison means, thereby converting the speech of the word into the speech of a controlled word having a prosody lying within the prosody range.
2. The device according to claim 1, wherein
the prosody of the word includes the values of three prosody parameters of the speech of the word, namely pitch frequency, amplitude, and duration; and
the prosody range of the word portion includes the ranges of the values of those three prosody parameters for the word portion.
3. The device according to claim 1, wherein the prosody control means performs prosody control using a pitch-synchronous waveform overlap-add method.
4. The device according to claim 1, wherein
the word acquisition means comprises a word database storing the speech of a plurality of words, a word prosody table storing the prosody of the plurality of words, and means for retrieving the speech and the prosody of a selected word from the word database and the word prosody table; and
the prosody-range acquisition means comprises an in-sentence prosody table storing the prosody ranges of the word portions of a plurality of sentences, and means for retrieving the prosody range of the word portion of a selected sentence from the in-sentence prosody table.
5. In a system that synthesizes the complete speech of a sentence having a word portion and a fixed part by joining the speech of the fixed part with the speech of a word to be placed in the word portion, a method of controlling the prosody of the word so that it matches the sentence, the method comprising the steps of:
acquiring the speech and the prosody of the word;
acquiring the prosody range of the word portion of the sentence;
comparing the acquired prosody of the word with the prosody range of the word portion; and
performing, on the acquired speech of the word, prosody control based on the result of the comparing step, thereby converting the speech of the word into the speech of a controlled word having a prosody lying within the prosody range.
6. A speech synthesis system that synthesizes the complete speech of a sentence having a word portion and a fixed part by joining the speech of the fixed part with the speech of a word to be placed in the word portion, the system comprising:
word acquisition means for acquiring the speech and the prosody of the word;
sentence acquisition means for acquiring the speech of the fixed part of the sentence and the prosody range of the word portion;
prosody comparison means for comparing the acquired prosody of the word with the prosody range of the word portion;
prosody control means for performing, on the acquired speech of the word, prosody control based on the comparison result from the prosody comparison means, thereby converting the speech of the word into the speech of a controlled word having a prosody lying within the prosody range; and
edit synthesis means for combining the speech of the controlled word from the prosody control means with the speech of the fixed part from the sentence acquisition means to create the complete speech of the sentence.
JP05315996A 1996-03-11 1996-03-11 Prosody control device and method Expired - Fee Related JP3241582B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP05315996A JP3241582B2 (en) 1996-03-11 1996-03-11 Prosody control device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP05315996A JP3241582B2 (en) 1996-03-11 1996-03-11 Prosody control device and method

Publications (2)

Publication Number Publication Date
JPH09244680A 1997-09-19
JP3241582B2 JP3241582B2 (en) 2001-12-25

Family

ID=12935078

Family Applications (1)

Application Number Title Priority Date Filing Date
JP05315996A Expired - Fee Related JP3241582B2 (en) 1996-03-11 1996-03-11 Prosody control device and method

Country Status (1)

Country Link
JP (1) JP3241582B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11237971A (en) * 1998-02-23 1999-08-31 Nippon Telegr & Teleph Corp <Ntt> Voice responding device
JP2009282236A (en) * 2008-05-21 2009-12-03 Mitsubishi Electric Corp Speech synthesizer


Also Published As

Publication number Publication date
JP3241582B2 (en) 2001-12-25

Similar Documents

Publication Publication Date Title
JP2885372B2 (en) Audio coding method
JPH10153998A (en) Auxiliary information utilizing type voice synthesizing method, recording medium recording procedure performing this method, and device performing this method
JPH031200A (en) Regulation type voice synthesizing device
US20020049594A1 (en) Speech synthesis
JP2003337592A (en) Method and equipment for synthesizing voice, and program for synthesizing voice
US6832192B2 (en) Speech synthesizing method and apparatus
US10643600B1 (en) Modifying syllable durations for personalizing Chinese Mandarin TTS using small corpus
JP2002525663A (en) Digital voice processing apparatus and method
JPH08335096A (en) Text voice synthesizer
JP3241582B2 (en) Prosody control device and method
JP4451665B2 (en) How to synthesize speech
JP5175422B2 (en) Method for controlling time width in speech synthesis
JPH11249679A (en) Voice synthesizer
US7130799B1 (en) Speech synthesis method
JP3785892B2 (en) Speech synthesizer and recording medium
JP2008058379A (en) Speech synthesis system and filter device
JP3081300B2 (en) Residual driven speech synthesizer
JP3059751B2 (en) Residual driven speech synthesizer
JPH09179576A (en) Voice synthesizing method
JP3113101B2 (en) Speech synthesizer
JP2577372B2 (en) Speech synthesis apparatus and method
JP3967571B2 (en) Sound source waveform generation device, speech synthesizer, sound source waveform generation method and program
JP3310217B2 (en) Speech synthesis method and apparatus
JPH11109992A (en) Phoneme database creating method, voice synthesis method, phoneme database, voice element piece database preparing device and voice synthesizer
JP2809769B2 (en) Speech synthesizer

Legal Events

Code Title Description
R250 Receipt of annual fees (JAPANESE INTERMEDIATE CODE: R250)
R250 Receipt of annual fees (JAPANESE INTERMEDIATE CODE: R250)
R250 Receipt of annual fees (JAPANESE INTERMEDIATE CODE: R250)
FPAY Renewal fee payment (payment until 2007-10-19; year of fee payment: 6)
FPAY Renewal fee payment (payment until 2008-10-19; year of fee payment: 7)
FPAY Renewal fee payment (payment until 2009-10-19; year of fee payment: 8)
FPAY Renewal fee payment (payment until 2010-10-19; year of fee payment: 9)
FPAY Renewal fee payment (payment until 2011-10-19; year of fee payment: 10)
FPAY Renewal fee payment (payment until 2012-10-19; year of fee payment: 11)
LAPS Cancellation because of no payment of annual fees