JP3397406B2

JP3397406B2 - Voice synthesis device and voice synthesis method

Info

Publication number: JP3397406B2
Application number: JP30873193A
Authority: JP
Inventors: 芳明及川; 敬一山田
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1993-11-15
Filing date: 1993-11-15
Publication date: 2003-04-14
Anticipated expiration: 2018-04-14
Also published as: JPH07140999A

Description

Detailed Description of the Invention

[0001]

[Table of Contents] The present invention will be described in the following order: Industrial Field of Application; Prior Art (FIG. 4); Problems to Be Solved by the Invention; Means for Solving the Problems (FIG. 1); Operation (FIG. 1); Embodiment (FIGS. 1 to 3); Effects of the Invention.

[0002]

[Industrial Field of Application] The present invention relates to a speech synthesis device and a speech synthesis method, and is particularly suitable for application to a text-to-speech synthesis device and a text-to-speech synthesis method that synthesize speech from an input character sequence.

[0003]

[Prior Art] Conventionally, a speech synthesis device generally synthesizes speech by the processing shown in FIG. 4. First, in the speech synthesis device 1, Japanese text containing a mixture of kanji and kana is input from a predetermined input device to the text input unit 2, which outputs the text to the sentence analysis unit 3. The sentence analysis unit 3 analyzes the input text with reference to the dictionary 4, converts it into a kana reading string, detects the accent type specific to each word, and divides the text into words and phrases (bunsetsu).

[0004] That is, unlike English, Japanese is not written with spaces between words, so a phrase such as "米国産業界" (American industry) can be segmented in two ways: "米国／産業・界" (America / industry / world) or "米／国産／業界" (rice / domestically produced / trade).
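The segmentation ambiguity described in [0004] can be reproduced with a small sketch. The lexicon below is a toy assumption made up for illustration; it is not the contents of dictionary 4, and the function name `segmentations` is hypothetical.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that appear in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Toy lexicon (assumption for illustration only).
TOY_LEXICON = {"米", "米国", "国産", "産業", "業界", "界"}

candidates = segmentations("米国産業界", TOY_LEXICON)
for words in candidates:
    print("／".join(words))
```

With this toy lexicon the function finds both readings from the paragraph above. A real analyzer would still have to choose between them, which is why the sentence analysis unit also uses word connection relationships and statistics.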

[0005] For this reason, the sentence analysis unit 3 refers to the dictionary 4 and uses the connection relationships between words and the statistical properties of words to divide the text input into words and bunsetsu, thereby detecting word and bunsetsu boundaries. The sentence analysis unit 3 then outputs to the symbol string analysis unit 5 a symbol string that represents, for each sentence, the kana reading (phonemic information) together with the bunsetsu boundaries and accent types (prosodic information) obtained in this way.

[0006] The symbol string analysis unit 5 separates this symbol string into phoneme information and prosody information. The phoneme information is output to the phoneme duration calculation unit 6 and the parameter connection unit 7, and the prosody information is output to the pitch pattern generation unit 8. Here, phoneme information is information about the sounds to be uttered, and prosody information is information about accent and intonation.

[0007] When phoneme information is input, the phoneme duration calculation unit 6 calculates the duration of each phoneme from it. One method of calculating phoneme durations obtains the duration of a vowel as in the following expression

[Equation 1]

that is, by adding the average vowel duration and the deviations from the average vowel length attributable to each factor of the phonological environment. This method was presented in the Proceedings of the Acoustical Society of Japan meeting of March 1990 (Kaiki et al.).

[0008] In expression (1), α is a coefficient matrix; σ is a matrix of coefficients "1" or "0" indicating which category of each factor the vowel's phonological environment falls into; m is the number of factors (namely: the type of the vowel, the type of the preceding phoneme, the type of the phoneme before that, the type of the following phoneme, the type of the phoneme after that, whether the preceding sound is a geminate consonant (sokuon), whether the following sound is a geminate consonant, whether the vowel is long, the number of morae in the breath group, the position within the breath group, the number of morae in the sentence, and the position within the sentence); and l is the number of categories of each factor.
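The image for expression (1) is not reproduced in this text. A form consistent with the definitions in [0007] and [0008] is sketched below as an assumed reconstruction only. The symbols $\hat{d}$ (predicted vowel duration) and $\bar{d}$ (average vowel duration), the index names $i$ and $j$, and the per-factor category count $l_i$ are not in the original.

```latex
\hat{d} \;=\; \bar{d} \;+\; \sum_{i=1}^{m} \sum_{j=1}^{l_i} \alpha_{ij}\,\sigma_{ij},
\qquad \sigma_{ij} \in \{0, 1\},
\qquad \sum_{j=1}^{l_i} \sigma_{ij} = 1 \ \text{for each factor } i
```

Under this reading, exactly one category is active per factor, so the double sum picks out one deviation $\alpha_{ij}$ for each of the $m$ factors and adds it to the average.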

[0009] In this case, the parameter given as input is the phonological environment of the phoneme whose duration is sought; the deviations from the average vowel length are obtained in advance by analyzing a large number of sentences.
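The additive calculation described in [0007] to [0009] can be sketched as a table lookup. All numbers, factor names and category names below are hypothetical values chosen for illustration; the patent does not give the coefficient table, and only three of the twelve factors are shown.

```python
# Hypothetical average vowel duration in milliseconds (assumption).
AVERAGE_VOWEL_MS = 80.0

# Hypothetical deviation table: factor -> category -> deviation in ms.
# In the described method these values come from analyzing many sentences.
DEVIATIONS_MS = {
    "vowel": {"a": 6.0, "i": -8.0, "o": 2.0},
    "preceding_phoneme": {"o": -3.0, "r": 1.5},
    "following_phoneme": {"h": 4.0, "n": -2.0},
}


def vowel_duration_ms(environment):
    """Add the deviation of the active category of each factor to the average.

    `environment` maps each factor name to the single category that applies,
    which plays the role of the 0/1 indicator matrix in the text.
    """
    total = AVERAGE_VOWEL_MS
    for factor, category in environment.items():
        total += DEVIATIONS_MS[factor][category]
    return total


example_env = {"vowel": "i", "preceding_phoneme": "o", "following_phoneme": "h"}
print(vowel_duration_ms(example_env))  # 80 - 8 - 3 + 4 = 73.0
```

Because the only input is the phonological environment, two different words that present the same environment necessarily receive the same duration, which is the limitation the invention addresses.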

[0010] The phoneme durations obtained in this way are output to the parameter connection unit 7 and the pitch pattern generation unit 8. Based on the phoneme information and the calculated duration of each phoneme, the parameter connection unit 7 concatenates phoneme segment data read from the phoneme segment database 9 to generate a parameter string. The generated parameter string is output to the speech synthesis unit 10.

[0011] Meanwhile, the pitch pattern generation unit 8 generates a pitch pattern based on the prosody information and the calculated duration of each phoneme, and outputs the generated pitch pattern to the speech synthesis unit 10.

[0012] Here, as an example of pitch pattern calculation, there is a model, shown in the following expression

[Equation 2]

in which the pitch pattern is divided into a phrase component and an accent component, expressed as the sum of the outputs of critically damped second-order systems driven by impulse inputs and step inputs, respectively. This model was published in the Transactions of the IEICE, Vol. J72-A, No. 1, January 1989 (Fujisaki et al., "Synthesis of Sentence Speech Based on a Model of the Fundamental Frequency Pattern Generation Process").

[0013] In expression (2), G_pi(t) and G_aj(t) are the impulse response of the phrase control mechanism and the step response of the accent control mechanism, respectively. For t ≥ 0, the impulse response and the step response are given by the following expression

[Equation 3]

and the following expression

[Equation 4]

respectively.

[0014] In expressions (3) and (4), F_min is the minimum pitch frequency; α, β and θ are constants; A_pi is the magnitude of a phrase command; T_li is the position of a phrase command; A_aj is the magnitude of an accent command; T_1j is the start position of an accent command; and T_2j is the end position of an accent command. For t < 0, G_pi(t) = G_aj(t) = 0.
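The images for expressions (2) to (4) are not reproduced in this text. The model cited in [0012] is commonly written in the form below, which is given here as an assumption about what the images show rather than as the patent's exact notation. In particular, the phrase command time is conventionally written $T_{0i}$, and the text's T_li is taken to denote the same quantity.

```latex
\ln F_0(t) = \ln F_{\min}
  + \sum_{i} A_{pi}\, G_{pi}(t - T_{0i})
  + \sum_{j} A_{aj} \left[ G_{aj}(t - T_{1j}) - G_{aj}(t - T_{2j}) \right]

G_{pi}(t) = \alpha_i^{2}\, t\, e^{-\alpha_i t} \qquad (t \ge 0)

G_{aj}(t) = \min\left[\, 1 - (1 + \beta_j t)\, e^{-\beta_j t},\ \theta \,\right] \qquad (t \ge 0)
```

Both responses are zero for $t < 0$, as stated in [0014], so a command has no effect on the contour before its onset time.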

[0015] In this case, the parameters given as input are the timing and magnitude of each command and the coefficients that determine the speed of the response. These values are obtained in advance by analyzing a large number of sentences and are stored, and the pitch pattern is generated using values selected according to predetermined rules. The speech synthesis unit 10 then performs waveform synthesis based on the parameter string and the pitch pattern, and outputs a synthesized speech signal via the digital-to-analog conversion unit (D/A unit) 11.
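The pitch pattern model of [0012] to [0015] can be sketched numerically as follows. This is a minimal sketch assuming the commonly published form of the cited model; the function names, the default constants (`alpha`, `beta`, `theta`) and the command values in the example are illustrative assumptions, not values from the patent.

```python
import math


def phrase_response(t, alpha):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta, theta):
    """Step response of the accent control mechanism, capped at theta."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)


def pitch_hz(t, f_min, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, theta=0.9):
    """Pitch at time t (seconds).

    phrase_cmds: list of (magnitude, position) pairs.
    accent_cmds: list of (magnitude, start, end) triples.
    The sum is formed in the log-frequency domain, then exponentiated.
    """
    log_f0 = math.log(f_min)
    for a_p, t0 in phrase_cmds:
        log_f0 += a_p * phrase_response(t - t0, alpha)
    for a_a, t1, t2 in accent_cmds:
        log_f0 += a_a * (accent_response(t - t1, beta, theta)
                         - accent_response(t - t2, beta, theta))
    return math.exp(log_f0)


# One phrase command and one accent command, sampled every 10 ms for 1 s.
contour = [pitch_hz(0.01 * k, 80.0, [(0.5, 0.0)], [(0.4, 0.2, 0.6)])
           for k in range(100)]
```

The contour starts at the minimum pitch frequency, rises with the phrase component, and carries an additional hump between the accent start and end positions.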

[0016]

[Problems to Be Solved by the Invention] In such a conventional text-to-speech synthesis device 1, for pitch pattern calculation the accent of each bunsetsu is determined from the accent type of each word recorded in advance in the dictionary for text analysis and from the rules governing how accents change when words are concatenated, and is output in the symbol string. In this case, when words of the same accent type and the same number of morae appear at the same position in a sentence, the resulting accent patterns are identical.

[0017] Likewise, the phoneme duration is obtained by taking into account the phonemes immediately before and after the target phoneme, the phonemes before and after those, and the position in the sentence; for different words in the same phonological environment, however, the resulting phoneme durations are identical.

[0018] In actual human speech, however, words of the same accent type and the same number of morae are almost never spoken with the same pitch pattern, even when they appear at the same position in a sentence. Likewise, even when the phonological environment is identical at the micro level, phoneme durations are seldom the same once the macro-level environment, such as the word itself, is taken into account. The synthesized speech of the conventional text-to-speech synthesis device is therefore monotonous.

[0019] The present invention has been made in view of the above points, and proposes a speech synthesis device and a speech synthesis method capable of producing synthesized speech close to human speech.

[0020]

[Means for Solving the Problems] To solve this problem, the present invention provides: a dictionary 24 that holds word-specific phoneme duration information and/or accent command values; symbol string analysis means 25 that separates the symbol string data of an input sentence, obtained by analyzing that sentence, into phoneme information and prosody information; phoneme duration calculation means 26 that calculates a phoneme duration by replacing, as necessary, the value obtained by calculating the duration for a given phoneme of the phoneme information with the phoneme duration information; parameter generation means 27 that generates a parameter string based on the phoneme information and the phoneme duration; pitch pattern generation means 28 that generates a pitch pattern based on the phoneme duration and the prosody information, or on the phoneme duration, the prosody information and the accent command value; and speech synthesis means 30 that synthesizes a speech waveform from the parameter string and the pitch pattern.

[0021] Also to solve this problem, in the present invention, word-specific phoneme duration information and/or accent command values are held in a dictionary 24; the symbol string data of an input sentence, obtained by analyzing that sentence, is separated into phoneme information and prosody information; a phoneme duration is calculated by replacing, as necessary, the value obtained by calculating the duration for a given phoneme of the phoneme information with the phoneme duration information; a parameter string is generated based on the phoneme information and the phoneme duration; a pitch pattern is generated based on the phoneme duration and the prosody information, or on the phoneme duration, the prosody information and the accent command value; and a speech waveform is synthesized from the parameter string and the pitch pattern.

[0022]

[Operation] The speech waveform is synthesized based on the phoneme duration calculated by replacing the value obtained by calculating the duration for a given phoneme of the phoneme information with the word-specific phoneme duration information held in the dictionary 24, and further based on the word-specific accent command value held in the dictionary 24. Consequently, even when words of the same accent type and the same number of morae appear at the same position in a sentence, or when different words are in the same phonological environment, a pitch pattern specific to each word can be reliably obtained.

[0023]

[Embodiment] An embodiment of the present invention will be described in detail below with reference to the drawings.

[0024] In FIG. 1, reference numeral 21 denotes the speech synthesis device as a whole, which outputs speech close to human speech by processing the text input to the text input unit 22 in the respective processing units. The speech synthesis device 21 has the same configuration as the conventional speech synthesis device 1, but differs in the information added to the dictionary and in the processing performed by the units that use this information.

[0025] First, Japanese text containing a mixture of kanji and kana is input from a predetermined input device to the text input unit 22, which outputs it to the sentence analysis unit 23. The sentence analysis unit 23 analyzes the input text with reference to the dictionary 24, converts it into a kana reading string, detects the accent type specific to each word, and divides the text into words and bunsetsu.

[0026] That is, the sentence analysis unit 23 refers to the dictionary 24 and uses the connection relationships between words and the statistical properties of words to divide the text input into words and bunsetsu, thereby detecting word and bunsetsu boundaries. The sentence analysis unit 23 then outputs to the symbol string analysis unit 25 a symbol string that represents, for each sentence, the kana reading (phonemic information) together with the bunsetsu boundaries and accent types (prosodic information) obtained in this way.

[0027] In addition to the information indicating the accent type, the dictionary 24 holds in advance word-specific accent command values and phoneme duration information; the phoneme duration calculation unit 26 reads out the phoneme duration information, and the pitch pattern generation unit 28 reads out the accent command values. The symbol string analysis unit 25 then separates the symbol string into phoneme information and prosody information, outputs the phoneme information to the phoneme duration calculation unit 26 and the parameter connection unit 27, and outputs the prosody information to the pitch pattern generation unit 28.
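One way to picture the extended dictionary entry described in [0027] is as a record with two optional word-specific fields alongside the accent type. The class names, field names and all values below are hypothetical; the patent does not specify a storage layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AccentCommand:
    """Word-specific accent command values (field names are assumptions)."""
    magnitude: float       # accent command magnitude
    start_s: float         # start position, seconds
    end_s: float           # end position, seconds
    response_rate: float   # response speed coefficient


@dataclass(frozen=True)
class DictionaryEntry:
    """A dictionary entry: reading and accent type plus the added information."""
    reading: str
    accent_type: int
    # Either added field may be absent for a given word.
    accent_command: Optional[AccentCommand] = None
    duration_info_ms: Optional[Dict[int, float]] = None  # vowel index -> stored value


# Illustrative entry with made-up values.
entry = DictionaryEntry(
    reading="shiroi",
    accent_type=2,
    accent_command=AccentCommand(0.45, 0.10, 0.38, 20.0),
    duration_info_ms={2: 61.0},
)
```

Making both added fields optional lets the same record type describe words for which only one kind of information, or neither, has been registered.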

[0028] The phoneme duration calculation unit 26 calculates the duration of each phoneme from the phoneme information supplied by the symbol string analysis unit 25 and the phoneme duration information supplied by the dictionary 24. As an example, the following describes how the duration of the third vowel (that is, "i") is obtained when the text "白い花 (shiroihana)" is input.

[0029] The duration of this vowel "i" can be obtained from expression (1). Specifically, duration of vowel "i" = 『coefficient for /i/ + coefficient when the preceding phoneme is /o/ + coefficient when the phoneme before that is /r/ + coefficient when the following phoneme is /h/ + coefficient when the phoneme after that is /a/ + coefficient when the preceding sound is not a geminate consonant + coefficient when the following sound is not a geminate consonant + coefficient when the vowel is not long + coefficient for a breath-group mora length of "s" + coefficient for a position in the middle of the breath group』 + coefficient for a sentence mora length of N + coefficient for a position in the middle of the sentence.

[0030] Next, the terms enclosed in 『』 in this expression are replaced with the value held in advance in the dictionary 24 to calculate the duration of the vowel /i/. That is, duration of vowel "i" = 『value read from the dictionary 24』 + coefficient for a sentence mora length of N + coefficient for a position in the middle of the sentence.

[0031] By preparing in the dictionary 24 in advance the phoneme duration information needed to obtain the duration of each phoneme, and by calculating each phoneme duration from expression (1) using this information, a natural duration appropriate to each word can be calculated even when different words are in the same phonological environment. Since no duration information exists for unknown words, the phoneme durations of an unknown word are calculated by the conventional method.
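The substitution in [0029] to [0031] can be sketched as follows: the word-level terms are taken from the dictionary when the word is registered and are summed from coefficient tables otherwise, while the sentence-level terms are always added. The table contents, the key format `(word, vowel index)` and the function name are assumptions for illustration, and only a few factors are shown.

```python
# Hypothetical coefficient tables in milliseconds (assumptions).
WORD_LEVEL_COEFFS = {
    "vowel": {"i": 70.0},
    "preceding_phoneme": {"o": -3.0},
    "following_phoneme": {"h": 4.0},
}
SENTENCE_LEVEL_COEFFS = {
    "sentence_morae": {5: 1.0},
    "sentence_position": {"middle": -2.0},
}

# Hypothetical word-specific duration information: (word, vowel index) -> the
# stored value that replaces the bracketed word-level terms.
WORD_DURATION_DICT = {("白い", 2): 61.0}


def duration_with_dictionary_ms(word, vowel_index, word_env, sentence_env):
    """Use the stored value for registered words, else the conventional sum."""
    stored = WORD_DURATION_DICT.get((word, vowel_index))
    if stored is None:
        # Unknown word: fall back to the conventional calculation.
        word_part = sum(WORD_LEVEL_COEFFS[f][c] for f, c in word_env.items())
    else:
        word_part = stored
    sentence_part = sum(SENTENCE_LEVEL_COEFFS[f][c] for f, c in sentence_env.items())
    return word_part + sentence_part


word_env = {"vowel": "i", "preceding_phoneme": "o", "following_phoneme": "h"}
sentence_env = {"sentence_morae": 5, "sentence_position": "middle"}

known = duration_with_dictionary_ms("白い", 2, word_env, sentence_env)
unknown = duration_with_dictionary_ms("未登録", 2, word_env, sentence_env)
print(known, unknown)
```

In this sketch the registered word and the unregistered word share the same phonological environment yet receive different durations, which is the effect the paragraph above describes.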

[0032] The duration of each phoneme calculated in this way is output to the parameter connection unit 27 and the pitch pattern generation unit 28. Based on the phoneme information from the symbol string analysis unit 25 and the phoneme durations calculated by the phoneme duration calculation unit 26, the parameter connection unit 27 concatenates phoneme segment data selected from the phoneme segment database 29 to generate a parameter string. The generated parameter string is output to the speech synthesis unit 30.

[0033] Meanwhile, the pitch pattern generation unit 28 generates a pitch pattern based on the prosody information from the symbol string analysis unit 25, the accent command value from the dictionary 24, and the phoneme durations calculated by the phoneme duration calculation unit 26.

[0034] FIG. 2 shows the conventional pitch pattern generation process for the case of, for example, one phrase command and one accent command. As described above, in the conventional process the parameters given as input are values obtained in advance by analyzing a large number of sentences; suitable values are selected according to predetermined rules and used as the accent command magnitude A1, start position t1, end position t2 and response speed β1 to generate the pitch pattern.

[0035] In this embodiment, by contrast, an accent command value specific to each word is held in the dictionary 24 in advance. The accent command magnitude A2, start position t3, end position t4 and response speed β2 of a given word are read from the dictionary 24 and used to calculate a pitch pattern such as that shown in FIG. 3, thereby obtaining a pitch pattern specific to the word.

[0036] By preparing in the dictionary 24 in advance an accent command value specific to each word and generating the pitch pattern with it, an accent pattern specific to each word can be obtained even when words of the same accent type and the same number of morae appear at the same position in a sentence, and hence a pitch pattern specific to each word can be obtained. For an unknown word, the accent command value is calculated by the conventional method, as in the calculation of phoneme durations.
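The lookup with fallback described in [0035] and [0036] can be sketched as below. The dictionary contents are made-up values, and `rule_based_accent_command` is only a stand-in invented for this sketch to represent "the conventional method"; the patent does not state the actual rules.

```python
# Hypothetical word-specific accent commands:
# word -> (magnitude, start position, end position, response speed).
ACCENT_DICT = {"花": (0.50, 0.12, 0.30, 22.0)}


def rule_based_accent_command(accent_type, mora_count, mora_s=0.15):
    """Stand-in for the conventional rule (an assumption, not the real rule).

    It depends only on accent type and mora count, so two words that share
    both always receive the same command.
    """
    start = 0.0 if accent_type == 1 else mora_s
    end = mora_s * (accent_type if accent_type > 0 else mora_count)
    return (0.40, start, end, 20.0)


def accent_command_for(word, accent_type, mora_count):
    """Prefer the word-specific command; fall back to the rule for unknown words."""
    if word in ACCENT_DICT:
        return ACCENT_DICT[word]
    return rule_based_accent_command(accent_type, mora_count)


# Same accent type and mora count in every call; only the word differs.
registered = accent_command_for("花", 2, 2)
unregistered_a = accent_command_for("山", 2, 2)
unregistered_b = accent_command_for("川", 2, 2)
print(registered, unregistered_a, unregistered_b)
```

The two unregistered words come out identical, as in the conventional device, while the registered word gets its own command values.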

[0037] The pitch pattern generated in this way is output to the speech synthesis unit 30, which synthesizes a speech waveform from the parameter string and the pitch pattern and outputs a synthesized speech signal via the D/A unit 31.

[0038] In the above configuration, phoneme duration information and accent command values are held in the dictionary 24 in advance. When a word registered in the dictionary 24 is input to the text input unit 22, the parameter connection unit 27 generates a parameter string of phoneme segment data based on the phoneme information and on the phoneme durations calculated using the phoneme duration information held in the dictionary 24. The pitch pattern generation unit 28 generates a pitch pattern based on the calculated phoneme durations, the phoneme information, and the accent command values held in the dictionary 24.

[0039] Because the parameter string and pitch pattern generated in this way are specific to each word, the synthesized speech obtained from them is considerably closer to human speech.

[0040] According to the above configuration, the accent command value and the phoneme duration information of each word are additionally held in the dictionary 24 for text analysis. When a word registered in the dictionary 24 is input, a parameter string is generated based on the phoneme durations calculated using the phoneme duration information, and a pitch pattern is generated based on these calculated phoneme durations and the accent command value held in the dictionary 24, so that synthesized speech considerably closer to human speech can be output.

[0041] The above embodiment describes the case where the dictionary 24 holds both the accent command values and the phoneme duration information, but the present invention is not limited to this; only one of the two kinds of information may be held in the dictionary 24.

[0042]

[Effects of the Invention] As described above, according to the present invention, the speech waveform is synthesized based on the phoneme duration calculated by replacing the value obtained by calculating the duration for a given phoneme of the phoneme information with the word-specific phoneme duration information held in the dictionary, and further based on the word-specific accent command value held in the dictionary. Consequently, even when words of the same accent type and the same number of morae appear at the same position in a sentence, or when different words are in the same phonological environment, a pitch pattern specific to each word can be reliably obtained, and synthesized speech considerably closer to human speech can thus be output.

[Brief description of drawings]

FIG. 1 is a block diagram showing the functional configuration of an embodiment of a text-to-speech synthesis device according to the present invention.

FIG. 2 is a characteristic curve showing a pitch pattern calculated by a conventional text-to-speech synthesis device.

FIG. 3 is a characteristic curve showing a pitch pattern calculated by the text-to-speech synthesis device of the embodiment.

FIG. 4 is a block diagram showing the functional configuration of a conventional text-to-speech synthesis device.

[Explanation of symbols]

1, 21: speech synthesis device; 2, 22: text input unit; 3, 23: sentence analysis unit; 4, 24: dictionary; 5, 25: symbol string analysis unit; 6, 26: phoneme duration calculation unit; 7, 27: parameter connection unit; 8, 28: pitch pattern generation unit; 9, 29: phoneme segment database; 10, 30: speech synthesis unit; 11, 31: D/A unit.

Continuation of front page. (56) References cited: JP-A-1-126695 (JP, A); JP-A-5-181491 (JP, A); JP-A-5-289686 (JP, A); JP-A-5-289688 (JP, A). (58) Fields searched (Int. Cl.⁷, DB name): G10L 13/08

Claims

(57) [Claims]

1. A speech synthesis device comprising: a dictionary that holds word-specific phoneme duration information and/or accent command values; symbol string analysis means for separating symbol string data of an input sentence, obtained by analyzing said sentence, into phoneme information and prosody information and extracting them; phoneme duration calculation means for calculating a phoneme duration by replacing, as necessary, the value obtained by calculating the duration for a given phoneme of said phoneme information with said phoneme duration information; parameter generation means for generating a parameter string based on said phoneme information and said phoneme duration; pitch pattern generation means for generating a pitch pattern based on said phoneme duration and said prosody information, or on said phoneme duration, said prosody information and said accent command values; and speech synthesis means for synthesizing a speech waveform from said parameter string and said pitch pattern.

2. A speech synthesis method comprising: holding word-specific phoneme duration information and/or accent command values in a dictionary; separating symbol string data of an input sentence, obtained by analyzing said sentence, into phoneme information and prosody information and extracting them; calculating a phoneme duration by replacing, as necessary, the value obtained by calculating the duration for a given phoneme of said phoneme information with said phoneme duration information; generating a parameter string based on said phoneme information and said phoneme duration; generating a pitch pattern based on said phoneme duration and said prosody information, or on said phoneme duration, said prosody information and said accent command values; and synthesizing a speech waveform from said parameter string and said pitch pattern.