JPS63131195A

JPS63131195A - Voice synthesizer

Info

Publication number: JPS63131195A
Application number: JP61277979A
Authority: JP
Inventors: 成利斉藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1986-11-21
Filing date: 1986-11-21
Publication date: 1988-06-03

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Field of Application) The present invention relates to a speech synthesis device that converts an arbitrary input character string into speech by rule-based synthesis.

(Prior Art) A speech synthesis device that converts an arbitrary input character string into speech by rule-based synthesis analyzes the input character string to obtain a phonetic symbol string and prosodic information. It then selects and connects C (consonant) V (vowel) syllable parameters based on the phonetic symbol string, and generates and outputs synthesized speech that reflects the prosodic information.

Looking at the spectral envelope characteristics of natural speech, however, even for the same CV syllable, the way the spectral envelope changes at the junction of two syllables differs considerably depending on the type of vowel immediately preceding it. Conventional speech synthesis devices nevertheless joined syllables by simple interpolation, and therefore lacked smoothness at syllable junctions.

(Problem to be Solved by the Invention) As described above, a conventional speech synthesis device performs simple interpolation at the junctions of CV syllables, so the change in spectral envelope characteristics there differs from the change seen in natural speech, and the resulting speech is unnatural and lacks smoothness.

An object of the present invention is to eliminate the above drawback and provide a speech synthesis device capable of producing high-quality synthesized speech with natural continuity.

[Structure of the Invention] (Means for Solving the Problems) The present invention is a speech synthesis device comprising: a character string analysis device that analyzes an input character string to generate a phonetic symbol string and prosodic information; a storage device that stores syllable parameters; a speech parameter string generation device that generates a speech parameter string from the generated phonetic symbol string with reference to the syllable parameters; a prosodic parameter string generation device that generates a prosodic parameter string based on the generated prosodic information; and a speech synthesizer that generates and outputs synthesized speech according to the speech parameter string and the prosodic parameter string. The device is characterized in that two kinds of syllable parameters are provided: syllable parameters representing a syllable with no immediately preceding vowel, and syllable parameters representing a syllable to which the transition portion from the immediately preceding vowel has been added.

(Function) In this invention, the syllable preceding each input syllable is taken into account when the speech parameter string is generated. If the syllable is word-initial, a syllable parameter with no immediately preceding vowel is retrieved from the storage device. If it is not word-initial, attention is paid to the syllable immediately before the syllable to be joined, and a syllable parameter to which the transition portion from the vowel of that preceding syllable has been added is retrieved from the storage device. The retrieved syllable parameters are then joined by interpolation to generate the speech parameter string.

According to this invention, when the input is not word-initial, the retrieved syllable parameter includes the preceding transition portion. The speech parameter string generated by the speech parameter string generation device therefore carries information on the spectral envelope characteristics of the transition from the immediately preceding syllable. As a result, the synthesized speech output is of high quality and has natural continuity.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the speech synthesis device according to this embodiment.

The character string analysis device 1 analyzes an input character string and generates a phonetic symbol string and a prosodic symbol string. The phonetic symbol string is sent to the speech parameter string generation device 2, and the prosodic symbol string is sent to the prosodic parameter string generation device 3.

The speech parameter string generation device 2 generates, according to the input phonetic symbol string, a speech parameter string expressing the vocal tract characteristics of the speech. It is generated as follows. Here, CV syllables, each a combination of a consonant (C) and a vowel (V), are used as the unit of speech synthesis. For example, if the input character string is "ひまわり" (himawari, "sunflower"), the phoneme sequence is [hi ma wa ri]. Here /h/, /m/, /w/, and /r/ are phonetic symbols for consonants, and /i/ and /a/ are phonetic symbols for vowels. The speech parameter string generation device 2 divides the input into syllable units as [hi·ma·wa·ri] (where · marks a syllable boundary).
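The division into syllable units can be sketched as follows. The patent does not describe the segmentation procedure, so this is only an illustration under simplifying assumptions: the input is a romanized phoneme string, every syllable ends in one of the five vowels, and cases such as the moraic nasal or geminate consonants are ignored. The name `split_cv_syllables` is hypothetical.

```python
VOWELS = set("aiueo")


def split_cv_syllables(phonemes: str) -> list[str]:
    """Split a romanized phoneme string into syllables that each end in a vowel."""
    syllables = []
    current = ""
    for ch in phonemes:
        current += ch
        if ch in VOWELS:
            # A vowel closes the current (C)V syllable.
            syllables.append(current)
            current = ""
    if current:
        raise ValueError(f"trailing consonant(s) without a vowel: {current!r}")
    return syllables


print(split_cv_syllables("himawari"))  # ['hi', 'ma', 'wa', 'ri']
```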

Next, following the resulting monosyllable units, syllable parameters are retrieved one after another from files of parameterized syllables and joined. In this device, not one but six types of syllable parameter storage devices are provided. One of the six is selected according to whether the syllable to be joined is word-initial and, if it is not, according to the type of vowel in the syllable immediately preceding it.

These six types of syllable parameter storage devices 4 to 9 are described next.

The CV syllable parameter storage device 4 holds a syllable parameter file created by analyzing monosyllables of natural speech uttered in isolation, for example by an announcer. It stores word-initial syllable parameters obtained by parameterizing speech uttered with no vowel immediately before the syllable.

The other five VCV syllable parameter storage devices 5 to 9 hold VCV syllable parameter files. Each file is created from two-syllable natural speech, uttered for example by an announcer, in which the vowel /a/, /i/, /u/, /e/, or /o/ immediately precedes the target syllable. The required second syllable (CV) is cut out together with the transition portion (V) from the preceding vowel and then analyzed. Take as an example the creation of VCV syllable parameters for the syllable /wa/ with the transition portion from the vowel /a/ added, explained with reference to FIG. 2. First, the announcer utters /a wa/, giving a natural speech spectrum like the one shown in FIG. 2. Here "a" and "wa" are the syllable portions, and [a'] is the transition portion of the speech during the change from "a" to "w".

A conventional CV parameter storage device stored only the analysis parameters of the "a" and "wa" portions. Here, in addition, several frames of analysis results for the transition portion "a'" from "a" to "w" are included, and the unit is stored as a VCV syllable parameter file in the form [a'wa].
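The difference between the conventional CV unit and the VCV unit can be sketched as follows. This is only an illustration: the frame labels, the function names `cut_cv_unit` and `cut_vcv_unit`, and the choice of three transition frames are assumptions, since the text says only that several frames of the transition are kept.

```python
def cut_cv_unit(frames, cv_start):
    """Conventional unit: keep only the frames of the CV syllable."""
    return frames[cv_start:]


def cut_vcv_unit(frames, cv_start, n_transition):
    """Keep the last n_transition frames before the CV syllable, then the syllable."""
    start = max(0, cv_start - n_transition)
    return frames[start:]


# Hypothetical analysis frames for the utterance /a wa/, labeled by segment.
frames = ["a"] * 5 + ["a'"] * 3 + ["wa"] * 4
cv_start = frames.index("wa")  # index of the first frame of the syllable "wa"

print(cut_cv_unit(frames, cv_start))      # frames of "wa" only
print(cut_vcv_unit(frames, cv_start, 3))  # three "a'" frames followed by "wa"
```

In a real analysis each frame would be a vector of spectral parameters rather than a label; labels are used here only to make the cut points visible.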

Next, how speech parameters are generated from the phoneme sequence divided into syllable units is described.

When the syllable to be joined is word-initial, its syllable parameter string is retrieved from the CV syllable parameter storage device 4. When it is not word-initial, attention is paid to the syllable immediately before it, and the syllable parameter string is retrieved from whichever of the VCV syllable parameter storage devices 5 to 9 contains the transition portion for the same vowel as the vowel of that preceding syllable.
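The selection rule above can be sketched as a small lookup. The assignment of vowels to device numbers is an assumption: the text states only that device 4 is the word-initial CV store, that devices 5 to 9 cover the preceding vowels /a/, /i/, /u/, /e/, /o/, and that device 6 corresponds to /i/. The names `CV_STORE`, `VCV_STORES`, and `select_store` are hypothetical.

```python
CV_STORE = 4  # word-initial CV syllable parameters

# Assumed mapping from the preceding vowel to the VCV storage device number.
VCV_STORES = {"a": 5, "i": 6, "u": 7, "e": 8, "o": 9}


def select_store(syllables, index):
    """Return the storage device number to use for syllables[index]."""
    if index == 0:
        # Word-initial syllable: no preceding vowel.
        return CV_STORE
    # Each CV syllable ends in its vowel, so the last character of the
    # previous syllable is the immediately preceding vowel.
    preceding_vowel = syllables[index - 1][-1]
    return VCV_STORES[preceding_vowel]


syllables = ["hi", "ma", "wa", "ri"]
print([select_store(syllables, i) for i in range(len(syllables))])  # [4, 6, 5, 5]
```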

For example, take the synthesis of the word "ひまわり" (himawari) [hi·ma·wa·ri] shown in FIG. 3. For /hi/, the syllable parameter string is retrieved from the word-initial CV syllable parameter storage device 4. For /ma/, it is retrieved from the VCV syllable parameter storage device 6, whose immediately preceding vowel is /i/, and for /wa/ and /ri/ from the VCV syllable parameter storage device whose immediately preceding vowel is /a/. As a result, for /ma/, /wa/, and /ri/, syllable parameter strings are obtained that have transition portions at their heads, as indicated by [i'] and [a']. These syllable parameter strings are then connected by interpolating over the interpolation intervals shown hatched, which generates the speech parameter string. By contrast, the conventional syllable parameter storage device held only parameters without transition portions, as shown in FIG. 4, so the transition portions too were generated solely by interpolation over the interpolation intervals.
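A minimal sketch of joining the retrieved units, assuming plain linear interpolation between the last frame of one unit and the first frame of the next. The patent does not specify the interpolation formula, the length of the interpolation interval, or the contents of a frame, so `lerp_frames`, `join_units`, `n_interp`, and the one-dimensional frames below are illustrative assumptions.

```python
def lerp_frames(start, end, n):
    """Return n frames linearly interpolated between start and end (both excluded)."""
    return [
        [s + (e - s) * k / (n + 1) for s, e in zip(start, end)]
        for k in range(1, n + 1)
    ]


def join_units(units, n_interp=2):
    """Concatenate units, filling each junction with n_interp interpolated frames."""
    sequence = list(units[0])
    for unit in units[1:]:
        sequence.extend(lerp_frames(sequence[-1], unit[0], n_interp))
        sequence.extend(unit)
    return sequence


# Two toy units with one parameter per frame.
unit_a = [[0.0], [0.0]]
unit_b = [[3.0], [3.0]]
print(join_units([unit_a, unit_b]))
# [[0.0], [0.0], [1.0], [2.0], [3.0], [3.0]]
```

With VCV units, the first frames of each non-initial unit already hold the analyzed transition from the preceding vowel, so the interpolated stretch only has to bridge a short gap instead of standing in for the whole transition.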

The speech parameter string generated in this way is input to the speech synthesizer 10. Meanwhile, the prosodic symbol string is converted into a prosodic parameter string by the prosodic parameter string generation device 3, and this prosodic parameter string is likewise input to the speech synthesizer 10. Driven by these two parameter strings, the speech synthesizer 10 outputs smooth, highly natural synthesized speech.

Thus, the speech synthesis device according to this embodiment takes account of coarticulation in speech. For word-initial syllables it uses a CV syllable parameter file created by analyzing monosyllables uttered in isolation. For syllables that are not word-initial it uses five types of VCV syllable parameter files that include the preceding transition portion. The speech parameter string is thereby generated by interpolation that takes the context of each syllable into account. As a result, the change in spectral envelope characteristics at the junctions of CV syllables approaches that of natural speech, and high-quality synthesized speech with natural continuity is obtained.

[Effects of the Invention] As described above, according to the present invention, syllables are connected by interpolation that includes the transition portions, so more natural speech can be obtained.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech synthesis device according to an embodiment of the present invention. FIG. 2 is a speech analysis diagram for explaining the procedure for creating a syllable parameter file in the device. FIG. 3 is a diagram for explaining the procedure for creating a speech parameter string in the device. FIG. 4 is a diagram for explaining the procedure for creating a speech parameter string in a conventional speech synthesis device.

1: character string analysis device; 2: speech parameter string generation device; 3: prosodic parameter string generation device; 4: CV syllable parameter storage device; 5 to 9: VCV syllable parameter storage devices; 10: speech synthesizer.

Applicant's representative: Patent Attorney Takehiko Suzue

Claims

[Claims]

(1) A speech synthesis device comprising: a character string analysis device that analyzes an input character string to generate a phonetic symbol string and prosodic information; a storage device that stores syllable parameters representing a syllable with no immediately preceding vowel and syllable parameters representing a syllable to which the transition portion from the immediately preceding vowel has been added; a speech parameter string generation device that generates a speech parameter string from the generated phonetic symbol string by referring to the syllable parameters; a prosodic parameter string generation device that generates a prosodic parameter string based on the generated prosodic information; and a speech synthesizer that generates and outputs synthesized speech according to the speech parameter string and the prosodic parameter string.

(2) The speech synthesis device wherein the syllable parameter representing the syllable with no immediately preceding vowel is created by analyzing a monosyllable of natural speech uttered in isolation, and the syllable parameter representing the syllable to which the transition portion from the immediately preceding vowel has been added is created by analyzing two-syllable natural speech with a vowel immediately before the syllable, in a form that includes the transition portion from that preceding vowel.