JP3083624B2

JP3083624B2 - Voice rule synthesizer

Info

Publication number: JP3083624B2
Application number: JP04054994A
Authority: JP
Inventors: Yoshinori Shiga (志賀芳則); Yoshiyuki Hara (原義幸); Tsuneo Nitta (新田恒雄)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-03-13
Filing date: 1992-03-13
Publication date: 2000-09-04
Anticipated expiration: 2015-09-04
Also published as: JPH05257494A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a speech rule synthesizer (an apparatus for speech synthesis by rule) suitable for synthesizing speech with different voice qualities.

[0002]

[Prior Art] In speech synthesis by rule, speech parameters are obtained by observing or analyzing speech actually uttered by a person. These parameters are prepared in advance as speech segments in units such as consonant + vowel (CV), vowel + consonant + vowel (VCV), or consonant + vowel + consonant (CVC). The segments are interpolated and concatenated according to the input phoneme sequence. The resulting phonological parameters are then sent to a synthesizer, together with prosodic parameters consisting of a separately generated pitch pattern, to synthesize speech.

[0003] To synthesize speech of a different voice quality with such a rule synthesis method, one could replace all of the speech segments. The replacements would be segments created from speech parameters obtained from speech data uttered by a different person.

[0004]

[Problems to Be Solved by the Invention] As described above, synthesizing speech of a different voice quality with conventional speech-synthesis-by-rule techniques requires replacing every speech segment. Moreover, when a single synthesizer must produce speech in several voice qualities, the memory required to hold the speech segments grows in proportion to the number of voice qualities.

[0005] An object of the present invention is therefore to provide a speech rule synthesizer with two properties. First, its voice quality can be changed by replacing only a small number of speech segments. Second, even when a single synthesizer must produce speech in several voice qualities, the memory needed to hold speech segments increases only slightly.
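The memory argument in [0004] and [0005] can be made concrete with a little arithmetic. The sketch below counts stored cepstral frames under two strategies:

- a complete inventory per voice quality, as in the prior art;
- one shared consonant inventory plus per-voice vowel frames, as proposed here.

The consonant inventory size is a hypothetical number. Only the five single-frame Japanese vowels come from [0019].

```python
def stored_frames(n_voices: int, consonant_frames: int, vowel_frames: int,
                  share_consonants: bool) -> int:
    """Number of cepstral frames needed to support n_voices voice qualities."""
    if share_consonants:
        # Proposed scheme: one consonant (Cv/vc) inventory serves every voice;
        # only the vowel frames are duplicated per voice quality.
        return consonant_frames + n_voices * vowel_frames
    # Conventional scheme: each voice quality needs its own complete inventory.
    return n_voices * (consonant_frames + vowel_frames)


CONSONANT_FRAMES = 2000  # hypothetical size of the Cv/vc inventory, in frames
VOWEL_FRAMES = 5         # one frame each for /a/, /i/, /u/, /e/, /o/

for n in (1, 3, 10):
    full = stored_frames(n, CONSONANT_FRAMES, VOWEL_FRAMES, share_consonants=False)
    shared = stored_frames(n, CONSONANT_FRAMES, VOWEL_FRAMES, share_consonants=True)
    print(f"{n:2d} voices: conventional={full:6d} frames, shared={shared:5d} frames")
```

With these assumed sizes, three voices cost 6015 frames conventionally but only 2015 frames with a shared consonant inventory. Each additional voice then adds just five frames, or a few more if semi-vowel segments are also switched.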

[0006]

[Means for Solving the Problems] To solve the above problems, the speech rule synthesizer of the present invention comprises the following units.

A vowel segment storage unit stores, for each of several voice qualities, vowel segments and semi-vowel segments consisting of speech parameters of a plurality of vowels and a plurality of semi-vowels.

A segment storage unit stores consonant segments consisting of speech parameters of consonants of a specific voice quality, each including part of the transition into the following vowel.

A basic pitch storage unit stores a plurality of items of basic pitch information, one for each voice quality from which the vowel and semi-vowel segments in the vowel segment storage unit were derived.

A language analysis/phoneme length determination unit analyzes the input document to be synthesized, generates a phoneme sequence and accent information, and determines phoneme lengths.

A phonological parameter generation unit works from the phoneme sequence generated by the language analysis/phoneme length determination unit. It reads the vowel and semi-vowel segments for the voice quality specified by the input voice quality designation information from the vowel segment storage unit, and reads the consonant segments from the segment storage unit. It then interpolates and concatenates these segments according to the determined phoneme lengths to generate phonological parameters.

A prosodic parameter generation unit generates prosodic parameters consisting of a pitch pattern. It bases them on the accent information and phoneme lengths from the language analysis/phoneme length determination unit and on the basic pitch information for the voice quality specified by the voice quality designation information.

Finally, a synthesis filter unit generates source pulses from the prosodic parameters and performs speech synthesis using the phonological parameters as filter coefficients.

[0007]

[Operation] The above configuration applies to a speech rule synthesizer that reads out the speech parameters of speech segments held in advance in storage according to the input phoneme sequence, concatenates them in order, and feeds them to a speech synthesizer to output speech. All segments other than vowel and semi-vowel segments always consist of speech parameters of one and the same voice quality. Only the vowel and semi-vowel segments are switched among segments consisting of speech parameters of vowels and semi-vowels of different voice qualities.

[0008] The vowel portion, which contains the vowel segments, is the part of speech that most strongly determines voice quality. It is therefore enough to switch at least the vowel segments among segments built from vowel speech parameters of different voice qualities. The voice quality of the synthesized speech then changes even though the other segments (for example, all segments other than vowel and semi-vowel segments) always consist of speech parameters of the same voice quality.

[0009] In other words, the speech rule synthesizer of the present invention can change the voice quality of synthesized speech by replacing only a small number of speech segments. Even when a single synthesizer must produce speech in several voice qualities, the additional memory required to hold speech segments is small.

[0010]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of a speech synthesizer (speech rule synthesizer) according to one embodiment of the present invention.

[0011] The speech synthesizer of FIG. 1 has a segment storage unit 1, three vowel segment storage units 2 to 4, and three basic pitch storage units 12 to 14; the number three is only an example.

- The segment storage unit 1 stores two kinds of segments. Consonant segments Cv consist of speech parameters of a consonant, including part of the transition into the following vowel. Vowel transition segments vc cover the transition from a vowel into a consonant.
- The vowel segment storage units 2 to 4 each store vowel segments consisting of speech parameters of vowels of a different voice quality.
- The basic pitch storage units 12 to 14 each store the basic pitch of the speaker whose speech supplied the vowel segments in vowel segment storage units 2 to 4, respectively.

[0012] The speech synthesizer of FIG. 1 also has a language analysis/phoneme length determination unit 5 and a control unit 6. The language analysis/phoneme length determination unit 5 linguistically analyzes the text (input document) to be synthesized (read aloud), generates a phoneme sequence and accent information from it, and determines the length of each phoneme. The control unit 6 controls the phonological parameter generation unit 7 and the prosodic parameter generation unit 8 described below.

[0013] The speech synthesizer of FIG. 1 further has a phonological parameter generation unit 7, a prosodic parameter generation unit 8, and a synthesizer filter 9. The phonological parameter generation unit 7 generates phonological parameters according to the phoneme sequence and phoneme lengths produced by the language analysis/phoneme length determination unit 5. It uses the speech segments stored in segment storage unit 1 together with those stored in the selected one of vowel segment storage units 2 to 4.

[0014] The prosodic parameter generation unit 8 generates prosodic parameters from two inputs: the accent information and phoneme lengths generated by the language analysis/phoneme length determination unit 5, and the basic pitch stored in the selected one of basic pitch storage units 12 to 14. The synthesizer filter 9 generates synthesized speech from the phonological parameters supplied by the phonological parameter generation unit 7 and the prosodic parameters supplied by the prosodic parameter generation unit 8.

[0015] The speech synthesizer of FIG. 1 further has three switching components:

- a vowel segment switching unit 21, which selects one of the vowel segment storage units 2 to 4 and connects it to the phonological parameter generation unit 7;
- a basic pitch switching unit 22, which selects one of the basic pitch storage units 12 to 14 and connects it to the prosodic parameter generation unit 8;
- a voice quality switching control unit 23, which drives units 21 and 22 according to (user-specified) voice quality designation information entered through an input unit (not shown).

In this way, the voice quality switching control unit 23 controls the voice quality of the speech synthesized by the synthesizer filter 9.

[0016] The speech synthesis processing in the speech synthesizer of FIG. 1 is described next. First, the method of creating the speech segments stored in the segment storage unit 1 and the vowel segment storage units 2 to 4 is described in detail.

[0017] To create speech segments, a specific speaker such as an announcer first records speech data by reading from an utterance list. A time window of fixed length (about 20 msec) is applied to this data, and cepstrum analysis is performed within each window while the window is shifted by a fixed interval of about 10 msec.
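The analysis step in [0017] can be sketched with NumPy. The text specifies only the roughly 20 ms window and 10 ms shift. The following choices are assumptions:

- a Hamming window;
- a 16 kHz sampling rate;
- a 512-point FFT;
- a cepstral order of 20.

The real cepstrum computed here (the inverse FFT of the log magnitude spectrum) is one common definition. It is not necessarily the exact analysis used in the embodiment.

```python
import numpy as np


def frame_signal(x: np.ndarray, fs: int, win_ms: float = 20.0,
                 shift_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    win = int(round(fs * win_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // shift
    # Row i holds samples [i*shift, i*shift + win).
    idx = shift * np.arange(n_frames)[:, None] + np.arange(win)[None, :]
    return x[idx]


def real_cepstrum(frames: np.ndarray, order: int = 20, n_fft: int = 512) -> np.ndarray:
    """Real cepstrum c[0..order] of each frame: IFFT of the log magnitude spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(windowed, n=n_fft, axis=1)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n_fft, axis=1)
    return cepstrum[:, : order + 1]


fs = 16000                                 # assumed sampling rate
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs // 10)     # 100 ms stand-in for recorded speech
cepstra = real_cepstrum(frame_signal(speech, fs))
print(cepstra.shape)  # (9, 21): 9 frames at a 10 ms shift, orders 0..20
```

At 16 kHz the 20 ms window is 320 samples and the 10 ms shift is 160 samples. A 100 ms signal therefore yields 1 + (1600 - 320) / 160 = 9 frames.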

[0018] Next, the power spectrum and speech power of each frame are examined to choose the frame range to cut out as a segment. The cepstrum parameters for that range are extracted and used as a speech segment. FIG. 2(a) shows an example in which Cv and vc segments are cut out from one phoneme of the speech data. As the figure shows, transitional sections are cut out over a relatively wide range (many frames). The Cv and vc segments are stored in the segment storage unit 1, as described above.

[0019] The vowel portion, by contrast, is a steady-state section, so only one frame of cepstrum parameters is cut out from it. Here, one such frame is cut out for each vowel (for Japanese, /a/, /i/, /u/, /e/, /o/) in the speech data of each of three speakers. The frames are stored in the vowel segment storage units 2 to 4, one unit per speaker. The basic pitches of the same three speakers are stored in the basic pitch storage units 12 to 14.
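The difference between [0018] and [0019] is what keeps the per-voice vowel stores small: a transitional Cv or vc segment spans many frames, but a steady vowel needs only one. In the embodiment, frame ranges are chosen by inspecting power spectra and speech power. The sketch below substitutes a hypothetical automatic rule: pick the frame inside a labelled vowel region whose cepstrum changes least relative to its neighbours. The function names are also illustrative.

```python
import numpy as np


def cut_transition_segment(cepstra: np.ndarray, start: int, end: int) -> np.ndarray:
    """Multi-frame Cv or vc segment: all frames in [start, end)."""
    return cepstra[start:end].copy()


def steadiest_frame(cepstra: np.ndarray, start: int, end: int) -> int:
    """Hypothetical stand-in for manual inspection: the interior frame of
    [start, end) whose cepstrum differs least from both neighbours."""
    if end - start < 3:
        raise ValueError("vowel region needs at least three frames")
    region = cepstra[start:end]
    step = np.linalg.norm(np.diff(region, axis=0), axis=1)  # change between adjacent frames
    change = step[:-1] + step[1:]                           # change around each interior frame
    return start + 1 + int(np.argmin(change))


def cut_vowel_segment(cepstra: np.ndarray, start: int, end: int) -> np.ndarray:
    """Single-frame vowel segment taken from the steady part of the vowel."""
    return cepstra[steadiest_frame(cepstra, start, end)].copy()


# Toy one-coefficient "cepstra": frames 4-6 are identical, so frame 5 is steadiest.
demo = np.array([0, 1, 2, 3, 4, 4, 4, 7, 8, 9], dtype=float)[:, None]
print(steadiest_frame(demo, 0, 10))              # 5
print(cut_transition_segment(demo, 0, 4).shape)  # (4, 1)
```

Building a vowel store for one speaker would mean calling `cut_vowel_segment` once per vowel region in that speaker's data, giving five single frames per voice quality.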

[0020] The input document to be synthesized (read aloud) is supplied to the language analysis/phoneme length determination unit 5. This unit linguistically analyzes the document, generates a phoneme sequence and accent information, determines the length of each phoneme, and passes this information to the control unit 6. The control unit 6 then controls the phonological parameter generation unit 7 and the prosodic parameter generation unit 8 according to the information.

[0021] Meanwhile, the user specifies the voice quality of the synthesized speech through an input unit (not shown), and the resulting voice quality designation information is supplied to the voice quality switching control unit 23. The voice quality switching control unit 23 controls the vowel segment switching unit 21 and the basic pitch switching unit 22 according to this information.

[0022] Accordingly, the vowel segment switching unit 21 selects the one of vowel segment storage units 2 to 4 that corresponds to the designated voice quality and connects it to the phonological parameter generation unit 7. Likewise, the basic pitch switching unit 22 selects the one of basic pitch storage units 12 to 14 that corresponds to the designated voice quality and connects it to the prosodic parameter generation unit 8. The stores are always selected in fixed pairs: vowel segment storage unit 2 with basic pitch storage unit 12, unit 3 with unit 13, or unit 4 with unit 14.
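The paired switching in [0021] and [0022] amounts to a lookup keyed by the voice quality designation. There is one shared consonant inventory (segment storage unit 1). For each voice quality, a vowel store and a basic pitch are always selected together. The class and field names and the segment keys such as "ka" and "ak" are hypothetical, and plain Python lists stand in for cepstral frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceBank:
    """One selectable voice quality: a vowel store paired with its basic pitch."""
    vowel_segments: dict   # vowel -> one cepstral frame (like stores 2-4)
    basic_pitch_hz: float  # basic pitch of the same speaker (like stores 12-14)


class SegmentInventory:
    def __init__(self, consonant_segments: dict, voices: dict):
        self.consonant_segments = consonant_segments  # shared Cv / vc segments (store 1)
        self.voices = voices                          # voice id -> VoiceBank
        self.active = None

    def select_voice(self, voice_id: str) -> None:
        # Vowel store and basic pitch switch together as a pair, so one
        # designation never mixes one speaker's vowels with another's pitch.
        self.active = self.voices[voice_id]

    def segment(self, unit: str) -> list:
        """Frames for one unit: vowels from the active voice, everything else shared."""
        if self.active is None:
            raise RuntimeError("no voice quality selected")
        if unit in self.active.vowel_segments:
            return [self.active.vowel_segments[unit]]
        return self.consonant_segments[unit]

    def basic_pitch(self) -> float:
        if self.active is None:
            raise RuntimeError("no voice quality selected")
        return self.active.basic_pitch_hz


shared = {"ka": [[0.1, 0.2], [0.3, 0.4]], "ak": [[0.5, 0.6], [0.7, 0.8]]}
inventory = SegmentInventory(shared, {
    "A": VoiceBank({"a": [1.0, 0.0]}, basic_pitch_hz=120.0),
    "B": VoiceBank({"a": [2.0, 0.0]}, basic_pitch_hz=220.0),
})
inventory.select_voice("B")
print(inventory.segment("a"), inventory.basic_pitch())  # [[2.0, 0.0]] 220.0
```

Switching from voice "A" to voice "B" changes only the vowel frame and the basic pitch. The Cv and vc segments returned for "ka" or "ak" are the same objects for every voice.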

[0023] Under the control of the control unit 6, the phonological parameter generation unit 7 reads the required speech segments according to the phoneme sequence (the phoneme sequence of the reading) generated by the language analysis/phoneme length determination unit 5.

- Vowel segments, consisting of vowel speech parameters, come from whichever of vowel segment storage units 2 to 4 the vowel segment switching unit 21 has connected (that is, the unit for the designated voice quality).
- Consonant segments, consisting of consonant speech parameters (here, the Cv and vc segments), come from segment storage unit 1.

The unit then interpolates and concatenates these segments according to the phoneme lengths determined by the language analysis/phoneme length determination unit 5 to generate phonological parameters.

[0024] The segments are connected as shown in FIG. 2(b). Frame repetition sections and interpolation sections are inserted and adjusted so that the segments join while matching each phoneme length determined by the language analysis/phoneme length determination unit 5. The interpolation method used here is linear interpolation of each order of the cepstrum parameters.
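The connection rule in [0024] can be sketched for one vowel as follows:

1. Interpolate linearly from the end of the preceding segment into the vowel's stored frame.
2. Repeat the vowel frame to fill most of the phoneme length.
3. Interpolate linearly out of the vowel frame toward the start of the following segment.

Each cepstral order is interpolated independently. The text does not say how the phoneme length is split between interpolation and repetition frames, so `n_in` and `n_out` are hypothetical parameters. Phoneme lengths are assumed to be given in 10 ms frames; a 120 ms vowel, for example, is 12 frames.

```python
import numpy as np


def interpolate(a: np.ndarray, b: np.ndarray, n: int) -> np.ndarray:
    """n frames moving linearly from frame a toward frame b, endpoints excluded.
    Each cepstral order (column) is interpolated independently."""
    t = np.arange(1, n + 1) / (n + 1)
    return (1.0 - t)[:, None] * a[None, :] + t[:, None] * b[None, :]


def vowel_region(prev_last: np.ndarray, vowel: np.ndarray, next_first: np.ndarray,
                 n_frames: int, n_in: int, n_out: int) -> np.ndarray:
    """Fill a vowel of n_frames frames: glide in, repeat the vowel frame, glide out."""
    n_hold = n_frames - n_in - n_out
    if n_hold < 1:
        raise ValueError("phoneme too short for the requested interpolation lengths")
    return np.vstack([
        interpolate(prev_last, vowel, n_in),        # from the end of the preceding Cv segment
        np.repeat(vowel[None, :], n_hold, axis=0),  # frame repetition section
        interpolate(vowel, next_first, n_out),      # toward the start of the following segment
    ])


def concatenate(cv: np.ndarray, vowel: np.ndarray, vc: np.ndarray,
                vowel_frames: int, n_in: int = 2, n_out: int = 2) -> np.ndarray:
    """Cv segment + vowel region + vc segment as one parameter track."""
    middle = vowel_region(cv[-1], vowel, vc[0], vowel_frames, n_in, n_out)
    return np.vstack([cv, middle, vc])


prev, vowel, nxt = np.zeros(2), np.full(2, 4.0), np.full(2, 8.0)
region = vowel_region(prev, vowel, nxt, n_frames=8, n_in=3, n_out=1)
print(region[:, 0])  # [1. 2. 3. 4. 4. 4. 4. 6.]
```

Lengthening or shortening a phoneme changes only the number of repeated frames, so the stored single vowel frame can serve any duration.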

[0025] Under the control of the control unit 6, the prosodic parameter generation unit 8 generates prosodic parameters consisting of a pitch pattern. It bases them on the accent information and phoneme lengths generated by the language analysis/phoneme length determination unit 5. It also uses the basic pitch stored in whichever of basic pitch storage units 12 to 14 the basic pitch switching unit 22 has connected (that is, the basic pitch for the designated voice quality).
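The text does not give the pitch model used in [0025]. It says only that the pitch pattern is built from the accent information, the phoneme lengths and the selected basic pitch. The function below is a minimal sketch under these assumptions:

- accent is encoded as one high (H) or low (L) level per phoneme;
- each level maps to a hypothetical ratio of the basic pitch;
- each level is held for the phoneme's duration in 10 ms frames.

A practical system would use a smoother accent model.

```python
import numpy as np


def pitch_pattern(accent_levels: str, phoneme_frames: list, basic_pitch_hz: float,
                  high_ratio: float = 1.25, low_ratio: float = 0.9) -> np.ndarray:
    """Frame-level F0 contour in Hz, scaled around the selected basic pitch.

    accent_levels:  one 'H' or 'L' per phoneme (hypothetical accent encoding).
    phoneme_frames: duration of each phoneme in 10 ms frames.
    """
    if len(accent_levels) != len(phoneme_frames):
        raise ValueError("need one accent level per phoneme")
    targets = np.array([basic_pitch_hz * (high_ratio if level == "H" else low_ratio)
                        for level in accent_levels])
    # Hold each phoneme's target for its whole duration (a step contour).
    return np.repeat(targets, phoneme_frames)


low_voice = pitch_pattern("LHHL", [2, 3, 3, 2], basic_pitch_hz=100.0)
high_voice = pitch_pattern("LHHL", [2, 3, 3, 2], basic_pitch_hz=200.0)
print(low_voice)
print(high_voice)
```

Suppose the designated voice switches from a speaker with a 100 Hz basic pitch to one with 200 Hz. The whole contour doubles, while the accent shape taken from the language analysis stays the same.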

[0026] The prosodic parameters generated by the prosodic parameter generation unit 8 are supplied to the synthesizer filter 9. The filter also receives the phonological parameters generated by the phonological parameter generation unit 7. The synthesizer filter 9 is, for example, an LMA (log magnitude approximation) filter. It generates source pulses from the prosodic parameters and produces synthesized speech using the phonological parameters as filter coefficients.
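Two parts of the synthesizer filter stage in [0026] can be illustrated without implementing an LMA filter itself. In its usual formulation, an LMA filter approximates the transfer function H(z) = exp(c_0 + c_1 z^-1 + ... + c_N z^-N), where c_n are cepstral coefficients. Its target log amplitude response is therefore c_0 + sum over n of c_n cos(n omega).

The sketch does two things:

- It evaluates that target response.
- It generates a simple impulse-train source from a frame-level F0 contour.

The following are assumptions: the 16 kHz rate, the 10 ms frame shift, and the treatment of unvoiced frames (F0 = 0 simply produces no pulses). The filtering step is omitted.

```python
import numpy as np


def log_amplitude(c: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """ln|H(e^{j omega})| for H(z) = exp(sum_n c[n] z^{-n}), the response an LMA
    filter approximates: the real part of sum_n c[n] e^{-j n omega}."""
    n = np.arange(len(c))
    return np.cos(np.outer(omega, n)) @ c


def pulse_train(f0_frames, fs: int = 16000, shift_ms: float = 10.0) -> np.ndarray:
    """Unit impulses spaced one pitch period apart, following a frame-level F0 (Hz).
    Frames with F0 = 0 (unvoiced) advance no phase and so emit no pulses."""
    hop = int(round(fs * shift_ms / 1000))
    f0 = np.repeat(np.asarray(f0_frames, dtype=float), hop)  # F0 per sample
    cycles = np.cumsum(f0 / fs)                             # pitch periods elapsed
    pulses = np.zeros_like(cycles)
    # Place a pulse wherever the elapsed-period count passes an integer.
    pulse_idx = np.flatnonzero(np.diff(np.floor(cycles)) > 0) + 1
    pulses[pulse_idx] = 1.0
    return pulses


excitation = pulse_train([125.0] * 10)  # 100 ms at a constant 125 Hz
print(int(excitation.sum()))            # 12 pulses, 128 samples apart
print(log_amplitude(np.array([0.0, 0.5]), np.array([0.0, np.pi])))
```

At 16 kHz a 125 Hz pitch gives a period of exactly 128 samples, so 1600 samples contain 12 complete pulse positions. A real synthesizer would also feed a noise source during unvoiced frames.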

[0027] In speech synthesized this way, only the vowel portions take on the voice quality of the selected vowel segments. These are the segments read from the vowel segment storage unit selected by the vowel segment switching unit 21. However, the vowel portion is the part of speech that most strongly determines voice quality. The synthesized speech as a whole therefore takes on the voice quality of the speaker whose speech supplied the vowel segments the user selected.

[0028] One embodiment of the present invention has been described above, but the invention is not limited to it. For example, the embodiment changes voice quality by switching only the vowel segments to segments created from speech parameters of a different voice quality. Semi-vowels (/w/, /y/, etc.), however, are consonants articulated in the same manner as vowels, and they also have a relatively large effect on the voice quality of synthesized speech. Semi-vowel segments can therefore be held separately for each voice quality, like the vowel segments, and switched according to the user-specified voice quality.

[0029] To further improve the quality of synthesized speech, the source residual signal of each vowel may also be stored alongside each set of vowel segments of a different voice quality. The synthesizer filter 9 can then generate its source signal from this residual when synthesizing speech.

[0030] In the above embodiment, vowel segment storage units 2 to 4 hold vowel segments of different voice qualities. Matching basic pitch storage units 12 to 14 hold the basic pitches of the different speakers from whom those vowel segments were derived. Alternatively, the basic pitch of a single specific speaker may be prepared and used for all voice qualities. In that case, however, the pitch of the synthesized voice may differ from that of the speaker whose speech data supplied the vowel segments.

[0031] Furthermore, neither the type of synthesis parameters nor the method of connecting speech segments is limited. Parameters other than cepstrum parameters, such as LPC (Linear Predictive Coding) parameters, may also be used. In short, the present invention can be implemented with various modifications without departing from its gist.

[0032]

[Effects of the Invention] As described in detail above, the present invention makes it possible to change voice quality by replacing only a small number of speech segments. Even when a single synthesizer must produce speech in several voice qualities, only a small increase in the memory required to hold the speech segments is needed.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention.

FIG. 2 is a diagram explaining the extraction and the connection of speech segments.

[Explanation of symbols]

1: segment storage unit; 2 to 4: vowel segment storage units; 5: language analysis/phoneme length determination unit; 6: control unit; 7: phonological parameter generation unit; 8: prosodic parameter generation unit; 9: synthesizer filter; 12 to 14: basic pitch storage units; 21: vowel segment switching unit; 22: basic pitch switching unit; 23: voice quality switching control unit.


Claims

(57) [Claims]

A speech rule synthesizer comprising the following units:

- a vowel segment storage unit that stores, for each of different voice qualities, vowel segments and semi-vowel segments consisting of speech parameters of a plurality of vowels and a plurality of semi-vowels;
- a segment storage unit that stores consonant segments consisting of speech parameters of consonants of a specific voice quality, including part of the transition into a vowel;
- a basic pitch storage unit that stores a plurality of items of basic pitch information corresponding to the voice qualities from which the vowel segments and semi-vowel segments stored in the vowel segment storage unit were derived;
- a language analysis/phoneme length determination unit that analyzes an input document to be synthesized, generates a phoneme sequence and accent information, and determines phoneme lengths;
- a phonological parameter generation unit that, based on the phoneme sequence generated by the language analysis/phoneme length determination unit, reads from the vowel segment storage unit the vowel segments and semi-vowel segments corresponding to a voice quality specified as input by voice quality designation information, reads consonant segments from the segment storage unit, and interpolates and concatenates them according to the phoneme lengths determined by the language analysis/phoneme length determination unit to generate phonological parameters;
- a prosodic parameter generation unit that generates prosodic parameters consisting of a pitch pattern, based on the accent information generated and the phoneme lengths determined by the language analysis/phoneme length determination unit and on the basic pitch information corresponding to the voice quality specified as input by the voice quality designation information; and
- a synthesis filter unit that generates source pulses based on the prosodic parameters generated by the prosodic parameter generation unit and performs speech synthesis using the phonological parameters generated by the phonological parameter generation unit as filter coefficients.