JP3133427B2

JP3133427B2 - Speech synthesizer

Info

Publication number: JP3133427B2
Application number: JP03299688A
Authority: JP
Inventors: 信英山崎
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-10-19
Filing date: 1991-10-19
Publication date: 2001-02-05
Anticipated expiration: 2016-02-05
Also published as: JPH05108095A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech synthesizer that synthesizes speech by superimposing information corresponding to speech units.

[0002] 2. Description of the Related Art
A speech synthesizer such as the one disclosed in Japanese Patent Application Laid-Open No. 1-239292 is known. It uses a zero-phase impulse response waveform as the speech unit waveform and specifies both the amplitude of the speech unit waveform and the superimposition period randomly, by random numbers, with the aim of synthesizing natural unvoiced sounds close to the human voice.

[0003] Specifically, the amplitude of the impulse response waveform is varied randomly by multiplying the impulse response waveform of a noise signal by random values at regular intervals (approximately every 0.17 ms). For the superimposition period, as shown in FIG. 11, each superimposition timing is given as a random number (an integer from 1 to 5) counted from the previous timing, so that the superimposition periods, indicated by R1, R2, and R3 in the figure, are random.
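For comparison with the invention, the prior-art scheme of [0003] can be sketched as below. This is an illustration under stated assumptions, not the cited apparatus: the function names are hypothetical, the offsets of 1 to 5 come from the text but their unit is assumed to be samples, the gain distribution is assumed uniform in [0, 1), and `step` stands in for the roughly 0.17 ms spacing at a sampling rate the source does not state.

```python
import random

def conventional_onsets(n_units, rng, lo=1, hi=5):
    """Prior-art timing (FIG. 11): each onset is the previous onset plus a random integer in [lo, hi]."""
    onsets = [0]
    for _ in range(n_units - 1):
        onsets.append(onsets[-1] + rng.randint(lo, hi))
    return onsets

def conventional_random_amplitude(waveform, rng, step):
    """Prior-art amplitude randomization: a new random gain every `step` samples.

    `step` approximates the ~0.17 ms interval; its value in samples depends on
    the (unstated) sampling rate, so it is an assumption.
    """
    out = []
    gain = 1.0
    for i, x in enumerate(waveform):
        if i % step == 0:
            gain = rng.random()  # assumed distribution: uniform in [0, 1)
        out.append(x * gain)
    return out
```

Because each onset is measured from the previous one, timing offsets accumulate, and because the gain changes inside each waveform, the waveform's own spectrum is altered; these are the two weaknesses discussed in [0004] and [0005].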

[0004]

Problems to Be Solved by the Invention
In the conventional speech synthesizer described above, however, the amplitude is randomized by multiplying the impulse response waveform, that is, the speech unit waveform, by random values, and this degrades the spectral characteristics of the synthesized unvoiced sound. In other words, randomizing the amplitude of the speech unit waveform prevents the spectral characteristics of unvoiced sounds actually uttered by humans from being approximated well.

[0005] The conventional synthesizer also randomizes the phase characteristics by making the superimposition period random. In this superimposition, however, each superimposition timing depends on the one immediately before it, so the overlap is not uniform; as a result, the power of the superimposed waveform fluctuates strongly, and the phase characteristics are not random enough. This synthesizer is therefore limited in how natural, and how close to the human voice, its synthesized unvoiced sounds can be.

SUMMARY OF THE INVENTION
[0006] An object of the present invention is to provide a speech synthesizer capable of generating natural synthesized speech closer to the human voice, even when synthesizing unvoiced sounds.

[0007]

Means for Solving the Problems
To achieve the above object, the invention according to claim 1 comprises random phase conversion means for randomizing the phase of a speech unit waveform to generate a random-phase speech unit waveform, and waveform superimposing means for adding or superimposing the random-phase speech unit waveforms generated by the random phase conversion means while shifting them, thereby synthesizing the speech waveform of an unvoiced sound.

[0008] In the invention according to claim 2, the waveform superimposing means comprises random signal generating means for specifying, by a random value, the timing at which random-phase speech unit waveforms are superimposed, and superimposing means for synthesizing the speech waveform of an unvoiced sound by shifting the random-phase speech unit waveforms according to the superimposition timing specified by the random signal generating means and adding or superimposing them.

[0009] The invention according to claim 3 further comprises windowing means for applying a windowing process to the speech unit waveform after its phase has been randomized.

[0010]

[0011]

Operation
In the invention according to claim 1, when an unvoiced sound is synthesized, the phase of the speech unit waveform, rather than its amplitude, is randomized. The resulting synthesized unvoiced sound therefore retains the spectral characteristics of unvoiced sounds actually uttered by humans, while only its phase characteristics become sufficiently random, like those of white noise.

[0012] In the invention according to claim 2, not only the phase of the speech unit waveform but also the superimposition timing is randomized, so the phase characteristics are randomized even further.

[0013] In the invention according to claim 3, a windowing process is applied to the speech unit waveform after phase randomization, so random-phase speech unit waveforms whose discontinuities at the start and end have been reduced can be superimposed while being shifted at predetermined timings.

[0014]

[0015]

Embodiments
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of one embodiment of a speech synthesizer according to the present invention. Referring to FIG. 1, the speech synthesizer comprises a random phase conversion unit 2 that randomizes the phase of a speech unit waveform 1; a switching unit 3 that selects whether unvoiced or voiced synthesized speech is generated; and a waveform superimposing unit 4 that generates the synthesized speech. For unvoiced speech, the waveform superimposing unit 4 adds or superimposes the speech unit waveforms phase-randomized by the random phase conversion unit 2 while shifting them; for voiced speech, it adds or superimposes the speech unit waveform 1 while shifting it successively by a predetermined pitch period.

[0016] The speech unit waveform 1 is created by applying an inverse Fourier transform to the spectral envelope of speech obtained by a speech analysis technique such as LPC, LSP, or PSE. It therefore represents the equivalent of the inverse-Fourier-transformed spectral envelope of speech.
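A minimal NumPy sketch of [0016], assuming an LPC all-pole model as the analysis front end (LSP or PSE would produce the envelope differently) and illustrative coefficients. The function names are hypothetical. Taking the inverse FFT of a real, non-negative magnitude envelope yields a real, zero-phase waveform that is symmetric about sample 0; `fftshift` moves that peak to the middle of the frame.

```python
import numpy as np

def lpc_envelope(a, gain, n_fft):
    """|gain / A(e^{jw})| at the n_fft // 2 + 1 real-FFT bins, with a = [1, a_1, ..., a_p]."""
    A = np.fft.rfft(a, n_fft)
    return gain / np.abs(A)

def unit_waveform_from_envelope(env, n_fft):
    """Inverse FFT of a real magnitude envelope: a real, zero-phase waveform, peak centered."""
    h = np.fft.irfft(env, n_fft)    # zero phase: symmetric about sample 0 (circularly)
    return np.fft.fftshift(h)       # move sample 0 to the middle of the frame
```

For example, `unit_waveform_from_envelope(lpc_envelope(np.array([1.0, -0.9]), 1.0, 256), 256)` gives a 256-sample waveform whose largest sample is at index 128.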

[0017] The random phase conversion unit 2 randomizes only the phase of the given speech unit waveform 1 while leaving its spectral envelope characteristics unchanged. FIG. 2 shows an example configuration of the random phase conversion unit 2. In this example, it consists of a Fourier transform unit 5 that Fourier-transforms the speech unit waveform 1 to obtain a spectrum S; a random number generating unit 6 that generates random numbers R; and an inverse Fourier transform unit 7 that sets the phase of the spectrum S randomly using the random numbers R and converts the result back to a time-domain waveform by an inverse Fourier transform.
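The FIG. 2 pipeline (Fourier transform unit 5, random number generating unit 6, inverse Fourier transform unit 7) can be sketched with NumPy as follows. This is an illustration, not the patent's implementation: it assumes phases drawn uniformly from [0, 2π) and uses a real FFT. The DC bin (and the Nyquist bin for even lengths) is left unchanged because those bins must stay real for the output waveform to be real. The magnitude spectrum, and hence the spectral envelope, is preserved up to rounding.

```python
import numpy as np

def random_phase(unit, rng):
    """Keep |FFT| of `unit`; replace the phases of the interior bins with random values."""
    n = len(unit)
    S = np.fft.rfft(unit)                        # spectrum S (Fourier transform unit 5)
    R = rng.uniform(0.0, 2.0 * np.pi, S.shape)   # random numbers R (random number generating unit 6)
    S_rand = np.abs(S) * np.exp(1j * R)
    S_rand[0] = S[0]                             # DC bin must stay real
    if n % 2 == 0:
        S_rand[-1] = S[-1]                       # Nyquist bin must stay real
    return np.fft.irfft(S_rand, n)               # back to the time domain (inverse Fourier transform unit 7)
```

Typical use: `y = random_phase(x, np.random.default_rng())`, called once per waveform that needs a fresh random phase.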

[0018] The waveform superimposing unit 4 can be configured so that, when generating unvoiced synthesized speech, it adds or superimposes the phase-randomized speech unit waveforms while shifting them by a predetermined pitch period.

[0019] Next, the operation of the speech synthesizer configured as above is described. To generate unvoiced synthesized speech, the switching unit 3 is switched to the "unvoiced" side. This switching may be performed, for example, by supplying the switching unit 3 with information indicating whether a frame belongs to a voiced or an unvoiced section, or by supplying positive and negative pitch data values to the switching unit 3 as voiced and unvoiced information, respectively. When the random phase conversion unit 2 is configured as in FIG. 2, for example, it first Fourier-transforms the speech unit waveform 1 shown in FIG. 3(a) to obtain the spectrum S. It then sets the phase of the spectrum S randomly using the random numbers R and returns to a time-domain waveform by an inverse Fourier transform. The resulting waveform has a randomized phase, as shown in FIG. 3(b). The waveform superimposing unit 4 adds or superimposes this phase-randomized speech unit waveform of FIG. 3(b) while shifting it by a predetermined pitch period F, as shown for example in FIG. 3(c), and unvoiced synthesized speech is thereby obtained.
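One frame of the [0019] operation, including switching by the sign of the pitch data, might look like this. The encoding is an assumption based on the sign convention mentioned above: a positive pitch value means voiced and is taken as the pitch period in samples; zero or negative means unvoiced. Drawing fresh random phases for every copy is also an assumption, since the text does not say whether a single randomized waveform is reused. Reusing one waveform at a fixed period F would make the output periodic with period F. All names are hypothetical.

```python
import numpy as np

def _random_phase(unit, rng):
    """Keep |FFT(unit)| and draw new uniform phases for the interior bins."""
    S = np.fft.rfft(unit)
    S_r = np.abs(S) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, S.shape))
    S_r[0] = S[0]                      # DC must stay real
    if len(unit) % 2 == 0:
        S_r[-1] = S[-1]                # Nyquist bin must stay real
    return np.fft.irfft(S_r, len(unit))

def synthesize_frame(unit, pitch, frame_len, period_f, rng):
    """Switching unit 3 + waveform superimposing unit 4 for one frame.

    pitch > 0  -> voiced: copies of `unit` every `pitch` samples.
    pitch <= 0 -> unvoiced: random-phase copies every `period_f` samples.
    """
    voiced = pitch > 0
    step = int(pitch) if voiced else period_f
    out = np.zeros(frame_len)
    for t in range(0, frame_len, step):
        w = unit if voiced else _random_phase(unit, rng)  # fresh phases per copy (assumption)
        n = min(len(w), frame_len - t)                    # clip the last copy at the frame end
        out[t:t + n] += w[:n]
    return out
```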

[0020] Thus, in this embodiment, when unvoiced speech is synthesized, it is essentially the phase of the speech unit waveform 1, not its amplitude, that is randomized. The synthesized unvoiced sound therefore retains the spectral characteristics of unvoiced sounds actually uttered by humans, while its phase characteristics alone become sufficiently random, like those of white noise. This makes it possible to generate more natural unvoiced synthesized speech, closer to the human voice, than was previously possible.

[0021] In the above example, the random phase conversion unit 2 is operated during speech synthesis to randomize the phase in real time. Alternatively, the speech unit waveform 1 may be phase-randomized in advance and stored in a memory or the like; when unvoiced speech is synthesized, the stored random-phase speech unit waveform is read out and sent to the waveform superimposing unit 4.
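A sketch of the precomputed variant in [0021], assuming a hypothetical table keyed by speech-unit ID. Storing several random-phase variants per unit and choosing one at random during synthesis is an assumption added here; the text only says that the phase-randomized waveform is stored in advance and read out.

```python
import numpy as np

def precompute_random_phase_bank(units, n_variants, seed=0):
    """Offline step: phase-randomize each speech unit and keep the results in memory.

    units: dict mapping a (hypothetical) unit ID to its speech unit waveform.
    Returns a dict mapping the same IDs to arrays of shape (n_variants, len(unit)).
    """
    rng = np.random.default_rng(seed)
    bank = {}
    for uid, unit in units.items():
        S = np.fft.rfft(unit)
        variants = []
        for _ in range(n_variants):
            S_r = np.abs(S) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, S.shape))
            S_r[0] = S[0]              # keep DC real
            if len(unit) % 2 == 0:
                S_r[-1] = S[-1]        # keep Nyquist real
            variants.append(np.fft.irfft(S_r, len(unit)))
        bank[uid] = np.stack(variants)
    return bank

def fetch_random_phase_unit(bank, uid, rng):
    """Synthesis step: read a stored random-phase waveform instead of running an FFT."""
    table = bank[uid]
    return table[rng.integers(len(table))]
```

This moves the FFT cost offline at the price of memory proportional to the number of stored variants.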

[0022] In the above example, the waveform superimposing unit 4 superimposes the random-phase speech unit waveforms while shifting them by a predetermined pitch period, as in FIGS. 3(a), 3(b), and 3(c); even then, the randomness of the phase characteristics is better than in the prior art. If, in addition to randomizing the phase of the speech unit waveform, the superimposition timing of the speech unit waveforms is also randomized, the spectral characteristics of the synthesized unvoiced sound faithfully reproduce and maintain those of the original speech unit waveform 1, while the phase characteristics are randomized even further, approaching nearly perfect white noise.

[0023] FIG. 4 shows an example configuration of the waveform superimposing unit 4 intended to randomize the timing at which the speech unit waveforms 1 are superimposed.

[0024] In the configuration example of FIG. 4, the waveform superimposing unit 4 has a random signal generating unit 11 that generates a random signal; a pitch period generating unit 12 that generates pulses at a predetermined pitch period; a switching unit 13 that selects the random signal from the random signal generating unit 11 during unvoiced synthesis and the pitch period signal from the pitch period generating unit 12 during voiced synthesis; and a superposition unit 14 that superimposes the speech unit waveforms received from the switching unit 3 of FIG. 1, shifting them at the timings given by the signal from the switching unit 13.

[0025] With the waveform superimposing unit 4 configured as in FIG. 4, during unvoiced synthesis the switching unit 13 supplies the random signal (for example, random pulses) from the random signal generating unit 11 to the superposition unit 14 as the superimposition timing. When a random-phase speech unit waveform is then sent from the switching unit 3 to the waveform superimposing unit 4, the superposition unit 14 superimposes the random-phase speech unit waveforms while shifting them at the timings specified by the random signal from the switching unit 13, synthesizing the speech waveform of an unvoiced sound. Because both the phase of the speech unit waveform and the superimposition timing are random, the phase characteristics of the resulting unvoiced synthesized speech are randomized even further, giving natural synthesized speech that more closely resembles the human voice.
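The FIG. 4 structure can be sketched as a few small functions. The patent does not specify how the random signal generating unit 11 distributes its pulses; uniform integer gaps between `min_gap` and `max_gap` samples are an assumption here, and FIG. 6 describes a specific generator. `next_unit` is a hypothetical callable that returns the next waveform arriving from switching unit 3, for example a freshly phase-randomized copy. Parameter defaults are illustrative.

```python
import numpy as np

def pitch_pulses(frame_len, period):
    """Pitch period generating unit 12: one pulse every `period` samples."""
    return list(range(0, frame_len, period))

def random_pulses(frame_len, rng, min_gap, max_gap):
    """Random signal generating unit 11 (assumed: uniform integer gaps in [min_gap, max_gap])."""
    t, pulses = 0, []
    while t < frame_len:
        pulses.append(t)
        t += int(rng.integers(min_gap, max_gap + 1))
    return pulses

def superimpose(next_unit, pulses, frame_len):
    """Superposition unit 14: add one unit waveform starting at each pulse position."""
    out = np.zeros(frame_len)
    for t in pulses:
        u = next_unit()
        n = min(len(u), frame_len - t)
        out[t:t + n] += u[:n]
    return out

def waveform_superimposing_unit(next_unit, voiced, frame_len, rng,
                                period=50, min_gap=30, max_gap=70):
    """Switching unit 13 picks the timing source, then unit 14 superimposes."""
    if voiced:
        pulses = pitch_pulses(frame_len, period)
    else:
        pulses = random_pulses(frame_len, rng, min_gap, max_gap)
    return superimpose(next_unit, pulses, frame_len)
```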

[0026] In the configuration example of FIG. 4, a windowing unit 15 may also be provided before the superposition unit 14, as shown. The windowing unit 15 applies a window such as a Hamming or Hanning window to the random-phase speech unit waveform to reduce the discontinuities at its start and end. When a window WIN such as that of FIG. 5(a) is set in the windowing unit 15, a random-phase speech unit waveform such as that of FIG. 5(b) is reshaped by the window WIN as shown in FIG. 5(c), reducing the discontinuities at the start and end of the waveform of FIG. 5(b). The superposition unit 14 then superimposes these random-phase speech unit waveforms, whose start and end discontinuities have been reduced, while shifting them at predetermined timings. As a result, discontinuities are reduced and natural synthesized speech closer to the human voice can be generated.
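A minimal sketch of the windowing unit 15, using NumPy's standard `hanning` and `hamming` window functions; the function name is hypothetical. A Hanning window tapers the waveform to zero at both ends, while a Hamming window leaves 8% of the original amplitude at the edges.

```python
import numpy as np

def window_unit(unit, kind="hanning"):
    """Windowing unit 15 (sketch): taper a random-phase unit waveform at both ends."""
    n = len(unit)
    if kind == "hanning":
        win = np.hanning(n)
    elif kind == "hamming":
        win = np.hamming(n)
    else:
        raise ValueError(f"unsupported window: {kind}")
    return unit * win
```

Note that windowing multiplies the waveform by a smooth function, which slightly smooths its magnitude spectrum; the taper trades a small spectral change for removing the edge jumps.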

[0027] Further, in the configuration of FIG. 4, the random signal generating unit 11 may be configured as shown in FIG. 6. In the configuration example of FIG. 6, the random signal generating unit 11 consists of a random number generator 51 that generates random numbers, that is, random values $r_n$ (for example, integers between -20 and 20); a differencer 52 that computes the difference $(r_n - r_{n-1})$ between the current random value $r_n$ and the previous random value $r_{n-1}$; and an adder 53 that adds a fixed period $T$ (for example, 50) to the difference $(r_n - r_{n-1})$ output by the differencer 52.
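The FIG. 6 generator maps directly to a few lines of code. This sketch assumes uniformly distributed integers for $r_n$ and a random initial value $r_0$; the range [-20, 20] and $T = 50$ are the example values given in the text, and the function name is hypothetical. Because every period is $T + (r_n - r_{n-1})$, each period lies between $T - 40$ and $T + 40$, and the sum of $n$ periods telescopes to $nT + r_n - r_0$.

```python
import random

def timing_periods(n, rng, T=50, r_max=20):
    """FIG. 6 sketch: returns the superimposition periods l_1 ... l_n."""
    r_prev = rng.randint(-r_max, r_max)       # initial value r_0 (assumption)
    periods = []
    for _ in range(n):
        r = rng.randint(-r_max, r_max)        # random number generator 51: r_n
        diff = r - r_prev                     # differencer 52: r_n - r_{n-1}
        periods.append(T + diff)              # adder 53: l_n = T + (r_n - r_{n-1})
        r_prev = r
    return periods
```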

[0028] With the random signal generating unit 11 configured in this way, the timing period $l_n$ ($n = 1, 2, 3, \ldots$) for superimposing the random-phase speech unit waveforms is, as shown in FIG. 7, the fixed period $T$ plus the difference value, $l_n = T + (r_n - r_{n-1})$, so the superimposition timing supplied to the superposition unit 14 varies randomly about the fixed period $T$. In the conventional apparatus, as shown in FIG. 11, each superimposition timing is given by a random number counted from the previous timing. The random signal generating unit 11 of FIG. 6 instead gives the superposition unit 14 timings whose deviation from the fixed period $T$ is random. The overlap of the speech unit waveforms therefore becomes uniform and the phase characteristics become sufficiently random, so that natural synthesized speech even closer to the human voice can be generated.
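Summing the periods shows why the overlap becomes uniform. The onsets are $t_n = t_0 + \sum_{k=1}^{n} l_k = t_0 + nT + (r_n - r_0)$, so with $r_n \in [-20, 20]$ they never drift more than 40 samples from the regular grid $t_0 + nT$. In the FIG. 11 scheme, by contrast, the offsets from the mean grid form a random walk that keeps growing. The sketch below measures both, assuming sample units and uniform distributions; the function names are illustrative.

```python
import random

def proposed_onsets(n, rng, T=50, r_max=20):
    """Onsets built from FIG. 6 periods l_k = T + (r_k - r_{k-1}), starting at t_0 = 0."""
    r = [rng.randint(-r_max, r_max) for _ in range(n + 1)]  # r_0 ... r_n
    t, onsets = 0, []
    for k in range(1, n + 1):
        t += T + (r[k] - r[k - 1])
        onsets.append(t)
    return onsets

def prior_art_onsets(n, rng, lo=1, hi=5):
    """FIG. 11 style: each onset is the previous one plus a random integer in [lo, hi]."""
    t, onsets = 0, []
    for _ in range(n):
        t += rng.randint(lo, hi)
        onsets.append(t)
    return onsets

def max_grid_deviation(onsets, mean_gap):
    """Largest distance of the k-th onset from the regular grid k * mean_gap (k = 1, 2, ...)."""
    return max(abs(t - (k + 1) * mean_gap) for k, t in enumerate(onsets))
```

Dividing each deviation by its mean gap (50 for the proposed scheme, 3 for offsets drawn from 1 to 5) puts the two schemes on a common scale of periods rather than samples.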

[0029] Configuring the speech synthesizer as shown in FIGS. 1, 4, and 6, so that the phase of the speech unit waveform and, in addition, the superimposition timing are randomized, is thus effective for synthesizing unvoiced sounds closer to those actually produced by humans. Various modifications can, however, be made on the basis of the basic configuration of FIG. 1. For example, instead of randomizing the superimposition timing, the amplitude of the random-phase speech unit waveform may be randomized.

[0030] FIGS. 8 and 10 show such configuration examples. In the configuration example of FIG. 8, the waveform superimposing unit 4 has a period generating unit 21 that generates a signal with a predetermined pitch period; an amplitude value generating unit 22 that, during unvoiced synthesis, specifies by a random value the amplitude of the random-phase speech unit waveform from the switching unit 3 at the timing of the periodic signal output by the period generating unit 21; a multiplying unit 23 that, during unvoiced synthesis, multiplies the random-phase speech unit waveform from the switching unit 3 by the random value from the amplitude value generating unit 22; a switching unit 24 that selects the multiplied waveform from the multiplying unit 23 during unvoiced synthesis and the speech unit waveform 1 itself from the switching unit 3 during voiced synthesis; and a superposition unit 25 that superimposes the waveforms from the switching unit 24, shifting them at the timings given by the periodic signal from the period generating unit 21. The amplitude value generating unit 22 is constituted by, for example, a random number generator.

[0031] In this configuration, as shown in FIG. 9, the period generating unit 21 outputs a signal P indicating the timings at which the superposition unit 25 shifts and superimposes the waveforms. The amplitude value generating unit 22 updates the random amplitude value W on each periodic signal P from the period generating unit 21, that is, at each timing at which the superposition unit 25 shifts and superimposes a waveform, and supplies it to the multiplying unit 23. The multiplying unit 23 multiplies the random-phase speech unit waveform from the switching unit 3 by the random amplitude value W, thereby varying the amplitude of the random-phase speech unit waveform randomly, and sends the result to the superposition unit 25 via the switching unit 24. The superposition unit 25 superimposes these speech unit waveforms, random in amplitude as well as in phase, while shifting them at the superimposition timings given by the periodic signal P from the period generating unit 21, synthesizing the speech waveform of an unvoiced sound. The unvoiced synthesized speech obtained in this way consists of speech unit waveforms, random in both phase and amplitude, shifted by predetermined timings and superimposed. Because the amplitude of the speech unit waveform changes randomly each time a waveform is superimposed, rather than at short regular intervals within each waveform, the effect on the spectral characteristics is smaller than in the conventional speech synthesizer.
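A sketch of the unvoiced path of FIGS. 8 and 9: a fixed period and one random gain W per superimposed copy. The gain distribution (uniform in [0, 1)) is an assumption, `unit` is assumed to already be the random-phase waveform supplied by switching unit 3, and the function name is hypothetical. Because W is constant over each copy, each copy's magnitude spectrum is only scaled, not reshaped, in contrast with the gain changes inside each waveform described in [0003].

```python
import numpy as np

def amplitude_randomized_ola(unit, frame_len, period, rng):
    """FIG. 8 sketch (unvoiced path).

    unit: random-phase speech unit waveform from switching unit 3.
    Returns the synthesized frame and the list of gains W that were used.
    """
    out = np.zeros(frame_len)
    gains = []
    for t in range(0, frame_len, period):     # period generating unit 21: signal P
        w = rng.uniform(0.0, 1.0)             # amplitude value generating unit 22 (assumed uniform)
        gains.append(w)
        n = min(len(unit), frame_len - t)
        out[t:t + n] += w * unit[:n]          # multiplying unit 23, then superposition unit 25
    return out, gains
```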

[0032] In the configuration example of FIG. 10, the waveform superimposing unit 4 has a random signal generating unit 31 that generates a random signal; a pitch period generating unit 32 that generates a signal with a predetermined pitch period; a switching unit 33 that selects the random signal from the random signal generating unit 31 during unvoiced synthesis and the pitch period signal from the pitch period generating unit 32 during voiced synthesis; an amplitude value generating unit 34 that, during unvoiced synthesis, specifies by a random value the amplitude of the random-phase speech unit waveform from the switching unit 3 at the timing of the random signal from the random signal generating unit 31; a multiplying unit 35 that, during unvoiced synthesis, multiplies the random-phase speech unit waveform from the switching unit 3 by the random value from the amplitude value generating unit 34; a switching unit 36 that selects the multiplied waveform from the multiplying unit 35 during unvoiced synthesis and the speech unit waveform 1 itself from the switching unit 3 during voiced synthesis; and a superposition unit 37 that superimposes the waveforms from the switching unit 36, shifting them at the timings given by the random signal from the random signal generating unit 31. The amplitude value generating unit 34 is constituted by, for example, a random number generator.

[0033] In this configuration, during unvoiced synthesis, the random signal generating unit 31 outputs a random signal indicating the timings at which the superposition unit 37 shifts and superimposes the waveforms. The amplitude value generating unit 34 updates the random amplitude value on each occurrence of this random signal, that is, at each timing at which the superposition unit 37 shifts and superimposes a waveform, and supplies it to the multiplying unit 35. The multiplying unit 35 multiplies the random-phase speech unit waveform from the switching unit 3 by the random amplitude value, thereby varying its amplitude randomly, and sends the result to the superposition unit 37 via the switching unit 36. The superposition unit 37 superimposes these speech unit waveforms, random in amplitude as well as in phase, while shifting them at the superimposition timings given by the random signal from the random signal generating unit 31, synthesizing the speech waveform of an unvoiced sound. The unvoiced synthesized speech obtained in this way consists of speech unit waveforms, random in both phase and amplitude, shifted by random timings and superimposed.
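The FIG. 10 combination of random timing and random amplitude can be sketched as one loop. The patent does not fix the behavior of the random signal generating unit 31; this sketch borrows the FIG. 6 style period $T + (r_n - r_{n-1})$ as one possible choice and assumes gains uniform in [0, 1). `unit` is assumed to be the random-phase waveform from switching unit 3, and the function name is hypothetical.

```python
import numpy as np

def fig10_unvoiced(unit, frame_len, rng, T=50, r_max=20):
    """FIG. 10 sketch (unvoiced path): random timing and a random gain per superimposed copy."""
    out = np.zeros(frame_len)
    r_prev = int(rng.integers(-r_max, r_max + 1))
    t = 0
    while t < frame_len:
        w = rng.uniform(0.0, 1.0)             # amplitude value generating unit 34, updated per pulse
        n = min(len(unit), frame_len - t)
        out[t:t + n] += w * unit[:n]          # multiplying unit 35, then superposition unit 37
        r = int(rng.integers(-r_max, r_max + 1))
        t += T + (r - r_prev)                 # next pulse from random signal generating unit 31
        r_prev = r
    return out
```

Each period is at least $T - 2 r_{\max}$ samples, so with the defaults the loop always advances and terminates.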

[0034] The random signal generating unit 11 shown in FIG. 6 can be applied not only to the speech synthesizer of the present invention but also to conventional speech synthesizers. For example, when it is applied to the waveform superimposing unit of the conventional speech synthesizer described above, which specifies the amplitude of the speech unit waveform and the superimposition period randomly by random numbers, the random-amplitude speech unit waveforms overlap more uniformly when superimposed than before, and the randomness of the phase characteristics is further improved.

[0035]

Effects of the Invention
As described above, in the invention according to claim 1, the phase of the speech unit waveform, rather than its amplitude, is randomized when an unvoiced sound is synthesized. The synthesized unvoiced sound therefore retains the spectral characteristics of unvoiced sounds actually uttered by humans, while only its phase characteristics become sufficiently random, like those of white noise. As a result, more natural unvoiced synthesized speech, closer to the human voice, can be generated than was previously possible.

[0036] In the invention according to claim 2, not only the phase of the speech unit waveform but also the superimposition timing is randomized, so the phase characteristics are randomized even further and natural synthesized speech more closely resembling the human voice can be generated.

[0037] In the invention according to claim 3, windowing means is provided that applies a windowing process to the speech unit waveform after phase randomization. Random-phase speech unit waveforms whose start and end discontinuities have been reduced can therefore be superimposed while being shifted at predetermined timings, and as a result natural synthesized speech with fewer discontinuities, closer to the human voice, can be generated.

[0038]

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of a speech synthesizer according to the present invention.

FIG. 2 is a diagram showing an example configuration of the random phase conversion unit.

FIGS. 3(a), 3(b), and 3(c) are diagrams showing an example of waveform superposition in the waveform superimposing unit.

FIG. 4 is a diagram showing an example configuration of the waveform superimposing unit.

FIGS. 5(a), 5(b), and 5(c) are diagrams showing an example of the windowing process.

FIG. 6 is a diagram showing an example configuration of the random signal generating unit.

FIG. 7 is a diagram showing an example of the superimposition timing of random-phase speech unit waveforms in the present invention.

FIG. 8 is a diagram showing an example configuration of the waveform superimposing unit.

FIG. 9 is a diagram for explaining the process of updating the random amplitude value.

FIG. 10 is a diagram showing an example configuration of the waveform superimposing unit.

FIG. 11 is a diagram showing an example of the superimposition timing of speech unit waveforms in a conventional speech synthesizer.

[Explanation of symbols]

1 Speech unit waveform
2 Random phase conversion unit
3 Switching unit
4 Waveform superimposing unit
5 Fourier transform unit
6 Random number generating unit
11 Random signal generating unit
12 Pitch period generating unit
13 Switching unit
14 Superposition unit
15 Windowing unit
51 Random number generator
52 Differencer
53 Adder

Claims

(57) [Claims]

1. A speech synthesizer that synthesizes speech by superimposing speech unit waveforms, comprising: random phase conversion means for randomizing the phase of a speech unit waveform to generate a random-phase speech unit waveform; and waveform superimposing means for adding or superimposing, while shifting, the random-phase speech unit waveforms generated by the random phase conversion means, to synthesize the speech waveform of an unvoiced sound.

2. The speech synthesizer according to claim 1, wherein the waveform superimposing means comprises: random signal generating means for specifying, by a random value, the timing at which random-phase speech unit waveforms are superimposed; and superimposing means for synthesizing the speech waveform of an unvoiced sound by shifting the random-phase speech unit waveforms according to the superimposition timing specified by the random signal generating means and adding or superimposing them.

3. The speech synthesizer according to claim 1, further comprising windowing means for applying a windowing process to the speech unit waveform after its phase has been randomized.