JPH01219900A

JPH01219900A - Speech synthesizing device

Info

Publication number: JPH01219900A
Application number: JP63044512A
Authority: JP
Inventors: Tatsuro Matsumoto; 達郎松本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1988-02-29
Filing date: 1988-02-29
Publication date: 1989-09-01

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Summary]

The present invention relates to a speech synthesis device that synthesizes a speech waveform by modeling the speech generation process as a linear system and driving a filter with a sound source waveform, and in particular to a technique for controlling the spectrum of the sound source waveform in the sound source section. Its object is to make it possible to change the pulse shape of the voiced sound source waveform according to the strength of vocalization, and thereby to synthesize speech with excellent naturalness. To this end, the device comprises sound source waveform generating means that generates a sound source waveform with an amplitude corresponding to a sound source amplitude, and sound source spectrum control means that varies the spectral characteristics of the sound source waveform generated by that means in accordance with the sound source amplitude.

[Industrial application field]

The present invention relates to a speech synthesis device that synthesizes a speech waveform by modeling the speech generation process as a linear system and driving a filter with a sound source waveform, and particularly to a technique for controlling the spectrum of the sound source waveform in the sound source section.

[Conventional technology]

To perform speech synthesis efficiently, the human speech generation process must be modeled appropriately and a system must be built on that model. A representative model of the speech generation process is one expressed as a linear system.

FIG. 4 shows the configuration of such a speech generation model. As shown in the figure, the output of sound source section 1 drives vocal tract filter 2, and the output of vocal tract filter 2 in turn drives radiation characteristic filter 3 to perform speech synthesis.

Sound source section 1 models the pulsed air flow caused by vocal cord vibration in the case of voiced sound, and the turbulent air flow entering the vocal tract in the case of unvoiced sound. Vocal tract filter 2 models the transfer characteristic of the air flow within the vocal tract. Radiation characteristic filter 3 models the radiation characteristic of the sound wave produced by the air flow at the lips.

Now, in FIG. 4, let S(f) be the sound source spectrum at the output of sound source section 1, T(f) the vocal tract transfer function (filter characteristic) of vocal tract filter 2, and R(f) the radiation characteristic (filter characteristic) of radiation characteristic filter 3. The spectral characteristic P(f), which corresponds to the change in sound pressure observed at a point sufficiently far from the lips, is then the product of S(f), T(f), and R(f), as in equation (1) below.

$$P(f) = S(f) \cdot T(f) \cdot R(f) \tag{1}$$

Various systems have been proposed to realize such a model. Here, the Klatt-type speech synthesizer, a representative formant-type speech synthesizer, is described as an example.
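As a small worked illustration of equation (1), the sketch below multiplies three magnitude values at a single frequency and checks that the product becomes a sum on a decibel scale. The numeric values and the function names `radiated_spectrum` and `to_db` are assumptions made for illustration and do not come from the source.

```python
import math


def to_db(x):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(x)


def radiated_spectrum(s, t, r):
    """Equation (1): P(f) = S(f) * T(f) * R(f), evaluated at one frequency."""
    return s * t * r


# Hypothetical magnitudes at a single frequency (illustrative values only).
s, t, r = 0.5, 4.0, 2.0
p = radiated_spectrum(s, t, r)

# On a decibel scale the three contributions simply add.
p_db_from_sum = to_db(s) + to_db(t) + to_db(r)
```

Working in decibels is a common convenience for this kind of model because the source, vocal tract, and radiation contributions can then be inspected as separate additive terms.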

FIG. 5 shows the block configuration of the Klatt-type speech synthesizer. In the figure, sound source section 1, vocal tract filter 2, and radiation characteristic filter 3 correspond to the respective parts of FIG. 4. Sound source section 1 consists of voiced sound source section 4, which generates voiced sound source waveform 9, and unvoiced sound source section 5, which generates unvoiced sound source waveform 10. Vocal tract filter 2 consists of voiced sound filter 6 driven by voiced sound source waveform 9, unvoiced sound filter 7 driven by unvoiced sound source waveform 10, and adder 8, which adds the outputs of filters 6 and 7 and supplies the sum to radiation characteristic filter 3.

In the above configuration, voiced sound filter 6 and unvoiced sound filter 7 are each built from combinations of the second-order recursive digital filter shown in FIG. 6, which serves as the basic unit. Voiced sound filter 6 is realized by connecting the basic filters of FIG. 6 in series, and unvoiced sound filter 7 is realized by connecting them in parallel.

In FIG. 6, 12, 16, and 17 are multipliers that multiply by coefficients A, B, and C, respectively; 13 is an adder that adds the outputs of multipliers 12, 16, and 17; and 14 and 15 are delay elements that each delay the signal by one unit sampling time. Let T be the sampling period and n an integer. Input x(nT) is applied to multiplier 12 and output y(nT) is produced by adder 13. Multiplier 16 receives the signal y(nT - T), which is output y(nT) delayed by one sampling time by delay element 14, and multiplier 17 receives the signal y(nT - 2T), which is output y(nT) delayed by two sampling times by delay elements 14 and 15. The relationship between input x(nT) and output y(nT) of the basic filter of FIG. 6 is then expressed by the difference equation (2) below.

$$y(nT) = A \cdot x(nT) + B \cdot y(nT - T) + C \cdot y(nT - 2T) \tag{2}$$

The coefficients A, B, and C of multipliers 12, 16, and 17 of the basic filter of FIG. 6 are calculated from equations (3) to (5) below by specifying the resonance frequency F and its bandwidth BW. Here, T is the sampling period, π is the circle constant, exp denotes the exponential function, and cos denotes the cosine function.

$$C = -\exp(-2\pi \cdot BW \cdot T) \tag{3}$$

$$B = 2 \cdot \exp(-\pi \cdot BW \cdot T) \cdot \cos(2\pi F T) \tag{4}$$

$$A = 1 - C - B \tag{5}$$

From these relationships, voiced sound is synthesized by giving each of the series-connected basic filters that make up voiced sound filter 6 of FIG. 5 its own time-varying resonance frequency F and bandwidth BW. The same applies to unvoiced sound filter 7.
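The basic filter and its series connection can be sketched as follows. This is a minimal illustration of equations (2) to (5), not the device's actual implementation: the function names, the 10 kHz sampling rate, and the three (F, BW) pairs are assumptions chosen only to make the example runnable.

```python
import math


def resonator_coefficients(f_hz, bw_hz, t_sec):
    """Coefficients A, B, C of the basic filter from equations (3) to (5)."""
    c = -math.exp(-2.0 * math.pi * bw_hz * t_sec)                 # (3)
    b = (2.0 * math.exp(-math.pi * bw_hz * t_sec)
         * math.cos(2.0 * math.pi * f_hz * t_sec))                # (4)
    a = 1.0 - c - b                                               # (5)
    return a, b, c


def resonator(x, a, b, c):
    """Equation (2): y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T)."""
    y1 = y2 = 0.0  # y(nT-T) and y(nT-2T), filter initially at rest
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def cascade(x, formants, t_sec):
    """Series connection of basic filters, one per (F, BW) pair."""
    y = list(x)
    for f_hz, bw_hz in formants:
        y = resonator(y, *resonator_coefficients(f_hz, bw_hz, t_sec))
    return y


# Illustrative values only: 10 kHz sampling and three formant-like resonances.
T = 1.0 / 10000.0
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
step_response = cascade([1.0] * 2000, FORMANTS, T)
```

Because equation (5) forces A + B + C = 1, each basic filter has unit gain at 0 Hz, so the step response of the cascade settles at 1 regardless of the chosen resonances.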

Although not specifically illustrated, unvoiced sound source section 5 of FIG. 5 consists of a white noise generating section that generates white noise as unvoiced sound source waveform 10, a multiplier that controls its amplitude value, and the like.
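A minimal sketch of such an unvoiced source is shown below. The source only states that white noise is generated and scaled by a multiplier; the Gaussian distribution, the fixed seed, and the function name `unvoiced_source` are assumptions for illustration.

```python
import random


def unvoiced_source(n_samples, amplitude, seed=0):
    """White noise (Gaussian here, by assumption) scaled by an amplitude value."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [amplitude * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


# Same noise sequence at two amplitude settings (illustrative values).
quiet = unvoiced_source(1000, 0.1)
loud = unvoiced_source(1000, 1.0)
```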

FIG. 7 shows the conventional, general configuration of voiced sound source section 4 of FIG. 5. As shown in the figure, it consists of impulse train generation section 18, which generates impulse train waveform 21; low-pass filter 19, which receives impulse train waveform 21 as input; and sound source amplitude multiplier 20, which multiplies the filter output by sound source amplitude 22 and outputs voiced sound source waveform 9 (see FIG. 5).

Impulse train generation section 18 generates impulse train waveform 21 with a period determined by the fundamental frequency F0, called the pitch, which corresponds to the vibration of the vocal cords.

The spectrum of waveform 21 has a comb-shaped harmonic structure with frequency components (harmonic components) of equal amplitude at the fundamental frequency F0 and at each of its integer multiples.
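The comb-shaped harmonic structure can be checked numerically. The sketch below builds an impulse train and evaluates a few discrete Fourier transform bins directly; the 10 kHz sampling rate, the 100 Hz fundamental, and the helper names are assumptions for illustration.

```python
import cmath
import math


def impulse_train(n_samples, period):
    """Unit impulses every `period` samples, zeros elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def dft_magnitude(x, k):
    """Magnitude of DFT bin k (direct evaluation; fine for short signals)."""
    n_total = len(x)
    acc = sum(xn * cmath.exp(-2j * math.pi * k * n / n_total)
              for n, xn in enumerate(x))
    return abs(acc)


# Illustrative values: 10 kHz sampling and F0 = 100 Hz give a 100-sample period.
FS = 10000
F0 = 100
period = FS // F0
x = impulse_train(10 * period, period)  # ten pitch periods

# Bin spacing is FS / len(x) = 10 Hz, so harmonic m of F0 sits at bin 10 * m.
harmonics = [dft_magnitude(x, 10 * m) for m in range(1, 6)]
between = dft_magnitude(x, 15)          # 150 Hz, halfway between two harmonics
```

Every harmonic bin has the same magnitude (the number of impulses in the analyzed segment), while the bin between harmonics is zero, which is the equal-amplitude comb described above.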

Next, low-pass filter 19 filters impulse train waveform 21, which has the above spectral characteristics, and converts it into a waveform whose spectrum is a harmonic structure with an envelope in which the amplitude decreases gradually from the fundamental frequency toward the higher harmonic components. This waveform has spectral characteristics close to those of an actual sound source waveform.

After the above processing, sound source amplitude 22 is applied in sound source amplitude multiplier 20 and voiced sound source waveform 9 is output, which drives voiced sound filter 6 of FIG. 5.

Low-pass filter 19 of FIG. 7 reuses, as is, one basic filter of FIG. 6 of the kind used in vocal tract filter 2 of FIG. 5, and achieves a low-pass characteristic by setting its resonance frequency F to 0 and its bandwidth BW to an appropriate value. In practice an anti-resonance filter is also connected to shape the source spectrum, but it is omitted here.
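The conventional chain of FIG. 7 (impulse train, fixed low-pass filter with F = 0, amplitude multiplier) can be sketched as below. The bandwidth of 100 Hz, the sampling period, and the function names are assumptions for illustration; the anti-resonance filter mentioned in the text is left out.

```python
import math


def lowpass_coefficients(bw_hz, t_sec):
    """Basic filter with F = 0: equations (3) to (5) with cos(0) = 1."""
    c = -math.exp(-2.0 * math.pi * bw_hz * t_sec)
    b = 2.0 * math.exp(-math.pi * bw_hz * t_sec)
    a = 1.0 - c - b
    return a, b, c


def conventional_voiced_source(n_samples, period, bw_hz, amplitude, t_sec):
    """Impulse train -> fixed low-pass filter -> multiplication by amplitude."""
    a, b, c = lowpass_coefficients(bw_hz, t_sec)
    y1 = y2 = 0.0
    out = []
    for n in range(n_samples):
        x = 1.0 if n % period == 0 else 0.0   # impulse train sample
        y = a * x + b * y1 + c * y2           # difference equation (2)
        out.append(amplitude * y)             # amplitude multiplier
        y2, y1 = y1, y
    return out


# Illustrative values: 10 kHz sampling, 100-sample pitch period, BW = 100 Hz.
T = 1.0 / 10000.0
soft = conventional_voiced_source(500, 100, 100.0, 0.5, T)
loud = conventional_voiced_source(500, 100, 100.0, 2.0, T)
```

Dividing each output by its amplitude gives the same waveform in both cases: with fixed filter coefficients, the amplitude setting only scales the pulses and leaves their shape unchanged.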

[Problem to be solved by the invention]

With the conventional voiced sound source section 4 shown in FIGS. 5 and 7, voiced sound source waveform 9 is obtained as a pulse train with a fixed pulse shape and a period determined by the fundamental frequency F0. As described above, its spectrum is a harmonic structure with a fixed envelope in which the amplitude decreases gradually from the fundamental frequency toward the higher harmonic components. The frequency of each component varies over time according to the pitch.

In the voiced sound source waveform produced by the human vocal cords, however, even if the pitch is constant, that is, even if the fundamental frequency F0 and hence the pulse interval are constant, the shape of each pulse changes with the strength of vocalization as shown in FIG. 8(a) or 8(b). During strong vocalization the waveform is approximated by a strongly impulsive asymmetric triangular wave, as shown in FIG. 8(a); during weak vocalization it becomes sinusoidal, as shown in FIG. 8(b).

In the conventional method of generating the voiced sound source waveform, the sound source amplitude changes with the strength of vocalization, but the pulse shape of the resulting voiced sound source waveform 9 (see FIGS. 5 and 7) is fixed. The method therefore cannot reproduce the differences in voice quality that accompany different vocalization strengths, and cannot synthesize speech with excellent naturalness.

An object of the present invention is to make it possible to change the pulse shape of the voiced sound source waveform according to the strength of vocalization, and thereby to synthesize speech with excellent naturalness.

[Means to solve the problem]

FIG. 1 is a block diagram of the present invention. Sound source waveform generating means 23 generates sound source waveform 26 with an amplitude corresponding to sound source amplitude 25. Means 23 consists of, for example, an impulse train generation section that generates a digital impulse train waveform at the pitch period, a digital low-pass filter that receives the impulse train waveform as input, and a multiplier that multiplies the filter output by the digital value of sound source amplitude 25 and outputs a voiced sound source waveform.

Sound source spectrum control means 24 varies the spectral characteristics of sound source waveform 26 generated by sound source waveform generating means 23 in accordance with sound source amplitude 25. Means 24 consists of, for example, means for varying the filter coefficients of the digital low-pass filter in sound source waveform generating means 23 so that the bandwidth of that filter changes in direct proportion to sound source amplitude 25.

[Function]

With the above means, the spectral characteristics of sound source waveform 26 change according to the magnitude of sound source amplitude 25 and correspond well to the changes in an actual sound source waveform. By using sound source waveform 26 to drive the filters for speech synthesis (the vocal tract filter, the radiation characteristic filter, and so on), speech with excellent naturalness is synthesized.

[Example]

Embodiments of the present invention are described in detail below.

First, the overall configuration of the speech synthesis device to which the present invention applies is the same as that of the Klatt-type speech synthesizer of FIG. 5 described above, so its description is omitted.

FIG. 2 is a configuration diagram of the first embodiment of the present invention as applied to voiced sound source section 4 of FIG. 5. First, impulse train generation section 23 generates impulse train waveform 27 with a period determined by the fundamental frequency F0.

Next, waveform 27 is filtered by low-pass filter 24, whose spectral characteristics are controlled by filter coefficient control section 25. Low-pass filter 24 consists of one basic filter, the second-order recursive digital filter of FIG. 6 described above.

Filter coefficient control section 25 calculates the coefficients of low-pass filter 24 according to sound source amplitude 28.

Sound source amplitude multiplier 26 multiplies the output of low-pass filter 24 by sound source amplitude 28 (a digital value) and outputs voiced sound source waveform 9 (see FIG. 5).

The operation of the first embodiment configured as above is described next.

First, impulse train waveform 27 generated by impulse train generation section 23 is a waveform in which impulses are arranged with a period determined by the fundamental frequency F0 (pitch), which corresponds to the vibration of the vocal cords. Its spectrum is a comb-shaped harmonic structure with frequency components (harmonic components) of equal amplitude at the fundamental frequency F0 and at each of its integer multiples.

Next, low-pass filter 24 filters impulse train waveform 27. When sound source amplitude 28 is large, it converts the impulse train into a voiced sound source waveform 9 like that shown in FIG. 8(a), described above; when sound source amplitude 28 is small, it converts it into a voiced sound source waveform 9 like that shown in FIG. 8(b).

As already stated, the voiced sound source waveforms of FIGS. 8(a) and 8(b) are close to the waveforms that humans actually produce. During strong vocalization the waveform is approximated by a strongly impulsive asymmetric triangular wave, as shown in FIG. 8(a). This means that the spectrum of the voiced sound source waveform has a gently sloping envelope in which the amplitude decreases little from the fundamental frequency toward the higher harmonic components, giving a harmonic structure with many harmonic components.

During weak vocalization, on the other hand, the waveform becomes sinusoidal, as shown in FIG. 8(b). This means that the spectrum of the voiced sound source waveform is a harmonic structure with an envelope in which the amplitude decreases sharply from the fundamental frequency toward the higher harmonic components, so that almost only the fundamental frequency component remains.

Given these facts, the transfer characteristic required of low-pass filter 24 of FIG. 2 is one in which the cutoff frequency is high when sound source amplitude 28 is large, so that harmonic components up to higher frequencies are passed, and low when sound source amplitude 28 is small, so that high-frequency harmonic components are not passed.

To realize this characteristic when the basic filter of FIG. 6 is used as low-pass filter 24 of FIG. 2, it suffices to set the resonance frequency F to 0 and to make the bandwidth BW wide when sound source amplitude 28 is large and narrow when it is small.

Accordingly, filter coefficient control section 25 of FIG. 2 fixes the resonance frequency F at 0 and varies the bandwidth BW according to

$$BW = \alpha \cdot AV + \beta \tag{6}$$

It then calculates the values of coefficients A, B, and C of the basic filter of FIG. 6 from equations (3) to (5) above and sets them in low-pass filter 24. Here AV is the value of sound source amplitude 28, and α and β are experimentally determined coefficients with α, β > 0.
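The coefficient control described above can be sketched as follows. The values of α and β, the sampling rate, and the amplitude settings are hypothetical (the source only says that α and β are positive and found experimentally), and the function names are assumptions for illustration.

```python
import cmath
import math


def controlled_coefficients(av, alpha, beta, t_sec):
    """Coefficient control: BW = alpha * AV + beta (equation (6)), F fixed at 0."""
    bw_hz = alpha * av + beta
    c = -math.exp(-2.0 * math.pi * bw_hz * t_sec)   # (3)
    b = 2.0 * math.exp(-math.pi * bw_hz * t_sec)    # (4) with cos(0) = 1
    a = 1.0 - c - b                                 # (5)
    return bw_hz, (a, b, c)


def gain(coeffs, f_hz, t_sec):
    """|H(f)| of y(nT) = A x(nT) + B y(nT-T) + C y(nT-2T)."""
    a, b, c = coeffs
    z1 = cmath.exp(-2j * math.pi * f_hz * t_sec)    # z^-1 on the unit circle
    return abs(a / (1.0 - b * z1 - c * z1 * z1))


# Hypothetical constants, chosen only so the example produces visible contrast.
T = 1.0 / 10000.0
ALPHA, BETA = 400.0, 50.0
bw_weak, weak = controlled_coefficients(0.1, ALPHA, BETA, T)
bw_strong, strong = controlled_coefficients(1.0, ALPHA, BETA, T)

# Gain at 1 kHz for the weak and strong settings; both have unit gain at 0 Hz.
gain_weak_1k = gain(weak, 1000.0, T)
gain_strong_1k = gain(strong, 1000.0, T)
```

With these assumed constants, the larger amplitude yields the wider bandwidth and therefore the higher gain at 1 kHz, which is the behavior the text requires of low-pass filter 24.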

Through the above operation, the voiced sound source waveform 9 obtained in the first embodiment of FIG. 2 takes a form like that shown in FIG. 8(a) or 8(b) depending on the strength of vocalization, and corresponds well to the actual human voiced sound source waveform. By performing speech synthesis with the Klatt-type speech synthesizer of FIG. 5 using such a voiced sound source waveform 9, the naturalness of the synthesized speech can be improved, particularly in voiced sections.
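To see the change in pulse shape in the time domain, the first embodiment can be sketched end to end as below. The constants α and β, the sampling period, the pitch period, and the crest-factor measure used to quantify pulse sharpness are all assumptions for illustration and are not taken from the source.

```python
import math


def voiced_source(n_samples, period, av, alpha, beta, t_sec):
    """Impulse train -> low-pass filter (F = 0, BW = alpha*AV + beta) -> times AV."""
    bw_hz = alpha * av + beta                 # equation (6)
    r = math.exp(-math.pi * bw_hz * t_sec)
    c, b = -r * r, 2.0 * r                    # equations (3) and (4) with F = 0
    a = 1.0 - c - b                           # equation (5)
    y1 = y2 = 0.0
    out = []
    for n in range(n_samples):
        x = 1.0 if n % period == 0 else 0.0   # impulse train sample
        y = a * x + b * y1 + c * y2           # difference equation (2)
        out.append(av * y)                    # amplitude multiplier
        y2, y1 = y1, y
    return out


def crest_factor(wave, period):
    """Peak divided by mean over the last full pitch period (pulse sharpness)."""
    last = wave[-period:]
    return max(last) / (sum(last) / period)


# Hypothetical settings: 10 kHz sampling, 100-sample pitch period.
T = 1.0 / 10000.0
PERIOD = 100
weak = voiced_source(10 * PERIOD, PERIOD, 0.1, 400.0, 50.0, T)
strong = voiced_source(10 * PERIOD, PERIOD, 1.0, 400.0, 50.0, T)
crest_weak = crest_factor(weak, PERIOD)
crest_strong = crest_factor(strong, PERIOD)
```

Under these assumptions the strong setting gives a markedly higher crest factor, that is, narrower and more impulsive pulses, while the weak setting gives smoother pulses closer to the sinusoidal shape described for weak vocalization.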

Next, FIG. 3 is a configuration diagram of the second embodiment of the present invention as applied to voiced sound source section 4 of FIG. 5. In this embodiment, a further low-pass filter 29 is provided between impulse train generation section 23 and low-pass filter 24 of the first embodiment of FIG. 2.

In this configuration, the filter coefficients of low-pass filter 29 are set so that its transfer characteristic produces the sound source spectrum corresponding to the case where sound source amplitude 28 is strongest. The transfer characteristic of low-pass filter 24 is controlled by filter coefficient control section 25 according to the magnitude of sound source amplitude 28, in the same way as in the first embodiment. For example, when low-pass filters 29 and 24 of FIG. 3 are each built from one basic filter of FIG. 6, the resonance frequency F of each is set to 0; the bandwidth BW of low-pass filter 29 is set wide so as to produce a spectrum corresponding to strong vocalization; and the bandwidth BW of low-pass filter 24 is controlled so that it is wide for strong vocalization and narrow for weak vocalization.
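The combined response of the two series-connected low-pass filters can be sketched as the product of their individual responses. The fixed bandwidth of 1000 Hz for filter 29, the constants α and β, and the function names are hypothetical values and names chosen for illustration.

```python
import cmath
import math


def lowpass_gain(bw_hz, f_hz, t_sec):
    """|H(f)| of the basic filter with resonance frequency F = 0."""
    r = math.exp(-math.pi * bw_hz * t_sec)
    c, b = -r * r, 2.0 * r                          # equations (3) and (4)
    a = 1.0 - c - b                                 # equation (5)
    z1 = cmath.exp(-2j * math.pi * f_hz * t_sec)    # z^-1 on the unit circle
    return abs(a / (1.0 - b * z1 - c * z1 * z1))


def second_embodiment_gain(av, f_hz, t_sec, fixed_bw, alpha, beta):
    """Fixed wide filter 29 in series with amplitude-controlled filter 24."""
    controlled_bw = alpha * av + beta               # equation (6)
    return lowpass_gain(fixed_bw, f_hz, t_sec) * lowpass_gain(controlled_bw, f_hz, t_sec)


# Hypothetical settings: 10 kHz sampling, filter 29 fixed at BW = 1000 Hz.
T = 1.0 / 10000.0
FIXED_BW, ALPHA, BETA = 1000.0, 400.0, 50.0
weak_2k = second_embodiment_gain(0.1, 2000.0, T, FIXED_BW, ALPHA, BETA)
strong_2k = second_embodiment_gain(1.0, 2000.0, T, FIXED_BW, ALPHA, BETA)
single_strong_2k = lowpass_gain(ALPHA * 1.0 + BETA, 2000.0, T)
```

In this sketch the fixed filter adds a constant extra roll-off on top of the amplitude-dependent one, so the overall envelope is shaped by two independent bandwidth settings instead of one.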

With the above configuration, finer spectrum control is possible than in the first embodiment of FIG. 2.

In the embodiments described above, if a filter of higher order than that of FIG. 6 is used as low-pass filter 24 of FIG. 2, for example, fine spectrum control can be achieved with a single low-pass filter, without connecting two low-pass filters as in the second embodiment of FIG. 3.

The invention is also not limited to low-pass filters as in FIG. 2 or FIG. 3; a filter whose anti-resonance characteristic changes with the strength of vocalization may be added.

Furthermore, the present invention is not limited to Klatt-type speech synthesizers and can be applied widely to speech synthesis devices that model the speech generation process as a linear system in which a filter is driven by a sound source waveform.

[Effect of the invention]

According to the present invention, the spectrum of the sound source waveform can be varied according to the sound source amplitude, and by using this waveform to drive the filters for speech synthesis, speech with excellent naturalness can be synthesized.

In particular, by controlling the filtering of the impulse train waveform so that higher harmonic components are passed in direct proportion to the sound source amplitude, a voiced sound source waveform that corresponds well to human vocalization can be generated.

In this case, a voiced sound source waveform of good quality can be obtained by filtering the impulse train waveform with a low-pass filter whose resonance frequency is 0 Hz and whose bandwidth changes in direct proportion to the sound source amplitude.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the present invention. FIG. 2 is a configuration diagram of the first embodiment of the present invention. FIG. 3 is a configuration diagram of the second embodiment of the present invention. FIG. 4 is a configuration diagram of the speech generation model. FIG. 5 is a configuration diagram of the Klatt-type speech synthesizer. FIG. 6 shows the basic filter of the Klatt-type speech synthesizer. FIG. 7 is a configuration diagram of the conventional example. FIGS. 8(a) and 8(b) are voiced sound source waveform diagrams.

23: sound source waveform generating means; 24: sound source spectrum control means; 25: sound source amplitude; 26: sound source waveform.

Patent applicant: Fujitsu Limited

Claims

[Scope of Claims]

1) A speech synthesis device that synthesizes a speech waveform by driving a filter with a sound source waveform, characterized by comprising: sound source waveform generating means (23) that generates a sound source waveform (26) with an amplitude corresponding to a sound source amplitude (25); and sound source spectrum control means (24) that varies the spectral characteristics of the sound source waveform (26) according to the sound source amplitude (25).

2) The speech synthesis device according to claim 1, wherein the speech synthesis is performed by digital signal processing; the sound source waveform (26) is a digital voiced sound source waveform; the sound source waveform generating means (23) comprises an impulse train generation section that generates a digital impulse train waveform at the pitch period, a digital low-pass filter that receives the impulse train waveform as input, and a multiplier that multiplies the output of the filter by a digital value of the sound source amplitude (25) to output the voiced sound source waveform; and the sound source spectrum control means (24) varies the filter coefficients of the digital low-pass filter according to the sound source amplitude (25) so that, when the sound source amplitude (25) is small, only the vicinity of the fundamental frequency component is passed among the fundamental frequency component corresponding to the pitch period and its harmonic components in the impulse train waveform, and, when the sound source amplitude (25) is large, high-frequency harmonic components are also passed.

3) The speech synthesis device according to claim 2, wherein the digital low-pass filter has a transmission characteristic with a resonance frequency of 0 Hz, and the sound source spectrum control means (24) varies the filter coefficients of the digital low-pass filter so that the bandwidth of the filter changes in direct proportion to the sound source amplitude (25).