JPH02230300A

JPH02230300A - Voice synthesizer

Info

Publication number: JPH02230300A
Application number: JP1049958A
Authority: JP
Inventors: Takayuki Ishikawa (石川 孝行)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1989-03-03
Filing date: 1989-03-03
Publication date: 1990-09-12

Abstract

PURPOSE: To prevent excessive concentration of energy by providing the voice synthesizer with a pitch pulse varying device and determining each pulse excitation point at or near the pitch pulse period. CONSTITUTION: When a voice synthesizing filter 13 is driven by a modeled sound source, namely an impulse train based on the pitch period or white noise, to reproduce an input voice signal, a pitch pulse varying device 18 that varies the generation period of the pitch pulse train based on the pitch period is provided, so that the pitch pulse position, which is the pulse excitation point, varies with a period determined by the pitch pulse varying device 18. Consequently, concentration of energy is avoided and a synthetic voice close to natural voice is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech synthesizer that synthesizes speech from spectral envelope information and sound source information obtained by a speech analyzer.

[Conventional technology]

Conventionally, a speech synthesizer of this type receives spectral envelope information, which represents the macroscopic structure of an input speech signal, and sound source information, which represents its fine structure, and reproduces the input speech signal from this analysis information. That is, such a speech synthesizer has, as its speech synthesis filter, an all-pole digital filter whose coefficients are given by the spectral envelope information transmitted as analysis information and which is driven by the sound source information.

The spectral envelope information usually consists of linear prediction coefficients, such as α parameters or K parameters, obtained by LPC analysis (linear predictive analysis; Linear Predictive Coefficient) of the input speech signal and used as the filter coefficients.
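The publication does not say how the α and K parameters are computed. As a minimal sketch, assuming the common autocorrelation method of LPC analysis, the Levinson-Durbin recursion below yields both at once: the α parameters (predictor coefficients) and the K parameters (reflection, or PARCOR, coefficients). The function names and the sign convention x[n] ≈ Σ α_i x[n-i] are assumptions made for illustration.

```python
def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """Biased autocorrelation r[0..max_lag] of one (already windowed) analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], list[float], float]:
    """Levinson-Durbin recursion for the LPC normal equations.

    Assumed convention: x[n] is predicted as sum_{i=1..order} alpha[i] * x[n - i].
    Returns (alpha, k, err): alpha[1..order] are the alpha parameters,
    k[1..order] the reflection (K, PARCOR) coefficients, and err the residual
    prediction-error energy. Index 0 of alpha and k is unused.
    """
    alpha = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: leave higher orders at zero
            break
        # Reflection coefficient for stage m.
        acc = r[m] - sum(alpha[i] * r[m - i] for i in range(1, m))
        k[m] = acc / err
        # Update the predictor coefficients from stage m-1 to stage m.
        updated = alpha[:]
        updated[m] = k[m]
        for i in range(1, m):
            updated[i] = alpha[i] - k[m] * alpha[m - i]
        alpha = updated
        # Prediction-error energy shrinks by (1 - k_m^2) at each stage.
        err *= 1.0 - k[m] ** 2
    return alpha, k, err
```

For example, `levinson_durbin(autocorrelation(frame, 10), 10)` would give a 10th-order analysis of one windowed frame; the filter order actually used is not stated in the publication.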

The sound source information, on the other hand, consists of an impulse train based on the pitch period (pitch pulse train), voiced/unvoiced/silent information, speech power, and so on. The waveform information of the source is discarded: only the pitch period, the voiced/unvoiced/silent decision, the speech power and similar quantities are modeled, and this model drives the speech synthesis filter. That is, a voiced source is represented by an impulse train at its pitch period, and an unvoiced or silent source by white noise, and these modeled representations drive the speech synthesis filter.

[Problem to be solved by the invention]

However, a conventional speech synthesizer of this kind, which drives the speech synthesis filter with analysis information and does not transmit the waveform, inherently lacks phase information compared with a waveform-transmission synthesizer such as a multipulse vocoder, and ambiguity easily enters its pitch period information. In particular, because the speech synthesis filter is driven with the pitch pulse positions, generated according to the pitch period information transmitted from the analysis side to the synthesis side, as its pulse excitation points, energy becomes excessively concentrated at those excitation points. Excitation based on the pitch period therefore concentrates energy at regular intervals, and the synthesized speech sounds mechanical and lacks naturalness.

[Means to solve the problem]

In synthesizing an input speech signal by driving a speech synthesis filter with a modeled sound source, namely an impulse train based on the pitch period or white noise, the present invention is characterized by a pitch pulse variable device that varies the generation period of the pitch pulse train (impulse train) based on the pitch period.

[Operation]

Because the pitch pulse variable device changes the pitch pulse position, which is the pulse excitation point, with a period determined by the pitch pulse variable device, concentration of energy is avoided and synthesized speech close to natural speech can be obtained.

[Example]

The present invention will now be described in detail with reference to an embodiment shown in the drawing.

FIG. 1 is a block diagram showing an embodiment of the speech synthesizer of the present invention. The demultiplexer 11 receives, via the transmission line 12, the analysis information of the speech signal to be synthesized. This analysis information is a multiplexed signal of the spectral envelope information and the sound source information produced by a speech analyzer. It contains LPC coefficient data a as the spectral envelope information, and voiced/unvoiced/silent information b, pitch period information c and short-time speech power data d as the sound source information.
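The demultiplexed analysis information for a single frame can be pictured as a small record holding the LPC coefficient data a, the voiced/unvoiced/silent information b, the pitch period information and the short-time power data d. The class and field names below are hypothetical, and expressing the pitch period in samples is an assumption; the publication does not specify the frame format or its encoding.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    """Voiced/unvoiced/silent information b."""
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    SILENT = "silent"


@dataclass(frozen=True)
class AnalysisFrame:
    """Decoded analysis information for one frame (hypothetical field names)."""
    lpc: tuple[float, ...]    # a: LPC coefficient data -> speech synthesis filter 13
    source_type: SourceType   # b: voiced/unvoiced/silent -> switch 15
    pitch_period: int         # pitch period information, in samples -> pitch pulse generator 14
    power: float              # d: short-time speech power -> variable amplifier 16

    def __post_init__(self) -> None:
        # Basic sanity checks on what the demultiplexer hands on.
        if not self.lpc:
            raise ValueError("at least one LPC coefficient is required")
        if self.source_type is SourceType.VOICED and self.pitch_period <= 0:
            raise ValueError("a voiced frame needs a positive pitch period")
        if self.power < 0.0:
            raise ValueError("short-time power cannot be negative")
```

For instance, at 8 kHz sampling (one sample every 125 μs), a 100 Hz pitch corresponds to `AnalysisFrame(lpc=(1.2, -0.5), source_type=SourceType.VOICED, pitch_period=80, power=0.01)`.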

The speech analyzer that analyzes the speech signal consists of an LPC analyzer, a pitch extractor, a voiced/unvoiced/silent discriminator, a power meter, and so on. It stores the resulting analysis information in a memory circuit, combines and multiplexes it as appropriate with a multiplexer or similar device, encodes it for transmission, and sends it over the transmission line 12 to the speech synthesizer shown in FIG. 1.

In the speech synthesizer, the demultiplexer 11 demultiplexes and decodes the multiplexed data of the received analysis information.

The decoded LPC coefficient data a is supplied to the speech synthesis filter 13, the pitch period information c to the pitch pulse generator 14, the voiced/unvoiced/silent information b to the switch 15, and the short-time speech power data d to the variable amplifier 16. The speech synthesis filter 13 is an all-pole digital filter of predetermined order, and the LPC coefficient data a is used as its coefficients.
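The publication describes the synthesis filter only as an all-pole digital filter of predetermined order whose coefficients are the LPC data. A minimal direct-form sketch follows, assuming the predictor sign convention y[n] = x[n] + Σ α_i y[n-i] and assuming the coefficients are simply swapped at each frame boundary while the filter memory is kept. The class name is hypothetical.

```python
class AllPoleFilter:
    """Direct-form all-pole filter H(z) = 1 / (1 - sum_i alpha_i z^-i).

    Assumed sign convention: y[n] = x[n] + sum_{i=1..p} alpha_i * y[n - i].
    The output history is kept between calls so consecutive frames join up.
    """

    def __init__(self, order: int) -> None:
        if order < 1:
            raise ValueError("order must be at least 1")
        self.history = [0.0] * order  # y[n-1], y[n-2], ..., y[n-order]

    def process(self, alpha: list[float], excitation: list[float]) -> list[float]:
        """Filter one frame of excitation with this frame's alpha_1..alpha_p."""
        if len(alpha) != len(self.history):
            raise ValueError("number of coefficients must equal the filter order")
        output = []
        for x in excitation:
            y = x + sum(a * h for a, h in zip(alpha, self.history))
            self.history = [y] + self.history[:-1]  # shift the newest output in
            output.append(y)
        return output
```

Keeping `history` across calls prevents the filter from restarting from silence at every frame boundary; a practical implementation might also interpolate coefficients between frames, which the publication does not mention.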

The voiced/unvoiced/silent information b is supplied to the switch 15 and sets it so that the output of the pitch pulse generator 14 is supplied to the variable amplifier 16 when the information specifies voiced speech, and the output of the noise generator 17 is supplied to the variable amplifier 16 when it specifies unvoiced or silent.

The noise generator 17 generates white noise, which is supplied to the variable amplifier 16 when the voiced/unvoiced/silent information b specifies unvoiced or silent.
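The selection made by the switch 15 can be sketched as follows. The noise distribution (unit-variance Gaussian), the string labels and the function names are assumptions, and the pulse train here is the conventional fixed-period one, before any correction by the pitch pulse variable device 18.

```python
import random


def noise_generator(n_samples: int, rng: random.Random) -> list[float]:
    """Noise generator 17: zero-mean white noise (unit-variance Gaussian is an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def plain_pitch_pulses(n_samples: int, period: int) -> list[float]:
    """Conventional pitch pulse train: a unit impulse every `period` samples from sample 0."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def switch(source_type: str, n_samples: int, period: int, rng: random.Random) -> list[float]:
    """Switch 15: pitch pulses for voiced frames, white noise for unvoiced or silent ones."""
    if source_type == "voiced":
        return plain_pitch_pulses(n_samples, period)
    if source_type in ("unvoiced", "silent"):
        return noise_generator(n_samples, rng)
    raise ValueError(f"unknown source type: {source_type!r}")
```

This sketch restarts the pulse train at sample 0 in every frame; a real generator would carry the pulse phase across frame boundaries.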

The pitch pulse generator 14, supplied with the pitch period information c, generates a pitch pulse train at the frequency corresponding to this pitch period, corrects the positions of the pulses according to instructions from the pitch pulse variable device 18, which is the characteristic feature of the present invention, and then supplies the train to the switch 15.

The variable amplifier 16 applies weighted amplification to the incoming pitch pulses or white noise according to the magnitude of the separately supplied short-time speech power data d, and supplies the result to the speech synthesis filter 13 as the filter's driving source.
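The publication says only that the gain of the variable amplifier 16 follows the magnitude of the short-time speech power data d. One plausible reading, sketched below as an assumption, is to scale each frame of excitation so that its mean-square value equals d. The function name is hypothetical.

```python
import math


def variable_amplifier(excitation: list[float], power: float) -> list[float]:
    """Variable amplifier 16 (sketch): scale one frame so its mean-square value equals `power`.

    Treating d as a target mean-square level is an assumption; the publication only
    says the gain follows the magnitude of the short-time speech power data d.
    """
    if not excitation:
        return []
    mean_square = sum(x * x for x in excitation) / len(excitation)
    if mean_square == 0.0:
        return [0.0] * len(excitation)  # nothing to amplify
    gain = math.sqrt(power / mean_square)
    return [gain * x for x in excitation]
```

With this normalization a voiced frame with a longer pitch period gets taller pulses, because the same power is carried by fewer non-zero samples.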

The speech synthesis filter 13 uses the input LPC coefficient data a as its filter coefficients, is driven by the driving source, reproduces a quantized synthesized waveform for each analysis frame, and supplies it to the D/A converter 19.

The D/A converter 19 converts the quantized synthesized waveform into an analog waveform and sends it to the LPF (Low Pass Filter) 20. The LPF 20 performs predetermined high-frequency cutoff filtering and sends the result to the output line 21 as synthesized speech.

As described above, in a conventional speech synthesizer excessive energy concentrates at the pulse excitation points that drive the speech synthesis filter 13 at every pitch period, that is, at fixed intervals, and the synthesized speech lacks naturalness. In this embodiment of the present invention, the pitch pulse variable device 18 intentionally moves the pulse excitation points. The excessive energy concentration that would otherwise occur every fixed period (pitch period) is thereby dispersed, while the pitch period remains the basis of the excitation. That is, by intentionally placing each pulse excitation point at the pitch period, or slightly before or slightly after it, the excessive energy that conventionally occurred at fixed intervals is dispersed in time; this removes the audible unnaturalness and markedly improves the naturalness of the synthesized speech.

Here, the pitch pulse variable device 18 uses an M-sequence (maximal-length sequence), a typical pseudo-random number source. When the lower two bits of the M-sequence are (0, 0) or (1, 1), the pulse is excited exactly at the pitch period; when they are (1, 0), it is excited one sample (125 μs) earlier than the pitch period; and when they are (0, 1), it is excited one sample later. The contents of the pitch pulse variable device 18 are updated every sample.
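The pitch pulse variable device 18 can be sketched with a linear-feedback shift register, a standard way to generate an M-sequence. The 7-bit register and the feedback polynomial x^7 + x^6 + 1 are assumptions (the publication gives neither), as is the reading of the lower two bits as (bit 1, bit 0) with (1, 0) taken as one sample early and (0, 1) as one sample late. The register advances once per sample, and each pitch pulse is shifted by the offset read at its nominal position. All names are hypothetical.

```python
class MSequence:
    """7-bit Fibonacci LFSR with feedback polynomial x^7 + x^6 + 1 (PRBS7).

    It visits all 127 non-zero 7-bit states before repeating. Register length and
    polynomial are assumptions; the publication only says an M-sequence is used.
    """

    def __init__(self, seed: int = 1) -> None:
        if (seed & 0x7F) == 0:
            raise ValueError("seed must be non-zero in its low 7 bits")
        self.state = seed & 0x7F

    def step(self) -> int:
        """Advance one sample and return the new register contents."""
        new_bit = ((self.state >> 6) ^ (self.state >> 5)) & 1
        self.state = ((self.state << 1) | new_bit) & 0x7F
        return self.state


def pulse_offset(state: int) -> int:
    """Map the lower two bits (bit 1, bit 0) of the register to a shift in samples."""
    low_bits = ((state >> 1) & 1, state & 1)
    if low_bits == (1, 0):
        return -1  # one sample (125 us at 8 kHz) early
    if low_bits == (0, 1):
        return +1  # one sample late
    return 0       # (0, 0) or (1, 1): exactly on the pitch period


def jittered_pulse_positions(n_samples: int, period: int, seq: MSequence) -> list[int]:
    """Pitch pulse positions: multiples of `period`, each moved by -1, 0 or +1 sample."""
    positions = []
    for n in range(n_samples):
        offset = pulse_offset(seq.step())  # the register is updated every sample
        if n > 0 and n % period == 0:
            position = n + offset
            if 0 <= position < n_samples:
                positions.append(position)
    return positions
```

Over one full register period, 32 of the 127 states give an early pulse, 32 a late one and 63 leave the pulse in place, so for a nominal pitch period of P samples successive pulses end up P-2 to P+2 samples apart. Because an early pulse lands one sample before the point where its offset is read, this version works on a buffered frame rather than strictly sample by sample.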

[Effect of the invention]

As explained above, according to the present invention the speech synthesizer is provided with a pitch pulse variable device and places the pulse excitation points at the pitch pulse period or in its vicinity, so excessive concentration of energy is prevented and the synthesized speech is kept from sounding mechanical and unnatural. The pitch pulse variable device, by varying the energy excitation points, disperses the energy in time and makes it possible to generate natural synthesized speech without audible unnaturalness.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an embodiment of the speech synthesizer of the present invention.

11: demultiplexer; 12: transmission line; 13: speech synthesis filter; 14: pitch pulse generator; 15: switch; 16: variable amplifier; 17: noise generator; 18: pitch pulse variable device; 19: D/A converter; 20: LPF.

Claims


A speech synthesizer which receives spectral envelope information indicating the macroscopic structure of a speech signal analyzed by a speech analyzer and sound source information indicating the fine structure of the speech signal, and which synthesizes speech with a speech synthesis filter whose coefficients are given by the spectral envelope information and which is driven by the sound source information, the speech synthesizer comprising: a pitch pulse generator that generates a pitch pulse train of a predetermined frequency based on the pitch period information in the sound source information; a pitch pulse variable device that varies the generation positions of the pitch pulse train generated by the pitch pulse generator; a noise generator that generates white noise; and a switch that selects the output of the pitch pulse generator when the voiced/unvoiced/silent information in the sound source information is voiced information and selects the output of the noise generator when that information is unvoiced/silent information; wherein the speech synthesis filter is driven on the basis of the output of the switch.