JPH02247700A

JPH02247700A - Voice synthesizing device

Info

Publication number: JPH02247700A
Application number: JP1070030A
Authority: JP
Inventors: Nobuhide Yamazaki (山崎 信英); Hiroo Kitagawa (北川 博雄)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-03-20
Filing date: 1989-03-20
Publication date: 1990-10-03

Abstract

PURPOSE: To stabilize synthesized speech and improve its quality by determining the spectral parameters of a high-frequency-emphasized original speech signal and passing the original speech signal through the inverse filter of that spectrum. CONSTITUTION: The original speech is high-frequency emphasized by a high-pass filter 1 and spectrally analyzed frame by frame in an LSP analysis circuit 2, which outputs the result as LSP parameters. The original speech is passed through an LSP inverse filter 3, whose characteristics are the inverse of the spectral characteristics given by the LSP parameters, and thereby becomes a residual waveform with suppressed high frequencies. A residual waveform synthesis circuit 6 edits this residual waveform according to pitch and amplitude data to form a synthesized residual waveform, which an LSP synthesis filter 4 outputs as synthesized speech. A residual waveform of large power is thereby obtained, and high-quality synthesized speech is obtained stably.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech synthesis device, and more particularly to a rule-based speech synthesis device that uses a residual waveform; it concerns an analysis-synthesis system for synthesizing high-quality speech.

Prior Art

A conventional speech synthesizer consists of a vocal tract filter, which represents the spectral envelope of speech, and a driving sound source signal generator, which represents the pitch period, amplitude, and spectral fine structure of speech. The vocal tract filter is usually a digital filter of the PARCOR or LSP type, and the driving sound source switches between an impulse train and white noise (in Shiratori, "Speech Synthesis Technology," Information Processing, Vol. 24, No. 8, pp. 993-1000, the speech production model switches the sound source between a pulse train and noise).

A residual waveform obtained by inverse filtering an original human speech signal is sometimes used as the driving sound source (in Japanese Examined Patent Publication No. Sho 58-88798, a waveform of roughly one pitch period cut out of the residual signal is used as the driving waveform). The aim is to improve sound quality by letting the sound source cover the components that the spectral parameters could not approximate.

However, with a conventional residual waveform, the power of the residual can become extremely low for certain phonemes, and this has been a cause of oscillation in the synthesis filter.

Object

The present invention was made in view of the circumstances described above. In particular, its object is to increase the power of the driving sound source by using a residual waveform with suppressed high frequencies, thereby reducing oscillation of the synthesizer and obtaining more stable, higher-quality synthesized speech.

Constitution

To achieve the above object, the present invention provides either of two configurations in a speech synthesis device comprising a circuit that simulates the characteristics of the vocal tract or the spectral envelope of speech and a sound source that drives the circuit.

In the first configuration, the device has a filter that applies high-frequency emphasis to the original speech signal, an analysis circuit that obtains parameters representing the spectral envelope of its input signal, and an inverse filter for the frequency characteristics of that spectral envelope. The spectral parameters of the high-frequency-emphasized original speech signal are obtained, and the circuit is driven by the residual waveform with suppressed high frequencies that results from passing the original speech signal through the inverse filter of that spectrum.

In the second configuration, the device has an adaptive inverse filter that removes the spectral slope of the original speech signal, an analysis circuit that obtains parameters representing the spectral envelope of its input signal, and an inverse filter for the frequency characteristics of that spectral envelope. After the slope of the original speech signal spectrum is removed, the analysis circuit obtains the spectral parameters from the result, and the original speech signal is applied to the inverse filter of that spectrum. The spectral slope of the original speech signal is thereby included in the sound source signal and, conversely, removed from the spectral parameters.

The invention is described below on the basis of embodiments.

Thus, in the present invention, the spectral parameters of the high-frequency-emphasized original speech signal are obtained, and a residual waveform with suppressed high frequencies is obtained by passing the original speech signal through the inverse filter of that spectrum. The waveform changes from the conventional impulse-like shape to something like a triangular wave, and its power becomes comparatively large. Consequently, the gain of the synthesis filter decreases, which reduces oscillation of the synthesizer and yields more stable, higher-quality synthesized speech.

FIG. 1 is a block diagram for explaining an embodiment of the invention set forth in claim 1. In the figure, 1 is a high-pass filter, 2 is an LSP analysis circuit, 3 is an LSP inverse filter, 4 is an LSP synthesis filter, 5 is a parameter and residual waveform storage unit, and 6 is a residual synthesis circuit. The original speech is the speech material from which the synthesis parameters are created; it is high-frequency emphasized by the high-pass filter 1 and then sent to the LSP analysis circuit 2.

The LSP analysis circuit 2 performs spectral analysis of the high-frequency-emphasized original speech frame by frame and outputs the result as LSP parameters representing the spectral envelope. Meanwhile, the original waveform is passed through the LSP inverse filter 3. The LSP inverse filter 3 is a filter whose spectral characteristics are the inverse of those given by the LSP parameters. Because the spectral envelope represented by these LSP parameters has its high frequencies emphasized relative to that of the original speech, the output of the LSP inverse filter 3 is a residual waveform with suppressed high frequencies. FIG. 2 shows the original speech signal (a), the residual waveform without high-frequency suppression (b), and the residual waveform with high-frequency suppression (c).
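The analysis path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of the embodiment: the first-order emphasis filter and its coefficient `mu = 0.95`, the prediction order 10, the Hamming window, and the synthetic low-pass test frame are all assumed, and plain linear-prediction coefficients stand in for the LSP parameters (both describe the same all-pole envelope). With the autocorrelation method, the conventional residual has the minimum energy of any inverse filter of that order for the frame, so the residual obtained with the emphasized-signal coefficients cannot have less energy.

```python
import math
import random

def pre_emphasis(x, mu=0.95):
    """High-frequency emphasis y[n] = x[n] - mu*x[n-1] (mu is an assumed value)."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]

def lpc(x, order):
    """Autocorrelation-method linear prediction (Levinson-Durbin).
    Returns the inverse-filter coefficients [1, a1, ..., ap] of A(z)."""
    r = [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        a = [a[j] + k * a[i - j] if j <= i else a[j] for j in range(order + 1)]
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Filter x with the FIR inverse filter A(z) (full convolution) to get the residual."""
    out = [0.0] * (len(x) + len(a) - 1)
    for n, xn in enumerate(x):
        for j, aj in enumerate(a):
            out[n + j] += aj * xn
    return out

def energy(x):
    return sum(v * v for v in x)

def make_frame(n=240, seed=0):
    """Synthetic low-pass, Hamming-windowed frame used as a stand-in for speech."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for i in range(n):
        prev = 0.9 * prev + rng.gauss(0.0, 1.0)
        x.append(prev * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))))
    return x

frame = make_frame()
a_conv = lpc(frame, 10)                 # conventional: analyze the original signal
a_emph = lpc(pre_emphasis(frame), 10)   # as described: analyze the emphasized signal
e_conv = inverse_filter(frame, a_conv)  # conventional residual
e_emph = inverse_filter(frame, a_emph)  # the ORIGINAL signal goes through the inverse filter
print(energy(e_conv), energy(e_emph))
```

The two printed values are the energies of the conventional residual and of the high-frequency-suppressed residual for the same frame; how much larger the second is depends on the spectral tilt of the input.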

At synthesis time, the residual synthesis circuit 6 edits the high-frequency-suppressed residual waveform on the basis of pitch and amplitude data to produce a synthesized residual waveform. The LSP synthesis filter 4 then outputs the synthesized residual waveform as synthesized speech.
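The synthesis path can be illustrated in the same spirit. The text does not say how the residual synthesis circuit edits the residual, so the sketch below assumes one simple scheme: a stored one-pitch residual unit is placed at each pitch mark, scaled by the amplitude for that period, and overlap-added. The function names, the triangular unit, and the second-order coefficients are all hypothetical, and a direct-form all-pole filter 1/A(z) stands in for the LSP synthesis filter.

```python
def build_excitation(unit, pitch_periods, amplitudes):
    """Place one stored residual unit at each pitch mark, scaled by its
    amplitude, and overlap-add the copies (an assumed editing scheme)."""
    out = [0.0] * (sum(pitch_periods) + len(unit))
    pos = 0
    for period, amp in zip(pitch_periods, amplitudes):
        for i, u in enumerate(unit):
            out[pos + i] += amp * u
        pos += period
    return out

def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{j>=1} a[j] * y[n-j]."""
    y = []
    for n, en in enumerate(e):
        acc = en
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y

def inverse_filter(x, a):
    """FIR filter A(z), truncated to len(x): e[n] = sum_j a[j] * x[n-j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

# Hypothetical data: a triangular residual unit and a stable 2nd-order A(z).
unit = [0.0, 0.5, 1.0, 0.5, 0.0]
a = [1.0, -1.2, 0.72]
excitation = build_excitation(unit, [40, 40, 50], [1.0, 0.8, 0.6])
speech = synthesis_filter(excitation, a)
```

Because the synthesis filter is the exact inverse of the analysis filter, passing `speech` back through `inverse_filter` with the same coefficients returns the excitation, which is a convenient sanity check for any implementation of the pair.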

Besides the LSP parameters of this embodiment, other parameters that represent spectral characteristics, such as PARCOR, LPC, or formant parameters, can also be used.

FIG. 3 is a block diagram for explaining an embodiment of the invention set forth in claim 2. In the figure, 10 denotes an adaptive filter. The adaptive filter 10 obtains the first-order correlation coefficient of the original speech with a correlator and removes the spectral slope from the spectral parameters; passing the original speech signal through the inverse filter of this spectrum yields a residual waveform of large power. The adaptive filter also corrects the spectral slope regardless of whether the sound is voiced or unvoiced, making the spectrum flat. With the first-order correlation coefficient denoted $r_1$, the adaptive filter can be realized as a programmable filter with the characteristic $F(z) = 1 - r_1 z^{-1}$.
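A minimal sketch of such a filter follows, assuming that the first-order correlation coefficient is the normalized lag-1 autocorrelation R(1)/R(0) computed over the samples handed to the function (in practice, one analysis frame). The function names are illustrative only.

```python
def first_order_correlation(x):
    """Normalized lag-1 autocorrelation r1 = R(1) / R(0)."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[n] * x[n + 1] for n in range(len(x) - 1))
    return r1 / r0 if r0 > 0.0 else 0.0

def adaptive_tilt_filter(x):
    """Apply F(z) = 1 - r1 * z^-1, with r1 measured from x itself."""
    r1 = first_order_correlation(x)
    y = [x[0]] + [x[n] - r1 * x[n - 1] for n in range(1, len(x))]
    return y, r1

# A decaying exponential has a strong spectral tilt; the filter removes it.
x = [0.9 ** n for n in range(200)]
y, r1 = adaptive_tilt_filter(x)
```

For this input the measured coefficient is essentially 0.9 and the output collapses to a single impulse, that is, a flat spectrum. Unlike a fixed emphasis coefficient, the correction follows the measured tilt of each frame, which is why it applies to voiced and unvoiced sounds alike.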

The operation of the LSP analysis circuit 2 through the residual synthesis circuit 6 is exactly the same as in FIG. 1, so a detailed description is omitted.

Effect

As is clear from the above description, according to the invention of claim 1, spectral analysis is performed with the high frequencies emphasized: the spectral parameters of the high-frequency-emphasized original speech signal are obtained, and the original speech signal is passed through the inverse filter of that spectrum. A residual waveform of large power is therefore obtained, which reduces oscillation of the synthesizer and yields more stable, higher-quality synthesized speech. In addition, the high-frequency emphasis filter corrects the spectral slope characteristic of voiced sounds and flattens the spectrum, which improves analysis accuracy at high frequencies.

It is also effective in suppressing the phenomenon in which low-order formants appear with a seemingly very sharp Q.

According to the invention of claim 2, the adaptive filter removes the spectral slope from the spectral parameters, and the original speech signal is passed through the inverse filter of that spectrum. A residual waveform of large power is therefore obtained, which reduces oscillation of the synthesizer and yields more stable, higher-quality synthesized speech. In addition, the adaptive filter corrects the spectral slope regardless of whether the sound is voiced or unvoiced and flattens the spectrum, which improves analysis accuracy at high frequencies. It is also effective in suppressing the phenomenon in which low-order formants appear with a seemingly very sharp Q.

[Brief explanation of drawings]

FIG. 1 is a block diagram for explaining an embodiment of the invention set forth in claim 1; FIG. 2 is a diagram showing the residual waveform without high-frequency suppression (b) and the residual waveform with high-frequency suppression; and FIG. 3 is a block diagram for explaining an embodiment of the invention set forth in claim 2.

1: high-pass filter, 2: LSP analysis circuit, 3: LSP inverse filter, 4: LSP synthesis filter, 5: parameter and residual waveform storage unit, 6: residual synthesis circuit, 10: adaptive filter.

Claims

[Claims]

1. A speech synthesis device comprising a circuit that simulates the characteristics of the vocal tract or the spectral envelope of speech and a sound source that drives the circuit, characterized by having a filter that applies high-frequency emphasis to an original speech signal, an analysis circuit that obtains parameters representing the spectral envelope of an input signal, and an inverse filter for the frequency characteristics of that spectral envelope, wherein the spectral parameters of the high-frequency-emphasized original speech signal are obtained and the circuit is driven using a residual waveform with suppressed high frequencies obtained by passing the original speech signal through the inverse filter of that spectrum.

2. A speech synthesis device comprising a circuit that simulates the characteristics of the vocal tract or the spectral envelope of speech and a sound source that drives the circuit, characterized by having an adaptive inverse filter that removes the spectral slope of an original speech signal, an analysis circuit that obtains parameters representing the spectral envelope of an input signal, and an inverse filter for the frequency characteristics of that spectral envelope, wherein, after the slope of the original speech signal spectrum is removed, the analysis circuit obtains the spectral parameters from the result and the original speech signal is applied to the inverse filter of that spectrum, whereby the spectral slope of the original speech signal is included in the sound source signal and, conversely, removed from the spectral parameters.