JPH0632037B2

JPH0632037B2 - Speech synthesizer

Info

Publication number: JPH0632037B2
Application number: JP60281438A
Authority: JP
Inventors: 寛治国澤; 博糸山
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1985-12-13
Filing date: 1985-12-13
Publication date: 1994-04-27
Anticipated expiration: 2009-04-27
Also published as: JPS62139599A

Description

[Technical Field] The present invention relates to a PARCOR-type speech synthesizer that uses rule-based synthesis means.

[Background Art] Rule-based synthesis takes only phonemes, syllable symbols, or character-sequence data as input and produces the corresponding synthesized speech. Because it places no limit on the vocabulary of the synthesized speech, it is in general an excellent speech synthesis method. It is not easy, however, to produce highly natural and intelligible speech by rule-based synthesis, so systems considered good enough for practical use have been developed mainly by restricting the application.

The PARCOR method, for which relatively inexpensive LSIs are available, extracts speech synthesis data consisting of an amplitude parameter, a pitch parameter, and partial autocorrelation coefficients (κ parameters, which serve as the spectral parameters) from stationary frames of a sampled speech signal, and synthesizes speech from that data. For rule-based synthesis built on PARCOR speech synthesis means, methods have been proposed that use as the unit either (vowel + consonant + vowel) syllables (VCV syllables) or (consonant + vowel) syllables (CV syllables). There are about 770 VCV syllables but only about 100 CV syllables, so the CV-based method is superior in terms of storage capacity. With VCV units, however, only vowel-to-vowel connections are needed ($V_{11}C_1V_{12} + V_{21}C_2V_{22}$), whereas with CV units a vowel must be connected to a consonant or to another vowel ($C_1V_1 + C_2V_2$ or $C_1V_2 + V_2$), so the interpolation characteristics at the junction deteriorate and the quality of the synthesized speech suffers. That is, parameters are interpolated when a vowel is connected to a vowel or to a voiced consonant, but the κ parameters of the PARCOR method interpolate poorly under linear interpolation, so the synthesized speech is of low quality and lacks naturalness and intelligibility.

To solve this problem, the inventors have filed Japanese Patent Application No. Sho 60-12777 (hereinafter, the basic example), a speech synthesizer that interpolates the parameters at syllable junctions on the basis of the vocal tract cross-sectional area. Specifically, in a speech synthesizer equipped with PARCOR-type speech synthesis means that synthesizes speech from speech synthesis data processed by rule synthesis means, the rule synthesis means is configured to interpolate the parameters at a syllable junction linearly on the vocal tract cross-sectional areas $A_i$ and then convert the result to κ parameters, so that synthesized speech with good naturalness and intelligibility is obtained. The basic example rests on the idea that, when the vocal tract is approximated as a cascade of straight cylindrical tubes (for details see Chapter 10 of the reference "音声情報処理の基礎" (Fundamentals of Speech Information Processing), Ohmsha), each cross-sectional area $A_i$ should vary linearly. It yields higher-quality synthesized speech than linear interpolation on the κ parameters. The relationship between the κ parameter $\kappa_i$ and the vocal tract cross-sectional areas $A_i$, $A_{i+1}$ is

$$\kappa_i = \frac{A_i - A_{i+1}}{A_i + A_{i+1}} \tag{1}$$
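As a rough illustration of relation (1) and of the idea behind the basic example, the following Python sketch interpolates one junction in two ways: linearly on κ itself, and linearly on the two areas followed by conversion through (1). The function names and the sample areas are illustrative assumptions, not values from the patent.

```python
def area_to_kappa(a_i, a_next):
    """Relation (1): kappa_i = (A_i - A_{i+1}) / (A_i + A_{i+1})."""
    return (a_i - a_next) / (a_i + a_next)


def interpolate_on_area(start, end, steps):
    """Interpolate (A_i, A_{i+1}) linearly, then convert each point to kappa."""
    (as_i, as_n), (ae_i, ae_n) = start, end
    out = []
    for k in range(steps + 1):
        x = k / steps
        out.append(area_to_kappa(as_i + (ae_i - as_i) * x,
                                 as_n + (ae_n - as_n) * x))
    return out


def interpolate_on_kappa(start, end, steps):
    """Convert both endpoints to kappa first, then interpolate kappa linearly."""
    ks = area_to_kappa(*start)
    ke = area_to_kappa(*end)
    return [ks + (ke - ks) * k / steps for k in range(steps + 1)]


# Hypothetical areas (arbitrary units) on the two sides of a junction.
start = (1.0, 4.0)   # (A_i^s, A_{i+1}^s)
end = (6.0, 2.0)     # (A_i^e, A_{i+1}^e)

on_area = interpolate_on_area(start, end, 4)
on_kappa = interpolate_on_kappa(start, end, 4)
for step, (ka, kk) in enumerate(zip(on_area, on_kappa)):
    print(f"step {step}: area-domain {ka:+.3f}  kappa-domain {kk:+.3f}")
```

Both curves share the same endpoints and differ only in between, which is where the choice of interpolation domain matters.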

In the basic example, however, although the quality of the synthesized speech improves, a division is needed to convert the vocal tract cross-sectional areas to κ parameters in addition to the linear interpolation on the areas, and this division is a relatively heavy computational burden. One approach therefore performs the linear interpolation on the cross-sectional areas to obtain $A_i$ and $A_{i+1}$ and then looks up the κ parameter in a conversion table. This eliminates the division and simplifies the arithmetic, but it requires a two-dimensional conversion table whose storage requirement is very large (in general, a two-dimensional table needs the square of the storage that a one-dimensional table needs), which makes it difficult to implement.
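The storage argument can be made concrete with a small calculation. The sketch below counts table entries for a one-index and a two-index lookup; the 8-bit quantization of each area value is a hypothetical choice, since the patent does not state one.

```python
def table_entries(bits_per_index, dimensions):
    """Entries in a lookup table addressed by `dimensions` indices,
    each quantized to `bits_per_index` bits."""
    return (2 ** bits_per_index) ** dimensions


bits = 8  # hypothetical quantization of each area value
one_d = table_entries(bits, 1)
two_d = table_entries(bits, 2)
print(f"1-D table: {one_d} entries, 2-D table: {two_d} entries")
```

A table addressed by both $A_i$ and $A_{i+1}$ grows as the square of the single-index size, which is the difficulty the passage describes.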

[Object of the Invention] The present invention has been made in view of the above. Its object is to provide a speech synthesizer that produces synthesized speech with good naturalness and intelligibility while reducing both the amount of computation and the storage capacity, thereby simplifying the configuration and lowering the cost.

[Disclosure of the Invention] (Embodiment) Fig. 1 shows an embodiment of the present invention. Rule synthesis means 4 is formed by a syllable synthesis parameter storage unit 1, which stores syllable synthesis parameters in units of CV syllables consisting of a consonant and a vowel; a rule data storage unit 2, which stores rule data; and an arithmetic unit 3, which computes speech synthesis data $D_3$ from synthesis input data $D_0$ such as character-sequence data, the syllable synthesis parameters $D_1$, and the rule data $D_2$, and which interpolates the parameters at syllable junctions. The synthesizer further comprises PARCOR-type speech synthesis means 5, which synthesizes speech from the speech synthesis data $D_3$ processed by rule synthesis means 4. In this speech synthesizer, letting the vocal tract cross-sectional areas at the junction of the preceding and following syllables be $A_i^s$, $A_{i+1}^s$ and $A_i^e$, $A_{i+1}^e$, the arithmetic unit 3 is configured to obtain the value of

[expression not reproduced in this text]

and, from that value, to select one of the prestored interpolation patterns for the κ parameter and interpolate with it.

In the embodiment, analysis means 10, which obtains the syllable synthesis parameters of each syllable, consists of an A/D conversion circuit 11, which includes a low-pass filter and A/D-converts the original speech $V_0$; a difference circuit 12; and an analysis circuit 13, which analyzes the amplitude, period, and spectrum of the original speech $V_0$ to form the synthesis parameters (amplitude parameter, pitch parameter, and κ parameters). Applying inverse filtering such as differencing before the analysis improves the analysis accuracy and hence the quality of the synthesized speech. Downstream of speech synthesis means 5 are an inverse difference circuit 14, which cancels the preprocessing done by the inverse filtering, and a D/A conversion circuit 15 equipped with a low-pass filter.
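The pairing of difference circuit 12 with inverse difference circuit 14 can be sketched as a first-order difference and its running-sum inverse. The patent gives no filter coefficients, so the pure difference $y[n] = x[n] - x[n-1]$ used here is an assumption, as are the function names and sample values. In the actual device, analysis and synthesis sit between the two stages; the sketch only shows that the second stage undoes the first.

```python
def difference(samples):
    """First-order difference: y[n] = x[n] - x[n-1], with x[-1] taken as 0."""
    out, prev = [], 0
    for x in samples:
        out.append(x - prev)
        prev = x
    return out


def inverse_difference(samples):
    """Running sum: z[n] = y[n] + z[n-1], which undoes difference()."""
    out, acc = [], 0
    for y in samples:
        acc += y
        out.append(acc)
    return out


original = [0, 3, 7, 4, -2, -5, 1]   # hypothetical integer samples
pre = difference(original)           # what the analysis stage would see
restored = inverse_difference(pre)   # what the output stage recovers
print(pre)
print(restored == original)
```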

The operating principle of the embodiment is as follows. Consider linear interpolation on the vocal tract cross-sectional area. Let the areas at the start of interpolation, that is, in the last section of the preceding CV syllable, be $A_i^s$, $A_{i+1}^s$, and the areas at the end of interpolation, that is, in the first section of the following CV syllable, be $A_i^e$, $A_{i+1}^e$. With $l$ the length from the start of interpolation to the desired point and $L$ the length from the start to the end of interpolation, the κ parameter at the desired point is

[equation not reproduced in this text]

Computing $d\kappa_i/dx$ and $d^2\kappa_i/dx^2$ gives

[equations not reproduced in this text]

and in both expressions the numerator is a constant. Moreover,

$$0 < A_i^s \le A_i^s + \Delta_i x \le A_i^e \quad\text{or}\quad 0 < A_i^e \le A_i^e + \Delta_i x \le A_i^s \tag{8}$$

$$0 < A_{i+1}^s \le A_{i+1}^s + \Delta_{i+1} x \le A_{i+1}^e \quad\text{or}\quad 0 < A_{i+1}^e \le A_{i+1}^e + \Delta_{i+1} x \le A_{i+1}^s \tag{9}$$

(where $A_i^s$, $A_i^e$, $A_{i+1}^s$, $A_{i+1}^e$ are positive because they are vocal tract cross-sectional areas). Consequently,

[expressions not reproduced in this text]

are always positive or always negative, regardless of $x$ ($0 \le x \le 1$).
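Several displayed equations are missing from this passage. As a hedged reconstruction rather than the patent's own text, they can plausibly be derived from relation (1) under the following assumptions: $x = l/L$, $\Delta_i = A_i^e - A_i^s$, $\Delta_{i+1} = A_{i+1}^e - A_{i+1}^s$, and each area varies linearly in $x$. The symbol $C$ is introduced here only for brevity.

```latex
\begin{align*}
A_i(x) &= A_i^s + \Delta_i x, \qquad A_{i+1}(x) = A_{i+1}^s + \Delta_{i+1} x \\[4pt]
\kappa_i(x) &= \frac{A_i(x) - A_{i+1}(x)}{A_i(x) + A_{i+1}(x)} \\[4pt]
\frac{d\kappa_i}{dx}
  &= \frac{2\bigl(\Delta_i A_{i+1}(x) - \Delta_{i+1} A_i(x)\bigr)}{\bigl(A_i(x) + A_{i+1}(x)\bigr)^2}
   = \frac{2C}{\bigl(A_i(x) + A_{i+1}(x)\bigr)^2},
  \qquad C = A_i^e A_{i+1}^s - A_i^s A_{i+1}^e \\[4pt]
\frac{d^2\kappa_i}{dx^2}
  &= \frac{-4C\,(\Delta_i + \Delta_{i+1})}{\bigl(A_i(x) + A_{i+1}(x)\bigr)^3} \\[4pt]
-\frac{1}{2}\left.\frac{d\kappa_i}{dx}\right|_{x=0}
  &= \frac{A_i^s A_{i+1}^e - A_i^e A_{i+1}^s}{\bigl(A_i^s + A_{i+1}^s\bigr)^2}
\end{align*}
```

The terms in $x$ cancel in the numerator of the first derivative, which is why it reduces to the constant $2C$. Because the areas are positive, the denominators are positive, so each derivative keeps one sign over $0 \le x \le 1$. The last line needs three multiplications and one division, which agrees with the operation count the description later attributes to equation (10).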

That $d\kappa_i/dx$ is always positive or always negative means that $\kappa_i$ is a monotonic function of $x$. That $d^2\kappa_i/dx^2$ is always positive or always negative means that $\kappa_i$ has no inflection point in $x$ and is either convex upward throughout or convex downward throughout.
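The monotonicity and absence of an inflection point can be checked numerically for a given junction. The sketch below samples $\kappa_i(x)$ under linear area interpolation and tests the signs of its first and second finite differences; the areas and helper names are hypothetical.

```python
def kappa_at(x, start, end):
    """kappa_i at position x in [0, 1] when both areas vary linearly."""
    (as_i, as_n), (ae_i, ae_n) = start, end
    a_i = as_i + (ae_i - as_i) * x
    a_n = as_n + (ae_n - as_n) * x
    return (a_i - a_n) / (a_i + a_n)


def same_sign(values):
    """True if every value is positive or every value is negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


start, end = (1.0, 4.0), (6.0, 2.0)  # hypothetical (A_i, A_{i+1}) pairs
samples = [kappa_at(k / 100, start, end) for k in range(101)]
first = [b - a for a, b in zip(samples, samples[1:])]   # ~ d(kappa)/dx
second = [b - a for a, b in zip(first, first[1:])]      # ~ d2(kappa)/dx2

print("monotonic:", same_sign(first))
print("no inflection point:", same_sign(second))
```

For these sample areas the curve rises throughout and is convex upward, so a single stored shape can approximate it.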

The shape of $\kappa_i$ as a function of $x$ can therefore be predicted to some extent from the value of

[expression not reproduced in this text]

Fig. 2 shows examples of how the κ parameter varies under linear interpolation on the vocal tract cross-sectional area.

Accordingly, by preparing several patterns for the shape of the $\kappa_i$ curve and deciding from the value of $d\kappa_i/dx|_{x=0}$ which pattern applies, an effect close to that of linear interpolation on the vocal tract cross-sectional area can be obtained.

The operation of the embodiment is now described concretely with reference to Figs. 3 to 6. Fig. 3 is a flowchart of the speech synthesis operation. The arithmetic unit 3 extracts the syllable to be read out from the input character-sequence data of the utterance and reads the speech synthesis data corresponding to that syllable from the speech synthesis parameter storage unit 1. It then performs the interpolation calculation (described below) on the parameters read out, determines the syllable length from the rule data, and fixes the κ parameter sequence, the pitch parameter sequence, and the amplitude parameter sequence. The speech synthesis parameters produced in this way are compressed as appropriate and stored as data for speech synthesis. The next syllable is then extracted from the character-sequence data, and the above operation is repeated until the end of the sentence, at which point speech synthesis means 5 synthesizes the speech.

Next, the interpolation of the κ parameters at a syllable junction is described. The parameters of each CV syllable or V syllable are stored in the speech synthesis parameter storage unit 1 up to the point at which the stationary part is reached, and the syllable length of the synthesized speech is adjusted by the number of times the parameters of the last section are repeated. At a syllable junction, the κ parameters are interpolated from the vocal tract cross-sectional areas of the last section of the preceding syllable and those of the first section of the next syllable. Fig. 4 shows how the data are stored: the κ parameter data $P_1^i$ to $P_5^i$ of the non-stationary part of each syllable, the κ parameter data $P_5$ of the stationary part, and the vocal tract cross-sectional area data $A_1^i$, $A_5^i$ corresponding to the κ parameters of the first and last sections of each syllable. Fig. 5 shows an example of connecting CV syllables; at the junction, interpolation is based on $A_5^i$ and $A_1^{i+1}$. Needless to say, the speech synthesis data also require a voiced/unvoiced decision parameter, an amplitude parameter, a pitch parameter, and so on.

The κ parameters are interpolated by obtaining $d\kappa_i/dx|_{x=0}$ from equation (10), selecting an interpolation pattern from interpolation table 7 according to that value, and interpolating with the selected pattern. Fig. 6 shows the pattern data of the interpolation table for the case where, for example, interpolation is done in four steps and there are three patterns for increasing values and three for decreasing values. To make the pattern selection as easy as possible, $-\tfrac{1}{2}(d\kappa_i/dx)$ is computed instead of $d\kappa_i/dx$, and thresholds corresponding to it are prepared. When

[condition not reproduced in this text]

holds, pattern 3 is used, and the values are

Step 1: $\kappa = \kappa_i^s + (1/16)\,\Delta\kappa_i$
Step 2: $\kappa = \kappa_i^s + (3/16)\,\Delta\kappa_i$
Step 3: $\kappa = \kappa_i^s + (8/16)\,\Delta\kappa_i$

where $\Delta\kappa_i = \kappa_i^s - \kappa_i^e$.
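The selection step can be sketched as follows. Only the 1/16, 3/16, 8/16 steps of pattern 3 come from the text; the final 16/16 step, the other two shapes, the threshold values, and all names are hypothetical. The selector expression is a reconstruction from relation (1) assuming linear area change, because equation (10) is not reproduced in this text. $\Delta\kappa$ is taken here as end minus start so that the last step lands on $\kappa^e$, whereas the text states the difference as $\kappa_i^s - \kappa_i^e$, so the sign convention is also an assumption. The sketch keeps a single set of three shapes and lets the sign of $\Delta\kappa$ handle direction, a simplification of the three increasing and three decreasing patterns of Fig. 6.

```python
# Pattern numerators over 16 for a four-step interpolation.
PATTERNS = {
    "slow_start": (1, 3, 8, 16),    # first three steps as quoted for pattern 3
    "linear": (4, 8, 12, 16),       # hypothetical
    "fast_start": (8, 13, 15, 16),  # hypothetical
}
LOW, HIGH = 0.25, 0.75  # hypothetical thresholds on the selector magnitude


def selector(as_i, as_n, ae_i, ae_n):
    """Reconstructed -(1/2) d(kappa)/dx at x = 0:
    three multiplications and one division."""
    total = as_i + as_n
    return (as_i * ae_n - ae_i * as_n) / (total * total)


def choose_pattern(value):
    """Pick a stored shape by comparing the selector with fixed thresholds."""
    magnitude = abs(value)
    if magnitude < LOW:
        return "slow_start"
    if magnitude < HIGH:
        return "linear"
    return "fast_start"


def interpolate(kappa_s, kappa_e, pattern):
    """Apply a stored pattern between the two junction values."""
    delta = kappa_e - kappa_s  # assumed sign convention (end minus start)
    return [kappa_s + n * delta / 16 for n in PATTERNS[pattern]]


value = selector(1.0, 4.0, 6.0, 2.0)      # hypothetical areas
chosen = choose_pattern(value)
steps = interpolate(-0.6, 0.5, "slow_start")
print(chosen, [round(k, 5) for k in steps])
```

Because the selector is only compared against thresholds, it can be computed at low precision, which is the point the description makes when comparing computational cost.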

For each interpolation pattern, if the increment at step $j$ is expressed as $\Delta\kappa_i^j = (n^j/a)\,\Delta\kappa_i$ with $n^j$ an integer, and $\Delta\kappa_i/a$ is computed beforehand, then obtaining $\Delta\kappa_i^j$ only requires adding that value $n^j$ times, which makes the calculation easy. If, in addition, $a$ is a power of 2, computing $\Delta\kappa_i/a$ becomes very easy.
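With $a$ a power of 2, the division becomes a bit shift, and the multiplication by $n$ becomes repeated addition. The following sketch works on integers as a stand-in for fixed-point κ values; the shift amount, the sample value, and the function name are illustrative assumptions.

```python
A_SHIFT = 4  # a = 2**4 = 16


def step_increment(delta_kappa, n):
    """Return (n / a) * delta_kappa using one shift and n additions."""
    unit = delta_kappa >> A_SHIFT  # delta_kappa / a, rounded toward -infinity
    total = 0
    for _ in range(n):
        total += unit
    return total


delta = 352  # hypothetical fixed-point value of delta kappa
increments = [step_increment(delta, n) for n in (1, 3, 8)]
print(increments)
```

No multiplier or divider is needed for this step, only a shifter and an adder.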

Compare the computational complexity of interpolating in this way with that of actually interpolating linearly on the vocal tract cross-sectional area and converting to κ parameters. The former requires computing $-\tfrac{1}{2}(d\kappa_i/dx|_{x=0})$ once, whereas the latter requires computing $\kappa_i = (A_i - A_{i+1})/(A_i + A_{i+1})$ at every step. By equation (10), computing $-\tfrac{1}{2}(d\kappa_i/dx|_{x=0})$ takes three multiplications and one division, which looks complicated at first glance, but the result is only compared with the thresholds for selecting the interpolation pattern and so does not need much precision. In the latter, by contrast, $\kappa_i$ is an actual κ parameter, for which a precision of about 10 bits is considered necessary, so the calculation must be carried out quite precisely. The interpolation calculation is therefore considerably easier in the former than in the latter. Needless to say, the benefit of linear interpolation on the vocal tract cross-sectional area is largely retained even when this interpolation method is used.

In this embodiment the synthesis parameters are compressed. Such compression increases the amount of computation but reduces the storage capacity, so that overall a compact and inexpensive speech synthesizer is obtained. Data compression may of course be omitted. The interpolation processing may also be applied only to the low-order κ parameters, which strongly affect the quality of the synthesized speech, with plain linear interpolation used for the high-order κ parameters.

[Effects of the Invention] As described above, the present invention is a speech synthesizer in which rule synthesis means is formed by a syllable synthesis parameter storage unit that stores syllable synthesis parameters in units of CV syllables consisting of a consonant and a vowel, a rule data storage unit that stores rule data, and an arithmetic unit that computes speech synthesis data from synthesis input data such as character-sequence data, the syllable synthesis parameters, and the rule data, and that interpolates the parameters at syllable junctions, and which comprises PARCOR-type speech synthesis means that synthesizes speech from the speech synthesis data processed by the rule synthesis means. Letting the vocal tract cross-sectional areas at the junction of the preceding and following syllables be $A_i^s$, $A_{i+1}^s$ and $A_i^e$, $A_{i+1}^e$ respectively, the arithmetic unit is configured to obtain the value of

[expression not reproduced in this text]

and, from that value, to select one of the prestored interpolation patterns for the κ parameter and interpolate with it. The invention therefore yields synthesized speech as natural and intelligible as that obtained by linear interpolation on the vocal tract cross-sectional area. Moreover, because the amount of interpolation computation is reduced, the arithmetic unit can be built with an inexpensive CPU and the cost can be lowered.

[Brief description of drawings]

Fig. 1 is a block circuit diagram of an embodiment of the present invention, and Figs. 2 to 6 are diagrams explaining its operation. 1: speech synthesis parameter storage unit; 2: rule data storage unit; 3: arithmetic unit; 4: rule synthesis means; 5: speech synthesis means; 7: interpolation table.

Claims

[Claims]

1. A speech synthesizer in which rule synthesis means is formed by a syllable synthesis parameter storage unit that stores syllable synthesis parameters in units of CV syllables consisting of a consonant and a vowel, a rule data storage unit that stores rule data, and an arithmetic unit that computes speech synthesis data from synthesis input data such as character-sequence data, the syllable synthesis parameters, and the rule data, and that interpolates the parameters at syllable junctions, the speech synthesizer comprising PARCOR-type speech synthesis means that synthesizes speech from the speech synthesis data processed by the rule synthesis means, characterized in that, letting the vocal tract cross-sectional areas at the junction of the preceding and following syllables when syllables are connected be $A_i^s$, $A_{i+1}^s$ and $A_i^e$, $A_{i+1}^e$ respectively, the arithmetic unit is configured to obtain the value of [expression not reproduced in this text] and, from that value, to select an interpolation pattern for the κ parameter stored in advance in an interpolation table and interpolate with it.