JPS6346497A

JPS6346497A - Voice synthesization system

Info

Publication number: JPS6346497A
Application number: JP61191007A
Authority: JP
Inventors: 哲也酒寄; 佐々部　昭一; 博雄北川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-04-18
Filing date: 1986-08-14
Publication date: 1988-02-27
Anticipated expiration: 2013-05-13
Also published as: JP2749802B2

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Technical field]

The present invention relates to a speech synthesis method, and more particularly to a pitch pattern generation method and an amplitude pattern generation method in speech synthesis.

In speech synthesis, natural prosodic control is cited as one of the factors that determine quality. Prosody comprises elements such as phoneme duration, amplitude, and pitch frequency. Of these, pitch pattern control has conventionally been modeled using a one-point pitch or the accent type. Japanese accent is a pitch accent, so modeling pitch patterns by accent type is an effective approach, but it does not take into account the number of morae in a word or its phonological properties.

Moreover, at present it is not possible to generate a pitch pattern for an arbitrary word with these factors taken into account. As for amplitude pattern control, technical development has lagged behind that of pitch control rules, perhaps because Japanese accent is a pitch accent. The only existing methods are, for example, computing an average amplitude value for each phoneme and using it as the phoneme's intrinsic amplitude, or multiplying the change in the pitch pattern by a coefficient to obtain the change in the amplitude pattern.

[Purpose]

The present invention was made in view of the circumstances described above. Its objects are, in particular, to determine pitch patterns so that highly natural speech can be synthesized, by using multivariate statistical analysis to generate the pitch patterns of a speech synthesizer, and to generate amplitude patterns for synthesized speech so as to enhance the naturalness of prosody in rule-based speech synthesis.

[Structure]

To achieve the above objects, the present invention is characterized as follows. In a rule-based speech synthesis method that reads out parameter sequences of speech segments prepared in advance according to an input character string or phoneme sequence, connects them by concatenation rules, and assigns prosody by prosodic rules, Quantification Type I analysis (categorical multiple regression analysis) is performed using qualitative parameters that contribute to pitch pattern generation (the accent type of the word, the number of morae in the word, the phoneme sequence, and so on). This creates a prediction model that minimizes the squared error with respect to measured data; pitch values are obtained from the model and a pitch pattern is generated. Likewise, in a rule-based speech synthesis method that reads out parameter sequences of speech segments prepared in advance according to an input character string, connects them by concatenation rules, and adds prosody by prosodic rules, various parameters are processed by multivariate statistical analysis to obtain optimal control values, and an amplitude pattern is generated. The invention is described below on the basis of embodiments.

FIG. 1 is a block diagram for explaining an example of a pitch pattern generation method that is one embodiment of the present invention. In the figure, 1 is a speech analysis data section, 2 is a Quantification Type I analysis section, 3 is an input qualitative parameter section, 4 is a prediction model section, and 5 is a pitch interpolation section. To remedy the shortcomings of the prior art described above, this embodiment applies multivariate statistical analysis to the various parameters that contribute to pitch pattern generation and constructs a model that generates pitch patterns with phonological properties and the like taken into account, so that highly natural speech can be synthesized. In this embodiment, a model that predicts an objective variable from qualitative factors by Quantification Type I analysis is considered. The objective variable to be predicted, $Y_i$, is expressed as

$$Y_i = \sum_{j=1}^{R} \sum_{k=1}^{C_j} a_{jk}\,\delta(jk)$$

where $\delta(jk)$ is a dummy variable that equals 1 when sample $i$ falls into category $k$ of item $j$ and 0 otherwise, $R$ is the number of factor items, $C_j$ is the number of categories in the $j$-th item, and $a_{jk}$ is the coefficient (quantity) assigned to the category.

So that $Y_i$ predicts the sample value $y_i$ optimally, the quantities $(a_{jk})$ are determined so as to satisfy

$$\sum_{i} (y_i - Y_i)^2 \rightarrow \min,$$

with the sum taken over the samples $i = 1, 2, \ldots$
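The least-squares determination of the quantities $(a_{jk})$ described above can be sketched in Python with NumPy. This is only an illustrative sketch, not the procedure of the publication: the function names, the category labels, and the pitch values in Hz are all made up for the example. Because the dummy variables of each item sum to 1, the design matrix is rank deficient; `np.linalg.lstsq` then returns the minimum-norm solution, which still yields unique predictions.

```python
import itertools
import numpy as np

def dummy_matrix(samples, categories):
    """Build the 0/1 matrix of dummy variables delta(jk), one row per sample."""
    columns = [(j, c) for j, cats in enumerate(categories) for c in cats]
    X = np.zeros((len(samples), len(columns)))
    for i, sample in enumerate(samples):
        for col, (j, c) in enumerate(columns):
            X[i, col] = 1.0 if sample[j] == c else 0.0
    return X, columns

def fit_type1(samples, y, categories):
    """Find quantities a_jk minimizing sum_i (y_i - Y_i)^2."""
    X, columns = dummy_matrix(samples, categories)
    a, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(columns, a))

def predict(sample, quantities):
    """Y = sum over items j of the quantity of the category the sample falls in."""
    return sum(quantities[(j, c)] for j, c in enumerate(sample))

# Hypothetical factor items: accent type and mora position (labels are invented).
categories = [["flat", "head_high"], ["mora1", "mora2", "mora3"]]
samples = list(itertools.product(*categories))

# Invented, purely additive "measured" pitch values in Hz, so the fit is exact.
effect = {"flat": 0.0, "head_high": 20.0,
          "mora1": 120.0, "mora2": 150.0, "mora3": 140.0}
y = [effect[accent] + effect[mora] for accent, mora in samples]

quantities = fit_type1(samples, y, categories)
print(predict(("head_high", "mora2"), quantities))
```

With real speech data the samples would not be perfectly additive, and the fitted model would reproduce the measured values only in the least-squares sense.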

To determine the quantities $(a_{jk})$, a large number of speech samples is prepared, and the pitch value to be used as the objective variable is observed. This is the pitch value at at least one point per phoneme or syllable, such as the center of gravity or center of the phoneme, the energy center of gravity of the vowel, or a phoneme boundary. For each such pitch point, the qualitative parameters that contribute to the pitch pattern, such as the preceding and following phonemic environment, the mora position, the number of morae in the word, and the accent type, are determined (in FIG. 2, point A indicates the center of the vowel), and these are used as the factor items. The qualitative parameters at a desired pitch point are then input to the model constructed in this way to obtain the pitch (predicted value) at that point, and these points are interpolated to form the pitch pattern of the word.
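The prediction and interpolation step can be illustrated with a short Python sketch. Everything concrete in it is an assumption made for the example: the table of category quantities, the factor names, the time positions of the pitch points, and the use of linear interpolation via `np.interp` (the text only says that the points are interpolated).

```python
import numpy as np

# Hypothetical quantities a_jk in Hz, as if obtained from a fitted model.
coeffs = {
    ("accent_type", "type1"): 10.0,
    ("mora_position", 1): 150.0,
    ("mora_position", 2): 130.0,
    ("mora_position", 3): 115.0,
    ("vowel", "a"): 5.0,
    ("vowel", "i"): -3.0,
}

def predict_pitch(factors):
    """Sum the quantities of the categories that apply at one pitch point."""
    return sum(coeffs[(item, category)] for item, category in factors.items())

# One pitch point per mora: (time in seconds, qualitative parameters).
points = [
    (0.05, {"accent_type": "type1", "mora_position": 1, "vowel": "a"}),
    (0.15, {"accent_type": "type1", "mora_position": 2, "vowel": "i"}),
    (0.25, {"accent_type": "type1", "mora_position": 3, "vowel": "a"}),
]

times = np.array([t for t, _ in points])
pitches = np.array([predict_pitch(f) for _, f in points])

# Linear interpolation on a 10 ms frame grid; values outside the first and
# last pitch points are held constant by np.interp.
frame_times = np.arange(0.0, 0.30, 0.01)
contour = np.interp(frame_times, times, pitches)
print(pitches)
```

A smoother interpolation could be substituted without changing the prediction step; the linear form is used here only because it is the simplest choice.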

The pitch pattern that expresses the accent of a Japanese word can also be represented approximately by straight-line segments, as shown in FIG. 3 (point B in FIG. 3 indicates a pitch change point). The positions of these nodes and their pitch values can therefore be taken as two objective variables and modeled by Quantification Type I analysis in the same manner as above.

FIG. 4 is a block diagram for explaining an example of an amplitude pattern generation method that is another embodiment of the present invention. To generate a natural amplitude pattern, this embodiment handles simultaneously the various parameters thought to affect the amplitude pattern, builds by multivariate statistical analysis a control model that yields the optimal predicted value for the parameters as a whole, and performs control with that model. In the figure, 11a is a measured amplitude value section, 11b is a factor item section, 12 is a Quantification Type I analysis section, 13 is an input factor item section, 14 is a prediction model section, and 15 is a predicted amplitude value section. In FIG. 4, at least one point per phoneme or syllable, such as the center of gravity or center of the phoneme, the energy center of gravity of the vowel, or a phoneme boundary, is assumed to be an amplitude change point. The amplitude value at this point is taken as the external criterion; the preceding and following phonemic environment, the mora position, the number of morae, accent information, and the like are taken as factor items; and the control model is established by Quantification Type I analysis. With the model constructed in this way, the factor items at a desired amplitude change point are input to obtain the predicted amplitude value at that point, and these points are interpolated to generate the amplitude pattern. The position of the amplitude change point (see FIG. 5) and its amplitude value can also both be taken as external criteria and modeled by Quantification Type I analysis in the same manner as above.
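When both the position and the value of each change point are predicted, the two predictions have to be combined into a pattern. The Python sketch below shows one way this might be done, under stated assumptions: positions are expressed as fractions of each phoneme's duration (as the claims suggest), the durations, positions, and amplitudes are invented numbers standing in for model outputs, and the change points are joined by straight lines with `np.interp`.

```python
import numpy as np

def absolute_times(durations, rel_positions):
    """Convert per-phoneme relative positions (0..1) to absolute times in seconds."""
    durations = np.asarray(durations, dtype=float)
    starts = np.concatenate(([0.0], np.cumsum(durations)[:-1]))
    return starts + np.asarray(rel_positions, dtype=float) * durations

def amplitude_pattern(durations, rel_positions, amplitudes, frame_period=0.005):
    """Join the predicted change points by straight lines on a frame grid."""
    times = absolute_times(durations, rel_positions)
    frames = np.arange(0.0, float(np.sum(durations)), frame_period)
    return frames, np.interp(frames, times, amplitudes)

# Invented stand-ins for the two model outputs, one change point per phoneme.
durations = [0.10, 0.20, 0.10]       # phoneme durations in seconds
rel_positions = [0.5, 0.5, 0.5]      # predicted position within each phoneme
amplitudes = [0.2, 1.0, 0.4]         # predicted amplitude at each change point

times = absolute_times(durations, rel_positions)
frames, envelope = amplitude_pattern(durations, rel_positions, amplitudes)
print(times)
```

The same conversion applies unchanged to pitch change points; only the predicted values differ.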
la is an actual amplitude value section, 11b is a factor item section, 12 is a quantification type analysis section, 13 is an input factor item section, 14 is a prediction 41 remodeling section, and 15 is a predicted amplitude value section. In Figure 4, at least one point for a phoneme or syllable, such as the center of gravity of a phoneme, the energy center of a vowel, or a phoneme boundary, is assumed to be an amplitude change point, and the amplitude value at this point is taken as an external reference, and the phoneme environment around 1 is calculated. A control model is set by performing quantitative type I analysis using factors such as mora position, number of moras, and accent information. With the model configured in this way, it is possible to obtain the predicted amplitude value at that point by inputting the factor item at the point of amplitude change to be obtained, and it is possible to generate an amplitude pattern by interpolating the points of Niku et al. It becomes possible. Furthermore, it is also possible to take the position of the amplitude change point (fifth measurement point) and its amplitude value as external standards, and model it in the same manner as above by quantification 1m analysis.

[Effects]

As is clear from the above description, according to the present invention, a model for the qualitative parameters is constructed using a statistical analysis method (Quantification Type I analysis), making it possible to generate pitch patterns that yield highly natural synthesized speech. Likewise, a model for the qualitative parameters is constructed using a multivariate statistical analysis method (Quantification Type I analysis), making it possible to generate amplitude patterns that yield highly natural synthesized speech.

[Brief explanation of drawings]

FIG. 1 is a block diagram for explaining an example of a pitch pattern generation method that is one embodiment of the present invention; FIG. 2 shows the relationship between pitch frequency and the center of gravity of a vowel; FIG. 3 shows the relationship between pitch frequency and pitch change points; FIG. 4 is a block diagram for explaining an example of an amplitude pattern generation method that is another embodiment of the present invention; and FIG. 5 shows amplitude change points.

1: speech analysis data section; 2: Quantification Type I analysis section; 3: input qualitative parameter section; 4: prediction model section; 5: pitch value section; 11a: measured amplitude value section; 11b: factor item; 12: Quantification Type I analysis section; 13: input factor item section; 14: prediction model section; 15: predicted amplitude value section.

Patent applicant: Ricoh Co., Ltd.

Claims

[Claims]

(1) A speech synthesis method, in a rule-based speech synthesis method that reads out parameter sequences of speech segments prepared in advance according to an input character string or phoneme sequence, connects them by concatenation rules, and assigns prosody by prosodic rules, characterized in that Quantification Type I analysis is performed using qualitative parameters that contribute to pitch pattern generation, thereby creating a prediction model that minimizes the squared error with respect to measured data, and pitch values are obtained from this model to generate a pitch pattern.

(2) The speech synthesis method according to claim (1), characterized in that Quantification Type I analysis is performed using the pitch value at at least one point per phoneme or syllable as the objective variable.

(3) The speech synthesis method according to claim (1), characterized in that Quantification Type I analysis is performed using the position of a pitch pattern change point and its pitch value as objective variables.

(4) The speech synthesis method according to claim (3), wherein the position of a pitch pattern change point is expressed as a value relative to the normalized duration length of each phoneme or syllable.

(5) A speech synthesis method, in a rule-based speech synthesis method that reads out parameter sequences of speech segments prepared in advance according to an input character string, connects them by concatenation rules, and adds prosody by prosodic rules, characterized in that various parameters are processed by multivariate statistical analysis to obtain optimal control values and an amplitude pattern is generated.

(6) The speech synthesis method according to claim (5), characterized in that Quantification Type I analysis is performed using qualitative parameters considered to influence the amplitude pattern to be generated (accent information for each utterance, number of morae, phoneme sequence, strength information, and so on), thereby creating a prediction model that minimizes the squared error with respect to measured values, and the amplitude pattern is generated from this model.

(7) The speech synthesis method according to claim (6), characterized in that Quantification Type I analysis is performed using the amplitude value at at least one point per phoneme or syllable as the objective variable.

(8) The speech synthesis method according to claim (6), characterized in that Quantification Type I analysis is performed using the position of a change point of the amplitude pattern and its amplitude value as objective variables.

(9) The speech synthesis method according to claim (8), wherein the position of the change point of the amplitude pattern is expressed as a value relative to the normalized duration length of each phoneme or syllable.