JP2560277B2

JP2560277B2 - Speech synthesis method

Info

Publication number: JP2560277B2
Application number: JP60235724A
Authority: JP
Inventors: 順子栗林
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1985-10-21
Filing date: 1985-10-21
Publication date: 1996-12-04
Anticipated expiration: 2011-12-04
Also published as: JPS6294900A

Description

[Detailed Description of the Invention] [Industrial Field of Application] The present invention relates to a speech synthesis method.

[Prior Art] Speech synthesis methods are broadly divided into parameter methods and waveform coding methods.

Typical examples of the former are the formant, PARCOR, and LPC methods. Methods that use such vocal tract parameters have the advantage of a high data compression rate, but the sound quality varies greatly with the conditions of the analysis process, and the synthesized sound lacks naturalness.

The latter include the DM, ADM, and ADPCM methods. These methods yield good synthesized sound quality, but compared with the parameter methods they need a large amount of data to synthesize one second of speech. For this reason, they are combined with the waveform segment synthesis method to compress the amount of data.

A speech signal is a complex waveform containing various frequency components, but in its vowel portions a similar waveform appears repeatedly at every fundamental period. The waveform segment synthesis method exploits this property: one of the repeatedly appearing similar waveforms (hereinafter, a segment) is extracted as a representative (hereinafter, the representative segment), and the original run of similar waveforms is reproduced by repeating it. This makes it unnecessary to store the data of the entire waveform, so the amount of data can be reduced.
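The repetition scheme described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `synthesize_plain`, the `(samples, repeat)` data layout, and the sample values are all assumptions made for the example.

```python
# Hypothetical data layout: each entry pairs a representative segment
# (one fundamental period of samples) with its repetition count.
def synthesize_plain(segments):
    """Concatenate each representative segment `repeat` times, unchanged."""
    out = []
    for samples, repeat in segments:
        for _ in range(repeat):
            out.extend(samples)
    return out


# Made-up sample values, not taken from the patent figures.
seg_a = [0, 8, 4, -3, -5, -1]
seg_b = [0, 4, 2, -2, -3, 0]

# Segment a repeated three times, segment b twice (the layout of FIG. 2).
wave = synthesize_plain([(seg_a, 3), (seg_b, 2)])

# Only the two representative segments need to be stored.
stored = len(seg_a) + len(seg_b)
print(len(wave), stored)
```

In this toy case 12 stored samples reproduce a 30-sample waveform, which is the data reduction the paragraph refers to.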

In an actual speech signal, the envelope of the amplitude over successive fundamental periods changes continuously. In the waveform segment synthesis method, however, the representative segment is used repeatedly, so the envelope stays constant over the repeated portion. This is shown in FIG. 2.

FIG. 2 is a waveform diagram in which the representative segment shown in range a (hereinafter, representative segment a) is repeated three times and the representative segment shown in range b (hereinafter, representative segment b) is repeated twice. The envelope does not change during the repetitions of representative segment a or of representative segment b.

For this reason, a method has been used in which an envelope change value ΔE, which indicates how much the amplitude increases or decreases relative to the preceding repetition, is held separately from the representative segment data and is applied to the positive-side and negative-side waveform data when the representative segment is repeated.

The envelope change value ΔE is applied by multiplying the amplitude values of the representative segment by envelope data E, which takes ΔE as a parameter.

That is, for the first repetition, the amplitude values of the representative segment are always multiplied by E = 1 as the envelope data, so the amplitude values are unchanged. For the second repetition, they are multiplied by E = 1 + ΔE. Likewise, for the third and subsequent repetitions, they are multiplied by E = 1 + 2ΔE (third waveform), E = 1 + 3ΔE (fourth waveform), and so on.
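The multiplication rule just described, E = 1 + (k − 1)ΔE for the k-th repetition, can be written out as follows. This is a sketch under stated assumptions: the names `envelope_gain` and `repeat_with_envelope` and the sample values are hypothetical.

```python
def envelope_gain(k, delta_e):
    """Envelope data E for the k-th repetition (k = 1 is the first)."""
    return 1 + (k - 1) * delta_e


def repeat_with_envelope(samples, repeat, delta_e):
    """Repeat a representative segment, scaling every sample by E."""
    out = []
    for k in range(1, repeat + 1):
        e = envelope_gain(k, delta_e)
        # The same E is used for positive and negative samples
        # (the conventional, single-value scheme).
        out.extend(s * e for s in samples)
    return out


# Made-up two-sample segment, repeated three times with delta_e = 0.25.
wave = repeat_with_envelope([4, -2], 3, 0.25)
print(wave)
```

With ΔE = 0.25 the three repetitions are scaled by 1, 1.25, and 1.5, so both the positive and the negative sample grow by the same factor.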

FIG. 3 shows the waveform of FIG. 2 with envelope change values applied: an envelope change value ΔEa is applied when representative segment a is repeated three times, and an envelope change value ΔEb is applied when representative segment b is repeated twice.

One way to calculate the envelope change value is to derive it from the difference between the maximum amplitude values of a representative segment and the next representative segment. Equation (1) gives the envelope change value ΔE derived in this way.

The symbols in Equation (1) are:
RPn: number of repetitions of the representative segment
Mn: maximum amplitude value (absolute value) of the representative segment
Mn+1: maximum amplitude value (absolute value) of the next representative segment

Conventionally, the waveform segment synthesis method has been used mainly for signals such as sound effects and melodies, which consist of sine waves, cosine waves, rectangular waves, and the like and whose positive and negative amplitudes are nearly symmetric. A single envelope change value ΔE was therefore given to each representative segment.
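Equation (1) itself does not survive in this text. One form that agrees with the symbol definitions, and with the description of dividing a rate of change by the repetition count, is ΔE = (Mn+1 − Mn) / (Mn · RPn). The sketch below assumes that form; it is a reconstruction for illustration, not the patent's confirmed equation, and the function name and numbers are hypothetical.

```python
def delta_e_from_peaks(m_n, m_next, rp_n):
    """Assumed form of Equation (1): relative change of the maximum
    amplitude from one representative segment to the next, divided by
    the number of repetitions RPn. Requires m_n != 0."""
    return (m_next - m_n) / (m_n * rp_n)


# Made-up values: the peak falls from 8 to 4 over 4 repetitions.
delta_e = delta_e_from_peaks(8, 4, 4)

# Extrapolating E = 1 + k * delta_e one step past the last repetition
# (k = RPn) lands on the peak of the next representative segment.
boundary_peak = 8 * (1 + 4 * delta_e)
print(delta_e, boundary_peak)
```

Under this assumption the scaled peak approaches the next segment's peak in equal steps, which is the continuity the envelope change value is meant to provide.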

[Problems to be solved by the invention]

In the conventional waveform segment synthesis method described above, one envelope change value ΔE is given to each representative segment. When the positive and negative amplitudes of a representative segment are asymmetric, as in a speech signal, applying a single envelope change value ΔE to both the positive-side and negative-side waveform data produces an abrupt change in amplitude between the last repetition of one segment and the first repetition of the next, and this distortion causes a rough, harsh sound.

That is, as shown in FIG. 3, when the envelope change value ΔE of representative segment a is applied to both the positive-side and negative-side waveform data, an abrupt change in amplitude occurs between the third repetition of representative segment a and the first repetition of the next representative segment b, producing a harsh sound.
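A small numeric experiment makes the problem visible. It assumes that the single ΔE is derived from the maximum absolute amplitudes as (Mn+1 − Mn) / (Mn · RPn), which is a reconstruction rather than a formula quoted from the patent, and it uses made-up asymmetric segments and four repetitions instead of the three shown in FIG. 3.

```python
def peaks_per_repetition(samples, repeat, delta_e):
    """(positive peak, negative peak) of each repetition when one
    shared envelope change value scales both sides."""
    result = []
    for k in range(repeat):            # k = 0 is the first repetition
        e = 1 + k * delta_e
        result.append((max(samples) * e, min(samples) * e))
    return result


# Made-up asymmetric segments: a peaks at +8 / -4, b at +4 / -4.
seg_a, rp_a = [0, 8, 3, -4, -2], 4
seg_b = [0, 4, 2, -4, -1]

m_a = max(abs(s) for s in seg_a)       # 8, set by the positive side
m_b = max(abs(s) for s in seg_b)       # 4
delta_e = (m_b - m_a) / (m_a * rp_a)   # assumed form of Equation (1)

track = peaks_per_repetition(seg_a, rp_a, delta_e)

# Step from the last repetition of a to the first repetition of b.
pos_jump = max(seg_b) - track[-1][0]
neg_jump = min(seg_b) - track[-1][1]
print(track, pos_jump, neg_jump)
```

The positive peak moves 8, 7, 6, 5 and then reaches 4, continuing its trend. The negative peak moves −4, −3.5, −3, −2.5 and then jumps back to −4, reversing its trend at the segment boundary, which corresponds to the abrupt change described above.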

An object of the present invention is to provide a speech synthesis method that yields speech with little distortion.

[Means for solving problems]

The speech synthesis method of the present invention uses a waveform segment synthesis method in which envelope information concerning the increase or decrease of the amplitude values of waveform data is applied to a plurality of representative segments, each consisting of a repeated similar waveform. The rate of change between the positive-side (and negative-side) maximum value of a first representative segment of the input speech and the positive-side (and negative-side) maximum value of the following second representative segment is obtained. This rate of change divided by the number of repetitions of the first representative segment is used as the envelope information, and at synthesis time the envelope information so obtained is applied separately to the positive-side waveform data and the negative-side waveform data.

[Embodiment]

Next, an embodiment of the present invention will be described with reference to the drawings.

Equations (2) and (3) below are examples of equations for obtaining the envelope change value ΔEPn applied to the positive-side waveform data and the envelope change value ΔEMn applied to the negative-side waveform data, respectively.

The symbols in Equations (2) and (3) are:
MAXn: positive-side maximum value of the representative segment data
MAXn+1: positive-side maximum value of the next representative segment data
MINn: negative-side minimum value of the representative segment data
MINn+1: negative-side minimum value of the next representative segment data
RPn: number of repetitions

In this case, the parameters supplied at synthesis time are the representative segment data, the number of repetitions, and ΔEPn and ΔEMn as the envelope data.
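Equations (2) and (3) do not survive in this text. Following the summary of the invention (the rate of change between successive positive-side or negative-side peaks, divided by the repetition count), one plausible form is ΔEPn = (MAXn+1 − MAXn) / (MAXn · RPn) and ΔEMn = (MINn+1 − MINn) / (MINn · RPn). The sketch assumes exactly that; the function names and the numbers are hypothetical.

```python
def delta_ep(max_n, max_next, rp_n):
    """Assumed form of Equation (2): positive-side envelope change value.
    Requires max_n != 0."""
    return (max_next - max_n) / (max_n * rp_n)


def delta_em(min_n, min_next, rp_n):
    """Assumed form of Equation (3): negative-side envelope change value.
    Requires min_n != 0."""
    return (min_next - min_n) / (min_n * rp_n)


# Made-up peaks: the positive side shrinks from 8 to 4 while the
# negative side grows from -4 to -6, over RPn = 4 repetitions.
d_ep = delta_ep(8, 4, 4)
d_em = delta_em(-4, -6, 4)

# Each side, extrapolated to the segment boundary, meets the next
# representative segment's own peak.
pos_boundary = 8 * (1 + 4 * d_ep)
neg_boundary = -4 * (1 + 4 * d_em)
print(d_ep, d_em, pos_boundary, neg_boundary)
```

Here the two sides get change values of opposite sign (−0.125 and +0.125), which a single shared ΔE cannot express.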

FIG. 1 is a synthesized waveform diagram for explaining an embodiment of the present invention; the vertical axis shows amplitude and the horizontal axis shows time.

The waveform of FIG. 1 is obtained from the waveform of FIG. 2 by applying envelope change values independently to the positive-side and negative-side waveform data.

That is, representative segments a and b are given envelope change values ΔEPa and ΔEPb for the positive-side waveform data and ΔEMa and ΔEMb for the negative-side waveform data, and these values are applied when the representative segments are repeated.

That is, on the first repetition of representative segment a, its amplitude values are multiplied by 1. On the second repetition, each amplitude value of representative segment a is multiplied by 1 + ΔEPn if it is positive and by 1 + ΔEMn if it is negative. Likewise, on the third repetition, it is multiplied by 1 + 2ΔEPn if it is positive and by 1 + 2ΔEMn if it is negative.
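The sign-dependent multiplication described above might look like the following. This is a minimal sketch: `repeat_with_split_envelope` is a hypothetical name, the segment and the two change values are made up, and zero-valued samples are grouped with the positive side only because scaling zero has no effect.

```python
def repeat_with_split_envelope(samples, repeat, d_ep, d_em):
    """Repeat a representative segment, scaling positive samples by
    1 + k*d_ep and negative samples by 1 + k*d_em on repetition k."""
    out = []
    for k in range(repeat):            # k = 0 is the first repetition
        gain_pos = 1 + k * d_ep
        gain_neg = 1 + k * d_em
        for s in samples:
            # Zero samples are unaffected by either gain.
            out.append(s * (gain_pos if s >= 0 else gain_neg))
    return out


# Made-up segment and change values: the positive side decays while
# the negative side grows.
seg_a = [0, 8, -4]
wave = repeat_with_split_envelope(seg_a, 3, -0.125, 0.125)
print(wave)
```

Over three repetitions the positive peak goes 8, 7, 6 while the negative peak goes −4, −4.5, −5, so each side follows its own envelope.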

By applying separate envelope change values to the positive-side and negative-side waveform data in this way, the change in amplitude between different representative segments is reduced, and speech with little distortion can be synthesized.

[Effect of the invention]

As described above, according to the present invention, in the waveform segment synthesis method, the envelope change values to be applied to the positive-side and negative-side waveform data are held as envelope data, and when a representative segment is repeated the envelope data are applied independently to the positive-side and negative-side waveform data. The envelope can thus be varied continuously across repetitions of the representative segment, and speech with little distortion is obtained.

[Brief description of drawings]

FIG. 1 is a synthesized waveform diagram for explaining an embodiment of the present invention.
FIG. 2 is a synthesized waveform diagram for explaining the waveform segment synthesis method.
FIG. 3 is a synthesized waveform diagram for explaining the conventional speech synthesis method.

Claims

(57) [Claims]

1. A speech synthesis method using a waveform segment synthesis method in which envelope information concerning the increase or decrease of the amplitude values of waveform data is applied to a plurality of representative segments each consisting of a repeated similar waveform, characterized in that the rate of change between the positive-side (and negative-side) maximum value of a first representative segment of input speech and the positive-side (and negative-side) maximum value of the following second representative segment is obtained, the value obtained by dividing this rate of change by the number of repetitions of the first representative segment is used as the envelope information, and at speech synthesis time the envelope information so obtained is applied separately to the positive-side waveform data and the negative-side waveform data.