JPS61112199A - Voice synthesization by carrier modulation - Google Patents

Voice synthesization by carrier modulation

Info

Publication number
JPS61112199A
JPS61112199A JP59158900A JP15890084A JPS61112199A JP S61112199 A JPS61112199 A JP S61112199A JP 59158900 A JP59158900 A JP 59158900A JP 15890084 A JP15890084 A JP 15890084A JP S61112199 A JPS61112199 A JP S61112199A
Authority
JP
Japan
Prior art keywords
frequency
vowel
wave
sound
pitch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP59158900A
Other languages
Japanese (ja)
Inventor
若林 昭夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date


Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] This invention was made possible by the applicant's discovery that, just as light has a color specific to its frequency, a sound wave has a sound specific to its frequency, and that this sound is the base sound of a vowel. It differs in essence from conventional methods that aim to reconstruct a waveform or spectrum, such as segment synthesis, linear prediction, and sine-wave superposition, and is an innovative method capable of synthesizing vowels whose sound quality is very close to natural.

The present invention can be implemented in either hardware or software.

The principle and embodiments are explained below. Listening to the output of a low-frequency oscillator, one hears a sound like "boo" at about 300 Hz, "paa" at about 1 kHz, and "pee" at 2 kHz and above. These are heard as plosives because the tone is modulated by a step function that rises at the moment it reaches the ear; if the rise is made sufficiently gradual, the tone becomes a vowel. As a trial, applying 100% modulation with a cosine wave of f = 100 Hz makes the tone sound like a vowel. The resulting modulated waveform is shown in Fig. 1.

It is expressed as

$$v(t) = \bigl(1 + \cos(2\pi f t)\bigr)\,\sin(2\pi n f t)$$

and when n is chosen as a prime number of 3 or more, sounds corresponding to English vowels are obtained. That frequencies sharing no harmonic relationship other than the fundamental give different sounds can be predicted by considering resonance. Of course, the frequency bands on either side of these values give the same sounds; the boundaries are not sharp, and one vowel passes into the next through intermediate vowels. Also, if the fundamental is 200 Hz, the 2nd, 3rd, and 4th harmonics give u, o, and α, so the multiple is not necessarily prime. In other words, a vowel is not a waveform of the sound wave but a sound specific to a frequency. Just as we have the sensation or concept of color for the frequency of light, we have the auditory sensation or concept of vowel for the frequency of sound waves. The vowels can be assigned to the values of n as shown below, and a blend corresponding to the seven colors of light is also possible, though not simple.
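The modulation formula above can be tried numerically. The following is a minimal sketch, not the applicant's implementation: it evaluates v(t) = (1 + cos(2πft)) sin(2πnft) over one period of f. The function name `modulated_tone` and the 40 kHz sample rate are assumptions made for illustration.

```python
import math

def modulated_tone(n, f=100.0, sample_rate=40000, periods=1):
    """Return samples of v(t) = (1 + cos(2*pi*f*t)) * sin(2*pi*n*f*t)."""
    count = int(round(sample_rate * periods / f))
    samples = []
    for k in range(count):
        t = k / sample_rate
        # 100% modulation: the envelope swings between 0 and 2.
        envelope = 1.0 + math.cos(2.0 * math.pi * f * t)
        samples.append(envelope * math.sin(2.0 * math.pi * n * f * t))
    return samples

# n = 7 is the value the text assigns to the vowel written α.
wave = modulated_tone(n=7)
```

The envelope reaches zero halfway through each period of f, which is what "100% modulation" means here.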

n        3    5    7    11   13   17   19   23
vowel    u    o    α    ʌ    ə    æ    e    i
colors: red, orange, yellow, green, blue, indigo, violet

The present invention is based on the applicant's discovery of these new facts and is entirely novel, unlike conventional methods.

These vowels, being sounds specific to their frequencies, are not practical when n is 11 or more, for two reasons. First, the frequency is too high. Second, the energy, which is proportional to the square of the frequency and the square of the amplitude, becomes large, so the pitch-frequency component of vocal-fold vibration, which indicates the loudness of the voice, becomes relatively weak and the voice sounds quiet. The present invention is a method for improving these sounds into vowels close to the human voice, and is described below by way of embodiments.

The former problem is addressed by modulating with a sawtooth wave of the pitch period, which lowers the apparent frequency heard by the ear.

The latter requires reducing the amplitude, and for that the amplitude of the sawtooth wave must also be reduced; but this reduces the pitch component. The reduction is prevented by using for the modulation the vowel component with a constant value added, $1 + a\sin(2\pi n f t)$, $0 < a \le 1$. A constant value is likewise added to the sawtooth wave so that it has positive amplitude only, which avoids generating a second-harmonic component.

In the above the vowel frequency was treated as the carrier, but in a case like this either wave may be regarded as the carrier; following the idea that the carrier is whatever bears the information, it is reasonable to regard the sawtooth wave as the carrier. The frequency relationship is the reverse of that in radio. In radio the carrier is given very large energy and the information itself has little, whereas lowering the carrier frequency makes the carrier energy small and the energy of the information itself large, which is more efficient. Living things are thought to adopt efficient methods, which is why the sawtooth wave is regarded as the carrier.

The waveform of the resulting vowel ʌ is shown in Fig. 2. One pitch period of 10 ms is made up of 400 bytes of data at one byte per sample, and this is output repeatedly for as long as required. As the figure shows, modulation of this kind differs markedly from superposition: the amplitude of the modulating-wave component decreases together with the amplitude of the carrier. This agrees with the conjecture that, if a vowel is a harmonic of the fundamental frequency of vocal-fold vibration, its amplitude should be proportional to the displacement of the fundamental vibration. These vowels sound like a whistle and, particularly for n = 11 and above, are not yet of good quality. Before describing the improvement, the reason for using a sawtooth wave as the carrier is explained in detail.
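A minimal sketch of the stored pitch period described above: a positive sawtooth carrier multiplied by 1 + a·sin(2πnft), held as 400 one-byte samples per 10 ms and repeated for the required time. The text does not give the sawtooth's direction or the byte scaling, so the falling ramp and the linear 0..255 mapping are assumptions, as are the function names.

```python
import math

SAMPLES_PER_PITCH = 400  # from the text: one 10 ms pitch period = 400 one-byte values

def pitch_period_bytes(n, a=1.0):
    """One pitch period of a positive sawtooth times 1 + a*sin(2*pi*n*phase)."""
    if not 0.0 < a <= 1.0:
        raise ValueError("a must satisfy 0 < a <= 1")
    raw = []
    for k in range(SAMPLES_PER_PITCH):
        phase = k / SAMPLES_PER_PITCH       # position within the pitch period, 0..1
        carrier = 1.0 - phase               # assumed shape: falling ramp, never negative
        vowel = 1.0 + a * math.sin(2.0 * math.pi * n * phase)
        raw.append(carrier * vowel)         # product stays within 0..2
    # Assumed quantization: map 0..2 linearly onto the byte range 0..255.
    return bytes(min(255, int(round(x * 255.0 / 2.0))) for x in raw)

def repeat_pitch(period, duration_ms):
    """Repeat one 10 ms pitch period to fill the requested duration."""
    return period * max(1, duration_ms // 10)

vowel = repeat_pitch(pitch_period_bytes(n=11), duration_ms=200)
```

Because the carrier multiplies the vowel term, the vowel-frequency ripple shrinks as the ramp falls, which is the behavior the text contrasts with simple superposition.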

Taking the fundamental frequency of vocal-fold vibration to be 100 Hz, harmonics up to about the 30th are thought to be present; consider their sum. If these harmonics are all sine waves of equal amplitude and zero phase, their sum approaches the derivative of an impulse repeating at the fundamental period. Fig. 3(a) shows the sum up to the 10th harmonic divided by 10; this is the vowel "a" (ア). In this case it is a mixture of the vowel of the highest frequency and the vowel of half that frequency. For a sum of equal amplitudes to become white noise, the phases must be random. With a phase of π/2 the sum is an impulse train, and with intermediate phases it takes an intermediate waveform. When continuous frequencies are present, as in light, components whose phases coincide are thought to form an impulse train and give rise to particle-like behavior.

Such a vowel can be produced experimentally, but it is not feasible in terms of energy for all harmonics of vocal-fold vibration to have the same amplitude. Taking instead constant energy, that is, zero-phase sine waves in which the amplitude of the nth harmonic is 1/n that of the fundamental, the sum is a sawtooth wave. This is a good-quality vowel "u" (ウ), but when it is modulated with another frequency it becomes the vowel of that frequency and simply serves to lower the apparent frequency of the vowel. This is the reason for using a sawtooth wave as the carrier. In practice, adding harmonics up to about the fifth is a sufficient substitute for the sawtooth wave; Fig. 3(b) shows this waveform.

Supposing that vocal-fold vibration produces a sawtooth wave and that resonance of the vocal tract reinforces particular frequency components gives the vocal-tract theory. The sawtooth wave, however, contains those components at uniform strength, so the result would be a superposition, and the variation in amplitude seen in Fig. 2 would not appear. A modulating action must be involved. It is physically reasonable to suppose that the harmonic vibration of the vocal folds themselves is proportional to the instantaneous displacement of the fundamental vibration, and this is the modulating action of the vocal folds. In that case the sum of the harmonics is a rounded sawtooth wave.

For these reasons, one feature of this method is that it synthesizes speech using a sawtooth or similar wave of the pitch period as the carrier, with a resonance-like modulating action by the vowel frequency. Because phonation occurs on exhalation only, a constant value is added to the carrier to make the waveform positive. In view of the phase characteristic of resonance, the resonant-frequency component should be delayed in phase by π/2.
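The claim that zero-phase harmonics with 1/n amplitudes sum to a sawtooth can be checked against the standard Fourier result Σ sin(2πkx)/k = π(1/2 − x) for 0 < x < 1. The sketch below also builds the five-harmonic substitute mentioned in the text. The helper names and the constant offset chosen to keep the carrier non-negative are assumptions; the text only says a constant value is added.

```python
import math

def harmonic_sum(x, harmonics, weight):
    """Sum of weight(k) * sin(2*pi*k*x) for k = 1..harmonics; x is in pitch periods."""
    return sum(weight(k) * math.sin(2.0 * math.pi * k * x)
               for k in range(1, harmonics + 1))

def sawtooth_limit(x):
    """Closed form of the infinite 1/k series for 0 < x < 1: pi * (1/2 - x)."""
    return math.pi * (0.5 - x)

def positive_carrier(x, harmonics=5):
    """Five-harmonic approximation shifted by a constant so it never goes negative."""
    # Assumed offset: the sum of the weights bounds the magnitude of the harmonic sum.
    offset = sum(1.0 / k for k in range(1, harmonics + 1))
    return offset + harmonic_sum(x, harmonics, lambda k: 1.0 / k)

# Compare a long partial sum with the closed form at a quarter of the period.
approx = harmonic_sum(0.25, 2001, lambda k: 1.0 / k)
exact = sawtooth_limit(0.25)
```

Passing `lambda k: 1.0` as the weight gives the equal-amplitude sum discussed in the preceding paragraph, so the two cases can be compared with the same function.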

Next, the method of improving the sound quality of the synthesized vowels is described. The method is very simple: vowels are mixed in the combinations and proportions shown in the table below. The resulting vowel patterns are shown in Fig. 4. The proportions are determined as follows. Taking the energy of α as 1, the amplitude of each base vowel is set to fα/f so that its energy is approximately equal to this. When α, o, or u is mixed at this proportion into the high-frequency vowels from ʌ upward, the energy doubles, so both are multiplied by 0.7 to compensate.
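The amplitude rule can be worked through as arithmetic. Since the text takes energy as proportional to frequency squared times amplitude squared, an amplitude of fα/f gives every vowel the same energy as α. The sketch below assumes α sits at n = 7 and that the 0.7 factor applies to vowels with n of 11 or more; both follow one reading of the text. The values it prints agree only roughly with the rounded figures in the table.

```python
ALPHA_N = 7        # harmonic number the text assigns to the vowel α
ENERGY_FIX = 0.7   # factor from the text; close to 1/sqrt(2), which halves a doubled energy

def base_amplitude(n, mixed):
    """Amplitude f_alpha / f = 7 / n, scaled by 0.7 when another vowel is mixed in."""
    amplitude = ALPHA_N / n
    return amplitude * ENERGY_FIX if mixed else amplitude

for n in (3, 5, 7, 11, 13, 17, 19, 23):
    # Assumed rule: vowels from n = 11 upward receive an admixture.
    print(n, round(base_amplitude(n, mixed=n >= 11), 2))
```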

Vowel         u     o     α     ʌ     ə     æ     e     i
Base vowel    2     1.4   1     .5    .4    .3    .2    .2
Admixture (column alignment lost in the source): ʌ = .3, æ = .1, none, α = .7, o = 1, u = 1.6

What is mixed into u and o is the third harmonic, added in antiphase in an amount of about [illegible in source] in terms of the energy of α.

This applies a distortion that flattens the peaks, which improves the sound quality and raises the frequency. The mixing proportions need not be exact and can vary considerably according to preference, but the phase is important: the carrier is modulated by the mixed wave

$$1 - a\cos(2\pi n f t) + b\cos(2\pi m f t), \qquad n < m,\quad a, b > 0.$$
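A short sketch of the mixed modulating wave applied to a carrier, to show where the sign convention (the phase the text calls important) enters. The values of a and b, the falling-ramp carrier, and the function names are illustrative assumptions, not figures from the patent.

```python
import math

def mixed_modulator(phase, n, m, a, b):
    """1 - a*cos(2*pi*n*phase) + b*cos(2*pi*m*phase); phase is in pitch periods."""
    if not (n < m and a > 0 and b > 0):
        raise ValueError("requires n < m and a, b > 0")
    return (1.0
            - a * math.cos(2.0 * math.pi * n * phase)
            + b * math.cos(2.0 * math.pi * m * phase))

def modulated_period(n, m, a, b, samples=400):
    """One pitch period of an assumed falling-ramp carrier times the mixed modulator."""
    out = []
    for k in range(samples):
        phase = k / samples
        carrier = 1.0 - phase  # assumed positive sawtooth shape
        out.append(carrier * mixed_modulator(phase, n, m, a, b))
    return out

# Example: u (n = 3) with its third harmonic (m = 9) in the opposite sign.
period = modulated_period(n=3, m=9, a=0.5, b=0.15)
```

With m = 3n the second term opposes the first at the extremes of the lower-frequency cosine, which flattens its peaks as the text describes.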

Also, mixing o into a vowel with n = 17 or above gives the vowel e, while mixing in u gives the vowel i; the admixture also serves to lower the frequency of these high-frequency vowels. As a result the vowel i can be regarded as intermediate between u and e, and all the vowels can be arranged in a ring. The same is done with color: mixing blue with long-wavelength red gives purple, which ought to have a shorter wavelength than blue, yet purple is treated as an intermediate color and a circular color-mixing chart is drawn.

By outputting each pattern in Fig. 4 repeatedly for the required time, a vowel of the required length and of very good quality can be produced. When these are combined to utter a word, the pause between vowels must be made long enough for good intelligibility; if it is too short the result sounds like the chanting of a sutra. Coarticulation arises from this. Even in rapid speech, intelligibility is not lost provided the voiced time is shortened and the pause is kept long enough.

Vowels uttered by a human correspond to handwritten characters: the fluctuations in pitch period and formant frequency are variations that arise because the speaker is human. The vowels synthesized here do not have these fluctuations, but their waveform and sound quality are very close to those of the human voice.

As described above, the present invention is based on an entirely original idea and is markedly superior to conventional methods in its simplicity, its practicality, and the characteristics of the synthesized speech.

[Brief Description of the Drawings]

第1図は母音周波数の正弦波を100 Hzの余弦波で
100%変調した波形。 第2図は鋸歯状波を母音周波で変調した波形。 第3図は高調波の総和の波形。 第4図は本法による合成母音の波形。 特許化原人  若 林 昭 大 第1図(Δ) 第 2 図  (八) 第3図(b) 第4rII 手続補正書 昭和60年10月31日 2 発明の名称 ハンソウハ ヘンチ四つ     オン七イコ°つ七イ
ホウ搬送波変調による音声合成法 3 補正をする者 事件との関係    特許出願人 カ ナカ”ワケンヒラツカシタ“イカンチロウ住所  
神奈川県平塚市代官町21−17〒254   置 0
463−23−5422ワカ バヤシ アキ  才 6 添付書類の目録 (1)  図   面               
1通補正の内容 1 特許請求の範囲を下記に補正する。 光が周波数個有の色を持つように音波も周波数個有の音
を持ち、しかも母音の原音であることに着目し、その音
質を改良することによる音声合成法で、ピッチ周期の鋸
歯状波または類似波を搬送波とし9M母音周波数の正弦
波または余弦波、あ2 発明の詳細な説明を下記に補正
する。 イ)  2頁19行の10字目上り3頁1行の9字目迄
の下記に示す44字を削除する。 また基本波が200 Hzならば第2.3.4高調波が
U、O,aで必ずしも素数倍とはならない。 口) 9頁1行より17行迄を下記と入れ替える。 第4図の各パターンを必要な時間だけ繰り返し出力すれ
ば必要な長さの極めて良質の母音を発声できる。尚、こ
れらを組み合せて単語を発声する場合、明瞭度を良くす
るには母音間の休止区間を十分長くする必要があり、こ
れが短かすぎるとお経のようになってしまう、調音結合
はこれにより生じる。これを防ぎ休止区間を短かくする
ためには、母音毎にスタートマークとストップマークが
なければならない、これは鋸歯状搬送波の振幅を音声区
間を1.母音間の休止区間をOとした矩形波の一次遅れ
の包絡線を形成するように増減すればよい、スタートマ
ークの先頭のスタートピッチは休止区間と区別でき、か
つ十分小さい振幅でなければならない、尚、前記調合表
の各母音の振幅は単に割合を決めるためのものであり、
調合後の振幅をはシ一定にそろえたものを各母音の基準
振幅とする。 次に子音の合成について述べる。従来、子音は雑音によ
り構成されるといわ九ているが、雑音は個人差を与える
要素ではあるけれども人間の発声器官の構造上やむなく
生ずるもので子音の本質には関係がない、これを除去し
た後の周期成分に子音の本質があるのであり、それは母
音の組合せであるということができる。光の周波数に対
する視覚による概念が色であり種々の周波数の混合によ
り各種の色を生ずるように音波の周波数も聴覚により母
音および子音という概念を与える。たゾ。 音声は純粋のスペクトル音が用いられることはなく、少
くともピッチ周期波を混ぜたもので混合色に相当し、母
音と子音の間には子音が多少複雑な混合であるものが多
いという違いがあるにすぎない6色の混合には直接、光
や絵の具を混合する方法と各色の細かい点を混ぜ視覚に
より混合する方法があるが、音声にもこれに相当する2
つの方法がある。振幅の加算または振幅変調による混合
が前者に相当し、前記8母音はこれにより合成されてい
る。多くの子音は後者により合成されるが。 このとき時間軸上に時分割で各周波数を配置し聴覚によ
り混合する。以下に各子音について具体的に述べる。 1) ワ行の合成 前記u、oの合成で主周波のピークを潰す位相で第3高
調波を加えたのは飽和歪を与えたことを意味する。従っ
て、UよりOの方が飽和歪が少なく、αは殆どこれがな
くてよいことは、この順に口の開は方が大きくなること
を示し、更に、これは口を大きく開けると共鳴周波数が
高くなることを示す、そこで、ワの発声は口を窄めた状
態から大きく開けた状態に倒ることを考えれば、主周波
数はUからαの状態に連続的に変化しなければならない
、実際にこれは観測されるのであり、この周波数推移は
かなり荒い周波数離散化により実現可能で、uoαの3
母音を2,3ピツチづつこの順に並べるだけでかなり良
好なワを合成できる。 実際には発声の長さに応じてαのピッチ数を増加し全体
として前記スタート、ストップマークを構成する。更に
2周波数の推移する前部の振幅は最終母音の最高振幅よ
り小さクシ、ピッチ数も少なくするのがよい、これは音
節においてアクセントは母音にあることに相当する。こ
れらを合成記号式で第5図(1)により表わす0発音記
号と区別するため母音を構成する基準振幅の1ピツチの
波形を英大文字で表わし、右肩に繰返しピッチ数、右下
に振幅比を付けた記号を連結類に並べ合成法を示したも
のである。このように日本語の子音は発声の初期に周波
数推移を伴い最終母音に倒るもので、いわゆる音節を構
成しているのである。 ワ行の他の音は順にuoi、uou、uoe。 uoである0合成記号式はワの最終母音を替えたもので
ある。ワとヲ以外は母音Oを含まなくてもよいが7行と
の区別は明瞭でなくなる。 2) ヤ行の合成 Uと主周波数の同じものにiがあり、Oと同じものにe
があることよりieαという推移が考えられるがこれは
ヤになる。たゾ、ワとの区別を明瞭にするためには母音
田を加え1aaaαとする。 また、ユ=iaaaou、ヨ= i e a!oである
。これらの合成記号式を第5図(2)に示す、マンセル
の色相環に例えた前記の母音の並びにおいて周波数の低
い側からαに向って推移するものがワであり、逆に高い
側からαに向うものがヤなのであって、a!を加えるこ
とによりこの違いがはっきりするのでワとヤの区別が明
瞭になるのである。 3) マ行の合成 mの合成は200 Hzに600 Hzを逆相(cos
波の場合)に加え100%程度の3次の飽和歪を与えた
ものを鋸歯状波で変調する。 200 Hz自体の音は
Uであり、′同じUの音を持つ300 Hzでも可能で
1mは口を閉じることにより3次歪を大きくしたUと考
えられるので母音である1日本語のマ行はワ行のUをm
で置換えたものである。たゾ、ピッチ敷は5〜6ピツチ
必栗であり、0は少なくする0合成記号式を第5図(3
)に示す。 4) す行の合成 nは200 Hzに400 Hzを同程度加え2次歪を
起こしたものを鋸歯状波で変調すればよい、母音a。 田はαに舌の作用の加ったものと考えられ、αの第2高
調波の近辺にあることより推定されるように舌の作用は
2次歪を与える。しかし、実際の発声においてもそうで
あろが、これだけではmとの区別は明瞭でない、nの前
後に母音等が付くとき舌が上顎に着く音、離れる音がn
の先頭と末尾のピッチ:こ重畳することによりmとの区
別は明瞭になる。これは3 kHzあたりの周波数成分
の強い混合波で、nのピッチ波形の前半に重畳して第6
図のような波形を示す、この混合波の詳細は後に述べる
が、nの合成記号式はこれらを含めて第5図(4)で表
わす。 5) ラ行の合成 rは前記の舌の離れる音を示す周波数成分が第7図のよ
うなきれいな包絡線を形成したもので1ピツチで構成さ
れる。従って、う行はこれに直接ア行の各母音を接続す
ればよい、うの合成記号式を第5図(5)に示す。 6)へ行の合成 ワの合成音uoaからOを除き休止ピッチを1ピッチ置
く程度の括れをつけるとハになる。共鳴周波数を連続的
にしか変化できない人間の発声器官でこのような周波数
の飛躍を可能にするのが破裂である0口を窄め内圧を高
かくして溜めた呼気を一気に吐くと、−塊の呼気が出た
後に一時呼気の停止が起り、その間に口の形は0を通過
し、やがて呼気が正常に戻ったときはアの状態になって
いるというのが破裂の機構である。その結果、子音と母
音の2つの部分よりなる音節の形となる。 ハの合成記号式を第5図(6)に示す、このハ行は昔の
ファ、フイ、・・・と表記した口を窄めて発声するもの
であり、前の音との間の休止区間が短いとハ→ワ、ヒ→
イ、フ4つ等の推移が起る。現代流のハ行は子音部が異
なるがそれについては後に述べる。ハ行の子音部を替え
ることにより以下の各種の子音を生ずる。 7) パ行の合成 ハ行の破裂が強くなったものがバ行であると考えられる
が、破裂が強いとは時間に対する周波数の飛躍量が大き
いことである。従って、前記ハ行の接頭部Uのピッチ数
を減らしスタートピッチよりすぐに休止ピッチに入るこ
とによりパ行を合成できる。バの合成記号式を第5図(
7)に示す、スタートピッチとしては口を閉じた状態を
示すmを用いるのがよく、実際の音声では前の音との間
に休止区間はなく上記mが詰っていて発声の速度に応じ
てそのピッチ数が変化する。 8)  k、 h、 t、ah、 ts、 sの合成こ
れら子音の特徴は白色雑音により構成されることである
。白色雑音は雑音モドキであって雑音ではない、スペク
トル的には一様振幅のsin波の混合とみることができ
るが、従来この方法で合成した前例はなく乱数により模
擬されるのでこの名がある。出原人はこれを下記の方法
でスペクトルに基いて合成することに世界で初めて成功
した。 y=ΣA  5in(2nπft + 2yc R)n
                         
nRはOと1の間の一様乱数列のn番目の値である。す
なわち、yは位相をランダムに規定した全高調波の総和
である。A が一定のときいわゆる白色雑音となる。こ
の方法は任意のスペクトル分布の雑音モドキを合成でき
る画期的方法である。 たゾし、二九は基本波の周期を持つ周期関数であるので
それが好ましくない場合は全長を基本周期とする必要が
ある。 本項目の各子音は基本波100 Hzに対してnの最高
値を概略下表の値としたものである。 k  h  r/n   t  ch  ts  sn
=  17 23 31 47 61 71 83rお
よびnの舌の離着の音もこの部類に属す、振幅Anは一
定よりも3 db10ctav@程度の高域強調とする
のがよい0例としてkおよびSの合成波形の1ピツチ分
を第8図に示す、実際の音声は呼気通路を狭め高域強調
または発声するもので、舌が上顎に着く位置が口の奥よ
り先になる程、また、呼気通路が狭い種間波数は高くな
り、内圧が高くなることによる整圧作用でピッチ周期は
明確でなくなりピッチ数も多くなる。 h、 ch、、
ts、 sはこれに相当し、ピッチ周期をぼかすために
子音部会長を基本周期にとって合成するのがよい、ピッ
チ周期を基本にすると、ピッチ周期が明確に現れるため
グラインダで金属を削るような音を発生する。 これら子音は緩やかに増加し、最高値に達した後2〜3
ピツチで減衰する波形で変調され、同時に最高周波数も
低い方へ推移する。従って、tは上記の減衰部分のみか
らなり2ピツチ程で構成されるが、 chはtを含み、
 tsはchを含み、Sはtsを含む、には2〜3ピツ
チで構成され、ピッチ周期は比較的明確である。これら
子音にア行の各母音を接続すれば力、す、り、ハ行を合
成できる。 9) バ行の合成 バ行の合成には搬送波としてピッチ周期のsin波とそ
の第2高調波を同相同振幅で加えた第9図(a)  を
用いるのがよい、この絶対値により700 Hzのsi
n波を変調した波形2ピツチを同図(b)に示す、これ
はアのピッチ波形をずらして重ねたような波形を成し、
実際の音声バの波形と極めてよく似ており、これのみで
前記パの合成法によりバを合成できる。たゾ、同様にし
て作ったブの1ピツチを先頭につけるとバの音が強くな
る。 こNで用いた搬送波に、25程度の定常値を加えたもの
の絶対値により700 Hzを変調すればバとなる。前
記パ行の合成はこれを用いる方が実際に近い波形となる
。また、これはピッチ強度の増加をゆるやかにすれば母
音アとなる。これは前記の母音と異なり、零レベルの上
下に振動する実際の音声波形により近いものである。こ
の搬送波も高周波成分を除去した鋸歯状波とみることが
できる。 10)  gt dt zyngの合成g=kBu、d
=tBu、z=sBu、ng=nkBuである。たゾし
、2の合成におけるSはピッチ数を少くしなければなら
ず、 tsを用いると考えてもよい。Buは既に述べた
ように子音部と母音部に分けることができない、このピ
ッチ数が多くなると母音Uが付いていると聴きとられる
1日本語の濁音はこれら英語濁音または清音にバ行の各
音を続ければよい。 11)よう音の合成 よう音の合成はローマ字的にイ段の各子音部にヤ行を継
げばよい、たゾし、iea:は普通には各1ピツチで構
成する。 12)ンの合成 ンはnの発声区間の中はどで急にピッチ振幅を、25以
下に落し、以後この状態を維持する。従って次に母音が
くるとその母音はす行に変化するという現象が起る。 以上9日本語母子音の合成例において述べた各条件は標
準的なもので、その許容範囲は極めて広い、これは誰で
も発声ができ人により様々な音色を持つことを可能とし
ている反面、従来行われている数理的解析によるこれら
標準的条件の同定を困難にしている。従って、従来の音
声合成はモデルとなった一人間の音声を再現するにすぎ
ないのに対し1本合成法は個々の人間とは関係のない音
声を合成できるもので、前者を手書き文字の再現に例え
るなら本法は活字文字の生成に相当するものである。も
ちろん2周波数の偏移や雑音の混入により様々に音色を
変えることもできるので特定の人間の音声を模擬するこ
ともできる。この目的には音声を本法に基いてローマ字
的に母子前に分け、更に可能な限り細分化したものを組
合せ再現する高圧縮素片合成も可能である。更に2本合
成法は実施例で明らかなように音声認識法としても有効
である。尚2本合成法は英語音声をその綴りに基いて合
成することができる。 以上のごとく本発明は音声が周波数個有の音の音圧的お
よび時分割的組合せであるという全く独創的な考えに基
くものであり、その簡単さと実用性および合成音声の特
性において従来法に比べ著しく優れたものである。 3 図面の簡単な説明に下記第5図より第9回連の説明
を追加する。 第5図は各子音の合成法を示す記号式。 第6図はnのピッチ波形に舌が上顎に離着する音の重畳
した波形。 第7図はrの波形。 第8図はkとSの一部の波形。 第9図(a)はパ行の搬送波の波形。 (b) は700 Hzを変調したバの波形。
Figure 1 shows a waveform in which a vowel frequency sine wave is 100% modulated with a 100 Hz cosine wave. Figure 2 shows the waveform of a sawtooth wave modulated by a vowel frequency. Figure 3 shows the waveform of the sum of harmonics. Figure 4 shows the waveform of a vowel synthesized by this method. Original patentee Akira Wakabayashi Figure 1 (Δ) Figure 2 (8) Figure 3 (b) 4rII Procedural amendment October 31, 1985 2 Name of the invention Speech synthesis method using carrier wave modulation 3 Relationship with the case of the person making the correction Address of patent applicant Kanaka "Waken Hiratsukashita" Ikanchiro
21-17 Daikancho, Hiratsuka City, Kanagawa Prefecture 〒254 Location 0
463-23-5422 Waka Bayashi Aki 6 years old List of attached documents (1) Drawings
Contents of one amendment 1 The scope of claims is amended as follows. Just as light has a frequency-specific color, sound waves also have a frequency-specific sound, and this is a speech synthesis method that improves the sound quality by focusing on the fact that it is the original sound of a vowel. Or a sine wave or cosine wave with a 9M vowel frequency using a similar wave as a carrier wave, A2. The detailed description of the invention is amended below. b) Delete the following 44 characters from the 10th character on page 2, line 19 to the 9th character on page 3, line 1. Furthermore, if the fundamental wave is 200 Hz, the 2nd, 3rd, and 4th harmonics are U, O, and a, which are not necessarily prime multiples. ) Replace page 9, lines 1 to 17 with the following. By repeatedly outputting each pattern shown in FIG. 4 for the necessary time, it is possible to produce extremely high-quality vowels of the necessary length. When pronouncing a word by combining these words, it is necessary to make the pause between vowels sufficiently long to improve the clarity; if it is too short, it will sound like a sutra. arise. In order to prevent this and shorten the pause period, there must be a start mark and a stop mark for each vowel, which reduces the amplitude of the sawtooth carrier wave to 1. It should be increased or decreased so as to form an envelope of the first-order lag of a rectangular wave with the pause section between vowels as O.The start pitch at the beginning of the start mark must be distinguishable from the pause section and have a sufficiently small amplitude. The amplitude of each vowel in the above formula table is simply for determining the proportion.
The amplitude after blending is made constant and is used as the reference amplitude for each vowel. Next, we will discuss the synthesis of consonants. Conventionally, consonants are said to be composed of noise, but although noise is an element that causes individual differences, it is unavoidable due to the structure of the human vocal organ and has no relation to the essence of the consonant, so we have removed it. The essence of a consonant lies in the latter periodic component, and can be said to be a combination of vowels. Just as the visual concept of the frequency of light is color, and the mixing of various frequencies produces various colors, the auditory concept of the frequency of sound waves is the concept of vowels and consonants. Tazo. Speech is never a pure spectral sound, but is a mixture of at least pitch periodic waves, which corresponds to a mixed color, and the difference between vowels and consonants is that they are often a somewhat complex mixture of consonants. There are two ways to mix the six colors that exist: one is to mix light or paint directly, and the other is to mix the details of each color visually.
There are two ways. Mixing by amplitude addition or amplitude modulation corresponds to the former, and the eight vowels are synthesized by this. Although many consonants are synthesized by the latter. At this time, each frequency is arranged in time division on the time axis and mixed by hearing. Each consonant will be described in detail below. 1) Synthesis of row W In the combination of u and o, adding the third harmonic at a phase that crushes the peak of the main frequency means giving saturation distortion. Therefore, the saturation distortion is lower in O than in U, and α requires almost no distortion, which indicates that the opening of the mouth becomes larger in this order, and furthermore, this means that the wider the mouth is opened, the higher the resonance frequency becomes. Therefore, considering that Wa's vocalization falls from a closed mouth state to a wide open state, the main frequency must change continuously from the state of U to the state of α, in fact. This is observed, and this frequency transition can be realized by fairly rough frequency discretization, and 3 of uoα
By simply arranging the vowels in this order two or three pitches at a time, you can synthesize a fairly good wa. Actually, the number of pitches α is increased according to the length of the utterance, and the start and stop marks are formed as a whole. Furthermore, the amplitude of the front part where the two frequencies change should be smaller than the highest amplitude of the final vowel, and the number of pitches should also be smaller, which corresponds to the fact that the accent is on the vowel in the syllable. In order to distinguish these from the 0 phonetic symbol shown in Figure 5 (1) in the composite symbol formula, the 1-pitch waveform of the reference amplitude constituting the vowel is expressed in capital letters, with the number of repeated pitches on the right shoulder and the amplitude ratio on the lower right. This shows the composition method by arranging the symbols with , into connected classes. In this way, Japanese consonants undergo a frequency transition and fall to the final vowel at the beginning of utterance, forming what is called a syllable. The other sounds in the wa line are uoi, uou, uoe. The zero compound symbol expression uo is obtained by changing the final vowel of wa. It is not necessary to include the vowel O except for wa and wo, but the distinction from the 7th line will not be clear. 2) I has the same main frequency as the composite U of the Y row, and e has the same main frequency as O.
Since there is, a transition such as ieα can be considered, but this is negative. To clearly distinguish between tazo and wa, the vowel field is added to make it 1aaaα. Also, yu = iaaaou, yo = i e a! It is o. These composite symbol formulas are shown in Figure 5 (2). In the above-mentioned sequence of vowels compared to Munsell's color wheel, Wa is the one that moves from the low frequency side toward α, and vice versa. What goes towards α is yah, so a! By adding , this difference becomes clear, and the distinction between wa and ya becomes clear. 3) Synthesis of m row The synthesis of m is 200 Hz and 600 Hz in reverse phase (cos
In the case of a wave), a third-order saturation distortion of approximately 100% is added to the waveform, which is then modulated with a sawtooth wave. The sound of 200 Hz itself is U, and 300 Hz, which has the same U sound, is also possible, and 1 m is considered to be a U with increased third-order distortion by closing the mouth, so the vowel 1 Japanese M line is U in the wa line is m
It was replaced with . The number of pitches must be 5 to 6 pitches, and the 0 compound symbol formula, which reduces the number of 0s, is shown in Figure 5 (3
). 4) The composite n of the second row is the vowel a, which is obtained by adding 400 Hz to the same extent to 200 Hz to cause second-order distortion, and then modulating it with a sawtooth wave. The field is considered to be α plus the action of the tongue, and the action of the tongue gives second-order distortion, as estimated from the fact that it is near the second harmonic of α. However, as is the case with actual pronunciation, the distinction from m is not clear from this alone; when a vowel, etc. is added before or after n, the sound of the tongue landing on the upper jaw and the sound of leaving it are n.
By superimposing the pitches at the beginning and end of , it becomes clear to distinguish it from m. This is a mixed wave with a strong frequency component around 3 kHz, and it is superimposed on the first half of the pitch waveform of n.
This mixed wave, whose waveform is shown in the figure, is described in detail later; the composite symbolic formula for n, including these components, is shown in Fig. 5 (4).
5) Synthesis of r. The r sound consists of one pitch in which the frequency component representing the sound of the tongue leaving forms a clean envelope, as shown in the figure. For the U line, each vowel of the A line can therefore be connected directly to it; the composite symbolic formula for the U line is shown in Fig. 5 (5).
6) If O is removed from uoa, the synthesized sound of w, and the sound is shortened to the extent that one pause pitch is added, it becomes ha. The human vocal organ can normally change its resonance frequency only continuously, but it can make such a jump in frequency through rupture. If the mouth is closed, the internal pressure is raised, and the accumulated air is released all at once, the breath comes out in a single burst. In a rupture, exhalation stops temporarily after this burst; during the stop the shape of the mouth passes through O, and by the time exhalation returns to normal the mouth is in the A state. The result is a syllable with two parts, a consonant and a vowel. The composite symbolic formula for ha is shown in Fig. 5 (6). This C line, formerly written as "fa", "hui", and so on, is uttered from a closed mouth, with a pause after the preceding sound. If the pause section is short, transitions such as Ha → Wa and Hi → A, F occur. The consonant part of the modern C line is different, as discussed later. Changing the consonant part of the C line produces the various consonants described below.
7) Synthesis of the P line. The B line is considered to be a stronger rupture of the C line, and a strong rupture means that the jump in frequency per unit time is large. The P line can therefore be synthesized by reducing the number of pitches of the leading U of the C line and entering the rest pitch immediately after the start pitch. As shown in Fig. 5 (7), it is best to use m, which represents the closed-mouth state, as the starting pitch. In actual speech the pause after the preceding sound disappears and this m is compressed, and the number of pitches changes with the speed of speech.
8) Synthesis of k, h, t, ch, ts, s. These consonants are characterized by being composed of white noise. White noise is one type of noise. Spectrally it can be regarded as a mixture of sine waves of uniform amplitude, but there is no precedent for synthesizing it in that way; it has instead been simulated using random numbers, hence the name. The applicant was the first in the world to succeed in synthesizing it on a spectral basis, using the method described below.
$$y = \sum_{n} A_n \sin(2 n \pi f t + 2 \pi R_n)$$

R_n is the nth value of a uniform random number sequence between 0 and 1. In other words, y is the sum of all harmonics, each with a randomly assigned phase. When A_n is constant, y is so-called white noise. The method can synthesize noise variants with arbitrary spectral distributions. However, y is a periodic function whose period is that of the fundamental wave, so where this is undesirable the total length must be taken as the fundamental period. For each consonant in this item, the maximum value of n for a fundamental wave of 100 Hz is approximately as shown in the table below.

consonant    k    h    r/n   t    ch   ts   s
max n        17   23   31    47   61   71   83

The sounds of the tongue touching and leaving in r and n also belong to this category. Rather than keeping the amplitude A_n constant, it is better to emphasize the high-frequency range by about 3 dB/octave. For example, one pitch of the synthesized waveforms of k and s is shown in Fig. 8. In actual speech, these sounds are produced by narrowing the exhalation passage, which emphasizes the high frequencies. The narrower the passage, the higher the frequency, and because of the pressure-regulating effect of the high internal pressure, the pitch period becomes less distinct and the number of pitches increases. This applies to h, ch, ts, and s; to blur the pitch period, it is best to synthesize them with the length of the consonant section as the fundamental period. If the pitch period is used as the fundamental period instead, it appears clearly and the result sounds like metal being scraped with a grinder. These consonants are modulated with a waveform that rises gradually and, after reaching its maximum, decays over 2 to 3 pitches; at the same time the highest frequency shifts downward. t consists only of this decaying part and is about 2 pitches long, whereas ch includes t, ts includes ch, and s includes ts; each is composed of 2 to 3 pitches, and the pitch period is relatively clear. Connecting each vowel of the A line to these consonants yields the ka, sa, ta, and ha lines.
9) Synthesis of the B row. For the B row, it is best to use as the carrier wave the pitch-period sine wave together with its second harmonic at the same phase and amplitude, as shown in Fig. 9 (a). The 2-pitch waveform obtained by modulating a sine wave is shown in Fig. 9 (b).
It closely resembles the waveform of an actual spoken ba, and with this alone ba can be synthesized using the synthesis method described above. If one pitch of bu made in the same way is added at the beginning, the ba sound becomes stronger. If 700 Hz is modulated by the absolute value of the carrier wave used in N plus a steady value of about 25, B is obtained. Using this to synthesize the PA lines described above gives a waveform closer to the actual one. If the rise in pitch intensity is made more gradual, this becomes the vowel A. Unlike the vowels described earlier, it more closely resembles an actual speech waveform, which swings above and below the zero level. This carrier wave can also be regarded as a sawtooth wave with its high-frequency components removed.
10) Synthesis of g, d, z, ng. g = kBu, d = tBu, z = sBu, ng = nkBu. However, the number of pitches of s in the synthesis of z must be reduced, and ts may be used instead. As mentioned above, Bu cannot be divided into a consonant part and a vowel part, and when its number of pitches increases it is heard as containing the vowel U. Japanese voiced sounds are obtained simply by following these English voiced sounds, or the clear sounds, with each sound of the B line.
11) Synthesis of the Yo sound. To synthesize the Yo sound, simply follow each consonant part of the I stage with the Y line, as in the romanized spelling; each component of iea usually consists of one pitch.
12) During the utterance section of n, the pitch amplitude suddenly drops to below 25 and then stays at that level. As a result, when a vowel follows, the vowel changes to a sub line.
Each of the conditions described in the above synthesis examples for Japanese vowels and consonants is a standard value, and the permissible range is extremely wide. This wide tolerance makes it difficult to identify the standard conditions through mathematical analysis. Conventional speech synthesis only reproduces the voice of the one person used as a model, whereas this synthesis method can synthesize voices that are independent of any individual. If the former is likened to reproducing handwritten characters, this method corresponds to generating printed characters. The tone can of course be varied in many ways by shifting two frequencies or adding noise, so a specific human voice can also be simulated. For that purpose, highly compressed segment synthesis is also possible: based on this method, speech is divided into vowel and consonant parts following the romanized spelling, subdivided as far as possible, and the parts are then combined and reproduced. Furthermore, as is clear from the examples, this synthesis method is also effective as a speech recognition method.
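The random-phase harmonic sum of item 8) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the applicant's implementation: the function name, the 44.1 kHz sample rate (chosen so that the 83rd harmonic of 100 Hz stays below the Nyquist limit), the seed parameter, and the reading of "about 3 dB/octave" as an amplitude that grows by 3 dB for each doubling of the harmonic number are all introduced here for illustration. The harmonic limits are the values from the table in item 8).

```python
import math
import random

# Approximate maximum harmonic number n per consonant for a 100 Hz
# fundamental, taken from the table in item 8).
MAX_HARMONIC = {"k": 17, "h": 23, "r/n": 31, "t": 47, "ch": 61, "ts": 71, "s": 83}


def random_phase_noise(consonant, f0=100.0, sample_rate=44100, n_samples=882,
                       tilt_db_per_octave=3.0, seed=0):
    """Sample y(t) = sum_n A_n * sin(2*pi*n*f0*t + 2*pi*R_n).

    sample_rate, seed and the amplitude law are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n_max = MAX_HARMONIC[consonant]
    # R_n: one uniform random number in [0, 1) per harmonic, scaled to a phase.
    phases = [2.0 * math.pi * rng.random() for _ in range(n_max)]
    # A_n: rises by tilt_db_per_octave for each doubling of n (assumed law);
    # a tilt of 0 gives constant A_n, the white-noise case of the text.
    amps = [10.0 ** (tilt_db_per_octave * math.log2(n) / 20.0)
            for n in range(1, n_max + 1)]
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(sum(
            a * math.sin(2.0 * math.pi * n * f0 * t + p)
            for n, (a, p) in enumerate(zip(amps, phases), start=1)))
    return samples
```

With the default 100 Hz fundamental and 44.1 kHz sample rate, one fundamental period is 441 samples, so the default 882 samples span two periods and repeat exactly, which is the periodicity the text warns about. To blur the pitch as suggested for h, ch, ts, and s, one would instead choose `f0` so that a single period covers the whole consonant section.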
Note that this synthesis method can also synthesize English speech from its spelling. As described above, the present invention is based on the completely original idea that speech is a combination, in sound pressure and by time division, of sounds specific to individual frequencies. It is markedly superior to conventional methods in simplicity and practicality as well as in the characteristics of the synthesized speech.
3. Add the following explanations of Figs. 5 to 9 to the brief description of the drawings.
Fig. 5 shows the symbolic formulas for synthesizing each consonant.
Fig. 6 shows a waveform in which the sounds of the tongue touching and leaving the upper jaw are superimposed on the pitch waveform of n.
Fig. 7 shows the waveform of r.
Fig. 8 shows part of the waveforms of k and s.
Fig. 9 (a) shows the waveform of the carrier wave in the PA row, and (b) shows a waveform modulated at 700 Hz.
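The carrier of Fig. 9 (a) and the 700 Hz modulation described in item 9) can be sketched in Python as follows. This is a hedged illustration only. The function names are hypothetical, the carrier is assumed to be the Fig. 9 (a) wave, and because the text gives the steady value only as "about 25" without a unit, the sketch takes `steady` as a fraction of the carrier peak and leaves the value to the caller.

```python
import math


def b_row_carrier(f0, sample_rate, n_samples):
    """Pitch-period sine plus its second harmonic, same phase and amplitude."""
    return [math.sin(2.0 * math.pi * f0 * i / sample_rate)
            + math.sin(2.0 * math.pi * 2.0 * f0 * i / sample_rate)
            for i in range(n_samples)]


def modulate_tone(carrier, sample_rate, steady, tone_hz=700.0):
    """Modulate a tone by |carrier| plus a steady offset.

    `steady` is a fraction of the carrier peak. This scaling is an
    assumption of the sketch; the source does not state the unit.
    """
    peak = max(abs(c) for c in carrier)
    return [(abs(c) + steady * peak)
            * math.sin(2.0 * math.pi * tone_hz * i / sample_rate)
            for i, c in enumerate(carrier)]


# Two pitches at an assumed 100 Hz pitch frequency and 44.1 kHz sample rate.
carrier = b_row_carrier(100.0, 44100, 882)
wave = modulate_tone(carrier, 44100, steady=0.25)
```

Taking the absolute value of the carrier keeps the envelope non-negative, so the modulated tone swings symmetrically about zero, in line with the remark that this waveform vibrates above and below the zero level.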

Claims (1)

[Claims] A speech synthesis method that takes note of the fact that, just as light has a frequency-specific color, sound waves have a frequency-specific sound, and that this sound is the original sound of a vowel, and that improves its sound quality: a sawtooth wave or similar wave with the pitch period is used as the carrier wave and is modulated with a sine wave or cosine wave at the original vowel frequency, with a mixture of two or three such waves, or with a modulated wave.
JP59158900A 1984-07-31 1984-07-31 Voice synthesization by carrier modulation Pending JPS61112199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59158900A JPS61112199A (en) 1984-07-31 1984-07-31 Voice synthesization by carrier modulation

Publications (1)

Publication Number Publication Date
JPS61112199A true JPS61112199A (en) 1986-05-30

Family

ID=15681816

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59158900A Pending JPS61112199A (en) 1984-07-31 1984-07-31 Voice synthesization by carrier modulation

Country Status (1)

Country Link
JP (1) JPS61112199A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63229497A (en) * 1987-03-18 1988-09-26 若林 昭夫 Voice synthetic recognition

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57102693A (en) * 1980-12-19 1982-06-25 Fujitsu Ltd Voice synthsizing apparatus
