JP2004233624A

JP2004233624A - Voice synthesizer

Info

Publication number: JP2004233624A
Application number: JP2003021683A
Authority: JP
Inventors: Atsuichi Nakamura; 敦一中村
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2003-01-30
Filing date: 2003-01-30
Publication date: 2004-08-19
Anticipated expiration: 2023-01-30
Also published as: JP3915704B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice synthesizer capable of synthesizing high-quality voice.

SOLUTION: An address generator 21 accumulates phase data output from a phase data generator 20 and outputs a readout address that changes at a rate corresponding to the center frequency of a voiced sound formant or an unvoiced sound formant. Waveform data for generating the voiced sound formant or unvoiced sound formant is read out from a waveform data storage part 22 according to the readout address. A multiplier 23 multiplies the read-out waveform data by an envelope signal, and an adder 25 adds noise to the waveform data for generating the unvoiced sound formant. A voice is synthesized by combining the voiced sound formants or unvoiced sound formants output from a plurality of such WT voice parts 10.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明が属する技術分野】
本発明は、複数のフォルマントを合成して音声を合成することができる音声合成装置に関するものである。
【０００２】
【従来の技術】
従来の音声合成装置の一例としては、数ｍｓないし数十ｍｓの短時間の音声を定常と見なして数個の正弦波の和で音声を表現することを原理としている。そして、正弦波を発生する正弦波発生器の位相をピッチ周期でリセットすることにより有声音を形成すると共に、正弦波発生器の位相初期化タイミングをランダムにすることによりスペクトルを広げて無声音を形成する音声合成装置が知られていた（例えば、特許文献１参照）。
【０００３】
【特許文献１】
特公昭５８−５３３５１号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の音声合成装置が合成することのできる音声の品位は低くリアリティがないという問題点があった。
そこで、本発明は、高品位の音声を合成することができる音声合成装置を提供することを目的としている。
【０００５】
【課題を解決するための手段】
上記目的を達成するために、本発明の音声合成装置は、所望のフォルマント中心周波数および所望のフォルマントレベルを有するフォルマントをそれぞれ形成する複数のフォルマント形成部を備え、該複数のフォルマント形成部で形成された複数のフォルマントを合成することにより音声を合成する音声合成装置であって、前記複数のフォルマント形成部のそれぞれが、複数種類の波形形状の中から所望の波形形状を指定する波形形状指定手段と、前記複数種類の波形形状に対応した複数の波形データを記憶する波形データ記憶手段と、前記フォルマント中心周波数に対応したレートで変化するアドレスを発生して、前記波形形状指定手段で指定された波形形状に対応した波形データを前記波形データ記憶手段から読み出す波形データ読み出し手段と、前記ピッチ周期に対応したタイミング毎に急速に減衰するとともに減衰後に急速に立ち上がる形状のエンベロープ信号を形成し、該形成したエンベロープ信号を前記波形データ読み出し手段により前記波形データ記憶手段から読み出された波形データに付与するエンベロープ付与手段とを備えている。
【０００６】
また、上記本発明の音声合成装置において、前記複数のフォルマント形成部により形成された複数のフォルマントを合成することにより有声音が合成されるようにしてもよい。
【０００７】
このような本発明によれば、複数のフォルマント形成部により所望のフォルマント中心周波数および所望のフォルマントレベルをそれぞれ有するフォルマントを形成し、形成された複数のフォルマントを合成することにより音声を合成している。そして、フォルマントを形成する波形データにピッチ周期のエンベロープ信号を付与するようにしている。これにより、フォルマントにピッチ感を有させることができ、高品位のリアリティのある音声を合成することができるようになる。また、有声音フォルマントを形成する波形データにピッチ周期のエンベロープ信号を付与することにより、有声音フォルマントにピッチ感を有させることができる。
【０００８】
【発明の実施の形態】
本発明の実施の形態の音源装置と兼用される音声合成装置の構成を示すブロック図を図１に示す。
図１に示す音声合成装置１は、複数種類の波形形状の波形データを記憶している波形データ記憶部と、この波形データ記憶部から所定の波形データを読み出す読み出し手段を少なくとも備える９つの波形テーブルボイス（ＷＴボイス）部１０ａ，１０ｂ，１０ｃ，１０ｄ，１０ｅ，１０ｆ，１０ｇ，１０ｈ，１０ｉと、ＷＴボイス部１０ａ〜１０ｉから出力される波形データをミキシングするミキシング手段１１から構成され、ミキシング手段１１からは発生された楽音あるいは合成された音声が出力される。この場合、ＷＴボイス部１０ａ〜１０ｉに各種パラメータとして楽音パラメータおよび音声パラメータが供給されており、楽音／音声の発生指示をする音声モードフラグ（ＨＶＭＯＤＥ）が楽音の発生を指示（ＨＶＭＯＤＥ＝０）していた場合は、楽音パラメータが選択されてＷＴボイス部１０ａ〜１０ｉで使用される。そして、選択された楽音パラメータに基づいてＷＴボイス部１０ａ〜１０ｉから発生された複数の楽音の波形データが出力され、ミキシング手段１１から最大９音からなる楽音が出力される。
【０００９】
そして、楽音／音声の発生指示をする音声モードフラグ（ＨＶＭＯＤＥ）が音声の発生を指示（ＨＶＭＯＤＥ＝１）していた場合は、音声パラメータが選択されてＷＴボイス部１０ａ〜１０ｉで使用される。そして、選択された音声パラメータに基づいてＷＴボイス部１０ａ〜１０ｉから有声音ピッチ信号、有声音フォルマントあるいは無声音フォルマントを形成する波形データが出力され、有声音フォルマントおよび無声音フォルマントを形成している波形データがミキシング手段１１で合成されることにより１つの音声が出力される。なお、ＨＶＭＯＤＥのＨＶはＨｕｍａｎＶｏｉｃｅの略である。また、Ｕ／Ｖは無声音（ＵｎｖｏｉｃｅｄＳｏｕｎｄ）／有声音（ＶｏｉｃｅｄＳｏｕｎｄ）指示フラグであり、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０が供給されている場合は、ＷＴボイス部１０ｂ〜１０ｉから有声音のフォルマントを形成する波形データが出力される。また、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０が供給されているＷＴボイス部１０ａからは、有声音のピッチ周期とされる有声音ピッチ信号が出力され、波形データは利用されない。ＷＴボイス部１０ａから出力された有声音ピッチ信号はＷＴボイス部１０ｂ〜１０ｉに供給されて、有声音フォルマントを形成する波形データの位相が、有声音ピッチ信号の周期毎にリセットされるようになる。また、有声音フォルマントのエンベロープ形状が有声音ピッチ信号の周期に対応したものとなる。これにより、有声音フォルマントにピッチ感を有させることができる。
【００１０】
そして、ＷＴボイス部１０ｂ〜１０ｉにＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１が供給されている場合は、ＷＴボイス部１０ｂ〜１０ｉから無声音のフォルマントを形成する波形データが出力される。また、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１が供給されているＷＴボイス部１０ａからの出力は利用されない。このように、ＨＶＭＯＤＥ＝１とすると、ＷＴボイス部１０ｂ〜１０ｉにより有声音フォルマントあるいは無声音フォルマントのフォルマントを最大８フォルマント出力することができる。
【００１１】
ここで、音声について説明すると、音声の元になるのは声帯の振動であるが、声帯の振動は発音する言葉が違ってもほとんど変化することはない。口の開け方や喉の形などによって生じる共振や共鳴、そしてそれに付随する摩擦音や破裂音などが声帯の振動に付け加えられることでさまざまな音声になっている。このような音声には、特定の周波数領域にスペクトルが集中しているフォルマントと呼ばれる部分が周波数軸上で複数箇所存在している。このフォルマントの中央の周波数、あるいは、振幅最大の周波数がフォルマント中心周波数である。音声に含まれるフォルマントの数や、各フォルマントの中心周波数や振幅、帯域幅などは音声の性質を決める要素であり、音声を出す人の性別や体格、年齢などによって大きく異なるようになる。また、音声では発音する言葉の種類ごとに特徴的なフォルマントの組み合わせが決まっており、フォルマントの組み合わせは声質に関わることはない。フォルマントの種類を大別すると、有声音を合成するためのピッチ感を持った有声音フォルマントと、無声音を合成するためのピッチ感を持たない無声音フォルマントとなる。なお、有声音とは、発音する際に声帯が振動する音声であり、有声音には、母音と半母音、そしてバ行、ガ行、マ行、ラ行などで使用される有声子音が含まれる。また、無声音とは、発音する際に声帯が振動しない音声であり、ハ行、力行、サ行などの子音が無声音に該当する。
【００１２】
図１に示す構成の本発明にかかる音源装置と兼用される音声合成装置１において、楽音を発生する際には、ＨＶＭＯＤＥ＝０としてＷＴボイス部１０ａ〜１０ｉのそれぞれで複数の楽音を発生するようにしている。すなわち、最大９音からなる楽音を発生することができる。
音声を合成する際には、ＨＶＭＯＤＥ＝１として合成する有声音あるいは無声音の音声に対応する有声音フォルマントあるいは無声音フォルマントをＷＴボイス部１０ｂ〜１０ｉにより形成するようにしている。この場合、合成される音声は最大８つのフォルマントの組み合わせとなる。例えば、合成される音声が有声音の場合は、ＷＴボイス部１０ｂ〜１０ｉにＵ／Ｖ＝０が供給されて、供給されている音声パラメータに基づく有声音フォルマントがそれぞれＷＴボイス部１０ｂ〜１０ｉにより形成される。この際に、ＷＴボイス部１０ａにはＵ／Ｖ＝０が供給されて、ＷＴボイス部１０ａは供給されている音声パラメータに基づいて有声音ピッチ信号を発生する。この有声音ピッチ信号はＷＴボイス部１０ｂ〜１０ｉに供給されて、出力される有声音フォルマントを形成する波形データの位相が有声音ピッチ信号の周期毎にリセットされる。また、有声音フォルマントのエンベロープ形状が有声音ピッチ信号の周期に対応したものとなる。これによりピッチ感を持った有声音フォルマントがＷＴボイス部１０ｂ〜１０ｉにより形成されるようになる。
【００１３】
また、合成される音声が無声音の場合は、ＷＴボイス部１０ｂ〜１０ｉにＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１が供給されて、供給されている音声パラメータに基づく無声音フォルマントがそれぞれＷＴボイス部１０ｂ〜１０ｉにより形成される。後述するように、無声音の場合にはノイズが付与された無声音フォルマントとされる。これにより、高品質のリアリティのある音声を合成することができる。なお、無声音を合成する場合はＷＴボイス１０ａの出力は利用されない。
【００１４】
音声合成装置１におけるＷＴボイス部１０ａ〜１０ｉの構成は同じ構成とされており、ＷＴボイス部１０として以下にその構成を説明する。図２は、ＷＴボイス部１０の概略構成を示すブロック図である。なお、図２以降において、（ＷＴ）、（有声音フォルマント）、（無声音フォルマント）の表記は、そのパラメータがそれぞれ、楽音、有声音フォルマント、無声音フォルマントを生成するためのパラメータであることを示している。
図２において、位相データ発生器（ＰＧ：ＰｈａｓｅＧｅｎｅｒａｔｏｒ）２０は、発生すべき楽音のピッチあるいは有声音ピッチ信号、有声音フォルマント中心周波数、無声音フォルマント中心周波数のいずれかに対応する位相データを発生している。ＰＧ２０には、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報と、楽音パラメータとして楽音のオクターブ情報ＢＬＯＣＫ（ＷＴ）、楽音の周波数情報ＦＮＵＭ（ＷＴ）が供給されている。さらに、音声パラメータとして、有声音ピッチ信号のオクターブ情報ＢＬＯＣＫ（有声音ピッチ）、有声音ピッチ信号の周波数情報ＦＮＵＭ（有声音ピッチ）、あるいは、有声音フォルマントのオクターブ情報ＢＬＯＣＫ（有声音フォルマント）、有声音フォルマントの周波数情報ＦＮＵＭ（有声音フォルマント）、無声音フォルマントのオクターブ情報ＢＬＯＣＫ（無声音フォルマント）、無声音フォルマントの周波数情報ＦＮＵＭ（無声音フォルマント）の各パラメータが供給されている。ＰＧ２０において、供給されている各種パラメータがフラグ情報により選択されて、選択したパラメータに基づいて発生すべき楽音の音程あるいは有声音ピッチ信号、有声音フォルマント中心周波数、無声音フォルマント中心周波数のいずれかに対応する位相データが発生されている。
【００１５】
ＰＧ２０の詳細構成を図３に示す。図３においてセレクタ３０では、Ｕ／Ｖフラグの状態に応じて有声音ピッチ信号あるいは有声音フォルマントの周波数情報ＦＮＵＭと、無声音フォルマントの周波数情報ＦＮＵＭとのいずれかが選択されてセレクタ３１に出力される。セレクタ３１では、ＨＶＭＯＤＥフラグの状態に応じて楽音の周波数情報ＦＮＵＭ（ＷＴ）と、セレクタ３０から出力される音声関連の周波数情報ＦＮＵＭとのいずれかが選択されてシフター３４に出力され、セレクタ３１から出力される周波数情報ＦＮＵＭがシフター３４にセットされる。また、セレクタ３２では、Ｕ／Ｖフラグの状態に応じて有声音ピッチ信号あるいは有声音フォルマントのオクターブ情報ＢＬＯＣＫと、無声音フォルマントのオクターブ情報ＢＬＯＣＫとのいずれかが選択されてセレクタ３３に出力される。セレクタ３３では、ＨＶＭＯＤＥフラグの状態に応じて楽音のオクターブ情報ＢＬＯＣＫ（ＷＴ）と、セレクタ３２から出力される音声関連のオクターブ情報ＢＬＯＣＫとのいずれかが選択されてシフター３４にシフト情報として出力され、シフター３４にセットされている周波数情報ＦＮＵＭがオクターブ情報ＢＬＯＣＫに応じてシフトされる。これにより、発生すべき楽音の音程、有声音ピッチ信号、有声音フォルマントの中心周波数、無声音フォルマントの中心周波数のいずれかを発生するためのオクターブの加味された位相データがＰＧ出力としてＰＧ２０から出力される。
【００１６】
図２に戻りＰＧ２０からのＰＧ出力は、アドレス発生器（ＡＤＧ：ＡｄｄｒｅｓｓＧｅｎｅｒａｔｏｒ）２１に入力され、ＰＧ出力とされる位相データを累算することにより、波形データ記憶部（ＷＡＶＥＴＡＢＬＥ）２２から所望の波形形状の波形データを読み出すための読み出しアドレスを発生している。ＡＤＧ２１には、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報と、楽音パラメータとしてスタートアドレスＳＡ（ＷＴ）、ループポイントＬＰ（ＷＴ）、エンドポイントＥＰ（ＷＴ）が供給され、さらに、音声パラメータとして、有声音フォルマントを形成するに適した波形を選択するための波形選択（ＷＳ）信号と、楽音および音声に共通の発音開始を指示するキーオン（ＫｅｙＯｎ）信号が供給されている。
【００１７】
楽音を発生する場合には、ＨＶＭＯＤＥ＝０としてキーオン信号の開始タイミングでスタートアドレスＳＡ（ＷＴ）がＡＤＧ２１から出力され、スタートアドレスＳＡ（ＷＴ）で示される波形データ記憶部２２の位置から波形データの読み出しが開始される。そして、ＰＧ２０からの位相データを累算していくことによりエンドポイントＥＰ（ＷＴ）までの読み出しアドレスが、楽音の音程に応じたレートで変化するようにＡＤＧ２１から順次出力される。これにより、エンドポイントＥＰ（ＷＴ）で示される波形データ記憶部２２の位置までの波形データのサンプルが楽音の音程に応じたレートで順次読み出される。次いで、ループポイントＬＰ（ＷＴ）に相当する読み出しアドレスがＡＤＧ２１から出力され、さらにＰＧ２０からの位相データを累算していくことによりエンドポイントＥＰ（ＷＴ）までの読み出しアドレスが楽音の音程に応じたレートで変化しながらＡＤＧ２１から順次出力される。これにより、ループポイントＬＰ（ＷＴ）で示される波形データ記憶部２２の位置からエンドポイントＥＰ（ＷＴ）で示される波形データ記憶部２２の位置までの波形データのサンプルが楽音の音程に応じたレートで順次読み出される。ループポイントＬＰ（ＷＴ）からエンドポイントＥＰ（ＷＴ）までの読み出しアドレスは、キーオン信号により発音停止されるまで繰り返し発生される。これにより、キーオン信号で示される発音開始から発音停止までの所望の波形データを、楽音の音程に応じたレートで波形データ記憶部２２から読み出すことができる。
【００１８】
また、音声を合成する際には、ＨＶＭＯＤＥ＝１としてキーオン信号の開始タイミングでＷＳ（有声音フォルマント）信号で示されるスタートアドレス、あるいは、予め定められている無声音フォルマント用のスタートアドレスで示される波形データ記憶部２２の位置から波形データの読み出しが開始される。そして、ＰＧ２０からの位相データを累算していくことにより固定とされているアドレス範囲の読み出しアドレスが、有声音フォルマントあるいは無声音フォルマントの中心周波数に応じたレートで変化するようＡＤＧ２１から順次出力される。これにより、波形データのサンプルが波形データ記憶部２２から有声音フォルマントあるいは無声音フォルマントの中心周波数に応じたレートで順次読み出されるようになる。なお、ＷＴボイス部１０ａにおいては、ＰＧ２０からの位相データを累算した累算値が有声音ピッチ周期で予め定められている所定の値（定数値）に達するようになり、定数値に達した際に有声音ピッチ信号（パルス信号）が出力されるようになる。
【００１９】
このようなＡＤＧ２１の詳細構成を図４に示す。図４においてＰＧ２０からの位相データは累算器（ＡＣＣ：Ａｃｃｕｍｕｌａｔｏｒ）４１に入力されて、クロック毎に累算されることにより読み出しアドレスの増分値が生成される。この読み出しアドレスの増分値は、セレクタ４６を介して加算器４７に供給され加算器４７においてスタートアドレスが加算されて読み出しアドレスが生成され、ＡＤＧ出力としてＡＤＧ２１から出力される。
ＡＤＧ２１において、ＨＶＭＯＤＥ＝０とされて楽音を発生する際の動作を説明する。ＨＶＭＯＤＥ＝０とされると、アンドゲートＡＮＤが閉じるためオアゲートＯＲから出力されるキーオン信号（ＫｅｙＯｎ）のみによって累算器４１は初期値にリセットされ、ＰＧ２０から供給される発生すべき楽音の音程に応じた位相データの累算を開始する。この累算はクロック毎に行われ、その累算値ｂはセレクタ４６および減算器４３に出力される。
【００２０】
減算器４３にデータａを供給するセレクタ４２はＨＶＭＯＤＥ＝０とされていることからエンドポイントＥＰ（ＷＴ）をデータａとして選択し減算器４３に出力する。これにより、減算器４３で演算された減算値（ａ−ｂ）が出力され、減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算器４５に供給される。また、減算値（ａ−ｂ）が負となった際に“１”となるＭＳＢ（ＭｏｓｔＳｉｇｎｉｆｉｃａｎｔＢｉｔ）信号が選択信号としてセレクタ４６に供給されると共に、累算器４１にロード信号として供給される。ＭＳＢ信号は、減算値（ａ−ｂ）が負になった際に“１”になることから、セレクタ４６は累算値ｂがエンドポイントＥＰ（ＷＴ）を超えるまでは累算値ｂを加算器４７に出力する。加算器４７に加算データを供給するセレクタ５０は、ＨＶＭＯＤＥ＝０とされていることからスタートアドレスＳＡ（ＷＴ）を選択して加算器４７に出力する。これにより、スタートアドレスＳＡ（ＷＴ）が加算された累算値ｂがＡＤＧ出力として出力される。累算値ｂはクロック毎に位相データが累算されて、位相データのレートで変化することから、ＡＤＧ出力である読み出しアドレスも位相データに応じて変化していくようになる。
【００２１】
そして、累算値ｂがエンドポイントＥＰ（ＷＴ）を超えた際にＭＳＢ信号は“１”に変化することから、セレクタ４６は加算器４５から出力されるデータｃを出力するようになる。データｃは、ＨＶＭＯＤＥ＝０とされていることからセレクタ４４において選択されたループポイントＬＰ（ＷＴ）に、加算器４５において減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算された演算値とされる。これにより、加算器４７から出力されるＡＤＧ出力は振幅値｜ａ−ｂ｜で補正されたループポイントＬＰ（ＷＴ）の読み出しアドレスとなる。また、ＭＳＢ信号が“１”に変化することから累算器４１にロード信号が供給されて、累算器４１にデータｃがロードされるようになる。すると、ＭＳＢ信号が“０”に戻ることから累算器４１から出力されるデータｃがセレクタ４６から出力されるようになる。そして、累算器４１からはクロック毎に位相データがデータｃに加算された累算値ｂが出力されることから、ＡＤＧ出力はほぼループポイントＬＰ（ＷＴ）の読み出しアドレスから位相データに応じたレートで変化していくようになる。
【００２２】
この場合のＡＤＧ出力をグラフで図示して説明すると、ＡＤＧ出力は図５に示すようになる。すなわち、キーオン信号が印加されるとスタートアドレスＳＡ（ＷＴ）が出力され、位相データに応じたレートで変化しながら読み出しアドレスが上昇していきスタートアドレスＳＡ（ＷＴ）からエンドポイント（ＥＰ）分増分された際に、スタートアドレスＳＡ（ＷＴ）にループポイント（ＬＰ）を加算した値に戻り、以降は、スタートアドレスＳＡ（ＷＴ）にループポイント（ＬＰ）を加算した値からエンドポイント（ＥＰ）分増分されるまでの読み出しアドレスを繰り返し発生するようになる。この際の読み出しアドレスの変化は、位相データに応じたレートとなる。そして、キーオン信号により発音停止された際にＡＤＧ出力は停止されるようになる。このＡＤＧ出力である読み出しアドレスにより波形データ記憶部２２から読み出された波形データは、位相データに応じた周波数となる。なお、スタートアドレスＳＡ（ＷＴ）により波形データ記憶部２２から読み出される波形データの種類を選択することができることから、例えば、ＷＴボイス部１０ａ〜１０ｉ毎にスタートアドレスＳＡ（ＷＴ）を選択することにより、ＷＴボイス部１０ａ〜１０ｉ毎の音色を異ならせることができるようになる。
【００２３】
次に、ＡＤＧ２１がＷＴボイス部１０ａのアドレス発生器であって、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０とされて有声音ピッチ信号を発生する際の動作を説明する。ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０とされると、アンドゲートＡＮＤが開くが、ＷＴボイス１０ａには有声音ピッチ信号が供給されていないため、オアゲートＯＲからはキーオン信号のみが出力される。従って、累算器４１はキーオン信号により初期値にリセットされ、ＰＧ２０から供給される発生すべき有声音ピッチ信号に応じた位相データの累算を開始する。この累算はクロック毎に行われ、その累算値ｂはセレクタ４６および減算器４３に出力される。減算器４３にデータａを供給するセレクタ４２はＨＶＭＯＤＥ＝１とされていることからあらかじめ定められている定数値をデータａとして選択し減算器４３に出力する。これにより、減算器４３で演算された減算値（ａ−ｂ）が出力され、減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算器４５に供給される。
【００２４】
また、減算値（ａ−ｂ）のＭＳＢ信号が選択信号としてセレクタ４６に供給されると共に、累算器４１にロード信号として供給される。ＭＳＢ信号は、減算値（ａ−ｂ）が負の値になった際、すなわち累算値が定数値に達した際に“１”になり、累算器４１にロード信号として供給されて、累算器４１にデータｃがロードされるようになる。データｃは、ＨＶＭＯＤＥ＝１とされていることからセレクタ４４において選択された“０”に、加算器４５において減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算された演算値とされる。累算器４１が次のクロックでデータｃに位相データを加算すると、ＭＳＢ信号は“０”になる。このようにして、ＭＳＢ信号はＰＧ２０から供給された有声音ピッチパラメータに基づく位相データに応じた周期、すなわち有声音ピッチの周期で発生されるようになる。そこで、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０が供給されたＷＴボイス１０ａでは、このＭＳＢ信号を有声音ピッチ信号として出力している。有声音ピッチ信号をグラフで図示すると図７に示すように有声音ピッチの周期を有するパルス信号となる。この場合において、ＷＴボイス部１０ａからはＡＤＧ出力も出力されるが、このＡＤＧ出力は読み出しアドレスとして使用しない。
【００２５】
次に、ＡＤＧ２１において、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０とされて有声音フォルマントを発生する際の動作を説明する。ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０とされると、ゲートＮＯＴの作用によりアンドゲートＡＮＤが開くためオアゲートＯＲから出力される有声音ピッチ信号およびキーオン信号によって累算器４１は初期値にリセットされ、ＰＧ２０から供給される発生すべき有声音フォルマントの中心周波数に応じた位相データの累算を開始する。アンドゲートＡＮＤには、ＷＴボイス部１０ａから出力される図７に示す有声音ピッチ信号が供給されている。累算器４１の累算はクロック毎に行われ、その累算値ｂはセレクタ４６および減算器４３に出力される。減算器４３にデータａを供給するセレクタ４２はＨＶＭＯＤＥ＝１とされていることから、あらかじめ定められている定数値をデータａとして選択し減算器４３に出力する。定数値とするのはフォルマントを形成する波形データのデータ量が固定値とされているからである。そして、減算器４３で演算された減算値（ａ−ｂ）が出力され、減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算器４５に供給される。
【００２６】
また、減算値（ａ−ｂ）のＭＳＢ信号が選択信号としてセレクタ４６に供給されると共に、累算器４１にロード信号として供給される。ＭＳＢ信号は、減算値（ａ−ｂ）が負の値になった際に“１”になることから、セレクタ４６は累算値ｂが定数値を超えるまでは累算値ｂを加算器４７に出力する。加算器４７に加算データを供給するセレクタ５０は、ＨＶＭＯＤＥ＝１とされていることからセレクタ４９の出力を選択して加算器４７に出力する。また、セレクタ４９はＵ／Ｖ＝０とされていることから、スタートアドレス発生器４８から出力される有声音フォルマントを形成する選択された波形データのスタートアドレスＳＡ（ＷＳ）をセレクタ４９に出力している。さらに、スタートアドレス発生器４８は、有声音フォルマントを形成するに適した波形を選択するよう入力されている波形選択（ＷＳ）信号に応じて波形データを選択するよう波形データ記憶部２２上のスタートアドレスＳＡを出力している。これにより、加算器４７においてスタートアドレスＳＡ（ＷＳ）に累算値ｂが加算され、ＡＤＧ出力として出力される。累算値ｂはクロック毎に位相データが累算されて位相データに応じたレートで変化していくことから、ＡＤＧ出力である有声音フォルマントを形成する波形データを読み出す読み出しアドレスも位相データに応じたレートで変化していくようになる。
【００２７】
そして、累算が進んで累算値が定数値に達すると、減算値（ａ−ｂ）が負の値となってＭＳＢ信号が“１”になり、セレクタ４６に供給される。すると、セレクタ４６からデータｃが出力されるようになるが、データｃは、ＨＶＭＯＤＥ＝１とされていることからセレクタ４４において選択された“０”に、加算器４５において減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算された演算値とされる。これにより、加算器４７から出力されるＡＤＧ出力は振幅値｜ａ−ｂ｜の読み出しアドレスとなる。また、ＭＳＢ信号は累算器４１にロード信号として供給されて、累算器４１にデータｃがロードされるようになる。そして、次のクロックで位相データがデータｃに加算されると、ＭＳＢ信号が“０”に戻ることから累算器４１から出力されるデータｂがセレクタ４６から出力されるようになる。累算器４１における位相データの累算はクロック毎に行われ、ＡＤＧ出力はスタートアドレスＳＡ（ＷＳ）から位相データに応じたレートで変化していき、定数値分だけ増分した際に再びスタートアドレスＳＡ（ＷＳ）に戻ることから、ＡＤＧ出力はスタートアドレスＳＡ（ＷＳ）から定数値分増分されるまでの読み出しアドレスを繰り返すようになる。この場合の位相データは有声音フォルマントの中心周波数に基づいていることから、読み出しアドレスは有声音フォルマントの中心周波数に応じたレートで変化するようになる。さらに、累算器４１はＷＴボイス部１０ａから出力される有声音ピッチ信号により初期値にリセットされることから、ＡＤＧ出力は有声音ピッチの周期毎にリセットされ、ＡＤＧ信号を読み出しアドレスとして波形データ記憶部２２から読み出した波形データにより形成される所定の中心周波数を有する有声音フォルマントに、ピッチ感を有させることができるようになる。
【００２８】
この場合のＡＤＧ出力をグラフで図示すると、図６に示すようになる。すなわち、キーオン信号が印加されると有声音フォルマントを形成させる波形データを選択するＷＳ信号に対応したスタートアドレスＳＡ（ＷＳ）が出力される。そして、累算器４１の作用により有声音フォルマントの中心周波数に応じたレートで変化する読み出しアドレスが上昇していきスタートアドレスＳＡ（ＷＳ）が定数値分増分された際に、スタートアドレスＳＡ（ＷＳ）に戻り、以降は、スタートアドレスＳＡ（ＷＳ）から定数値分増分した値までの読み出しアドレスを繰り返し発生するようになる。このＡＤＧ出力により、波形データ記憶部２２から選択された波形データを読み出すと、読み出された波形データにより所定の中心周波数の有声音フォルマントが形成されるようになる。そして、キーオン信号により発音停止された際にＡＤＧ出力は停止されるようになる。なお、スタートアドレスＳＡ（ＷＳ）すなわちＷＳ（有声音フォルマント）信号により波形データ記憶部２２から読み出される波形データの種類を選択することができ、これにより形成される有声音フォルマントのフォルマントを変化させることができる。また、図６では、累算器４１がＷＴボイス部１０ａから出力される有声音ピッチ信号により初期値にリセットされることは図示していない。
【００２９】
次に、ＡＤＧ２１において、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１とされて無声音フォルマントを発生する際の動作を説明する。ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１とされると、アンドゲートＡＮＤがゲートＮＯＴの作用により閉じるためオアゲートＯＲから出力されるキーオン信号によってのみ累算器４１は初期値にリセットされ、ＰＧ２０から供給される発生すべき無声音フォルマントの中心周波数に応じた位相データの累算を開始する。この累算はクロック毎に行われ、その累算値ｂはセレクタ４６および減算器４３に出力される。減算器４３にデータａを供給するセレクタ４２はＨＶＭＯＤＥ＝１とされていることからあらかじめ定められている定数値をデータａとして選択し減算器４３に出力する。定数値とするのはフォルマントを形成する波形データのデータ量が固定値とされているからである。そして、減算器４３で演算された減算値（ａ−ｂ）が出力され、減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算器４５に供給される。
【００３０】
また、減算値（ａ−ｂ）のＭＳＢ信号が選択信号としてセレクタ４６に供給されると共に、累算器４１にロード信号として供給される。ＭＳＢ信号は、減算値（ａ−ｂ）が負の値になった際に“１”になることから、セレクタ４６は累算値ｂが定数値を超えるまでは累算値ｂを加算器４７に出力する。加算器４７に加算データを供給するセレクタ５０は、ＨＶＭＯＤＥ＝１とされていることからセレクタ４９の出力を選択して加算器４７に出力する。また、セレクタ４９はＵ／Ｖ＝１とされていることから、サイン波の波形データのスタートアドレスＳＡ（サイン）をセレクタ４９に出力している。これは、サイン波が無声音フォルマントを形成するのに適しているからである。これにより、加算器４７においてスタートアドレスＳＡ（サイン）に累算値ｂが加算され、ＡＤＧ出力として出力される。累算値ｂはクロック毎に位相データが累算されて無声音フォルマントの中心周波数に応じたレートで変化していくことから、ＡＤＧ出力である無声音フォルマントを形成する波形データを読み出す読み出しアドレスも無声音フォルマントの中心周波数に応じたレートで変化していくようになる。
【００３１】
そして、累算値ｂが定数値を超えた際にＭＳＢ信号は“１”に変化することから、セレクタ４６は加算器４５から出力されるデータｃを出力するようになる。データｃは、ＨＶＭＯＤＥ＝１とされていることからセレクタ４４において選択された“０”に、加算器４５において減算値（ａ−ｂ）のＭＳＢが除外された振幅値｜ａ−ｂ｜が加算された演算値とされる。これにより、加算器４７から出力されるＡＤＧ出力は振幅値｜ａ−ｂ｜の読み出しアドレスとなる。また、ＭＳＢ信号は累算器４１にロード信号として供給されて、累算器４１にデータｃがロードされるようになる。そして、次のクロックで位相データがデータｃに加算されると、ＭＳＢ信号が“０”に戻ることから累算器４１から出力されるデータｂがセレクタ４６から出力されるようになる。累算器４１における位相データの累算は、クロック毎に行われＡＤＧ出力はスタートアドレスＳＡ（サイン）から位相データに応じたレートで変化していき、定数値分だけ増分した際に再びスタートアドレスＳＡ（サイン）に戻ることから、ＡＤＧ出力はスタートアドレスＳＡ（サイン）から定数値分増分されるまでの読み出しアドレスを繰り返すようになる。この場合の位相データは無声音フォルマントの中心周波数に基づいていることから、読み出しアドレスは無声音フォルマントの中心周波数に応じたレートで変化するようになる。このＡＤＧ信号を読み出しアドレスとして波形データ記憶部２２から読み出した波形データにより、所定の中心周波数を有する無声音フォルマントが形成される。
【００３２】
この場合のＡＤＧ出力をグラフで図示すると、図８に示すようになる。すなわち、キーオン信号が印加されると無声音フォルマントを形成させるサイン波の波形データのスタートアドレスＳＡ（サイン）が出力され、累算器４１の作用により無声音フォルマントの中心周波数に応じたレートで変化する読み出しアドレスが上昇していきスタートアドレスＳＡ（サイン）が定数値分増分された際に、スタートアドレスＳＡ（サイン）に戻り、以降は、スタートアドレスＳＡ（サイン）から定数値分増分した値までの読み出しアドレスを繰り返し発生するようになる。このＡＤＧ出力により、波形データ記憶部２２からサイン波の波形データを読み出すと、読み出された波形データにより所定の中心周波数の無声音フォルマントが形成されるようになる。そして、キーオン信号により発音停止された際にＡＤＧ出力は停止されるようになる。
【００３３】
ここで、波形データ記憶部２２に記憶されている有声音フォルマントあるいは無声音フォルマントを形成するための複数種類の波形データの波形形状の一例を図１４に示す。
図１４では、波形データ記憶部２２に３２種類の波形形状の波形データが記憶されている例が示されており、ＷＳ（有声音フォルマント）信号として“０”をセットすると、０番のサイン波が読み出されるようになり、例えばＷＳ（有声音フォルマント）信号として“１６”をセットすると、１６番の三角波が読み出されるようになる。また、スタートアドレスＳＡ（サイン）は０番のサイン波の波形データ記憶部２２上のスタートアドレスとされている。これらの３２種類の波形データのデータ量は固定とされており、このデータ量に前記した定数値が対応している。従って、ＡＤＧ２１から出力されるＡＤＧ出力により３２種類の波形データのいずれかを読み出すと、選択された波形形状の波形データが発音停止されるまで繰り返し読み出されるようになる。
【００３４】
図２に戻り波形データ記憶部２２から読み出された波形データは乗算器２３に供給され、エンベロープ発生器（ＥＧ）２４により発生されたエンベロープ信号が乗算される。ＥＧ２４には、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報と、楽音パラメータとしてアタックレートＡＲ（ＷＴ）、ディケイレートＤＲ（ＷＴ）、サスティンレートＳＲ（ＷＴ）、リリースレートＲＲ（ＷＴ）、サスティンレベルＳＬ（ＷＴ）が供給され、さらに、楽音および音声に共通の発音開始を指示するキーオン（ＫｅｙＯｎ）信号が供給されている。
【００３５】
このようなエンベロープ発生器（ＥＧ）２４の詳細構成を示すブロック図を図９に示す。
楽音を発生する場合には図９に示すＥＧ２４において、ＨＶＭＯＤＥ＝０としてセレクタ６０においてアタックレートＡＲ（ＷＴ）を選択してセレクタ６１へ出力し、セレクタ６３においてディケイレートＤＲ（ＷＴ）を選択してセレクタ６１へ出力し、セレクタ６４においてリリースレートＲＲ（ＷＴ）を選択してセレクタ６１へ出力する。さらに、セレクタ６１にはサスティンレートＳＲ（ＷＴ）が入力されている。セレクタ６１は、ステート制御部６６により制御されてアタック、ディケイ、サスティン、リリースの各ステート毎に当該ステートのエンベロープパラメータを選択して出力する。ステート制御部６６には、キーオン信号、音声モードフラグ（ＨＶＭＯＤＥ）が供給されると共に、サスティンレベルＳＬ（ＷＴ）信号が入力されている。また、ＷＴボイス部１０ａから出力される有声音ピッチ信号、無声音／有声音指示フラグ（Ｕ／Ｖ）も供給されているが、これらは使用されない。セレクタ６１からステートに応じて出力されるエンベロープパラメータは累算器（ＡＣＣ）６５により累算されてエンベロープが生成されてＥＧ出力として出力されると共に、ステート制御部６６に供給される。ステート制御部６６は、ＥＧ出力のレベルからステートを判断することができる。累算器６５ではキーオン信号の開始タイミングで累算をスタートする。
【００３６】
この場合のＥＧ出力をグラフで図示すると、図１０に示すようになる。すなわち、ステート制御部６６および累算器６５に供給されているキーオン信号が立ち上がると、ステート制御部６６は発音開始と判断してセレクタ６１から発音開始時のステートであるアタック用のアタックレートＡＲ（ＷＴ）のパラメータを出力させる。このアタックレートＡＲ（ＷＴ）のパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１０に示すＡＲのように急速に上昇していく。そして、ＥＧ出力のレベルが例えば０ｄＢに達すると、ステート制御部６６はステートがディケイに移行したと判断してセレクタ６１からディケイレートＤＲ（ＷＴ）のパラメータを出力させる。このディケイレートＤＲ（ＷＴ）のパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１０に示すＤＲのように急速に下降していく。
【００３７】
ＥＧ出力が下降していき、ＥＧ出力のレベルがサスティンレベルＳＬ（ＷＴ）に達すると、ステート制御部６６はそのことを検出してステートがサスティンに移行したと判断し、セレクタ６１からサスティンレートＳＲ（ＷＴ）のパラメータを出力させる。出力されたサスティンレートＳＲ（ＷＴ）のパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１０に示すＳＲのように緩やかな傾斜で下降していく。ステート制御部６６は、キーオン信号が立ち下がるまではサスティンを継続させ、ここで、キーオン信号が立ち下がりステート制御部６６が発音停止と判断すると、セレクタ６１からリリースレートＲＲ（ＷＴ）のパラメータを出力させる。出力されたリリースレートＲＲ（ＷＴ）のパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１０に示すＲＲのように急速に傾斜で下降していき発音が停止されるようになる。
【００３８】
次に、音声における有声音フォルマントを発生する場合には図９に示すＥＧ２４において、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝０としてセレクタ６０において初期ステート用の急速立ち上げレートを選択してセレクタ６１へ出力し、セレクタ６２でＵ／Ｖ＝０に応じて選択された中間ステート用の定数値をセレクタ６３において選択してセレクタ６１へ出力し、セレクタ６４において終了ステート用の急速減衰レートを選択してセレクタ６１へ出力する。さらに、セレクタ６１にはサスティンレートＳＲ（ＷＴ）が入力されているが、このパラメータは使用されない。セレクタ６１は、ステート制御部６６により制御されて初期、中間、終了の各ステート毎に当該ステートのエンベロープパラメータを選択して出力する。ステート制御部６６には、キーオン信号、ＷＴボイス部１０ａから出力される有声音ピッチ信号、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報が供給されている。また、サスティンレベルＳＬ（ＷＴ）信号が供給されているが、この場合は使用されない。セレクタ６１からステートに応じて出力されるエンベロープパラメータは累算器（ＡＣＣ）６５によりクロック毎に累算されてエンベロープが生成されてＥＧ出力として出力されると共に、ステート制御部６６に供給される。ステート制御部６６は、ＥＧ出力のレベルからステートを判断することができる。累算器６５ではキーオン信号の開始タイミングで累算をスタートする。
【００３９】
この場合のＥＧ出力をグラフで図示すると、図１１に示すようになる。すなわち、ステート制御部６６および累算器６５に供給されているキーオン信号が立ち上がると、ステート制御部６６は発音開始と判断してセレクタ６１から初期ステート用の急速立ち上げレートのパラメータを出力させる。この急速立ち上げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１１に示すように急激に上昇していく。そして、ＥＧ出力のレベルが所定レベルに達すると、ステート制御部６６は中間ステートに移行したと判断してセレクタ６１から中間ステート用の定数値のパラメータを出力させる。この定数値のパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１１に示すように緩やかに下降していく。
【００４０】
ここで、ステート制御部６６に図７に示す有声音ピッチ信号が入力されると、ステート制御部６６はセレクタ６１を制御して急速立ち下げレートのパラメータを選択して累算器６５に出力する。この急速立ち下げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１１に示すように急激に下降していく。そして、ＥＧ出力のレベルが所定の最低レベルに達すると、ステート制御部６６はセレクタ６１を制御して急速立ち下げレートのパラメータを再び選択して累算器６５に出力する。この急速立ち上げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１１に示すように急激に上昇していく。そして、ＥＧ出力のレベルが所定レベルに達すると、ステート制御部６６は中間ステートに移行したと判断してセレクタ６１から中間ステート用の定数値のパラメータを出力させる。以下、同様の動作が繰り返し行われる。このように、有声音ピッチの周期を有するエンベロープとされるため、このエンベロープが乗算器２３で乗算された波形データにピッチ感を与えることができるようになる。
【００４１】
また、キーオン信号が立ち下がりステート制御部６６が発音停止と判断すると、ステート制御部６６はセレクタ６１を制御して急速立ち下げレートのパラメータを選択して累算器６５に出力する。この急速立ち下げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は急激に下降していき発音が停止されるようになる。
【００４２】
次に、音声における無声音フォルマントを発生する場合には図９に示すＥＧ２４において、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１としてセレクタ６０において初期ステート用の急速立ち上げレートを選択してセレクタ６１へ出力し、セレクタ６２でＵ／Ｖ＝１に応じて選択された中間ステート用の“０”をセレクタ６３において選択してセレクタ６１へ出力し、セレクタ６４において終了ステート用の急速減衰レートを選択してセレクタ６１へ出力する。さらに、セレクタ６１にはサスティンレートＳＲ（ＷＴ）が入力されているが、このパラメータは使用されない。セレクタ６１は、ステート制御部６６により制御されて初期、中間、終了の各ステート毎に当該ステートのエンベロープパラメータを選択して出力する。ステート制御部６６には、キーオン信号、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報が供給されている。また、ＷＴボイス部１０ａから出力される有声音ピッチ信号およびサスティンレベルＳＬ（ＷＴ）信号が供給されているが、この場合は使用されない。セレクタ６１からステートに応じて出力されるエンベロープパラメータは累算器（ＡＣＣ）６５により累算されてエンベロープが生成されてＥＧ出力として出力されると共に、ステート制御部６６に供給される。ステート制御部６６は、ＥＧ出力のレベルからステートを判断することができる。累算器６５ではキーオン信号の開始タイミングで累算をスタートする。
【００４３】
この場合のＥＧ出力をグラフで図示すると、図１２に示すようになる。すなわち、ステート制御部６６および累算器６５に供給されているキーオン信号が立ち上がると、ステート制御部６６は発音開始と判断してセレクタ６１から初期ステート用の急速立ち上げレートのパラメータを出力させる。この急速立ち上げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１２に示すように急激に上昇していく。そして、ＥＧ出力のレベルが所定レベルに達すると、ステート制御部６６は中間ステートに移行したと判断してセレクタ６１から中間ステート用の“０”のパラメータを出力させる。これにより、累算器６５から出力されるＥＧ出力は図１２に示すように、その値を維持するようになる。ここで、キーオン信号が立ち下がりステート制御部６６が発音停止と判断すると、ステート制御部６６はセレクタ６１を制御して急速立ち下げレートのパラメータを選択して累算器６５に出力する。この急速立ち下げレートのパラメータは、累算器６５においてクロック毎に累算されＥＧ出力は図１２に示すように急激に下降していき発音が停止されるようになる。
なお、図１０ないし図１２に示すＥＧ出力では直線的に変化しているエンベロープを形成するようにしたが、曲線的に変化するエンベロープを発生するようにしてもよい。また、ＥＧ２４の出力を波形データに乗算する乗算器２３は後述する加算器２５の後段に配置してもよい。
【００４４】
図２に戻り乗算器２３においてエンベロープが乗算された波形データは、加算器２５に供給されてノイズ発生部２６により発生されたノイズが加算される。ノイズは、例えばホワイトノイズとされる。この場合、ノイズ発生部２６には音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報が供給されており、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１とされて無声音フォルマントを発生する際にのみノイズを発生するようにしている。従って、加算器２５においては無声音フォルマントを形成するエンベロープが乗算された波形データにのみノイズが加算されて出力されるようになる。
【００４５】
ここで、ノイズ発生部２６の詳細構成を図１３に示す。図１３に示すように、ノイズ発生部２６におけるホワイトノイズ発生器７０から発生されたホワイトノイズは、４段のローパスフィルタ（ＬＰＦ１，ＬＰＦ２，ＬＰＦ３，ＬＰＦ４）７１，７２，７３，７４により帯域制限される。そして、ローパスフィルタ７４の出力は乗算器７５においてノイズのレベルが調整され、セレクタ７６に入力される。セレクタ７６はアンドゲート（ＡＮＤ）７７の出力により選択されており、アンドゲート７７はＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１とされて無声音フォルマントを発生する際にセレクタ７６において乗算器７５から出力されるノイズを出力している。また、ＨＶＭＯＤＥ＝１およびＵ／Ｖ＝１のいずれかが“０”とされて楽音あるいは有声音フォルマントを発生する際には、アンドゲート７７の出力によりセレクタ７６からはノイズに代えて“０”が出力される。これにより、加算器２５においては無声音フォルマントを形成するエンベロープが乗算された波形データにのみノイズが加算されて出力されるようになる。
【００４６】
ローパスフィルタ７１〜７４は同様の構成とされており、代表としてローパスフィルタ７１の構成が図１３に示されている。ローパスフィルタ７１において、ホワイトノイズ発生器７０から入力されたホワイトノイズは、遅延回路７０ａにより１サンプル時間遅延され係数乗算器７０ｂにおいて所定の係数が乗算され加算器７０ｄに入力される。また、入力されたホワイトノイズは係数乗算器７０ｃにおいて所定の係数が乗算され加算器７０ｄに入力されて、係数乗算器７０ｂの出力に加算される。加算器７０ｄの出力がローパスフィルタ出力となる。このような構成の、例えば４段のローパスフィルタ７１〜７４によりホワイトノイズの帯域制限を行うことにより、音声における耳につく感じを抑制することができるようになる。なお、乗算器７５におけるノイズレベルのレベル調整は必ずしも必要なものではなく、省略するようにしてもよい。
【００４７】
図２に戻り加算器２５から出力された波形データは、乗算器２７に供給されて出力レベルが調整される。乗算器２７には、音声モードフラグ（ＨＶＭＯＤＥ）、無声音／有声音指示フラグ（Ｕ／Ｖ）のフラグ情報と、楽音の出力レベルを示すレベル（ＷＴ）、有声音フォルマントの出力レベルを示すレベル（有声音フォルマント）、無声音フォルマントの出力レベルを示すレベル（無声音フォルマント）が供給されている。そして、ＨＶＭＯＤＥ＝０とされて楽音を発生する場合には、乗算器２７においてレベル（ＷＴ）が乗算されて楽音の波形データの出力レベルが調整される。また、ＨＶＭＯＤＥ＝１、Ｕ／Ｖ＝０とされて有声音フォルマントを発生する場合には、乗算器２７においてレベル（有声音フォルマント）が乗算されて有声音フォルマントを形成する波形データの出力レベルが調整される。これにより、有声音フォルマントのレベルが所定のレベルとなる。さらに、ＨＶＭＯＤＥ＝１、Ｕ／Ｖ＝１とされて無声音フォルマントを発生する場合には、乗算器２７においてレベル（無声音フォルマント）が乗算されて無声音フォルマントを形成する波形データの出力レベルが調整される。これにより、無声音フォルマントのレベルが所定のレベルとなる。
【００４８】
以上の説明では、本発明にかかる音源装置と兼用される音声合成装置は９つの波形データ記憶部を有するＷＴボイス部から構成したが、これに限るものではなく９未満でも９を超えるＷＴボイス部としてもよい。９を超えるＷＴボイス部とすると、楽音の同時発音数を増加させることができると共に、合成するフォルマント数を増加することができ種々の音声を合成することができる。
また、本発明にかかる音源装置と兼用される音声合成装置は、音声モードフラグ（ＨＶＭＯＤＥ）で楽音を指定した場合には、複数のＷＴボイス部は楽音形成部として機能し、音声モードフラグ（ＨＶＭＯＤＥ）で音声を指定した場合には、複数のＷＴボイス部はフォルマント形成部として機能するようになる。また、音声モードフラグ（ＨＶＭＯＤＥ）を音声に固定することにより、専用の音声合成装置として使用することができる。
【００４９】
【発明の効果】
本発明は以上説明したように、複数の波形テーブルボイス部である複数のフォルマント形成部により所望のフォルマント中心周波数および所望のフォルマントレベルをそれぞれ有するフォルマントを形成し、形成された複数のフォルマントを合成することにより音声を合成している。そして、フォルマントを形成する波形データにピッチ周期のエンベロープ信号を付与するようにしている。これにより、フォルマントにピッチ感を有させることができ、高品位のリアリティのある音声を合成することができるようになる。また、有声音フォルマントを形成する波形データにピッチ周期のエンベロープ信号を付与することにより、有声音フォルマントにピッチ感を有させることができる。
【００５０】
また、複数の波形テーブルボイス部から楽音パラメータに基づいて出力される波形データを、ミキシングすることにより複数の楽音を発生することができ、複数の波形テーブルボイス部から音声パラメータに基づいて出力される有声音フォルマントあるいは無声音フォルマントを形成する波形データを合成することにより音声を合成することができる。このように、複数の波形テーブルボイス部を楽音発生と音声合成とで兼用することができるため、本発明の音声合成装置は音源装置と兼用することができるようになる。
【図面の簡単な説明】
【図１】本発明の実施の形態の音源装置と兼用される音声合成装置の構成を示すブロック図である。
【図２】本発明の実施の形態の音源装置と兼用される音声合成装置におけるＷＴボイス部の概略構成を示すブロック図である。
【図３】本発明の実施の形態の音源装置と兼用される音声合成装置における位相データ発生器の詳細構成を示すブロック図である。
【図４】本発明の実施の形態の音源装置と兼用される音声合成装置におけるアドレス発生器の詳細構成を示すブロック図である。
【図５】本発明の実施の形態の音源装置と兼用される音声合成装置におけるアドレス発生器のＡＤＧ出力の一例を示すグラフである。
【図６】本発明の実施の形態の音源装置と兼用される音声合成装置におけるアドレス発生器のＡＤＧ出力の他の例を示すグラフである。
【図７】本発明の実施の形態の音源装置と兼用される音声合成装置におけるアドレス発生器の有声音ピッチ信号の波形を示す図である。
【図８】本発明の実施の形態の音源装置と兼用される音声合成装置におけるアドレス発生器のＡＤＧ出力のさらに他の例を示すグラフである。
【図９】本発明の実施の形態の音源装置と兼用される音声合成装置におけるエンベロープ発生器の詳細構成を示すブロック図である。
【図１０】本発明の実施の形態の音源装置と兼用される音声合成装置におけるエンベロープ発生器のＥＧ出力の一例を示すグラフである。
【図１１】本発明の実施の形態の音源装置と兼用される音声合成装置におけるエンベロープ発生器のＥＧ出力の他の例を示すグラフである。
【図１２】本発明の実施の形態の音源装置と兼用される音声合成装置におけるエンベロープ発生器のＥＧ出力のさらに他の例を示すグラフである。
【図１３】本発明の実施の形態の音源装置と兼用される音声合成装置におけるノイズ発生部の詳細構成を示すブロック図である。
【図１４】本発明の実施の形態の音源装置と兼用される音声合成装置における波形データ記憶部に記憶されている有声音フォルマントあるいは無声音フォルマントを形成するための複数種類の波形データの波形形状の一例を示す図である。
【符号の説明】
１音声合成装置、１０ＷＴボイス部、１０ａ，１０ｂ，１０ｃ，１０ｄ，１０ｅ，１０ｆ，１０ｇ，１０ｈ，１０ｉＷＴボイス部、１１ミキシング手段、２０位相データ発生器、２１アドレス発生器、２２波形データ記憶部、２３乗算器、２５加算器、２６ノイズ発生部、２７乗算器、３０セレクタ、３１セレクタ、３２セレクタ、３３セレクタ、３４シフター、４１累算器、４２セレクタ、４３減算器、４４セレクタ、４５加算器、４６セレクタ、４７加算器、４８スタートアドレス発生器、４９セレクタ、５０セレクタ、６０セレクタ、６１セレクタ、６２セレクタ、６３セレクタ、６４セレクタ、６５累算器、６６ステート制御部、７０ホワイトノイズ発生器、７０ａ遅延回路、７０ｂ係数乗算器、７０ｃ係数乗算器、７０ｄ加算器、７１，７２，７３，７４ローパスフィルタ、７５乗算器、７６セレクタ、７７アンドゲート、ＡＲアタックレート、ＢＬＯＣＫオクターブ情報、ＤＲディケイレート、ＥＰエンドポイント、ＦＮＵＭ周波数情報、ＬＰループポイント、ＲＲリリースレート、ＳＡスタートアドレス、ＳＬサスティンレベル、ＳＲサスティンレート
[0001]
[Technical field of the invention]
The present invention relates to a speech synthesizer capable of synthesizing speech by combining a plurality of formants.
[0002]
[Prior art]
One conventional speech synthesizer works on the principle that a short segment of speech, several ms to several tens of ms long, can be regarded as stationary and represented as the sum of a few sine waves. A known synthesizer of this kind forms voiced sounds by resetting the phase of the sine wave generators at the pitch period, and forms unvoiced sounds by randomizing the phase initialization timing of the sine wave generators so as to spread the spectrum (see, for example, Patent Document 1).
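The conventional principle described above can be illustrated with a minimal Python sketch that sums a few sine waves and restarts their phase at every pitch period to obtain a voiced frame. The function name, the sample rate, and the partial frequencies and amplitudes are assumptions chosen for illustration; they are not taken from Patent Document 1.

```python
import math

def sine_sum_frame(freqs, amps, pitch_hz, sample_rate=8000, n_samples=160):
    """Sum of sine waves whose phase is reset at every pitch period.

    Hypothetical sketch of the prior-art principle; every parameter
    value here is an assumption, not data from the cited document.
    """
    period = max(1, round(sample_rate / pitch_hz))  # pitch period in samples
    out = []
    for n in range(n_samples):
        t = (n % period) / sample_rate  # time elapsed since the last phase reset
        out.append(sum(a * math.sin(2 * math.pi * f * t)
                       for f, a in zip(freqs, amps)))
    return out

# Three assumed partials, pitch of 100 Hz (period of 80 samples at 8 kHz).
frame = sine_sum_frame([700.0, 1200.0, 2600.0], [1.0, 0.5, 0.25], pitch_hz=100.0)
```

Because every sine restarts from zero phase at each period boundary, the frame repeats exactly with the pitch period, which is what gives the sum its voiced character.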
[0003]
[Patent Document 1]
Japanese Patent Publication No. 58-53351
[0004]
[Problems to be solved by the invention]
However, the speech that a conventional speech synthesizer can produce is of low quality and lacks realism.
Therefore, an object of the present invention is to provide a speech synthesizer capable of synthesizing high-quality speech.
[0005]
[Means for Solving the Problems]
To achieve the above object, the speech synthesizer of the present invention comprises a plurality of formant forming units, each forming a formant having a desired formant center frequency and a desired formant level, and synthesizes speech by combining the formants formed by the plurality of formant forming units. Each of the formant forming units comprises: waveform shape designating means for designating a desired waveform shape from among a plurality of types of waveform shapes; waveform data storage means for storing a plurality of waveform data corresponding to the plurality of types of waveform shapes; waveform data reading means for generating an address that changes at a rate corresponding to the formant center frequency and reading, from the waveform data storage means, the waveform data corresponding to the waveform shape designated by the waveform shape designating means; and envelope applying means for forming an envelope signal shaped to decay rapidly at each timing corresponding to the pitch period and to rise rapidly after the decay, and applying the formed envelope signal to the waveform data read from the waveform data storage means by the waveform data reading means.
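The envelope shape named above (a rapid decay at each pitch timing followed by a rapid rise) can be sketched in Python as a small linear state machine. This is only an illustration under stated assumptions: the rates, the levels, the gentle decline between pitch timings, and the state names are hypothetical and are not values given in this description.

```python
def pitch_envelope(n_samples, pitch_period, rise=0.25, fall=0.25, hold_slope=0.002):
    """Linear envelope that dips at every pitch-period boundary.

    Hypothetical sketch: all rates and levels are assumed values.
    """
    level, state, out = 0.0, "rise", []
    for n in range(n_samples):
        if n > 0 and n % pitch_period == 0:
            state = "fall"                  # pitch timing: start the rapid decay
        if state == "fall":
            level = max(0.0, level - fall)
            if level == 0.0:
                state = "rise"              # minimum reached: rise again
        elif state == "rise":
            level = min(1.0, level + rise)
            if level == 1.0:
                state = "hold"
        else:                               # "hold": gentle decline until next pitch timing
            level = max(0.0, level - hold_slope)
        out.append(level)
    return out

env = pitch_envelope(n_samples=240, pitch_period=80)
```

Multiplying waveform data by such an envelope imposes the pitch period on the amplitude of the formant, which is the effect the envelope applying means is described as providing.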
[0006]
In the speech synthesizer of the present invention described above, a voiced sound may be synthesized by combining the plurality of formants formed by the plurality of formant forming units.
[0007]
According to the present invention, the plurality of formant forming units each form a formant having a desired formant center frequency and a desired formant level, and speech is synthesized by combining the formed formants. An envelope signal with the pitch period is applied to the waveform data forming each formant. This gives the formants a sense of pitch, so that high-quality, realistic speech can be synthesized. Likewise, applying an envelope signal with the pitch period to the waveform data forming a voiced sound formant gives the voiced sound formant a sense of pitch.
[0008]
[Embodiments of the invention]
FIG. 1 is a block diagram showing the configuration of a speech synthesizer that doubles as a sound source device according to an embodiment of the present invention.
The speech synthesizer 1 shown in FIG. 1 comprises nine waveform table voice (WT voice) units 10a, 10b, 10c, 10d, 10e, 10f, 10g, 10h, and 10i, each including at least a waveform data storage unit that stores waveform data of a plurality of types of waveform shapes and reading means for reading predetermined waveform data from the waveform data storage unit, and mixing means 11 for mixing the waveform data output from the WT voice units 10a to 10i. The mixing means 11 outputs the generated musical tones or the synthesized speech. Musical tone parameters and speech parameters are supplied to the WT voice units 10a to 10i as the various parameters. When the voice mode flag (HVMODE), which selects between musical tone generation and speech generation, indicates musical tone generation (HVMODE = 0), the musical tone parameters are selected and used by the WT voice units 10a to 10i. Waveform data of a plurality of musical tones generated by the WT voice units 10a to 10i on the basis of the selected musical tone parameters is then output, and the mixing means 11 outputs musical tones consisting of up to nine tones.
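The arrangement of FIG. 1 in musical tone mode can be sketched, purely for illustration, as nine wavetable readers feeding a summing mixer. In the following Python sketch the table length, the sample rate, the tone frequencies, and the names `wt_voice` and `mix` are all assumptions; the description does not specify them.

```python
import math

TABLE_SIZE = 64  # assumed wavetable length
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def wt_voice(freq_hz, n_samples, sample_rate=8000, table=SINE_TABLE):
    """Read a wavetable at a rate set by freq_hz (phase accumulation)."""
    step = freq_hz * len(table) / sample_rate   # address increment per sample
    phase, out = 0.0, []
    for _ in range(n_samples):
        out.append(table[int(phase) % len(table)])
        phase += step
    return out

def mix(voice_outputs):
    """Mix per-voice sample lists by summing them sample by sample."""
    return [sum(samples) for samples in zip(*voice_outputs)]

# Nine voices, one tone each (assumed frequencies), mixed into one output.
tones = [wt_voice(110.0 * (k + 1), 100) for k in range(9)]
mixed = mix(tones)
```

A plain sum is the simplest possible mixer; a real device would also scale each voice by its output level before mixing.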
[0009]
When the voice mode flag (HVMODE) indicates speech generation (HVMODE = 1), the speech parameters are selected and used by the WT voice units 10a to 10i. On the basis of the selected speech parameters, the WT voice units 10a to 10i output a voiced sound pitch signal or waveform data forming voiced sound formants or unvoiced sound formants, and the mixing means 11 combines the waveform data forming the voiced sound formants and unvoiced sound formants to output a single voice. HV in HVMODE stands for Human Voice. U/V is the unvoiced sound / voiced sound indication flag. When HVMODE = 1 and U/V = 0 are supplied, the WT voice units 10b to 10i output waveform data forming voiced sound formants. The WT voice unit 10a supplied with HVMODE = 1 and U/V = 0 outputs a voiced sound pitch signal that defines the pitch period of the voiced sound; its waveform data is not used. The voiced sound pitch signal output from the WT voice unit 10a is supplied to the WT voice units 10b to 10i, where the phase of the waveform data forming each voiced sound formant is reset at every period of the voiced sound pitch signal. The envelope shape of each voiced sound formant also follows the period of the voiced sound pitch signal. This gives the voiced sound formants a sense of pitch.
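The interaction between the WT voice unit 10a and the formant voices can be sketched as two phase accumulators, one producing a pulse per pitch period and one whose read address is reset by that pulse. The following Python sketch is a simplified illustration: the integer increments, the limit of 80, the table length of 64, and the exact wrap rule (pulse when the accumulated value reaches the limit, keeping the overshoot) are assumptions rather than values from this description.

```python
def pitch_pulses(phase_inc, limit, n_samples):
    """Emit 1 whenever the accumulated phase reaches `limit`, else 0."""
    acc, out = 0, []
    for _ in range(n_samples):
        acc += phase_inc
        if acc >= limit:
            acc -= limit        # keep the overshoot, as a wrapping accumulator would
            out.append(1)
        else:
            out.append(0)
    return out

def formant_addresses(phase_inc, table_len, pulses):
    """Read addresses for one formant voice, reset to 0 on every pitch pulse."""
    acc, out = 0, []
    for pulse in pulses:
        if pulse:
            acc = 0             # phase reset at the pitch period
        out.append(acc % table_len)
        acc += phase_inc
    return out

# Assumed values: one pulse every 80 samples, formant step of 7 addresses per sample.
pulses = pitch_pulses(phase_inc=1, limit=80, n_samples=240)
addrs = formant_addresses(phase_inc=7, table_len=64, pulses=pulses)
```

The formant voice keeps cycling through its table at its own rate, but it restarts from the beginning of the table at every pulse, so the formant waveform is synchronized to the pitch period.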
[0010]
When HVMODE = 1 and U/V = 1 are supplied to the WT voice units 10b to 10i, the WT voice units 10b to 10i output waveform data forming unvoiced sound formants. The output of the WT voice unit 10a, to which HVMODE = 1 and U/V = 1 are supplied, is not used. Thus, with HVMODE = 1, the WT voice units 10b to 10i can output up to eight formants, each a voiced sound formant or an unvoiced sound formant.
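The roles that the two flags assign to the nine WT voice units, as described in the preceding paragraphs, can be summarized in a short Python sketch. The function name, the role strings, and the mapping of unit 10a to index 0 are illustrative assumptions.

```python
def voice_role(voice_index, hvmode, uv):
    """Role of WT voice unit 10a..10i (index 0..8) for the given flags.

    Hypothetical helper: hvmode is the HVMODE flag, uv is the U/V flag.
    """
    if hvmode == 0:
        return "musical tone"                 # all nine units generate tones
    if voice_index == 0:                      # WT voice unit 10a
        return "voiced pitch signal" if uv == 0 else "unused"
    return "voiced formant" if uv == 0 else "unvoiced formant"

# Roles of all nine units when a voiced sound is synthesized.
roles = [voice_role(i, hvmode=1, uv=0) for i in range(9)]
```

With HVMODE = 1 and U/V = 0, `roles` holds one pitch-signal entry followed by eight voiced-formant entries, matching the limit of eight formants stated above.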
[0011]
To explain speech briefly: the source of speech is the vibration of the vocal cords, which hardly changes even when different words are pronounced. Various speech sounds arise when the resonances produced by the opening of the mouth, the shape of the throat, and so on, together with the accompanying fricative and plosive sounds, are added to the vocal cord vibration. Such speech contains several regions along the frequency axis, called formants, in which the spectrum is concentrated in a particular frequency range. The frequency at the middle of a formant, or the frequency of maximum amplitude, is the formant center frequency. The number of formants in a voice and the center frequency, amplitude, and bandwidth of each formant are factors that determine the character of the voice, and they differ greatly with the speaker's sex, physique, age, and so on. In addition, each type of spoken sound has its own characteristic combination of formants, and this combination does not depend on voice quality. Formants fall broadly into two types: voiced sound formants, which have a sense of pitch and are used to synthesize voiced sounds, and unvoiced sound formants, which have no sense of pitch and are used to synthesize unvoiced sounds. A voiced sound is one in which the vocal cords vibrate during pronunciation; voiced sounds include vowels, semivowels, and the voiced consonants used in the ba, ga, ma, and ra rows of the Japanese syllabary. An unvoiced sound is one in which the vocal cords do not vibrate during pronunciation; the consonants of the ha, ka, and sa rows, for example, are unvoiced.
[0012]
In the voice synthesizer 1 having the configuration shown in FIG. 1, which also serves as a sound source device according to the present invention, HVMODE = 0 is set when musical tones are to be generated, so that the WT voice units 10a to 10i each generate a musical tone. That is, up to nine musical tones can be sounded simultaneously.
When voice is synthesized, HVMODE = 1 is set, and the WT voice units 10b to 10i form voiced sound formants or unvoiced sound formants according to whether the sound to be synthesized is voiced or unvoiced. The synthesized voice is then a combination of up to eight formants. For example, when the sound to be synthesized is voiced, U/V = 0 is supplied to the WT voice units 10b to 10i, each of which forms a voiced sound formant based on the voice parameters supplied to it. At this time U/V = 0 is also supplied to the WT voice unit 10a, which generates a voiced sound pitch signal based on its supplied voice parameters. This voiced sound pitch signal is supplied to the WT voice units 10b to 10i, where the phase of the output waveform data forming each voiced sound formant is reset every period of the voiced sound pitch signal. In addition, the envelope applied to each voiced sound formant has a shape that follows the period of the voiced sound pitch signal. As a result, the WT voice units 10b to 10i form voiced sound formants having a sense of pitch.
[0013]
When the sound to be synthesized is unvoiced, HVMODE = 1 and U/V = 1 are supplied to the WT voice units 10b to 10i, each of which forms an unvoiced sound formant based on the voice parameters supplied to it. As described later, noise is added to the unvoiced sound formants. This makes it possible to synthesize high-quality, realistic speech. Note that the output of the WT voice unit 10a is not used when an unvoiced sound is synthesized.
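The role each WT voice unit plays under the two flags can be summarized in a small lookup. The following is only an illustrative sketch in Python; the function name `unit_roles` and the role strings are hypothetical labels chosen here and do not appear in the specification.

```python
def unit_roles(hvmode: int, uv: int) -> dict:
    """Map each WT voice unit (10a to 10i) to its role for the given flags.

    hvmode: 0 = musical tone mode, 1 = voice mode
    uv:     0 = voiced sound, 1 = unvoiced sound (only meaningful in voice mode)
    """
    units = [f"10{c}" for c in "abcdefghi"]
    if hvmode == 0:
        # Tone mode: all nine units generate musical tones.
        return {u: "musical tone" for u in units}
    # Voice mode: units 10b to 10i form formants of the requested kind.
    formant = "unvoiced sound formant" if uv else "voiced sound formant"
    roles = {u: formant for u in units[1:]}
    # Unit 10a supplies the pitch signal for voiced sounds and is unused otherwise.
    roles["10a"] = "unused" if uv else "voiced sound pitch signal"
    return roles
```

For example, `unit_roles(1, 0)` assigns the pitch signal to unit 10a and a voiced sound formant to each of the remaining eight units.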
[0014]
The WT voice units 10a to 10i in the voice synthesizer 1 have the same configuration, which is described below as the configuration of a WT voice unit 10. FIG. 2 is a block diagram illustrating a schematic configuration of the WT voice unit 10. In FIG. 2 and the subsequent figures, the notations (WT), (voiced sound formant), and (unvoiced sound formant) indicate that a parameter is used for generating a musical tone, a voiced sound formant, and an unvoiced sound formant, respectively.
In FIG. 2, a phase data generator (PG: Phase Generator) 20 generates phase data corresponding to one of the following: the pitch of the musical tone to be generated, the voiced sound pitch signal, the voiced sound formant center frequency, or the unvoiced sound formant center frequency. The PG 20 is supplied with the voice mode flag (HVMODE) and the unvoiced/voiced instruction flag (U/V) as flag information, and with the octave information BLOCK (WT) and the frequency information FNUM (WT) of the musical tone as musical tone parameters. As voice parameters, it is further supplied with the octave information BLOCK (voiced pitch) and frequency information FNUM (voiced pitch) of the voiced sound pitch signal, the octave information BLOCK (voiced sound formant) and frequency information FNUM (voiced sound formant) of the voiced sound formant, and the octave information BLOCK (unvoiced sound formant) and frequency information FNUM (unvoiced sound formant) of the unvoiced sound formant. The PG 20 selects among the supplied parameters according to the flag information and, based on the selected parameters, generates the phase data for the corresponding one of the four targets listed above.
[0015]
The detailed configuration of the PG 20 is shown in FIG. 3. In FIG. 3, the selector 30 selects, according to the state of the U/V flag, either the frequency information FNUM of the voiced sound pitch signal or voiced sound formant, or the frequency information FNUM of the unvoiced sound formant, and outputs it to the selector 31. The selector 31 selects, according to the state of the HVMODE flag, either the musical tone frequency information FNUM (WT) or the voice-related frequency information FNUM output from the selector 30, and outputs it to the shifter 34, where the selected frequency information FNUM is set. Likewise, the selector 32 selects, according to the state of the U/V flag, either the octave information BLOCK of the voiced sound pitch signal or voiced sound formant, or the octave information BLOCK of the unvoiced sound formant, and outputs it to the selector 33. The selector 33 selects, according to the state of the HVMODE flag, either the musical tone octave information BLOCK (WT) or the voice-related octave information BLOCK output from the selector 32, and outputs it to the shifter 34 as shift information. The frequency information FNUM set in the shifter 34 is shifted according to the octave information BLOCK. The PG 20 thus outputs, as the PG output, octave-adjusted phase data for generating the pitch of the musical tone, the voiced sound pitch signal, the voiced sound formant center frequency, or the unvoiced sound formant center frequency.
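The two-stage selection and the shift in the PG 20 can be sketched in Python as follows. This is a rough model under stated assumptions: the container `PgParams` and the function `phase_data` are hypothetical names, and the shift direction (a left shift by BLOCK, doubling the phase increment per octave) is assumed, since the text only says that FNUM is shifted according to BLOCK.

```python
from dataclasses import dataclass


@dataclass
class PgParams:
    fnum_wt: int         # FNUM (WT)
    block_wt: int        # BLOCK (WT)
    fnum_voiced: int     # FNUM of the voiced pitch (unit 10a) or voiced formant
    block_voiced: int    # BLOCK of the voiced pitch or voiced formant
    fnum_unvoiced: int   # FNUM (unvoiced sound formant)
    block_unvoiced: int  # BLOCK (unvoiced sound formant)


def phase_data(p: PgParams, hvmode: int, uv: int) -> int:
    # Selectors 30 and 32: pick voiced or unvoiced voice parameters by U/V.
    voice_fnum = p.fnum_unvoiced if uv else p.fnum_voiced
    voice_block = p.block_unvoiced if uv else p.block_voiced
    # Selectors 31 and 33: pick tone or voice parameters by HVMODE.
    fnum = voice_fnum if hvmode else p.fnum_wt
    block = voice_block if hvmode else p.block_wt
    # Shifter 34: ASSUMED to be a left shift, one octave per BLOCK step.
    return fnum << block
```

With `PgParams(100, 2, 50, 1, 30, 3)`, tone mode yields 100 shifted by 2, while voice mode with U/V = 1 yields 30 shifted by 3; the U/V flag has no effect when HVMODE = 0.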
[0016]
Returning to FIG. 2, the PG output from the PG 20 is input to an address generator (ADG: Address Generator) 21. The ADG 21 accumulates the phase data supplied as the PG output and thereby generates read addresses for reading waveform data of the desired waveform shape from the waveform data storage unit (WAVE TABLE) 22. The ADG 21 is supplied with the voice mode flag (HVMODE) and the unvoiced/voiced instruction flag (U/V) as flag information; with the start address SA (WT), loop point LP (WT), and end point EP (WT) as musical tone parameters; with a waveform selection (WS) signal, which selects a waveform suitable for forming a voiced sound formant, as a voice parameter; and with a key-on (KeyOn) signal, common to musical tones and voice, that instructs the start of sound generation.
[0017]
When a musical tone is generated, HVMODE = 0 is set, and at the start timing of the key-on signal the ADG 21 outputs the start address SA (WT), so that reading of the waveform data starts from the position in the waveform data storage unit 22 indicated by the start address SA (WT). By accumulating the phase data from the PG 20, the ADG 21 then sequentially outputs read addresses up to the end point EP (WT), changing at a rate corresponding to the pitch of the musical tone. As a result, the samples of the waveform data up to the position indicated by the end point EP (WT) are read out sequentially at a rate corresponding to the pitch. Next, the ADG 21 outputs the read address corresponding to the loop point LP (WT) and continues to accumulate the phase data from the PG 20, so that read addresses up to the end point EP (WT) are again output sequentially at a rate corresponding to the pitch. The samples from the position indicated by the loop point LP (WT) to the position indicated by the end point EP (WT) are thereby read out sequentially at that rate. The read addresses from the loop point LP (WT) to the end point EP (WT) are generated repeatedly until sound generation is stopped by the key-on signal. In this way, the desired waveform data can be read from the waveform data storage unit 22 at a rate corresponding to the pitch of the musical tone from the start to the stop of sound generation indicated by the key-on signal.
[0018]
When voice is synthesized, HVMODE = 1 is set, and at the start timing of the key-on signal reading of the waveform data starts from the position in the waveform data storage unit 22 indicated either by the start address designated by the WS (voiced sound formant) signal or by the predetermined start address for the unvoiced sound formant. By accumulating the phase data from the PG 20, the ADG 21 sequentially outputs read addresses within a fixed address range, changing at a rate corresponding to the center frequency of the voiced sound formant or unvoiced sound formant. As a result, samples of the waveform data are read sequentially from the waveform data storage unit 22 at a rate corresponding to that center frequency. In the WT voice unit 10a, the accumulated value obtained by accumulating the phase data from the PG 20 reaches a predetermined constant value once per voiced sound pitch period, and each time it does so a voiced sound pitch signal (pulse signal) is output.
[0019]
The detailed configuration of the ADG 21 is shown in FIG. 4. In FIG. 4, the phase data from the PG 20 is input to an accumulator (ACC: Accumulator) 41 and accumulated every clock to generate the increment of the read address. This increment is supplied through the selector 46 to the adder 47, where the start address is added to it to produce the read address, which is output from the ADG 21 as the ADG output.
The operation of the ADG 21 when generating a musical tone with HVMODE = 0 is described next. When HVMODE = 0, the AND gate AND is closed, so the accumulator 41 is reset to its initial value only by the key-on (KeyOn) signal output from the OR gate OR, and then starts accumulating the phase data corresponding to the pitch of the musical tone to be generated. The accumulation is performed every clock, and the accumulated value b is output to the selector 46 and the subtractor 43.
[0020]
The selector 42, which supplies the data a to the subtractor 43, selects the end point EP (WT) as the data a because HVMODE = 0. The subtractor 43 outputs the subtraction value (a - b), and the magnitude |a - b|, that is, the subtraction value without its MSB, is supplied to the adder 45. The MSB (Most Significant Bit) signal, which becomes "1" when the subtraction value (a - b) is negative, is supplied to the selector 46 as a selection signal and to the accumulator 41 as a load signal. Because the MSB signal becomes "1" only when (a - b) is negative, the selector 46 outputs the accumulated value b to the adder 47 until the accumulated value b exceeds the end point EP (WT). The selector 50, which supplies the addend to the adder 47, selects the start address SA (WT) because HVMODE = 0. The ADG output is therefore the accumulated value b plus the start address SA (WT). Since the accumulated value b is obtained by accumulating the phase data every clock and so changes at the rate of the phase data, the read address output as the ADG output also changes according to the phase data.
[0021]
When the accumulated value b exceeds the end point EP (WT), the MSB signal changes to "1", and the selector 46 outputs the data c from the adder 45. Because HVMODE = 0, the selector 44 selects the loop point LP (WT), and the data c is the value obtained in the adder 45 by adding the magnitude |a - b| to the loop point LP (WT). The ADG output from the adder 47 therefore becomes the read address of the loop point LP (WT) corrected by the magnitude |a - b|. Because the MSB signal has changed to "1", the load signal is also applied to the accumulator 41, and the data c is loaded into it. The MSB signal then returns to "0", so the selector 46 again outputs the value from the accumulator 41, which now starts from the data c. Since the accumulator 41 adds the phase data to the data c every clock and outputs the result as the accumulated value b, the ADG output changes from approximately the read address of the loop point LP (WT) at a rate corresponding to the phase data.
[0022]
The ADG output in this case is illustrated as a graph in the drawings. That is, when the key-on signal is applied, the start address SA (WT) is output, and the read address rises at a rate corresponding to the phase data. When it has been incremented from the start address SA (WT) by the end point (EP), it returns to the value obtained by adding the loop point (LP) to the start address SA (WT); thereafter, the read addresses from that value up to the start address SA (WT) plus the end point (EP) are generated repeatedly, again changing at a rate corresponding to the phase data. When sound generation is stopped by the key-on signal, the ADG output stops. The waveform data read from the waveform data storage unit 22 using the ADG output as the read address has a frequency corresponding to the phase data. Because the start address SA (WT) selects which waveform data is read from the waveform data storage unit 22, setting a different start address SA (WT) for each of the WT voice units 10a to 10i, for example, allows the WT voice units 10a to 10i to produce different timbres.
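The looped read-address generation for musical tones can be modeled in a few lines of Python. This is a simplified behavioral sketch, not the circuit itself: the function name `adg_tone_addresses` and the integer address units are assumptions, and the magnitude |a - b| is modeled with ordinary signed arithmetic.

```python
def adg_tone_addresses(phase, sa, lp, ep, clocks):
    """Return the read addresses for `clocks` clock ticks after key-on (HVMODE = 0)."""
    acc = 0  # accumulator 41, reset to its initial value by key-on
    out = []
    for _ in range(clocks):
        diff = ep - acc          # subtractor 43: a - b with a = EP
        if diff < 0:             # MSB = 1: the accumulated value has passed EP
            acc = lp + (-diff)   # adder 45: c = LP + |a - b|, loaded into the accumulator
        out.append(sa + acc)     # adder 47: add the start address SA
        acc += phase             # accumulate the phase data every clock
    return out
```

For instance, `adg_tone_addresses(3, 100, 4, 10, 8)` climbs from 100 to 109, then keeps cycling within the loop region, carrying the overshoot past EP into the position after LP so that the read rate stays constant.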
[0023]
Next, the operation of the ADG 21 is described for the case where it is the address generator of the WT voice unit 10a and generates the voiced sound pitch signal with HVMODE = 1 and U/V = 0. When HVMODE = 1 and U/V = 0, the AND gate AND is open, but because no voiced sound pitch signal is supplied to the WT voice unit 10a, only the key-on signal is output from the OR gate OR. The accumulator 41 is therefore reset to its initial value by the key-on signal and starts accumulating the phase data, supplied from the PG 20, that corresponds to the voiced sound pitch signal to be generated. The accumulation is performed every clock, and the accumulated value b is output to the selector 46 and the subtractor 43. The selector 42, which supplies the data a to the subtractor 43, selects a predetermined constant value as the data a because HVMODE = 1. The subtractor 43 outputs the subtraction value (a - b), and the magnitude |a - b|, that is, the subtraction value without its MSB, is supplied to the adder 45.
[0024]
The MSB signal of the subtraction value (a - b) is supplied to the selector 46 as a selection signal and to the accumulator 41 as a load signal. The MSB signal becomes "1" when the subtraction value (a - b) becomes negative, that is, when the accumulated value has reached the constant value, and the data c is then loaded into the accumulator 41. Because HVMODE = 1, the selector 44 selects "0", and the data c is the value obtained in the adder 45 by adding the magnitude |a - b| to "0". When the accumulator 41 adds the phase data to the data c at the next clock, the MSB signal returns to "0". The MSB signal is thus generated with a period determined by the phase data that the PG 20 derives from the voiced sound pitch parameters, that is, with the period of the voiced sound pitch. The WT voice unit 10a supplied with HVMODE = 1 and U/V = 0 therefore outputs this MSB signal as the voiced sound pitch signal. Shown as a graph, the voiced sound pitch signal is a pulse signal having the period of the voiced sound pitch, as in FIG. 7. In this case the WT voice unit 10a also produces an ADG output, but that output is not used as a read address.
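The way the WT voice unit 10a turns an accumulator overflow into a pitch pulse can be sketched in Python as below. The function name `pitch_pulses` and the sample values are hypothetical; the comparison is modeled with signed integers rather than an explicit MSB.

```python
def pitch_pulses(phase, constant, clocks):
    """Return the MSB signal (0 or 1) for each clock tick after key-on."""
    acc = 0  # accumulator 41, reset by key-on
    pulses = []
    for _ in range(clocks):
        diff = constant - acc    # subtractor 43: a - b with a = constant value
        msb = 1 if diff < 0 else 0
        if msb:
            acc = 0 + (-diff)    # selector 44 supplies 0; adder 45 gives c = 0 + |a - b|
        pulses.append(msb)       # the MSB signal is the voiced sound pitch signal
        acc += phase             # accumulate the phase data every clock
    return pulses
```

With `phase = 4` and `constant = 10`, pulses appear on average once every 2.5 clocks; a larger phase increment therefore gives a shorter pitch period.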
[0025]
Next, the operation of the ADG 21 when generating a voiced sound formant with HVMODE = 1 and U/V = 0 is described. When HVMODE = 1 and U/V = 0, the AND gate AND is opened through the action of the gate NOT, so the accumulator 41 is reset to its initial value by both the voiced sound pitch signal and the key-on signal output from the OR gate OR, and starts accumulating the phase data, supplied from the PG 20, that corresponds to the center frequency of the voiced sound formant to be generated. The voiced sound pitch signal supplied to the AND gate AND is the signal shown in FIG. 7 output from the WT voice unit 10a. The accumulator 41 accumulates every clock, and the accumulated value b is output to the selector 46 and the subtractor 43. Because HVMODE = 1, the selector 42, which supplies the data a to the subtractor 43, selects a predetermined constant value as the data a. A constant is used because the amount of waveform data forming a formant is fixed. The subtractor 43 outputs the subtraction value (a - b), and the magnitude |a - b|, that is, the subtraction value without its MSB, is supplied to the adder 45.
[0026]
The MSB signal of the subtraction value (a - b) is supplied to the selector 46 as a selection signal and to the accumulator 41 as a load signal. Because the MSB signal becomes "1" only when the subtraction value (a - b) is negative, the selector 46 outputs the accumulated value b to the adder 47 until the accumulated value b exceeds the constant value. The selector 50, which supplies the addend to the adder 47, selects the output of the selector 49 because HVMODE = 1. Because U/V = 0, the selector 49 passes on to the selector 50 the start address SA (WS) output from the start address generator 48, which is the start address of the selected waveform data forming the voiced sound formant. The start address generator 48 outputs the start address SA on the waveform data storage unit 22 of the waveform data designated by the waveform selection (WS) signal, which is input to select a waveform suitable for forming the voiced sound formant. The adder 47 thus adds the accumulated value b to the start address SA (WS) and outputs the sum as the ADG output. Since the accumulated value b is obtained by accumulating the phase data every clock and so changes at a rate corresponding to the phase data, the read address output as the ADG output, which reads the waveform data forming the voiced sound formant, also changes at a rate corresponding to the phase data.
[0027]
As accumulation proceeds and the accumulated value reaches the constant value, the subtraction value (a - b) becomes negative and the MSB signal, now "1", is supplied to the selector 46, which then outputs the data c. Because HVMODE = 1, the selector 44 selects "0", and the data c is the value obtained in the adder 45 by adding the magnitude |a - b| to "0". The ADG output from the adder 47 therefore becomes the read address corresponding to the magnitude |a - b|. The MSB signal is also supplied to the accumulator 41 as a load signal, so the data c is loaded into the accumulator 41. When the phase data is added to the data c at the next clock, the MSB signal returns to "0", and the selector 46 again outputs the value b from the accumulator 41. The accumulator 41 accumulates the phase data every clock, so the ADG output rises from the start address SA (WS) at a rate corresponding to the phase data; when it has been incremented by the constant value it returns to the start address SA (WS), and the read addresses from the start address SA (WS) up to that address plus the constant value are generated repeatedly. Since the phase data here is based on the center frequency of the voiced sound formant, the read address changes at a rate corresponding to that center frequency. Furthermore, because the accumulator 41 is reset to its initial value by the voiced sound pitch signal output from the WT voice unit 10a, the ADG output is reset every period of the voiced sound pitch. The voiced sound formant of the predetermined center frequency, formed by the waveform data read from the waveform data storage unit 22, can therefore have a sense of pitch.
[0028]
A graph of the ADG output in this case is shown in FIG. 6. That is, when the key-on signal is applied, the start address SA (WS) corresponding to the WS signal, which selects the waveform data for forming the voiced sound formant, is output. The read address then rises through the operation of the accumulator 41 at a rate corresponding to the center frequency of the voiced sound formant; when it has been incremented from the start address SA (WS) by the constant value, it returns to the start address SA (WS), and thereafter the read addresses from the start address SA (WS) up to that address plus the constant value are generated repeatedly. When the selected waveform data is read from the waveform data storage unit 22 using the ADG output, the read waveform data forms a voiced sound formant having the predetermined center frequency. When sound generation is stopped by the key-on signal, the ADG output stops. The waveform data read from the waveform data storage unit 22 can be chosen through the start address SA (WS), that is, through the WS (voiced sound formant) signal, so the voiced sound formant formed from it can be varied. Note that FIG. 6 does not show the resetting of the accumulator 41 to its initial value by the voiced sound pitch signal output from the WT voice unit 10a.
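The combination of wrap-around at the constant value and the phase reset by the voiced sound pitch signal, which FIG. 6 omits, can be sketched in Python as follows. The names `formant_addresses` and `table_len` are hypothetical, and the exact clock on which the reset takes effect is simplified here.

```python
def formant_addresses(phase, sa_ws, table_len, pitch_pulses):
    """Read addresses for a voiced sound formant (HVMODE = 1, U/V = 0).

    pitch_pulses: one 0/1 value per clock, the voiced sound pitch signal from unit 10a.
    table_len:    the constant value, i.e. the fixed data amount of one waveform.
    """
    acc = 0  # accumulator 41, reset by key-on
    out = []
    for pulse in pitch_pulses:
        if pulse:
            acc = 0                # reset through the AND and OR gates
        diff = table_len - acc     # subtractor 43: a - b with a = constant value
        if diff < 0:               # MSB = 1: wrap back toward the start address
            acc = -diff            # adder 45: c = 0 + |a - b|
        out.append(sa_ws + acc)    # adder 47: add the start address SA (WS)
        acc += phase               # rate set by the formant center frequency
    return out
```

In the call `formant_addresses(3, 64, 8, [0, 0, 0, 0, 0, 1, 0])`, the address wraps once on its own and is then forced back to 64 by the pitch pulse, which is what restarts the waveform phase every pitch period.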
[0029]
Next, the operation of the ADG 21 when generating an unvoiced sound formant with HVMODE = 1 and U/V = 1 is described. When HVMODE = 1 and U/V = 1, the AND gate AND is closed through the action of the gate NOT, so the accumulator 41 is reset to its initial value only by the key-on signal output from the OR gate OR, and starts accumulating the phase data, supplied from the PG 20, that corresponds to the center frequency of the unvoiced sound formant to be generated. The accumulation is performed every clock, and the accumulated value b is output to the selector 46 and the subtractor 43. The selector 42, which supplies the data a to the subtractor 43, selects a predetermined constant value as the data a because HVMODE = 1. A constant is used because the amount of waveform data forming a formant is fixed. The subtractor 43 outputs the subtraction value (a - b), and the magnitude |a - b|, that is, the subtraction value without its MSB, is supplied to the adder 45.
[0030]
The MSB signal of the subtraction value (a - b) is supplied to the selector 46 as a selection signal and to the accumulator 41 as a load signal. Because the MSB signal becomes "1" only when the subtraction value (a - b) is negative, the selector 46 outputs the accumulated value b to the adder 47 until the accumulated value b exceeds the constant value. The selector 50, which supplies the addend to the adder 47, selects the output of the selector 49 because HVMODE = 1. Because U/V = 1, the selector 49 outputs the start address SA (sine) of the sine wave waveform data to the selector 50. The sine wave is used because it is suitable for forming unvoiced sound formants. The adder 47 thus adds the accumulated value b to the start address SA (sine) and outputs the sum as the ADG output. Since the accumulated value b is obtained by accumulating the phase data every clock and so changes at a rate corresponding to the center frequency of the unvoiced sound formant, the read address output as the ADG output, which reads the waveform data forming the unvoiced sound formant, also changes at a rate corresponding to that center frequency.
[0031]
When the accumulated value b exceeds the constant value, the MSB signal changes to "1", and the selector 46 outputs the data c from the adder 45. Because HVMODE = 1, the selector 44 selects "0", and the data c is the value obtained in the adder 45 by adding the magnitude |a - b| to "0". The ADG output from the adder 47 therefore becomes the read address corresponding to the magnitude |a - b|. The MSB signal is also supplied to the accumulator 41 as a load signal, so the data c is loaded into the accumulator 41. When the phase data is added to the data c at the next clock, the MSB signal returns to "0", and the selector 46 again outputs the value b from the accumulator 41. The accumulator 41 accumulates the phase data every clock, so the ADG output rises from the start address SA (sine) at a rate corresponding to the phase data; when it has been incremented by the constant value it returns to the start address SA (sine), and the read addresses from the start address SA (sine) up to that address plus the constant value are generated repeatedly. Since the phase data here is based on the center frequency of the unvoiced sound formant, the read address changes at a rate corresponding to that center frequency. An unvoiced sound formant having the predetermined center frequency is formed by the waveform data read from the waveform data storage unit 22 using the ADG output as the read address.
[0032]
A graph of the ADG output in this case is shown in the drawings. That is, when the key-on signal is applied, the start address SA (sine) of the sine wave waveform data for forming the unvoiced sound formant is output, and the read address rises through the operation of the accumulator 41 at a rate corresponding to the center frequency of the unvoiced sound formant. When it has been incremented from the start address SA (sine) by the constant value, it returns to the start address SA (sine), and thereafter the read addresses from the start address SA (sine) up to that address plus the constant value are generated repeatedly. When the sine wave waveform data is read from the waveform data storage unit 22 using the ADG output, the read waveform data forms an unvoiced sound formant having the predetermined center frequency. When sound generation is stopped by the key-on signal, the ADG output stops.
[0033]
FIG. 14 shows an example of the waveform shapes of the plural types of waveform data, stored in the waveform data storage unit 22, for forming voiced sound formants or unvoiced sound formants.
In the example of FIG. 14, waveform data of 32 types of waveform shape are stored in the waveform data storage unit 22. When "0" is set as the WS (voiced sound formant) signal, the 0th waveform, a sine wave, is read; when "16" is set, for example, the 16th waveform, a triangular wave, is read. The start address SA (sine) is the start address, on the waveform data storage unit 22, of the 0th sine wave. Each of the 32 types of waveform data has the same fixed data amount, and the constant value mentioned above corresponds to this data amount. Accordingly, whichever of the 32 types of waveform data is read using the ADG output from the ADG 21, the waveform data of the selected waveform shape is read repeatedly until sound generation stops.
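One plausible way for a start address generator to map the WS signal to a start address is sketched below in Python. The layout is an assumption made only for illustration: the text states that the 32 waveforms have the same fixed data amount, but it does not say that they are stored contiguously in WS order, and the length `WAVE_LEN` and the function `start_address` are hypothetical.

```python
NUM_WAVES = 32    # number of waveform shapes in the FIG. 14 example
WAVE_LEN = 1024   # HYPOTHETICAL fixed data amount per waveform (the "constant value")


def start_address(ws: int, base: int = 0) -> int:
    """Start address of waveform number `ws`, ASSUMING contiguous storage in WS order."""
    if not 0 <= ws < NUM_WAVES:
        raise ValueError("WS must be in the range 0 to 31")
    return base + ws * WAVE_LEN


SA_SINE = start_address(0)  # the 0th waveform is the sine wave
```

Under this assumed layout, selecting the 16th waveform (the triangular wave in FIG. 14) gives `start_address(16)`, and every waveform occupies the same address span, which is why a single constant can serve as the wrap-around limit.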
[0034]
Returning to FIG. 2, the waveform data read from the waveform data storage unit 22 is supplied to a multiplier 23, where it is multiplied by an envelope signal generated by an envelope generator (EG) 24. The EG 24 is supplied with the voice mode flag (HVMODE) and the unvoiced/voiced instruction flag (U/V) as flag information; with the attack rate AR (WT), decay rate DR (WT), sustain rate SR (WT), release rate RR (WT), and sustain level SL (WT) as musical tone parameters; and with the key-on (KeyOn) signal, common to musical tones and voice, that instructs the start of sound generation.
[0035]
FIG. 9 is a block diagram showing a detailed configuration of such an envelope generator (EG) 24.
When a musical tone is generated, HVMODE = 0 is set in the EG 24 shown in FIG. 9: the selector 60 selects the attack rate AR (WT), the selector 63 selects the decay rate DR (WT), and the selector 64 selects the release rate RR (WT), each outputting its selection to the selector 61. The sustain rate SR (WT) is also input to the selector 61. Under the control of the state control unit 66, the selector 61 selects and outputs the envelope parameter for the current state: attack, decay, sustain, or release. The state control unit 66 is supplied with the key-on signal, the voice mode flag (HVMODE), and the sustain level SL (WT). It also receives the voiced sound pitch signal output from the WT voice unit 10a and the unvoiced/voiced instruction flag (U/V), but these are not used in this case. The envelope parameter output from the selector 61 according to the state is accumulated by an accumulator (ACC) 65 to generate the envelope, which is output as the EG output and also fed back to the state control unit 66, so that the state control unit 66 can determine the state from the level of the EG output. The accumulator 65 starts accumulating at the start timing of the key-on signal.
[0036]
FIG. 10 shows a graph of the EG output in this case. When the key-on signal supplied to the state control unit 66 and the accumulator 65 rises, the state control unit 66 determines that sound generation has started and causes the selector 61 to output the parameter of the attack rate AR (WT) for attack, the state at the start of sound generation. The attack rate AR (WT) parameter is accumulated every clock in the accumulator 65, and the EG output rises rapidly, as indicated by AR in FIG. 10. When the level of the EG output reaches, for example, 0 dB, the state control unit 66 determines that the state has shifted to decay and causes the selector 61 to output the parameter of the decay rate DR (WT). The decay rate DR (WT) parameter is accumulated every clock in the accumulator 65, and the EG output falls rapidly, as indicated by DR in FIG. 10.
[0037]
When the falling EG output reaches the sustain level SL (WT), the state control unit 66 detects this, determines that the state has shifted to sustain, and causes the selector 61 to output the parameter of the sustain rate SR (WT). The sustain rate SR (WT) parameter is accumulated every clock in the accumulator 65, and the EG output falls with a gentle slope, as indicated by SR in FIG. 10. The state control unit 66 maintains the sustain state until the key-on signal falls. When the key-on signal falls, the state control unit 66 determines that sound generation is to stop and causes the selector 61 to output the parameter of the release rate RR (WT). The release rate RR (WT) parameter is accumulated every clock in the accumulator 65, and the EG output falls rapidly with the slope indicated by RR in FIG. 10.
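The attack, decay, sustain, and release sequence for musical tones can be modeled as a small state machine in Python. This is a sketch under simplifying assumptions: the level is a linear value between 0.0 and a peak of 1.0 rather than a dB quantity, each rate is a per-clock step, and the function name `tone_envelope` and the state names are hypothetical.

```python
def tone_envelope(key_on, ar, dr, sr, rr, sl, peak=1.0):
    """Return the EG output for each clock, given the key-on signal per clock (0 or 1)."""
    level, state, prev, out = 0.0, "idle", 0, []
    for k in key_on:
        if k and not prev:
            level, state = 0.0, "attack"   # key-on rises: sound generation starts
        elif prev and not k:
            state = "release"              # key-on falls: sound generation stops
        prev = k
        if state == "attack":
            level = min(peak, level + ar)  # accumulate the attack rate
            if level >= peak:
                state = "decay"
        elif state == "decay":
            level = max(sl, level - dr)    # fall toward the sustain level SL
            if level <= sl:
                state = "sustain"
        elif state == "sustain":
            level = max(0.0, level - sr)   # gentle slope while the key is held
        elif state == "release":
            level = max(0.0, level - rr)   # rapid fall after key-off
        out.append(level)
    return out
```

Calling `tone_envelope([1] * 6 + [0] * 3, 0.5, 0.25, 0.0625, 0.25, 0.5)` produces a rise to the peak, a fall to the sustain level, a slow decline while the key is held, and a quick release to zero.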
[0038]
Next, when a voiced sound formant of voice is generated, HVMODE = 1 and U/V = 0 are set in the EG 24 shown in FIG. 9. The selector 60 then selects the rapid rise rate for the initial state and outputs it to the selector 61; the selector 63 selects the constant value for the intermediate state, which the selector 62 has selected according to U/V = 0, and outputs it to the selector 61; and the selector 64 selects the rapid fall rate for the end state and outputs it to the selector 61. The sustain rate SR (WT) is also input to the selector 61, but it is not used. Under the control of the state control unit 66, the selector 61 selects and outputs the envelope parameter for the current state: initial, intermediate, or end. The state control unit 66 is supplied with the key-on signal, the voiced sound pitch signal output from the WT voice unit 10a, the voice mode flag (HVMODE), and the unvoiced/voiced instruction flag (U/V). The sustain level SL (WT) is also supplied but is not used in this case. The envelope parameter output from the selector 61 according to the state is accumulated every clock by the accumulator (ACC) 65 to generate the envelope, which is output as the EG output and also fed back to the state control unit 66, so that the state control unit 66 can determine the state from the level of the EG output. The accumulator 65 starts accumulating at the start timing of the key-on signal.
[0039]
FIG. 11 shows a graph of the EG output in this case. When the key-on signal supplied to the state control unit 66 and the accumulator 65 rises, the state control unit 66 determines that sound generation has started and causes the selector 61 to output the parameter of the rapid rise rate for the initial state. The parameter of the rapid rise rate is accumulated for each clock in the accumulator 65, and the EG output rises rapidly as shown in FIG. 11. When the level of the EG output reaches a predetermined level, the state control unit 66 determines that the state has shifted to the intermediate state and causes the selector 61 to output the constant-value parameter for the intermediate state. This constant-value parameter is accumulated for each clock in the accumulator 65, and the EG output gradually decreases as shown in FIG. 11.
[0040]
Here, when the voiced sound pitch signal shown in FIG. 7 is input to the state control unit 66, the state control unit 66 controls the selector 61 to select the parameter of the rapid fall rate and output it to the accumulator 65. The parameter of the rapid fall rate is accumulated for each clock in the accumulator 65, and the EG output falls sharply as shown in FIG. 11. Then, when the level of the EG output reaches a predetermined minimum level, the state control unit 66 controls the selector 61 to select the parameter of the rapid rise rate again and output it to the accumulator 65. The parameter of the rapid rise rate is accumulated for each clock in the accumulator 65, and the EG output rises rapidly as shown in FIG. 11. When the level of the EG output reaches the predetermined level, the state control unit 66 determines that the state has shifted to the intermediate state and causes the selector 61 to output the constant-value parameter for the intermediate state. The same operation is repeated thereafter. Because the envelope thus has the period of the voiced sound pitch, multiplying the waveform data by this envelope in the multiplier 23 gives the waveform data a sense of pitch.
[0041]
When the key-on signal falls and the state control unit 66 determines that the sound is stopped, the state control unit 66 controls the selector 61 to select the parameter of the rapid fall rate and outputs it to the accumulator 65. The parameter of the rapid fall rate is accumulated for each clock in the accumulator 65, and the EG output sharply drops to stop the sound generation.
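The voiced-sound envelope behavior described above (rapid rise, gradual decrease in the intermediate state, rapid fall on each voiced sound pitch signal followed by an immediate rapid rise, and rapid fall at key-off) can be illustrated with a small software model. The following is only a minimal sketch: the class name, the state names, and all numeric rates and levels are assumptions chosen for illustration, since the specification gives no concrete values, and the actual EG 24 is a hardware circuit rather than a program.

```python
from dataclasses import dataclass


@dataclass
class VoicedEG:
    """Software sketch of the voiced-formant EG (all values are hypothetical)."""
    rise_rate: float = 0.25    # assumed rapid rise rate per clock
    fall_rate: float = -0.25   # assumed rapid fall rate per clock
    mid_rate: float = -0.01    # assumed constant value for the intermediate state
    peak: float = 1.0          # assumed "predetermined level"
    floor: float = 0.0         # assumed "predetermined minimum level"
    level: float = 0.0         # contents of the accumulator (EG output)
    state: str = "idle"

    def clock(self, key_on: bool, pitch_pulse: bool) -> float:
        # State control: react to the key-on signal and the voiced pitch signal.
        if not key_on:
            self.state = "release" if self.level > self.floor else "idle"
        elif self.state in ("idle", "release"):
            self.state = "rise"   # key-on has risen: sound generation starts
        elif pitch_pulse and self.state == "mid":
            self.state = "fall"   # pitch timing: drop the envelope rapidly
        # Selector: choose the envelope parameter for the current state.
        rate = {"idle": 0.0, "rise": self.rise_rate, "mid": self.mid_rate,
                "fall": self.fall_rate, "release": self.fall_rate}[self.state]
        # Accumulator: add the parameter once per clock, clamped to the range.
        self.level = min(self.peak, max(self.floor, self.level + rate))
        # State control: level-based transitions.
        if self.state == "rise" and self.level >= self.peak:
            self.state = "mid"
        elif self.state == "fall" and self.level <= self.floor:
            self.state = "rise"
        return self.level
```

Calling `clock(True, pulse)` once per sample, with `pulse` true at each pitch period, yields an envelope that repeats at the pitch period. In this sketch a pitch pulse that arrives while the envelope is still rising is simply ignored, which is a simplification made here and not something the specification states.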
[0042]
Next, when generating an unvoiced sound formant of a voice, HVMODE = 1 and U / V = 1 are set. In the EG 24 shown in FIG. 9, the selector 60 then selects the rapid rise rate for the initial state and outputs it to the selector 61, the selector 63 selects the "0" for the intermediate state, which is selected in accordance with U / V = 1, and outputs it to the selector 61, and the selector 64 selects the rapid fall rate for the end state and outputs it to the selector 61. The sustain rate SR (WT) is also input to the selector 61, but this parameter is not used. The selector 61 is controlled by the state control unit 66 to select and output the envelope parameter for each of the initial, intermediate, and end states. The state control unit 66 is supplied with the key-on signal and the flag information of the voice mode flag (HVMODE) and the unvoiced / voiced sound indication flag (U / V). The voiced sound pitch signal output from the WT voice unit 10a and the sustain level SL (WT) signal are also supplied, but are not used in this case. The envelope parameter output from the selector 61 in accordance with the state is accumulated by the accumulator (ACC) 65 to generate an envelope, which is output as the EG output and also supplied to the state control unit 66. The state control unit 66 can therefore determine the state from the level of the EG output. The accumulator 65 starts accumulation at the start timing of the key-on signal.
[0043]
FIG. 12 shows a graph of the EG output in this case. When the key-on signal supplied to the state control unit 66 and the accumulator 65 rises, the state control unit 66 determines that sound generation has started and causes the selector 61 to output the parameter of the rapid rise rate for the initial state. The parameter of the rapid rise rate is accumulated for each clock in the accumulator 65, and the EG output rises rapidly as shown in FIG. 12. When the level of the EG output reaches a predetermined level, the state control unit 66 determines that the state has shifted to the intermediate state and causes the selector 61 to output the parameter "0" for the intermediate state. The EG output from the accumulator 65 therefore holds its value as shown in FIG. 12. When the key-on signal then falls and the state control unit 66 determines that the sound is stopped, the state control unit 66 controls the selector 61 to select the parameter of the rapid fall rate and output it to the accumulator 65. The parameter of the rapid fall rate is accumulated for each clock in the accumulator 65, and the EG output falls rapidly as shown in FIG. 12.
Although the EG outputs shown in FIGS. 10 to 12 form envelopes that change linearly, envelopes that change along a curve may be generated instead. Further, the multiplier 23, which multiplies the waveform data by the output of the EG 24, may instead be arranged at the stage following the adder 25 described later.
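For comparison with the voiced case, the unvoiced-sound envelope (rapid rise, held value while the parameter "0" is accumulated, rapid fall at key-off) can be sketched as a function over a sequence of key-on flags. This is an illustrative sketch only: the function name, the state labels, and the rate and level values are assumptions, and a linear envelope is used as in FIG. 12.

```python
def unvoiced_envelope(key_on, rise_rate=0.25, fall_rate=0.25, peak=1.0):
    """Return the EG output per clock for a sequence of key-on flags.

    rise_rate, fall_rate and peak are hypothetical values; the specification
    only calls them a rapid rise rate, a rapid fall rate and a predetermined
    level.
    """
    level = 0.0          # accumulator contents
    state = "initial"
    out = []
    for on in key_on:
        if not on:
            state = "end"            # key-on has fallen: stop the sound
        elif state == "end":
            state = "initial"        # key-on has risen again
        if state == "initial":
            rate = rise_rate         # rapid rise
        elif state == "intermediate":
            rate = 0.0               # accumulating "0" holds the level
        else:
            rate = -fall_rate        # rapid fall
        level = min(peak, max(0.0, level + rate))
        if state == "initial" and level >= peak:
            state = "intermediate"
        out.append(level)
    return out
```

With six key-on clocks followed by five key-off clocks, the result rises in four steps, holds at the peak, and then falls back to zero.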
[0044]
Returning to FIG. 2, the waveform data multiplied by the envelope in the multiplier 23 is supplied to the adder 25, where the noise generated by the noise generating unit 26 is added. The noise is, for example, white noise. The noise generating unit 26 is supplied with the flag information of the voice mode flag (HVMODE) and the unvoiced / voiced sound indication flag (U / V), and generates noise only when HVMODE = 1 and U / V = 1 are set to generate an unvoiced sound formant. Therefore, the adder 25 adds noise only to the envelope-multiplied waveform data forming an unvoiced sound formant and outputs the result.
[0045]
FIG. 13 shows the detailed configuration of the noise generating unit 26. As shown in FIG. 13, the white noise generated by the white noise generator 70 in the noise generating unit 26 is band-limited by four stages of low-pass filters (LPF1, LPF2, LPF3, LPF4) 71, 72, 73, and 74. The output of the low-pass filter 74 has its noise level adjusted by a multiplier 75 and is input to a selector 76. The selector 76 is controlled by the output of an AND gate (AND) 77. When HVMODE = 1 and U / V = 1 are set to generate an unvoiced sound formant, the output of the AND gate 77 causes the selector 76 to output the noise supplied from the multiplier 75. When either HVMODE or U / V is "0", that is, when a musical tone or a voiced sound formant is generated, the output of the AND gate 77 causes the selector 76 to output "0" instead of the noise. As a result, the adder 25 adds the noise only to the envelope-multiplied waveform data forming an unvoiced sound formant and outputs the result.
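The gating performed by the AND gate 77 and the selector 76, followed by the addition in the adder 25, amounts to a simple conditional. The sketch below is an illustration under assumptions: the function names are hypothetical, the flags are modeled as the integers 0 and 1, and the band-limited, level-adjusted noise sample is passed in as a plain number.

```python
def gated_noise(filtered_noise: float, hvmode: int, uv: int) -> float:
    """Model of AND gate 77 driving selector 76 (names are hypothetical)."""
    select = hvmode & uv                      # 1 only for an unvoiced formant
    return filtered_noise if select == 1 else 0.0


def add_noise(enveloped_sample: float, filtered_noise: float,
              hvmode: int, uv: int) -> float:
    """Model of adder 25: noise is added only when the gate passes it."""
    return enveloped_sample + gated_noise(filtered_noise, hvmode, uv)
```

For a musical tone (HVMODE = 0) or a voiced sound formant (HVMODE = 1, U / V = 0), `add_noise` returns the enveloped sample unchanged.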
[0046]
The low-pass filters 71 to 74 have the same configuration, and the configuration of the low-pass filter 71 is shown in FIG. 13 as a representative example. In the low-pass filter 71, the white noise input from the white noise generator 70 is delayed by one sample time by a delay circuit 70a, multiplied by a predetermined coefficient in a coefficient multiplier 70b, and input to an adder 70d. The input white noise is also multiplied by a predetermined coefficient in a coefficient multiplier 70c and input to the adder 70d, where it is added to the output of the coefficient multiplier 70b. The output of the adder 70d is the low-pass filter output. Band-limiting the white noise with low-pass filters 71 to 74 of this configuration, for example in four stages, reduces the audible harshness of the voice. The adjustment of the noise level in the multiplier 75 is not always necessary and may be omitted.
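Each stage described above computes y[n] = c * x[n] + b * x[n-1], where b and c stand for the coefficients of the coefficient multipliers 70b and 70c. The following sketch cascades four such stages. The coefficient values of 0.5 and the function names are assumptions for illustration, since the specification only says that the coefficients are predetermined.

```python
def lowpass_stage(x, b=0.5, c=0.5):
    """One filter stage: y[n] = c*x[n] + b*x[n-1] (b, c are assumed values)."""
    y = []
    prev = 0.0                      # delay circuit contents, initially zero
    for sample in x:
        y.append(c * sample + b * prev)
        prev = sample               # one-sample delay
    return y


def band_limit(x, stages=4, b=0.5, c=0.5):
    """Cascade identical stages, as with the low-pass filters 71 to 74."""
    for _ in range(stages):
        x = lowpass_stage(x, b, c)
    return x
```

With these assumed coefficients, a constant input passes through at full level once the delays have filled, while an input that alternates in sign every sample is cancelled, which is the low-pass behavior that band-limits the white noise.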
[0047]
Returning to FIG. 2, the waveform data output from the adder 25 is supplied to a multiplier 27, where its output level is adjusted. The multiplier 27 is supplied with the flag information of the voice mode flag (HVMODE) and the unvoiced / voiced sound indication flag (U / V), a level (WT) indicating the output level of a musical tone, a level (voiced sound formant) indicating the output level of a voiced sound formant, and a level (unvoiced sound formant) indicating the output level of an unvoiced sound formant. When HVMODE = 0 is set to generate a musical tone, the multiplier 27 multiplies the waveform data of the musical tone by the level (WT) to adjust its output level. When HVMODE = 1 and U / V = 0 are set to generate a voiced sound formant, the multiplier 27 multiplies the waveform data forming the voiced sound formant by the level (voiced sound formant) to adjust its output level. As a result, the voiced sound formant has the predetermined level. Further, when HVMODE = 1 and U / V = 1 are set to generate an unvoiced sound formant, the multiplier 27 multiplies the waveform data forming the unvoiced sound formant by the level (unvoiced sound formant) to adjust its output level. As a result, the unvoiced sound formant has the predetermined level.
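The level selection performed at the multiplier 27 can be summarized as a three-way choice keyed on the two flags. The sketch below is illustrative only: the function and parameter names are hypothetical, and the flags are modeled as the integers 0 and 1.

```python
def select_level(hvmode: int, uv: int, level_wt: float,
                 level_voiced: float, level_unvoiced: float) -> float:
    """Pick the level applied by multiplier 27 (names are hypothetical)."""
    if hvmode == 0:
        return level_wt              # musical tone
    if uv == 0:
        return level_voiced          # voiced sound formant
    return level_unvoiced            # unvoiced sound formant


def adjust_output(sample: float, hvmode: int, uv: int, level_wt: float,
                  level_voiced: float, level_unvoiced: float) -> float:
    """Multiply the waveform sample by the level selected for the mode."""
    return sample * select_level(hvmode, uv, level_wt,
                                 level_voiced, level_unvoiced)
```

Note that when HVMODE = 0 the U / V flag has no effect in this sketch, matching the description that a musical tone always uses the level (WT).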
[0048]
In the above description, the voice synthesizing device of the present invention, which is also used as a sound source device, is composed of nine WT voice parts. However, the present invention is not limited to this, and the device may be composed of a number of WT voice parts other than nine. If the number of WT voice parts exceeds nine, the number of musical tones that can be sounded simultaneously can be increased, the number of formants to be synthesized can be increased, and a wider variety of voices can be synthesized.
Further, in the voice synthesizing device of the present invention, which is also used as a sound source device, the plurality of WT voice parts function as tone generating units when a musical tone is designated by the voice mode flag (HVMODE), and function as formant forming units when a voice is designated by the voice mode flag (HVMODE). By fixing the voice mode flag (HVMODE) to voice, the device can also be used as a dedicated voice synthesizer.
[0049]
[Effects of the Invention]
As described above, in the present invention, a plurality of formant forming units, which are a plurality of waveform table voice units, each form a formant having a desired formant center frequency and a desired formant level, and a voice is synthesized by combining the formed formants. An envelope signal having the pitch period is applied to the waveform data forming each formant. As a result, the formants are given a sense of pitch, and a high-quality, realistic voice can be synthesized. In particular, by applying an envelope signal having the pitch period to the waveform data forming a voiced sound formant, the voiced sound formant can be given a sense of pitch.
[0050]
Also, a plurality of musical tones can be generated by mixing waveform data output from a plurality of waveform table voice units based on musical tone parameters, and output from the plurality of waveform table voice units based on voice parameters. A voice can be synthesized by synthesizing waveform data forming a voiced sound formant or an unvoiced sound formant. As described above, since a plurality of waveform table voice sections can be used for both generation of musical sounds and voice synthesis, the voice synthesis apparatus of the present invention can also be used as a sound source apparatus.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a speech synthesizer that is also used as a sound source device according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a schematic configuration of a WT voice unit in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 3 is a block diagram showing a detailed configuration of a phase data generator in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 4 is a block diagram showing a detailed configuration of an address generator in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 5 is a graph showing an example of an ADG output of an address generator in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 6 is a graph showing another example of the ADG output of the address generator in the speech synthesizer used also as the sound source device according to the embodiment of the present invention.
FIG. 7 is a diagram showing a waveform of a voiced sound pitch signal of an address generator in a voice synthesizing device used also as a sound source device according to an embodiment of the present invention.
FIG. 8 is a graph showing still another example of the ADG output of the address generator in the speech synthesizer used also as the sound source device according to the embodiment of the present invention.
FIG. 9 is a block diagram showing a detailed configuration of an envelope generator in the speech synthesizer used also as the sound source device according to the embodiment of the present invention.
FIG. 10 is a graph showing an example of an EG output of an envelope generator in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 11 is a graph showing another example of the EG output of the envelope generator in the voice synthesizer used also as the sound source device according to the embodiment of the present invention.
FIG. 12 is a graph showing still another example of the EG output of the envelope generator in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 13 is a block diagram illustrating a detailed configuration of a noise generating unit in the voice synthesizing device used also as the sound source device according to the embodiment of the present invention.
FIG. 14 is a diagram showing an example of the waveform shapes of a plurality of types of waveform data for forming a voiced sound formant or an unvoiced sound formant stored in a waveform data storage unit in a speech synthesizer used also as a sound source apparatus according to an embodiment of the present invention.
[Explanation of symbols]
1 voice synthesizer, 10 WT voice part, 10a, 10b, 10c, 10d, 10e, 10f, 10g, 10h, 10i WT voice part, 11 mixing means, 20 phase data generator, 21 address generator, 22 waveform data storage unit, 23 multiplier, 25 adder, 26 noise generator, 27 multiplier, 30 selector, 31 selector, 32 selector, 33 selector, 34 shifter, 41 accumulator, 42 selector, 43 subtractor, 44 selector, 45 adder, 46 selector, 47 adder, 48 start address generator, 49 selector, 50 selector, 60 selector, 61 selector, 62 selector, 63 selector, 64 selector, 65 accumulator, 66 state controller, 70 white noise generator, 70a delay circuit, 70b coefficient multiplier, 70c coefficient multiplier, 70d adder, 71, 72, 73, 74 low-pass filter, 75 multiplier, 76 selector, 77 AND gate, AR attack rate, BLOCK octave information, DR decay rate, EP end point, FNUM frequency information, LP loop point, RR release rate, SA start address, SL sustain level, SR sustain rate

Claims

A voice synthesizer comprising a plurality of formant forming units each forming a formant having a desired formant center frequency and a desired formant level, the voice synthesizer synthesizing a voice by combining the plurality of formants formed by the plurality of formant forming units,
wherein each of the plurality of formant forming units comprises:
waveform shape designating means for designating a desired waveform shape from among a plurality of types of waveform shapes;
waveform data storage means for storing a plurality of waveform data corresponding to the plurality of types of waveform shapes;
waveform data reading means for generating an address that changes at a rate corresponding to the formant center frequency and reading, from the waveform data storage means, the waveform data corresponding to the waveform shape designated by the waveform shape designating means; and
envelope applying means for forming an envelope signal having a shape that rapidly attenuates at each timing corresponding to the pitch period and rapidly rises after the attenuation, and applying the formed envelope signal to the waveform data read from the waveform data storage means by the waveform data reading means.

The voice synthesizer according to claim 1, wherein a voiced sound is synthesized by synthesizing a plurality of formants formed by the plurality of formant forming units.