JP3779058B2 - Sound source system

Info

Publication number: JP3779058B2
Application number: JP03146598A
Authority: JP
Inventors: 清嗣新井; 雅人足立
Original assignee: Korg Inc
Current assignee: Korg Inc
Priority date: 1998-02-13
Filing date: 1998-02-13
Publication date: 2006-05-24
Anticipated expiration: 2018-02-13
Also published as: JPH11231875A

Description

【０００１】
【発明の属する技術分野】
この発明は、電子楽器や音声合成装置で用いられる波形発生装置を有する音源システムに関し、特に加算合成方式を用いた波形発生装置を備えた音源システムに関する。
【０００２】
【従来の技術】
加算合成方式（additive synthesis technique）と総括して呼ばれる技術は、正弦波加算合成、複合正弦波法、Phase Vocoder、Sinusoidal Model Synthesisなどと称される楽音合成方式または音声合成方式も広範囲に指すものであり、フーリエ解析などの周波数分析に基づいた一貫した分析／再合成システムを起源に持つ、複数個の基底波形（周期波形）を発生する波形発生装置を各基底の適切な信号レベルに応じて重み付け加算して１つの出力合成音を得る音源方式を言う。以下、簡単の為、楽音及び音声を総称して楽音、また、それら楽音波形の発生装置、合成装置を音源と表現する。
【０００３】
基底波形同士の周波数の関係や重み付け係数の導出方法及び係数の時間変化の制御方法などの違いで微妙に異なる手法が提案されているが、これらを問わず加算合成方式と呼ぶことにする。
加算合成方式は、楽音の分析／再合成のシステムが一致しているため、理論的には解析音の情報を損ねず合成音を発生できる分析合成システムであることが知られている。
【０００４】
図７はＫＬ（Karhunen Loeve）基底方式を用いた楽音合成装置の一例を図示したものである。
この方式は以下の２文献に開示されているが、図７は、特に、文献２の合成モデルを図示したものである。
文献１：Karhunen-Loeve-Based Additive Synthesis of Musical Tones IEEE 1986 pp581-584
文献２：Implementation of the KL Synthesis Algorithm under Real-Time Control ICMC Proceedings 1991 pp360-363
【０００５】
図７において、１は入力解析音波形ｘ（ｔ）（ｔは離散時間、ｔ＝ｋＴ，ｋ：整数、Ｔ：サンプリング時間）のピッチ情報ｐ（ｔ）を出力するピッチ情報発生部、２はこのピッチ情報発生部１からのピッチ情報ｐ（ｔ）が基本周波数の対数に比例するデータ形式であることを仮定して、図８（ａ）に示す如く、位相の変化量である位相増分値データφ（ｔ）（＝ＦＲ（ｐ（ｔ）））に変換するピッチ／位相変換部、３はこのピッチ／位相変換部２からの位相増分値データφ（ｔ）に基づいて式（１）に従って位相データθ（ｔ）を出力する位相発生器であり、図８（ｂ）に示す如く、符号無し加算器（２の補数型オーバーフロー特性加算器）で実現する。
【０００６】
【数１】
$$\theta(t) = \theta(t - T) + \phi(t) \qquad (1)$$

【０００７】
また、４は上記入力解析音波形のレベル情報ａ（ｔ）を出力するレベル情報発生部で、ボリューム情報にベロシティ情報やレベルエンベロープ情報を統合した最終的な出力音のレベル情報を出力する。５はこのレベル情報発生部４からのレベル情報ａ（ｔ）に基づいて基底波形毎の重み係数Ａｎ（ｔ）（ｎは基底波形の順序を示す）を発生する重み係数発生部である。
【０００８】
さらに、６は上記位相発生器３からの位相データθ（ｔ）及び上記重み係数発生部５からの基底波形毎の重み係数Ａｎ（ｔ）に基づいて重み係数を乗算した各基底波形を合成した出力合成音波形ｙ（ｔ）を得る波形発生装置であり、この波形発生装置６は、ＫＬ基底の位相に対応した波形データを格納してなり、上記位相データθ（ｔ）をアドレスとして対応する基底波形データＦｎ（ｔ）を出力する基底数に応じたＫＬ基底メモリ６Ａ１，６Ａ２，・・・，６Ａｎを有する基底波形発生装置６Ａと、各ＫＬ基底波メモリからの基底波形データＦｎ（ｔ）に上記重み係数発生部５からの基底波形毎の重み係数Ａｎ（ｔ）を乗算する基底数に応じた乗算器６Ｂｎと、これら乗算器６Ｂｎの出力波形を合成した式（２）に示す出力合成波形ｙ（ｔ）音を得る加算器６Ｃとを有する。なお、式（２）において、Ｎは基底波形の総数を示す。
【０００９】
【数２】
$$y(t) = \sum_{n=1}^{N} A_n(t)\, F_n(t) \qquad (2)$$

【００１０】
【発明が解決しようとする課題】
ところで、上述した従来の加算合成方式の技術では、２つの大きな問題点が知られている。
一つは、数個の基底波形を加算して１出力波形を形成する為に、特に現実に存在する楽音に近いリアルな音色を再現する場合、１つの楽音の合成に費やさねばならない演算量が膨大となる。
従って、演算量を制限すると基底数を増やすことは困難になるが、逆に基底数を減らすと再現できる倍音の総数が減る問題が起こる。その場合は、周波数が低い倍音を優先し合成すると高帯域成分が不足した合成音となり、レベルの大きい倍音を優先し合成すると倍音間隔が不自然に開いた合成音になるというような音質的に顕著な問題が起こる。
【００１１】
もう一つは、再現性を向上させようとすると、各基底の加算重み係数データの制御が複雑になる。重み係数発生部５から発生される重み係数はフーリエ変換または短時間フーリエ変換（Short Time Fourier Transform：ＳＴＦＴ）によって定まる信号レベルデータを用いるのが一般的であるが、そのデータ数は解析点数分の時変データ列が基底の個数分あることになる。つまり、入力解析音を１０，０００サンプルのデータ列で解析した場合は、１０，０００Ｎ個のデータ数となる。この場合は、入力解析音と出力合成音のピッチとレベルが同じならば良好な再現性を得られるが、ピッチとレベルを変更すると入力解析音から離れるに従って再現性が落ちることが知られている。
【００１２】
また、基底の個数分の時変データ列を取り扱うためデータ量は膨大であり、そのままデータを保持及び制御するのは困難であるため、例えば折れ線近似（Piecewise Linear Segments）などの比較的簡単なエンベロープジェネレータの一種で代用されることが多いが、現在では入力解析音特有の情報の大部分がエンベロープ曲線に含まれる微細なゆらぎ情報の中に含まれていることが推定されており、これを簡素な折れ線など簡単なエンベロープ曲線で表現することで欠落する情報量は多く、結果として楽音の再現性を劣化させる。
【００１３】
音源の基底が正弦波の場合も存在するが、図７に示す従来例の場合は基底をＫＬ基底としており、ＫＬ基底の一つ一つをスペクトラム解析すると、それぞれが入力解析音のスペクトラム形状の特徴を部分的に持っており、音源の基底として正弦波の代わりにＫＬ基底波形を用いることで演算量の削減を狙ったものであり、上記２つの問題点のうち前者を解消するものであるが、図７に示す例では、レベルの時変データ列となる各基底の重み係数を求めるのに、重み係数発生部５では基底のエンベロープジェネレータに折れ線近似を用いているため、後者の問題点は解消されていない。
【００１４】
この発明は上述した従来例にかかる問題点を解消するためになされたもので、加算合成方式の音源システムにおいて、簡潔なデータ制御構造でありながら再現性の向上の実現を図ることができると共に、ピッチとレベル双方の時変データ列を編集することにより、新たな演奏情報でも入力解析音の特徴を残した出力合成音が発生できる音源システムを得ることを目的とする。
【００１５】
【課題を解決するための手段】
上記目的を達成するために、この発明に係る音源システムは、入力解析音波形のピッチ情報を出力するピッチ情報発生部と、このピッチ情報発生部からのピッチ情報を位相増分値データに変換するピッチ／位相変換部と、このピッチ／位相変換部からの位相増分値データに基づいて位相データを得る位相発生器と、上記入力解析音波形のレベル情報を出力するレベル情報発生部と、このレベル情報発生部からのレベル情報に基づいて基底波形毎の重み係数を発生する重み係数発生部と、上記位相発生器からの位相データ及び上記重み係数発生部からの基底波形毎の重み係数に基づいて重み係数を乗算した各基底波形の出力合成音を得る波形発生装置とを備え、上記波形発生装置は、ＫＬ基底波形を格納してなる基底数に応じた基底メモリを有し、上記位相データをアドレス入力として各基底メモリからＫＬ基底波形を読み出して出力する基底波形発生装置と、各ＫＬ基底波形に上記重み係数発生部からの基底波形毎の重み係数を乗算する基底数に応じた乗算器と、これら乗算器の出力の合成音を得る加算器とを有する音源システムにおいて、上記重み係数発生部は、互いに直交するピッチ軸とレベル軸及び重み係数軸でなる空間に重み係数が曲面をなすようにした、ピッチ及びレベルに応じた重み係数を格納してなる基底波形毎の２次元テーブルメモリを有し、上記ピッチ情報発生部からのピッチ情報及び上記レベル情報発生部からのレベル情報に基づいて各ＫＬ基底波形毎の重み係数を上記各乗算器に出力することを特徴とするものである。
【００１６】
また、上記基底波形発生装置は、上記２次元テーブルメモリを用いて補間法による直線近似によりピッチ及びレベルに応じた重み係数を求めることを特徴とするものである。
【００１７】
また、他の発明に係る音源システムは、入力解析音波形のピッチ情報を出力するピッチ情報発生部と、このピッチ情報発生部からのピッチ情報を位相増分値データに変換するピッチ／位相変換部と、このピッチ／位相変換部からの位相増分値データに基づいて位相データを得る位相発生器と、上記入力解析音波形のレベル情報を出力するレベル情報発生部と、このレベル情報発生部からのレベル情報に基づいて基底波形毎の重み係数を発生する重み係数発生部と、上記位相発生器からの位相データ及び上記重み係数発生部からの基底波形毎の重み係数に基づいて重み係数を乗算した各基底波形の出力合成音を得る波形発生装置と備え、上記波形発生装置は、ＫＬ基底波形を格納してなる基底数に応じた基底メモリを有し、上記位相データをアドレス入力として各基底メモリからＫＬ基底波形を読み出して出力する基底波形発生装置と、各ＫＬ基底波形に上記重み係数発生部からの基底波形毎の重み係数を乗算する基底数に応じた乗算器と、これら乗算器の出力の合成音を得る加算器とを有する音源システムにおいて、上記重み係数発生部は、ピッチ及びレベルを変数とする２変数多項式を、互いに直交するピッチ軸とレベル軸及び重み係数軸でなる空間に重み係数が曲面をなすようにした、重み係数に近似したときの係数を各基底波形毎に格納してなる係数メモリと、上記ピッチ情報発生部からのピッチ情報と上記レベル情報発生部からのレベル情報及び上記係数メモリからの各基底波形毎の係数に基づいて重み係数を演算する基底波形毎の重み係数演算器とを備え、これら重み係数演算器からの各ＫＬ基底波形毎の重み係数を上記各乗算器に出力することを特徴とするものである。
【００１８】
また、上記ピッチ情報発生部からのピッチ情報及び上記レベル情報発生部からのレベル情報に応じたフィルタ係数を出力する第１のフィルタ係数発生部と、この第１のフィルタ係数発生部からのフィルタ係数が設定されて上記波形発生装置からの出力をフィルタ処理するフォルマントフィルタとをさらに備えたことを特徴とするものである。
【００１９】
また、上記ピッチ情報発生部からのピッチ情報及び上記レベル情報発生部からのレベル情報に応じたフィルタ係数を出力する第２のフィルタ係数発生部と、この第２のフィルタ係数発生部からのフィルタ係数が設定されて上記フォルマントフィルタからの出力をフィルタ処理するブライトネスフィルタとをさらに備えたことを特徴とするものである。
【００２１】
【発明の実施の形態】
この発明では、入力解析音波形のピッチとレベルの時変データと、各ピッチとレベルにおける各基底の信号レベルとを分離して保持し、式（３）に示す出力合成音波形ｙ（ｔ）を得る。なお、式（３）において、ＡＡｎ（ｐ（ｔ），ａ（ｔ））はこの発明で用いるピッチとレベルのデータをｎ番目の基底のレベルデータに変換するデータテーブルまたは関数を示す。
【００２２】
【数３】
$$y(t) = \sum_{n=1}^{N} AA_n(p(t), a(t))\, F_n(t) \qquad (3)$$

【００２３】
例えば、従来技術では各基底の信号レベルの時変データ列（１０，０００Ｎ点）をそのまま管理・制御しなければ再合成できなかった。これに比較して、この発明では、全体のピッチとレベルのそれぞれの時変データ（各１０，０００点）列を保持しておき、変換関数ＡＡｎによってピッチとレベルから各基底の信号レベルに変換する。
各ピッチとレベルにおける各基底の信号レベルは、定常状態の入力解析音波形においてはほぼ一定の値をとることが解っており、これを参照テーブルまたは関数の形で保持しておくことで有効なデータ保存ができる。
一方、入力解析音のピッチとレベルの時変データは、演奏情報における過渡状態の情報を多数含むので、これら２つを用いて再合成することで再現性の高い出力合成音が発生可能となる。
【００２４】
このように、データを分離することによって新たな効果が生まれるので、それを説明する。
説明を明確にするために音声、特に歌唱データを例にとって説明する。
特定歌唱者の個人性を特徴付ける大きな要素は、一つはその歌唱音声自体が持っている特徴（特にはスペクトラム特性）であり、もう一つは歌唱音声の制御方法（通常歌いまわし等と言われる）の特徴である。この制御方法の大部分はピッチとレベルの制御の時変データ列で記述することができる。
従って、この発明のようにデータを分離することによって、変換関数ＡＡｎに歌唱音声自体の特徴が、残ったピッチとレベルの時変データ列に制御方法の特徴が、分離されて保存される。これによって新たな歌唱音の再合成が可能となる。
【００２５】
つまり、歌唱者Ａの入力解析音を分析し歌唱者Ａに対する変換関数ＡＡｎの組であるＡ−ＡＡｎの組を作成する。一方で歌唱者Ｂに歌唱者Ａが唄わない曲Ｃを唄わせ、そのピッチとレベルを分析し、時変データ列を作成する。然るに、この発明の音源システムに歌唱者Ａに対する変換関数ＡＡｎの組であるＡ−ＡＡｎの組を装備し、歌唱者Ｂが唄う曲Ｃの時変データ列を入力すると、歌唱者Ａが唄わなかった曲Ｃを、歌唱者Ａの歌声と歌唱者Ｂの歌いまわしを用いて再合成することができる。
【００２６】
また、このように特徴を分離できるので、この発明の音源システムに、声帯のスペクトラム特性のシミュレーションを行うフォルマントフィルタなどのスペクトラム特性を模倣させるシステムに、ピッチとレベルから係数データへの変換手段を同様に加えれば、さらにシステムの再現性を向上できる。
【００２７】
なお、この発明の技術は作り方から明らかに全ての加算合成方式に適用可能であり、基底波形形状に依らない。つまり、この発明では合成波形の最小単位に便宜的に基底という語を用いているものの、本来の基底（base）という語の意味を超えた適用が容易である。
例えば、ある楽音を周波数分析して得た基底（正弦波）のデータを周波数の低い方から等間隔で組にして（倍音グループと呼ぶ）、これら１グループの周波数分析データと等価な波形を新たな基底波形としてこの発明を適用することができる。
【００２８】
実施の形態１．
以下、具体的な実施の形態について図を参照して説明する。
図１は実施の形態１に係る音源システムを示す構成図である。
図１において、図７に示す従来例と同一部分は同一符号を付して、その説明は省略する。新たな符号として、５０はピッチとレベルを変数としてこれらに応じた基底波形の重み係数を格納してなる２次元テーブルメモリを各基底波形毎に有する重み係数発生部であり、ピッチ情報発生部１からのピッチ情報及びレベル情報発生部４からのレベル情報に基づいて各基底波形毎の重み係数ＡＡｎ（ｐ（ｔ），ａ（ｔ））を対応する各乗算器６Ｂｎに出力する。なお、７は上記重み係数ＡＡｎ（ｐ（ｔ），ａ（ｔ））が正規化されている場合に、加算器６Ｃから出力される出力合成音に係数ａ（ｔ）を乗算する乗算器であり、その出力は式（３）に示すものとなり、重み係数が正規化されていないものであれば、不要であり、その場合、加算器６Ｃから式（３）に示す出力合成音ｙ（ｔ）が得られる。
【００２９】
図２はさらに詳細に示す音源システムの概念図である。
図２に示すように、上記基底波形発生装置６Ａは位相データθ（ｔ）をアドレス入力として第１ないし第ＮのＫＬ基底メモリから各ＫＬ基底波形を読み出してそれぞれ対応する乗算器６Ｂｎに出力する一方、上記重み係数発生部５０は、基底数に応じた２次元テーブルメモリ５１，５２，・・・，５ｎを有し、ピッチ情報とレベル情報に応じた第１ないし第Ｎ重み係数を対応する乗算器６Ｂｎにそれぞれ出力するようになっている。
【００３０】
また、ここで、上記２次元テーブルメモリには、図３に示す如く、互いに直交するピッチ軸とレベル軸及び重み係数軸でなる空間に重み係数が曲面をなすようにした重み係数を格納してなり、図示される重み係数曲面は各基底に応じて異なり、各２次元テーブルメモリから時々刻々のピッチ情報とレベル情報に応じた第１ないし第Ｎ重み係数が出力される事により、最適なスペクトラムをもつ波形の合成が可能になる。
【００３１】
また、上記２次元テーブルメモリの分解能を十分高く取れば、ピッチ情報とレベル情報に応じた重み係数を直ちに求めることが可能となるが、メモリ容量の低減化を図る場合、メモリ分解能を低めても下記に示す２次元テーブルメモリを用いた補間法による直線近似に従ってピッチ情報とレベル情報に応じた重み係数を求めることができる。
【００３２】
今、発音周波数Ｆ［Ｈｚ］とそのパワーＰが与えられているものとする。電子楽器では人間の聴覚特性に基づき以下のようにピッチとレベルを定義することが多い。
ピッチｐ＝６９＋１２ｌｏｇ₂（Ｆ／４４０）
レベルａ＝１０ｌｏｇ₁₀Ｐ
ＫＬ重み係数がピッチｐ、レベルａに対してｗ（ｐ，ａ）のように表わされるとする。
【００３３】
テーブルを利用する方法
ピッチｐとレベルａを適当な分解能で離散化した集合Ｓ_p，Ｓ_aを用意する。
Ｓ_p＝｛p₀,p₁,p₂,・・・,p_N-1｝，Ｓ_a＝｛a₀,a₁,a₂,・・・,a_M-1｝
係数テーブルＴは係数ｗ（p,a）を上記Ｓ_p×Ｓ_a上でサンプリングしたものとする。
Ｔ（i,j）＝ｗ（p_i,a_j），
【００３４】
ｐ₀≦ｐ＜ｐ_N-1，ａ₀≦ａ＜ａ_M-1の範囲で任意にとったｐ，ａが与えられた時、ｗ（p,a）は係数テーブルＴにより次のように近似できる。
ｗ（p,a）≒Ｔ₀（i,j）＋△ｊ｛Ｔ₀（i,j+1）−Ｔ₀（i,j）｝
ここに、Ｔ₀（i,j）＝Ｔ（i,j）＋△ｉ｛Ｔ（i+1,j）−Ｔ（i,j）｝
ｐ_i≦ｐ＜ｐ_i+1，ａ_j≦ａ＜ａ_j+1，
△ｉ＝（ｐ−ｐ_i）／（ｐ_i+1−ｐ_i），△ｊ＝（ａ−ａ_j）／（ａ_j+1−ａ_j）
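The first-order interpolation of paragraph 【００３４】 can be sketched in Python as follows. This is only an illustration under assumptions: the function name `interpolate_weight`, the list-of-lists table layout `T[i][j]`, and the clamping at the grid edges are not in the original. The sketch interpolates along the pitch axis between `T(i, j)` and `T(i+1, j)` first, then along the level axis with the fraction `(a - a_j) / (a_{j+1} - a_j)`.

```python
from bisect import bisect_right


def interpolate_weight(T, p_grid, a_grid, p, a):
    """Approximate w(p, a) from a table T[i][j] = w(p_grid[i], a_grid[j])."""
    # Locate the cell with p_i <= p < p_{i+1} and a_j <= a < a_{j+1};
    # clamping to the last cell is an assumption for values on the upper edge.
    i = min(max(bisect_right(p_grid, p) - 1, 0), len(p_grid) - 2)
    j = min(max(bisect_right(a_grid, a) - 1, 0), len(a_grid) - 2)

    di = (p - p_grid[i]) / (p_grid[i + 1] - p_grid[i])   # fraction along pitch
    dj = (a - a_grid[j]) / (a_grid[j + 1] - a_grid[j])   # fraction along level

    # T0(i, j) and T0(i, j+1): interpolation along the pitch axis.
    t0_j = T[i][j] + di * (T[i + 1][j] - T[i][j])
    t0_j1 = T[i][j + 1] + di * (T[i + 1][j + 1] - T[i][j + 1])

    # Final interpolation along the level axis.
    return t0_j + dj * (t0_j1 - t0_j)
```

One such table would be held per basis waveform, so the call is repeated N times per output sample with the same `p` and `a`.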
【００３５】
このような補間は１次補間であるが、テーブル分解能を十分高くとれば補間は必要ないのは勿論である。逆に、さらに省メモリを求める場合は２次以上の補間を使用することもできる。
【００３６】
従って、この実施の形態１によれば、下記のような効果を達成できる。
１）基底波形の加算重み係数をピッチとレベルを変数とする２次元テーブルとしてメモリに保持することにより、従来の加算合成方式のように、入力解析音を再現する為に、重み係数の時変データ列を基底個数分保持する必要はない。
２）楽音の特徴を、楽音のピッチとレベルに依存して決定するデータと演奏技法によるデータに分けて管理することができる。
３）入力解析音からは、楽音の時間変化に依らない、楽音のピッチとレベルに依存して決定する特徴のみが採取される。
従って、これらにより、
４）ピッチとレベルの変化に対して従来のものより音色の追従性がよく、より再現性の高い合成音が発生できる。
５）ピッチとレベルの演奏技法に係わる時変データは、入力解析音と別のデータから抽出したものが使える。
６）さらに補間法による直線近似を利用すれば、ピッチ情報とレベル情報に応じた重み係数を求めるのに、２次元テーブルメモリのメモリ容量の低減化を図ることができる。
【００３７】
実施の形態２．
上述した実施の形態１では、図１の重み係数発生部５０に、ピッチ及びレベルに応じた重み係数を格納するＫＬ基底波形毎の２次元テーブルメモリを備えたが、この実施の形態２では、ピッチ及びレベルを変数とした２変数多項式を近似したときの係数を用いて重み係数を求める場合について説明する。
【００３８】
係数を２変数多項式で近似することにより、一層少ないメモリで実現することも可能である。例えば多項式をピッチ情報ｐ（ｔ）とレベル情報ａ（ｔ）の２変数多項式として、Ｋをピッチ情報ｐ（ｔ）の次数、Ｌをレベル情報ａ（ｔ）の次数として指定すると、式（４）に示す２変数多項式ＡＡｎ（p(t),a(t)）を式（５）となるように係数Ｃ_stn を決定することで係数を近似できる。
【００３９】
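【数４】
$$AA_n(p, a) = \sum_{s=0}^{K} \sum_{t=0}^{L} C_{stn}\, p^{s} a^{t} \qquad (4)$$
$$\max_{i,j} \left| AA_n(p_i, a_j) - w(p_i, a_j) \right| \rightarrow \min \qquad (5)$$
【００４０】
この場合、多項式計算によってｐ，ａから直接重み係数の近似値を求めることができる。
式（５）は、すべてのｉ，ｊのうちで│ＡＡｎ（p_i,a_j）−ｗ（p_i,a_j）│が最大となるｉ，ｊの組において、その最大値を最小にするＣ_stn を意味し、式（４）、（５）をＫ＝Ｌ＝１について展開すると、２変数多項式ＡＡｎ（p,a）は、式（６）に示すものとなる。
ＡＡｎ（p,a）＝Ｃ_00n＋Ｃ_10nｐ＋Ｃ_01nａ＋Ｃ_11nｐａ （６）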
【００４１】
図４はＫ＝Ｌ＝１の場合の等価回路図を図示したもので、この実施の形態２に係る重み係数発生部６０の内部構成の一例に相当する図である。
なお、この実施の形態２における全体構成は図１に示す実施の形態１と同様であるが、図１に示す重み係数発生部５０を、この実施の形態２では重み係数発生部６０として示し、その内部構成の一例を図４に示している。
【００４２】
すなわち、図４に示すように、この実施の形態２に係る重み係数発生部６０としては、ピッチｐ及びレベルａを変数とする２変数多項式を近似したときの上記係数Ｃ_00n，Ｃ_01n，Ｃ_10n，Ｃ_11nを各基底波形（添字ｎに対応）毎に格納してなる係数メモリ６１と、ピッチ情報発生部１からのピッチ情報ｐとレベル情報発生部４からのレベル情報ａ及び上記係数メモリ６１からの各基底波形毎の係数に基づいて重み係数を演算する基底波形毎の重み係数演算器６２とを備えており、これら重み係数演算器６２からの各基底波形毎の重み係数を各乗算器６Ｂｎに出力するようにしている。図中、６２ａ〜６２ｄは乗算器、６２ｅは加算器を示し、図示構成は各基底波形毎に備えられている。
【００４３】
なお、係数の定義域全体を単一の多項式で近似するのは誤差が大きく実用的でない。必要に応じて区分多項式で近似するなどの工夫もできよう。
また、因に２変数多項式ＡＡｎ（p,a）をＫ＝Ｌ＝２の場合について展開すると、式（７）に示すものとなる。
ＡＡｎ（p,a）＝Ｃ_00n＋Ｃ_10nｐ＋Ｃ_20nｐ²＋Ｃ_01nａ＋Ｃ_11nｐａ＋Ｃ_21nｐ²ａ＋Ｃ_02nａ²＋Ｃ_12nｐａ²＋Ｃ_22nｐ²ａ² （７）
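As a rough illustration of the weighting coefficient calculator 62, the two-variable polynomial can be evaluated directly from stored coefficients. The sketch below is an assumption-laden example, not the circuit of FIG. 4: the name `eval_weight_poly` and the nested-list layout `C[s][t]` are hypothetical, and the index order follows equation (7), where the first index is the power of p and the second the power of a.

```python
def eval_weight_poly(C, p, a):
    """Evaluate AA_n(p, a) = sum over s, t of C[s][t] * p**s * a**t.

    C is a (K+1) x (L+1) nested list of coefficients for one basis waveform.
    """
    total = 0.0
    for s, row in enumerate(C):          # s: power of pitch p
        for t, c in enumerate(row):      # t: power of level a
            total += c * (p ** s) * (a ** t)
    return total
```

For K = L = 1 this needs only four stored values per basis waveform, against a full table of sampled weights in Embodiment 1.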
【００４４】
従って、上記実施の形態２によれば、重み係数を２変数関数として発生することにより、入力解析音を再現する為に、管理しなければならないデータ個数をさらに減らすことができる。
【００４５】
実施の形態３．
次に、図５は実施の形態３に係る音源システムを示す構成図である。
図５において、図１に示す実施の形態１と同一部分は同一符号を付して、その説明は省略する。新たな符号として、８はピッチ情報発生部１からのピッチ情報ｐ（ｔ）及びレベル情報発生部４からのレベル情報ａ（ｔ）に応じたフォルマントフィルタのフィルタ係数を出力するフィルタ係数発生部、９はこのフィルタ係数発生部８からのフィルタ係数が設定されて波形発生装置６からの出力をフィルタ処理して楽音のピークフォルマントを再現するためのフォルマントフィルタであり、ここで、上記フィルタ係数発生部８は、実施の形態１の重み係数発生部５０と同様な構成でなり、実施の形態１に対して上記フィルタ係数発生部８と上記フォルマントフィルタ９が追加されている。
【００４６】
フォルマントは、楽音のスペクトラムエンベロープ（包絡線）のピークを言い、フォルマントフィルタ９はそのフォルマントのピーク周波数、ピークレベル、Ｑを再現する。また、ここでは、フォルマントフィルタと総括的に述べたものであるが、フォルマントフィルタの具体例としては、格子型フィルタ、伝達関数の直接構成、２次ＩＩＲフィルタの縦続モデル、梯子型フィルタなど様々なものが適応可能であるが、どの実装方法を採用するかは本質的な問題ではなく、いずれの場合も適用可能である。
【００４７】
従って、この実施の形態３によれば、フォルマントフィルタ９のフィルタ係数を重み係数発生部５０と同様の２次元係数テーブルで保持することにより、ピッチとレベルの変化に対して適切なフォルマントを与えることができ、より再現性の高い合成音を発生できる。
【００４８】
実施の形態４．
次に、図６は実施の形態４に係る音源システムを示す構成図である。
図６において、図５に示す実施の形態３と同一部分は同一符号を付して、その説明は省略する。新たな符号として、１０はピッチ情報発生部１からのピッチ情報ｐ（ｔ）及びレベル情報発生部４からのレベル情報ａ（ｔ）に応じたブライトネスフィルタのフィルタ係数を出力するフィルタ係数発生部、１１はこのフィルタ係数発生部１０からのフィルタ係数が設定されてフォルマントフィルタ９を介した出力をフィルタ処理して、音声においては唇の動きによるスペクトラム変動、楽器音においては各種演奏手法によるスペクトラム変動を再現するためのブライトネスフィルタであり、ここで、上記フィルタ係数発生部１０は、実施の形態１の重み係数発生部５０と同様な構成でなり、実施の形態３に対して上記フィルタ係数発生部１０と上記ブライトネスフィルタ１１が追加されている。
【００４９】
このブライトネスフィルタも、具体的な実例としては各種考えられるが、ここでは１次ＩＩＲフィルタモデルを挙げる。
従って、この実施の形態４によれば、ブライトネスフィルタのフィルタ係数を重み係数発生部５０と同様の２次元係数テーブルで保持することにより、ピッチとレベルの変化に対して演奏表現などによる音色変化を分離できて、より再現性の高い合成音が発生できる。
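A first-order IIR brightness filter whose coefficient changes with pitch and level might look like the sketch below. The one-pole low-pass form and the name `one_pole_lowpass` are assumptions made for illustration; the source only states that a first-order IIR model is used and that its coefficient comes from a generator built like the weighting coefficient generator 50.

```python
def one_pole_lowpass(samples, coeffs):
    """Filter with y[k] = (1 - c[k]) * x[k] + c[k] * y[k-1].

    coeffs holds one coefficient per sample (0 <= c < 1), for example a value
    looked up from a pitch/level table; larger c gives a darker sound.
    """
    y_prev = 0.0
    out = []
    for x, c in zip(samples, coeffs):
        y_prev = (1.0 - c) * x + c * y_prev
        out.append(y_prev)
    return out
```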
【００５０】
【発明の効果】
以上のように、この発明によれば、重み係数発生部に、ピッチ及びレベルに応じた重み係数を格納してなる基底波形毎の２次元テーブルメモリを備え、ピッチ情報発生部からのピッチ情報及びレベル情報発生部からのレベル情報に基づいて各基底波形毎の重み係数を出力するようにしたので、基底波形の加算重み係数をピッチとレベルの２次元テーブルとして保持することにより、従来の加算合成方式のように、入力解析音を再現する為に、重み係数の時変データ列を基底個数分保持する必要はなく、楽音の特徴を、楽音のピッチとレベルに依存して決定するデータと演奏技法によるデータに分けて管理することができ、入力解析音からは、楽音の時間変化に依らない、楽音のピッチとレベルに依存して決定する特徴のみが採取される結果、ピッチとレベルの変化に対して従来のものより音色の追従性がよく、より再現性の高い合成音が発生できる。
【００５１】
また、上記２次元テーブルメモリを用いて補間法による直線近似によりピッチ及びレベルに応じた重み係数を求めるようにすることにより、メモリ容量を削減できる。
【００５２】
また、他の発明によれば、重み係数発生部を、ピッチ及びレベルを変数とする２変数多項式を近似したときの係数を各基底波形毎に格納してなる係数メモリと、ピッチ情報発生部からのピッチ情報とレベル情報発生部からのレベル情報及び上記係数メモリからの各基底波形毎の係数に基づいて重み係数を演算する基底波形毎の重み係数演算器とで構成することにより、重み係数を２変数関数として発生することで、入力解析音を再現する為に管理しなければならないデータ個数をさらに減らすことができる。
【００５３】
また、ピッチ情報発生部からのピッチ情報及びレベル情報発生部からのレベル情報に応じたフィルタ係数を出力する第１のフィルタ係数発生部と、この第１のフィルタ係数発生部からのフィルタ係数が設定されて上記波形発生装置からの出力をフィルタ処理するフォルマントフィルタとをさらに備えるようにしたので、ピッチとレベルの変化に対して適切なフォルマントを与えられ、より再現性の高い合成音が発生できる。
【００５４】
また、ピッチ情報発生部からのピッチ情報及びレベル情報発生部からのレベル情報に応じたフィルタ係数を出力する第２のフィルタ係数発生部と、この第２のフィルタ係数発生部からのフィルタ係数が設定されて上記フォルマントフィルタからの出力をフィルタ処理するブライトネスフィルタとをさらに備えるようにしたので、ピッチとレベルの変化に対して演奏表現などによる音色変化を分離できて、より再現性の高い合成音が発生できる。
【００５５】
さらに、基底波形発生装置からＫＬ基底波形を出力することで、音源の基底としてＫＬ基底波形を用いることで、各基底の重み係数を求めるのに、演算が容易なものとなり、演算量の削減を図ることができる。
【図面の簡単な説明】
【図１】この発明の実施の形態１に係る音源システムを示す構成図である。
【図２】図１の重み係数発生部５０の内部構成を説明するための概念図である。
【図３】図１の重み係数発生部５０が有する２次元テーブルメモリの格納内容を説明するための概念図である。
【図４】この発明の実施の形態２に係る音源システムを説明するもので、重み係数発生部６０の一例を示す構成図である。
【図５】この発明の実施の形態３に係る音源システムを示す構成図である。
【図６】この発明の実施の形態４に係る音源システムを示す構成図である。
【図７】従来例に係る音源システムを示す構成図である。
【図８】図７のピッチ位相変換部と位相発生器を示す構成図である。
【符号の説明】
１ピッチ情報発生部、２ピッチ／位相変換部、３位相発生器、４レベル情報発生部、６波形発生装置、６Ａ基底波形発生装置、６Ａ１，６Ａ２，・・・，６ＡｎＫＬ基底メモリ、６Ｂ１，６Ｂ２，・・・，６Ｂｎ乗算器、６Ｃ加算器、７乗算器、８フィルタ係数発生部、９フォルマントフィルタ、１０フィルタ係数発生部、１１ブライトネスフィルタ、５０重み係数発生部、５１、５２、・・・、５ｎ２次元テーブルメモリ、６０重み係数発生部、６１係数メモリ、６２重み係数演算器。
[0001]
[Technical field of the invention]
The present invention relates to a sound source system having a waveform generator used in an electronic musical instrument or a speech synthesizer, and more particularly to a sound source system including a waveform generator using an additive synthesis method.
[0002]
[Prior art]
The term additive synthesis technique is used here as a collective name covering a broad range of musical-sound and speech synthesis methods, including sine-wave additive synthesis, the composite sinusoid method, the phase vocoder, and sinusoidal model synthesis. It denotes a sound source method that originates in a consistent analysis/resynthesis system based on frequency analysis such as Fourier analysis, and that obtains one output synthesized sound by adding the outputs of waveform generators producing a plurality of basis waveforms (periodic waveforms), each weighted according to the appropriate signal level of its basis. Hereinafter, for simplicity, musical sounds and voices are collectively called musical sounds, and the devices that generate or synthesize their waveforms are called sound sources.
[0003]
Slightly different methods have been proposed that differ in the frequency relationship among the basis waveforms, in how the weighting coefficients are derived, and in how the time variation of the coefficients is controlled, but all of them are referred to here as additive synthesis methods.
Because the analysis and resynthesis systems for the musical sound match, additive synthesis is known to be an analysis/synthesis system that can, in theory, generate a synthesized sound without losing information from the analyzed sound.
[0004]
FIG. 7 illustrates an example of a musical sound synthesizer using a KL (Karhunen Loeve) basis method.
This method is disclosed in the following two references; FIG. 7 illustrates, in particular, the synthesis model of Reference 2.
Reference 1: Karhunen-Loeve-Based Additive Synthesis of Musical Tones IEEE 1986 pp581-584
Reference 2: Implementation of the KL Synthesis Algorithm under Real-Time Control ICMC Proceedings 1991 pp360-363
[0005]
In FIG. 7, reference numeral 1 denotes a pitch information generator that outputs pitch information p(t) of an input analysis sound waveform x(t) (t is discrete time, t = kT, k: integer, T: sampling period). Reference numeral 2 denotes a pitch/phase converter that, assuming the pitch information p(t) from the pitch information generator 1 is in a data format proportional to the logarithm of the fundamental frequency, converts it into phase increment data φ(t) (= FR(p(t))), that is, the amount of phase change, as shown in FIG. 8(a). Reference numeral 3 denotes a phase generator that outputs phase data θ(t) according to Expression (1) on the basis of the phase increment data φ(t) from the pitch/phase converter 2; as shown in FIG. 8(b), it is realized by an unsigned adder (an adder with two's-complement overflow characteristics).
[0006]
[Expression 1]
$$\theta(t) = \theta(t - T) + \phi(t) \qquad (1)$$
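The pitch/phase converter 2 and phase generator 3 can be sketched as a wrapping phase accumulator. This is a minimal illustration under assumptions: the 16-bit accumulator width, the function names, and the particular pitch scale inside `pitch_to_increment` (a stand-in for FR) are not given in the source, which only says that pitch is proportional to the logarithm of the fundamental frequency and that the adder overflows like an unsigned adder.

```python
PHASE_BITS = 16                       # assumed accumulator width
PHASE_MASK = (1 << PHASE_BITS) - 1


def pitch_to_increment(p, sample_rate):
    """Hypothetical FR(p): log-frequency pitch to phase increment per sample."""
    freq = 440.0 * 2.0 ** ((p - 69) / 12)     # assumed pitch scale (69 = 440 Hz)
    return round(freq / sample_rate * (1 << PHASE_BITS))


def phase_sequence(increments, theta0=0):
    """Accumulate phi(t) into theta(t); overflow wraps like an unsigned adder."""
    theta = theta0
    phases = []
    for phi in increments:
        theta = (theta + phi) & PHASE_MASK
        phases.append(theta)
    return phases
```

Because the accumulator wraps, one full cycle of the basis waveform corresponds to the phase running from 0 to 2^16 - 1, and no explicit modulo by the period is needed.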

[0007]
Reference numeral 4 denotes a level information generator that outputs level information a(t) of the input analysis sound waveform; it outputs the final level of the output sound, obtained by combining volume information with velocity information and level envelope information. Reference numeral 5 denotes a weighting coefficient generator that generates a weighting coefficient An(t) for each basis waveform (n is the index of the basis waveform) on the basis of the level information a(t) from the level information generator 4.
[0008]
Reference numeral 6 denotes a waveform generator that, on the basis of the phase data θ(t) from the phase generator 3 and the weighting coefficients An(t) from the weighting coefficient generator 5, obtains an output synthesized sound waveform y(t) by summing the basis waveforms, each multiplied by its weighting coefficient. The waveform generator 6 comprises: a basis waveform generator 6A having KL basis memories 6A1, 6A2, ..., 6An, one per basis, each of which stores waveform data indexed by the phase of a KL basis and outputs the corresponding basis waveform data Fn(t) with the phase data θ(t) as its address; multipliers 6Bn, one per basis, each of which multiplies the basis waveform data Fn(t) from the corresponding KL basis memory by the weighting coefficient An(t) from the weighting coefficient generator 5; and an adder 6C that sums the outputs of the multipliers 6Bn to obtain the output synthesized waveform y(t) given by Expression (2). In Expression (2), N is the total number of basis waveforms.
[0009]
[Expression 2]
$$y(t) = \sum_{n=1}^{N} A_n(t)\, F_n(t) \qquad (2)$$
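The weighted sum performed by the basis memories 6A1 to 6An, the multipliers 6Bn, and the adder 6C can be sketched as below. The function name `synthesize`, the 16-bit phase width, and the plain Python lists standing in for the KL basis memories are assumptions for illustration.

```python
def synthesize(tables, weights, phases, phase_bits=16):
    """Compute y(t) = sum over n of A_n(t) * F_n(t).

    tables[n]  : one period of basis waveform n (stand-in for a basis memory)
    weights[n] : per-sample weighting coefficients A_n(t)
    phases     : per-sample phase data theta(t), 0 <= theta < 2**phase_bits
    """
    out = []
    for t, theta in enumerate(phases):
        y = 0.0
        for table, w in zip(tables, weights):
            index = (theta * len(table)) >> phase_bits   # phase as address
            y += w[t] * table[index]                     # multiplier 6Bn
        out.append(y)                                    # adder 6C
    return out
```

The inner loop runs once per basis for every output sample, which is the computational cost the description attributes to additive synthesis.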

[0010]
[Problems to be solved by the invention]
Two major problems are known in the conventional additive synthesis technique described above.
The first is that, because a number of basis waveforms are added to form one output waveform, the amount of computation spent on synthesizing a single musical sound becomes enormous, particularly when reproducing a realistic timbre close to that of an actual musical sound.
Limiting the amount of computation therefore makes it difficult to increase the number of bases, while reducing the number of bases reduces the total number of harmonics that can be reproduced. In that case, giving priority to low-frequency harmonics yields a synthesized sound lacking high-band components, and giving priority to high-level harmonics yields a synthesized sound with unnaturally wide gaps between harmonics; both are conspicuous defects in sound quality.
[0011]
The second is that improving reproducibility complicates control of the additive weighting coefficient data for each basis. The weighting coefficients generated by the weighting coefficient generator 5 are generally signal level data determined by a Fourier transform or a short-time Fourier transform (STFT), and the data consist of one time-varying sequence, as long as the number of analysis points, for each basis. That is, when the input analysis sound is analyzed as a sequence of 10,000 samples, the number of data items is 10,000N. In this case good reproducibility is obtained if the output synthesized sound has the same pitch and level as the input analysis sound, but reproducibility is known to fall as the pitch and level move away from those of the input analysis sound.
[0012]
Moreover, since a time-varying data sequence must be handled for every basis, the amount of data is enormous, and holding and controlling the data as is would be difficult. A relatively simple kind of envelope generator, such as piecewise linear segments, is therefore often substituted. However, it is now presumed that most of the information peculiar to the input analysis sound lies in the fine fluctuations of the envelope curves; expressing them as simple envelope curves such as plain line segments loses much information and, as a result, degrades the reproducibility of the musical sound.
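For reference, a piecewise linear envelope of the kind described can be evaluated as in the sketch below. The function name and the breakpoint format are assumptions. The example shows why the representation is compact, since a handful of breakpoints replaces thousands of samples, and also why any fluctuation between breakpoints is discarded.

```python
def piecewise_linear(breakpoints, t):
    """Evaluate an envelope given as sorted (time, level) breakpoints at time t."""
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t <= t1:
            # Straight line between neighbouring breakpoints.
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]          # hold the last level afterwards
```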
[0013]
The basis of a sound source is sometimes a sine wave, but in the conventional example shown in FIG. 7 the bases are KL bases. When each KL basis is spectrum-analyzed, each partially exhibits the features of the spectral shape of the input analysis sound. Using KL basis waveforms instead of sine waves as the bases of the sound source aims to reduce the amount of computation and so addresses the first of the two problems above. In the example of FIG. 7, however, the weighting coefficient generator 5 uses a piecewise linear approximation as the envelope generator for each basis to obtain the weighting coefficients, which form time-varying level data sequences, so the second problem remains unsolved.
[0014]
The present invention has been made to solve the above problems of the conventional example. Its object is to provide an additive synthesis sound source system that improves reproducibility while keeping a simple data control structure and that, by editing the time-varying data sequences of both pitch and level, can generate an output synthesized sound that retains the characteristics of the input analysis sound even with new performance information.
[0015]
[Means for Solving the Problems]
To achieve the above object, a sound source system according to the present invention comprises: a pitch information generator that outputs pitch information of an input analysis sound waveform; a pitch/phase converter that converts the pitch information from the pitch information generator into phase increment data; a phase generator that obtains phase data on the basis of the phase increment data from the pitch/phase converter; a level information generator that outputs level information of the input analysis sound waveform; a weighting coefficient generator that generates a weighting coefficient for each basis waveform on the basis of the level information from the level information generator; and a waveform generator that, on the basis of the phase data from the phase generator and the weighting coefficients from the weighting coefficient generator, obtains an output synthesized sound from the basis waveforms each multiplied by its weighting coefficient. The waveform generator comprises: a basis waveform generator that has basis memories, one per basis, storing KL basis waveforms, and that reads a KL basis waveform out of each basis memory with the phase data as the address input; multipliers, one per basis, each of which multiplies a KL basis waveform by the corresponding weighting coefficient from the weighting coefficient generator; and an adder that obtains the synthesized sound from the outputs of the multipliers. In this sound source system, the weighting coefficient generator has, for each basis waveform, a two-dimensional table memory storing weighting coefficients as a function of pitch and level, such that the weighting coefficients form a curved surface in the space spanned by mutually orthogonal pitch, level, and weighting coefficient axes, and outputs the weighting coefficient for each KL basis waveform to the corresponding multiplier on the basis of the pitch information from the pitch information generator and the level information from the level information generator.
[0016]
The base waveform generator is characterized in that a weighting factor corresponding to a pitch and a level is obtained by linear approximation using an interpolation method using the two-dimensional table memory.
[0017]
A sound source system according to another invention comprises: a pitch information generator that outputs pitch information of an input analysis sound waveform; a pitch/phase converter that converts the pitch information from the pitch information generator into phase increment data; a phase generator that obtains phase data on the basis of the phase increment data from the pitch/phase converter; a level information generator that outputs level information of the input analysis sound waveform; a weighting coefficient generator that generates a weighting coefficient for each basis waveform on the basis of the level information from the level information generator; and a waveform generator that, on the basis of the phase data from the phase generator and the weighting coefficients from the weighting coefficient generator, obtains an output synthesized sound from the basis waveforms each multiplied by its weighting coefficient. The waveform generator comprises: a basis waveform generator that has basis memories, one per basis, storing KL basis waveforms, and that reads a KL basis waveform out of each basis memory with the phase data as the address input; multipliers, one per basis, each of which multiplies a KL basis waveform by the corresponding weighting coefficient from the weighting coefficient generator; and an adder that obtains the synthesized sound from the outputs of the multipliers. In this sound source system, the weighting coefficient generator comprises: a coefficient memory that stores, for each basis waveform, the coefficients obtained when a two-variable polynomial in pitch and level is fitted to the weighting coefficients, which form a curved surface in the space spanned by mutually orthogonal pitch, level, and weighting coefficient axes; and a weighting coefficient calculator for each basis waveform that computes the weighting coefficient on the basis of the pitch information from the pitch information generator, the level information from the level information generator, and the coefficients for that basis waveform from the coefficient memory. The weighting coefficient for each KL basis waveform is output from these weighting coefficient calculators to the corresponding multiplier.
[0018]
The system is further characterized by comprising a first filter coefficient generator that outputs filter coefficients corresponding to the pitch information from the pitch information generator and the level information from the level information generator, and a formant filter in which the filter coefficients from the first filter coefficient generator are set and which filters the output of the waveform generator.
[0019]
The system is further characterized by comprising a second filter coefficient generator that outputs filter coefficients corresponding to the pitch information from the pitch information generator and the level information from the level information generator, and a brightness filter in which the filter coefficients from the second filter coefficient generator are set and which filters the output of the formant filter.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
In the present invention, the time-varying pitch and level data of the input analysis sound waveform are held separately from the signal level of each basis at each pitch and level, and the output synthesized sound waveform y(t) given by Expression (3) is obtained. In Expression (3), AAn(p(t), a(t)) denotes the data table or function, used in the present invention, that converts pitch and level data into the level data of the n-th basis.
[0022]
[Expression 3]
$$y(t) = \sum_{n=1}^{N} AA_n(p(t), a(t))\, F_n(t) \qquad (3)$$

[0023]
For example, in the prior art, resynthesis was impossible unless the time-varying signal level data of every basis (10,000N points) were managed and controlled as is. In the present invention, by contrast, only the time-varying sequences of the overall pitch and level (10,000 points each) are held, and the conversion functions AAn convert pitch and level into the signal level of each basis.
It has been found that, for a steady-state input analysis sound waveform, the signal level of each basis at a given pitch and level is almost constant, so holding it in the form of a lookup table or function stores the data efficiently.
The time-varying pitch and level data of the input analysis sound, on the other hand, contain much information about the transient states of the performance, so resynthesizing from these two kinds of data makes it possible to generate an output synthesized sound with high reproducibility.
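The separation described above can be sketched as a resynthesis loop that takes only two control sequences, p(t) and a(t), plus one conversion function per basis. The names `resynthesize` and `weight_fns`, the callable form of AAn, and the 16-bit phase width are assumptions for illustration; in the embodiments AAn is a table or a polynomial.

```python
def resynthesize(tables, weight_fns, pitch, level, phases, phase_bits=16):
    """y(t) = sum over n of AA_n(p(t), a(t)) * F_n(t).

    weight_fns[n] is a callable AA_n(p, a) holding the timbre of the source;
    pitch and level are the time-varying control sequences of the performance.
    """
    out = []
    for p, a, theta in zip(pitch, level, phases):
        y = 0.0
        for table, aa in zip(tables, weight_fns):
            index = (theta * len(table)) >> phase_bits
            y += aa(p, a) * table[index]
        out.append(y)
    return out
```

Swapping `weight_fns` while keeping `pitch` and `level` changes the timbre but not the performance, and swapping the control sequences does the opposite.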
[0024]
Separating the data in this way produces a new effect, which is explained next.
To make the explanation concrete, voice, and singing data in particular, is taken as an example.
Two major elements characterize the individuality of a particular singer: one is the character of the singing voice itself (in particular its spectral characteristics), and the other is the way the singing voice is controlled (usually called phrasing or singing style). Most of this control can be described by time-varying sequences of pitch and level.
By separating the data as in the present invention, the character of the singing voice itself is stored in the conversion functions AAn, and the character of the control is stored separately in the remaining time-varying pitch and level sequences. This makes it possible to resynthesize new singing.
[0025]
That is, the input analysis sound of singer A is analyzed to create the set of conversion functions AAn for singer A, denoted A-AAn. Separately, singer B sings a song C that singer A does not sing, and its pitch and level are analyzed to create time-varying data sequences. When the sound source system of the present invention is loaded with the set A-AAn and fed the time-varying data sequences of song C as sung by singer B, song C, which singer A never sang, can be resynthesized with singer A's voice and singer B's phrasing.
[0026]
Moreover, because the characteristics can be separated in this way, the reproducibility of the system can be improved further by adding the same kind of conversion from pitch and level to coefficient data to a stage that imitates spectral characteristics, such as a formant filter that simulates the spectral characteristics of the vocal cords.
[0027]
By its construction, the technique of the present invention is clearly applicable to every additive synthesis method and does not depend on the shape of the basis waveforms. In other words, although the present invention uses the word "basis" for convenience to mean the smallest unit of the synthesized waveform, it is readily applied beyond the original meaning of the word basis (base).
For example, the data of the bases (sine waves) obtained by frequency analysis of a musical sound can be grouped at equal intervals starting from the lowest frequency (each group is called a harmonic group), and the present invention can be applied with the waveform equivalent to the frequency analysis data of each group used as a new basis waveform.
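One way to build such harmonic-group waveforms is sketched below. Two points are assumptions: "at equal intervals" is read here as taking every G-th harmonic for each of G groups (contiguous blocks would be another reading), and phase information is ignored so that each harmonic is a plain sine. The function names are hypothetical.

```python
import math


def harmonic_groups(num_harmonics, num_groups):
    """Split harmonics 1..num_harmonics into groups taken at equal intervals."""
    return [list(range(g + 1, num_harmonics + 1, num_groups))
            for g in range(num_groups)]


def group_waveform(amplitudes, harmonics, length=64):
    """One period of the waveform equivalent to the listed harmonics.

    amplitudes maps harmonic number -> analyzed amplitude.
    """
    return [
        sum(amplitudes[h] * math.sin(2 * math.pi * h * k / length)
            for h in harmonics)
        for k in range(length)
    ]
```

Each resulting table can then play the role of one basis memory, with its own weighting coefficient surface over pitch and level.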
[0028]
Embodiment 1.
Hereinafter, specific embodiments will be described with reference to the drawings.
FIG. 1 is a configuration diagram illustrating a sound source system according to the first embodiment.
In FIG. 1, parts identical to those of the conventional example shown in FIG. 7 are given the same reference numerals, and their description is omitted. As a new reference numeral, 50 denotes a weighting coefficient generator that has, for each basis waveform, a two-dimensional table memory storing the weighting coefficients of that basis waveform as a function of pitch and level; on the basis of the pitch information from the pitch information generator 1 and the level information from the level information generator 4, it outputs the weighting coefficient AAn(p(t), a(t)) for each basis waveform to the corresponding multiplier 6Bn. Reference numeral 7 denotes a multiplier that, when the weighting coefficients AAn(p(t), a(t)) are normalized, multiplies the output synthesized sound from the adder 6C by the coefficient a(t), so that its output is as given by Expression (3). If the weighting coefficients are not normalized, the multiplier 7 is unnecessary, and the output synthesized sound y(t) of Expression (3) is obtained directly from the adder 6C.
[0029]
FIG. 2 is a conceptual diagram of the sound source system shown in more detail.
As shown in FIG. 2, the base waveform generator 6A takes the phase data θ(t) as an address input, reads out the KL base waveforms from the first to Nth KL base memories, and outputs them to the corresponding multipliers 6Bn. The weight coefficient generation unit 50, in turn, has two-dimensional table memories 51, 52, ..., 5n, one per base, and outputs the first to Nth weight coefficients corresponding to the pitch information and the level information to the respective multipliers 6Bn.
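The signal flow described above (base memories addressed by phase, one weight per base, multipliers, adder, and the optional multiplier 7) can be sketched roughly as follows. This is only an illustrative sketch: the function name, the representation of phase as a value in [0, 1), the use of callables for the weight lookups, and the sine tables standing in for KL base waveforms are all assumptions and do not come from the specification.

```python
import math

def synthesize_sample(theta, pitch, level, base_memories, weight_lookups, normalized=True):
    """Return one output sample y(t) for phase theta in [0, 1)."""
    total = 0.0
    for memory, lookup in zip(base_memories, weight_lookups):
        address = int(theta * len(memory)) % len(memory)  # phase data used as read address
        base_sample = memory[address]                     # base waveform sample Fn
        total += lookup(pitch, level) * base_sample       # multiplier 6Bn, summed by adder 6C
    # Multiplier 7: scale by a(t) only when the weights are normalized.
    return level * total if normalized else total

# Hypothetical stand-in data: two 8-sample sine tables instead of real KL bases,
# and constant weight lookups instead of real two-dimensional tables.
SIZE = 8
base_memories = [
    [math.sin(2.0 * math.pi * k / SIZE) for k in range(SIZE)],
    [math.sin(4.0 * math.pi * k / SIZE) for k in range(SIZE)],
]
weight_lookups = [lambda p, a: 0.5, lambda p, a: 0.25]

sample = synthesize_sample(0.25, 69.0, 2.0, base_memories, weight_lookups)
```

In a real system each lookup would read a weight coefficient surface for its base rather than return a constant.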
[0030]
Here, as shown in FIG. 3, each two-dimensional table memory stores weight coefficients that form a curved surface in the space spanned by mutually orthogonal pitch, level, and weight coefficient axes. The weight coefficient surface shown in the figure differs for each base. By outputting the first to Nth weight coefficients corresponding to the momentary pitch information and level information from the respective two-dimensional table memories, a waveform with the optimal spectrum can be synthesized.
[0031]
If the resolution of the two-dimensional table memory is sufficiently high, the weight coefficient for the given pitch information and level information can be read out directly. If the memory capacity is to be reduced, however, the table resolution can be lowered, and the weight coefficients corresponding to the pitch information and level information can be obtained by linear approximation, interpolating in the two-dimensional table memory as described below.
[0032]
Assume that the sounding frequency F [Hz] and its power P are given. Electronic musical instruments often define pitch and level on the basis of human auditory characteristics as follows.
Pitch: $p = 69 + 12 \log_2 (F / 440)$
Level: $a = 10 \log_{10} P$
Assume that the KL weight coefficient is expressed as $w(p, a)$ with respect to pitch p and level a.
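As a small illustration of the two definitions above, the conversions can be written as follows. The function names are hypothetical; only the formulas themselves come from the text.

```python
import math

def pitch_from_frequency(freq_hz):
    """Pitch p = 69 + 12 * log2(F / 440); 440 Hz maps to 69."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def level_from_power(power):
    """Level a = 10 * log10(P), a decibel-style measure of power."""
    return 10.0 * math.log10(power)
```

With these definitions, doubling the frequency raises the pitch by 12, and multiplying the power by 10 raises the level by 10.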
[0033]
How to use the table
Prepare sets $S_p$ and $S_a$ in which pitch p and level a are discretized with an appropriate resolution.
$S_p = \{p_0, p_1, p_2, \ldots, p_{N-1}\}$, $S_a = \{a_0, a_1, a_2, \ldots, a_{M-1}\}$
Assume that the coefficient table T samples the coefficient $w(p, a)$ on $S_p \times S_a$:
$T(i, j) = w(p_i, a_j)$
[0034]
When arbitrary p and a are given in the range $p_0 \le p < p_{N-1}$, $a_0 \le a < a_{M-1}$, $w(p, a)$ can be approximated from the coefficient table T as follows.
$w(p, a) \approx T_0(i, j) + \Delta j \, \{T_0(i, j+1) - T_0(i, j)\}$
where $T_0(i, j) = T(i, j) + \Delta i \, \{T(i+1, j) - T(i, j)\}$,
$p_i \le p < p_{i+1}$, $a_j \le a < a_{j+1}$,
$\Delta i = (p - p_i) / (p_{i+1} - p_i)$, $\Delta j = (a - a_j) / (a_{j+1} - a_j)$
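The table lookup with linear interpolation described here amounts to bilinear interpolation on the pitch and level grids. The following is a minimal sketch under some assumptions: the table is a nested list indexed as `table[i][j]`, the grids are sorted in ascending order, and inputs outside the grid are handled by clamping the cell index (the text does not specify out-of-range behaviour). All names are hypothetical.

```python
from bisect import bisect_right

def interpolate_weight(table, pitch_grid, level_grid, p, a):
    """Approximate w(p, a) from table[i][j] = w(pitch_grid[i], level_grid[j])."""
    # Locate the cell with pitch_grid[i] <= p < pitch_grid[i+1] (index clamped: an assumption).
    i = min(max(bisect_right(pitch_grid, p) - 1, 0), len(pitch_grid) - 2)
    j = min(max(bisect_right(level_grid, a) - 1, 0), len(level_grid) - 2)
    di = (p - pitch_grid[i]) / (pitch_grid[i + 1] - pitch_grid[i])
    dj = (a - level_grid[j]) / (level_grid[j + 1] - level_grid[j])
    # First interpolate along the pitch axis at level indices j and j + 1 ...
    t0_j = table[i][j] + di * (table[i + 1][j] - table[i][j])
    t0_j1 = table[i][j + 1] + di * (table[i + 1][j + 1] - table[i][j + 1])
    # ... then along the level axis.
    return t0_j + dj * (t0_j1 - t0_j)

# Hypothetical 2 x 2 table for one base.
pitch_grid = [60.0, 72.0]
level_grid = [-20.0, 0.0]
table = [[0.0, 1.0], [2.0, 3.0]]
mid = interpolate_weight(table, pitch_grid, level_grid, 66.0, -10.0)
```

One such table would be held per base, so the lookup runs N times per update of pitch and level.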
[0035]
The interpolation described here is linear; needless to say, no interpolation is necessary if the table resolution is sufficiently high. Conversely, when further memory savings are required, second-order or higher-order interpolation can be used.
[0036]
Therefore, according to the first embodiment, the following effects can be achieved.
1) Because the additive weight coefficients of the base waveforms are held in memory as two-dimensional tables with pitch and level as variables, there is no need, unlike the conventional additive synthesis method, to hold time-varying data strings of the weight coefficients, one per base, in order to reproduce the input analysis sound.
2) The characteristics of a musical sound can be managed separately as data determined by the pitch and level of the sound and data arising from performance technique.
3) From the input analysis sound, only those characteristics that are determined by the pitch and level of the musical sound and do not depend on its time variation are collected.
Therefore, as a result of these,
4) The timbre follows changes in pitch and level better than in conventional systems, and a synthesized sound with higher reproducibility can be generated.
5) Time-varying data on pitch and level that relate to performance technique can be extracted from the input analysis sound separately from the other data.
6) Furthermore, if linear approximation by interpolation is used to obtain the weight coefficients corresponding to the pitch information and level information, the memory capacity of the two-dimensional table memory can be reduced.
[0037]
Embodiment 2.
In the first embodiment described above, the weight coefficient generation unit 50 of FIG. 1 includes, for each KL base waveform, a two-dimensional table memory that stores the weight coefficients according to pitch and level. The second embodiment describes a case in which the weight coefficients are obtained using the coefficients of an approximating two-variable polynomial with pitch and level as variables.
[0038]
Approximating the weight coefficients with a two-variable polynomial allows the system to be realized with less memory. For example, if the polynomial is taken as a two-variable polynomial in the pitch information p(t) and the level information a(t), with K as the order in p(t) and L as the order in a(t), the weight coefficient can be approximated by determining the coefficients $C_{stn}$ of the two-variable polynomial AAn(p(t), a(t)) shown in equation (4).
[0039]
[Expression 4]

[0040]
In this case, an approximate value of the weighting factor can be obtained directly from p and a by polynomial calculation.
Equation (5) is stated for the pair of i and j at which $|AAn(p, a) - w(p_i, a_j)|$ takes its maximum value, and is used to determine the coefficients $C_{stn}$. When equations (4) and (5) are expanded for K = L = 1, the two-variable polynomial AAn(p, a) is as shown in equation (6).
$AAn(p, a) = C_{00n} + C_{01n} p + C_{10n} a + C_{11n} p a$ (6)
[0041]
FIG. 4 shows an equivalent circuit diagram in the case of K = L = 1, and is a diagram corresponding to an example of the internal configuration of the weighting factor generator 60 according to the second embodiment.
The overall configuration of the second embodiment is the same as that of the first embodiment shown in FIG. 1, except that the weight coefficient generation unit 50 of FIG. 1 is replaced by a weight coefficient generation unit 60, an example of whose internal configuration is shown in FIG. 4.
[0042]
That is, as shown in FIG. 4, the weight coefficient generation unit 60 according to the second embodiment includes a coefficient memory 61 that stores, for each base waveform (corresponding to the subscript n), the coefficients $C_{00n}$, $C_{01n}$, $C_{10n}$, $C_{11n}$ of the approximating two-variable polynomial with pitch p and level a as variables, and a weight coefficient calculator 62 for each base waveform that calculates the weight coefficient from the pitch information p from the pitch information generator 1, the level information a from the level information generator 4, and the coefficients for that base waveform from the coefficient memory 61. The weight coefficient for each base waveform is output from the weight coefficient calculator 62 to the corresponding multiplier 6Bn. In the figure, reference numerals 62a to 62d denote multipliers, and 62e denotes an adder. The illustrated configuration is provided for each base waveform.
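For the K = L = 1 case, the calculation performed by the weight coefficient calculator can be sketched as four products followed by one sum, evaluating $C_{00n} + C_{01n} p + C_{10n} a + C_{11n} p a$. How the four products are assigned to the multipliers 62a to 62d is not stated in the text, so the ordering below is an assumption, as are the function and parameter names.

```python
def weight_from_coefficients(p, a, c00, c01, c10, c11):
    """Evaluate c00 + c01*p + c10*a + c11*p*a for one base waveform."""
    term_p = c01 * p      # product 1
    term_a = c10 * a      # product 2
    pa = p * a            # product 3
    term_pa = c11 * pa    # product 4
    return c00 + term_p + term_a + term_pa  # single summing stage

# Hypothetical coefficient set for one base.
weight = weight_from_coefficients(2.0, 3.0, 1.0, 0.5, 0.25, 0.1)
```

Only four stored values per base are needed here, compared with a full table of N x M entries.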
[0043]
Approximating the entire domain of the coefficients with a single polynomial is not practical because the error becomes large. Where necessary, measures such as approximation by piecewise polynomials can be used.
Further, when the two-variable polynomial AAn (p, a) is expanded in the case of K = L = 2, the equation (7) is obtained.
$AAn(p, a) = C_{00n} + C_{10n} p + C_{20n} p^2 + C_{01n} a + C_{11n} p a + C_{21n} p^2 a + C_{02n} a^2 + C_{12n} p a^2 + C_{22n} p^2 a^2$ (7)
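For general orders K and L, the two-variable polynomial can be evaluated with nested Horner steps. The sketch below assumes the index convention of equation (7), in which the first subscript is the power of p and the second is the power of a, so that `coeffs[s][t]` multiplies `p**s * a**t`. The function name and data layout are hypothetical.

```python
def eval_weight_polynomial(coeffs, p, a):
    """Evaluate sum over s, t of coeffs[s][t] * p**s * a**t."""
    result = 0.0
    for row in reversed(coeffs):      # Horner step in p, highest power first
        inner = 0.0
        for c in reversed(row):       # Horner step in a, highest power first
            inner = inner * a + c
        result = result * p + inner
    return result

# Hypothetical K = L = 2 coefficient set: (K + 1) * (L + 1) = 9 values for one base.
coeffs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
value = eval_weight_polynomial(coeffs, 2.0, 3.0)
```

A piecewise approximation would hold one such coefficient set per region of the pitch and level plane and select the set before evaluating.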
[0044]
Therefore, according to the second embodiment, the number of data that must be managed in order to reproduce the input analysis sound can be further reduced by generating the weighting coefficient as a two-variable function.
[0045]
Embodiment 3.
Next, FIG. 5 is a block diagram showing a sound source system according to the third embodiment.
In FIG. 5, the same parts as those of the first embodiment shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. As new reference numerals, 8 denotes a filter coefficient generator that outputs the filter coefficients of a formant filter according to the pitch information p(t) from the pitch information generator 1 and the level information a(t) from the level information generator 4, and 9 denotes the formant filter, in which the filter coefficients from the filter coefficient generator 8 are set and which filters the output of the waveform generator 6 to reproduce the formant peaks of the musical sound. Here, the filter coefficient generator 8 has the same configuration as the weight coefficient generation unit 50 of the first embodiment, and the present embodiment is the first embodiment with the filter coefficient generator 8 and the formant filter 9 added.
[0046]
A formant is a peak of the spectral envelope of a musical tone, and the formant filter 9 reproduces the peak frequency, peak level, and Q of each formant. Although the formant filter is described here only in general terms, various concrete forms are applicable, such as a lattice filter, a direct realization of the transfer function, a cascade of second-order IIR filters, and a ladder filter. Which implementation is adopted is not essential; the invention can be applied in any of these cases.
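As one possible illustration of a single second-order IIR section that sets a peak frequency, peak level, and Q, the sketch below uses the widely known peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook. This choice of formulas is an assumption made for illustration and is not taken from the specification; all function and parameter names are hypothetical. In the system described here, the three parameters (or the resulting coefficients) would be supplied by the filter coefficient generator according to pitch and level.

```python
import cmath
import math

def peaking_biquad(center_hz, gain_db, q, sample_rate):
    """Return normalized (b, a) coefficients of a peaking second-order section."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / amp
    b = [(1.0 + alpha * amp) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * amp) / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha / amp) / a0]
    return b, a

def magnitude_at(b, a, freq_hz, sample_rate):
    """Magnitude of the transfer function at freq_hz."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    return abs((b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1))

def filter_signal(b, a, samples):
    """Apply the section to a list of samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

b, a = peaking_biquad(1000.0, 6.0, 2.0, 48000.0)
```

Several such sections in cascade, one per formant, would correspond to the cascaded second-order IIR form mentioned in the text.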
[0047]
Therefore, according to the third embodiment, by holding the filter coefficients of the formant filter 9 in a two-dimensional coefficient table of the same kind as that of the weight coefficient generation unit 50, an appropriate formant is imparted as pitch and level change, and a synthesized sound with higher reproducibility can be generated.
[0048]
Embodiment 4.
Next, FIG. 6 is a block diagram showing a sound source system according to the fourth embodiment.
In FIG. 6, the same parts as those of the third embodiment shown in FIG. 5 are denoted by the same reference numerals, and their description is omitted. As new reference numerals, 10 denotes a filter coefficient generator that outputs the filter coefficients of a brightness filter according to the pitch information p(t) from the pitch information generator 1 and the level information a(t) from the level information generator 4, and 11 denotes the brightness filter, in which the filter coefficients from the filter coefficient generator 10 are set and which filters the output of the formant filter 9 to reproduce spectral fluctuation due to lip movement in voice and spectral fluctuation due to various performance techniques in instrument sounds. Here, the filter coefficient generator 10 has the same configuration as the weight coefficient generation unit 50 of the first embodiment, and the present embodiment is the third embodiment with the filter coefficient generator 10 and the brightness filter 11 added.
[0049]
Various types of brightness filters can be considered as specific examples. Here, a first-order IIR filter model is given.
Therefore, according to the fourth embodiment, by holding the filter coefficients of the brightness filter in a two-dimensional coefficient table of the same kind as that of the weight coefficient generation unit 50, timbre changes due to performance expression and the like can be separated from those due to changes in pitch and level, and a synthesized sound with higher reproducibility can be generated.
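A first-order IIR brightness filter of the kind mentioned above could, for example, take the form of a one-pole low-pass filter. The specific difference equation below is an assumption chosen for illustration (the text only names a first-order IIR model), and the names are hypothetical. The single coefficient is what a filter coefficient generator would supply according to pitch and level.

```python
def one_pole_lowpass(samples, coefficient):
    """Apply y[n] = (1 - c) * x[n] + c * y[n-1], with 0 <= c < 1.

    A larger c removes more high-frequency content (a darker sound);
    c = 0 passes the input unchanged.
    """
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - coefficient) * x + coefficient * y
        out.append(y)
    return out

# Step response with a hypothetical coefficient of 0.5.
step = one_pole_lowpass([1.0] * 200, 0.5)
```

The gain at zero frequency is 1 for any valid coefficient, so changing the brightness does not change the level of a steady low-frequency component.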
[0050]
[Effects of the Invention]
As described above, according to the present invention, the weight coefficient generation unit includes, for each base waveform, a two-dimensional table memory storing the weight coefficients corresponding to pitch and level, and outputs the weight coefficient for each base waveform based on the pitch information from the pitch information generation unit and the level information from the level information generation unit. Because the additive weight coefficients of the base waveforms are thus held as two-dimensional tables of pitch and level, there is no need, unlike the conventional additive synthesis method, to hold time-varying data strings of the weight coefficients, one per base, in order to reproduce the input analysis sound. The characteristics of a musical sound can be managed separately as data determined by its pitch and level and data arising from performance technique, and only the characteristics that are determined by pitch and level and do not depend on the time variation of the sound are collected from the input analysis sound. As a result, the timbre follows changes in pitch and level better than in conventional systems, and a synthesized sound with higher reproducibility can be generated.
[0051]
In addition, the memory capacity can be reduced by obtaining the weighting coefficient corresponding to the pitch and level by linear approximation by the interpolation method using the two-dimensional table memory.
[0052]
According to another invention, the weight coefficient generation unit includes a coefficient memory storing, for each base waveform, the coefficients of an approximating two-variable polynomial with pitch and level as variables, and a weight coefficient calculator that calculates the weight coefficient from the pitch information from the pitch information generation unit, the level information from the level information generation unit, and the coefficients for each base waveform from the coefficient memory. By generating the weight coefficient as a two-variable function in this way, the number of data that must be managed in order to reproduce the input analysis sound can be further reduced.
[0053]
In addition, because the system further includes a first filter coefficient generation unit that outputs filter coefficients corresponding to the pitch information from the pitch information generation unit and the level information from the level information generation unit, and a formant filter in which the filter coefficients from the first filter coefficient generation unit are set and which filters the output from the waveform generator, an appropriate formant is imparted as pitch and level change, and a synthesized sound with higher reproducibility can be generated.
[0054]
In addition, because the system further includes a second filter coefficient generation unit that outputs filter coefficients corresponding to the pitch information from the pitch information generation unit and the level information from the level information generation unit, and a brightness filter in which the filter coefficients from the second filter coefficient generation unit are set and which filters the output from the formant filter, timbre changes due to performance expression can be separated from those due to changes in pitch and level, and a synthesized sound with higher reproducibility can be generated.
[0055]
Furthermore, because the base waveform generator outputs KL base waveforms, that is, because KL base waveforms are used as the bases of the sound source, the weight coefficient of each base is easy to calculate and the amount of calculation can be reduced.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing a sound source system according to Embodiment 1 of the present invention.
FIG. 2 is a conceptual diagram for explaining the internal configuration of the weight coefficient generation unit 50 in FIG. 1.
FIG. 3 is a conceptual diagram for explaining the contents stored in a two-dimensional table memory included in the weight coefficient generation unit 50 of FIG. 1.
FIG. 4 is a block diagram illustrating an example of a weighting factor generator 60 for explaining a sound source system according to a second embodiment of the present invention.
FIG. 5 is a block diagram showing a sound source system according to Embodiment 3 of the present invention.
FIG. 6 is a block diagram showing a sound source system according to Embodiment 4 of the present invention.
FIG. 7 is a configuration diagram showing a sound source system according to a conventional example.
FIG. 8 is a configuration diagram showing the pitch/phase converter and the phase generator in FIG. 7.
[Explanation of symbols]
1 pitch information generator, 2 pitch/phase converter, 3 phase generator, 4 level information generator, 6 waveform generator, 6A base waveform generator, 6A1, 6A2, ..., 6An KL base memory, 6B1, 6B2, ..., 6Bn multiplier, 6C adder, 7 multiplier, 8 filter coefficient generator, 9 formant filter, 10 filter coefficient generator, 11 brightness filter, 50 weight coefficient generation unit, 51, 52, ..., 5n two-dimensional table memory, 60 weight coefficient generation unit, 61 coefficient memory, 62 weight coefficient calculator.

Claims

1. A sound source system comprising: a pitch information generator that outputs pitch information of an input analysis sound waveform; a pitch/phase converter that converts the pitch information from the pitch information generator into phase increment value data; a phase generator that obtains phase data based on the phase increment value data from the pitch/phase converter; a level information generator that outputs level information of the input analysis sound waveform; a weight coefficient generation unit that generates a weight coefficient for each base waveform based on the level information from the level information generator; and a waveform generator that obtains, based on the phase data from the phase generator and the weight coefficient for each base waveform from the weight coefficient generation unit, an output synthesized sound of the base waveforms each multiplied by its weight coefficient, the waveform generator having a base waveform generator that has base memories, corresponding in number to the bases, storing KL base waveforms and that reads out and outputs the KL base waveforms from the base memories with the phase data as an address input, multipliers that multiply each KL base waveform by the weight coefficient for that base waveform from the weight coefficient generation unit, and an adder that obtains a synthesized sound from the outputs of the multipliers,
wherein the weight coefficient generation unit has a two-dimensional table memory storing each weight coefficient corresponding to pitch and level such that the weight coefficients form a curved surface in a space spanned by mutually orthogonal pitch, level, and weight coefficient axes, and outputs the weight coefficient for each KL base waveform to the corresponding multiplier based on the pitch information from the pitch information generator and the level information from the level information generator.

2. The sound source system according to claim 1, wherein the base waveform generation device obtains the weight coefficient corresponding to pitch and level by linear approximation using interpolation in the two-dimensional table memory.

3. A sound source system comprising: a pitch information generator that outputs pitch information of an input analysis sound waveform; a pitch/phase converter that converts the pitch information from the pitch information generator into phase increment value data; a phase generator that obtains phase data based on the phase increment value data from the pitch/phase converter; a level information generator that outputs level information of the input analysis sound waveform; a weight coefficient generation unit that generates a weight coefficient for each base waveform based on the level information from the level information generator; and a waveform generator that obtains, based on the phase data from the phase generator and the weight coefficient for each base waveform from the weight coefficient generation unit, an output synthesized sound of the base waveforms each multiplied by its weight coefficient, the waveform generator having a base waveform generator that has base memories, corresponding in number to the bases, storing KL base waveforms and that reads out and outputs the KL base waveforms from the base memories with the phase data as an address input, multipliers that multiply each KL base waveform by the weight coefficient for that base waveform from the weight coefficient generation unit, and an adder that obtains a synthesized sound from the outputs of the multipliers,
wherein the weight coefficient generation unit includes a coefficient memory that stores, for each base waveform, the coefficients of an approximating two-variable polynomial with pitch and level as variables, such that the weight coefficients form a curved surface in a space spanned by mutually orthogonal pitch, level, and weight coefficient axes, and a weight coefficient calculator for each base waveform that calculates the weight coefficient based on the pitch information from the pitch information generator, the level information from the level information generator, and the coefficients for each KL base waveform from the coefficient memory, and outputs the weight coefficient for each KL base waveform from the weight coefficient calculator to the corresponding multiplier.

4. The sound source system according to claim 1, further comprising: a first filter coefficient generator that outputs filter coefficients corresponding to the pitch information from the pitch information generator and the level information from the level information generator; and a formant filter in which the filter coefficients from the first filter coefficient generator are set and which filters the output from the waveform generator.

5. The sound source system according to claim 4, further comprising: a second filter coefficient generator that outputs filter coefficients corresponding to the pitch information from the pitch information generator and the level information from the level information generator; and a brightness filter in which the filter coefficients from the second filter coefficient generator are set and which filters the output from the formant filter.