
JP3567587B2 - Tone generator

Info

Publication number: JP3567587B2
Application number: JP03301396A
Authority: JP
Inventors: 吾朗坂田
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 1996-01-25
Filing date: 1996-01-25
Publication date: 2004-09-22
Anticipated expiration: 2016-01-25
Also published as: JPH09204185A

Description

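[0001]
[Technical Field of the Invention]
The present invention relates to an apparatus for synthesizing speech, and more particularly to a tone generator capable of producing a natural singing voice.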
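[0002]
[Prior Art]
Techniques known as the channel vocoder, linear prediction and PARCOR have long been used to synthesize human voice from feature parameters extracted by speech analysis. These techniques focus on converting analyzed speech into as little information as possible: the speech is analyzed and converted into feature parameters, and redundant components unrelated to the meaning of the words are removed to compress the amount of information. They were not designed to synthesize speech with high sound quality, nor to apply the synthesized human voice to musical tone generation.
Among them, the channel vocoder has a simple structure and is suited to real-time analysis and synthesis, so it was applied to tone generators that synthesize musical tones from the power spectrum envelope of speech extracted by a filter bank. However, the channel vocoder could not achieve high-quality speech synthesis because of the limited number of band-pass filter stages in the filter bank and its inability to synthesize consonants, and it eventually fell out of use.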
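[0003]
[Problems to Be Solved by the Invention]
In a conventional waveform-memory-readout tone generator, on the other hand, sampled human voice is stored in a waveform memory, and reading it out and playing it back at the pitch at which it was sampled is the simplest way to produce high-quality human voice. When it is played back at a pitch different from the sampling pitch, however, the formant frequencies of the voice shift by the amount of the pitch conversion, so a natural singing voice cannot be produced.
An object of the present invention is therefore to provide a tone generator that can form a synthesized human voice into a musical tone that sounds like a natural singing voice.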
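[0004]
[Means for Solving the Problems]
To achieve the above object, the invention according to claim 1 comprises: speech analysis means for analyzing an audio signal sampled in time series, one analysis frame at a time, to extract feature parameters; parameter storage means for storing, for each analysis frame, the feature parameters extracted by the speech analysis means together with a stop flag that takes either an on or an off state; tone control means that reads the feature parameters from the parameter storage means in accordance with performance information, pausing the readout at an analysis frame whose stop flag is on and resuming it from the paused analysis frame when the next performance information occurs, and that generates an excitation signal corresponding to that performance information; and speech synthesis means for synthesizing speech in accordance with the feature parameters read from the parameter storage means and the excitation signal.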
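[0005]
The invention according to claim 2, which depends on claim 1, further comprises flag changing means for changing the state of the stop flag of a designated analysis frame among the per-frame stop flags stored in the parameter storage means.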
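[0006]
The invention according to claim 3, which depends on claim 1, further comprises parameter deleting means for deleting the feature parameters of a designated analysis frame among the per-frame feature parameters stored in the parameter storage means.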
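[0007]
In the present invention, the speech analysis means analyzes an audio signal sampled in time series, extracts feature parameters and stores them in the parameter storage means. The tone control means reads the feature parameters from the parameter storage means in accordance with performance information and generates an excitation signal corresponding to that performance information, and the speech synthesis means synthesizes speech from the feature parameters read from the parameter storage means and the excitation signal. This makes it possible to form a synthesized human voice into a musical tone that sounds like a natural singing voice.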
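[0008]
[Embodiments of the Invention]
The tone generator according to the present invention can be applied not only to electronic musical instruments but also to devices that provide voice guidance with a human voice. An electronic musical instrument embodying the invention is described below as an example with reference to the drawings.
A. Overview of the Embodiment
FIG. 1 is a functional block diagram showing the schematic configuration of an electronic musical instrument according to one embodiment of the present invention. The instrument analyzes sampled speech on the well-known principle of the PARCOR vocoder to extract feature parameters (described later), and synthesizes speech in accordance with the extracted feature parameters. During synthesis, an excitation signal (described later) is controlled in accordance with performance data, so that the synthesized human voice is formed into a musical tone that sounds like a natural singing voice. An overview of the embodiment follows.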
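[0009]
In FIG. 1, reference numeral 1 denotes an A/D converter. It samples the audio signal, converted into an electric signal by a microphone and a preamplifier, at a predetermined sampling frequency Fs and outputs discrete audio data xi (i = 1 to N, where N is the number of samples). Reference numeral 2 denotes a PARCOR analysis system, which successively calculates the autocorrelation of the linear prediction error between the sampled audio data xi to generate PARCOR coefficients K1 to Kn and a residual wave power Af.
The residual wave indicates whether the audio data in the analysis window is unvoiced or voiced: for an unvoiced sound it is white noise, and for a voiced sound it is a pulse train whose period is the pitch period. The residual wave power Af is obtained by integrating the residual wave over the analysis window.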
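[0010]
Reference numeral 3 denotes a feature parameter storage unit, which successively stores the PARCOR coefficients K1 to Kn and the residual wave power Af output from the PARCOR analysis system 2, one analysis frame at a time. An analysis frame here corresponds to the speech analysis period defined by the window function described later. Reference numeral 4 denotes a control unit. It controls the readout of the PARCOR coefficients K1 to Kn and residual wave power Af stored in the feature parameter storage unit 3 in accordance with performance data, and instructs an excitation unit 5 (described later) to generate an excitation signal.
The excitation unit 5 consists of three parts. Waveform generators 5-1 generate a plurality of waveform signals at the pitch corresponding to the performance data. An adder 5-2 sums the waveform signals output by the waveform generators 5-1 and outputs a signal OSC. A white noise generator 5-3 generates white noise WN. On instruction from the control unit 4, the excitation unit 5 supplies the signal OSC or the white noise WN to terminal IN1 or IN2 of a PARCOR synthesis system 6 (described later). When the PARCOR synthesis system 6 synthesizes a voiced sound, the signal OSC is supplied to terminal IN1. When it synthesizes an unvoiced sound, the white noise WN is supplied to terminal IN2.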
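[0011]
In PARCOR synthesis, a pulse waveform is normally used as the excitation waveform for voiced sounds. The waveform generators 5-1 need not be limited to it, however, and generating signals of various waveform shapes widens the range of timbres of the synthesized sound. Waveforms rich in harmonics, such as triangle waves or pulse waves of different widths, are particularly effective. For example, if several mutually detuned waveform signals are generated simultaneously for one note-on, the PARCOR synthesis system 6 (described later) can produce a chorus-like synthesized voice.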
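A minimal sketch of how the signal OSC could be formed from several detuned generators is shown below in Python. It sums triangle waves spread by a few cents around the note frequency and normalizes by the number of generators. The sample rate, the cent offsets and the function names `triangle` and `detuned_osc` are illustrative assumptions, not values given in the patent.

```python
def triangle(phase: float) -> float:
    """Triangle wave in [-1, 1]; phase is measured in cycles."""
    p = phase % 1.0
    return 4.0 * p - 1.0 if p < 0.5 else 3.0 - 4.0 * p


def detuned_osc(freq_hz, detune_cents, n_samples, fs=48000.0):
    """Sum detuned triangle waves into one excitation signal (hypothetical OSC).

    Each entry of detune_cents plays the role of one waveform generator 5-1;
    summing them mirrors adder 5-2. Dividing by the number of generators keeps
    the peak within [-1, 1].
    """
    freqs = [freq_hz * 2.0 ** (c / 1200.0) for c in detune_cents]
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(sum(triangle(f * t) for f in freqs) / len(freqs))
    return out
```

For example, `detuned_osc(220.0, [-7.0, 0.0, 7.0], 4800)` would give 0.1 s of a three-voice excitation at the assumed 48 kHz rate. The slow beating between the slightly mistuned voices is what gives the chorus-like character described above.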
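The PARCOR synthesis system 6 synthesizes speech by reversing the process of the PARCOR analysis system 2. It synthesizes audio data xi from two inputs: the PARCOR coefficients K1 to Kn and residual wave power Af, read from the feature parameter storage unit 3 on instruction from the control unit 4, and the excitation signal supplied by the excitation unit 5. Reference numeral 7 denotes a D/A converter, which converts the audio data xi output from the PARCOR synthesis system 6 into an analog signal and outputs a synthesized speech signal.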
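[0012]
B. Configuration of the Main Parts
Next, the configurations of the PARCOR analysis system 2, the feature parameter storage unit 3 and the PARCOR synthesis system 6 are described with reference to FIGS. 2 to 5.
(1) Configuration of the PARCOR analysis system 2
FIG. 2 is a block diagram showing the configuration of the PARCOR analysis system 2. In this figure, 2a denotes a delay circuit that delays its input signal by at least one sampling period. The delay is not limited to one sample: when the system sampling frequency Fs exceeds 30 kHz, a two-sample delay is appropriate. 2b denotes a correlator that weights the N samples (audio data) xi by multiplying them by a window function W(n) and then calculates autocorrelation values.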
[0013]
The window function W(n) used for weighting in the correlator 2b is the Hanning window function given by the following equation.
[0014]
[Equation 1]
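The body of Equation 1 is not available in this text. For reference, a common textbook form of the Hanning window over an N-sample frame is given below. Whether the patent uses N - 1 or N in the denominator, and whether n starts at 0 or 1, cannot be recovered here, so this form is an assumption. In the embodiment described later, N would correspond to the 1440 samples held in register win_end.

```latex
W(n) = 0.5 - 0.5\cos\left(\frac{2\pi n}{N-1}\right), \qquad n = 0, 1, \ldots, N-1
```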

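[0015]
As shown in FIG. 3, the Hanning window function W(n) spans an analysis frame about 30 ms wide, and the analysis advances with a frame period of 20 ms. The correlator 2b calculates the PARCOR coefficients K1 to Kn from the autocorrelation function given by the following equation. This function normalizes the PARCOR coefficients K1 to Kn so that they are unaffected by the amplitude of the samples. Each coefficient is "1" for perfect correlation, "0" for no correlation, and "-1" for a perfectly anti-phase relationship.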
[0016]
[Equation 2]
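The body of Equation 2 is likewise not available. The form below is assumed, not transcribed. A standard way to write the normalized PARCOR coefficient of the m-th lattice stage is as the normalized cross-correlation between the forward error f and the one-sample-delayed backward error b of the previous stage. This form is consistent with the lattice of FIG. 2 (multipliers 2c and 2d, subtractors 2e and 2f) and with the stated range of -1 to 1.

```latex
\begin{aligned}
K_m &= \frac{\sum_i f_{m-1}(i)\, b_{m-1}(i-1)}{\sqrt{\sum_i f_{m-1}(i)^2 \sum_i b_{m-1}(i-1)^2}},
  \qquad f_0(i) = b_0(i) = W(i)\, x_i \\
f_m(i) &= f_{m-1}(i) - K_m\, b_{m-1}(i-1) \\
b_m(i) &= b_{m-1}(i-1) - K_m\, f_{m-1}(i)
\end{aligned}
```

By the Cauchy-Schwarz inequality, |K_m| ≤ 1. The bound is reached only for perfectly correlated (+1) or perfectly anti-phase (-1) errors.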

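[0017]
2c denotes a coefficient multiplier that multiplies the audio data delayed by one sample by PARCOR coefficient K1, and 2d denotes a coefficient multiplier that multiplies the currently sampled audio data by PARCOR coefficient K1. 2e denotes a subtractor that subtracts the output of coefficient multiplier 2c from the currently sampled audio data, and 2f denotes a subtractor that subtracts the output of coefficient multiplier 2d from the audio data delayed by one sample.
Components 2a to 2f form a lattice filter 2-1. Cascading n such lattice filters 2-1 to 2-n forms the PARCOR analysis system 2, which generates PARCOR coefficients K1 to Kn representing the correlation of the linear prediction error between samples (audio data) xi.
The residual wave output by the last lattice filter 2-n is supplied to an integrating circuit 2g, which integrates it over the analysis window to produce the residual wave power Af.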
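The cascade of lattice stages 2-1 to 2-n can be sketched in Python as follows. Each stage estimates its coefficient as a normalized correlation, then forms the new forward error (subtractor 2e) and backward error (subtractor 2f). The residual power is taken as the sum of squares of the last forward error. This is a block-processing sketch over one already-windowed frame, with a one-sample delay and zero initial state assumed. The name `parcor_lattice` is hypothetical.

```python
import math


def parcor_lattice(x, order):
    """Run `order` cascaded lattice stages (2-1 ... 2-n) over one frame.

    Returns (k, af): the PARCOR coefficients K1..Kn and the residual power,
    taken here as the sum of squares of the final forward error.
    """
    f = [float(v) for v in x]  # forward prediction error, f_0 = x
    b = list(f)                # backward prediction error, b_0 = x
    k = []
    for _ in range(order):
        bd = [0.0] + b[:-1]    # backward error delayed one sample (delay 2a)
        num = sum(fi * bi for fi, bi in zip(f, bd))
        den = math.sqrt(sum(fi * fi for fi in f) * sum(bi * bi for bi in bd))
        km = num / den if den > 0.0 else 0.0   # normalized correlation, |km| <= 1
        k.append(km)
        # 2c/2e: f - K*b_delayed; 2d/2f: b_delayed - K*f (both use the old f)
        f, b = ([fi - km * bi for fi, bi in zip(f, bd)],
                [bi - km * fi for fi, bi in zip(f, bd)])
    af = sum(fi * fi for fi in f)   # role of integrating circuit 2g
    return k, af
```

For a pure sinusoid x(n) = sin(0.3n), K1 comes out close to cos(0.3) and K2 close to -1. Two stages are enough to predict a sinusoid almost exactly, so the residual power is near zero.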
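[0018]
(2) Configuration of the feature parameter storage unit 3
The PARCOR coefficients K1 to Kn and residual wave power Af output from the PARCOR analysis system 2 are stored frame by frame in the feature parameter storage unit 3 on instruction from the control unit 4.
FIG. 4 shows how frames are stored in the feature parameter storage unit 3. In this embodiment, the analyzed PARCOR coefficients K1 to Kn and residual wave power Af are stored as a matrix, one row per analysis frame. The distinctive feature is that a stop bit STB is provided in the most significant bit (MSB) of the feature parameters.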
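[0019]
In Japanese, the PARCOR coefficients K1 to Kn barely change during vowels. During consonants, the harmonic content changes greatly, so the PARCOR coefficients K1 to Kn change greatly as well. Therefore, as shown in FIG. 4, a stop flag "1" is set in the stop bit STB in vowel portions. Analysis frames whose feature parameters resemble those of the neighboring frames are deleted, which can greatly reduce the amount of data. This arrangement also makes it possible to synthesize a singing voice that follows the performance data.
When a note-on is given as performance data, feature parameters are read out in sequence and fed to the PARCOR synthesis system 6 up to the frame whose stop flag is "1". When that frame becomes the readout target, advancing the frame readout is paused. When the next note-on occurs, feature parameters are again read out in sequence, up to the next frame with the stop flag set, and supplied to the PARCOR synthesis system 6 for speech synthesis.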
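[0020]
(3) Configuration of the PARCOR synthesis system 6
Next, the PARCOR synthesis system 6 is described with reference to FIG. 5. It synthesizes speech from the PARCOR coefficients K1 to Kn and residual wave power Af read from the feature parameter storage unit 3. In FIG. 5, 6a denotes a comparator. It judges whether the sound is voiced or unvoiced by comparing PARCOR coefficient K1 with a predetermined constant (about 0.2 to 0.3), and generates signals W1 and W2 according to the result.
When the feature parameters to be synthesized represent a voiced sound, the comparator 6a outputs signal W1 at the normalized level "1" and sets signal W2 to "0". Conversely, when they represent an unvoiced sound, it outputs signal W2 at the normalized level "1" and sets signal W1 to "0".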
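[0021]
6b and 6c denote coefficient multipliers: 6b multiplies the signal OSC supplied to terminal IN1 by signal W1, and 6c multiplies the white noise WN supplied to terminal IN2 by signal W2. 6d denotes an adder that sums the outputs of coefficient multipliers 6b and 6c.
The adder 6d therefore outputs the signal OSC as the excitation source when synthesizing voiced sounds, and the white noise WN when synthesizing unvoiced sounds.
In the comparator 6a described above, signals W1 and W2 switch the excitation waveform between the signal OSC and the white noise WN according to the voiced/unvoiced decision. This is not limiting: signals W1 and W2 may instead be crossfaded according to the voiced/unvoiced decision. Transitions from voiced to unvoiced sounds, and from unvoiced to voiced sounds, then sound more natural.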
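The comparator 6a, multipliers 6b and 6c, adder 6d and the gain stage 6e amount to a few operations per sample. The sketch below assumes that K1 above the threshold means voiced: voiced speech is dominated by low frequencies, so neighboring samples are strongly correlated. It uses 0.25 as the threshold, taken from the stated range of about 0.2 to 0.3. The parameter `fade_width`, which enables the crossfade variant, is hypothetical.

```python
def excitation_weights(k1, threshold=0.25, fade_width=0.0):
    """Return (W1, W2) as comparator 6a might produce them.

    Assumes K1 > threshold means voiced (W1 = 1, W2 = 0). With fade_width > 0
    the weights ramp linearly across [threshold - fade_width/2,
    threshold + fade_width/2] instead of switching (the crossfade variant).
    """
    if fade_width <= 0.0:
        w1 = 1.0 if k1 > threshold else 0.0
    else:
        lo = threshold - fade_width / 2.0
        w1 = min(1.0, max(0.0, (k1 - lo) / fade_width))
    return w1, 1.0 - w1


def excitation_sample(osc, wn, k1, af, **kw):
    """Multipliers 6b and 6c, adder 6d and multiplier 6e for one sample."""
    w1, w2 = excitation_weights(k1, **kw)
    return af * (w1 * osc + w2 * wn)
```

With the default hard switch, exactly one of the two sources reaches the lattice. With a nonzero `fade_width`, both contribute near the threshold and the weights always sum to 1.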
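[0022]
6e denotes a coefficient multiplier that multiplies the output of the adder 6d by the residual wave power Af. 6-1 to 6-n denote lattice filters that synthesize speech from PARCOR coefficients K1 to Kn, respectively, by reversing the PARCOR analysis process described above. Each of these cascaded lattice filters consists of a delay circuit 6f, coefficient multipliers 6g and 6h, an adder 6i and a subtractor 6j.
If the delay circuit 6f uses the same sample delay as the PARCOR analysis system 2, the synthesized speech has the same formants as the analyzed audio signal. To alter the formants deliberately as a special effect during synthesis, the delay circuit 6f can instead use a sample delay different from the one used in analysis.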
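Lattice filters 6-1 to 6-n invert the analysis lattice. The Python sketch below runs them sample by sample for one fixed set of coefficients. The parameter `delay` models the sample delay of circuit 6f: `delay=1` matches a one-sample analysis delay, and other values give the formant-shift effect just described. The function name and the per-stage delay lines are illustrative assumptions.

```python
from collections import deque


def parcor_synthesize(excitation, k, delay=1):
    """All-pole lattice synthesis with fixed coefficients k = [K1, ..., Kn].

    `delay` is the per-stage delay of circuit 6f, in samples (must be >= 1).
    """
    if delay < 1:
        raise ValueError("delay must be at least one sample")
    if not k:
        return [float(e) for e in excitation]
    order = len(k)
    # hist[m][0] holds b_m(n - delay) for m = 0 .. order-1
    hist = [deque([0.0] * delay, maxlen=delay) for _ in range(order)]
    out = []
    for e in excitation:
        f = float(e)                   # excitation enters the last stage
        new_b = [0.0] * order
        for m in range(order, 0, -1):
            bd = hist[m - 1][0]        # b_{m-1}(n - delay)
            f += k[m - 1] * bd         # f_{m-1}(n) = f_m(n) + K_m * b_{m-1}(n - delay)
            if m < order:
                new_b[m] = bd - k[m - 1] * f   # b_m(n) = b_{m-1}(n - delay) - K_m * f_{m-1}(n)
        new_b[0] = f                   # b_0(n) = f_0(n) = output sample
        for m in range(order):
            hist[m].append(new_b[m])   # maxlen discards the oldest value
        out.append(f)
    return out
```

With a single stage and K1 = 0.5, an impulse produces 1, 0.5, 0.25, 0.125, and so on. With `delay=2`, the same coefficient produces 1, 0, 0.5, 0, 0.25, so the response is stretched in time. Its spectral envelope, and with it the formant positions, changes accordingly.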
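[0023]
C. Concrete Configuration
Next, the concrete configuration of the electronic musical instrument of this embodiment is described with reference to FIG. 6. Elements common to FIG. 1 are given the same reference numerals, and their description is omitted.
In FIG. 6, 10 denotes a CPU that controls each part of the instrument and also performs the functions of the PARCOR analysis system 2 and the control unit 4; its operation is described later. 11 denotes a ROM storing the control programs and control data loaded into the CPU 10. 12 denotes a RAM used as the work area of the CPU 10, temporarily storing various registers and flag data. The RAM 12 also serves as the feature parameter storage unit 3. The analyzed PARCOR coefficients K1 to Kn and residual wave power Af are stored in a predetermined area of it as an array, frame by frame (see FIG. 4).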
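[0024]
13 denotes an operation panel provided with various operation switches, which generates operation signals according to switch operations. The operation panel 13 has four main switches: a sampling switch SS for sampling speech, a start switch STS for starting sampling, an edit switch ES for editing and processing sampled audio data, and an analysis switch AS for performing PARCOR analysis.
The editing switches include a switch INC that increments the analysis frame and a switch DEC that decrements it. A switch DEL deletes an analysis frame. A switch SPS sets an analysis frame's stop flag, and a switch SRS resets it. The purpose of these switches is explained later.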
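[0025]
14 denotes a display unit consisting of an LCD panel, an LCD drive circuit and so on. According to display control signals supplied from the CPU 10 over the bus, it displays information such as the sampled audio data as a time series on the LCD. 15 denotes a digital signal processor (hereinafter DSP) that emulates the processing of the PARCOR synthesis system 6. Under control of the CPU 10 and in accordance with performance data, the DSP 15 synthesizes speech from the PARCOR coefficients K1 to Kn and residual wave power Af transferred from the RAM 12.
The D/A converter 7 in the next stage converts the audio data synthesized by the DSP 15 into an analog audio signal. A sound system (not shown) then filters it, for example to remove noise, and it is emitted from a speaker as a natural singing voice.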
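[0026]
D. Operation of the Embodiment
Next, the operation of the embodiment configured as above is described with reference to FIGS. 7 to 16. The main routine is described first as the overall operation, followed by the routines and interrupt routines it calls.
(1) Main routine
When the instrument is powered on, the CPU 10 reads a predetermined control program from the ROM 11 and loads it. It then executes the main routine shown in FIG. 7 and proceeds to step SA1, where it performs initialization, such as clearing registers to zero and setting initial values.
In step SA1, the three counter registers i, j and k are reset to zero, and initial values are set in registers flame_end, win_end, ana_step, k_end and flame.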
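[0027]
Register flame_end holds the number of analysis frames (windows); in this example it is initialized to 30. Register win_end holds the number of samples in one frame; here it is set to 1440.
Register ana_step holds the number of samples between successive analysis frames; here it is set to 960. Register k_end holds the order of the PARCOR coefficients to be analyzed; here it is set to 20. Register flame holds the number of the frame currently being processed and is reset to zero.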
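The register values fix the frame geometry. The patent does not state the sampling frequency for this example. If the 1440-sample window is the roughly 30 ms frame of FIG. 3, and the 960-sample hop is the 20 ms frame period, both imply 48 kHz. Step SC3 would then record flame_end × ana_step = 28,800 samples, or 0.6 s. A quick Python check of that arithmetic follows; the 48 kHz figure is an inference, not a stated value.

```python
# Register values from step SA1
FLAME_END = 30   # analysis frames
WIN_END = 1440   # samples per frame (window length)
ANA_STEP = 960   # samples between frame starts (hop)
K_END = 20       # PARCOR order

WINDOW_MS = 30.0   # approximate frame width from FIG. 3
HOP_MS = 20.0      # frame period


def implied_fs(samples, ms):
    """Sampling rate in Hz implied by `samples` spanning `ms` milliseconds."""
    return samples * 1000.0 / ms


fs = implied_fs(WIN_END, WINDOW_MS)          # 48000.0
assert fs == implied_fs(ANA_STEP, HOP_MS)    # the hop gives the same rate
overlap = WIN_END - ANA_STEP                 # 480 samples (10 ms) shared by neighbours
total_samples = FLAME_END * ANA_STEP         # 28800, the limit tested in step SC3
duration_s = total_samples / fs              # 0.6 s of recorded speech
```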
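[0028]
After initialization, the CPU 10 proceeds to step SA2 and determines whether the sampling switch SS on the operation panel 13 has been turned on. When speech is to be sampled before a performance, SS is turned on. The result is then "YES", and the CPU proceeds to step SA3 and executes the sampling routine described later.
If the sampling switch SS has not been turned on, the result is "NO" and processing proceeds to step SA4.
In step SA4, the CPU determines whether the edit switch ES on the operation panel 13 has been turned on. If it has, the result is "YES" and the CPU proceeds to step SA5 and executes the edit routine described later. If not, the result is "NO" and processing proceeds to step SA6.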
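[0029]
In step SA6, the CPU determines whether the analysis switch AS has been turned on. When AS is turned on to apply PARCOR analysis to the sampled audio data, the result is "YES", and the CPU proceeds to step SA7 and executes the analysis routine described later. Otherwise, the result is "NO" and processing proceeds to step SA8.
When the sampling, edit or analysis routine executed via step SA3, SA5 or SA7 completes, processing likewise proceeds to step SA8, where the performance routine described later is executed. The CPU 10 then returns to step SA2 and repeats the process above.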
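[0030]
The main routine thus executes the sampling, edit and analysis routines in response to switch events that occur after initialization. It also performs the performance processing. This processing uses the performance data to control PARCOR speech synthesis driven by the feature parameters (PARCOR coefficients and residual wave power) obtained from those routines.
Normally, the audio signal is first sampled by the sampling routine. The analysis routine then extracts the feature parameters (PARCOR coefficients and residual wave power) of each analysis frame. Finally, the extracted feature parameters are edited with the edit routine as needed, to adjust the singing voice synthesized by the performance routine. The operation of each routine is described below in this order.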
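[0031]
(2) Timer interrupt routine
To advance the analysis frame, the CPU 10 executes timer interrupt processing at a fixed period. For example, every 20 msec it releases the interrupt mask and executes the timer interrupt routine shown in FIG. 8. In step SB1 of that routine, it sets the timer flag "1" in register time.
The timer flag in register time is reset to zero after the analysis frame has been advanced.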
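[0032]
(3) Sampling routine
When the sampling switch SS on the operation panel 13 is turned on to sample an audio signal, the sampling routine is executed via step SA3 and the CPU 10 proceeds to step SC1 in FIG. 9.
In step SC1, the CPU waits until the start switch STS, which instructs the start of sampling, is turned on. When STS is turned on, the result is "YES" and the CPU proceeds to step SC2 and resets register i to zero.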
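[0033]
Next, in step SC3, the CPU determines whether the value of register i has reached flame_end × ana_step, that is, whether the specified number of samples has been taken. If not, the result is "NO" and processing proceeds to step SC4, where the audio data xi sampled for the current value of register i is stored in the RAM 12. Processing then returns to step SC3. Steps SC3 and SC4 repeat until the specified number of audio data xi have been sampled. When sampling is complete, the result of step SC3 becomes "YES", the routine ends, and control returns to the main routine.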
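[0034]
(4) Analysis routine
Once the sampling routine has stored the sampled audio data xi in the RAM 12, the operator turns on the analysis switch AS on the operation panel 13. This applies PARCOR analysis to the audio data xi to extract the feature parameters (PARCOR coefficients and residual wave power).
The CPU 10 then executes the analysis routine shown in FIG. 10 via step SA7 of the main routine (see FIG. 7) and proceeds to step SD1.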
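[0035]
In step SD1, the CPU determines whether the value of register i has reached the final frame value stored in register flame_end, that is, whether PARCOR analysis is complete. If it is, the result is "YES" and the routine ends. Otherwise, the result is "NO" and processing proceeds to step SD2.
In step SD2, the CPU determines whether the value of register j has reached the value of register win_end. If one frame of audio data xi has not yet been processed, the result is "NO" and processing proceeds to step SD3.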
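[0036]
In step SD3, audio data xi are read successively from the RAM 12 according to the values of registers i, ana_step and j. Each is multiplied by the Hanning window function W(n), and the result is stored in register wave1[j] as windowed analysis data. When one frame of audio data xi has been windowed, the result of step SD2 becomes "YES" and processing proceeds to step SD4.
In step SD4, registers j and Z are reset to zero. In step SD5, the CPU determines whether the sum of squares of the analysis data windowed in step SD3 has been calculated over one frame. If not, the result is "NO" and processing proceeds to step SD6, where the analysis data in register wave1[j] are squared and accumulated.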
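[0037]
When the sum of squares has been calculated, the result of step SD5 becomes "YES" and processing proceeds to step SD7, where register j is reset to zero.
The CPU 10 then proceeds to step SD8 in FIG. 11. It determines whether the value of register j has reached the value of register win_end, that is, whether the last data in the frame has been reached. If not, the result is "NO" and processing proceeds to step SD9, where the analysis data in register wave1 are copied one by one to register wave2 according to the value of register j.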
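[0038]
When register wave1 has been copied to register wave2, the result of step SD8 becomes "YES" and processing proceeds to step SD10, where registers j and k are reset to zero.
In step SD11, the CPU 10 resets two sets of registers to zero: registers k[flame][0] to k[flame][k_end], which store the PARCOR coefficients K1 to Kn (the autocorrelation values), and register af[flame], which stores the residual wave power Af.
Registers k[flame][0] to k[flame][k_end] are elements of a two-dimensional array. They are indexed by the frame number in register flame and by the coefficient order, up to the value of register k_end.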
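[0039]
In step SD12, the CPU determines whether the value of register k has reached the value of register k_end, that is, whether the feature parameters for one frame have been extracted. If extraction is complete, the result is "YES" so that analysis can move to the next frame, and processing returns to step SD1 (see FIG. 10).
If extraction is not complete, the result is "NO". Steps SD13 to SD17 are then executed according to the value of register k. These steps perform PARCOR analysis, successively extracting the correlation coefficients of the linear prediction error between analysis data in the frame (the PARCOR coefficients). Each coefficient is stored in turn in registers k[flame][0] to k[flame][k_end].
When the correlation coefficients (PARCOR coefficients) have been calculated, the result of step SD16 becomes "YES". Processing proceeds to step SD18, which executes the residual wave power calculation routine (described later), and then returns to step SD12.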
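The flowcharts as described do not spell out steps SD13 to SD17. The Python below is a hedged sketch of the whole per-frame loop. It windows each frame with a Hanning window (as in step SD3) and forms the autocorrelation values. It then obtains the PARCOR (reflection) coefficients with the Levinson-Durbin recursion. That recursion is named here as a standard stand-in, not as the patent's exact procedure. The recursion's final prediction-error energy is returned in place of squaring the lattice residual. All function names are hypothetical.

```python
import math


def hanning(n_len):
    """Hanning window 0.5 - 0.5*cos(2*pi*n/(N-1)); the exact form is assumed."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def autocorr(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_parcor(r, order):
    """Levinson-Durbin recursion: PARCOR (reflection) coefficients and error energy."""
    a = []          # predictor: x_hat(n) = sum(a[j] * x(n - 1 - j))
    err = r[0]
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:                       # silent or perfectly predicted frame
            ks.extend([0.0] * (order - m + 1))
            break
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        km = acc / err
        a = [a[j] - km * a[m - 2 - j] for j in range(m - 1)] + [km]
        err *= 1.0 - km * km
        ks.append(km)
    return ks, err


def analyze(x, flame_end, win_end, ana_step, k_end):
    """Per-frame analysis returning [(K1..Kn, residual energy), ...]."""
    w = hanning(win_end)
    frames = []
    for i in range(flame_end):
        start = i * ana_step
        seg = list(x[start:start + win_end])
        # With the register values above, frame 29 would run 480 samples past
        # the 28,800 recorded ones, so a short frame is zero-padded here.
        seg += [0.0] * (win_end - len(seg))
        wave1 = [s * wj for s, wj in zip(seg, w)]   # windowed data (step SD3)
        frames.append(levinson_parcor(autocorr(wave1, k_end), k_end))
    return frames
```

With the embodiment's registers, a call would be `analyze(x, 30, 1440, 960, 20)`. Because the autocorrelation method is used, every returned coefficient satisfies |K| ≤ 1 up to rounding.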
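[0040]
(5) Residual wave power calculation routine
When the residual wave power calculation routine is executed via step SD18, the CPU 10 proceeds to step SE1 in FIG. 12 and resets registers j and af[flame] to zero. In step SE2, it determines whether the last data in the frame has been reached. If not, the result is "NO" and processing proceeds to step SE3. There, the values of register wave1[j] are squared and accumulated to obtain the residual wave power af, which is stored in register af[flame].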
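[0041]
When the residual wave power af for the analysis frame has been calculated, the result of step SE2 becomes "YES" and processing proceeds to step SE4, where the CPU 10 determines whether af exceeds a predetermined value.
This predetermined value is a threshold for deciding whether the PARCOR-analyzed speech corresponds to a voiced or an unvoiced sound. If af exceeds the threshold, the analyzed speech is regarded as voiced and the routine ends. If af is below the threshold, the result is "NO" and processing proceeds to step SE5, where af is set to "0".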
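Steps SE1 to SE5 reduce to squaring and accumulating the residual, then gating the result against the voiced/unvoiced threshold. A minimal sketch follows. The threshold is a hypothetical parameter, because the patent gives no number for it.

```python
def residual_power(residual, threshold):
    """Steps SE1 to SE5: square-and-accumulate the residual, then gate it.

    Returns (af, voiced). If the accumulated power does not exceed
    `threshold`, it is set to 0, mirroring step SE5.
    """
    af = 0.0
    for v in residual:        # step SE3, one sample at a time
        af += v * v
    if af > threshold:        # step SE4
        return af, True       # regarded as voiced; af is kept
    return 0.0, False         # step SE5
```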
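[0042]
(6) Edit routine
The edit switch ES on the operation panel 13 is turned on to edit the feature parameters (PARCOR coefficients and residual wave power) stored in the RAM 12 by the analysis and residual wave power calculation routines. Typical edits are attaching a stop bit STB to the most significant bit of the feature parameters of vowel portions, or deleting analysis frames whose feature parameters change little.
When ES is turned on, the CPU 10 executes the edit routine shown in FIG. 13 via step SA5 of the main routine. The edit routine performs the processing that corresponds to each switch event, as described below for each switch.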
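[0043]
① Operation when the frame increment switch INC is operated
The frame increment switch INC is operated to advance the analysis frame being edited. When INC is turned on, the result of step SF1 becomes "YES" and processing proceeds to step SF2, where register flame is incremented by 1. In step SF3, the feature parameters (PARCOR coefficients and residual wave power) of the frame indicated by the updated register flame are read from the RAM 12. They are then displayed numerically or graphically, and processing returns to step SF1.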
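[0044]
② Operation when the frame decrement switch DEC is operated
The frame decrement switch DEC is operated to move the analysis frame being edited backward. When DEC is turned on, processing proceeds via step SF1 to step SF4. The result of step SF4 is "YES", and processing proceeds to step SF5.
In step SF5, register flame is decremented by 1 and processing proceeds to step SF3. There, the feature parameters (PARCOR coefficients and residual wave power) of the frame indicated by the updated register flame are read from the RAM 12. Again they are displayed numerically or graphically, and processing returns to step SF1.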
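[0045]
③ Operation when the frame delete switch DEL is operated
The frame delete switch DEL is operated to reduce the amount of data by deleting analysis frames that change little. When DEL is turned on, processing proceeds via steps SF1 and SF4 to step SF6. The result of step SF6 is "YES", and processing proceeds to step SF7.
In step SF7, the value of register flame is stored in register i. In step SF8, the CPU determines whether register i indicates the final frame. If it does, the feature parameters of the final frame are deleted and processing returns to step SF1.
If it does not, the result is "NO" and processing proceeds to step SF9. The analysis frame indicated by register i is deleted, and the frame numbers from that point on are advanced by one so that the following frames close the gap.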
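[0046]
④ Operation when the stop set switch SET is operated
When the operator finds an analysis frame corresponding to a vowel portion, in which the PARCOR coefficients K1 to Kn change little, the operator turns on the stop set switch SET. This sets a stop flag in the stop bit STB provided in the MSB of the feature parameters.
When SET is turned on, the switch event makes the result of step SF10 in FIG. 14 "YES", and the CPU 10 proceeds to step SF11. There, a stop flag "1" is set in the MSB of the residual wave power af stored in register af[flame].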
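[0047]
⑤ Operation when the stop reset switch RESET is operated
The stop reset switch RESET is operated to clear a stop flag set with the stop set switch SET. When RESET is turned on, the result of step SF12 becomes "YES" and processing proceeds to step SF13, where the stop flag in the MSB of register af[flame] is reset to zero.
⑥ Operation when the exit switch EXIT is operated
When the exit switch EXIT is turned on to end editing, the result of step SF14 becomes "YES", the routine completes, and control returns to the main routine.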
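The edit operations act on the stored frame table. The Python below is a hypothetical model built on three assumptions. First, each residual power af is stored as a 16-bit word whose MSB is the stop bit STB; the patent names the MSB but not the word width. Second, INC and DEC are clamped to the valid range, because the flowchart does not say what happens at the ends. Third, DEL removes the frame so that later frames move up one position.

```python
STOP_BIT = 1 << 15   # assumed: af held in a 16-bit word, MSB = stop bit STB


class FrameEditor:
    """Hypothetical model of the edit routine of FIGS. 13 and 14.

    frames is a list of (k_list, af_word) tuples, where af_word is an int
    whose MSB is the stop flag and whose lower bits hold the residual power.
    """

    def __init__(self, frames):
        self.frames = list(frames)
        self.flame = 0   # frame being edited (register flame)

    def inc(self):
        """Switch INC (SF2, SF3); clamped at the last frame (assumption)."""
        self.flame = min(self.flame + 1, len(self.frames) - 1)
        return self.frames[self.flame]

    def dec(self):
        """Switch DEC (SF5, SF3); clamped at frame 0 (assumption)."""
        self.flame = max(self.flame - 1, 0)
        return self.frames[self.flame]

    def delete(self):
        """Switch DEL (SF7 to SF9): drop the frame; later frames move up."""
        if not self.frames:
            return
        del self.frames[self.flame]
        self.flame = max(0, min(self.flame, len(self.frames) - 1))

    def set_stop(self):
        """Switch SET (SF11): set the stop flag in the MSB of af."""
        k, af = self.frames[self.flame]
        self.frames[self.flame] = (k, af | STOP_BIT)

    def reset_stop(self):
        """Switch RESET (SF13): clear the stop flag."""
        k, af = self.frames[self.flame]
        self.frames[self.flame] = (k, af & ~STOP_BIT)

    def stop_flag(self, i):
        """True if frame i has its stop bit set."""
        return bool(self.frames[i][1] & STOP_BIT)
```

Keeping the flag inside the af word, as the patent does, means setting or clearing it never disturbs the stored power value in the lower bits.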
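[0048]
(7) Performance routine
When the performance routine shown in FIGS. 15 and 16 is started via step SA8 of the main routine (see FIG. 7), the CPU 10 first proceeds to step SG1. It reads the residual wave power af stored in register af[flame] and determines whether the stop bit in its MSB is "0".
If the stop bit is "0", the result is "YES" and processing proceeds to step SG2. There, the CPU determines whether register time is "1", that is, whether it is time to transfer feature parameters to the DSP 15 (see FIG. 6).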
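[0049]
Suppose that the timer interrupt routine has set register time to "1". It is then time to transfer feature parameters to the DSP 15, so the result is "YES" and processing proceeds to step SG3.
In step SG3, the residual wave power af read from register af[flame] and the PARCOR coefficients K1 to Kn read from registers k[flame][0] to k[flame][k_end-1] are transferred to the DSP 15. When the transfer is complete, register flame is incremented by 1 and register time is reset to zero. The DSP 15 is then given an input volume based on the value of PARCOR coefficient K1. The CPU 10 then proceeds to step SG7, described later.
If register time is "0" in step SG2, it is not time to transfer feature parameters to the DSP 15, so in this case too processing proceeds to step SG7.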
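[0050]
If, on the other hand, the stop flag is "1", the result of step SG1 is "NO" and processing proceeds to step SG4, where the CPU determines whether a note-on instruction has arrived. If a note-on has occurred, the result is "YES" and processing proceeds to step SG5.
In step SG5, the CPU determines whether it is time to transfer feature parameters to the DSP 15 (see FIG. 6). If not, the result is "NO" and processing proceeds to step SG7, described later.
If it is, the result is "YES" and processing proceeds to step SG6.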
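[0051]
In step SG6, the CPU 10 transfers the residual wave power af read from register af[flame] and the PARCOR coefficients K1 to Kn read from registers k[flame][0] to k[flame][k_end-1] to the DSP 15. After the transfer, register flame is incremented by 1 and register time is reset to zero. The DSP 15 is then given an input volume based on the value of PARCOR coefficient K1, and the note-on flag is reset to zero. The CPU 10 then proceeds to step SG7.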
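[0052]
Next, in step SG7 in FIG. 16, the CPU determines whether the final frame has been reached. If it has, the result is "YES" and processing proceeds to step SG8, where register flame is reset to zero, and then to step SG9. If the final frame has not been reached, processing proceeds directly to step SG9.
In step SG9, the CPU 10 determines whether a performance event has occurred. If there is no performance event, the result is "NO" and the routine ends for now, returning to the main routine.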
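[0053]
If a performance event has occurred, the result is "YES" and processing proceeds to step SG10, where the CPU determines whether the event is a note-on. If it is, the result is "YES" and processing proceeds to step SG11, where the excitation unit 5 (see FIG. 6) is turned on. If the PARCOR coefficients of the current analysis frame correspond to a voiced sound, the CPU instructs the excitation unit 5 to generate the signal OSC. OSC is the sum of several waveform signals at the pitch corresponding to the performance data, and it is supplied to terminal IN1 of the PARCOR synthesis system 6 (that is, the DSP 15). If the PARCOR coefficients of the current analysis frame correspond to an unvoiced sound, the CPU instructs the excitation unit 5 to output white noise WN and supply it to terminal IN2 of the PARCOR synthesis system 6 (DSP 15). The CPU 10 then proceeds to step SG12, sets the note-on flag noteon to "1", and ends the routine.
If the event is not a note-on, the result of step SG10 is "NO" and processing proceeds to step SG13, where the excitation unit 5 is instructed to stop outputting the signal OSC or the white noise WN.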
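[0054]
In the performance routine, then, when a note-on is given as performance data, feature parameters are read out in sequence and input to the DSP 15 up to the frame whose stop flag is "1". When that frame becomes the readout target, advancing the frame readout is paused. When the next note-on occurs, feature parameters are again read out in sequence, up to the next frame with the stop flag set, and supplied to the DSP 15 for speech synthesis.
The DSP 15 performs PARCOR synthesis from two inputs: the PARCOR coefficients K1 to Kn and residual wave power Af transferred from the RAM 12 on instruction from the CPU 10, and the excitation signal supplied by the excitation unit 5. In this way, the synthesized human voice is formed into a musical tone that sounds like a natural singing voice.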
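The pause-and-resume readout can be modelled as a small state machine. The Python below is a hypothetical model of the performance routine as described above; `Performer`, `run_once` and `timer_interrupt` are illustrative names. Each transfer to the DSP is reduced to recording the frame index, and the excitation control of steps SG11 and SG13 is left out. As in the flowchart, a note-on only sets the flag, and the flagged frame is transferred on a later pass once the timer flag is also set.

```python
class Performer:
    """Hypothetical model of the performance routine of FIGS. 15 and 16.

    stop_flags[i] is True when frame i has its stop bit set. Transfers to
    the DSP are reduced to recording the frame index in `sent`.
    """

    def __init__(self, stop_flags):
        self.stop = list(stop_flags)
        self.flame = 0     # current frame (register flame)
        self.time = 0      # timer flag (register time)
        self.noteon = 0    # note-on flag (noteon)
        self.sent = []

    def timer_interrupt(self):
        """Step SB1, called every 20 ms."""
        self.time = 1

    def run_once(self, event=None):
        """One pass through steps SG1 to SG13; event is None, 'on' or 'off'."""
        if not self.stop[self.flame]:           # SG1: stop bit is 0
            if self.time:                       # SG2: transfer timing?
                self._transfer()                # SG3
        elif self.noteon and self.time:         # SG4, SG5
            self._transfer()                    # SG6
            self.noteon = 0
        if self.flame >= len(self.stop):        # SG7: past the final frame
            self.flame = 0                      # SG8
        if event == "on":                       # SG9, SG10
            self.noteon = 1                     # SG12 (SG11 would start the excitation)
        # An 'off' event would stop the excitation (SG13); not modelled here.

    def _transfer(self):
        self.sent.append(self.flame)            # K1..Kn and af would go to the DSP here
        self.flame += 1
        self.time = 0
```

For example, with stop flags `[False, False, True, False, True]`, timer ticks alone send frames 0 and 1 and then hold at frame 2. Each subsequent note-on releases the next group (2 and 3, then 4), after which the readout wraps to frame 0.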
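[0055]
As described above, in this embodiment sampled speech is PARCOR-analyzed to extract feature parameters, and the extracted feature parameters are PARCOR-synthesized in accordance with performance data. The synthesized human voice can therefore be formed into a musical tone that sounds like a natural singing voice.
In addition, a stop bit STB is provided in the MSB of the feature parameters. In vowel portions, where the feature parameters change little, a stop flag "1" is set in the stop bit STB. Analysis frames whose feature parameters resemble those of the neighboring frames are deleted, which can greatly reduce the amount of data. This also makes it easier to synthesize a singing voice that follows the performance data.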
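[0056]
E. Modifications
Next, modifications are described in which the tone generator 100 according to the present invention forms synthesized human voice into musical tones.
In the configuration of FIG. 17, the audio signal used for PARCOR analysis is supplied from an external sound source 101. A MIDI instrument 102 generates MIDI signals (performance data) and supplies the same MIDI signal to both the external sound source 101 and the tone generator 100, which performs vocoder processing (PARCOR analysis and synthesis). The audio signal input to the tone generator 100 is used as the vowel input. The device 100 contains a noise source for consonants, and the vowel and consonant inputs are weighted according to the PARCOR coefficients.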
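[0057]
In the configuration of FIG. 18, the tone generator 100 receives the MIDI signal output by the MIDI instrument and passes it on to an external sound source 101 capable of multi-timbral operation. The external sound source 101 generates audio signals multi-timbrally, with a vowel timbre and a consonant timbre among them. The device 100 weights the volume level of each of these timbres by the PARCOR coefficients before PARCOR synthesis. In this way, both vowels and consonants are pronounced.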
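[0058]
In the embodiment described above, the analysis frame to be synthesized is advanced in response to note-ons. This is not limiting: the analysis frame may, for example, be advanced in response to note-offs or velocity, with PARCOR synthesis performed successively.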
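[0059]
[Effects of the Invention]
According to the present invention, the speech analysis means analyzes an audio signal sampled in time series, extracts feature parameters and stores them in the parameter storage means. The tone control means reads the feature parameters from the parameter storage means in accordance with performance information and generates an excitation signal corresponding to that performance information. The speech synthesis means synthesizes speech from the feature parameters read from the parameter storage means and the excitation signal. A synthesized human voice can therefore be formed into a musical tone that sounds like a natural singing voice.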
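[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the schematic configuration of an embodiment of the present invention.
[FIG. 2] A block diagram showing the configuration of the PARCOR analysis system 2 in the embodiment.
[FIG. 3] A diagram for explaining the analysis frames in the embodiment.
[FIG. 4] A diagram for explaining how feature parameters are stored in the feature parameter storage unit 3 in the embodiment.
[FIG. 5] A block diagram showing the configuration of the PARCOR synthesis system 6 in the embodiment.
[FIG. 6] A block diagram showing the concrete configuration of the embodiment.
[FIG. 7] A flowchart showing the operation of the main routine in the embodiment.
[FIG. 8] A flowchart showing the operation of the timer interrupt routine in the embodiment.
[FIG. 9] A flowchart showing the operation of the sampling routine in the embodiment.
[FIG. 10] A flowchart showing the operation of the analysis routine in the embodiment.
[FIG. 11] A flowchart showing the operation of the analysis routine in the embodiment.
[FIG. 12] A flowchart showing the operation of the residual wave power calculation routine in the embodiment.
[FIG. 13] A flowchart showing the operation of the edit routine in the embodiment.
[FIG. 14] A flowchart showing the operation of the edit routine in the embodiment.
[FIG. 15] A flowchart showing the operation of the performance routine in the embodiment.
[FIG. 16] A flowchart showing the operation of the performance routine in the embodiment.
[FIG. 17] A diagram for explaining a modification.
[FIG. 18] A diagram for explaining a modification.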
【符号の説明】
２パーコール分析系（音声分析手段）
３特徴パラメータ記憶部（パラメータ記憶手段）
４制御部（楽音制御手段）
５励振部（楽音制御手段）
６パーコール合成系（音声合成手段）[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an apparatus for synthesizing voice, and more particularly, to a tone generator capable of generating a natural singing voice.
[0002]
[Prior art]
2. Description of the Related Art Conventionally, as a technique for synthesizing a human voice based on feature parameters extracted by voice analysis, techniques called channel vocoder, linear prediction, and PARCOR (Percoll) have been known. These speech synthesis techniques convert the analyzed speech into a small amount of information, that is, analyze the speech and convert it to the form of characteristic parameters to remove the information amount excluding redundant components that are not related to the meaning of words. It focuses on compression and does not consider synthesizing speech with high sound quality or applying the synthesized human voice to musical tone formation.
Under such circumstances, the channel vocoder has a simple structure and is suitable for real-time analysis and synthesis. Therefore, the channel vocoder has been applied to a tone generator which synthesizes a tone based on the power spectrum envelope of speech extracted by a filter bank. However, in the channel vocoder, high-quality sound synthesis was not achieved due to the limitation of the number of band-pass filter stages constituting the filter bank and the inability to synthesize consonants.
[0003]
[Problems to be solved by the invention]
On the other hand, in a tone generator using a conventional waveform memory reading method, sampled human voices are stored in a waveform memory, and read out and reproduced at the pitch at the time of sampling, the simplest form of high-quality human voices can be obtained. Although it can be generated, if you try to read and play it at a pitch different from the pitch at the time of sampling, a natural singing voice may be generated because the formant frequency of human voice changes according to the amount of conversion pitch There is a problem that can not be.
SUMMARY OF THE INVENTION It is an object of the present invention to provide a musical sound generator capable of forming a human voice synthesized with speech as a natural singing voice.
[0004]
[Means for Solving the Problems]
In order to achieve the above object, according to the first aspect of the present invention, an audio signal sampled in time series isFor each analysis frameSpeech analysis means for analyzing and extracting feature parameters, and feature parameters extracted by the speech analysis meansAnd a stop flag that takes either the on or off state for each analysis frame.Parameter storage means to store, according to performance informationWasFrom the parameter storage meansReading of the characteristic parameters is paused in the analysis frame in which the stop flag is ON, and the reading of the characteristic parameters is resumed from the analysis frame of the pause in response to the generation of the next performance information.On the other hand, it is characterized by comprising a tone control means for generating an excitation signal corresponding to the performance information, and a voice synthesis means for performing voice synthesis in accordance with the characteristic parameter read from the parameter storage means and the excitation signal. .
[0005]
According to the invention described in claim 2 dependent on the above claim 1,The system further includes flag changing means for changing the state of the stop flag of the specified analysis frame among the stop flags for each analysis frame stored in the parameter storage means.It is characterized by the following.
[0006]
Claim 1In the invention according to claim 3 which is dependent onThe apparatus further includes a parameter deleting unit that deletes a characteristic parameter of a specified analysis frame from among the characteristic parameters for each analysis frame stored in the parameter storage unit.Characterized by.
[0007]
According to the present invention, the voice analysis means analyzes the voice signal sampled in time series to extract characteristic parameters and stores them in the parameter storage means, and the tone control means stores the parameter in accordance with performance information. While reading the characteristic parameters from the means and generating an excitation signal corresponding to the performance information, the voice synthesizing means synthesizes voice according to the characteristic parameters read from the parameter storage means and the excitation signal. As a result, it is possible to form a musical sound as a natural singing voice using the synthesized voice.
[0008]
BEST MODE FOR CARRYING OUT THE INVENTION
The musical sound generating device according to the present invention can be applied to an electronic musical instrument as well as a device for providing voice guidance using human voices. Hereinafter, an electronic musical instrument according to an embodiment of the present invention will be described as an example with reference to the drawings.
A. Overview of Examples
FIG. 1 is a functional block diagram showing a schematic configuration of an electronic musical instrument according to one embodiment of the present invention. This electronic musical instrument analyzes a sampled sound based on the well-known principle of a PARCOR vocoder, extracts a characteristic parameter (described later), and synthesizes a sound in accordance with the extracted characteristic parameter. At the time of speech synthesis, the excitation signal (described later) is controlled in accordance with the performance data, so that the human voice to be speech-synthesized is formed as a natural singing tone. Hereinafter, an outline of such an embodiment will be described.
[0009]
In FIG. 1, reference numeral 1 denotes an A / D converter, which is a discrete audio data xi (i = 1 to 1) obtained by sampling an audio signal converted into an electric signal via a microphone and a preamplifier at a predetermined sampling frequency Fs. N: sampling number). Reference numeral 2 denotes a PARCOR analysis system. The Percoll analysis system 2 sequentially calculates the autocorrelation of the linear prediction error between the sampled audio data xi to generate Percoll coefficients K1 to Kn and the residual wave power Af.
The residual wave indicates whether the voice data in the analysis window is unvoiced or voiced. When the voice is unvoiced, it becomes white noise. On the other hand, when it is voiced, a pulse train forming a pitch period is generated. Become. The residual wave power Af is obtained by integrating them over the analysis window.
[0010]
Reference numeral 3 denotes a feature parameter storage unit, which sequentially stores Percoll coefficients K1 to Kn and residual wave power Af output from the Percoll analysis system 2 for each analysis frame. The analysis frame here corresponds to a speech analysis period defined by a window function described later. Reference numeral 4 denotes a control unit which controls the reading of the Percoll coefficients K1 to Kn and the residual wave power Af stored in the characteristic parameter storage unit 3 according to the performance data, and instructs an excitation unit 5 described later to generate an excitation signal. is there.
The excitation unit 5 outputs a signal OSC by adding a waveform generator 5-1 that generates a plurality of waveform signals at a pitch corresponding to the performance data, and a waveform signal output from each of the waveform generators 5-1. It comprises an adder 5-2 and a white noise generator 5-3 for generating a white noise WN, and outputs a signal OSC or white noise WN in accordance with an instruction from the control unit 4 to terminals IN1 and IN1 of a Percoll synthesis system 6 described later. Supply to IN2. That is, when the Percoll synthesis system 6 synthesizes a voiced sound, the signal OSC is supplied to the terminal IN1, and when the unvoiced sound is synthesized, the white noise WN is supplied to the terminal IN2.
[0011]
By the way, in the Percoll synthesis, a pulse waveform is used as an excitation waveform for a voiced sound. However, the waveform generator 5-1 is not limited to this, and is synthesized by generating signals having various waveform shapes. It becomes possible to expand the range of the tone of the sound. In particular, a waveform containing many overtone components, such as a triangular wave or a waveform having a different pulse width, is effective. For example, by simultaneously generating a plurality of waveform signals detuned for note-on, the following will be described. Chorus in Percoll Synthesis System 6ofSuch speech synthesis can be realized.
The Percoll synthesizing system 6 synthesizes speech in a process reverse to that of the above-mentioned Percoll analysis system 2, and includes a Percoll coefficient K 1 to Kn and a residual error which are read from the feature parameter storage unit 3 in accordance with an instruction from the control unit 4. The audio data xi is synthesized with the wave power Af and the excitation signal given from the excitation unit 5. Reference numeral 7 denotes a D / A converter, which converts the audio data xi output from the Percoll synthesis system 6 into an analog signal and outputs a synthesized audio signal.
[0012]
B. Main configuration
Next, each configuration of the above-mentioned Percoll analysis system 2, feature parameter storage unit 3, and Percoll synthesis system 6 will be described with reference to FIGS.
(1) Configuration of Percoll analysis system 2
FIG. 2 is a block diagram showing the configuration of the Percoll analysis system 2. In FIG. 1, reference numeral 2a denotes a delay circuit that delays an input signal by at least one sampling period and outputs the delayed signal. The delay circuit 2a is not limited to a one-sample delay, and a two-sample delay is appropriate when the sampling frequency Fs of the system exceeds 30 KHz. Reference numeral 2b denotes a correlator that calculates an autocorrelation value after multiplying a sample (voice data) xi (N pieces) by a window function W (n) and weighting the sample.
[0013]
As the window function W (n) weighted in the correlator 2b, a Hanning window function represented by the following equation is used.
[0014]
(Equation 1)

[0015]
As shown in FIG. 3, the Hanning window function W (n) has an analysis frame having a width of about 30 ms, and performs analysis at a frame period of 20 ms. In the correlator 2b, the Percoll coefficients K1 to Kn are calculated based on the autocorrelation function shown by the following equation. In this function, the Percoll coefficients K1 to Kn are normalized so as not to be affected by the amplitude of the sample. The Percoll coefficients K1 to Kn are “1” when there is a perfect correlation, “0” when there is no correlation, and “−1” when there is a completely opposite phase relationship.
[0016]
(Equation 2)

[0017]
2c is a coefficient multiplier for multiplying the audio data delayed by one sample by the Percoll coefficient K1 and outputting the result, and 2d is a coefficient multiplier for multiplying the currently sampled audio data by the Percoll coefficient K1 and outputting the result. 2e is a subtractor for subtracting the output of the coefficient multiplier 2c from the currently sampled audio data, and 2f is a subtractor for subtracting the output of the coefficient multiplier 2d from the audio data delayed by one sample.
The above components 2a to 2f constitute a lattice filter 2-1. The lattice filters 2-1 to 2-n, which are cascade-connected in n stages, form a linear prediction error between samples (audio data) xi. A Percoll analysis system 2 for generating Percoll coefficients K1 to Kn representing the correlation is formed.
The residual wave output from the lattice filter 2-n at the final stage is supplied to an integrating circuit 2g, and generates a residual wave power Af integrated over an analysis window.
[0018]
(2) Configuration of the feature parameter storage unit 3
The Percoll coefficients K1 to Kn and the residual wave power Af output from the Percoll analysis system 2 are sequentially stored in the feature parameter storage unit 3 based on an instruction from the control unit 4.
Here, the frame storage mode in the feature parameter storage unit 3 will be described with reference to FIG. In the case of the present embodiment, the analyzed Percoll coefficients K1 to Kn and the residual wave power Af are stored in a matrix for each of the analysis frames described above. Here, the characteristic point is that a stop bit STB is provided in the most significant bit MSB of the characteristic parameter.
[0019]
In Japanese, the PARCOR coefficients K1 to Kn hardly change within a vowel, whereas in a consonant the harmonic structure changes considerably and the coefficients change accordingly. Therefore, in the vowel part, as shown in FIG. 4, the stop bit STB is set to the stop flag "1", and analysis frames whose feature parameters are similar to those of the frames before and after can be deleted, which greatly reduces the number of data frames. The stop flag also makes it possible to synthesize a singing voice that follows the performance data.
That is, when a note-on is given as performance data, the feature parameters are read out sequentially and input to the PARCOR synthesis system 6 up to the frame whose stop flag is "1"; when that frame becomes the read target, updating of the read position is paused. When the next note-on occurs, the feature parameters are again read out sequentially up to the next frame whose stop flag is set, and are supplied to the PARCOR synthesis system 6 for voice synthesis.
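The effect of the stop flags on readout can be pictured as splitting the frame sequence into runs, one per note. The hypothetical helper below is a sketch that ignores timing and follows the reading given by the performance routine (steps SG1 to SG6) later in this section, where a frame whose stop flag is "1" is transferred only once the next note-on arrives, so each flagged frame opens a new run.

```python
def note_segments(stop_flags):
    """Group frame indices into the runs released by successive note-ons.

    stop_flags[i] is True when frame i carries stop flag "1". A flagged
    frame is held back until the next note-on, so it starts a new run.
    """
    segments = []
    current = []
    for i, flag in enumerate(stop_flags):
        if flag and current:
            segments.append(current)  # readout pauses before this frame
            current = []
        current.append(i)
    if current:
        segments.append(current)
    return segments
```

For a seven-frame phrase with flags on frames 2 and 5, the first run covers frames 0 and 1, after which readout holds; the next note-on releases frames 2 to 4, and the one after that frames 5 and 6.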
[0020]
(3) Configuration of the PARCOR synthesis system 6
Next, the configuration of the PARCOR synthesis system 6, which performs voice synthesis based on the PARCOR coefficients K1 to Kn and the residual wave power Af read from the feature parameter storage unit 3, will be described with reference to FIG. 5. In FIG. 5, reference numeral 6a denotes a comparator that determines whether the sound is voiced or unvoiced by comparing the magnitude of the PARCOR coefficient K1 with a predetermined constant (about 0.2 to 0.3), and generates the signals W1 and W2 according to the result.
That is, when the feature parameter to be synthesized represents a voiced sound, the comparator 6a outputs the signal W1 at the normalized level "1" and sets the signal W2 to "0". When it represents an unvoiced sound, the comparator outputs the signal W2 at the normalized level "1" and sets the signal W1 to "0".
[0021]

Reference numerals 6b and 6c denote coefficient multipliers. The coefficient multiplier 6b multiplies the signal OSC supplied to the terminal IN1 by the signal W1, and the coefficient multiplier 6c multiplies the white noise WN supplied to the terminal IN2 by the signal W2. An adder 6d adds the outputs of the coefficient multipliers 6b and 6c.
Therefore, the adder 6d outputs the signal OSC as the excitation source when synthesizing voiced sounds, and the white noise WN when synthesizing unvoiced sounds.
In the comparator 6a described above, the signals W1 and W2 switch the excitation waveform between the signal OSC and the white noise WN according to whether the sound is voiced or unvoiced. Alternatively, the signals W1 and W2 may be cross-faded according to the voiced/unvoiced state, which makes transitions from voiced to unvoiced sounds, and vice versa, more natural.
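The comparator and excitation path can be sketched as below. This is an illustrative Python sketch, not the DSP program: the decision direction (K1 above the constant taken as voiced), the value 0.25, and the linear cross-fade driven by K1 are assumptions, since the text gives only the 0.2 to 0.3 range and says a cross-fade may be used. The function names are hypothetical.

```python
def voicing_weights(k1, threshold=0.25, fade_width=0.0):
    """Comparator 6a: return (W1, W2) for a frame's PARCOR coefficient K1.

    Assumptions: K1 above the threshold means "voiced"; 0.25 is picked from
    the 0.2 to 0.3 range in the text; fade_width > 0 enables a linear
    cross-fade centred on the threshold instead of a hard switch.
    """
    if fade_width <= 0.0:
        w1 = 1.0 if k1 > threshold else 0.0
    else:
        t = (k1 - (threshold - fade_width / 2.0)) / fade_width
        w1 = min(1.0, max(0.0, t))
    return w1, 1.0 - w1


def excitation(osc, noise, k1, af, threshold=0.25, fade_width=0.0):
    """Multipliers 6b/6c, adder 6d and gain multiplier 6e, sample by sample."""
    w1, w2 = voicing_weights(k1, threshold, fade_width)
    return [af * (w1 * o + w2 * n) for o, n in zip(osc, noise)]
```

The gain stage follows the text in multiplying by the residual wave power Af directly; with a hard switch exactly one of the two sources reaches the lattice filters, while a nonzero `fade_width` blends them near the threshold.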
[0022]
A coefficient multiplier 6e multiplies the output of the adder 6d by the residual wave power Af. Reference numerals 6-1 to 6-n denote lattice filters that synthesize speech from the PARCOR coefficients K1 to Kn by reversing the PARCOR analysis process described above. Each of these cascade-connected lattice filters consists of a delay circuit 6f, coefficient multipliers 6g and 6h, an adder 6i, and a subtractor 6j.
If the delay circuit 6f has the same sampling delay as in the PARCOR analysis system 2, the synthesized voice has the same formants as the analyzed audio signal. Therefore, to make the formants intentionally differ as a special effect during speech synthesis, a sampling delay different from that used in analysis may be used.
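The synthesis lattice can be sketched as the stage-by-stage inverse of the analysis lattice. In this hypothetical sketch the analysis side is repeated with fixed coefficients so that a round trip can be checked: feeding the analysis residual back through the synthesis lattice with the same K values and the same one-sample delays reproduces the input, which is the sense in which equal delays preserve the formants. Changing the delay length for special effects, as the text suggests, is not modeled.

```python
def lattice_analyze(x, k):
    """Fixed-coefficient analysis lattice: returns the final-stage residual."""
    order = len(k)
    b_prev = [0.0] * (order + 1)  # b_prev[m] holds b_m at the previous sample
    out = []
    for xn in x:
        f = xn
        b_cur = [0.0] * (order + 1)
        b_cur[0] = xn
        for m in range(1, order + 1):
            km = k[m - 1]
            # f_m[t] = f_{m-1}[t] - K_m * b_{m-1}[t-1]
            # b_m[t] = b_{m-1}[t-1] - K_m * f_{m-1}[t]
            f, b_cur[m] = f - km * b_prev[m - 1], b_prev[m - 1] - km * f
        b_prev = b_cur
        out.append(f)
    return out


def lattice_synthesize(excitation, k):
    """Synthesis lattice (filters 6-1 to 6-n): inverse of lattice_analyze."""
    order = len(k)
    b_prev = [0.0] * (order + 1)  # one-sample delay per stage (delay circuit 6f)
    out = []
    for e in excitation:
        f = e  # excitation enters the last stage
        b_cur = [0.0] * (order + 1)
        for m in range(order, 0, -1):
            km = k[m - 1]
            f = f + km * b_prev[m - 1]         # adder 6i: f_{m-1}[t]
            b_cur[m] = b_prev[m - 1] - km * f  # subtractor 6j: b_m[t]
        b_cur[0] = f  # b_0[t] = f_0[t] = output sample
        b_prev = b_cur
        out.append(f)
    return out
```

A one-stage filter with K1 = 0.5 turns a unit impulse into 1, 0.5, 0.25, and so on: the all-pole response that undoes the analysis predictor.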
[0023]
C. Specific configuration
Next, a specific configuration of the electronic musical instrument according to this embodiment will be described with reference to FIG. 6. In this figure, elements identical to those shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted.
In FIG. 6, reference numeral 10 denotes a CPU that controls each section of the instrument and also performs the functions of the PARCOR analysis system 2 and the control section 4 described above; its operation is described later. Reference numeral 11 denotes a ROM storing the various control programs and control data loaded into the CPU 10. A RAM 12 serves as the work area of the CPU 10 and temporarily stores various register and flag data. The RAM 12 also serves as the feature parameter storage unit 3 described above, with a predetermined storage area in which the analyzed PARCOR coefficients K1 to Kn and the residual wave power Af are arrayed for each analysis frame (see FIG. 4).
[0024]
Reference numeral 13 denotes an operation panel provided with various operation switches, which generates an operation signal for each switch operation. The operation panel 13 includes a sampling switch SS operated when sampling audio, a start switch STS operated to start sampling, an edit switch ES operated when editing and processing the sampled audio data, and an analysis switch AS operated when performing PARCOR analysis.
The editing switches include a switch INC for incrementing the analysis frame, a switch DEC for decrementing the analysis frame, a switch DEL for deleting the analysis frame, a switch SPS for setting the stop flag of the analysis frame, and a switch SRS for resetting it. The purpose of these switches will be described later.
[0025]
A display unit 14 includes an LCD panel, an LCD drive circuit, and the like, and displays, for example, the sampled audio data as a time series on the LCD in accordance with display control signals supplied from the CPU 10 via the bus. Reference numeral 15 denotes a digital signal processor (hereinafter, DSP) that emulates the processing of the PARCOR synthesis system 6 described above. Under the control of the CPU 10, the DSP 15 synthesizes voice in accordance with the performance data, based on the PARCOR coefficients K1 to Kn and the residual wave power Af transferred from the RAM 12.
The audio data synthesized by the DSP 15 is converted into an analog audio signal by the D/A converter 7 in the next stage, filtered (for example, to remove noise) by a sound system (not shown), and emitted as a natural singing voice.
[0026]
D. Operation of the embodiment
Next, the operation of the embodiment configured as above will be described with reference to FIGS. 7 to 16. The processing of the main routine is described first as the overall operation, followed in order by the various routines and interrupt routines called from the main routine.
(1) Main routine operation
First, when the electronic musical instrument according to this embodiment is powered on, the CPU 10 reads a predetermined control program from the ROM 11, loads it, and executes the main routine shown in FIG. 7, which begins with initialization such as resetting various registers to zero and setting initial values.
That is, in step SA1, the three registers i, j, and k, which serve as loop counters, are each reset to zero, and initial values are set in the registers frame_end, win_end, ana_step, k_end, and frame.
[0027]
Here, the register frame_end stores the number of analysis frames (windows); in this example it is initially set to 30. The register win_end stores the number of samples constituting one frame; in this example it is set to 1440.
The register ana_step stores the number of samples between the starts of successive analysis frames; in this example it is set to 960. The register k_end stores the order of the PARCOR coefficients to be analyzed; in this example it is set to 20. The register frame stores the number of the frame currently being processed; it is initially reset to zero.
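The initial register values fix the frame layout, as the arithmetic below shows. This is a sketch: the sampling rate is an assumption (48 kHz is chosen because it makes one 960-sample hop equal to the 20 msec timer period used later, but the excerpt does not state the rate), and `frame_bounds` is a hypothetical helper.

```python
FRAME_END = 30  # register frame_end: number of analysis frames
WIN_END = 1440  # register win_end: samples per analysis window
ANA_STEP = 960  # register ana_step: samples between frame starts

ASSUMED_FS = 48000  # sampling rate in Hz: an assumption, not stated in the text


def frame_bounds(frame):
    """Half-open sample range [start, end) covered by one analysis window."""
    start = frame * ANA_STEP
    return start, start + WIN_END


total_samples = FRAME_END * ANA_STEP   # loop limit of sampling step SC3
overlap = WIN_END - ANA_STEP           # samples shared by adjacent windows
hop_ms = 1000 * ANA_STEP / ASSUMED_FS  # frame period at the assumed rate

if __name__ == "__main__":
    print(total_samples, overlap, hop_ms)  # 28800 480 20.0
    print(frame_bounds(FRAME_END - 1))     # (27840, 29280)
```

With these values adjacent windows overlap by 480 samples. As computed here, the last window ends 480 samples beyond the 28,800 samples collected by the sampling routine; the excerpt does not say how that tail is handled.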
[0028]
After initialization, the CPU 10 proceeds to step SA2 and determines whether the sampling switch SS on the operation panel 13 has been turned on. When voice sampling is performed prior to a performance, the switch SS is turned on, so the determination result is "YES" and the process proceeds to step SA3 to execute the sampling processing routine described later.
If the sampling switch SS has not been turned on, the determination result is "NO" and the process proceeds to step SA4.
In step SA4, it is determined whether the edit switch ES on the operation panel 13 has been turned on. If it has, the determination result is "YES" and the process proceeds to step SA5 to execute the edit processing routine described later. If not, the determination result is "NO" and the process proceeds to step SA6.
[0029]
In step SA6, it is determined whether the analysis switch AS has been turned on. When the switch AS is turned on to perform PARCOR analysis on the sampled audio data, the determination result is "YES" and the process proceeds to step SA7 to execute the analysis processing routine described later. Otherwise, the determination result is "NO" and the process proceeds to step SA8.
When the sampling, edit, or analysis processing routine executed in step SA3, SA5, or SA7 has finished, the process also proceeds to step SA8, where the performance processing routine described later is executed. The CPU 10 then returns to step SA2 and repeats the above processing.
[0030]
As described above, the main routine executes the sampling, edit, and analysis processing routines in response to switch events generated after initialization, and the feature parameters (PARCOR coefficients and residual wave power) obtained by these routines are read out under control of the performance data.
Normally, the audio signal is first sampled by the sampling processing routine, and the feature parameters (PARCOR coefficients and residual wave power) are then extracted for each analysis frame by the analysis processing routine. Next, if necessary, the extracted feature parameters are edited by the edit processing routine to adjust the singing voice that the performance processing routine will synthesize. The operation of each routine is described below in this order.
[0031]
(2) Operation of timer interrupt processing routine
The CPU 10 executes timer interrupt processing at regular intervals to update the analysis frame. For example, every 20 msec the CPU 10 releases the interrupt mask, executes the timer interrupt processing routine shown in FIG. 8, and in step SB1 sets the timer flag "1" in the register time.
The timer flag set in the register time is reset to zero after the analysis frame is updated.
[0032]
(3) Operation of sampling processing routine
When the sampling switch SS on the operation panel 13 is turned on to sample an audio signal, the sampling processing routine is executed via step SA3 described above, and the CPU 10 proceeds to step SC1 shown in FIG. 9.
In step SC1, the CPU waits until the start switch STS, which instructs the start of sampling, is turned on. When the switch STS is turned on, the determination result is "YES", the process proceeds to step SC2, and the register i is reset to zero.
[0033]
Next, in step SC3, it is determined whether the value of the register i has reached frame_end × ana_step, that is, whether the specified number of samples has been taken. If not, the determination result is "NO" and the process proceeds to step SC4, where the audio data xi sampled at the index given by the register i is stored in the RAM 12. The process then returns to step SC3, and steps SC3 and SC4 are repeated until the specified number of samples of audio data xi has been taken. When sampling is complete, the determination result in step SC3 becomes "YES", this routine ends, and the process returns to the main routine.
[0034]
(4) Operation of analysis processing routine
When the sampled audio data xi has been stored in the RAM 12 by the sampling process as described above, the operator turns on the analysis switch AS on the operation panel 13 to perform PARCOR analysis on the audio data xi and extract the feature parameters (PARCOR coefficients and residual wave power).
Then, the CPU 10 executes the analysis processing routine shown in FIG. 10 through step SA7 of the above-described main routine (see FIG. 7), and advances the processing to step SD1.
[0035]
In step SD1, it is determined whether the value of the register i has reached the last frame value stored in the register frame_end, that is, whether the PARCOR analysis has been completed. If so, the determination result is "YES" and the routine ends; otherwise, the determination result is "NO" and the process proceeds to step SD2.
In step SD2, it is determined whether the value of the register j has reached the value of the register win_end. If the audio data xi for one frame has not yet been fully processed, the determination result is "NO" and the process proceeds to step SD3.
[0036]
In step SD3, the audio data xi, read sequentially from the RAM 12 according to the values of the registers i, ana_step, and j, is multiplied by the Hanning window function W(n) described earlier, and the result is stored in the register wave1[j]. When windowing of the audio data xi for one frame is complete, the determination result in step SD2 becomes "YES" and the process proceeds to step SD4.
In step SD4, the registers j and Z are reset to zero. In step SD5, it is determined whether the sum of squares of the analysis data windowed in step SD3 has been computed over one frame. If not, the determination result is "NO" and the process proceeds to step SD6, where the analysis data stored in the register wave1[j] is squared and accumulated.
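Steps SD3 to SD6 amount to windowing one frame and summing its squares, which can be sketched as follows. The Hanning window W(n) is defined earlier in the specification, outside this excerpt, so the symmetric form 0.5 - 0.5cos(2πn/(N-1)) used here is an assumption, as is treating register Z as the accumulator; the function names are hypothetical.

```python
import math


def hanning(n_points):
    """Assumed symmetric Hanning window of n_points samples (n_points >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def window_frame(x, frame, ana_step, win_end):
    """Step SD3: wave1[j] = x[frame * ana_step + j] * W(j)."""
    w = hanning(win_end)
    start = frame * ana_step
    return [x[start + j] * w[j] for j in range(win_end)]


def frame_energy(wave1):
    """Steps SD4 to SD6: accumulate the sum of squares of the windowed frame."""
    z = 0.0  # plays the role of register Z (assumed accumulator)
    for v in wave1:
        z += v * v
    return z
```

For a constant unit input the energy is just the sum of the squared window values, which for this window form is 3(N-1)/8; with N = 8 that gives 2.625.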
[0037]
When the sum of squares has been computed, the determination result in step SD5 becomes "YES", the process proceeds to step SD7, and the register j is reset to zero.
Next, the CPU 10 proceeds to step SD8 shown in FIG. 11 and determines whether the value of the register j has reached the value of the register win_end, that is, whether the last data in the frame has been reached. If not, the determination result is "NO" and the process proceeds to step SD9, where the analysis data stored in the register wave1 at the position given by the register j is copied to the register wave2.
[0038]
When the contents of wave1 have been copied to wave2 in this way, the determination result in step SD8 becomes "YES" and the process proceeds to step SD10, where the registers j and k are reset to zero.
Next, in step SD11, the CPU 10 resets to zero the registers k[frame][0] to k[frame][k_end], which store the PARCOR coefficients K1 to Kn (the partial autocorrelation values), and the register af[frame], which stores the residual wave power Af.
The registers k[frame][0] to k[frame][k_end] are elements of a two-dimensional array indexed by the value of the register frame and by the coefficient index, which runs up to the value of the register k_end.
[0039]
Next, in step SD12, it is determined whether the value of the register k has reached the value of the register k_end, that is, whether the feature parameters for one frame have been extracted. If extraction is complete, the determination result is "YES" and the process returns to step SD1 (see FIG. 10) to analyze the next frame.
If extraction is not complete, the determination result is "NO". In this case, steps SD13 to SD17 are executed according to the value of the register k to compute the linear prediction error between the analysis data in the frame, and the extracted PARCOR coefficients are stored sequentially in the registers k[frame][0] to k[frame][k_end].
When calculation of the correlation coefficient (PARCOR coefficient) is complete, the determination result in step SD16 becomes "YES" and the process proceeds to step SD18 to execute the residual wave power calculation processing routine described later, after which the process returns to step SD12.
[0040]
(5) Operation of residual wave power calculation processing routine
When the residual wave power calculation processing routine is executed via step SD18 described above, the CPU 10 proceeds to step SE1 shown in FIG. 12 and resets the register j and the register af[frame] to zero. Next, in step SE2, it is determined whether the last data in the frame has been reached. If not, the determination result is "NO" and the process proceeds to step SE3, where the value of the register wave1[j] is squared and accumulated in the register af[frame] to obtain the residual wave power af.
[0041]
When the residual wave power af for the analysis frame has been calculated, the determination result in step SE2 becomes "YES" and the process proceeds to step SE4, where the CPU 10 determines whether the calculated residual wave power af exceeds a predetermined value.
This predetermined value is a threshold for determining whether the PARCOR-analyzed voice is a voiced or an unvoiced sound. If the residual wave power af exceeds the threshold, the voice is regarded as voiced and the routine ends. Otherwise, the determination result is "NO" and the process proceeds to step SE5, where the residual wave power af is set to "0".
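The decision in steps SE1 to SE5 can be sketched as follows. The residual samples are passed in as a list (the flowchart reads them from wave1), the threshold is left to the caller because the text does not give its value, and the function name is hypothetical.

```python
def residual_power(residual, threshold):
    """Steps SE1 to SE5: frame residual power, zeroed at or below the threshold.

    Returns (af, voiced).
    """
    af = 0.0
    for v in residual:       # SE2/SE3: square and accumulate into af[frame]
        af += v * v
    voiced = af > threshold  # SE4: larger than the predetermined value?
    if not voiced:
        af = 0.0             # SE5: treated as unvoiced, power set to "0"
    return af, voiced
```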
[0042]
(6) Operation of edit processing routine
To edit the feature parameters (PARCOR coefficients and residual wave power) stored in the RAM 12 by the analysis processing routine and the residual wave power calculation processing routine, for example to set the stop bit STB in the most significant bit of the feature parameters of a vowel part or to delete analysis frames whose feature parameters change little, the edit switch ES on the operation panel 13 is turned on.
When the switch ES is turned on, the CPU 10 executes the edit processing routine shown in FIG. 13 through step SA5 of the main routine described above. In the edit processing routine, processing corresponding to a generated switch event is performed. Hereinafter, an operation for each switch operation will be described.
[0043]
(1) Operation when the frame increment switch INC is operated
To advance the analysis frame to be edited, the frame increment switch INC is operated. When the switch INC is turned on, the determination result in step SF1 is "YES" and the process proceeds to step SF2, where the value of the register frame is incremented by one. In the following step SF3, the feature parameters (PARCOR coefficients and residual wave power) of the corresponding frame are read from the RAM 12 according to the updated value of the register frame and displayed numerically or graphically on the display, and the process returns to step SF1.
[0044]
(2) Operation when the frame decrement switch DEC is operated
To move the analysis frame to be edited backward, the frame decrement switch DEC is operated. When the switch DEC is turned on, the determination in step SF1 is "NO" and the determination in step SF4 is "YES", so the process proceeds to step SF5.
In step SF5, the value of the register frame is decremented by one, and the process proceeds to step SF3, where the feature parameters (PARCOR coefficients and residual wave power) of the corresponding frame are read from the RAM 12 according to the updated value of the register frame. In this case as well, the read feature parameters are displayed numerically or graphically, and the process returns to step SF1.
[0045]
(3) Operation when the frame delete switch DEL is operated
To reduce the amount of data by deleting analysis frames that change little, the frame delete switch DEL is operated. When the switch DEL is turned on, the process passes through steps SF1 and SF4 described above to step SF6, where the determination result is "YES", and proceeds to step SF7.
In step SF7, the value of the register frame is stored in the register i, and in step SF8 it is determined whether the value of the register i indicates the last frame. If it does, the feature parameters of the last frame are deleted and the process returns to step SF1.
If it is not the last frame, the determination result is "NO" and the process proceeds to step SF9, where the analysis frame corresponding to the value of the register i is deleted and each subsequent frame moves up one position (its frame number decreases by one).
[0046]
(4) Operation when the stop set switch SET is operated
When an analysis frame corresponding to a vowel part in which the PARCOR coefficients K1 to Kn change little is found, the stop set switch SET is turned on to set the stop flag in the stop bit STB provided in the most significant bit (MSB) of the feature parameters.
When the switch SET is turned on, the CPU 10 determines "YES" in step SF10 shown in FIG. 14 in response to the switch event and proceeds to step SF11, where the stop flag "1" is set in the MSB of the residual wave power af stored in the register af[frame].
[0047]
(5) Operation when the stop reset switch RESET is operated
To reset a stop flag that was set with the stop set switch SET, the stop reset switch RESET is operated. When the switch RESET is turned on, the determination result in step SF12 becomes "YES", the process proceeds to step SF13, and the stop flag in the MSB of the register af[frame] is reset to zero.
(6) Operation when the exit switch EXIT is operated
When the exit switch EXIT is turned on to end the editing process, the result of the determination in step SF14 becomes "YES", the present routine is completed, and the process returns to the main routine.
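The edit operations map naturally onto a small editor object. This is a hypothetical sketch: the stop flag is kept as a separate field rather than packed into the MSB of af, and clamping the frame cursor at either end is an assumption, since the flowchart description does not say what happens at the boundaries.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    k: list             # PARCOR coefficients K1..Kn
    af: float           # residual wave power
    stop: bool = False  # stop bit STB (MSB of af in the text)


class FrameEditor:
    """Edit operations on the frame table; self.frame mirrors register frame."""

    def __init__(self, frames):
        self.frames = frames
        self.frame = 0

    def inc(self):
        """INC (SF1 to SF3); clamping at the last frame is assumed."""
        self.frame = min(self.frame + 1, len(self.frames) - 1)
        return self.frames[self.frame]  # would be shown on the display

    def dec(self):
        """DEC (SF4, SF5, SF3); clamping at frame 0 is assumed."""
        self.frame = max(self.frame - 1, 0)
        return self.frames[self.frame]

    def delete(self):
        """DEL (SF6 to SF9): later frames move up one slot."""
        if self.frames:
            del self.frames[self.frame]
            self.frame = max(0, min(self.frame, len(self.frames) - 1))

    def set_stop(self):
        """SET (SF10, SF11): mark the current frame with stop flag "1"."""
        self.frames[self.frame].stop = True

    def reset_stop(self):
        """RESET (SF12, SF13): clear the current frame's stop flag."""
        self.frames[self.frame].stop = False
```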
[0048]
(7) Operation of performance processing routine
When the performance processing routine shown in FIGS. 15 and 16 is started via step SA8 of the main routine described above (see FIG. 7), the CPU 10 first proceeds to step SG1, reads the residual wave power af stored in the register af[frame], and determines whether the stop bit stored in its MSB is "0".
If the stop bit is "0", the determination result is "YES" and the process proceeds to step SG2, where it is determined whether the value of the register time is "1", that is, whether it is time to transfer the feature parameters to the DSP 15 (see FIG. 6).
[0049]
Assume that the register time has been set to "1" by the timer interrupt processing routine described above. It is then time to transfer the feature parameters to the DSP 15, so the determination result is "YES" and the process proceeds to step SG3.
In step SG3, the residual wave power af read from the register af[frame] and the contents of the registers k[frame][0] to k[frame][k_end-1] are transferred to the DSP 15. When the transfer is complete, the register frame is incremented by one to advance to the next frame, the register time is reset to zero, and an input volume corresponding to the value of the PARCOR coefficient K1 is specified to the DSP 15. The CPU 10 then proceeds to step SG7, described later.
If the value of the register time is "0" in step SG2, it is not yet time to transfer the feature parameters to the DSP 15, so in this case as well the process proceeds to step SG7.
[0050]
If, on the other hand, the stop flag is "1", the determination result in step SG1 is "NO" and the process proceeds to step SG4, where it is determined whether a note-on instruction has been received. If a note-on has been received, the determination result is "YES" and the process proceeds to step SG5.
In step SG5, it is determined whether it is time to transfer the feature parameters to the DSP 15 (see FIG. 6). If not, the determination result is "NO" and the process proceeds to step SG7, described below.
If it is the transfer timing, the determination result is "YES" and the process proceeds to step SG6.
[0051]
In step SG6, the CPU 10 transfers the residual wave power af read from the register af[frame] and the contents of the registers k[frame][0] to k[frame][k_end-1] to the DSP 15. After the transfer, the register frame is incremented by one to advance to the next frame, and the register time is reset to zero. The CPU 10 also specifies an input volume corresponding to the value of the PARCOR coefficient K1 to the DSP 15 and resets the note-on flag to zero, then proceeds to step SG7, described later.
[0052]
Next, in step SG7 shown in FIG. 16, it is determined whether or not the last frame has been reached. Here, when the last frame has been reached, the determination result is “YES”, the process proceeds to the next step SG8, the value of the register “frame” is reset to zero, and then the process proceeds to step SG9. On the other hand, if the last frame has not been reached, the process proceeds to step SG9 without performing any processing.
In step SG9, the CPU 10 determines whether a performance event has occurred. If not, the determination result is "NO", and the routine ends for this pass and returns to the main routine.
[0053]
If a performance event has occurred, the determination result is "YES" and the process proceeds to step SG10, where it is determined whether the event is a note-on. If it is, the determination result is "YES" and the process proceeds to step SG11 to turn on the excitation unit 5 (see FIG. 6). That is, when the PARCOR coefficients of the current analysis frame correspond to a voiced sound, the excitation unit 5 is instructed to generate a signal OSC, obtained by summing a plurality of waveform signals having the pitch specified by the performance data, and to supply it to the terminal IN1 of the PARCOR synthesis system 6 (that is, the DSP 15). When the PARCOR coefficients of the current analysis frame correspond to an unvoiced sound, the excitation unit 5 is instructed to output the white noise WN and supply it to the terminal IN2 of the PARCOR synthesis system 6 (DSP 15). The CPU 10 then proceeds to step SG12, sets the note-on flag noteon to "1", and ends this routine.
If the event is not a note-on, the determination result in step SG10 is "NO" and the process proceeds to step SG13, where the excitation unit 5 is instructed to stop outputting the signal OSC or the white noise WN.
[0054]
As described above, in the performance processing routine, when a note-on is given as performance data, the feature parameters are read out sequentially and input to the DSP 15 up to the frame whose stop flag is "1"; when that frame becomes the read target, updating of the read position is paused. When the next note-on occurs, the feature parameters are again read out sequentially up to the next frame whose stop flag is set, and are supplied to the DSP 15 for voice synthesis.
The DSP 15 performs PARCOR synthesis based on the PARCOR coefficients K1 to Kn and the residual wave power Af transferred from the RAM 12 as instructed by the CPU 10, together with the excitation signal supplied from the excitation unit 5. As a result, the synthesized human voice is formed into a natural singing tone.
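The performance routine's handling of the stop flag and the timer flag can be simulated in a few lines. This sketch makes the simplifying assumptions named in its docstring: each loop iteration stands for one pass of the routine in which the timer flag is set, excitation control (SG11, SG13) and the volume setting are omitted, and the function name is hypothetical.

```python
def run_performance(stop_flags, note_on_ticks, n_ticks):
    """Simulate steps SG1 to SG12, one routine pass per 20 msec timer tick.

    Returns a list of (tick, frame) pairs, one per transfer to the DSP.
    Simplifications: the timer flag is set on every pass, and the frame
    wraps to 0 after the last index (assumed meaning of SG7/SG8).
    """
    frame = 0
    noteon = False  # note-on flag: set in SG12, cleared in SG6
    transfers = []
    for tick in range(n_ticks):
        if not stop_flags[frame]:            # SG1: stop bit "0"
            transfers.append((tick, frame))  # SG2/SG3: transfer and advance
            frame += 1
        elif noteon:                         # SG4/SG5: held frame, note-on pending
            transfers.append((tick, frame))  # SG6: release the held frame
            frame += 1
            noteon = False
        if frame >= len(stop_flags):         # SG7/SG8: wrap to the first frame
            frame = 0
        if tick in note_on_ticks:            # SG9 to SG12: latch the note-on
            noteon = True
    return transfers
```

With flags on frames 2 and 4 and a single note-on at tick 4, frames 0 and 1 are transferred on successive ticks, readout then holds at frame 2, and the latched note-on releases frames 2 and 3 on ticks 5 and 6 before readout holds again at frame 4.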
[0055]
As described above, in this embodiment the sampled voice is PARCOR-analyzed to extract feature parameters, and the extracted feature parameters are PARCOR-synthesized according to the performance data, so that the synthesized human voice can be formed into a natural singing tone.
Further, in this embodiment a stop bit STB is provided in the MSB of the feature parameters. In vowel parts, where the feature parameters change little, the stop flag "1" is set in the stop bit STB, and analysis frames whose feature parameters are similar to those of neighboring frames are deleted, which greatly reduces the amount of data. The stop flag also makes it easy to synthesize a singing voice that follows the performance data.
[0056]
E. Modified examples
Next, modifications will be described in which a human voice synthesized with the tone generator 100 according to the present invention is formed into a musical tone.
FIG. 17 shows a form in which the audio signal used for PARCOR analysis is supplied from an external sound source 101. In this case, a MIDI instrument 102, which generates MIDI signals (performance data), supplies the same MIDI signal to both the external sound source 101 and the tone generator 100 according to the present invention, and vocoder processing (PARCOR analysis and synthesis) is performed. The voice signal input to the tone generator 100 is used as the vowel input, a noise source for consonants is provided inside the tone generator 100, and the vowel input and the consonant input are weighted according to the PARCOR coefficients.
[0057]
In the modification shown in FIG. 18, the tone generator 100 receives a MIDI signal output from a MIDI instrument and passes it on to an external sound source 101 capable of multi-timbral operation. The external sound source 101 generates audio signals in multiple timbres, with a vowel timbre and a consonant timbre assigned among them. In the tone generator 100, the volume level of each of these timbres is weighted by the PARCOR coefficients before PARCOR synthesis is performed, so that both vowels and consonants are pronounced.
[0058]
In the embodiment described above, the analysis frame to be synthesized is updated in response to a note-on. However, the present invention is not limited to this; for example, the analysis frame may instead be updated in response to a note-off or to velocity, with PARCOR synthesis performed sequentially.
[0059]
【The invention's effect】
According to the present invention, the voice analysis means analyzes a voice signal sampled in time series to extract feature parameters and stores them in the parameter storage means. When the musical tone control means reads the feature parameters from the parameter storage means in accordance with performance information and generates an excitation signal corresponding to that performance information, the voice synthesis means synthesizes voice from the feature parameters read from the parameter storage means and the excitation signal. As a result, a synthesized human voice can be formed into a natural singing tone.
【Brief Description of the Drawings】
FIG. 1 is a block diagram showing a schematic configuration of an embodiment according to the present invention.
FIG. 2 is a block diagram showing the configuration of the PARCOR analysis system 2 in the embodiment.
FIG. 3 is a diagram for explaining an analysis frame in the embodiment.
FIG. 4 is a diagram for explaining a feature parameter storage mode of a feature parameter storage 3 in the embodiment.
FIG. 5 is a block diagram showing the configuration of the PARCOR synthesis system 6 in the embodiment.
FIG. 6 is a block diagram showing a specific configuration of the embodiment.
FIG. 7 is a flowchart showing an operation of a main routine in the embodiment.
FIG. 8 is a flowchart showing an operation of a timer interrupt processing routine in the embodiment.
FIG. 9 is a flowchart showing an operation of a sampling processing routine in the embodiment.
FIG. 10 is a flowchart showing an operation of an analysis processing routine in the embodiment.
FIG. 11 is a flowchart showing an operation of an analysis processing routine in the embodiment.
FIG. 12 is a flowchart showing an operation of a residual wave power calculation processing routine in the embodiment.
FIG. 13 is a flowchart showing an operation of an edit processing routine in the embodiment.
FIG. 14 is a flowchart showing an operation of an edit processing routine in the embodiment.
FIG. 15 is a flowchart showing an operation of a performance processing routine in the embodiment.
FIG. 16 is a flowchart showing the operation of a performance processing routine in the embodiment.
FIG. 17 is a diagram illustrating a modified example.
FIG. 18 is a diagram illustrating another modified example.
【Explanation of Symbols】
2 PARCOR analysis system (voice analysis means)
3 feature parameter storage unit (parameter storage means)
4 control section (musical sound control means)
5 Exciter (musical sound control means)
6 PARCOR synthesis system (voice synthesis means)

Claims

1. A musical sound generating device comprising:
voice analysis means for analyzing a voice signal sampled in time series for each analysis frame and extracting feature parameters;
parameter storage means for storing, for each analysis frame, the feature parameters extracted by the voice analysis means together with a stop flag that is in either an on state or an off state;
musical sound control means for pausing the reading of feature parameters from the parameter storage means, which proceeds in accordance with performance information, at an analysis frame whose stop flag is on, and for resuming the reading from the paused analysis frame when the next performance information is generated, while generating an excitation signal corresponding to that performance information; and
voice synthesis means for synthesizing voice according to the feature parameters read from the parameter storage means and the excitation signal.

2. The musical sound generating device according to claim 1, further comprising flag changing means for changing the state of the stop flag of a designated analysis frame among the stop flags stored for each analysis frame in the parameter storage means.

3. The musical sound generating device according to claim 1, further comprising parameter deleting means for deleting the feature parameters of a designated analysis frame from among the feature parameters stored for each analysis frame in the parameter storage means.
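Claims 2 and 3 describe editing the stored data: changing the stop flag of a designated frame and deleting a designated frame's feature parameters. The hypothetical helpers below show how such edits would change which frames each successive note-on reads, under a simple model with one stop flag per stored frame; the names and the list representation are illustrative assumptions, not the embodiment's edit routine.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    params: Sequence[float]  # feature parameters of one analysis frame
    stop: bool = False       # stop flag


def set_stop_flag(frames: List[Frame], index: int, on: bool) -> None:
    """Flag changing (claim 2): set the stop flag of a designated frame."""
    frames[index].stop = on


def delete_frame(frames: List[Frame], index: int) -> None:
    """Parameter deletion (claim 3): remove a designated frame's parameters."""
    del frames[index]


def segments(frames: List[Frame]) -> List[List[int]]:
    """Group frame indices into the runs read by successive note-ons.

    Each run ends at a stop-flagged frame; any trailing frames form a last run.
    """
    runs: List[List[int]] = []
    current: List[int] = []
    for i, frame in enumerate(frames):
        current.append(i)
        if frame.stop:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```

Under this model, moving a stop flag shifts a syllable boundary, and deleting frames could shorten an over-long consonant without re-analyzing the recording.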