JP3802293B2 - Musical sound processing apparatus and musical sound processing method

Info

Publication number: JP3802293B2
Application number: JP30026799A
Authority: JP
Inventors: Takahiro Kawashima; Xavier Serra; Jordi Bonada
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1999-10-21
Filing date: 1999-10-21
Publication date: 2006-07-26
Anticipated expiration: 2019-10-21
Also published as: JP2001117564A

Description

【０００１】
【発明の属する技術分野】
本発明は、楽音処理装置および楽音処理方法に係り、特に、様々なインストゥルメントの特徴を合成した新しい楽器音を生成する楽音処理装置および楽音処理方法に関する。
【０００２】
【従来の技術】
従来の楽音処理装置においては、２つの音色を補間することによって新しい音色を生成することが行われている。
そして、２つのスペクトル・シェイプを補間する技術については、エネルギー集中させた周波数を基準にして、２つのスペクトル包絡を複数の周波数帯域に分割し、それぞれの帯域ごとにスペクトル変換を行う声質変換方法「特開平９−２４４６９４」、および、二つの原音に対して設定されたパラメータ間の対応する時点を抽出することにより、パラメータを２原音への所望の近さの程度に応じて補間して新しいパラメータを作成する補間音色合成方法「特開平１０−２５４５００」とがある。
【０００３】
【発明が解決しようとする課題】
ところが、３つ以上の音色を合成する技術がないため、例えば、ギター、ピアノおよびフルートの音色からなる発音情報を補間しようとしても、ギターとピアノ、ピアノとフルート、あるいは、ギターとフルートのように２つの音色を補間した楽音信号しか出力することができなかった。
そこで、本発明の目的は、３つ以上のスペクトル・シェイプを補間することができ、より多くの新しい楽音を生成することができる楽音処理装置および楽音処理方法を提供することにある。
【０００４】
【課題を解決するための手段】
上述した課題を解決するため、請求項１に記載の発明は、インストゥルメントにおいて発音された音のサンプリングデータを分析することによって得られた正弦波成分、残差成分およびスペクトル・シェイプを含む分析データを記憶する分析データ記憶手段と、本楽音処理装置により出力する音のピッチ、アンプリチュードおよび音符長を指定するために外部より入力された発音情報およびその音を複数のインストゥルメントの音色から生成するために外部より入力された補間情報に基づいて、前記分析データ記憶手段から前記複数のインストゥルメントに対応する前記分析データをそれぞれ抽出する分析データ抽出手段と、前記抽出された分析データに対応する複数のスペクトル・シェイプのうちの、２つずつを用いて補間処理を行うに際して、該２つのスペクトル・シェイプ間におけるスペクトル・シェイプ遷移関数を用いて補間処理を行い、１つの補間スペクトル・シェイプを生成する補間処理手段と、前記補間スペクトル・シェイプに基づいて楽音信号を生成し出力する楽音信号生成手段と、を備えたことを特徴としている。
【０００６】
請求項２に記載の発明は、請求項１記載の楽音処理装置において、前記分析データ記憶手段は、前記インストゥルメントごとに立ち上がり状態を示す信号か、定常状態を示す信号か、開放状態を示す信号かを表す発音状態のそれぞれに対応する前記正弦波成分および前記残差成分を複数記憶し、前記分析データ抽出手段は、前記入力された発音状態を分析することにより該発音情報の発音状態を判断し、該判断された発音状態に応じて正弦波成分および残差成分を抽出することを特徴としている。
【０００８】
請求項３に記載の発明は、請求項１記載の楽音処理装置において、前記補間処理手段は、前記スペクトル・シェイプを２種ごとの組に分ける組分け処理と、当該組ごとに行う前記補間処理とを繰り返し行い、１つの補間スペクトル・シェイプを生成することを特徴としている。
【００１１】
請求項４に記載の発明は、請求項１記載の楽音処理装置において、前記遷移関数は、線形関数あるいは非線形関数として予め定義されていることを特徴としている。
【００１２】
請求項５に記載の発明は、請求項１記載の楽音処理装置において、前記補間処理手段は、前記２つのスペクトル・シェイプを周波数軸上でそれぞれ複数の領域に分け、各領域に属する前記２つのスペクトル・シェイプ上の実在の周波数およびマグニチュードの組に対し、前記遷移関数としての線形関数を用いた前記補間処理を前記複数の領域にわたって行うことを特徴としている。
【００１３】
請求項６に記載の発明は、請求項５記載の楽音処理装置において、前記補間処理手段は、前記各領域に属する一方のスペクトル・シェイプの周波数である第１周波数および当該第１周波数に対応する他方のスペクトル・シェイプの周波数である第２周波数を前記線形関数を用いて補間することにより補間周波数を算出する周波数補間手段と、前記各領域に属する一方のスペクトル・シェイプのマグニチュードである第１マグニチュードおよび当該第１マグニチュードに対応する他方のスペクトル・シェイプのマグニチュードである第２マグニチュードを前記線形関数を用いて補間するマグニチュード補間手段と、を備えたことを特徴としている。
【００１７】
請求項７に記載の発明は、請求項２記載の楽音処理装置において、前記補間処理手段は、前記分析データ記憶手段によって記憶されている前記正弦波成分および前記残差成分を補間することにより、前記分析データ記憶手段によって記憶されていない正弦波成分および残差成分を算出する分析データ算出手段を備え、該分析データ算出手段により算出された正弦波成分および残差成分を用いて、前記補間処理を行う特徴としている。
【００２１】
請求項８に記載の発明は、音声を入力するための音声入力手段と、前記音声入力手段から入力された入力音声をフレーム単位で周波数分析して音声正弦波成分および音声残差成分を抽出する周波数分析手段と、インストゥルメントにおいて発音された音のサンプリングデータを分析することによって得られた正弦波成分、残差成分およびスペクトル・シェイプを含む分析データを記憶する分析データ記憶手段と、本楽音処理装置により出力する音のピッチ、アンプリチュードおよび音符長を指定するために外部より入力された発音情報およびその音を複数のインストゥルメントの音色から生成するために外部より入力された補間情報に基づいて、前記分析データ記憶手段から前記複数のインストゥルメントに対応する前記分析データをそれぞれ抽出する分析データ抽出手段と、前記抽出された分析データに対応する複数のスペクトル・シェイプのうちの、２つずつを用いて補間処理を行うに際して、該２つのスペクトル・シェイプ間におけるスペクトル・シェイプ遷移関数を用いて補間処理を行い、１つの補間スペクトル・シェイプを生成する補間処理手段と、前記補間処理手段によって生成された１つの補間スペクトル・シェイプと周波数分析手段によって抽出された前記音声正弦波成分および前記音声残差成分とを所定のモーフィング度によってモーフィング処理するモーフィング手段と、前記モーフィング手段より出力された新たな周波数成分に対して逆フーリエ変換を行うことにより楽音信号を生成し出力する楽音信号生成手段と、を備えたことを特徴としている。
【００２２】
請求項９に記載の発明は、請求項８記載の楽音処理装置において、前記所定のモーフィング度を設定するモーフィング設定手段を備え、前記モーフィング手段は、前記モーフィング設定手段によって設定されたモーフィング度により前記補間スペクトル・シェイプと前記音声正弦波成分および前記音声残差成分とをモーフィングすることを特徴としている。
【００２５】
請求項１０に記載の発明は、請求項８記載の楽音処理装置において、前記補間処理手段は、前記入力音声のピッチに対応して予め定められた所定の割合から求められたピッチによって、前記正弦波成分のピッチをシフトするピッチシフト手段と、前記ピッチシフト手段によってピッチがシフトされた前記正弦波成分に対して残差成分を付加する残差成分付加手段と、を備えたことを特徴としている。
【００２７】
請求項１１に記載の発明は、本楽音処理方法により出力する音のピッチ、アンプリチュードおよび音符長を指定するために外部より入力された発音情報およびその音を複数のインストゥルメントの音色から生成するために外部より入力された補間情報に基づいて、インストゥルメントにおいて発音された音のサンプリングデータを分析することによって得られた正弦波成分、残差成分およびスペクトル・シェイプを含む分析データから複数のインストゥルメントに対応する前記分析データを抽出する分析データ抽出過程と、前記分析データに対応するスペクトル・シェイプのうちの、２つずつを用いて補間処理を行うに際して、該２つのスペクトル・シェイプ間におけるスペクトル・シェイプ遷移関数を用いて補間処理を行い、１つの補間スペクトル・シェイプを生成する補間処理過程と、前記補間スペクトル・シェイプに基づいて楽音信号を生成し出力する楽音信号生成過程と、を備えたことを特徴としている。
【００３０】
請求項１２に記載の発明は、請求項１１記載の楽音処理方法において、前記補間処理過程は、前記スペクトル・シェイプを２種ごとの組に分ける組分け処理と、当該組ごとに行う前記補間処理とを繰り返し行い、１つの補間スペクトル・シェイプを生成することを特徴としている。
【００３３】
請求項１３に記載の発明は、請求項１１記載の楽音処理方法において、前記補間処理過程は、前記２つのスペクトル・シェイプを周波数軸上でそれぞれ複数の領域に分け、各領域に属する前記２つのスペクトル・シェイプ上の実在の周波数およびマグニチュードの組に対し、前記遷移関数としての線形関数を用いた前記補間処理を前記複数の領域にわたって行うことを特徴としている。
【００３４】
請求項１４に記載の発明は、請求項１３記載の楽音処理方法において、前記補間処理過程は、前記各領域に属する一方のスペクトル・シェイプの周波数である第１周波数および当該第１周波数に対応する他方のスペクトル・シェイプの周波数である第２周波数を前記線形関数を用いて補間することにより補間周波数を算出する周波数補間過程と、前記各領域に属する一方のスペクトル・シェイプのマグニチュードである第１マグニチュードおよび当該第１マグニチュードに対応する他方のスペクトル・シェイプのマグニチュードである第２マグニチュードを前記線形関数を用いて補間するマグニチュード補間過程と、を備えたことを特徴としている。
【００３５】
請求項１５に記載の発明は、請求項１１記載の楽音処理方法において、前記補間情報は、前記インストゥルメント間の合成割合について補間処理を行うための全体補間情報、発音状態について補間処理を行うための状態補間情報、ピッチについて補間処理を行うためのピッチ補間情報およびアンプリチュードについて補間処理を行うためのアンプリチュード補間情報とを含んだことを特徴としている。
【００３７】
請求項１６に記載の発明は、音声を入力する音声入力過程と、前記音声入力過程において入力された入力音声をフレーム単位で周波数分析して正弦波成分および残差成分を抽出する周波数分析過程と、本楽音処理方法により出力する音のピッチ、アンプリチュードおよび音符長を指定するために外部より入力された発音情報およびその音を複数のインストゥルメントの音色から生成するために外部より入力された補間情報に基づいて、インストゥルメントにおいて発音された音のサンプリングデータを分析することによって得られた正弦波成分、残差成分およびスペクトル・シェイプを含む分析データから複数のインストゥルメントに対応する前記分析データを抽出する分析データ抽出過程と、前記分析データに対応するスペクトル・シェイプのうちの、２つずつを用いて補間処理を行うに際して、該２つのスペクトル・シェイプ間におけるスペクトル・シェイプ遷移関数を用いて補間処理を行い、１つの補間スペクトル・シェイプを生成する補間処理過程と、前記補間処理過程によって生成された１つの補間スペクトル・シェイプと周波数分析過程によって抽出された前記正弦波成分および前記残差成分とを所定のモーフィング度によってモーフィング処理するモーフィング過程と、前記モーフィング過程にて出力された新たな周波数成分に対して逆フーリエ変換を行うことにより楽音信号を生成し出力する楽音信号生成過程と、を備え、前記分析データ抽出過程は、前記正弦波成分に含まれる発音情報および外部より入力された補間情報に基づいて、３種以上のインストゥルメントに対応する前記分析データを抽出することを特徴としている。
【００３８】
請求項１７に記載の発明は、請求項１６記載の楽音処理方法において、前記所定のモーフィング度を設定するモーフィング設定過程を備え、前記モーフィング過程は、前記モーフィング設定過程によって設定されたモーフィング度により前記補間スペクトル・シェイプと前記音声正弦波成分および前記音声残差成分とをモーフィングすることを特徴としている。
【００４０】
請求項１８に記載の発明は、請求項１６記載の楽音処理方法において、前記補間処理過程は、前記入力音声のピッチに対応して予め定められた所定の割合から求められたピッチによって、前記正弦波成分のピッチをシフトするピッチシフト過程と、前記ピッチシフト過程によってピッチがシフトされた前記正弦波成分に対して残差成分を付加する残差成分付加過程と、を備えたことを特徴としている。
【００４１】
【発明の実施の形態】
次に、図面を参照して本発明の好適な実施形態について説明する。
［１］第１実施形態
［１．１］楽音処理装置の全体構成
図１に、本発明の第１実施形態である楽音処理装置の全体構成を示す。
楽音処理装置１は、サンプリングされたインストゥルメントの音声波形を後述するＳＭＳ分析によって分析し、分析の結果、得られた分析データを予め記憶している分析データ記憶部１２と、外部より入力される発音情報信号およびユーザにより設定された補間情報により生成された補間情報信号に基づいて分析データ記憶部１２に記憶されている分析データを抽出する分析データ抽出部１１と、抽出された分析データに基づいて補間を行うことによって新たな周波数成分を生成する補間処理部１３と、生成された周波数成分に対して逆高速フーリエ変換（ＩＦＦＴ）およびオーバーラップ処理を行う周波数合成処理部１４と、周波数合成処理部１４によって生成された周波数信号に対してエンベローブ付加処理を行うエンベローブ処理部１５と、エンベローブ処理部１５によって生成された出力楽音信号を出力する出力部１６とを備えて構成されている。
【００４２】
ここで、本実施形態における発音情報とは、例えば、“８分音符のド”などのように、音の名前や長さなどを含んだ情報のことをいう。したがって、オクターブと音の名前を指定するＭＩＤＩｎｏｔｅや音符の種類を指定するＭＩＤＩｓｔなどのデータも発音情報に含まれる。
【００４３】
次に、上述した補間情報について、詳細に説明する。
本実施形態における補間情報は、インストゥルメントごとに設定することが可能な種々の補間率を含む多次元補間情報のことをいう。
多次元補間情報には、全体多次元補間率、後述する各発音状態における状態多次元補間率、ピッチ多次元補間率およびアンプリチュード多次元補間率がある。
全体多次元補間率は、インストゥルメントを合成する場合に、出力される楽音信号全体に対する当該インストゥルメントの合成割合を示すものであり、例えば、インストゥルメントごとに“０”から“１”の間で設定できるようになっている。
具体的に説明すると、例えば、ギター、ピアノおよびフルートの音色を合成する場合において、ギターの音色を全体の音色の５割にし、ピアノの音色を全体の音色の３割にし、フルートの音色を全体の音色の２割にして合成する場合には、ギターの全体多次元補間率を“０．５”、ピアノの全体多次元補間率を“０．３”、フルートの全体多次元補間率を“０．２”にそれぞれ設定する。
そして、ギター、ピアノおよびフルートにおけるそれぞれの全体多次元補間率は、合計した場合に“１”となるように設定しなければならない。
【００４４】
状態多次元補間率は、各発音状態、例えば、発音の立ち上がり状態、定常状態および開放状態のそれぞれの状態に対応する補間率である立ち上がり状態多次元補間率、定常状態多次元補間率および開放状態多次元補間率に区別される。
そして、各状態多次元補間率は、それぞれの発音状態において、インストゥルメントを合成する場合に、出力される楽音信号全体に対する当該インストゥルメントの合成割合を示すものであり、例えば、各インストゥルメントの発音状態ごとに“０”から“１”の間で設定できるようになっている。
具体的に説明すると、例えば、ギター、ピアノおよびフルートの音色を合成する場合の発音の立ち上がり状態において、ギターの音色を全体の音色の４割にし、ピアノの音色を全体の音色の５割にし、フルートの音色を全体の音色の１割にして合成する場合には、ギターの立ち上がり状態多次元補間率を“０．４”、ピアノの立ち上がり状態多次元補間率を“０．５”、フルートの立ち上がり状態多次元補間率を“０．１”にそれぞれ設定する。
そして、ギター、ピアノおよびフルートにおけるそれぞれの立ち上がり状態多次元補間率は、合計した場合に“１”となるように設定しなければならない。
【００４５】
ここで、立ち上がり状態においては、カットインなどのように突然アタック感の強い音で始める場合や、フェードインなどのように始めは弱い音から入り徐々に音を強くしていく場合などの状態を設定することができる。
また、開放状態においては、カットアウトなどのように突然音を消してしまう場合や、フェードアウトなどのように徐々に音を弱くしていきながら発音を終了させる場合などの状態を設定することができる。
【００４６】
また、ピッチ多次元補間率は、インストゥルメントのピッチを調整して音色を補間するものであり、例えば、キーオン指定のピッチと同じピッチで出力する場合を“０”として、各インストゥルメントごとに“−１”から“１”の間で任意に設定できるようになっている。
具体的に説明すると、例えば、ギター、ピアノおよびフルートの音色を合成する場合において、ギターのピッチを高めにし、ピアノのピッチをキーオン指定のピッチと同じピッチにし、フルートのピッチを低めにする場合には、ギターのピッチ多次元補間率を“０”〜“１”の間に設定し、ピアノのピッチ多次元補間率を“０”に設定し、フルートのピッチ多次元補間率を“０”〜“−１”の間に設定する。
【００４７】
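The amplitude multidimensional interpolation rate adjusts the amplitude of an instrument in order to interpolate the timbre. Taking output at the same amplitude as the key-on-specified amplitude as "0", it can be set arbitrarily between "-1" and "1" for each instrument.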
具体的に説明すると、例えば、ギター、ピアノおよびフルートの音色を合成する場合において、ギターのアンプリチュードを高めにし、ピアノのアンプリチュードをキーオン指定のアンプリチュードと同じアンプリチュードにし、フルートのアンプリチュードを低めにする場合には、ギターのアンプリチュード多次元補間率を“０”〜“１”の間に設定し、ピアノのアンプリチュード多次元補間率を“０”に設定し、フルートのアンプリチュード多次元補間率を“０”〜“−１”の間に設定する。
【００４８】
次に、上述した多次元補間情報をインストゥルメントごとに設定する方法を具体的に説明する。
まず、例えば、多次元補間情報を（全体、立ち上がり状態、定常状態、開放状態、ピッチ、アンプリチュード）と表す場合において、インストゥルメントであるギター、ピアノおよびフルートの音色を、ギターの音色をやや強くした音色になるように補間する場合には、ギターの多次元補間情報を（０．４、０、０、０、０、０）と設定し、ピアノの多次元補間情報を（０．３、０、０、０、０、０）と設定し、フルートの多次元補間情報を（０．３、０、０、０、０、０）と設定する。
この場合において、それぞれのインストゥルメントの全体多次元補間率を合計すると“１”になっている。
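The six-element setting described above can be sketched as a small record with a validity check. This is only an illustration: the names `InterpolationInfo` and `check_settings` are hypothetical, and the value ranges are the ones stated in the description (0 to 1 for the overall and state rates, -1 to 1 for pitch and amplitude, overall rates summing to 1).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InterpolationInfo:
    """Hypothetical record for one instrument's multidimensional interpolation information."""
    overall: float    # overall rate, 0..1
    attack: float     # attack-state rate, 0..1
    steady: float     # steady-state rate, 0..1
    release: float    # release-state rate, 0..1
    pitch: float      # pitch rate, -1..1 (0 = key-on pitch)
    amplitude: float  # amplitude rate, -1..1 (0 = key-on amplitude)


def check_settings(settings):
    """Validate the ranges and the sum-to-one rule for the overall rates.

    Returns the sum of the overall rates so that callers can inspect it.
    """
    for name, info in settings.items():
        for rate in (info.overall, info.attack, info.steady, info.release):
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"{name}: rate {rate} is outside 0..1")
        for rate in (info.pitch, info.amplitude):
            if not -1.0 <= rate <= 1.0:
                raise ValueError(f"{name}: rate {rate} is outside -1..1")
    total = sum(info.overall for info in settings.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"overall rates sum to {total}, expected 1")
    return total


# The example from the description: guitar slightly emphasised.
settings = {
    "guitar": InterpolationInfo(0.4, 0, 0, 0, 0, 0),
    "piano": InterpolationInfo(0.3, 0, 0, 0, 0, 0),
    "flute": InterpolationInfo(0.3, 0, 0, 0, 0, 0),
}
total = check_settings(settings)
```

The check uses a tolerance rather than exact equality because decimal rates such as 0.3 are not exactly representable in binary floating point.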
【００４９】
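Next, suppose that, in addition to the above example, the interpolation is to emphasize the guitar timbre, with its strong attack, at the onset of the sound, the piano timbre in the steady state, and the flute timbre in the release state. In that case, the multidimensional interpolation information is set to (0.4, 1, 0, 0, 0, 1) for the guitar, (0.3, 0, 1, 0, 0, 0) for the piano, and (0.3, 0, 0, 1, 0, 0) for the flute.
In this case, the musical tone signal output in the attack state contains not only the guitar timbre but also the piano and flute timbres. This is because each state multidimensional interpolation rate is applied in proportion to the overall multidimensional interpolation rates.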
【００５０】
［１．２］楽音処理装置の各部の構成
［１．２．１］分析データ記憶部
図２に、分析データ記憶部１２のファイル構成を示す。
図２に示すように、分析データ記憶部１２は、各インストゥルメントの各発音状態ごとに、アンプリチュードを一定の大きさに区分化したアンプリチュード区分と、ピッチを一定の大きさに区分化したピッチ区分とによって表される分析データテーブル単位にデータを区別して格納している。
そして、アンプリチュード区分とピッチ区分との交点となる領域に対してフォルマント・シェイプのデータが格納されている。
また、当該領域には、必要に応じて残差成分も格納されている。なお、フォルマント・シェイプおよび残差成分は、後述するＳＭＳ分析によって求められる。
【００５１】
ここで、図２に示されている分析データテーブルは、インストゥルメントがサックスであり、発音状態が立ち上がり状態でアタック感の強いフォルマント・シェイプを格納している分析データテーブルである。
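In addition, the analysis data storage unit 12 also stores, for example, an analysis data table holding formant shapes for a saxophone in the attack state with a weak attack, and an analysis data table holding formant shapes for a guitar in the steady state with a strong attack.
In this way, the analysis data tables hold formant shapes and residual components for each instrument and for each state in which the manner of sounding differs (sounding state).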
【００５２】
また、ピッチ区分を区分けする大きさの基準は、インストゥルメントの特徴によって変えている。
例えば、ピッチの変更が、フォルマントの変動に対してあまり影響を与えることのない楽器については、ピッチの区分数を減らすようにし、ピアノのように、ピッチの変更が、フォルマントの変動に対して大きな影響を与える楽器については、ピッチの区分数を細かく設定して増やすようにする。
また、アンプリチュード区分を区分けする大きさの基準も、ピッチ区分を区分けする大きさの基準と同様に決定する。
図２に示されている分析データテーブルにおいては、縦軸で表されているピッチは、１[オクターブ]ごとに区分が設定されており、横軸で表されているアンプリチュードは、１０[ｄＢ]ごとに区分が設定されている。
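As an illustration of the table layout described above (one-octave pitch segments, 10 dB amplitude segments, one table per instrument and sounding state), the following minimal sketch indexes formant shapes by segment. The class `AnalysisTables`, the helper names, and the choice of roughly C1 (32.70 Hz) as the lower edge of the first pitch segment are assumptions made for the example and do not come from the source.

```python
import math

C1_HZ = 32.70  # assumed lower edge of the first one-octave pitch segment


def pitch_segment(freq_hz):
    """Index of the one-octave pitch segment that contains freq_hz."""
    return math.floor(math.log2(freq_hz / C1_HZ))


def amplitude_segment(level_db, step_db=10.0):
    """Index of the amplitude segment (10 dB wide by default) that contains level_db."""
    return math.floor(level_db / step_db)


class AnalysisTables:
    """Hypothetical store: one cell per (instrument, state, pitch segment, amplitude segment)."""

    def __init__(self):
        self._cells = {}

    def _key(self, instrument, state, freq_hz, level_db):
        return (instrument, state, pitch_segment(freq_hz), amplitude_segment(level_db))

    def store(self, instrument, state, freq_hz, level_db, shape, residual=None):
        # The residual is optional, as in the description.
        self._cells[self._key(instrument, state, freq_hz, level_db)] = (shape, residual)

    def lookup(self, instrument, state, freq_hz, level_db):
        return self._cells[self._key(instrument, state, freq_hz, level_db)]


tables = AnalysisTables()
tables.store("piano", "attack", 65.41, 6.0, "shape 9a")
tables.store("piano", "attack", 65.41, 15.0, "shape 9b")
```

A lookup at any pitch and amplitude inside the same segments returns the stored cell, so 3 dB and 6 dB resolve to the same shape while 15 dB resolves to the next amplitude segment.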
【００５３】
ここで、図３を参照しながら、ＳＭＳ分析について説明する。
ＳＭＳ分析においては、まず、サンプリングされた音声波形に対して窓関数を乗ずることによって得られるフレームを抽出し、次に、抽出したフレームに対して高速フーリエ変換（ＦＦＴ）を行うことによって得られる周波数スペクトルから、正弦波成分と残差成分とを抽出する。
ここで、正弦波成分とは、基本周波数（ピッチ；Pitch）および基本周波数の倍数にあたる周波数（倍音）の成分をいう。本実施形態においては、基本周波数をピッチとし、各成分の平均アンプリチュードをアンプリチュードとし、スペクトル包絡をフォルマント・シェイプとして分析データ記憶部１２に記憶している。
また、残差成分とは、上述した周波数スペクトルから正弦波成分を除いた成分であり、本実施形態においては、図３に示すように周波数領域のデータとして分析データ記憶部１２に記憶している。また、残差成分は、特に立ち上がり状態の音を生成するときに有用な成分となる。
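The split into sinusoidal and residual components can be illustrated with a heavily simplified sketch. It assumes the fundamental frequency is already known and simply assigns the spectral bins near each harmonic to the sinusoidal component and everything else to the residual; a real SMS analysis performs peak detection and tracking, which is not shown here. All names (`hann`, `dft_magnitudes`, `split_components`) and the parameter values are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of a naive DFT for bins 0..n/2 (an FFT would be used in practice)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def split_components(mags, f0_hz, sample_rate, n, width=2):
    """Assign bins within `width` bins of each harmonic of f0 to the sinusoidal part."""
    bin_hz = sample_rate / n
    sinusoidal = [0.0] * len(mags)
    residual = list(mags)
    harmonic = 1
    while harmonic * f0_hz < sample_rate / 2:
        center = round(harmonic * f0_hz / bin_hz)
        for k in range(max(0, center - width), min(len(mags), center + width + 1)):
            sinusoidal[k] = mags[k]
            residual[k] = 0.0
        harmonic += 1
    return sinusoidal, residual


SAMPLE_RATE = 8000
N = 256
F0 = 500.0  # falls exactly on bin 16 for these settings
window = hann(N)
frame = [window[t] * math.sin(2 * math.pi * F0 * t / SAMPLE_RATE) for t in range(N)]
mags = dft_magnitudes(frame)
sinusoidal, residual = split_components(mags, F0, SAMPLE_RATE, N)
```

For this pure test tone nearly all of the energy lands in the sinusoidal part; for a recorded instrument the residual would carry the noise-like content that the description calls useful for attack sounds.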
【００５４】
［１．２．２］補間処理部
補間処理部１３は、各インストゥルメントごとの発音情報信号および補間情報信号に基づいて抽出されたフォルマント・シェイプを各インストゥルメント単位に一時的に補間バッファに記憶しておき、後述するスペクトル補間の手法により、補間バッファに記憶された各インストゥルメントのフォルマント・シェイプを補間することによって新たな周波数成分である補間スペクトル・シェイプを生成する。
【００５５】
［１．３］第１実施形態の動作
次に、楽音処理装置１の動作例を順に説明する。
まず、楽音処理装置１の外部から、例えば、ＭＩＤＩｎｏｔｅなどの発音情報信号が入力されると、分析データ抽出部１１は、入力された発音情報信号をフレーム単位の発音情報信号に分割する。具体的には、例えば、“８分音符、ピッチが第２オクターブのド、アンプリチュードが６[ｄＢ]”を示す発音情報信号が入力された場合に、分析データ抽出部１１は、当該発音情報信号を５[ｍｓ]ごとに区切られたフレームに分割し、各フレームごとに“８分音符、ピッチが第２オクターブのド、アンプリチュードが６[ｄＢ]”および当該フレームにおける発音状態を示すデータが記憶される。
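The frame splitting described above can be sketched as follows. The 5 ms frame length is from the description; the note duration (an eighth note at 120 beats per minute lasts 250 ms), the fixed attack and release lengths used to label each frame, and the function name `split_into_frames` are assumptions for illustration, since the source does not say how the sounding state of a frame is decided.

```python
def split_into_frames(note, duration_ms, frame_ms=5, attack_ms=20, release_ms=30):
    """Copy the note data into every frame and tag each frame with a sounding state.

    attack_ms and release_ms are hypothetical thresholds, not values from the source.
    """
    frames = []
    for index in range(duration_ms // frame_ms):
        start_ms = index * frame_ms
        if start_ms < attack_ms:
            state = "attack"
        elif start_ms >= duration_ms - release_ms:
            state = "release"
        else:
            state = "steady"
        frames.append({**note, "frame": index, "state": state})
    return frames


note = {"length": "eighth", "pitch": "C2", "amplitude_db": 6}
frames = split_into_frames(note, duration_ms=250)
```

Each frame carries the full note data plus its own state label, which is what the later table lookup needs.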
【００５６】
次に、分析データ抽出部１１は、フレーム単位に分割された発音情報信号の発音状態が、立ち上がり状態を示す信号であるか、定常状態を示す信号であるか、開放状態を示す信号であるかの判断を行う。
そして、分析データ抽出部１１は、各インストゥルメントごと、かつ、上記各状態ごとに記憶されたフォルマント・シェイプを、外部から入力された補間情報信号に含まれるピッチ多次元補間率およびアンプリチュード多次元補間率によって補間された発音情報信号のピッチおよびアンプリチュードに基づいて分析データ記憶部１２を検索し、検索によって抽出された分析データを、各インストゥルメントごとに区別された補間バッファに対して一時的に記憶させる。
【００５７】
ここで、分析データ抽出部１１による抽出処理を図９を参照して具体的に説明する。図９はインストゥルメントがピアノであり、発音状態が立ち上がり状態である場合の分析データテーブルを示している。
分析データ抽出部１１は、第１フレームの発音情報信号が、例えば、“８分音符、ピッチが第２オクターブのド、アンプリチュードが６[ｄＢ]、立ち上がり状態”を示すデータであり、ピアノの補間情報信号が（０．３、０、１、０、０、０）であった場合に、補間情報信号に含まれるピッチ多次元補間率およびアンプリチュード多次元補間率はともに“０”であるため、発音情報信号に含まれるピッチが“第２オクターブのド”であり、アンプリチュードが“６[ｄＢ]”であるピアノのフォルマント・シェイプ９ａを抽出する。そして、ピアノの補間バッファの第１フレームに対応する領域に対して、フォルマント・シェイプ９ａを記憶する。
次に、分析データ抽出部１１は、第２フレームの発音情報信号を読み込み、当該発音情報信号が、例えば、“８分音符、ピッチが第２オクターブのド、アンプリチュードが１５[ｄＢ]、立ち上がり状態”を示すデータであり、ピアノの補間情報信号が（０．３、０、１、０、０、０）であった場合に、補間情報信号に含まれるピッチ多次元補間率およびアンプリチュード多次元補間率はともに“０”であるため、発音情報信号に含まれるピッチが“第２オクターブのド”であり、アンプリチュードが“１５[ｄＢ]”であるピアノのフォルマント・シェイプ９ｂを抽出する。そして、ピアノの補間バッファの第２フレームに対応する領域に対して、フォルマント・シェイプ９ｂを記憶する。
【００５８】
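The interpolation processing unit 13 interpolates the three or more instruments specified in the interpolation information signal by the spectral interpolation technique using anchor points described below.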
【００５９】
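First, spectral interpolation is used for two broad purposes.
(1) Interpolating the spectrum shapes of two temporally consecutive frames to obtain the spectrum shape of a frame lying between them in time.
(2) Interpolating the spectrum shapes of two different sounds to obtain the spectrum shape of an intermediate sound.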
【００６０】
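As shown in FIG. 4(a), the two spectrum shapes to be interpolated (hereinafter, for convenience, the first spectrum shape SS1 and the second spectrum shape SS2) are each divided on the frequency axis into a plurality of regions Z_1, Z_2, ....
The boundary frequencies that delimit the regions are then set for each spectrum shape as follows. These boundary frequencies are called anchor points.
First spectrum shape SS1: RB_{1,1}, RB_{2,1}, ..., RB_{N,1}
Second spectrum shape SS2: RB_{1,2}, RB_{2,2}, ..., RB_{M,2}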
【００６１】
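FIG. 4(b) illustrates linear spectral interpolation.
Linear spectral interpolation is defined by an interpolation rate X, which ranges from 0 to 1. Here X = 0 corresponds to the first spectrum shape SS1 itself, and X = 1 corresponds to the second spectrum shape SS2 itself.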
【００６２】
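FIG. 4(b) shows the case of interpolation rate X = 0.35.
In FIG. 4(b), each white circle (○) on a vertical axis represents one frequency-magnitude pair making up a spectrum shape. It is therefore appropriate to imagine a magnitude axis perpendicular to the page.
Suppose that, on the X = 0 axis, the anchor points of a region of interest Z_i of the first spectrum shape SS1 are RB_{i,1} and RB_{i+1,1}, and that one of the frequency-magnitude pairs belonging to Z_i has frequency f_{i1} and magnitude S_1(f_{i1}).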
【００６３】
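Likewise, suppose that, on the X = 1 axis, the anchor points of the region of interest Z_i of the second spectrum shape SS2 are RB_{i,2} and RB_{i+1,2}, and that one of the frequency-magnitude pairs belonging to Z_i has frequency f_{i2} and magnitude S_2(f_{i2}).
The spectrum transition functions f_{trans1}(x) and f_{trans2}(x) are now determined.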
【００６４】
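Expressed as the simplest linear functions, for example, they are
\[ f_{trans1}(x) = m_1 x + b_1 \]
\[ f_{trans2}(x) = m_2 x + b_2 \]
where
\[ m_1 = RB_{i,2} - RB_{i,1}, \quad b_1 = RB_{i,1} \]
\[ m_2 = RB_{i+1,2} - RB_{i+1,1}, \quad b_2 = RB_{i+1,1} \]
Next, for each frequency-magnitude pair that actually exists on the first spectrum shape SS1, the corresponding frequency-magnitude pair on the interpolated spectrum shape is obtained.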
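The linear transition functions above can be written directly in code. The following minimal sketch builds one such function from a pair of corresponding anchor points; the helper name `linear_transition` and the numeric anchor values are hypothetical.

```python
def linear_transition(rb_on_ss1, rb_on_ss2):
    """Return f_trans(x) = m * x + b with m = rb_on_ss2 - rb_on_ss1 and b = rb_on_ss1."""
    m = rb_on_ss2 - rb_on_ss1
    b = rb_on_ss1
    return lambda x: m * x + b


# Lower and upper anchor points of one region, on SS1 and SS2 respectively.
f_trans1 = linear_transition(1000.0, 1400.0)
f_trans2 = linear_transition(2000.0, 2200.0)

# Region boundaries of the interpolated shape at interpolation rate 0.35.
lower, upper = f_trans1(0.35), f_trans2(0.35)
```

At x = 0 each function returns the anchor point on SS1 and at x = 1 the anchor point on SS2, so the region boundaries slide linearly from one shape to the other.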
【００６５】
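First, for a frequency-magnitude pair that actually exists on the first spectrum shape SS1, namely frequency f_{i1} and magnitude S_1(f_{i1}), the corresponding frequency f_{i1,2} and magnitude S_2(f_{i1,2}) on the second spectrum shape are calculated as follows.
[Equation 1]
\[ f_{i1,2} = \frac{W_2}{W_1}\,(f_{i1} - RB_{i,1}) + RB_{i,2} \]
where
\[ W_1 = RB_{i+1,1} - RB_{i,1}, \quad W_2 = RB_{i+1,2} - RB_{i,2} \]
To calculate the magnitude S_2(f_{i1,2}), let the frequencies of the pairs actually existing on the second spectrum shape SS2 that lie nearest to f_{i1,2} on either side be denoted with the suffixes (+) and (-). Then
[Equation 2]
\[ S_2(f_{i1,2}) = S_2(f_{i2}^{(-)}) + \frac{S_2(f_{i2}^{(+)}) - S_2(f_{i2}^{(-)})}{f_{i2}^{(+)} - f_{i2}^{(-)}}\,(f_{i1,2} - f_{i2}^{(-)}) \]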
【００６６】
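From the above, with interpolation rate x, the frequency f_{i1,x} and magnitude S_x(f_{i1,x}) on the interpolated spectrum shape corresponding to a frequency-magnitude pair actually existing on the first spectrum shape SS1 are given by
[Equation 3]
\[ f_{i1,x} = \frac{f_{i1} - RB_{i,1}}{W_1}\,\{ f_{trans2}(x) - f_{trans1}(x) \} + f_{trans1}(x) \]
\[ S_x(f_{i1,x}) = S_1(f_{i1}) + \{ S_2(f_{i1,2}) - S_1(f_{i1}) \}\, x \]
The same calculation is carried out for all frequency-magnitude pairs on the first spectrum shape SS1.
Next, for each frequency-magnitude pair actually existing on the second spectrum shape SS2, the corresponding pair on the interpolated spectrum shape is obtained.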
【００６７】
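First, for a frequency-magnitude pair that actually exists on the second spectrum shape SS2, namely frequency f_{i2} and magnitude S_2(f_{i2}), the corresponding frequency f_{i2,1} and magnitude S_1(f_{i2,1}) on the first spectrum shape are calculated as follows.
[Equation 4]
\[ f_{i2,1} = \frac{W_1}{W_2}\,(f_{i2} - RB_{i,2}) + RB_{i,1} \]
where
\[ W_1 = RB_{i+1,1} - RB_{i,1}, \quad W_2 = RB_{i+1,2} - RB_{i,2} \]
To calculate the magnitude S_1(f_{i2,1}), let the frequencies of the pairs actually existing on the first spectrum shape SS1 that lie nearest to f_{i2,1} on either side be denoted with the suffixes (+) and (-). Then
[Equation 5]
\[ S_1(f_{i2,1}) = S_1(f_{i1}^{(-)}) + \frac{S_1(f_{i1}^{(+)}) - S_1(f_{i1}^{(-)})}{f_{i1}^{(+)} - f_{i1}^{(-)}}\,(f_{i2,1} - f_{i1}^{(-)}) \]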
【００６８】
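From the above, with interpolation rate x, the frequency f_{i2,x} and magnitude S_x(f_{i2,x}) on the interpolated spectrum shape corresponding to a frequency-magnitude pair actually existing on the second spectrum shape SS2 are given by
[Equation 6]
\[ f_{i2,x} = \frac{f_{i2} - RB_{i,2}}{W_2}\,\{ f_{trans2}(x) - f_{trans1}(x) \} + f_{trans1}(x) \]
\[ S_x(f_{i2,x}) = S_2(f_{i2}) + \{ S_2(f_{i2}) - S_1(f_{i2,1}) \}\,(x - 1) \]
The same calculation is carried out for all frequency-magnitude pairs on the second spectrum shape SS2.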
【００６９】
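As described above, all of the results are sorted in order of frequency to obtain the interpolated spectrum shape. These results are the frequencies f_{i1,x} and magnitudes S_x(f_{i1,x}) on the interpolated spectrum shape that correspond to the frequency f_{i1} and magnitude S_1(f_{i1}) pairs actually existing on the first spectrum shape SS1, together with the frequencies f_{i2,x} and magnitudes S_x(f_{i2,x}) on the interpolated spectrum shape that correspond to the frequency f_{i2} and magnitude S_2(f_{i2}) pairs actually existing on the second spectrum shape SS2.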
【００７０】
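This is carried out for all regions Z_1, Z_2, ..., to calculate the interpolated spectrum shape over the entire frequency band.
In the above example of spectral interpolation using anchor points, the spectrum transition functions f_{trans1}(x) and f_{trans2}(x) are linear, but they may instead be defined as nonlinear functions such as quadratic or exponential functions, or the change corresponding to a function may be provided as a table.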
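The whole anchor-point procedure can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a definitive implementation: both shapes are assumed to have the same number of anchor points, a point is mapped between shapes by keeping its relative position within its region, magnitudes on the other shape are found by linear interpolation between neighbouring points, and regions are treated as half-open intervals. The function names and the optional `transition` parameter (which allows a nonlinear transition function to be substituted) are hypothetical.

```python
from bisect import bisect_right


def magnitude_at(shape, freq):
    """Linearly interpolate the magnitude of `shape` (sorted (freq, mag) pairs) at freq."""
    freqs = [f for f, _ in shape]
    j = bisect_right(freqs, freq)
    if j == 0:
        return shape[0][1]
    if j == len(shape):
        return shape[-1][1]
    (f_lo, s_lo), (f_hi, s_hi) = shape[j - 1], shape[j]
    return s_lo + (s_hi - s_lo) * (freq - f_lo) / (f_hi - f_lo)


def interpolate_shapes(ss1, ss2, anchors1, anchors2, x, transition=None):
    """Interpolate two spectrum shapes region by region; x = 0 gives ss1, x = 1 gives ss2."""
    if len(anchors1) != len(anchors2):
        raise ValueError("this sketch assumes equal numbers of anchor points")
    if transition is None:
        # Linear transition function: m * x + b with b the anchor point on ss1.
        transition = lambda rb1, rb2, t: (rb2 - rb1) * t + rb1
    result = []
    for i in range(len(anchors1) - 1):
        lo1, hi1 = anchors1[i], anchors1[i + 1]
        lo2, hi2 = anchors2[i], anchors2[i + 1]
        w1, w2 = hi1 - lo1, hi2 - lo2
        t_lo, t_hi = transition(lo1, lo2, x), transition(hi1, hi2, x)
        for f, s in ss1:  # points that actually exist on ss1
            if lo1 <= f < hi1:
                s_on_2 = magnitude_at(ss2, (f - lo1) * w2 / w1 + lo2)
                f_x = (f - lo1) / w1 * (t_hi - t_lo) + t_lo
                result.append((f_x, s + (s_on_2 - s) * x))
        for f, s in ss2:  # points that actually exist on ss2
            if lo2 <= f < hi2:
                s_on_1 = magnitude_at(ss1, (f - lo2) * w1 / w2 + lo1)
                f_x = (f - lo2) / w2 * (t_hi - t_lo) + t_lo
                result.append((f_x, s + (s - s_on_1) * (x - 1)))
    return sorted(result)


ss1 = [(100.0, 1.0), (300.0, 0.5)]
ss2 = [(200.0, 2.0), (600.0, 1.0)]
halfway = interpolate_shapes(ss1, ss2, [0.0, 400.0], [0.0, 800.0], 0.5)
```

Passing a different `transition`, for example one that uses `t ** 2` in place of `t`, gives a nonlinear transition of the kind mentioned in the text while leaving the end points at x = 0 and x = 1 unchanged.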
【００７１】
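Next, the method of interpolating three or more instruments by the anchor-point spectral interpolation described above is explained in detail.
First, the three or more instruments are divided into pairs, and the spectrum shapes of the two instruments in each pair are interpolated to obtain an interpolated spectrum shape for each pair.
The interpolated spectrum shapes so obtained are then again divided into pairs, the two spectrum shapes in each pair are interpolated to obtain an interpolated spectrum shape for each pair, and this processing is repeated until a single interpolated spectrum shape finally remains.
Performing this interpolation on n instruments results in n - 1 interpolations.
When a spectrum shape is left over because it cannot be paired, it is not interpolated in the current step but is interpolated in the next step.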
【００７２】
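To explain this more concretely with reference to FIG. 8: when spectral interpolation is performed on three instruments, for example guitar, piano and flute, the first interpolation pairs the guitar spectrum shape 8a with the piano spectrum shape 8b and interpolates these two to obtain the interpolated spectrum shape 8d.
The second interpolation then pairs the interpolated spectrum shape 8d with the flute spectrum shape 8c, which was not interpolated in the first step, and interpolates these two to obtain the single final interpolated spectrum shape.
Thus, performing the interpolation on three instruments requires two interpolations.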
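The repeated pairing can be sketched as a simple reduction loop. In this illustration the two-shape interpolation is passed in as a stand-in callable, and how the per-pair interpolation rate is derived from the instruments' overall rates is deliberately left out because the text above does not specify it. The name `reduce_pairwise` is hypothetical.

```python
def reduce_pairwise(shapes, interpolate):
    """Pair up shapes and interpolate each pair, repeating until one shape remains.

    Returns the final shape and the number of interpolations performed.
    """
    current = list(shapes)
    count = 0
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            next_round.append(interpolate(current[i], current[i + 1]))
            count += 1
        if len(current) % 2 == 1:
            # An unpaired shape is carried over to the next step unchanged.
            next_round.append(current[-1])
        current = next_round
    return current[0], count


# Stand-in "interpolation" that only records which shapes were combined.
combine = lambda a, b: f"({a}+{b})"
final, interpolations = reduce_pairwise(["8a", "8b", "8c"], combine)
```

For three inputs the loop combines 8a with 8b first and then the result with 8c, which is two interpolations, in line with the n - 1 count stated above.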
【００７３】
また、補間処理部１３は、分析データ抽出部１１によって残差成分が抽出されている場合には、以下に記載する補間処理を行ってから、先に求めた補間スペクトル・シェイプに付加する。ただし、残差成分の補間処理を行うか否かについては、ユーザが予め設定することができ、分析データ抽出部１１は、予め設定された設定値にしたがって、残差成分の補間処理を行う。
【００７４】
ここで、残差成分の補間処理について説明する。
まず、補間処理部１３は、各インストゥルメントごとの補間情報信号に基づいて、分析データ抽出部１１によって抽出された残差成分の線形補間を行い、各インストゥルメント単位の補間残差成分を生成する。次に、補間処理部１３は、複数のインストゥルメント間における補間残差成分を上述したスペクトル・シェイプを補間するときと同様に補間する。
【００７５】
また、周波数合成処理部１４は、補間処理部１３において出力された新たな周波数成分に対して逆フーリエ変換およびオーバーラップ処理を行うことによって新たな周波数信号を生成する。
ここで、本実施形態におけるオーバーラップ処理とは、フレーム単位に分割されている周波数信号の両端をなめらかに補正して、フレームをつなぎ合わせたときに歪みが生じないようにする処理をいう。
そして、エンベローブ処理部１５は、周波数合成処理部１４によって出力された新たな周波数信号に対して、エンベローブを付加することにより出力楽音信号を生成し、出力部１６によって当該出力楽音信号が出力される。
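The overlap processing above is described only as smoothing both ends of each frame so that the joined frames do not distort. One common way to achieve this, shown here as an assumption rather than as the method of the source, is windowed overlap-add: each time-domain frame is multiplied by a Hann window and added into the output at a hop of half the frame length. The function name `overlap_add` is hypothetical.

```python
import math


def overlap_add(frames, hop):
    """Window each frame with a periodic Hann window and sum the frames at the given hop."""
    n = len(frames[0])
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        for k, sample in enumerate(frame):
            out[i * hop + k] += window[k] * sample
    return out


# Four constant frames of length 8 with 50 % overlap.
signal = overlap_add([[1.0] * 8 for _ in range(4)], hop=4)
```

With a periodic Hann window and 50 % overlap the window copies sum to one, so a constant input is reproduced without ripple everywhere except the first and last half frame.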
【００７６】
［１．４］第１実施形態の効果
上述した実施形態によると、補間処理部１３によって、複数のインストゥルメントのスペクトル・シェイプを組み合わせて補間して行くことによって、最終的に１つの補間スペクトル・シェイプにすることができる。したがって、複数のインストゥルメントから発音された発音情報の補間処理を行うことが可能となる。
【００７７】
［１．５］第１実施形態の変形例
［１．５．１］第１変形例
なお、上述した第１実施形態においては、分析データ記憶部に記憶されている全てのフォルマント・シェイプが、インストゥルメントからサンプリングされた楽音波形によって作成されているが、インストゥルメントの特徴的な音のみをサンプリングし、サンプリングされた楽音波形によっては直接作成されることのないフォルマント・シェイプのみを、サンプリングされた楽音波形によって作成されたフォルマント・シェイプを補間することによって作成してもよい。
【００７８】
また、インストゥルメントの特徴的な音のみをサンプリングし、サンプリングされた楽音波形によっては直接作成されることのないフォルマント・シェイプについては、楽音を生成するときに、補間処理部において、サンプリングされた楽音波形によって作成されたフォルマント・シェイプをリアルタイムに補間することによって作成するようにしてもよい。
【００７９】
［１．５．２］第２変形例
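In the first embodiment described above, interpolation and related processing are performed based on interpolation information set by the user. Alternatively, a timbre generated from MIDI data may be selected, interpolation information may be generated based on the MIDI data that produced that timbre, and the interpolation and related processing may be performed based on that interpolation information.
FIG. 5 is a schematic block diagram of the musical sound processing apparatus 1' according to the second modification.
FIG. 5 differs from the first embodiment of FIG. 1 in that, in the second modification of FIG. 5, a MIDI tone color selection unit 41, a MIDI interpolation information storage unit 42, a change input unit 43 and an interpolation information generation unit 44 are provided in addition to the components of the first embodiment of FIG. 1.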
【００８０】
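Specifically, in addition to the components of the musical sound processing apparatus 1, the musical sound processing apparatus 1' comprises: a MIDI tone color selection unit 41 that, when the user selects a preset timbre generated from MIDI data, issues the MIDI message corresponding to that timbre; a MIDI interpolation information storage unit 42 that stores the MIDI interpolation information of the MIDI data that was set when the timbre was generated; a change input unit 43 for entering change data when the proportions of the sounding states (attack, steady and release) of a preset timbre, its envelope information and the like are to be changed; and an interpolation information generation unit 44 that extracts the MIDI interpolation information stored in the MIDI interpolation information storage unit 42 based on the MIDI message issued by the MIDI tone color selection unit 41 to generate interpolation information, and changes that interpolation information according to the change data entered from the change input unit 43.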
【００８１】
ここで、ＭＩＤＩメッセージと補間情報との間において設定されている各種の対応付けについて説明する。
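First, for the Program Change message, predetermined interpolation information is associated with each preset timbre, and this correspondence is stored in the MIDI interpolation information storage unit 42.
Thus, when a preset timbre is selected at the MIDI tone color selection unit 41, the Program Change message for the selected timbre is issued, and the interpolation information generation unit 44 extracts the interpolation information corresponding to that message from the MIDI interpolation information storage unit 42 and generates the interpolation information.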
【００８２】
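Next, for the Control Change message, the parameter values that determine the proportions of the sounding states (attack, steady and release) of a preset timbre, the parameter values that determine its envelope information and so on are set up so that they can be changed on receipt of change data.
Thus, when change data that alters the sounding-state proportions or the envelope information of a preset timbre is entered from the change input unit 43, a Control Change message for that timbre is issued, and the interpolation information generation unit 44, on receiving the message, can change in real time the sounding-state proportions and the envelope information of the timbre selected at the MIDI tone color selection unit 41.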
【００８３】
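Also for the Control Change message, a Control Change Number parameter is associated with each parameter that stores an interpolation rate contained in the interpolation information.
Thus, when change data that alters an interpolation rate is entered from the change input unit 43, a Control Change message is issued, and the interpolation rates for the timbre selected at the MIDI tone color selection unit 41 can be changed in real time.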
【００８４】
次に、ＭＩＤＩメッセージのNote On/Offメッセージにおいては、Note Velocityパラメータに対して、生成される楽音波形のアンプリチュードを対応付けるとともに、Note Keyパラメータに対して、生成される楽音波形のピッチを対応付けて、設定しておく。
これにより、変更入力部４３から、生成される楽音波形のアンプリチュードあるいはピッチの値が入力されると、Note On/Offメッセージが発行され、出力される楽音波形のアンプリチュードあるいはピッチをリアルタイムに変更することができる。
【００８５】
また、ＭＩＤＩメッセージのNote On/Offメッセージにおいて、Note Velocityパラメータに対して、補間させたいアンプリチュードを対応付けるとともに、Note Keyパラメータに対して、補間させたいピッチを対応付けて設定しておき、Note Velocityパラメータに対応付けられたアンプリチュードに基づいて分析データ記憶部１２に記憶されているアンプリチュード区分を検索してフォルマント・シェイプを抽出したり、Note Keyパラメータに対応付けられているピッチに基づいて分析データ記憶部１２に記憶されているピッチ区分を検索してフォルマント・シェイプを抽出したりしてもよい。
これによって、インストゥルメントの楽音に対して、より現実的な楽音を生成することができる。
【００８６】
次に、ＭＩＤＩメッセージのPitch Bendメッセージにおいては、Pitch Bendパラメータに対して、生成された楽音波形のピッチを対応付けて設定しておく。
これにより、変更入力部４３から、生成される楽音波形のピッチの値が入力されると、Pitch Bendメッセージが発行され、出力される楽音信号のピッチをリアルタイムに変更することができる。
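The MIDI associations above (note key to pitch, note velocity to amplitude, pitch bend to pitch) can be sketched as three small conversion functions. The equal-tempered key-to-frequency formula with A4 = 440 Hz at key 69 and the 14-bit pitch bend value centred on 8192 are standard MIDI conventions; the velocity-to-decibel mapping, the default bend range of two semitones, and all function names are assumptions made for this example.

```python
def note_key_to_hz(key, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note key (key 69 is A4)."""
    return a4_hz * 2.0 ** ((key - 69) / 12)


def velocity_to_db(velocity, floor_db=-60.0):
    """Hypothetical mapping: velocity 127 gives 0 dB, velocity 1 gives floor_db."""
    return floor_db * (127 - velocity) / 126


def apply_pitch_bend(freq_hz, bend, bend_range_semitones=2.0):
    """Shift freq_hz by a 14-bit pitch bend value (0..16383, centre 8192)."""
    semitones = (bend - 8192) / 8192 * bend_range_semitones
    return freq_hz * 2.0 ** (semitones / 12)


pitch_hz = apply_pitch_bend(note_key_to_hz(69), 8192)  # centred bend leaves A4 unchanged
level_db = velocity_to_db(127)
```

The resulting pitch and level are the kind of values that could then be used to select the pitch and amplitude segments of the analysis data tables.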
【００８７】
［２］第２実施形態
次に、本発明の第２実施形態について説明する。本第２実施形態は、第１実施形態において記載した楽音処理装置をカラオケ装置に適用し、入力されたボーカル音声に対して新たに生成された楽音とのモーフィング処理を行うことが可能なカラオケ装置として構成した場合の例である。
本第２実施形態が第１実施形態と異なる点は、第１実施形態の楽音処理装置においては、外部から入力された発音情報をユーザにより設定された補間情報に基づいて補間することによって楽音を生成していたが、本第２実施形態においては、歌唱者によって入力された音声情報と、補間処理部によって生成された楽音情報とをモーフィング処理することによって新たな楽音信号を生成している点である。
【００８８】
［２．１］楽音処理装置の概要構成
図６に、第２実施形態の楽音処理装置の概要構成ブロック図を示す。
図６において図１の第１実施形態と同様の部分には同一の符号を付し、その詳細な説明を省略する。
楽音処理装置１０は、分析データ記憶部１２と、分析データ抽出部１１と、補間処理部１３と、歌唱者の音声が入力され、歌唱信号を出力する歌唱信号入力部３１と、歌唱信号のＳＭＳ分析を行ってスペクトル・シェイプ（音声正弦波成分に該当）、残差成分（音声残差成分に該当）、ピッチおよびアンプリチュードなどを出力するＳＭＳ分析部３２と、補間処理部１３によって出力された補間スペクトル・シェイプ等とＳＭＳ分析部３２によって出力された歌唱信号のスペクトル・シェイプ等とを外部より入力された合成割合データなどを含むモーフィング情報に基づいてモーフィング処理を行うモーフィング処理部３３と、モーフィング処理によって生成された新たな周波数信号に対して逆高速フーリエ変換（ＩＦＦＴ）およびオーバーラップ処理を行う周波数合成処理部１４と、出力部１６とを備えて構成されている。
【００８９】
［２．２］楽音処理装置の各部の構成
［２．２．１］ＳＭＳ分析部
ＳＭＳ分析部３２は、歌唱信号入力部３１によって入力された歌唱信号に対して窓関数を乗ずることによって得られるフレームを抽出し、次に、抽出したフレームに対して高速フーリエ変換（ＦＦＴ）を行うことによって得られる周波数スペクトルから、正弦波成分と残差成分とを抽出する。
そして、ＳＭＳ分析部３２は、正弦波成分と残差成分とをモーフィング処理部３３に対して出力するとともに、歌唱信号のうち音の名前や長さなど表すデータを発音情報として分析データ抽出部１１に対して出力する。
【００９０】
［２．２．２］モーフィング処理部
モーフィング処理部３３は、補間処理部１３によって出力された補間スペクトル・シェイプおよびＳＭＳ分析部３２によって出力された歌唱信号のスペクトル・シェイプ並びに外部より入力されたモーフィング情報に含まれるスペクトル・シェイプをモーフィングするモーフィング度に基づいてモーフィング処理を行い、モーフィング情報に応じた所望のスペクトル・シェイプ、ピッチおよびアンプリチュードを有する新たな周波数成分を生成する。
ここで、モーフィング情報には、例えば、モーフィング処理の対象となる音のスペクトル・シェイプあるいは残差成分をそれぞれモーフィング処理する際のモーフィング度、および、ピッチあるいはアンプリチュードの各値をモーフィング処理の対象となる音のうちのどちらの値に合わせるか等が含まれる。
【００９１】
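The morphing degree is a value that adjusts how closely the result resembles each of two sounds. For example, when two sounds are morphed, with the morphing degree at which only one sound is output set to "0" and the degree at which only the other is output set to "1", varying the morphing degree between "0" and "1" adjusts the extent to which the result resembles either sound. Specifically, when a guitar timbre and a vocal voice are morphed, with "0" meaning that only the guitar timbre is output and "1" meaning that only the vocal voice is output, a morphing degree of "0.3" morphs the guitar timbre and the vocal voice in the ratio 7:3, and a musical tone closer to the guitar timbre than to the vocal voice is output.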
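The 7:3 example can be checked with a minimal sketch. It assumes that both spectrum shapes have already been sampled on the same frequency grid and that morphing is a plain weighted average per frequency bin, which is a simplification chosen for illustration; the function name `morph` is hypothetical.

```python
def morph(instrument_mags, voice_mags, degree):
    """Blend two magnitude lists; degree 0 gives the instrument, degree 1 the voice."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("morphing degree must lie between 0 and 1")
    return [(1.0 - degree) * a + degree * b for a, b in zip(instrument_mags, voice_mags)]


guitar = [1.0, 0.0]
vocal = [0.0, 1.0]
blended = morph(guitar, vocal, 0.3)  # guitar and vocal in the ratio 7:3
```

With a degree of 0.3 the guitar contributes 0.7 of each bin and the vocal 0.3, matching the ratio given in the text.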
【００９２】
［２．３］第２実施形態の動作
本第２実施形態の動作は、主要部の動作以外は、第１実施形態と同様であるため、主要部の動作のみを説明する。
まず、歌唱信号入力部３１によって信号入力処理が行われ、歌唱者の歌った音声信号を入力する。
次に、ＳＭＳ分析部３２は、歌唱信号入力部３１を介して入力された音声信号をＳＭＳ分析して、音声信号のスペクトル・シェイプ、ピッチ、アンプリチュード等を抽出する。
分析データ抽出部１１は、ユーザにより設定された３つ以上のインストゥルメントに関する補間情報とＳＭＳ分析部３２によって出力された発音情報とに基づいて分析データ記憶部１２を検索して、分析データを抽出するとともに、抽出した分析データを補間処理部１３に対して出力する。
【００９３】
そして、モーフィング処理部３３では、補間処理部１３によって出力された補間スペクトル・シェイプおよび補間残差成分と、ＳＭＳ分析部３２によって抽出された歌唱信号のスペクトル・シェイプおよび残差成分との間においてモーフィング処理を行う。
ここで、モーフィング処理部３３は、モーフィング処理を行う際に、外部より入力されたモーフィング情報に含まれるモーフィングを行う際の合成割合データなどにしたがってモーフィング処理を行う。
そして、周波数合成処理部１４は、モーフィング処理部３３によって出力された新たな周波数成分に対して逆高速フーリエ変換（ＩＦＦＴ）およびオーバーラップ処理を行う。
【００９４】
［２．４］第２実施形態の効果
上述した実施形態によると、モーフィング処理部３３によって、ボーカル音声信号と、３つ以上のインストゥルメントに対する補間情報信号に基づいて生成された新たな楽音信号とを任意の割合でモーフィングすることができるため、ボーカル音に対して、今までにないような新たな楽音とのモーフィングが可能となり、より新鮮な楽音信号を生成することができる。
【００９５】
［２．５］第２実施形態の変形例
なお、上述した第２実施形態においては、モーフィング処理部３３によってモーフィング処理された楽音信号を出力しているが、図７に示すように、モーフィング処理部３３を備えずに、補間処理部１３によって出力された補間スペクトル・シェイプ等と、ＳＭＳ分析部３２によって抽出された歌唱信号のスペクトル・シェイプ等とを周波数合成処理部１４に対して別々に入力して周波数合成処理を行い、生成された楽音信号および音声信号を出力部１６において混合してから出力してもよい。この場合において、出力部１６は、楽音信号と音声信号とを混合するためのミキサーを備えることとしてもよい。
なお、第２実施形態と同様にモーフィング処理部３３を備え、補間処理部１３によって出力された補間スペクトル・シェイプ等を周波数合成処理部１４によって楽音信号を生成するとともに、当該楽音信号と、モーフィング処理部３３によってモーフィング処理された後、周波数合成処理部１４によって生成された楽音信号とを出力部１６において混合してから出力してもよい。
【００９６】
また、補間処理部１３において、補間スペクトル・シェイプのピッチを、ＳＭＳ分析部３２から入力された発音情報のピッチに対して予め定められた割合により算出されたピッチに変更するようにしてもよい。
さらに、ピッチを変更する前後の補間スペクトル・シェイプを別々に処理をして出力するようにしてもよい。
この変形例によって、補間された楽音信号と歌唱者によって入力された音声信号とによる２つのハーモニーあるいは３つ以上のハーモニーを実現することができる。
【００９７】
［３］変形例
［３．１］第１変形例
なお、上述した各実施形態において、多次元補間情報は、ユーザによって設定されることに限らず、プリセット音色として保持するようにしてもよいし、ユーザが適宜にボイシングして変更できるようにしてもよい。また、コントローラによってリアルタイムに変更することができるようにしてもよい。
【００９８】
［３．２］第２変形例
また、上述した各実施形態においては、３つ以上のインストゥルメントに関する補間情報に限定しているが、従来と同様、２つのインストゥルメントに関する補間情報の場合にも適用が可能である。
【００９９】
【発明の効果】
上述したように本発明によれば、３つ以上のスペクトル・シェイプを補間することが可能となり、より多くの新しい楽音を生成することができる。
また、音色の発音状態に応じてスペクトル・シェイプの補間を行うため、より自然な楽音を生成することができる。
また、新しく生成された楽音と音声とを合成することが可能なため、より新鮮な楽音を生成することができる。
【図面の簡単な説明】
【図１】第１実施形態における楽音処理装置の概要構成を示すブロック図である。
【図２】楽音処理装置における分析データ記憶部のファイル構造を示す図である。
【図３】楽音処理装置において行われるＳＭＳ分析を説明する図である。
【図４】楽音処理装置において行われるスペクトル補間の手法を説明する図である。
【図５】第１実施形態の変形例における楽音処理装置の概要構成を示すブロック図である。
【図６】第２実施形態における楽音処理装置の概要構成を示すブロック図である。
【図７】第２実施形態の変形例における楽音処理装置の概要構成を示すブロック図である。
【図８】楽音処理装置において行われるスペクトル補間の具体例を説明する図である。
【図９】楽音処理装置における分析データ記憶部の具体例を説明する図である。
【符号の説明】
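1, 1', 10, 10' ... musical sound processing apparatus; 11 ... analysis data extraction unit; 12 ... analysis data storage unit; 13 ... interpolation processing unit; 14 ... frequency synthesis processing unit; 15 ... envelope processing unit; 16 ... output unit; 31 ... singing signal input unit; 32 ... SMS analysis unit; 33 ... morphing processing unit; 41 ... MIDI tone color selection unit; 42 ... MIDI interpolation information storage unit; 43 ... change input unit; 44 ... interpolation information generation unit

[0001]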
[Technical Field of the Invention]
The present invention relates to a musical sound processing apparatus and a musical sound processing method, and more particularly, to a musical sound processing apparatus and a musical sound processing method for generating a new musical instrument sound in which various instrument features are synthesized.
[0002]
[Prior art]
In a conventional musical tone processing apparatus, a new timbre is generated by interpolating two timbres.
Known techniques for interpolating two spectrum shapes include a voice quality conversion method (Japanese Patent Application Laid-Open No. 9-244694), which divides two spectrum envelopes into a plurality of frequency bands with reference to the frequencies at which energy is concentrated and performs spectrum conversion for each band, and an interpolated timbre synthesis method (Japanese Patent Application Laid-Open No. 10-254500), which extracts corresponding time points between the parameters set for two original sounds and creates new parameters by interpolating the parameters according to the desired degree of closeness to each of the two original sounds.
[0003]
[Problems to be solved by the invention]
However, because no technique exists for synthesizing three or more timbres, an attempt to interpolate sounding information made up of, for example, guitar, piano and flute timbres could only output a musical tone signal that interpolates two timbres, such as guitar and piano, piano and flute, or guitar and flute.
Accordingly, an object of the present invention is to provide a musical sound processing apparatus and a musical sound processing method capable of interpolating three or more spectrum shapes and generating more new musical sounds.
[0004]
[Means for Solving the Problems]
  To solve the above problem, the invention according to claim 1 comprises: analysis data storage means for storing analysis data that includes a sine wave component, a residual component and a spectrum shape obtained by analyzing sampling data of a sound produced by an instrument; analysis data extraction means for extracting, from the analysis data storage means, the analysis data corresponding to each of a plurality of instruments, based on sounding information input from outside to specify the pitch, amplitude and note length of the sound to be output by the musical sound processing apparatus and on interpolation information input from outside to generate that sound from the timbres of the plurality of instruments; interpolation processing means for performing interpolation processing on the plurality of spectrum shapes corresponding to the extracted analysis data, two at a time, using a spectrum shape transition function between the two spectrum shapes, to generate one interpolated spectrum shape; and musical tone signal generation means for generating and outputting a musical tone signal based on the interpolated spectrum shape.
[0006]
  The invention according to claim 2 is the musical sound processing apparatus according to claim 1, wherein the analysis data storage means stores, for each instrument, a plurality of the sine wave components and residual components, each corresponding to one of the sounding states, where a sounding state indicates whether a signal represents an attack state, a steady state or a release state; and the analysis data extraction means determines the sounding state of the sounding information by analyzing the input sounding state, and extracts a sine wave component and a residual component according to the determined sounding state.
[0008]
  The invention according to claim 3 is the musical sound processing apparatus according to claim 1, wherein the interpolation processing means repeatedly performs a grouping process that divides the spectrum shapes into pairs and the interpolation processing performed for each pair, thereby generating one interpolated spectrum shape.
[0011]
  The invention according to claim 4 is the musical sound processing apparatus according to claim 1, wherein the transition function is defined in advance as a linear function or a nonlinear function.
[0012]
  The invention according to claim 5 is the musical sound processing apparatus according to claim 1, wherein the interpolation processing means divides each of the two spectrum shapes into a plurality of regions on the frequency axis and, for the pairs of frequency and magnitude that actually exist on the two spectrum shapes in each region, performs the interpolation processing using a linear function as the transition function over the plurality of regions.
[0013]
  The invention according to claim 6 is the musical sound processing apparatus according to claim 5, wherein the interpolation processing means comprises: frequency interpolation means for calculating an interpolated frequency by interpolating, using the linear function, a first frequency that is a frequency of one spectrum shape belonging to each region and a second frequency that is the frequency of the other spectrum shape corresponding to the first frequency; and magnitude interpolation means for interpolating, using the linear function, a first magnitude that is a magnitude of one spectrum shape belonging to each region and a second magnitude that is the magnitude of the other spectrum shape corresponding to the first magnitude.
[0017]
  The invention according to claim 7 is the musical sound processing apparatus according to claim 2, wherein the interpolation processing means comprises analysis data calculation means for calculating a sine wave component and a residual component that are not stored in the analysis data storage means by interpolating the sine wave components and residual components that are stored in the analysis data storage means, and performs the interpolation processing using the sine wave component and residual component calculated by the analysis data calculation means.
[0021]
  Claim8According to the invention, the voice input means for inputting the voice, and the frequency analysis means for extracting the voice sine wave component and the voice residual component by analyzing the frequency of the input voice inputted from the voice input means in units of frames. When,Analysis data storage means for storing analysis data including a sine wave component, a residual component, and a spectrum shape obtained by analyzing sampling data of a sound produced by the instrument, and output by the musical sound processing device Analysis based on pronunciation information input from the outside to specify the pitch, amplitude and note length of the sound and interpolation information input from the outside to generate the sound from the timbres of multiple instruments Using analysis data extraction means for extracting the analysis data corresponding to the plurality of instruments from the data storage means, and two of the plurality of spectrum shapes corresponding to the extracted analysis data, respectively When performing the interpolation process, the spectrum between the two spectrum shapes · Performs interpolation processing using the shape transition function, and interpolation processing means for generating a single interpolated spectrum shape,Morphing means for morphing one of the interpolated spectrum shapes generated by the interpolation processing means and the speech sine wave component and the speech residual component extracted by the frequency analysis means with a predetermined morphing degree;A musical sound signal generating means for generating and outputting a musical sound signal by performing an inverse Fourier transform on the new frequency component output from the morphing means;It is characterized by having.
[0022]
  Claim9The invention described in claim8The musical tone processing apparatus according to claim 1, further comprising a morphing setting unit configured to set the predetermined morphing degree, wherein the morphing unit uses the interpolated spectrum shape, the speech sine wave component, and the voice sine wave component according to the morphing degree set by the morphing setting unit. It is characterized by morphing the speech residual component.
[0025]
  The invention according to claim 10 is the musical sound processing apparatus according to claim 8, wherein the interpolation processing means comprises pitch shifting means for shifting the pitch of the sine wave component according to a pitch determined from a predetermined ratio corresponding to the pitch of the input voice, and residual component adding means for adding a residual component to the sine wave component whose pitch has been shifted by the pitch shifting means.
[0027]
  The invention according to claim 11 comprises: an analysis data extraction step of extracting the analysis data corresponding to a plurality of instruments from analysis data that include a sine wave component, a residual component, and a spectrum shape obtained by analyzing sampling data of sounds produced by instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by the musical sound processing method and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments; an interpolation processing step of generating one interpolated spectrum shape by performing interpolation processing on the spectrum shapes corresponding to the analysis data, two at a time, using a spectrum shape transition function between the two spectrum shapes; and a musical sound signal generation step of generating and outputting a musical sound signal based on the interpolated spectrum shape.
[0030]
  The invention according to claim 12 is the musical sound processing method according to claim 11, wherein the interpolation processing step repeats a grouping process that divides the spectrum shapes into sets of two and the interpolation processing performed on each set, thereby generating one interpolated spectrum shape.
[0033]
  The invention according to claim 13 is the musical sound processing method according to claim 11, wherein the interpolation processing step divides the two spectrum shapes into a plurality of regions on the frequency axis and performs the interpolation processing, using a linear function as the transition function, on the sets of frequency and magnitude actually present on the two spectrum shapes in each region, over the plurality of regions.
[0034]
  The invention according to claim 14 is the musical sound processing method according to claim 13, wherein the interpolation processing step comprises: a frequency interpolation step of calculating an interpolated frequency by interpolating, using the linear function, between a first frequency, which is a frequency of one spectrum shape belonging to each region, and a second frequency, which is the frequency of the other spectrum shape corresponding to the first frequency; and a magnitude interpolation step of interpolating, using the linear function, between a first magnitude, which is a magnitude of the one spectrum shape belonging to each region, and a second magnitude, which is the magnitude of the other spectrum shape corresponding to the first magnitude.
[0035]
  The invention according to claim 15 is the musical sound processing method according to claim 11, wherein the interpolation information includes overall interpolation information for interpolating the synthesis ratio between the instruments, state interpolation information for interpolating the sound generation state, pitch interpolation information for interpolating the pitch, and amplitude interpolation information for interpolating the amplitude.
[0037]
  The invention according to claim 16 comprises: a voice input step of inputting a voice; a frequency analysis step of extracting a sine wave component and a residual component by frequency-analyzing, in units of frames, the input voice supplied in the voice input step; an analysis data extraction step of extracting the analysis data corresponding to a plurality of instruments from analysis data that include a sine wave component, a residual component, and a spectrum shape obtained by analyzing sampling data of sounds produced by instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by the musical sound processing method and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments; an interpolation processing step of generating one interpolated spectrum shape by performing interpolation processing on the spectrum shapes corresponding to the analysis data, two at a time, using a spectrum shape transition function between the two spectrum shapes; a morphing step of morphing the one interpolated spectrum shape generated in the interpolation processing step with the sine wave component and the residual component extracted in the frequency analysis step, according to a predetermined morphing degree; and a musical sound signal generation step of generating and outputting a musical sound signal by performing an inverse Fourier transform on the new frequency component output in the morphing step, wherein the analysis data extraction step extracts the analysis data corresponding to three or more instruments based on pronunciation information included in the sine wave component and on interpolation information input from the outside.
[0038]
  The invention according to claim 17 is the musical sound processing method according to claim 16, further comprising a morphing setting step of setting the predetermined morphing degree, wherein the morphing step morphs the interpolated spectrum shape, the voice sine wave component, and the voice residual component according to the morphing degree set in the morphing setting step.
[0040]
  The invention according to claim 18 is the musical sound processing method according to claim 16, wherein the interpolation processing step comprises a pitch shift step of shifting the pitch of the sine wave component according to a pitch determined from a predetermined ratio corresponding to the pitch of the input voice, and a residual component adding step of adding a residual component to the sine wave component whose pitch has been shifted in the pitch shift step.
[0041]
DETAILED DESCRIPTION OF THE INVENTION
Next, preferred embodiments of the present invention will be described with reference to the drawings.
[1] First embodiment
[1.1] Overall configuration of musical tone processing apparatus
FIG. 1 shows the overall configuration of a musical tone processing apparatus according to the first embodiment of the present invention.
The musical sound processing apparatus 1 comprises: an analysis data storage unit 12 that stores in advance the analysis data obtained by analyzing sampled instrument sound waveforms with SMS analysis, which will be described later; an analysis data extraction unit 11 that extracts analysis data stored in the analysis data storage unit 12 based on a pronunciation information signal input from the outside and an interpolation information signal generated from interpolation information set by the user; an interpolation processing unit 13 that generates a new frequency component by performing interpolation based on the extracted analysis data; a frequency synthesis processing unit 14 that performs an inverse fast Fourier transform (IFFT) and overlap processing on the generated frequency component; an envelope processing unit 15 that adds an envelope to the frequency signal generated by the frequency synthesis processing unit 14; and an output unit 16 that outputs the output musical sound signal generated by the envelope processing unit 15.
[0042]
Here, the pronunciation information in the present embodiment refers to information including the name and length of a sound, such as "eighth-note do". Accordingly, data such as MIDInote, which specifies the octave and the name of the sound, and MIDIst, which specifies the note type, are also included in the pronunciation information.
[0043]
Next, the interpolation information described above will be described in detail.
Interpolation information in this embodiment refers to multidimensional interpolation information including various interpolation rates that can be set for each instrument.
The multidimensional interpolation information includes an overall multidimensional interpolation rate, a state multidimensional interpolation rate in each sound generation state, which will be described later, a pitch multidimensional interpolation rate, and an amplitude multidimensional interpolation rate.
The overall multidimensional interpolation rate indicates the synthesis ratio of an instrument with respect to the entire musical sound signal that is output when the instruments are synthesized. It can be set between "0" and "1" for each instrument.
Specifically, for example, when synthesizing guitar, piano, and flute sounds with the guitar timbre at 50%, the piano timbre at 30%, and the flute timbre at 20% of the overall timbre, the overall multidimensional interpolation rates of the guitar, piano, and flute are set to "0.5", "0.3", and "0.2", respectively.
The overall multidimensional interpolation rates of the guitar, piano, and flute must sum to "1".
[0044]
The state multidimensional interpolation rate is divided, by sound generation state, into a rising state multidimensional interpolation rate, a steady state multidimensional interpolation rate, and an open state multidimensional interpolation rate, which are the interpolation rates corresponding to the rising state, the steady state, and the open state of a sound, respectively.
Each state multidimensional interpolation rate indicates the synthesis ratio of an instrument with respect to the entire output musical sound signal when the instruments are synthesized in that sound generation state. It can be set between "0" and "1" for each sound generation state of each instrument.
More specifically, for example, when synthesizing guitar, piano, and flute sounds in the rising state with the guitar timbre at 40%, the piano timbre at 50%, and the flute timbre at 10% of the overall timbre, the rising state multidimensional interpolation rates of the guitar, piano, and flute are set to "0.4", "0.5", and "0.1", respectively.
The rising state multidimensional interpolation rates of the guitar, piano, and flute must sum to "1".
[0045]
Here, the rising state can be set to represent situations such as a sound that starts abruptly with a strong attack, as in a cut-in, or a sound that starts weakly and grows gradually, as in a fade-in.
Similarly, the open state can be set to represent situations such as a sound that disappears abruptly, as in a cut-out, or a sound that ends while fading gradually, as in a fade-out.
[0046]
The pitch multidimensional interpolation rate interpolates the timbre by adjusting the pitch of an instrument. With "0" denoting output at the same pitch as the key-on designated pitch, it can be set arbitrarily between "-1" and "1" for each instrument.
More specifically, for example, when synthesizing guitar, piano, and flute sounds with the guitar pitch raised, the piano pitch equal to the key-on designated pitch, and the flute pitch lowered, the pitch multidimensional interpolation rate of the guitar is set between "0" and "1", that of the piano is set to "0", and that of the flute is set between "0" and "-1".
[0047]
The amplitude multidimensional interpolation rate interpolates the timbre by adjusting the amplitude of an instrument. It can be set arbitrarily between "-1" and "1" for each instrument.
Specifically, for example, when synthesizing guitar, piano, and flute sounds with the guitar amplitude raised, the piano amplitude equal to the key-on designated amplitude, and the flute amplitude lowered, the amplitude multidimensional interpolation rate of the guitar is set between "0" and "1", that of the piano is set to "0", and that of the flute is set between "0" and "-1".
[0048]
Next, a method for setting the above-described multidimensional interpolation information for each instrument will be specifically described.
First, suppose for example that the multidimensional interpolation information is represented as (overall, rising state, steady state, open state, pitch, amplitude). When the timbres of the guitar, piano, and flute are interpolated so that the guitar timbre is somewhat emphasized, the multidimensional interpolation information of the guitar is set to (0.4, 0, 0, 0, 0, 0), that of the piano to (0.3, 0, 0, 0, 0, 0), and that of the flute to (0.3, 0, 0, 0, 0, 0).
In this case, the overall multidimensional interpolation rates of the instruments sum to "1".
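The six-element tuple above can be sketched as a small data structure. This is only an illustration: the class name `InterpolationInfo`, its field names, and the helper `overall_rates_are_valid` are hypothetical and do not appear in the specification. The sketch checks only the ranges stated in the text and the rule that the overall rates sum to 1.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class InterpolationInfo:
    """Multidimensional interpolation information for one instrument.

    Field order follows the text:
    (overall, rising state, steady state, open state, pitch, amplitude).
    """
    overall: float
    rising: float
    steady: float
    open_: float
    pitch: float
    amplitude: float

    def __post_init__(self):
        # Overall and state rates lie in [0, 1]; pitch and amplitude in [-1, 1].
        for name in ("overall", "rising", "steady", "open_"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        for name in ("pitch", "amplitude"):
            if not -1.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between -1 and 1")


def overall_rates_are_valid(infos):
    """Return True when the overall rates of all instruments sum to 1."""
    return isclose(sum(info.overall for info in infos.values()), 1.0)


# The guitar-emphasized setting from the text.
settings = {
    "guitar": InterpolationInfo(0.4, 0, 0, 0, 0, 0),
    "piano": InterpolationInfo(0.3, 0, 0, 0, 0, 0),
    "flute": InterpolationInfo(0.3, 0, 0, 0, 0, 0),
}
```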
[0049]
Next, in the above example, when interpolation is performed so that the guitar timbre, with its strong attack, is output more strongly at the onset of the sound, the piano timbre more strongly in the steady state, and the flute timbre more strongly in the open state, the multidimensional interpolation information of the guitar is set to (0.4, 1, 0, 0, 0, 1), that of the piano to (0.3, 0, 1, 0, 0, 0), and that of the flute to (0.3, 0, 0, 1, 0, 0).
In this case, the musical sound signal output in the rising state contains not only the guitar timbre but also the piano and flute timbres. This is because each state multidimensional interpolation rate is applied in proportion to the overall multidimensional interpolation rate.
[0050]
[1.2] Configuration of each part of musical tone processing apparatus
[1.2.1] Analysis data storage unit
FIG. 2 shows the file structure of the analysis data storage unit 12.
As shown in FIG. 2, for each sound generation state of each instrument, the analysis data storage unit 12 divides the amplitude into divisions of a fixed size and the pitch into divisions of a fixed size, and stores the data in analysis data tables indexed by amplitude division and pitch division.
In addition, formant shape data is stored in a region that is an intersection of the amplitude division and the pitch division.
In addition, a residual component is also stored in this area as necessary. Note that the formant shape and the residual component are obtained by SMS analysis described later.
[0051]
Here, the analysis data table shown in FIG. 2 is one in which the instrument is a saxophone, the sound generation state is the rising state, and formant shapes with a strong attack are stored.
In addition, the analysis data storage unit 12 also stores, for example, an analysis data table in which the instrument is a saxophone, the sound generation state is the rising state, and formant shapes with a weak attack are stored, and an analysis data table in which the instrument is a guitar, the sound generation state is the steady state, and formant shapes with a strong attack are stored.
As described above, the analysis data tables store the formant shape and the residual component separately for each instrument and for each state (sound generation state) that differs in sounding strength.
[0052]
In addition, the standard for the size of pitch division is changed according to the characteristics of the instrument.
For example, for an instrument in which a change in pitch does not greatly affect the formant, the number of pitch divisions is reduced. For an instrument such as a piano, in which a change in pitch greatly affects the formant, the pitch divisions are made finer and their number is increased.
The size of the amplitude divisions is determined in the same manner as the size of the pitch divisions.
In the analysis data table shown in FIG. 2, the pitch, on the vertical axis, is divided every 1 [octave], and the amplitude, on the horizontal axis, is divided every 10 [dB].
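The table lookup described above can be sketched as follows, assuming the FIG. 2 division sizes of 1 octave and 10 dB. The dictionary layout, the function names, and the placeholder strings standing in for stored formant shapes are all hypothetical, and pitch is reduced to an octave number for simplicity.

```python
import math

OCTAVE_STEP = 1            # pitch division: one octave per row (as in FIG. 2)
AMPLITUDE_STEP_DB = 10.0   # amplitude division: 10 dB per column (as in FIG. 2)


def table_cell(octave, amplitude_db):
    """Map a pitch (as an octave number) and an amplitude to a table cell."""
    pitch_division = int(math.floor(octave / OCTAVE_STEP))
    amplitude_division = int(math.floor(amplitude_db / AMPLITUDE_STEP_DB))
    return pitch_division, amplitude_division


# Hypothetical layout: (instrument, state) -> {cell: stored formant shape}.
analysis_data = {
    ("piano", "rising"): {
        (2, 0): "shape for octave 2, 0 to 10 dB",
        (2, 1): "shape for octave 2, 10 to 20 dB",
    },
}


def lookup_formant(instrument, state, octave, amplitude_db):
    """Fetch the formant shape stored at the intersection of both divisions."""
    table = analysis_data[(instrument, state)]
    return table[table_cell(octave, amplitude_db)]
```

With these divisions, amplitudes of 6 dB and 15 dB in the same octave fall into adjacent cells of the same row.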
[0053]
Here, the SMS analysis will be described with reference to FIG.
In SMS analysis, a frame is first cut out by multiplying the sampled sound waveform by a window function, and a sine wave component and a residual component are then extracted from the frequency spectrum obtained by applying a fast Fourier transform (FFT) to that frame.
Here, the sine wave component consists of the component at the fundamental frequency (pitch) and the components at frequencies that are multiples of the fundamental frequency (overtones). In this embodiment, the fundamental frequency is stored in the analysis data storage unit 12 as the pitch, the average amplitude of the components as the amplitude, and the spectrum envelope as the formant shape.
The residual component is what remains after the sine wave component is removed from the above frequency spectrum. In this embodiment, the residual component is stored in the analysis data storage unit 12 as frequency domain data, as shown in FIG. The residual component is particularly useful when generating a rising sound.
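The windowing, transform, and separation steps can be illustrated with a deliberately simplified sketch. It is not the SMS analysis of the specification: a naive DFT stands in for the FFT, sinusoidal components are taken to be local maxima of the magnitude spectrum, and the residual is formed by zeroing each peak bin and its two neighbours. The function names and the `threshold` parameter are assumptions.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_samples)
            for n in range(n_samples)]


def dft(frame):
    """Naive discrete Fourier transform (stands in for an FFT)."""
    n_samples = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
            for k in range(n_samples)]


def analyze_frame(samples, threshold=0.1):
    """Split one frame into sinusoidal peak bins and a residual spectrum."""
    window = hann(len(samples))
    spectrum = dft([s * w for s, w in zip(samples, window)])
    half = len(samples) // 2
    mags = [abs(c) for c in spectrum[:half]]
    floor = threshold * max(mags)
    # Sinusoidal components: local maxima of the magnitude spectrum.
    peaks = [k for k in range(1, half - 1)
             if mags[k] > floor
             and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]
    # Residual: the spectrum with each peak and its neighbouring bins removed.
    residual = list(spectrum[:half])
    for k in peaks:
        for j in (k - 1, k, k + 1):
            residual[j] = 0j
    return peaks, mags, residual


# A pure tone centred on bin 8 of a 64-sample frame.
N = 64
tone = [math.sin(2 * math.pi * 8 * n / N) for n in range(N)]
peaks, mags, residual = analyze_frame(tone)
```

For this test tone the only detected peak is bin 8, and the residual is numerically zero because the tone has no noise component.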
[0054]
[1.2.2] Interpolation processing unit
The interpolation processing unit 13 temporarily stores, in an interpolation buffer provided for each instrument, the formant shapes extracted for each instrument based on the pronunciation information signal and the interpolation information signal. It then generates an interpolated spectrum shape, which is a new frequency component, by interpolating the formant shapes of the instruments stored in the interpolation buffers using the spectral interpolation method described later.
[0055]
[1.3] Operation of the first embodiment
Next, operation examples of the musical tone processing apparatus 1 will be described in order.
First, when a pronunciation information signal such as MIDInote is input from outside the musical tone processing apparatus 1, the analysis data extraction unit 11 divides the input pronunciation information signal into pronunciation information signals in frame units. Specifically, for example, when a pronunciation information signal indicating "eighth note, pitch in the second octave, amplitude of 6 [dB]" is input, the analysis data extraction unit 11 divides the pronunciation information signal into frames of 5 [ms] each. Each frame stores the data "eighth note, pitch in the second octave, amplitude of 6 [dB]" together with data indicating the sound generation state in that frame.
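The frame division can be sketched as below. The specification gives only the 5 ms frame length; the note duration in milliseconds, the dictionary keys, and the function name are assumptions, and the sound generation state is left empty because it is determined in the following step.

```python
FRAME_MS = 5  # frame length given in the text


def split_into_frames(pitch, amplitude_db, duration_ms):
    """Divide one note into 5 ms frames that each carry the note's data."""
    n_frames = duration_ms // FRAME_MS
    return [{"index": i, "pitch": pitch, "amplitude_db": amplitude_db,
             "state": None}  # the state is decided in a later step
            for i in range(n_frames)]


# Assumption: at 120 quarter notes per minute an eighth note lasts 250 ms.
frames = split_into_frames("second octave", 6, 250)
```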
[0056]
Next, the analysis data extraction unit 11 determines whether the sound generation state of each frame of the divided pronunciation information signal is the rising state, the steady state, or the open state.
Then, the analysis data extraction unit 11 searches the analysis data storage unit 12 for the formant shape stored for each instrument and each state, based on the pitch and amplitude of the pronunciation information signal as adjusted by the pitch multidimensional interpolation rate and the amplitude multidimensional interpolation rate included in the externally input interpolation information signal, and temporarily stores the analysis data found by the search in the interpolation buffer provided for each instrument.
[0057]
Here, the extraction process by the analysis data extraction unit 11 will be specifically described with reference to FIG. FIG. 9 shows an analysis data table when the instrument is a piano and the sounding state is a rising state.
For example, suppose that the pronunciation information signal of the first frame indicates "eighth note, pitch in the second octave, amplitude of 6 [dB], rising state" and that the piano interpolation information signal is (0.3, 0, 1, 0, 0, 0). Because the pitch multidimensional interpolation rate and the amplitude multidimensional interpolation rate included in the interpolation information signal are both "0", the analysis data extraction unit 11 extracts the piano formant shape 9a whose pitch is the "do of the second octave" given in the pronunciation information signal and whose amplitude is "6 [dB]". The formant shape 9a is then stored in the area of the piano interpolation buffer corresponding to the first frame.
Next, the analysis data extraction unit 11 reads the pronunciation information signal of the second frame. Suppose that it indicates "eighth note, pitch in the second octave, amplitude of 15 [dB], rising state" and that the piano interpolation information signal is (0.3, 0, 1, 0, 0, 0). Because the pitch multidimensional interpolation rate and the amplitude multidimensional interpolation rate are again both "0", the piano formant shape 9b whose pitch is the "do of the second octave" given in the pronunciation information signal and whose amplitude is "15 [dB]" is extracted. The formant shape 9b is then stored in the area of the piano interpolation buffer corresponding to the second frame.
[0058]
The interpolation processing unit 13 performs interpolation processing of three or more instruments included in the interpolation information signal by a spectrum interpolation method using anchor points, which will be described below.
[0059]
First, the purpose of using spectral interpolation is roughly divided into the following two.
(1) Interpolate the spectrum shape of two temporally continuous frames to obtain the spectrum shape of a frame between the two frames in time.
(2) Interpolate two different sound spectrum shapes to obtain an intermediate sound spectrum shape.
[0060]
As shown in FIG. 4 (a), the two spectrum shapes to be interpolated (hereinafter referred to, for convenience, as the first spectrum shape SS1 and the second spectrum shape SS2) are each divided on the frequency axis into a plurality of regions Z1, Z2, ....
The boundary frequencies that separate the regions are set for each spectrum shape as follows. Each boundary frequency set in this way is called an anchor point.
First spectrum shape SS1: $RB_{1,1}, RB_{2,1}, \ldots, RB_{N,1}$
Second spectrum shape SS2: $RB_{1,2}, RB_{2,2}, \ldots, RB_{M,2}$
[0061]
FIG. 4B shows an explanatory diagram of linear spectrum interpolation.
Linear spectral interpolation is defined by the interpolation rate, and the interpolation rate X ranges from 0 to 1. In this case, the interpolation rate X = 0 corresponds to the first spectrum shape SS1 itself, and the interpolation rate X = 1 corresponds to the second spectrum shape SS2 itself.
[0062]
FIG. 4B shows the case where the interpolation rate X = 0.35.
In FIG. 4B, white circles (◯) on the vertical axis indicate each of the frequency and magnitude sets that constitute the spectrum shape. Therefore, it is appropriate to consider that the magnitude axis exists in the direction perpendicular to the paper surface.
On the axis of interpolation rate X = 0, the anchor points bounding a region of interest $Z_i$ of the first spectrum shape SS1 are $RB_{i,1}$ and $RB_{i+1,1}$.
One particular frequency and magnitude set belonging to the region $Z_i$ is assumed to have frequency $f_{i1}$ and magnitude $S_1(f_{i1})$.
[0063]
On the axis of interpolation rate X = 1, the anchor points bounding the region of interest $Z_i$ of the second spectrum shape SS2 are $RB_{i,2}$ and $RB_{i+1,2}$.
One particular frequency and magnitude set belonging to the region $Z_i$ is assumed to have frequency $f_{i2}$ and magnitude $S_2(f_{i2})$.
Here, the spectrum transition functions $f_{trans1}(x)$ and $f_{trans2}(x)$ are obtained.
[0064]
  For example, if these are expressed by the simplest linear functions, the following is obtained.
        $f_{trans1}(x) = m_1 \cdot x + b_1$
        $f_{trans2}(x) = m_2 \cdot x + b_2$
  where
        $m_1 = RB_{i,2} - RB_{i,1}$, $b_1 = RB_{i,1}$
        $m_2 = RB_{i+1,2} - RB_{i+1,1}$, $b_2 = RB_{i+1,1}$.
  Next, a set of frequency and magnitude on the interpolated spectrum shape corresponding to the set of frequency and magnitude existing on the first spectrum shape SS1 is obtained.
[0065]
First, for a frequency and magnitude set existing on the first spectrum shape SS1, specifically frequency $f_{i1}$ and magnitude $S_1(f_{i1})$, the corresponding frequency $f_{i1,2}$ and magnitude $S_2(f_{i1,2})$ on the second spectrum shape are calculated as follows.
[Expression 1]

where
$W_1 = RB_{i+1,1} - RB_{i,1}$
$W_2 = RB_{i+1,2} - RB_{i,2}$.
To calculate the magnitude $S_2(f_{i1,2})$, the frequencies closest to $f_{i1,2}$ among the frequency and magnitude sets existing on the second spectrum shape SS2 are denoted with the suffixes (+) and (-), which gives the following.
[Expression 2]

[0066]
From the above, for an interpolation rate $x$, the frequency $f_{i1,x}$ and the magnitude $S_x(f_{i1,x})$ on the interpolated spectrum shape corresponding to a frequency and magnitude set actually existing on the first spectrum shape SS1 are obtained as follows.
[Equation 3]

$S_x(f_{i1,x}) = S_1(f_{i1}) + \{S_2(f_{i1,2}) - S_1(f_{i1})\} \cdot x$
The same calculation is performed for all frequency and magnitude sets on the first spectrum shape SS1.
Subsequently, a set of frequency and magnitude on the interpolated spectrum shape corresponding to the set of frequency and magnitude actually existing on the second spectrum shape SS2 is obtained.
[0067]
First, for a frequency and magnitude set existing on the second spectrum shape SS2, specifically frequency $f_{i2}$ and magnitude $S_2(f_{i2})$, the corresponding frequency $f_{i2,1}$ and magnitude $S_1(f_{i2,1})$ on the first spectrum shape are calculated as follows.
[Expression 4]

where
$W_1 = RB_{i+1,1} - RB_{i,1}$
$W_2 = RB_{i+1,2} - RB_{i,2}$.
To calculate the magnitude $S_1(f_{i2,1})$, the frequencies closest to $f_{i2,1}$ among the frequency and magnitude sets existing on the first spectrum shape SS1 are denoted with the suffixes (+) and (-), which gives the following.
[Equation 5]

[0068]
From the above, for an interpolation rate $x$, the frequency $f_{i2,x}$ and the magnitude $S_x(f_{i2,x})$ on the interpolated spectrum shape corresponding to a frequency and magnitude set existing on the second spectrum shape SS2 are obtained as follows.
[Formula 6]

$S_x(f_{i2,x}) = S_2(f_{i2}) + \{S_2(f_{i2}) - S_1(f_{i2,1})\} \cdot (x - 1)$
The same calculation is performed for all frequency and magnitude sets on the second spectrum shape SS2.
[0069]
As described above, the interpolated spectrum shape is obtained by sorting, in order of frequency, all of the calculation results: the frequency $f_{i1,x}$ and magnitude $S_x(f_{i1,x})$ on the interpolated spectrum shape corresponding to each set of frequency $f_{i1}$ and magnitude $S_1(f_{i1})$ existing on the first spectrum shape SS1, and the frequency $f_{i2,x}$ and magnitude $S_x(f_{i2,x})$ on the interpolated spectrum shape corresponding to each set of frequency $f_{i2}$ and magnitude $S_2(f_{i2})$ existing on the second spectrum shape SS2.
[0070]
These calculations are performed for all the regions Z1, Z2, ..., so that the interpolated spectrum shape is calculated over the entire frequency band.
In the example of spectral interpolation using anchor points described above, the spectrum transition functions ftrans1 (x) and ftrans2 (x) are linear functions. However, they may also be defined as nonlinear functions such as quadratic or exponential functions, or the corresponding changes may be prepared as a table.
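The per-region procedure can be sketched in code. The magnitude formulas and the linear transition functions follow the text, but the expressions for the mapped and interpolated frequencies are not reproduced in this text, so the sketch assumes that a frequency keeps its relative position inside the region, both when it is mapped onto the other shape and when the region boundaries move to ftrans1(x) and ftrans2(x). All function and variable names are hypothetical.

```python
def transition(rb_a, rb_b, x):
    """Linear transition function f(x) = m*x + b, m = rb_b - rb_a, b = rb_a."""
    return (rb_b - rb_a) * x + rb_a


def magnitude_at(shape, freq):
    """Linearly interpolate a shape's magnitude at an arbitrary frequency.

    shape is a list of (frequency, magnitude) pairs sorted by frequency.
    """
    for (f_lo, s_lo), (f_hi, s_hi) in zip(shape, shape[1:]):
        if f_lo <= freq <= f_hi:
            if f_hi == f_lo:
                return s_lo
            return s_lo + (s_hi - s_lo) * (freq - f_lo) / (f_hi - f_lo)
    # Outside the sampled range: clamp to the nearest end point.
    return shape[0][1] if freq < shape[0][0] else shape[-1][1]


def interpolate_region(ss1, ss2, region1, region2, x):
    """Interpolate the pairs of SS1 and SS2 that fall inside one region."""
    (lo1, hi1), (lo2, hi2) = region1, region2
    w1, w2 = hi1 - lo1, hi2 - lo2
    lo_x, hi_x = transition(lo1, lo2, x), transition(hi1, hi2, x)
    out = []
    for f1, s1 in ss1:
        if lo1 <= f1 < hi1:
            pos = (f1 - lo1) / w1          # relative position in the region
            s12 = magnitude_at(ss2, lo2 + pos * w2)   # assumed mapping to SS2
            out.append((lo_x + pos * (hi_x - lo_x), s1 + (s12 - s1) * x))
    for f2, s2 in ss2:
        if lo2 <= f2 < hi2:
            pos = (f2 - lo2) / w2
            s21 = magnitude_at(ss1, lo1 + pos * w1)   # assumed mapping to SS1
            out.append((lo_x + pos * (hi_x - lo_x),
                        s2 + (s2 - s21) * (x - 1)))
    # Arrange all results in order of frequency.
    return sorted(out)


ss1 = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.0)]
ss2 = [(200.0, 2.0), (400.0, 1.0), (600.0, 0.0)]
at_ss1 = interpolate_region(ss1, ss2, (100.0, 300.0), (200.0, 600.0), 0.0)
at_ss2 = interpolate_region(ss1, ss2, (100.0, 300.0), (200.0, 600.0), 1.0)
midway = interpolate_region(ss1, ss2, (100.0, 300.0), (200.0, 600.0), 0.5)
```

Under these assumptions, x = 0 reproduces the points of SS1 inside the region and x = 1 reproduces those of SS2, matching the stated meaning of the interpolation rate. Each region is treated as half-open so that an upper anchor point is handled by the next region.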
[0071]
Next, a method of interpolating three or more existing instruments by spectral interpolation using the anchor points described above will be specifically described.
First, the three or more instruments are divided into sets of two instruments each, and interpolation is performed on the spectrum shapes of the two instruments in each set to obtain an interpolated spectrum shape for each set.
Then, the interpolated spectrum shapes obtained in this way are again divided into sets of two spectrum shapes, and interpolation is performed on the two spectrum shapes in each set to obtain an interpolated spectrum shape for each set. This interpolation process is repeated until a single interpolated spectrum shape is finally obtained.
When the above interpolation processing is performed on n types of instruments, "n-1" interpolations are performed.
Note that a spectrum shape that is left over because it cannot be paired during grouping is not interpolated in the current step and may instead be interpolated in the next step.
[0072]
The above interpolation processing will be described in more detail with reference to FIG. 8. For example, when performing spectral interpolation for the three instruments guitar, piano, and flute, the first interpolation process pairs the guitar spectrum shape 8a with the piano spectrum shape 8b and performs spectral interpolation on the two spectrum shapes to obtain an interpolated spectrum shape 8d.
Next, the second interpolation process pairs the interpolated spectrum shape 8d with the flute spectrum shape 8c, which was not interpolated in the first interpolation process, and performs spectral interpolation on the two spectrum shapes to obtain an interpolated spectrum shape 8d, which is the single final interpolated spectrum shape.
As described above, when interpolation processing is performed on three instruments, interpolation is performed twice.
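The repeated grouping and pairwise interpolation can be sketched as follows. Two points are assumptions rather than statements of the specification: `blend` is a placeholder two-shape interpolation on equal-length magnitude lists (it stands in for the anchor-point method), and the interpolation rate of each pair is derived from the overall rates as x = wb / (wa + wb), with the pair carrying the combined weight forward.

```python
def blend(shape_a, shape_b, x):
    """Placeholder interpolation: x = 0 gives shape_a, x = 1 gives shape_b."""
    return [a + (b - a) * x for a, b in zip(shape_a, shape_b)]


def interpolate_all(shapes, weights):
    """Reduce n spectrum shapes to one by repeated pairwise interpolation.

    Returns the final shape and the number of two-shape interpolations.
    """
    items = list(zip(shapes, weights))
    count = 0
    while len(items) > 1:
        next_round = []
        # Group into pairs; an unpaired leftover waits for the next round.
        for i in range(0, len(items) - 1, 2):
            (sa, wa), (sb, wb) = items[i], items[i + 1]
            total = wa + wb
            x = wb / total if total else 0.5   # assumed rate for this pair
            next_round.append((blend(sa, sb, x), total))
            count += 1
        if len(items) % 2:
            next_round.append(items[-1])
        items = next_round
    return items[0][0], count


guitar, piano, flute = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
result, count = interpolate_all([guitar, piano, flute], [0.5, 0.3, 0.2])
```

With the 50% / 30% / 20% example from the text, the guitar and piano are interpolated first and the flute joins in the second round, so two interpolations are performed and each instrument contributes in proportion to its overall rate.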
[0073]
In addition, when the residual component is extracted by the analysis data extraction unit 11, the interpolation processing unit 13 performs the interpolation processing described below and adds it to the previously obtained interpolated spectrum shape. However, whether or not to perform residual component interpolation processing can be set in advance by the user, and the analysis data extraction unit 11 performs residual component interpolation processing in accordance with a preset setting value.
[0074]
Here, residual component interpolation processing will be described.
First, the interpolation processing unit 13 performs linear interpolation of the residual component extracted by the analysis data extracting unit 11 based on the interpolation information signal for each instrument, and calculates the interpolation residual component of each instrument unit. Generate. Next, the interpolation processing unit 13 interpolates interpolation residual components between a plurality of instruments in the same manner as when interpolating the above-described spectrum shape.
[0075]
In addition, the frequency synthesis processing unit 14 generates a new frequency signal by performing inverse Fourier transform and overlap processing on the new frequency component output from the interpolation processing unit 13.
Here, the overlap processing in the present embodiment refers to processing for smoothly correcting both ends of the frequency signal divided in units of frames so that distortion does not occur when the frames are joined.
Then, the envelope processing unit 15 generates an output tone signal by adding an envelope to the new frequency signal output by the frequency synthesis processing unit 14, and the output unit 16 outputs the output tone signal.
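The overlap processing can be pictured as a windowed overlap-add. The sketch below is an assumption about one common way to realize it, not the method of the embodiment: each time-domain frame (as would be obtained after the inverse Fourier transform) is tapered at both ends by a Hann window and summed with its neighbours at a hop of half the frame length. The function names and the frame values are hypothetical.

```python
import math
from typing import List, Sequence


def hann_window(length: int) -> List[float]:
    """Periodic Hann window; copies shifted by length / 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def overlap_add(frames: Sequence[Sequence[float]], hop: int) -> List[float]:
    """Taper both ends of every frame and sum the frames at hop-sample spacing."""
    frame_len = len(frames[0])
    window = hann_window(frame_len)
    output = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        if len(frame) != frame_len:
            raise ValueError("all frames must have the same length")
        start = index * hop
        for i, sample in enumerate(frame):
            output[start + i] += window[i] * sample
    return output


# Three hypothetical frames of constant level; a seamless join keeps the level flat.
frames = [[1.0] * 8 for _ in range(3)]
joined = overlap_add(frames, hop=4)
```

In the region where two frames overlap, the tapered ends add back up to the original level, so no step appears at the frame boundaries.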
[0076]
[1.4] Effects of the first embodiment
According to the above-described embodiment, the interpolation processing unit 13 can combine the spectrum shapes of a plurality of instruments and interpolate to finally make one interpolated spectrum shape. Therefore, it is possible to perform interpolation processing of pronunciation information generated from a plurality of instruments.
[0077]
[1.5] Modification of the first embodiment
[1.5.1] First modification
In the first embodiment described above, all formant shapes stored in the analysis data storage unit are created from musical sound waveforms sampled from the instrument. Alternatively, only the characteristic sounds of the instrument may be sampled, and formant shapes that are not created directly from a sampled musical sound waveform may be created by interpolating formant shapes that were created from sampled musical sound waveforms.
[0078]
In addition, when only the characteristic sounds of the instrument are sampled, a formant shape that is not created directly from a sampled musical sound waveform may be generated by the interpolation processing unit in real time, by interpolating formant shapes created from sampled musical sound waveforms.
[0079]
[1.5.2] Second modification
Further, in the first embodiment described above, interpolation processing and the like are performed based on the interpolation information set by the user. However, a timbre generated from MIDI data may instead be selected, interpolation information may be generated based on the MIDI data that generates that timbre, and interpolation processing and the like may be performed based on that interpolation information.
FIG. 5 shows a schematic block diagram of a musical tone processing apparatus 1' according to this modification.
The apparatus of FIG. 5 differs from the first embodiment of FIG. 1 in that, in addition to the components of the first embodiment, it includes a MIDI tone color selection unit 41, a MIDI interpolation information storage unit 42, a change input unit 43, and an interpolation information generation unit 44.
[0080]
More specifically, in addition to the components provided in the musical tone processing apparatus 1, the musical tone processing apparatus 1' includes: a MIDI tone color selection unit 41 that, when a preset timbre generated by MIDI data is selected by the user, issues a MIDI message corresponding to that timbre; a MIDI interpolation information storage unit 42 that stores the MIDI interpolation information that was set for the MIDI data when the timbre was created; a change input unit 43 for inputting change data when changing the ratio of each tone generation state (rising state, steady state, open state) of the preset timbre or the envelope information of the timbre; and an interpolation information generation unit 44 that generates the interpolation information by extracting the MIDI interpolation information stored in the MIDI interpolation information storage unit 42 based on the MIDI message issued from the MIDI tone color selection unit 41, and that changes the interpolation information in accordance with the change data input from the change input unit 43.
[0081]
Here, various associations set between the MIDI message and the interpolation information will be described.
First, in the Program Change message of the MIDI message, predetermined interpolation information is associated with the preset timbre, and this correspondence is stored in the MIDI interpolation information storage unit 42.
Thus, when a preset tone color is selected from the MIDI tone color selection unit 41, a Program Change message for the selected tone color is issued, and the interpolation information generation unit 44 generates the interpolation information by extracting the interpolation information corresponding to that message from the MIDI interpolation information storage unit 42.
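The Program Change lookup can be sketched as a table keyed by program number. In the Python below, the table contents, the representation of interpolation information as one rate per instrument, and the names `MIDI_INTERPOLATION_STORE`, `parse_program_change`, and `generate_interpolation_info` are all hypothetical; only the message layout (status byte 0xCn followed by one data byte) comes from the MIDI 1.0 format.

```python
from typing import Dict

# Hypothetical contents of the MIDI interpolation information storage unit 42:
# program number -> interpolation rate per instrument.
MIDI_INTERPOLATION_STORE: Dict[int, Dict[str, float]] = {
    0: {"guitar": 0.5, "piano": 0.3, "flute": 0.2},
    1: {"guitar": 0.0, "piano": 0.5, "flute": 0.5},
}


def parse_program_change(message: bytes) -> int:
    """Return the program number of a 2-byte Program Change message (status 0xCn)."""
    if len(message) != 2 or (message[0] & 0xF0) != 0xC0 or message[1] > 0x7F:
        raise ValueError("not a Program Change message")
    return message[1]


def generate_interpolation_info(message: bytes) -> Dict[str, float]:
    """Look up the interpolation information stored for the selected preset timbre."""
    program = parse_program_change(message)
    if program not in MIDI_INTERPOLATION_STORE:
        raise KeyError(f"no interpolation information stored for program {program}")
    return dict(MIDI_INTERPOLATION_STORE[program])  # copy so the store stays unchanged


info = generate_interpolation_info(bytes([0xC0, 1]))  # channel 1, program 1
```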
[0082]
Next, in the Control Change message of the MIDI message, parameter values that determine the ratio of each tone generation state (rising state, steady state, open state) of a preset tone color, parameter values that determine envelope information, and the like are set, and change data for changing these parameter values is received and applied.
As a result, when change data for changing the ratio of each tone generation state (rising state, steady state, open state) of the preset tone color or the envelope information of the tone color is input from the change input unit 43, a Control Change message for the preset tone color is issued. When the interpolation information generation unit 44 accepts that Control Change message, the ratio of each tone generation state and the envelope information of the tone color selected by the MIDI tone color selection unit 41 can be changed in real time.
[0083]
In the Control Change message of the MIDI message, the Control Change Number parameter is set in association with the parameter storing each interpolation rate included in the interpolation information.
Thus, when change data for changing each interpolation rate is input from the change input unit 43, a Control Change message is issued, and each interpolation rate for the tone color selected by the MIDI tone color selection unit 41 can be changed in real time.
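As an illustration of associating Control Change numbers with interpolation rates, the following Python sketch updates one rate per incoming message. The controller numbers 20 to 22, the scaling of the 7-bit value to the range 0.0 to 1.0, and the names `CONTROLLER_TO_PARAMETER` and `apply_control_change` are assumptions made for this example; the text does not specify them.

```python
from typing import Dict

# Hypothetical assignment of Control Change numbers to interpolation-rate parameters.
CONTROLLER_TO_PARAMETER: Dict[int, str] = {20: "guitar", 21: "piano", 22: "flute"}


def apply_control_change(message: bytes, info: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the interpolation info updated by a Control Change (status 0xBn)."""
    if len(message) != 3 or (message[0] & 0xF0) != 0xB0:
        raise ValueError("not a Control Change message")
    controller, value = message[1], message[2]
    if controller > 0x7F or value > 0x7F:
        raise ValueError("data bytes must be in the range 0..127")
    updated = dict(info)
    parameter = CONTROLLER_TO_PARAMETER.get(controller)
    if parameter is not None:
        updated[parameter] = value / 127.0  # scale 0..127 to 0.0..1.0
    return updated


current = {"guitar": 0.5, "piano": 0.3, "flute": 0.2}
changed = apply_control_change(bytes([0xB0, 21, 127]), current)  # piano rate to maximum
```

Messages for controller numbers that are not in the table leave the interpolation information unchanged.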
[0084]
Next, in the Note On / Off message of the MIDI message, the amplitude of the generated tone waveform is set in association with the Note Velocity parameter, and the pitch of the generated tone waveform is set in association with the Note Key parameter.
As a result, when the amplitude or pitch value of the generated tone waveform is input from the change input unit 43, a Note On / Off message is issued, and the amplitude or pitch of the output tone waveform can be changed in real time.
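A minimal Python sketch of this association follows. It assumes the conventional equal-tempered mapping in which note key 69 is A4 at 440 Hz, and a linear scaling of velocity to an amplitude between 0.0 and 1.0; neither mapping is stated in the text, and the function name is hypothetical.

```python
from typing import Tuple


def note_on_to_pitch_and_amplitude(message: bytes) -> Tuple[float, float]:
    """Map a Note On message (status 0x9n, key, velocity) to (pitch in Hz, amplitude)."""
    if len(message) != 3 or (message[0] & 0xF0) != 0x90:
        raise ValueError("not a Note On message")
    key, velocity = message[1], message[2]
    if key > 0x7F or velocity > 0x7F:
        raise ValueError("data bytes must be in the range 0..127")
    pitch_hz = 440.0 * 2.0 ** ((key - 69) / 12.0)  # equal temperament, A4 = 440 Hz
    amplitude = velocity / 127.0                   # assumed linear velocity scaling
    return pitch_hz, amplitude


pitch, amplitude = note_on_to_pitch_and_amplitude(bytes([0x90, 69, 127]))
```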
[0085]
Also, in the Note On / Off message of the MIDI message, the amplitude to be interpolated may be set in association with the Note Velocity parameter and the pitch to be interpolated may be set in association with the Note Key parameter. The formant shape may then be extracted by searching the amplitude divisions stored in the analysis data storage unit 12 based on the amplitude associated with the Note Velocity parameter, or by searching the pitch divisions stored in the analysis data storage unit 12 based on the pitch associated with the Note Key parameter.
As a result, a musical sound closer to the sound of the actual instrument can be generated.
[0086]
Next, in the Pitch Bend message of the MIDI message, the pitch of the generated musical tone waveform is set in association with the Pitch Bend parameter.
Thus, when the pitch value of the generated musical sound waveform is input from the change input unit 43, a Pitch Bend message is issued, and the pitch of the output musical sound signal can be changed in real time.
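The Pitch Bend association can be sketched as follows. The 14-bit value layout (least significant data byte first, centre value 8192) follows the MIDI 1.0 message format; the bend range of plus or minus two semitones and the function names are assumptions for this example.

```python
def pitch_bend_value(message: bytes) -> int:
    """Return the 14-bit value of a Pitch Bend message (status 0xEn, LSB, MSB)."""
    if len(message) != 3 or (message[0] & 0xF0) != 0xE0:
        raise ValueError("not a Pitch Bend message")
    lsb, msb = message[1], message[2]
    if lsb > 0x7F or msb > 0x7F:
        raise ValueError("data bytes must be in the range 0..127")
    return (msb << 7) | lsb


def bend_pitch(base_hz: float, bend_value: int,
               bend_range_semitones: float = 2.0) -> float:
    """Shift a pitch by the bend amount; 8192 is the centre (no bend)."""
    semitones = (bend_value - 8192) / 8192.0 * bend_range_semitones
    return base_hz * 2.0 ** (semitones / 12.0)


centre = pitch_bend_value(bytes([0xE0, 0x00, 0x40]))
unchanged = bend_pitch(440.0, centre)
lowest = bend_pitch(440.0, pitch_bend_value(bytes([0xE0, 0x00, 0x00])))
```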
[0087]
[2] Second embodiment
Next, a second embodiment of the present invention will be described. In the second embodiment, the musical sound processing device described in the first embodiment is applied to a karaoke device, which is configured to be capable of morphing an input vocal voice with a newly generated musical sound.
The second embodiment differs from the first embodiment as follows. The musical sound processing apparatus according to the first embodiment generates musical sounds by interpolating externally input pronunciation information based on interpolation information set by the user. In the second embodiment, a new musical tone signal is generated by morphing the voice information input by the singer with the musical tone information generated by the interpolation processing unit.
[0088]
[2.1] Outline configuration of musical tone processing apparatus
FIG. 6 shows a schematic block diagram of the musical tone processing apparatus according to the second embodiment.
In FIG. 6, the same parts as those of the first embodiment of FIG. 1 are denoted by the same reference numerals.
The musical sound processing apparatus 10 includes: an analysis data storage unit 12; an analysis data extraction unit 11; an interpolation processing unit 13; a singing signal input unit 31 that receives the voice of a singer and outputs a singing signal; an SMS analysis unit 32 that performs SMS analysis of the singing signal and outputs a spectrum shape (corresponding to a speech sine wave component), a residual component (corresponding to a speech residual component), a pitch, an amplitude, and the like; a morphing processing unit 33 that performs morphing processing based on the interpolated spectrum shape and the like output by the interpolation processing unit 13, the spectrum shape and the like of the singing signal output by the SMS analysis unit 32, and morphing information including composition ratio data input from the outside; a frequency synthesis processing unit 14 that performs inverse fast Fourier transform (IFFT) and overlap processing on the new frequency signal generated by the morphing processing; and an output unit 16.
[0089]
[2.2] Configuration of each part of musical tone processing device
[2.2.1] SMS analysis unit
The SMS analysis unit 32 extracts a frame by multiplying the singing signal input by the singing signal input unit 31 by a window function, and then performs a fast Fourier transform (FFT) on the extracted frame. A sine wave component and a residual component are extracted from the resulting frequency spectrum.
Then, the SMS analysis unit 32 outputs the sine wave component and the residual component to the morphing processing unit 33, and outputs data representing the note name and note length of the sound in the singing signal to the analysis data extraction unit 11 as pronunciation information.
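The first stage of this analysis (windowing a frame and transforming it) can be sketched in Python. The example assumes a Hann window and uses a naive discrete Fourier transform for self-containment, where a real implementation would use an FFT; the separation into sine wave and residual components is not shown. All function names and the test tone are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def windowed_frame(signal: Sequence[float], start: int, length: int) -> List[float]:
    """Cut one frame out of the signal and multiply it by a Hann window."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]
    return [signal[start + i] * window[i] for i in range(length)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT magnitudes for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def strongest_bin(spectrum: Sequence[float]) -> int:
    """Index of the largest spectral peak, a candidate sine wave component."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])


# Hypothetical test tone with exactly 4 cycles per 32-sample frame.
signal = [math.sin(2.0 * math.pi * 4 * i / 32) for i in range(64)]
frame = windowed_frame(signal, 0, 32)
peak = strongest_bin(magnitude_spectrum(frame))
```

For this test tone the strongest bin is bin 4, the bin that matches the tone's frequency.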
[0090]
[2.2.2] Morphing processing unit
The morphing processing unit 33 performs morphing processing on the interpolated spectrum shape output by the interpolation processing unit 13 and the spectrum shape of the singing signal output by the SMS analysis unit 32, based on the spectrum shape morphing degree included in the morphing information input from the outside, and generates a new frequency component having the desired spectrum shape, pitch, and amplitude corresponding to the morphing information.
Here, the morphing information includes, for example, the morphing degree used when morphing the spectrum shapes or residual components of the sounds to be morphed, and a designation of which sound's pitch and amplitude values the morphed result should match.
[0091]
For example, when performing morphing processing on two sounds, if the morphing degree at which only one sound is output is set to "0" and the morphing degree at which only the other sound is output is set to "1", then changing the morphing degree between "0" and "1" adjusts which of the two sounds the output resembles and how closely. Specifically, when morphing a guitar tone and a vocal sound, with the morphing degree at which only the guitar tone is output set to "0" and the morphing degree at which only the vocal sound is output set to "1", setting the morphing degree to "0.3" morphs the guitar tone and the vocal sound at a ratio of "7:3", and a musical sound closer to the guitar tone than to the vocal sound is output.
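The guitar and vocal example can be checked with a short Python sketch. It assumes the morphing degree acts as a linear weight on magnitudes sampled on a common frequency grid; the function name `morph` and the sample values are hypothetical.

```python
from typing import List, Sequence


def morph(shape_at_zero: Sequence[float], shape_at_one: Sequence[float],
          degree: float) -> List[float]:
    """Morph two shapes: degree 0 outputs only the first, degree 1 only the second."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("morphing degree must lie between 0 and 1")
    if len(shape_at_zero) != len(shape_at_one):
        raise ValueError("shapes must share the same frequency grid")
    return [(1.0 - degree) * a + degree * b
            for a, b in zip(shape_at_zero, shape_at_one)]


guitar_shape = [10.0, 0.0]  # hypothetical magnitudes of the guitar tone
vocal_shape = [0.0, 10.0]   # hypothetical magnitudes of the vocal sound
morphed = morph(guitar_shape, vocal_shape, 0.3)  # guitar : vocal = 7 : 3
```

With a degree of 0.3 the guitar shape is weighted by 0.7 and the vocal shape by 0.3, which is the 7:3 ratio described above.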
[0092]
[2.3] Operation of the second embodiment
Since the operation of the second embodiment is the same as that of the first embodiment except for the operation of the main part, only the operation of the main part will be described.
First, a signal input process is performed by the singing signal input unit 31, and an audio signal sung by the singer is input.
Next, the SMS analysis unit 32 performs SMS analysis on the audio signal input via the singing signal input unit 31, and extracts the spectrum shape, pitch, amplitude, and the like of the audio signal.
The analysis data extraction unit 11 searches the analysis data storage unit 12 on the basis of the interpolation information regarding three or more instruments set by the user and the pronunciation information output by the SMS analysis unit 32, extracts the analysis data, and outputs the extracted analysis data to the interpolation processing unit 13.
[0093]
The morphing processing unit 33 performs morphing processing between the interpolated spectrum shape and interpolated residual component output by the interpolation processing unit 13 and the spectrum shape and residual component of the singing signal extracted by the SMS analysis unit 32.
Here, the morphing processing unit 33 performs the morphing processing according to the composition ratio data included in the morphing information input from the outside.
Then, the frequency synthesis processing unit 14 performs inverse fast Fourier transform (IFFT) and overlap processing on the new frequency component output by the morphing processing unit 33.
[0094]
[2.4] Effects of the second embodiment
According to the embodiment described above, the morphing processing unit 33 can morph the vocal sound signal and the new musical sound signal generated based on the interpolation information signals for three or more instruments at an arbitrary ratio. Therefore, the vocal sound can be morphed with a new musical sound that has never existed before, and a fresher musical sound signal can be generated.
[0095]
[2.5] Modification of the second embodiment
In the second embodiment described above, the musical sound signal morphed by the morphing processing unit 33 is output. However, as shown in FIG. 7, the morphing processing unit 33 may be omitted. In that case, the interpolated spectrum shape and the like output by the interpolation processing unit 13 and the spectrum shape and the like of the singing signal extracted by the SMS analysis unit 32 are input separately to the frequency synthesis processing unit 14 for frequency synthesis processing, and the generated musical sound signal and audio signal are mixed at the output unit 16 and then output. In this case, the output unit 16 may include a mixer for mixing the musical sound signal and the audio signal.
Alternatively, the morphing processing unit 33 may be provided as in the second embodiment, and the musical sound signal generated by the frequency synthesis processing unit 14 from the interpolated spectrum shape and the like output by the interpolation processing unit 13 may be mixed at the output unit 16 with the musical tone signal generated by the frequency synthesis processing unit 14 after morphing by the morphing processing unit 33, and then output.
[0096]
Further, the interpolation processing unit 13 may change the pitch of the interpolation spectrum shape to a pitch calculated at a predetermined ratio with respect to the pitch of the pronunciation information input from the SMS analysis unit 32.
Further, the interpolated spectrum shapes before and after changing the pitch may be processed separately and output.
According to this modification, two-part harmony or harmony of three or more parts can be realized by the interpolated musical sound signal and the voice signal input by the singer.
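Deriving harmony pitches at a predetermined ratio to the sung pitch can be sketched as below. The specific ratios (5/4 and 3/2, a just-intonation major third and perfect fifth) are assumptions chosen for illustration; the text only states that a predetermined ratio is used. The function name is hypothetical.

```python
from typing import List, Sequence


def harmony_pitches(voice_pitch_hz: float, ratios: Sequence[float]) -> List[float]:
    """Pitch of each added harmony part, as a fixed ratio of the sung pitch."""
    if voice_pitch_hz <= 0.0 or any(ratio <= 0.0 for ratio in ratios):
        raise ValueError("pitch and ratios must be positive")
    return [voice_pitch_hz * ratio for ratio in ratios]


# Hypothetical ratios: major third (5/4) and perfect fifth (3/2) above the voice.
parts = harmony_pitches(440.0, [1.25, 1.5])
```

Together with the singer's own voice, one ratio yields two-part harmony and two ratios yield three-part harmony.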
[0097]
[3] Modification
[3.1] First modification
In each of the above-described embodiments, the multidimensional interpolation information need not be set by the user each time; it may be held as a preset tone color, or the user may edit (voice) it as appropriate. It may also be made changeable in real time by a controller.
[0098]
[3.2] Second modification
Moreover, each of the embodiments described above has been limited to interpolation information regarding three or more instruments. However, as in the conventional case, the present invention can also be applied to interpolation information regarding two instruments.
[0099]
[Effects of the invention]
As described above, according to the present invention, three or more spectrum shapes can be interpolated, and more new musical sounds can be generated.
In addition, since the spectrum shape is interpolated in accordance with the tone generation state, a more natural musical tone can be generated.
In addition, since the newly generated musical sound and voice can be synthesized, a fresher musical sound can be generated.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a musical sound processing apparatus according to a first embodiment.
FIG. 2 is a diagram showing a file structure of an analysis data storage unit in the musical sound processing apparatus.
FIG. 3 is a diagram for explaining SMS analysis performed in the musical sound processing apparatus.
FIG. 4 is a diagram for explaining a method of spectrum interpolation performed in the musical sound processing apparatus.
FIG. 5 is a block diagram showing a schematic configuration of a musical sound processing apparatus in a modification of the first embodiment.
FIG. 6 is a block diagram showing a schematic configuration of a musical sound processing apparatus according to a second embodiment.
FIG. 7 is a block diagram showing a schematic configuration of a musical sound processing apparatus in a modification of the second embodiment.
FIG. 8 is a diagram for explaining a specific example of spectrum interpolation performed in the musical sound processing apparatus.
FIG. 9 is a diagram illustrating a specific example of an analysis data storage unit in the musical sound processing apparatus.
[Explanation of symbols]
1, 1', 10, 10' ... musical sound processing apparatus, 11 ... analysis data extraction unit, 12 ... analysis data storage unit, 13 ... interpolation processing unit, 14 ... frequency synthesis processing unit, 15 ... envelope processing unit, 16 ... output unit, 31 ... singing signal input unit, 32 ... SMS analysis unit, 33 ... morphing processing unit, 41 ... MIDI tone color selection unit, 42 ... MIDI interpolation information storage unit, 43 ... change input unit, 44 ... interpolation information generation unit

Claims

A musical sound processing apparatus comprising:
Analysis data storage means for storing analysis data including a sine wave component, a residual component, and a spectrum shape obtained by analyzing sampling data of sound produced by an instrument;
Analysis data extraction means for extracting, from the analysis data storage means, the analysis data corresponding to each of a plurality of instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by this musical sound processing apparatus and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments;
Interpolation processing means for, when performing interpolation processing using two at a time of the plurality of spectrum shapes corresponding to the extracted analysis data, performing the interpolation processing using a spectrum shape transition function between the two spectrum shapes, and generating one interpolated spectrum shape; and
Musical tone signal generating means for generating and outputting a musical tone signal based on the interpolated spectrum shape.

The musical sound processing device according to claim 1,
The analysis data storage means stores, for each instrument, a plurality of the sine wave components and the residual components corresponding to each sounding state, the sounding state representing whether a signal indicates a rising state, a steady state, or an open state,
The analysis data extraction means determines the sounding state of the pronunciation information by analyzing the input sounding state, and extracts a sine wave component and a residual component according to the determined sounding state.

The musical sound processing device according to claim 1,
The interpolation processing means generates one interpolated spectrum shape by repeatedly performing a grouping process that divides the spectrum shapes into pairs and the interpolation process performed for each pair.

The musical sound processing device according to claim 1,
The transition function is defined in advance as a linear function or a nonlinear function.

The musical sound processing device according to claim 1,
The interpolation processing means divides the two spectrum shapes into a plurality of regions on the frequency axis and performs, over the plurality of regions, the interpolation processing using a linear function as the transition function on the pairs of actual frequency and magnitude on the two spectrum shapes belonging to each region.

The musical sound processing apparatus according to claim 5, wherein the interpolation processing means comprises:
Frequency interpolation means for calculating an interpolation frequency by interpolating, using the linear function, a first frequency that is a frequency of one spectrum shape belonging to each of the regions and a second frequency that is the frequency of the other spectrum shape corresponding to the first frequency; and
Magnitude interpolation means for interpolating, using the linear function, a first magnitude that is a magnitude of one of the spectrum shapes belonging to each region and a second magnitude that is the magnitude of the other spectrum shape corresponding to the first magnitude.

The musical sound processing apparatus according to claim 2, wherein
The interpolation processing means comprises analysis data calculating means for calculating sine wave components and residual components not stored in the analysis data storage means by interpolating the sine wave components and residual components stored in the analysis data storage means, and
The interpolation processing is performed using the sine wave components and residual components calculated by the analysis data calculating means.

A musical sound processing apparatus comprising:
Voice input means for inputting voice;
Frequency analysis means for extracting a voice sine wave component and a voice residual component by performing frequency analysis, in units of frames, on the input voice input from the voice input means;
Analysis data storage means for storing analysis data including a sine wave component, a residual component, and a spectrum shape obtained by analyzing sampling data of sound produced by an instrument;
Analysis data extraction means for extracting, from the analysis data storage means, the analysis data corresponding to each of a plurality of instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by this musical sound processing apparatus and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments;
Interpolation processing means for, when performing interpolation processing using two at a time of the plurality of spectrum shapes corresponding to the extracted analysis data, performing the interpolation processing using a spectrum shape transition function between the two spectrum shapes, and generating one interpolated spectrum shape;
Morphing means for morphing the one interpolated spectrum shape generated by the interpolation processing means with the voice sine wave component and the voice residual component extracted by the frequency analysis means, according to a predetermined morphing degree; and
Musical sound signal generating means for generating and outputting a musical sound signal by performing an inverse Fourier transform on the new frequency component output from the morphing means.

The musical sound processing apparatus according to claim 8, further comprising
Morphing setting means for setting the predetermined morphing degree, wherein
The morphing means morphs the interpolated spectrum shape, the voice sine wave component, and the voice residual component according to the morphing degree set by the morphing setting means.

The musical sound processing apparatus according to claim 8, wherein the interpolation processing means comprises:
Pitch shift means for shifting the pitch of the sine wave component by a pitch determined from a predetermined ratio corresponding to the pitch of the input voice; and
Residual component adding means for adding a residual component to the sine wave component whose pitch has been shifted by the pitch shift means.

A musical sound processing method comprising:
An analysis data extraction process of extracting, from analysis data including sine wave components, residual components, and spectrum shapes obtained by analyzing sampling data of sounds produced by instruments, the analysis data corresponding to a plurality of instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by this musical sound processing method and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments;
An interpolation process of, when performing interpolation processing using two at a time of the spectrum shapes corresponding to the analysis data, performing the interpolation processing using a spectrum shape transition function between the two spectrum shapes, and generating one interpolated spectrum shape; and
A musical sound signal generating process of generating and outputting a musical sound signal based on the interpolated spectrum shape.

The musical sound processing method according to claim 11, wherein
The interpolation process generates one interpolated spectrum shape by repeatedly performing a grouping process that divides the spectrum shapes into pairs and the interpolation processing performed for each pair.

The musical sound processing method according to claim 11, wherein
In the interpolation process, the two spectrum shapes are divided into a plurality of regions on the frequency axis, and the interpolation processing using a linear function as the transition function is performed, over the plurality of regions, on the pairs of actual frequency and magnitude on the two spectrum shapes belonging to each region.

The musical sound processing method according to claim 13, wherein the interpolation process comprises:
A frequency interpolation process of calculating an interpolation frequency by interpolating, using the linear function, a first frequency that is a frequency of one spectrum shape belonging to each region and a second frequency that is the frequency of the other spectrum shape corresponding to the first frequency; and
A magnitude interpolation process of interpolating, using the linear function, a first magnitude that is a magnitude of one of the spectrum shapes belonging to each region and a second magnitude that is the magnitude of the other spectrum shape corresponding to the first magnitude.

The musical sound processing method according to claim 11, wherein
The interpolation information includes overall interpolation information for performing interpolation processing on the synthesis ratio between the instruments, state interpolation information for performing interpolation processing on the sounding state, pitch interpolation information for performing interpolation processing on the pitch, and amplitude interpolation information for performing interpolation processing on the amplitude.

A musical sound processing method comprising:
A voice input process of inputting voice;
A frequency analysis process of extracting a sine wave component and a residual component by performing frequency analysis, in units of frames, on the input voice input in the voice input process;
An analysis data extraction process of extracting, from analysis data including sine wave components, residual components, and spectrum shapes obtained by analyzing sampling data of sounds produced by instruments, the analysis data corresponding to a plurality of instruments, based on pronunciation information input from the outside to specify the pitch, amplitude, and note length of the sound output by this musical sound processing method and on interpolation information input from the outside to generate that sound from the timbres of the plurality of instruments;
An interpolation process of, when performing interpolation processing using two at a time of the spectrum shapes corresponding to the analysis data, performing the interpolation processing using a spectrum shape transition function between the two spectrum shapes, and generating one interpolated spectrum shape;
A morphing process of morphing the one interpolated spectrum shape generated by the interpolation process with the sine wave component and the residual component extracted by the frequency analysis process, according to a predetermined morphing degree; and
A musical sound signal generation process of generating and outputting a musical sound signal by performing an inverse Fourier transform on the new frequency component output in the morphing process,
Wherein the analysis data extraction process extracts the analysis data corresponding to three or more instruments based on pronunciation information included in the sine wave component and interpolation information input from the outside.

The musical sound processing method according to claim 16, further comprising
A morphing setting process of setting the predetermined morphing degree, wherein
The morphing process morphs the interpolated spectrum shape, the voice sine wave component, and the voice residual component according to the morphing degree set in the morphing setting process.

The musical sound processing method according to claim 16, wherein the interpolation process comprises:
A pitch shift process of shifting the pitch of the sine wave component by a pitch obtained from a predetermined ratio corresponding to the pitch of the input voice; and
A residual component adding process of adding a residual component to the sine wave component whose pitch has been shifted in the pitch shift process.