JP4662406B2

JP4662406B2 - Frequency analysis method and acoustic signal encoding method

Info

Publication number: JP4662406B2
Application number: JP2001204722A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2001-07-05
Filing date: 2001-07-05
Publication date: 2011-03-30
Anticipated expiration: 2021-07-05
Also published as: JP2003022070A

Abstract

PROBLEM TO BE SOLVED: To provide a frequency analysis method that achieves high-precision frequency analysis by combining the short-time Fourier transform with generalized harmonic analysis. SOLUTION: A plurality of frequency/correlation-strength pairs are computed from a time-series signal (upper left) by the short-time Fourier transform (lower left), and a plurality of frequency/correlation-strength pairs are computed from the same signal by generalized harmonic analysis (center left). The priority of the frequencies to be extracted is then determined (center right) from the correlation strengths obtained by generalized harmonic analysis. In accordance with that priority, the high-priority frequencies and their corresponding correlation strengths are extracted (lower right) from the short-time Fourier transform result.

Description

【０００１】
【産業上の利用分野】
本発明は、放送メディア（ラジオ、テレビ）、通信メディア（ＣＳ映像・音声配信、インターネット音楽配信、通信カラオケ）、パッケージメディア（ＣＤ、ＭＤ、カセット、ビデオ、ＬＤ、ＣＤ−ＲＯＭ、ゲームカセット、携帯音楽プレーヤ向け固体メモリ媒体）などで提供する各種オーディオコンテンツの制作、並びに、音楽演奏録音信号から楽譜出版、通信カラオケ配信用ＭＩＤＩデータ、演奏ガイド機能付き電子楽器向け自動演奏データ、携帯電話・ＰＨＳ・ポケベルなどの着信メロディデータを自動的に作成する自動採譜技術に関する。
【０００２】
【従来の技術】
音響信号に代表される時系列信号には、その構成要素として複数の周期信号が含まれている。このため、与えられた時系列信号にどのような周期信号が含まれているかを解析する手法は、古くから知られている。例えば、フーリエ解析は、与えられた時系列信号に含まれる周波数成分を解析するための方法として広く利用されている。
【０００３】
このような時系列信号の解析方法を利用すれば、音響信号を符号化することも可能である。コンピュータの普及により、原音となるアナログ音響信号を所定のサンプリング周波数でサンプリングし、各サンプリング時の信号強度を量子化してデジタルデータとして取り込むことが容易にできるようになってきており、こうして取り込んだデジタルデータに対してフーリエ解析などの手法を適用し、原音信号に含まれていた周波数成分を抽出すれば、各周波数成分を示す符号によって原音信号の符号化が可能になる。
【０００４】
一方、電子楽器による楽器音を符号化しようという発想から生まれたＭＩＤＩ（Musical Instrument Digital Interface）規格も、パーソナルコンピュータの普及とともに盛んに利用されるようになってきている。このＭＩＤＩ規格による符号データ（以下、ＭＩＤＩデータという）は、基本的には、楽器のどの鍵盤キーを、どの程度の強さで弾いたか、という楽器演奏の操作を記述したデータであり、このＭＩＤＩデータ自身には、実際の音の波形は含まれていない。そのため、実際の音を再生する場合には、楽器音の波形を記憶したＭＩＤＩ音源が別途必要になるが、その符号化効率の高さが注目を集めており、ＭＩＤＩ規格による符号化および復号化の技術は、現在、パーソナルコンピュータを用いて楽器演奏、楽器練習、作曲などを行うソフトウェアに広く採り入れられている。
【０００５】
そこで、音響信号に代表される時系列信号に対して、所定の手法で解析を行うことにより、その構成要素となる周期信号を抽出し、抽出した周期信号をＭＩＤＩデータを用いて符号化しようとする提案がなされている。例えば、特開平１０−２４７０９９号公報、特開平１１−７３１９９号公報、特開平１１−７３２００号公報、特開平１１−９５７５３号公報、特開２０００−９９００９号公報、特開２０００−９９０９２号公報、特開２０００−９９０９３号公報、特開２０００−２６１３２２号公報、特開２００１−５４５０号公報、特開２００１−１４８６３３号公報には、任意の時系列信号について、構成要素となる周波数を解析し、その解析結果からＭＩＤＩデータを作成することができる種々の方法が提案されている。
【０００６】
【発明が解決しようとする課題】
上記各公報または明細書において提案してきたＭＩＤＩ符号化方式により、演奏録音等から得られる音響信号の効率的な符号化が可能になった。特に、一般化調和解析を用いた符号化方式は、短時間フーリエ変換法で問題となる周波数分解能を著しく向上させた。
【０００７】
しかしながら、一般化調和解析の手法で生成されるスペクトル成分には、演算途上で発生する誤差が累積し、スペクトル絶対成分の精度が良くない。すなわち、短時間フーリエ変換法では、計算誤差が少なく絶対成分の精度が高いが、擬似成分が多く含まれるという欠点があり、一方、一般化調和解析では、擬似成分を削除することはできるが、計算誤差が多く絶対成分の精度が低いという欠点がある。
【０００８】
上記のような点に鑑み、本発明は、短時間フーリエ変換と一般化調和解析を併用することにより、精度の高い周波数解析を行なうことが可能な、周波数解析方法、および音響信号の符号化方法を提供することを課題とする。
【０００９】
【課題を解決するための手段】
上記課題を解決するため、本発明では、時系列信号から複数の信号成分を分離するための周波数解析方法として、解析しようとする周波数範囲で複数の周波数を設定し、各周波数に対応する複数の周期関数集合を準備する周期関数準備段階と、前記各周波数と相関値の関係を各周波数について格納するための配列である相関配列、優先度配列、強度配列を準備する配列準備段階と、前記時系列信号の時間軸上に複数の単位区間を設定し、各単位区間ごとに区間信号を抽出する区間信号抽出段階と、前記複数の周期関数と前記区間信号との相関を演算して各周期関数に対応する相関値を算出し、前記相関配列の各周期関数の周波数に対応する値を設定するための相関演算段階と、前記周期関数集合を利用して一般化調和解析の手法により各周波数に対応する相関値を算出し、前記優先度配列の各周波数に対応する値を設定するための優先度決定段階と、前記優先度配列の値により、前記相関配列中から対応する相関値を抽出し、当該相関値を強度配列の値として決定する強度算出段階と、を備えると共に、前記区間信号抽出段階で設定された全単位区間に対して、前記相関演算段階、前記優先度設定段階、前記強度算出段階を実行することにより、各単位区間ごとに複数の周波数と強度値の組を得るようにしたことを特徴とする。本発明によれば、区間信号と各周波数との相関を求めることにより相関配列を取得し、区間信号に対して一般化調和解析の手法を用いて各周波数との相関を求めることにより優先度配列を取得し、この優先度配列の値に基づいて、相関配列に記録された周波数とその相関値の組を選出するようにしたので、擬似成分の抽出を抑え、基の時系列信号に含まれている信号成分を正確に抽出することが可能となる。また、時系列信号として音響信号を用い、この音響信号に対して周波数解析を行うことにより精度の高い符号化を行うことが可能となる。
【００１０】
【発明の実施の形態】
以下、本発明の実施形態について図面を参照して詳細に説明する。
【００１１】
（1.1.周波数解析方法の基本原理）
はじめに、本発明に係る周波数解析方法の基本原理を、時系列信号として音響信号を用い、周波数解析の結果を利用して符号化を行う場合を例にとって説明しておく。この基本原理は、前掲の各公報に開示されているので、ここではその概要のみを簡単に述べることにする。
【００１２】
図１（ａ）に示すように、時系列信号としてアナログ音響信号が与えられたものとする。図１の例では、横軸に時間ｔ、縦軸に振幅（強度）をとって、この音響信号を示している。ここでは、まずこのアナログ音響信号を、デジタルの音響データとして取り込む処理を行う。これは、従来の一般的なＰＣＭの手法を用い、所定のサンプリング周波数でこのアナログ音響信号をサンプリングし、振幅を所定の量子化ビット数を用いてデジタルデータに変換する処理を行えば良い。ここでは、説明の便宜上、ＰＣＭの手法でデジタル化した音響データの波形も図１（ａ）のアナログ音響信号と同一の波形で示すことにする。
【００１３】
続いて、この解析対象となる音響信号の時間軸上に、複数の単位区間を設定する。図１（ａ）に示す例では、時間軸ｔ上に等間隔に６つの時刻ｔ１〜ｔ６が定義され、これら各時刻を始点および終点とする５つの単位区間ｄ１〜ｄ５が設定されている。図１の例では、全て同一の区間長をもった単位区間が設定されているが、個々の単位区間ごとに区間長を変えるようにしてもかまわない。あるいは、隣接する単位区間が時間軸上で部分的に重なり合うような区間設定を行ってもかまわない。
【００１４】
こうして単位区間が設定されたら、各単位区間ごとの音響信号（以下、区間信号と呼ぶことにする）について、それぞれ代表周波数を選出する。各区間信号には、通常、様々な周波数成分が含まれているが、例えば、その中で成分の強度割合の大きな周波数成分を代表周波数として選出すれば良い。ここで、代表周波数とはいわゆる基本周波数が一般的であるが、音声のフォルマント周波数などの倍音周波数や、ノイズ音源のピーク周波数も代表周波数として扱うことがある。代表周波数は１つだけ選出しても良いが、音響信号によっては複数の代表周波数を選出した方が、より精度の高い符号化が可能になる。図１（ｂ）には、個々の単位区間ごとにそれぞれ３つの代表周波数を選出し、１つの代表周波数を１つの代表符号（図では便宜上、音符として示してある）として符号化した例が示されている。ここでは、代表符号（音符）を収容するために３つのトラックＴ１，Ｔ２，Ｔ３が設けられているが、これは個々の単位区間ごとに選出された３つずつの代表符号を、それぞれ異なるトラックに収容するためである。
【００１５】
例えば、単位区間ｄ１について選出された代表符号ｎ（ｄ１，１），ｎ（ｄ１，２），ｎ（ｄ１，３）は、それぞれトラックＴ１，Ｔ２，Ｔ３に収容されている。ここで、各符号ｎ（ｄ１，１），ｎ（ｄ１，２），ｎ（ｄ１，３）は、ＭＩＤＩ符号におけるノートナンバーを示す符号である。ＭＩＤＩ符号におけるノートナンバーは、０〜１２７までの１２８通りの値をとり、それぞれピアノの鍵盤の１つのキーを示すことになる。具体的には、例えば、代表周波数として４４０Ｈｚが選出された場合、この周波数はノートナンバーｎ＝６９（ピアノの鍵盤中央の「ラ音（Ａ３音）」に対応）に相当するので、代表符号としては、ｎ＝６９が選出されることになる。もっとも、図１（ｂ）は、上述の方法によって得られる代表符号を音符の形式で示した概念図であり、実際には、各音符にはそれぞれ強度に関するデータも付加されている。例えば、トラックＴ１には、ノートナンバーｎ（ｄ１，１），ｎ（ｄ２，１）・・・なる音高を示すデータとともに、ｅ（ｄ１，１），ｅ（ｄ２，１）・・・なる強度を示すデータが収容されることになる。この強度を示すデータは、各代表周波数の成分が、元の区間信号にどの程度の度合いで含まれていたかによって決定される。具体的には、各代表周波数をもった周期関数の区間信号に対する相関値に基づいて強度を示すデータが決定されることになる。また、図１（ｂ）に示す概念図では、音符の横方向の位置によって、個々の単位区間の時間軸上での位置が示されているが、実際には、この時間軸上での位置を正確に数値として示すデータが各音符に付加されていることになる。
【００１６】
音響信号を符号化する形式としては、必ずしもＭＩＤＩ形式を採用する必要はないが、この種の符号化形式としてはＭＩＤＩ形式が最も普及しているため、実用上はＭＩＤＩ形式の符号データを用いるのが好ましい。ＭＩＤＩ形式では、「ノートオン」データもしくは「ノートオフ」データが、「デルタタイム」データを介在させながら存在する。「ノートオン」データは、特定のノートナンバーＮとベロシティーＶを指定して特定の音の演奏開始を指示するデータであり、「ノートオフ」データは、特定のノートナンバーＮとベロシティーＶを指定して特定の音の演奏終了を指示するデータである。また、「デルタタイム」データは、所定の時間間隔を示すデータである。ベロシティーＶは、例えば、ピアノの鍵盤などを押し下げる速度（ノートオン時のベロシティー）および鍵盤から指を離す速度（ノートオフ時のベロシティー）を示すパラメータであり、特定の音の演奏開始操作もしくは演奏終了操作の強さを示すことになる。
【００１７】
前述の方法では、第ｉ番目の単位区間ｄｉについて、代表符号としてＪ個のノートナンバーｎ（ｄｉ，１），ｎ（ｄｉ，２），・・・，ｎ（ｄｉ，Ｊ）が得られ、このそれぞれについて強度ｅ（ｄｉ，１），ｅ（ｄｉ，２），・・・，ｅ（ｄｉ，Ｊ）が得られる。そこで、次のような手法により、ＭＩＤＩ形式の符号データを作成することができる。まず、「ノートオン」データもしくは「ノートオフ」データの中で記述するノートナンバーＮとしては、得られたノートナンバーｎ（ｄｉ，１），ｎ（ｄｉ，２），・・・，ｎ（ｄｉ，Ｊ）をそのまま用いれば良い。一方、「ノートオン」データもしくは「ノートオフ」データの中で記述するベロシティーＶとしては、得られた強度ｅ（ｄｉ，１），ｅ（ｄｉ，２），・・・，ｅ（ｄｉ，Ｊ）を所定の方法で規格化した値を用いれば良い。また、「デルタタイム」データは、各単位区間の長さに応じて設定すれば良い。
【００１８】
（1.2.周期関数との相関を求める具体的な方法）
上述した基本原理の基づく方法では、区間信号に対して、１つまたは複数の代表周波数が選出され、この代表周波数をもった周期信号によって、当該区間信号が表現されることになる。ここで、選出される代表周波数は、文字どおり、当該単位区間内の信号成分を代表する周波数である。この代表周波数を選出する具体的な方法には、後述するように、短時間フーリエ変換を利用する方法と、一般化調和解析の手法を利用する方法とがある。いずれの方法も、基本的な考え方は同じであり、あらかじめ周波数の異なる複数の周期関数を用意しておき、これら複数の周期関数の中から、当該単位区間内の区間信号に対する相関が高い周期関数を見つけ出し、この相関の高い周期関数の周波数を代表周波数として選出する、という手法を採ることになる。すなわち、代表周波数を選出する際には、あらかじめ用意された複数の周期関数と、単位区間内の区間信号との相関を求める演算を行うことになる。そこで、ここでは、周期関数との相関を求める具体的な方法を述べておく。
【００１９】
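Assume that trigonometric functions such as those shown in FIG. 2 are prepared as the plurality of periodic functions. These trigonometric functions consist of pairs of a sine function and a cosine function having the same frequency, and one sine/cosine pair is defined for each of the 128 standard frequencies f(0) to f(127). Here, a pair consisting of a sine function and a cosine function of the same frequency is defined as the periodic function for that frequency. In other words, the periodic function for a given frequency consists of one sine function and one cosine function. The periodic function is defined as a sine/cosine pair because the correlation value between a signal and a periodic function is affected by phase. The variables F and k in the trigonometric functions of FIG. 2 correspond to the sampling frequency F and the sample number k of the section signal X. For example, the sine wave for frequency f(0) is given by sin(2πf(0)k/F); supplying an arbitrary sample number k yields the amplitude of the periodic function at the same time position as the k-th sample of the section signal.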
【００２０】
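Here, an example is shown in which the 128 standard frequencies f(0) to f(127) are defined by the expressions shown in FIG. 3. That is, the n-th standard frequency f(n) (0 ≤ n ≤ 127) is defined by Formula 1 below.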
【００２１】
[Formula 1]
$f(n) = 440 \times 2^{\gamma(n)}$
$\gamma(n) = (n - 69)/12$
【００２２】
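Defining the standard frequencies by this formula is convenient when the signal is ultimately encoded as MIDI data. This is because the 128 standard frequencies f(0) to f(127) set by this definition form a geometric progression and coincide with the frequencies of the note numbers used in MIDI data. The 128 standard frequencies f(0) to f(127) shown in FIG. 2 are therefore frequencies set at equal intervals (semitone steps in MIDI) on a logarithmic frequency axis.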
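The definition of the standard frequencies can be sketched in a few lines of Python. This is only an illustration of Formula 1; the function name `standard_frequency` and the list `STANDARD_FREQUENCIES` are hypothetical names introduced here and do not appear in the patent.

```python
def standard_frequency(n: float) -> float:
    """Frequency in Hz for note number n, following f(n) = 440 * 2**((n - 69) / 12)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


# The 128 standard frequencies f(0) .. f(127), one per MIDI note number.
STANDARD_FREQUENCIES = [standard_frequency(n) for n in range(128)]

# Adjacent entries differ by a constant ratio (one semitone),
# which is what makes the series a geometric progression.
semitone_ratio = STANDARD_FREQUENCIES[70] / STANDARD_FREQUENCIES[69]
```

Because the ratio between neighbours is constant, the frequencies are equally spaced on a logarithmic axis, as the paragraph above states.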
【００２３】
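Next, the method of obtaining the correlation between each periodic function and the section signal of an arbitrary section is described concretely. For example, suppose that a section signal X is given for a unit section d, as shown in FIG. 4. Here, the unit section d of section length L is sampled at sampling frequency F, yielding w sample values in total, with sample numbers 0, 1, 2, 3, ..., k, ..., w−2, w−1 as illustrated (the w-th sample, drawn as a white circle, belongs to the head of the next unit section on the right). For any sample number k, an amplitude value X(k) is given as digital data. In the short-time Fourier transform, X(k) is normally multiplied, sample by sample, by a window function W(k) whose weight is close to 1 at the center and close to 0 at both ends. That is, X(k) × W(k) is treated as X(k) in the correlation calculation below; a cosine-shaped Hamming window is generally used as the window. Although w is written as if it were a constant in the following description, it is generally desirable to vary it with n and set it to the largest integer multiple of F/f(n) that does not exceed the section length L.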
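The two preparation steps mentioned above (choosing w as a whole number of periods and applying a Hamming window) might look as follows in Python. This is a minimal sketch under assumptions: the helper names are hypothetical, the section length is given in samples, and the standard Hamming coefficients 0.54 and 0.46 are assumed because the patent does not state them.

```python
import math


def samples_for_whole_periods(section_len: int, sample_rate: float, freq: float) -> int:
    """Largest sample count w <= section_len that spans a whole number of periods of freq.

    Returns 0 when not even one period fits into the section.
    """
    samples_per_period = sample_rate / freq          # F / f(n), usually not an integer
    whole_periods = math.floor(section_len / samples_per_period)
    return int(whole_periods * samples_per_period)   # truncated to a sample count


def hamming_window(w: int) -> list:
    """Hamming window of length w (w >= 2): about 1 at the center, 0.08 at both ends."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (w - 1)) for k in range(w)]


def apply_window(x: list) -> list:
    """Return X(k) * W(k), which then replaces X(k) in the correlation calculation."""
    return [xk * wk for xk, wk in zip(x, hamming_window(len(x)))]
```

For instance, with F = 44000 Hz and f = 440 Hz one period is 100 samples, so a section of 1050 samples would be trimmed to w = 1000.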
【００２４】
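The principle for obtaining the correlation value between such a section signal X and the sine function Rn having the n-th standard frequency f(n) is as follows. The correlation value A(n) can be defined by the first expression in FIG. 5. Here, X(k) is the amplitude value of sample number k in the section signal X, as shown in FIG. 4, and sin(2πf(n)k/F) is the amplitude value of the sine function Rn at the same position on the time axis. The first expression can be regarded as the inner product of the amplitude vector of the section signal X and the amplitude vector of the sine function Rn, taken over the dimensions k = 0 to w−1 of all samples in the unit section d.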
【００２５】
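Similarly, the second expression in FIG. 5 gives the correlation value B(n) between the section signal X and the cosine function having the n-th standard frequency f(n). Both the first expression for A(n) and the second expression for B(n) are finally multiplied by 2/w. This factor normalizes the correlation value; because w is generally varied with n, as noted above, the factor also depends on n.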
【００２６】
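The effective correlation value between the section signal X and the standard periodic function of standard frequency f(n) is given, as in the third expression of FIG. 5, by E(n), the square root of the sum of the squares of the sine correlation A(n) and the cosine correlation B(n). If the frequencies of standard periodic functions with large effective correlation values are selected as representative frequencies, the section signal X can be encoded using these representative frequencies.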
【００２７】
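That is, one or more standard frequencies whose correlation value E(n) is at or above a predetermined criterion are selected as representative frequencies. The selection condition "E(n) is at or above a predetermined criterion" may be an absolute one, in which a threshold is set and every standard frequency f(n) whose E(n) exceeds it is selected, or a relative one, in which, for example, the Q frequencies with the largest E(n) are selected.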
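The correlation calculation and the two selection rules can be sketched as below. FIG. 5 itself is not reproduced in the text, so the expressions are reconstructed from the description (a sum over k = 0..w−1 scaled by 2/w, and a root of the sum of squares); the function names and the dictionary-based interface are assumptions made for illustration.

```python
import math


def correlate(x, freq, sample_rate):
    """Return (A, B, E) for section signal x against a sine/cosine pair at freq.

    A = (2/w) * sum X(k) sin(2*pi*f*k/F), B likewise with cos, E = sqrt(A^2 + B^2).
    """
    w = len(x)
    a = 0.0
    b = 0.0
    for k, xk in enumerate(x):
        phase = 2.0 * math.pi * freq * k / sample_rate
        a += xk * math.sin(phase)
        b += xk * math.cos(phase)
    a *= 2.0 / w
    b *= 2.0 / w
    return a, b, math.hypot(a, b)


def select_representatives(strengths, threshold=None, top_q=None):
    """Pick note numbers from a {note: E(n)} mapping, strongest first.

    threshold applies the absolute rule (E(n) must exceed it);
    top_q applies the relative rule (keep at most the Q strongest).
    """
    ranked = sorted(strengths, key=strengths.get, reverse=True)
    if threshold is not None:
        ranked = [n for n in ranked if strengths[n] > threshold]
    if top_q is not None:
        ranked = ranked[:top_q]
    return ranked
```

Using a sine/cosine pair makes E(n) independent of the phase of the component: a sinusoid of amplitude 0.5 yields E close to 0.5 whatever its phase, provided the section spans a whole number of periods.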
【００２８】
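(1.3. Generalized harmonic analysis method)
The method described above, which simply computes the correlation between the section signal and each periodic function, is the short-time Fourier transform. Generalized harmonic analysis, which performs the analysis using correlations computed in this way, is described next. As already explained, encoding an acoustic signal involves selecting several representative frequencies with high correlation values for the section signal of each unit section. Generalized harmonic analysis makes it possible to select representative frequencies with higher accuracy, and its basic principle is as follows.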
【００２９】
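Suppose a signal S(j) exists for a unit section d, as shown in FIG. 6A. Here j is a parameter for the iterative processing described below (j = 1 to J). First, correlation values are computed between this signal S(j) and all 128 periodic functions shown in FIG. 2, using the short-time Fourier transform described above. The frequency of the one periodic function giving the largest correlation value is selected as a representative frequency, and the periodic function having that representative frequency is extracted as an element function. Next, a contained signal G(j) as shown in FIG. 6B is defined. The contained signal G(j) is obtained by multiplying the extracted element function by its correlation value with S(j), used as its amplitude. For example, when sine/cosine pairs as in FIG. 2 are used as periodic functions and frequency f(n) is selected as the representative frequency, the contained signal G(j) is the sum of the sine function A(n)sin(2πf(n)k/F) with amplitude A(n) and the cosine function B(n)cos(2πf(n)k/F) with amplitude B(n) (FIG. 6B shows only one of the two for convenience of illustration). Because A(n) and B(n) are the normalized correlation values given by the expressions of FIG. 5, the contained signal G(j) is the signal component of frequency f(n) contained in S(j).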
【００３０】
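Once the contained signal G(j) has been obtained, the difference signal S(j+1) is obtained by subtracting G(j) from S(j). FIG. 6C shows the difference signal S(j+1) obtained in this way. It consists of the signal components that remain after the component of frequency f(n) has been removed from the original signal S(j). Incrementing the parameter j by 1 so that the difference signal S(j+1) is treated as the new signal S(j), and repeating the same processing J times for j = 1 to J, yields J representative frequencies.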
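The iteration described above (find the strongest periodic function, subtract its contained signal, repeat) can be sketched as follows. This is an illustrative reading of the text, not the patented implementation: the function names are hypothetical, the section length is kept fixed for every frequency, and no window is applied.

```python
import math


def correlate(x, freq, sample_rate):
    """Return (A, B, E): sine and cosine correlations scaled by 2/w, and their magnitude."""
    w = len(x)
    a = b = 0.0
    for k, xk in enumerate(x):
        phase = 2.0 * math.pi * freq * k / sample_rate
        a += xk * math.sin(phase)
        b += xk * math.cos(phase)
    a *= 2.0 / w
    b *= 2.0 / w
    return a, b, math.hypot(a, b)


def generalized_harmonic_analysis(section, notes, sample_rate, iterations):
    """Return ([(note, E), ...], residual) after `iterations` extract-and-subtract passes."""
    residual = list(section)                      # S(1) is the section signal itself
    picks = []
    for _ in range(iterations):
        best = None
        for n in notes:
            freq = 440.0 * 2.0 ** ((n - 69) / 12.0)
            a, b, e = correlate(residual, freq, sample_rate)
            if best is None or e > best[3]:
                best = (n, a, b, e, freq)
        n, a, b, e, freq = best
        picks.append((n, e))
        # Subtract the contained signal G(j) = A sin(...) + B cos(...) to get S(j+1).
        for k in range(len(residual)):
            phase = 2.0 * math.pi * freq * k / sample_rate
            residual[k] -= a * math.sin(phase) + b * math.cos(phase)
    return picks, residual
```

Each pass removes one component, so a later pass is not misled by leakage from a component that has already been extracted; this is the property the text relies on when it says the method suppresses spurious components.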
【００３１】
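The J contained signals G(1) to G(J) output by this correlation calculation are constituent elements of the original section signal X. To encode the section signal X, information indicating the frequency and the amplitude (intensity) of these J contained signals is used as the code data. Although J has been described as the number of representative frequencies, it may equal the number of standard frequencies f(n), that is, J = 128, and this is the usual practice when the aim is to obtain a frequency spectrum.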
【００３２】
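Once a predetermined number of frequencies have been selected for each unit section, the section signal X of that unit section can be encoded by creating the predetermined number of code data items, each containing four pieces of information: "pitch information" corresponding to a selected frequency, "intensity information" corresponding to the signal intensity of that frequency, "sound start time information" corresponding to the start of the unit section, and "sound end time information" corresponding to the start of the following unit section. When MIDI data is created as the code data, the note number is used as the pitch information, the velocity as the intensity information, the note-on time as the start time, and the note-off time as the end time.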
【００３３】
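(2.1. Frequency analysis method according to the present invention)
To summarize the basic principle of the present invention described so far, which it shares with the prior art: unit sections are set on the original acoustic signal; signal intensities corresponding to a plurality of frequencies are computed for each unit section; one or more representative frequencies are selected, using the prepared periodic functions, on the basis of the intensities obtained; and the acoustic signal is encoded by creating code data consisting of pitch information corresponding to each selected representative frequency, intensity information corresponding to its intensity, a sound start time corresponding to the start of the unit section, and a sound end time corresponding to the end of the unit section.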
【００３４】
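In the frequency analysis method and the acoustic signal encoding method of the present invention, the step of selecting representative frequencies in the above basic principle starts from the result of the short-time Fourier transform and additionally uses the result of generalized harmonic analysis.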
【００３５】
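The frequency analysis method of the present invention is now described with reference to the flowchart of FIG. 7, taking frequency analysis of an acoustic signal as an example. First, unit sections are set over the entire time axis of the acoustic signal (step S1). The technique used in step S1 is as described for the basic principle with reference to FIG. 1A.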
【００３６】
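Next, frequency analysis by the short-time Fourier transform is performed on the acoustic signal of each unit section, that is, the section signal, to compute the correlation value for each frequency (step S2). As in the example of section 1.2 above, 128 standard periodic functions with standard frequencies corresponding to note numbers are prepared, and a correlation value is obtained by the short-time Fourier transform for the standard frequency of each. These correlation values are recorded in a correlation array prepared in advance, which can hold the correlation values for the standard frequencies f(0) to f(127).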
【００３７】
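Next, frequency analysis by generalized harmonic analysis is performed on the same section signal to compute the correlation value for each standard frequency (step S3). Unlike the correlation values computed in step S2, these values are not carried through to the final output; they are used to determine the priority of the step S2 correlation values and are recorded in a priority array. Here too, as in the example of section 1.3 above, the correlation value with the section signal is computed for each of the 128 standard periodic functions with frequencies corresponding to note numbers and recorded in the priority array. Concretely, this is the correlation calculation that yields the J contained signals G(1) to G(J) in the example described with FIG. 6. Like the correlation array, the priority array can hold the correlation values for the standard frequencies f(0) to f(127).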
【００３８】
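Steps S2 and S3 are executed for every unit section set in step S1. Next, the correlation values in the correlation array and the priority array computed in steps S2 and S3 are used to record values in the intensity array (step S4). Concretely, for each unit section, a predetermined number of the largest values in the priority array are left unchanged and all other values are changed to 0. This predetermined number is configurable and is set to the number of sounds to be extracted simultaneously. Then, for each frequency whose priority-array value is not 0, the corresponding correlation value is taken from the correlation array and recorded in the intensity array in association with the frequency. An intensity array is thus obtained for each unit section. Because the resulting intensity array holds the frequency/correlation-value pairs that should be extracted, extracting frequency components on this basis enables highly accurate frequency analysis.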
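Step S4 can be sketched for a single unit section as follows. The function name `build_intensity_array` and the use of plain lists indexed by note number are assumptions for illustration; the logic follows the text: rank by the priority array (generalized harmonic analysis), but report values from the correlation array (short-time Fourier transform).

```python
def build_intensity_array(correlation, priority, keep):
    """Return the intensity array for one unit section.

    correlation: STFT correlation values, indexed by note number.
    priority:    GHA correlation values, indexed by note number.
    keep:        number of sounds to extract simultaneously.
    """
    ranked = sorted(range(len(priority)), key=lambda n: priority[n], reverse=True)
    # Only the `keep` largest priority values survive; zero entries never qualify.
    kept = [n for n in ranked[:keep] if priority[n] > 0.0]
    intensity = [0.0] * len(correlation)
    for n in kept:
        intensity[n] = correlation[n]   # the magnitude comes from the STFT result
    return intensity


# Toy five-bin example: the STFT shows a spurious peak at index 2.
correlation = [0.1, 0.9, 0.6, 0.2, 0.5]
priority = [0.0, 0.7, 0.05, 0.0, 0.8]
intensity = build_intensity_array(correlation, priority, keep=2)
```

In the toy data, taking the two largest STFT values alone would select indices 1 and 2, whereas ranking by the priority array selects indices 1 and 4 and still reports the STFT magnitudes for them.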
【００３９】
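An example of the effect of this frequency analysis method is described with reference to FIG. 8. Each of the six graphs in FIG. 8 shows the relationship between frequency (note number) and intensity. The upper-left graph is the spectrum of the original signal; ideally these two frequency components would be extracted with these intensities. Analyzing this original acoustic signal by generalized harmonic analysis and by the short-time Fourier transform gives the spectra shown as curves in the middle-left and lower-left graphs, respectively. In those two graphs, straight lines are drawn at the positions corresponding to the frequencies of the original signal. When the two strongest frequency components are extracted from the generalized harmonic analysis result in the middle-left graph, the result is the middle-right graph: the extracted components have the same frequencies as the original signal, but their intensity order is reversed, with the left component ranked first and the right component second. When the two strongest components are instead extracted from the short-time Fourier transform result in the lower-left graph, the result is the upper-right graph: the first-ranked component is identical to that of the original signal, but the second-ranked component is one that did not exist in the original signal at all. In the frequency analysis method of the present invention, the spectrum in the lower-left graph is used, that is, the intensity value for each frequency is taken from the short-time Fourier transform result and only the choice of frequencies to extract is based on the generalized harmonic analysis result, so the frequency components of the original signal are extracted accurately, as shown in the lower-right graph.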
【００４０】
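Next, the acoustic signal encoding method according to the present invention is described with reference to the flowchart of FIG. 9. First, unit sections are set over the entire time axis of the acoustic signal (step S11). The technique of step S11 is essentially the same as that of step S1 in FIG. 7 and is as described for the basic principle with reference to FIG. 1A.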
【００４１】
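Next, frequency analysis by the short-time Fourier transform is performed on the acoustic signal of each unit section, that is, the section signal, to compute the correlation value for each frequency (step S12). As in step S2 of FIG. 7 and the example of section 1.2 above, 128 standard periodic functions with standard frequencies corresponding to note numbers are prepared, and a correlation value is obtained by the short-time Fourier transform for the standard frequency of each. These correlation values are recorded in a correlation array prepared in advance, which can hold the correlation values for the standard frequencies f(0) to f(127).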
【００４２】
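Next, frequency analysis by generalized harmonic analysis is performed on the same section signal to compute the correlation value for each standard frequency (step S13). As with step S3 in FIG. 7, these correlation values, unlike those computed in step S12, are not carried through to the final output; they are used to determine the priority of the step S12 correlation values and are recorded in a priority array. Here too, as in the example of section 1.3 above, the correlation value with the section signal is computed for each of the 128 standard periodic functions with frequencies corresponding to note numbers and recorded in the priority array. Concretely, this is the correlation calculation that yields the J contained signals G(1) to G(J) in the example described with FIG. 6. Like the correlation array, the priority array can hold the correlation values for the standard frequencies f(0) to f(127).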
【００４３】
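Steps S12 and S13 are executed for every unit section set in step S11. The correlation values in the correlation array and in the priority array computed in steps S12 and S13 are each used as one item of attribute information of unit phoneme data. Concretely, "unit phoneme data" is defined as five pieces of information: the standard frequency, the correlation-array value, the start of the unit section, and the end of the unit section, obtained by the short-time Fourier transform in step S12, plus the priority-array value obtained by generalized harmonic analysis in step S13. Unit phoneme data is phoneme data whose length is one unit section. In this embodiment, representative frequencies are not selected as in the basic principle described above; instead, unit phoneme data is obtained for every prepared standard periodic function. Performing the processing of steps S12 and S13 for all unit sections yields a group of unit phoneme data [m, n] (0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1), where N is the total number of standard periodic functions (N = 128 in the example above) and M is the total number of unit sections set on the acoustic signal. In other words, a unit phoneme data group of M × N items is obtained.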
【００４４】
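Once the unit phoneme data group has been obtained, the priority of the unit phoneme data in the group is determined (step S14). First, unit phoneme data whose priority-array value is at or below a predetermined criterion is deleted. The purpose is to remove phonemes whose signal level is almost 0 and for which no sound is judged to be actually present, so the criterion is set to a level regarded as the absence of sound. At this point the number of unit phoneme data items falls below M × N.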
【００４５】
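Next, for each unit section, a predetermined number of the largest priority-array values are left unchanged and all other values are changed to 0. This predetermined number is configurable; it is normally set to 16, the number of sounds that can be produced simultaneously under the MIDI standard. Unit phoneme data whose priority-array value is left unchanged is thereby effectively marked as having priority, and is hereinafter called priority phoneme data.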
【００４６】
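Once the unit phoneme data group including priority phoneme data has been obtained, unit phoneme data items that have the same frequency and are consecutive in time are connected into one connected phoneme data item (step S15). FIG. 10 is a conceptual diagram of this connection. FIG. 10A shows the unit phoneme data group before connection. Each rectangle of the grid in FIG. 10A represents one unit phoneme data item; hatched rectangles are items deleted in step S14 because their priority-array value was judged to be at or below the predetermined criterion, and the other rectangles are items that were not deleted. In step S15, unit phoneme data with the same frequency (same note number) and consecutive in the time direction t is connected, so applying the connection to the group of FIG. 10A yields a phoneme data group consisting of several connected phoneme data items and several unit phoneme data items, as shown in FIG. 10B. For example, the unit phoneme data A1, A2, and A3 of FIG. 10A are connected to give the connected phoneme data A of FIG. 10B. At least one of the constituent items A1, A2, and A3 must be priority phoneme data, that is, must have a positive priority-array value; if none is, the items are not connected but deleted at this stage and are not passed to step S16. When connection takes place, the new connected phoneme data A is given the frequency common to A1, A2, and A3, the largest of their correlation-array values as its correlation value, the section start time t1 of the first item A1 as its start time, and the section end time t4 of the last item A3 as its end time. Unlike unit phoneme data, the data finally encoded consists of only four pieces of information (frequency (note number), correlation-array value, start time, and end time), so merging three unit phoneme data items into one connected item reduces the data volume to one third. When the result is finally MIDI-encoded, this means it is expressed as one long note rather than three short notes. When, as with priority phoneme data B in FIG. 10A, no unit phoneme data of the same frequency is adjacent in time and the item is priority phoneme data, it remains unconnected as shown in FIG. 10B; in subsequent processing, connected phoneme data and unconnected priority phoneme data of unit-section length are both handled as "phoneme data".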
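The connection rule of step S15 can be sketched as below. The `UnitPhoneme` record and the function names are hypothetical; times are assumed to be integers (for example section indices) so that "consecutive" can be tested by exact equality of one item's end with the next item's start.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class UnitPhoneme:
    note: int            # note number (frequency)
    start: int           # start of the unit section
    end: int             # end of the unit section
    correlation: float   # value from the correlation array (STFT)
    priority: float      # value from the priority array (GHA); 0 means non-priority


def _flush(run, result):
    """Emit one phoneme for a run, but only if it contains a priority phoneme."""
    if run and any(u.priority > 0.0 for u in run):
        result.append((run[0].note, max(u.correlation for u in run),
                       run[0].start, run[-1].end))


def connect_phonemes(units):
    """Merge same-note, time-consecutive unit phonemes into (note, correlation, start, end)."""
    result = []
    ordered = sorted(units, key=lambda u: (u.note, u.start))
    for _, group in groupby(ordered, key=lambda u: u.note):
        run = []
        for u in group:
            if run and u.start != run[-1].end:   # gap in time: close the current run
                _flush(run, result)
                run = []
            run.append(u)
        _flush(run, result)
    return result
```

A run of three units at the same note thus becomes a single entry spanning from the first start to the last end, carrying the largest correlation value of the three, while an isolated non-priority unit is dropped.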
【００４７】
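Next, phoneme data is deleted so that the total number of phoneme data items over the whole signal does not exceed a predetermined number, and encoding is performed (step S16). When encoding to MIDI data, for example, the predetermined number is set to fewer than 250 phonemes per second (a code rate of 20 kbps). Which phoneme data to delete is decided by the value (end time − start time) × correlation value computed for each item: items are deleted in ascending order of this value.
After the total number of phoneme data items has been adjusted, the data is encoded in MIDI format.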
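The pruning rule of step S16 can be sketched as follows. This is a simplified illustration: the function name is hypothetical, the limit is applied as one global count rather than as a per-second rate, and phonemes are represented as (note, correlation, start, end) tuples.

```python
def prune_phonemes(phonemes, max_count):
    """Keep at most max_count phonemes, dropping those with the smallest score.

    The score is (end time - start time) * correlation value, as in the text.
    The survivors are returned in playback order (by start time, then note).
    """
    def score(p):
        _note, correlation, start, end = p
        return (end - start) * correlation

    kept = sorted(phonemes, key=score, reverse=True)[:max_count]
    return sorted(kept, key=lambda p: (p[2], p[0]))
```

Weighting the correlation value by duration means that a long, moderately strong note is preferred over a very short, strong one, which fits the aim of keeping the perceptually dominant notes within a fixed data budget.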
【００４８】
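The encoding is performed as described above; a technique for further improving accuracy is described next. In the priority determination of step S14, the priority array is used to delete unit phoneme data whose priority-array value is at or below the predetermined criterion; the following technique makes the values recorded in the priority array more accurate. First, frequencies are defined at narrower intervals than the standard frequencies corresponding to note numbers, and periodic functions corresponding to these frequencies are prepared. In this specification, frequencies defined at such narrow intervals are called fine frequencies, and the corresponding periodic functions are called fine periodic functions. A predetermined number of fine frequencies are set between adjacent standard frequencies, and their spacing is set to form a geometric progression, as with the standard frequencies. FIG. 11 shows an example with 12 fine frequencies between each pair of standard frequencies: the 12 fine frequencies f(n+1/13) to f(n+12/13) are set between the standard frequencies f(n) and f(n+1). In FIG. 11, the dotted line between note number n+6/13 and note number n+7/13 marks the range of fine frequencies regarded as each standard frequency. "Regarded as each standard frequency" means that the value is stored in the priority array as the correlation value of that standard frequency. In FIG. 11, frequencies up to note number n+6/13 are regarded as belonging to note number n, and those from note number n+7/13 onward as belonging to note number n+1.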
【００４９】
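For these fine frequencies, fine periodic functions of the same form as the standard periodic functions of FIG. 2 are prepared, and the correlation value between each fine periodic function and the section signal is computed. Of the correlation values of the 13 frequencies in the range of each standard frequency (the standard frequency itself and six fine frequencies on either side), the largest is stored in the priority array as the correlation value of that standard frequency. For example, the correlation value of note number n (standard frequency f(n)) is set to the largest of the 13 correlation values from note number n−6/13 to note number n+6/13. When the frequency with the largest correlation value lies at the edge of the standard frequency range, such as note number n−6/13 or n+6/13, it may be a component at the boundary between the target standard frequency and the adjacent one, but it is more likely a component that does not actually exist within the range of the target standard frequency and arises from the influence of the adjacent standard frequency component. To distinguish the former case from the latter, the fine frequencies of the adjacent standard frequency are examined: if the peak for the adjacent standard frequency also lies at the edge of its range, on the side adjoining the target standard frequency, the case is treated as the former; otherwise it is treated as the latter, and the priority-array value of the target standard frequency is set to 0. In the former case, the target and adjacent standard frequency components would both be computed for the same component, so the priority-array value of one of them is set to 0; in this embodiment the one with the lower correlation value is set to 0.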
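The fine-frequency grid and the "largest of 13" rule can be sketched as follows. This is a partial illustration under assumptions: the names are hypothetical, fine frequencies are assumed to follow the same formula as the standard frequencies with a fractional note number, and the edge-of-range handling described above is deliberately left out.

```python
FINE_STEPS = 13  # 12 fine frequencies between neighbours gives 13 steps per semitone


def fine_frequency(n: int, m: int) -> float:
    """Frequency of fractional note number n + m/13 (m may be negative)."""
    return 440.0 * 2.0 ** ((n + m / FINE_STEPS - 69) / 12.0)


def owning_note(n: int, m: int) -> int:
    """Standard note that note number n + m/13 (0 <= m <= 12) is regarded as.

    Offsets up to 6/13 belong to note n; offsets from 7/13 belong to note n + 1.
    """
    return n if m <= 6 else n + 1


def priority_from_fine(fine_strength) -> float:
    """Priority value for one note: the largest of its 13 correlation values.

    fine_strength maps each offset m in -6..6 to the correlation value
    measured at note number n + m/13.
    """
    return max(fine_strength[m] for m in range(-6, 7))
```

Taking the maximum over the 13 values lets a slightly detuned component still register fully under its nearest note number instead of being attenuated at the exact standard frequency.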
【００５０】
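Unit phoneme data whose priority-array value has been set to 0 in this way is naturally judged to have a priority-array value at or below the predetermined criterion and is deleted in step S14. Setting fine frequencies thus yields more accurate priority phoneme data.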
【００５１】
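A preferred embodiment of the present invention has been described above. The frequency analysis method and the acoustic signal encoding method are naturally executed by a computer or the like. Concretely, a program for executing the steps shown in the flowcharts of FIG. 7 and FIG. 9 in the above order is installed on a computer. The acoustic signal is digitized by PCM or the like and loaded into the computer, steps S1 to S4 or steps S11 to S16 are executed, and the computer outputs the frequency components when frequency analysis was performed, or code data in MIDI or another format when encoding was performed. In the latter case, the output code data, if MIDI data for example, is reproduced as sound using a MIDI sequencer and a MIDI sound source.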
【００５２】
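[Effects of the invention]
As described above, according to the present invention, a plurality of frequencies are set in the frequency range to be analyzed; a set of periodic functions corresponding to the frequencies is prepared; a correlation array, a priority array, and an intensity array, which store the relationship between each frequency and a correlation value, are prepared; a plurality of unit sections are set on the time axis of the time-series signal and a section signal is extracted for each; the correlation between each periodic function and the section signal is computed and set as the correlation-array value for that function's frequency; the correlation value for each frequency is computed by generalized harmonic analysis using the set of periodic functions and set as the priority-array value for that frequency; and the intensity-array values are determined by weighting the correlation array with the priority-array values. This suppresses the extraction of spurious components and allows the signal components contained in the original time-series signal to be extracted accurately. Furthermore, using an acoustic signal as the time-series signal and performing this frequency analysis on it enables highly accurate encoding.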
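[Brief description of the drawings]
[FIG. 1] A diagram showing the basic principle of the acoustic signal encoding method of the present invention.
[FIG. 2] A diagram showing an example of the periodic functions used in the present invention.
[FIG. 3] A diagram showing the relational expression between the frequency of each periodic function in FIG. 2 and the MIDI note number n.
[FIG. 4] A diagram showing the technique for calculating the correlation between the signal to be analyzed and a periodic signal.
[FIG. 5] A diagram showing the expressions used for the correlation calculation of FIG. 4.
[FIG. 6] A diagram showing the basic technique of generalized harmonic analysis.
[FIG. 7] A flowchart of the frequency analysis method of the present invention.
[FIG. 8] A diagram showing an example of the effect of the frequency analysis method of the present invention.
[FIG. 9] A flowchart of the acoustic signal encoding method of the present invention.
[FIG. 10] A conceptual diagram explaining the connection of unit phoneme data.
[FIG. 11] A diagram showing 12 fine frequencies set between adjacent standard frequencies.
[Explanation of symbols]
A(n), B(n) ... correlation value
d, d1 to d5 ... unit section
E(n) ... correlation value
G(j) ... contained signal
n, n1 to n6 ... note number
S(j), S(j+1) ... difference signal
X, X(k) ... section signal
[0001]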
[Industrial application fields]
The present invention relates to the production of various audio content provided via broadcast media (radio, television), communication media (CS video/audio distribution, Internet music distribution, online karaoke), and package media (CD, MD, cassette, video, LD, CD-ROM, game cartridges, solid-state memory media for portable music players), and to automatic transcription technology that automatically generates, from recorded musical performance signals, sheet music for publication, MIDI data for online karaoke distribution, automatic performance data for electronic musical instruments with performance guide functions, and ringtone melody data for mobile phones, PHS handsets, pagers, and the like.
[0002]
[Prior art]
A time-series signal represented by an acoustic signal includes a plurality of periodic signals as its constituent elements. For this reason, a method for analyzing what kind of periodic signal is included in a given time-series signal has been known for a long time. For example, Fourier analysis is widely used as a method for analyzing frequency components included in a given time series signal.
[0003]
By using such a time-series signal analysis method, an acoustic signal can be encoded. With the spread of computers, it has become easy to sample an analog audio signal as the original sound at a predetermined sampling frequency, quantize the signal intensity at each sampling, and capture it as digital data. If a method such as Fourier analysis is applied to the data and the frequency components included in the original sound signal are extracted, the original sound signal can be encoded by a code indicating each frequency component.
[0004]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into wide use with the spread of personal computers. Code data according to the MIDI standard (hereinafter, MIDI data) basically describes performance operations, such as which key of an instrument was played and how hard; the MIDI data itself contains no actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores instrument waveforms. Nevertheless, the high encoding efficiency of MIDI has attracted attention, and MIDI encoding and decoding are now widely adopted in personal computer software for instrument performance, instrument practice, composition, and the like.
[0005]
Accordingly, it has been proposed to analyze a time-series signal, typified by an acoustic signal, by a predetermined method, extract the periodic signals that are its constituent elements, and encode the extracted periodic signals as MIDI data. For example, JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP-A-2000-99009, JP-A-2000-99092, JP-A-2000-99093, JP-A-2000-261322, JP-A-2001-5450, and JP-A-2001-148633 propose various methods for analyzing the constituent frequencies of an arbitrary time-series signal and creating MIDI data from the analysis result.
[0006]
[Problems to be solved by the invention]
The MIDI encoding method proposed in each of the above publications or specifications has enabled efficient encoding of acoustic signals obtained from performance recordings and the like. In particular, the coding method using the generalized harmonic analysis has significantly improved the frequency resolution which is a problem in the short-time Fourier transform method.
[0007]
However, errors arising during computation accumulate in the spectral components generated by generalized harmonic analysis, so the absolute values of the spectral components are not accurate. In other words, the short-time Fourier transform has small computational error and gives accurate absolute values but includes many spurious components, whereas generalized harmonic analysis can eliminate spurious components but has large computational error and gives inaccurate absolute values.
[0008]
In view of the above, an object of the present invention is to provide a frequency analysis method and an acoustic signal encoding method that achieve highly accurate frequency analysis by combining the short-time Fourier transform with generalized harmonic analysis.
[0009]
[Means for Solving the Problems]
To solve the above problem, the present invention provides a frequency analysis method for separating a plurality of signal components from a time-series signal, comprising: a periodic function preparation stage of setting a plurality of frequencies in the frequency range to be analyzed and preparing a set of periodic functions corresponding to the frequencies; an array preparation stage of preparing a correlation array, a priority array, and an intensity array, which are arrays for storing the relationship between each frequency and a correlation value; a section signal extraction stage of setting a plurality of unit sections on the time axis of the time-series signal and extracting a section signal for each unit section; a correlation calculation stage of computing the correlation between each periodic function and the section signal to obtain a correlation value for each periodic function and setting the value of the correlation array corresponding to that function's frequency; a priority determination stage of computing a correlation value for each frequency by generalized harmonic analysis using the set of periodic functions and setting the value of the priority array corresponding to each frequency; and an intensity calculation stage of extracting from the correlation array, in accordance with the values of the priority array, the corresponding correlation values and adopting them as the values of the intensity array; wherein the correlation calculation stage, the priority determination stage, and the intensity calculation stage are executed for every unit section set in the section signal extraction stage, so that a plurality of frequency/intensity pairs are obtained for each unit section.
According to the present invention, a correlation array is obtained by computing the correlation between the section signal and each frequency, a priority array is obtained by computing the correlation with each frequency by generalized harmonic analysis of the section signal, and frequency/correlation-value pairs recorded in the correlation array are selected on the basis of the priority-array values. This suppresses the extraction of spurious components and allows the signal components contained in the original time-series signal to be extracted accurately. In addition, using an acoustic signal as the time-series signal and performing frequency analysis on it enables highly accurate encoding.
[0010]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0011]
(1.1. Basic principle of frequency analysis method)
First, the basic principle of the frequency analysis method according to the present invention will be described with reference to an example in which an acoustic signal is used as a time series signal and encoding is performed using the result of frequency analysis. Since this basic principle is disclosed in the above-mentioned publications, only the outline will be briefly described here.
[0012]
As shown in FIG. 1A, assume that an analog acoustic signal is given as the time-series signal. In the example of FIG. 1, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. First, this analog acoustic signal is captured as digital acoustic data. This can be done with the conventional PCM technique: the analog signal is sampled at a predetermined sampling frequency and its amplitude is converted to digital data with a predetermined number of quantization bits. For convenience of explanation, the waveform of the PCM-digitized acoustic data is also represented by the same waveform as the analog acoustic signal of FIG. 1A.
[0013]
Next, a plurality of unit sections are set on the time axis of the acoustic signal to be analyzed. In the example shown in FIG. 1A, six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5, each starting and ending at one of these times, are set. In the example of FIG. 1 all unit sections have the same section length, but the section length may differ from one unit section to another. Alternatively, the sections may be set so that adjacent unit sections partially overlap on the time axis.
[0014]
Once the unit sections have been set, representative frequencies are selected for the acoustic signal of each unit section (hereinafter, the section signal). Each section signal normally contains various frequency components; for example, a frequency component with a large share of the intensity may be selected as a representative frequency. The representative frequency is usually the so-called fundamental frequency, but harmonic frequencies such as the formant frequencies of speech, or the peak frequency of a noise source, may also be treated as representative frequencies. Only one representative frequency may be selected, but for some acoustic signals selecting several gives more accurate encoding. FIG. 1B shows an example in which three representative frequencies are selected for each unit section and each is encoded as one representative code (drawn as a note for convenience). Three tracks T1, T2, and T3 are provided so that the three representative codes selected for each unit section can be placed on different tracks.
[0015]
For example, the representative codes n(d1,1), n(d1,2), and n(d1,3) selected for the unit section d1 are placed on tracks T1, T2, and T3, respectively. Each of these codes indicates a note number in MIDI code. A MIDI note number takes one of 128 values from 0 to 127, each indicating one key of a piano keyboard. For example, if 440 Hz is selected as a representative frequency, it corresponds to note number n = 69 (the "la" note (A3) at the center of the piano keyboard), so n = 69 is selected as the representative code. FIG. 1B is a conceptual diagram showing the representative codes in the form of notes; in practice, intensity data is also attached to each note. For example, track T1 holds data e(d1,1), e(d2,1), ... indicating intensity together with the note numbers n(d1,1), n(d2,1), ... indicating pitch. The intensity data is determined by the degree to which the component of each representative frequency was contained in the original section signal; concretely, it is determined from the correlation value between the periodic function of each representative frequency and the section signal. In the conceptual diagram of FIG. 1B, the horizontal position of each note indicates the position of the unit section on the time axis; in practice, data giving this position precisely as a numerical value is attached to each note.
[0016]
The MIDI format need not necessarily be used to encode acoustic signals, but because it is the most widespread format of this kind, MIDI-format code data is preferable in practice. In the MIDI format, "note-on" data and "note-off" data occur with "delta time" data interposed between them. "Note-on" data specifies a note number N and a velocity V and instructs the start of a particular sound; "note-off" data specifies a note number N and a velocity V and instructs the end of a particular sound. "Delta time" data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed (note-on velocity) or the speed at which the finger leaves the key (note-off velocity), and thus expresses the strength of the operation that starts or ends a particular sound.
[0017]
In the above-described method, J note numbers n(di,1), n(di,2), ..., n(di,J) are obtained as representative codes for the i-th unit section di, and intensities e(di,1), e(di,2), ..., e(di,J) are obtained for each of them. MIDI-format code data can therefore be created as follows. First, the obtained note numbers n(di,1), n(di,2), ..., n(di,J) can be used as they are as the note number N described in the “note-on” data or “note-off” data. As the velocity V described in the “note-on” data or “note-off” data, values obtained by normalizing the obtained intensities e(di,1), e(di,2), ..., e(di,J) by a predetermined method may be used. The “delta time” data may be set according to the length of each unit section.
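The mapping from one unit section to MIDI-style events can be sketched as follows. This is only an illustration under stated assumptions: the function name `section_to_events`, the tuple layout of an event, the linear scaling of intensity to a velocity of 1 to 127, and the use of ticks for the delta time are all hypothetical choices, since the text leaves the normalization method and the time unit open.

```python
from typing import List, Tuple

# Hypothetical event layout: (delta_time_ticks, kind, note_number, velocity)
Event = Tuple[int, str, int, int]

def section_to_events(notes: List[int], intensities: List[float],
                      max_intensity: float, section_ticks: int) -> List[Event]:
    """Turn the J representative codes of one unit section into note-on/note-off events."""
    # Assumed normalization: linear scaling of intensity to MIDI velocity 1..127.
    velocities = [max(1, min(127, round(127 * e / max_intensity)))
                  for e in intensities]
    # All notes of the section start together (delta time 0).
    events: List[Event] = [(0, "note_on", n, v) for n, v in zip(notes, velocities)]
    # The first note-off carries the section length as delta time; the rest follow at once.
    for index, (n, v) in enumerate(zip(notes, velocities)):
        delta = section_ticks if index == 0 else 0
        events.append((delta, "note_off", n, v))
    return events
```

For example, `section_to_events([69, 76], [1.0, 0.25], 1.0, 480)` yields two note-on events followed, 480 ticks later, by the two matching note-off events.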
[0018]
(1.2. Specific method for obtaining correlation with periodic function)
In the method based on the basic principle described above, one or more representative frequencies are selected for the section signal, and the section signal is represented by periodic signals having these representative frequencies. The representative frequency to be selected is literally a frequency that represents the signal components in the unit section. Specific methods for selecting the representative frequency include a method using the short-time Fourier transform and a method using generalized harmonic analysis, as described later. Both methods share the same basic concept: a plurality of periodic functions with different frequencies are prepared in advance, periodic functions that have a high correlation with the section signal in the unit section are found among them, and the frequencies of these highly correlated periodic functions are selected as representative frequencies. That is, selecting a representative frequency involves computing the correlation between each of the periodic functions prepared in advance and the section signal in the unit section. A specific method for obtaining the correlation with a periodic function is therefore described here.
[0019]
Assume that trigonometric functions as shown in FIG. 2 are prepared as the plurality of periodic functions. These trigonometric functions consist of pairs of a sine function and a cosine function having the same frequency, and one such pair is defined for each of the 128 standard frequencies f(0) to f(127). Here, the pair consisting of a sine function and a cosine function having the same frequency is defined as the periodic function for that frequency. The periodic function is defined as a sine and cosine pair because the correlation value of a periodic function with respect to a signal is influenced by the phase, and the pair takes this into account. The variables F and k in each trigonometric function shown in FIG. 2 correspond to the sampling frequency F and the sample number k of the section signal X. For example, the sine wave for the frequency f(0) is represented by sin(2πf(0)k/F); given an arbitrary sample number k, this yields the amplitude value of the periodic function at the same time position as the k-th sample of the section signal.
[0020]
Here, an example is described in which the 128 standard frequencies f(0) to f(127) are defined by the equations shown in the figure. That is, the n-th (0 ≦ n ≦ 127) standard frequency f(n) is defined by the following [Formula 1].
[0021]
[Formula 1]
$$f(n) = 440 \times 2^{\gamma(n)}$$
$$\gamma(n) = \frac{n - 69}{12}$$
[0022]
Defining the standard frequencies by such an expression is convenient when encoding with MIDI data is finally performed. This is because the 128 standard frequencies f(0) to f(127) set by this definition form a geometric series and correspond to the note numbers used in MIDI data. Accordingly, the 128 standard frequencies f(0) to f(127) shown in FIG. 2 are frequencies set at equal intervals (in semitone units in MIDI) on a logarithmic frequency axis.
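As a small illustration, [Formula 1] and its inverse can be written as follows. The function names and the clamping of the inverse to the range 0 to 127 are assumptions made for this sketch and do not appear in the specification.

```python
import math

def standard_frequency(n: int) -> float:
    """Standard frequency f(n) in Hz for note number n, following [Formula 1]."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

def nearest_note_number(freq_hz: float) -> int:
    """Note number whose standard frequency is closest on the log scale.

    Inverse of [Formula 1]; clamping to 0..127 is an assumption of this sketch.
    """
    n = round(69 + 12 * math.log2(freq_hz / 440.0))
    return max(0, min(127, n))

# The 128 standard frequencies f(0) .. f(127), a geometric series with ratio 2^(1/12).
STANDARD_FREQUENCIES = [standard_frequency(n) for n in range(128)]
```

Each step of one note number multiplies the frequency by the twelfth root of 2, so twelve steps double it: `standard_frequency(81)` is exactly twice `standard_frequency(69)`.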
[0023]
Next, how to obtain the correlation of each periodic function with a section signal in an arbitrary section is described specifically. For example, assume that a section signal X is given for a certain unit section d, as shown in FIG. 4. Here, the unit section d having the section length L is sampled at the sampling frequency F, yielding w sample values in total, with sample numbers 0, 1, 2, 3, ..., k, ..., w−2, w−1 as illustrated (the w-th sample, indicated by a white circle, belongs to the head of the next unit section adjacent on the right). In this case, an amplitude value X(k) is given as digital data for an arbitrary sample number k. In the short-time Fourier transform, X(k) is usually multiplied, sample by sample, by a window function W(k) whose weight is close to 1 at the center and close to 0 at both ends. That is, X(k) × W(k) is treated as X(k) in the correlation calculation below. A cosine-shaped Hamming window is generally used as the window function. Although w is treated as a constant in the following description, it is generally desirable to change it according to the value of n and to set it to the largest integer multiple of F/f(n) that does not exceed the section length L.
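The choice of w and the window function can be sketched as follows. The function names are hypothetical, the section length is assumed to be given as a sample count, the standard Hamming coefficients 0.54 and 0.46 are used, and the fallback to the full section when not even one cycle fits is an assumption of this sketch rather than something stated in the text.

```python
import math

def samples_for_frequency(section_samples: int, sampling_rate: float,
                          freq_hz: float) -> int:
    """Largest whole number of cycles of freq_hz, in samples, fitting in the section."""
    period = sampling_rate / freq_hz            # F / f(n): samples per cycle
    cycles = int(section_samples // period)     # whole cycles that fit
    if cycles == 0:
        # Assumption: if not even one cycle fits, use the whole section.
        return section_samples
    return int(cycles * period)

def hamming_window(w: int) -> list:
    """Hamming window W(k) for k = 0 .. w-1 (weight 1 at the center, 0.08 at the ends)."""
    if w == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (w - 1)) for k in range(w)]

def apply_window(x: list) -> list:
    """Return X(k) * W(k), which is then treated as X(k) in the correlation."""
    return [xk * wk for xk, wk in zip(x, hamming_window(len(x)))]
```

With F = 44100 Hz and f(n) = 441 Hz, one cycle is 100 samples, so a 1024-sample section gives w = 1000.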
[0024]
The principle of obtaining the correlation value between such a section signal X and the sine function Rn having the n-th standard frequency f(n) is as follows. The correlation value A(n) between the two can be defined by the first arithmetic expression of the figure. Here, X(k) is the amplitude value at sample number k of the section signal X, as shown in FIG. 4, and sin(2πf(n)k/F) is the amplitude value of the sine function Rn at the same position on the time axis. The first arithmetic expression can be regarded as the inner product of the amplitude vector of the section signal X and the amplitude vector of the sine function Rn, taken over the dimensions of all sample numbers k = 0 to w−1 in the unit section d.
[0025]
Similarly, the second arithmetic expression in FIG. 5 obtains the correlation value between the section signal X and the cosine function having the n-th standard frequency f(n); this correlation value is denoted B(n). Both the first arithmetic expression for the correlation value A(n) and the second arithmetic expression for the correlation value B(n) are finally multiplied by 2/w in order to normalize the correlation value. As described above, since w is generally changed depending on n, this coefficient also depends on n.
[0026]
The effective correlation value between the section signal X and the standard periodic function having the standard frequency f(n) is given, as shown in the third arithmetic expression of the figure, by the square root E(n) of the sum of the squares of the correlation value A(n) with the sine function and the correlation value B(n) with the cosine function. If the frequency of a standard periodic function having a large effective correlation value is selected as the representative frequency, the section signal X can be encoded using this representative frequency.
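The three arithmetic expressions described above (inner products with the sine and cosine functions, normalization by 2/w, and the root of the sum of squares) can be sketched as follows. The function name and the return layout are assumptions of this sketch; windowing is omitted for brevity.

```python
import math

def correlation(x: list, freq_hz: float, sampling_rate: float):
    """Return (A, B, E) for section signal x and the periodic function at freq_hz.

    A: normalized correlation with sin(2*pi*f*k/F)
    B: normalized correlation with cos(2*pi*f*k/F)
    E: effective correlation value sqrt(A^2 + B^2)
    """
    w = len(x)
    a = 0.0
    b = 0.0
    for k, xk in enumerate(x):
        phase = 2.0 * math.pi * freq_hz * k / sampling_rate
        a += xk * math.sin(phase)
        b += xk * math.cos(phase)
    a *= 2.0 / w   # normalization by 2/w
    b *= 2.0 / w
    return a, b, math.hypot(a, b)
```

When the section holds a whole number of cycles, a sinusoid of amplitude 0.5 at the analyzed frequency gives E = 0.5 regardless of its phase, which is the reason for using the sine and cosine pair.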
[0027]
That is, one or more standard frequencies whose correlation value E(n) is greater than or equal to a predetermined reference may be selected as representative frequencies. The selection condition “the correlation value E(n) is greater than or equal to a predetermined reference” may be an absolute condition, in which a threshold is set and all standard frequencies f(n) whose correlation value E(n) exceeds the threshold are selected as representative frequencies, or a relative condition, in which, for example, the frequencies up to the Q-th in descending order of the correlation value E(n) are selected.
[0028]
(1.3. Generalized Harmonic Analysis Method)
As described above, the method that simply obtains the correlation between the section signal and the periodic functions is the short-time Fourier transform. Generalized harmonic analysis, which performs its analysis using the correlation calculated in this way, is described next. As already described, when an acoustic signal is encoded, several representative frequencies having high correlation values are selected for the section signal in each unit section. Generalized harmonic analysis is a technique that enables representative frequencies to be selected with higher accuracy, and its basic principle is as follows.
[0029]
Assume that there is a signal S(j) for the unit section d as shown in the figure. Here, j is a parameter for iterative processing (j = 1 to J), as described later. First, the correlation values of all 128 periodic functions shown in FIG. 2 are obtained for this signal S(j), using the short-time Fourier transform described above. Then, the frequency of the one periodic function having the maximum correlation value is selected as a representative frequency, and the periodic function having this representative frequency is extracted as an element function. Subsequently, the inclusion signal G(j) as shown in FIG. 6B is defined. The inclusion signal G(j) is the signal obtained by multiplying the extracted element function by its correlation value with respect to the signal S(j). For example, when the pairs of sine and cosine functions shown in FIG. 2 are used as the periodic functions and the frequency f(n) is selected as the representative frequency, the inclusion signal G(j) is the sum of the sine function A(n) sin(2πf(n)k/F) having amplitude A(n) and the cosine function B(n) cos(2πf(n)k/F) having amplitude B(n) (for convenience of illustration, FIG. 6B shows only one of the functions). Since A(n) and B(n) are the normalized correlation values obtained by the expressions of FIG. 5, the inclusion signal G(j) can be regarded as the signal component of frequency f(n) contained in the signal S(j).
[0030]
Once the inclusion signal G(j) is obtained, the difference signal S(j+1) is obtained by subtracting the inclusion signal G(j) from the signal S(j). FIG. 6C shows the difference signal S(j+1) obtained in this way. The difference signal S(j+1) consists of the signal components that remain after the signal component of frequency f(n) is removed from the original signal S(j). By increasing the parameter j by 1, this difference signal S(j+1) is then handled as the new signal S(j). If the same processing is repeated J times while the parameter j is increased by 1 from j = 1 to J, J representative frequencies can be selected.
[0031]
The J inclusion signals G(1) to G(J) output as a result of this correlation calculation are constituent elements of the original section signal X. When the original section signal X is encoded, information indicating the frequencies of these J inclusion signals and information indicating their amplitudes (intensities) may be used as the code data. Although J has been described as the number of representative frequencies, it may be equal to the number of standard frequencies f(n), that is, J = 128; this is the usual setting when the purpose is to obtain a frequency spectrum.
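The iterative procedure described above (find the periodic function with the largest correlation, form the inclusion signal G(j), subtract it to obtain S(j+1), and repeat J times) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `notes` parameter restricting which standard frequencies are searched, and the return layout are assumptions, and windowing and the per-frequency choice of w are omitted.

```python
import math

def standard_frequency(n: int) -> float:
    """Standard frequency f(n) in Hz for note number n ([Formula 1])."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

def correlate(signal: list, freq_hz: float, sampling_rate: float):
    """Normalized correlations A (sine) and B (cosine) of the signal at freq_hz."""
    w = len(signal)
    a = 0.0
    b = 0.0
    for k, value in enumerate(signal):
        phase = 2.0 * math.pi * freq_hz * k / sampling_rate
        a += value * math.sin(phase)
        b += value * math.cos(phase)
    return 2.0 * a / w, 2.0 * b / w

def generalized_harmonic_analysis(signal: list, sampling_rate: float,
                                  iterations: int, notes=range(128)):
    """Extract `iterations` components; return ([(note, E), ...], final residual)."""
    residual = list(signal)                     # S(1) is the section signal itself
    extracted = []
    for _ in range(iterations):                 # j = 1 .. J
        best_note, best_a, best_b, best_e = None, 0.0, 0.0, -1.0
        for n in notes:
            a, b = correlate(residual, standard_frequency(n), sampling_rate)
            e = math.hypot(a, b)
            if e > best_e:
                best_note, best_a, best_b, best_e = n, a, b, e
        f = standard_frequency(best_note)
        for k in range(len(residual)):          # S(j+1) = S(j) - G(j)
            phase = 2.0 * math.pi * f * k / sampling_rate
            residual[k] -= best_a * math.sin(phase) + best_b * math.cos(phase)
        extracted.append((best_note, best_e))
    return extracted, residual
```

Because each component is removed before the next search, the side lobes of a strong component no longer compete with weaker true components in later iterations, which is the property the text relies on.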
[0032]
When a predetermined number of frequencies are selected for each unit section in this way, the section signal X in the unit section can be encoded by creating the same predetermined number of pieces of code data, each consisting of four pieces of information: “information indicating the pitch of the sound” corresponding to each selected frequency, “information indicating the intensity of the sound” corresponding to the signal intensity of that frequency, “information indicating the start time of sound generation” corresponding to the start point of the unit section, and “information indicating the end time of sound generation” corresponding to the start point of the unit section that follows. If MIDI data is created as the code data, the note number may be used as the “information indicating the pitch of the sound”, the velocity as the “information indicating the intensity of the sound”, the note-on time as the “information indicating the start time of sound generation”, and the note-off time as the “information indicating the end time of sound generation”.
[0033]
(2.1. Frequency analysis method according to the present invention)
To summarize the basic principle that the present invention shares with the conventional techniques described so far: unit sections are set in the original acoustic signal; for each unit section, signal intensities corresponding to a plurality of frequencies are calculated using periodic functions prepared in advance; one or more representative frequencies are selected based on the obtained signal intensities; and the acoustic signal is encoded by creating code data consisting of pitch information corresponding to each selected representative frequency, intensity information corresponding to the intensity of that representative frequency, a sounding start time corresponding to the start point of the unit section, and a sounding end time corresponding to the end point of the unit section.
[0034]
In the frequency analysis method and the acoustic signal encoding method of the present invention, the step of selecting the representative frequency takes the analysis result of the short-time Fourier transform as its basis and additionally uses the analysis result of the generalized harmonic analysis.
[0035]
The frequency analysis method of the present invention is now described with reference to the flowchart shown in the figure. First, unit sections are set over the entire span of the acoustic signal on the time axis (step S1). The technique used in step S1 is as described with reference to FIG. 1A in the basic principle.
[0036]
Next, the acoustic signal in each unit section, that is, the section signal, is frequency-analyzed by the short-time Fourier transform to calculate a correlation value corresponding to each frequency (step S2). Here, as in the example in section “1.2. Specific method for obtaining correlation with periodic function” above, 128 standard periodic functions having standard frequencies corresponding to the note numbers are prepared, and a correlation value is obtained for the standard frequency of each standard periodic function by the short-time Fourier transform. These correlation values are recorded in a correlation array prepared in advance. The correlation array can record correlation values corresponding to the standard frequencies f(0) to f(127).
[0037]
Subsequently, the same section signal is frequency-analyzed by generalized harmonic analysis to calculate a correlation value corresponding to each standard frequency (step S3). Unlike the correlation value calculated in step S2, this correlation value is not retained to the end; it is used to determine the priority of the correlation values calculated in step S2 and is recorded in a priority array. Here, as in the example in section “1.3. Generalized Harmonic Analysis Method” above, the correlation value with the section signal is calculated for each of the 128 standard periodic functions having frequencies corresponding to the note numbers and is recorded in the priority array. Specifically, the correlation calculation that obtains the J inclusion signals G(1) to G(J) in the example described with reference to FIG. 6 is performed. Like the correlation array, the priority array can record correlation values corresponding to the standard frequencies f(0) to f(127).
[0038]
The processes in steps S2 and S3 are executed for all the unit sections set in step S1. Subsequently, values are recorded in an intensity array using the correlation values in the correlation array and the priority array calculated in steps S2 and S3 (step S4). Specifically, for each unit section, a predetermined number of correlation values in the priority array, taken in descending order from the largest, are left as they are, and the other correlation values are changed to 0. This predetermined number can be changed by a setting and is set to the number of sounds to be extracted simultaneously. Next, for each frequency whose correlation value in the priority array is not 0, the corresponding correlation value is extracted from the correlation array and recorded in the intensity array in association with that frequency. In this way, an intensity array is obtained for each unit section. Since the obtained intensity array holds the combinations of frequency and correlation value that should be extracted, frequency components can be extracted based on it, and highly accurate frequency analysis can be performed.
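Step S4 for one unit section can be sketched as follows: the priority array decides which frequencies are kept, and the correlation array supplies the values. The function and parameter names are assumptions of this sketch, and the arrays are shortened to eight entries in the example for readability.

```python
def build_intensity_array(correlation_array: list, priority_array: list,
                          num_sounds: int) -> list:
    """Keep the correlation-array values at the num_sounds highest-priority frequencies."""
    # Rank frequencies by the value in the priority array (generalized harmonic analysis).
    ranked = sorted(range(len(priority_array)),
                    key=lambda n: priority_array[n], reverse=True)
    # Frequencies outside the top num_sounds, or with priority 0, are treated as 0.
    kept = {n for n in ranked[:num_sounds] if priority_array[n] != 0}
    # The recorded value comes from the correlation array (short-time Fourier transform).
    return [correlation_array[n] if n in kept else 0.0
            for n in range(len(correlation_array))]

# Example: index 3 is a spurious peak in the correlation array (0.6 > 0.5),
# but the priority array ranks indices 6 and 2 highest, so only they survive.
correlation_array = [0.1, 0.2, 0.9, 0.6, 0.1, 0.2, 0.5, 0.1]
priority_array = [0.0, 0.05, 0.4, 0.1, 0.0, 0.05, 0.7, 0.0]
intensity_array = build_intensity_array(correlation_array, priority_array, 2)
```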
[0039]
An example showing the effect of the frequency analysis method described above is explained with reference to the figure. In FIG. 8, all six graphs show the relationship between frequency (note number) and intensity. The upper left graph is the spectrum of the original signal; ideally, these two frequency components would be extracted with exactly these intensities. The spectra obtained by applying generalized harmonic analysis and the short-time Fourier transform to this original signal have the curves shown in the center left and lower left graphs, respectively. In the center left and lower left graphs, straight lines are drawn at the positions corresponding to the frequencies of the original signal. When the two frequency components having the largest intensities are extracted based on the result of the generalized harmonic analysis shown in the center left graph, the result is as shown in the center right graph: the extracted frequencies are the same as those of the original signal, but the order of the intensity values is reversed, with the left component ranked first and the right component second. On the other hand, when the two frequency components having the largest intensities are extracted based on the result of the short-time Fourier transform shown in the lower left graph, the result is as shown in the upper right graph: the first frequency component is exactly the same as in the original signal, but a component that was not present in the original signal is extracted as the second frequency component. In the frequency analysis method of the present invention, the intensity value corresponding to each frequency is taken from the spectrum shown in the lower left graph, that is, from the analysis result of the short-time Fourier transform, and only the frequencies to be extracted are determined based on the result of the generalized harmonic analysis. As a result, the frequency components of the original signal can be extracted accurately, as shown in the lower right graph.
[0040]
Next, the acoustic signal encoding method according to the present invention is described with reference to the flowchart shown in the figure. First, unit sections are set over the entire span of the acoustic signal on the time axis (step S11). The method used in step S11 is substantially the same as that of step S1 shown in FIG. 7 and is as described with reference to FIG. 1A in the basic principle.
[0041]
Next, the acoustic signal in each unit section, that is, the section signal, is frequency-analyzed by the short-time Fourier transform to calculate a correlation value corresponding to each frequency (step S12). As in step S2 shown in FIG. 7, and in the same way as the example in section “1.2. Specific method for obtaining correlation with periodic function”, 128 standard periodic functions having standard frequencies corresponding to the note numbers are prepared, and a correlation value is obtained for the standard frequency of each standard periodic function by the short-time Fourier transform. These correlation values are recorded in a correlation array prepared in advance. The correlation array can record correlation values corresponding to the standard frequencies f(0) to f(127).
[0042]
Subsequently, the same section signal is frequency-analyzed by generalized harmonic analysis to calculate a correlation value corresponding to each standard frequency (step S13). As in step S3 shown in FIG. 7, this correlation value, unlike the one calculated in step S12, is not retained to the end; it is used to determine the priority of the correlation values calculated in step S12 and is recorded in a priority array. Here, as in the example in section “1.3. Generalized Harmonic Analysis Method” above, the correlation value with the section signal is calculated for each of the 128 standard periodic functions having frequencies corresponding to the note numbers and is recorded in the priority array. Specifically, the correlation calculation that obtains the J inclusion signals G(1) to G(J) in the example described with reference to FIG. 6 is performed. Like the correlation array, the priority array can record correlation values corresponding to the standard frequencies f(0) to f(127).
[0043]
The processes of steps S12 and S13 are performed for all the unit sections set in step S11. The correlation value in the correlation array and the correlation value in the priority array, calculated in steps S12 and S13, are each used as one item of attribute information of unit phoneme data. Specifically, “unit phoneme data” is defined as five pieces of information: the four pieces consisting of the standard frequency, the correlation value in the correlation array obtained by the short-time Fourier transform in step S12, the start point of the unit section, and the end point of the unit section, plus the correlation value in the priority array obtained by the generalized harmonic analysis in step S13. Unit phoneme data means phoneme data having the length of a unit section. In this embodiment, instead of selecting representative frequencies as in the basic principle, unit phoneme data is acquired for all the prepared standard periodic functions. By performing the processes of steps S12 and S13 on all unit sections, a group of unit phoneme data [m, n] (0 ≦ m ≦ M−1, 0 ≦ n ≦ N−1) is obtained. Here, N is the total number of standard periodic functions (N = 128 in the above example), and M is the total number of unit sections set in the acoustic signal. That is, a unit phoneme data group consisting of M × N unit phoneme data is obtained.
[0044]
When the unit phoneme data group is obtained, the priority of the unit phoneme data constituting the group is determined (step S14). Specifically, unit phoneme data whose correlation value in the priority array is equal to or less than a predetermined reference is first deleted. Such unit phoneme data is deleted in order to remove phonemes whose signal level is almost 0 and in which no sound actually exists. The predetermined reference is therefore set to a value regarded as the level at which no sound actually exists. At this point, the number of unit phoneme data becomes smaller than M × N.
[0045]
Subsequently, for each unit section, a predetermined number of correlation values in the priority array, taken in descending order from the largest, are left as they are, and the other correlation values are changed to 0. The predetermined number can be changed by a setting; here it is set to 16, the number of sounds that can be generated simultaneously under the ordinary MIDI standard. Leaving the correlation value in the priority array unchanged has the same effect as marking the unit phoneme data as having priority. Unit phoneme data whose correlation value in the priority array has been left as it is is hereinafter referred to as priority phoneme data.
[0046]
When a unit phoneme data group including priority phoneme data is obtained in this way, a plurality of unit phoneme data that are continuous in the time-series direction at the same frequency are connected into one connected phoneme data (step S15). FIG. 10 is a conceptual diagram for explaining the connection of unit phoneme data. FIG. 10A shows the unit phoneme data group before connection. In FIG. 10A, each rectangle of the lattice indicates unit phoneme data; the shaded rectangles are unit phoneme data that were deleted in step S14 because their correlation value in the priority array was equal to or less than the predetermined reference, and the other rectangles are unit phoneme data that were not deleted. Since step S15 connects unit phoneme data that are continuous in the time t direction at the same frequency (same note number), applying the connection process to the unit phoneme data group shown in FIG. 10A yields a phoneme data group consisting of a plurality of connected phoneme data and a plurality of unit phoneme data, as shown in FIG. 10B. For example, the unit phoneme data A1, A2, and A3 shown in FIG. 10A are connected to obtain the connected phoneme data A shown in FIG. 10B. For connection to take place, at least one of the constituent unit phoneme data A1, A2, and A3 must be priority phoneme data, that is, its correlation value in the priority array must be positive. If none of them is priority phoneme data, they are not connected but deleted at this stage and are not passed to the next step S16. When connection is performed, the newly obtained connected phoneme data A is given the frequency common to the unit phoneme data A1, A2, and A3 as its frequency, the largest of the correlation values in the correlation array of A1, A2, and A3 as its correlation value, the start time t1 of the first unit phoneme data A1 as its start time, and the section end time t4 of the last unit phoneme data A3 as its end time. Unlike unit phoneme data, the connected phoneme data consists only of the four pieces of information used in the final encoding: frequency (note number), correlation value in the correlation array, start time, and end time. Integrating the three unit phoneme data into one connected phoneme data reduces the amount of data to one third. This means that when MIDI encoding is finally performed, the sound is expressed as one long note rather than three short notes. When no unit phoneme data is continuous with it in the time-series direction at the same frequency, as with the priority phoneme data B shown in FIG. 10A, the unit phoneme data, provided it is priority phoneme data, remains as it is without being connected, as shown in FIG. 10B. In the subsequent processing, connected phoneme data and unconnected priority phoneme data of unit section length are handled together as “phoneme data”.
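The connection of step S15 for a single note number can be sketched as follows. The class and function names are hypothetical, and representing deleted unit phoneme data as `None` in a time-ordered list is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UnitPhoneme:
    note: int            # note number (standard frequency)
    correlation: float   # value from the correlation array
    start: float         # start of the unit section
    end: float           # end of the unit section
    priority: float      # value from the priority array (0 if not priority data)

@dataclass
class Phoneme:
    note: int
    correlation: float
    start: float
    end: float

def connect(units_by_time: List[Optional[UnitPhoneme]]) -> List[Phoneme]:
    """Connect runs of consecutive unit phoneme data for one note number.

    units_by_time holds one entry per unit section in time order;
    None marks unit phoneme data deleted in the priority step.
    """
    result: List[Phoneme] = []
    run: List[UnitPhoneme] = []
    for unit in list(units_by_time) + [None]:   # trailing None flushes the last run
        if unit is not None:
            run.append(unit)
            continue
        # A run is kept only if at least one member is priority phoneme data.
        if run and any(u.priority > 0 for u in run):
            result.append(Phoneme(run[0].note,
                                  max(u.correlation for u in run),
                                  run[0].start, run[-1].end))
        run = []
    return result
```

A run of length one that is itself priority phoneme data passes through unchanged in content, which matches the handling of the priority phoneme data B in the text.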
[0047]
Subsequently, phoneme data is deleted so that the total number of phoneme data over all sections does not exceed a predetermined number, and the remaining phoneme data is encoded (step S16). When encoding into MIDI data, for example, the predetermined number is set so that the number of phonemes per second is less than 250 (code length 20 kbps). The phoneme data to be deleted is chosen using the value (end time − start time) × correlation value calculated for each phoneme data; phoneme data is deleted in ascending order of this value.
When the total number of phoneme data has been adjusted, encoding into the MIDI format is performed.
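The reduction of step S16 can be sketched as follows: each phoneme data is scored by (end time − start time) × correlation value, and the lowest-scoring items are dropped until the limit is met. The names `Phoneme` and `prune_phonemes` and the final re-sorting by start time are assumptions of this sketch.

```python
from typing import List, NamedTuple

class Phoneme(NamedTuple):
    note: int
    correlation: float
    start: float
    end: float

def prune_phonemes(phonemes: List[Phoneme], max_count: int) -> List[Phoneme]:
    """Keep at most max_count phoneme data, dropping the lowest-scoring ones first."""
    def score(p: Phoneme) -> float:
        # (end time - start time) x correlation value
        return (p.end - p.start) * p.correlation
    kept = sorted(phonemes, key=score, reverse=True)[:max_count]
    # Restore time order for the subsequent encoding.
    return sorted(kept, key=lambda p: (p.start, p.note))
```

A short but strong phoneme can therefore be dropped in favor of a long, weaker one, since duration and correlation are weighted together.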
[0048]
Encoding is performed as described above; a method for further improving the accuracy is described next. In the priority determination of step S14, unit phoneme data whose correlation value in the priority array is equal to or less than the predetermined reference is deleted from the unit phoneme data group. The values recorded in the priority array can be made more accurate by the following method. First, frequencies are defined at narrower intervals than the standard frequencies corresponding to the note numbers, and periodic functions corresponding to the defined frequencies are prepared. In this specification, a frequency defined at such a narrow interval is called a fine frequency, and the periodic function corresponding to a fine frequency is called a fine periodic function. A predetermined number of fine frequencies are set between adjacent standard frequencies, and the fine frequencies are spaced in a geometric series, like the standard frequencies. FIG. 11 shows an example in which twelve fine frequencies are set between adjacent standard frequencies: twelve fine frequencies f(n+1/13) to f(n+12/13) are set between the standard frequency f(n) and the standard frequency f(n+1). In FIG. 11, the dotted line between note number n+6/13 and note number n+7/13 marks the boundary of the range of fine frequencies regarded as belonging to each standard frequency. Here, “regarded as belonging to a standard frequency” means that the correlation value is stored in the priority array as that of the standard frequency. FIG. 11 shows that note number n+6/13 is regarded as falling within the frequency range of note number n, and note number n+7/13 within that of note number n+1.
[0049]
Fine periodic functions having the same form as the standard periodic functions shown in FIG. 2 are prepared for these fine frequencies, and the correlation value between each fine periodic function and the section signal is calculated. Among the correlation values of the 13 frequencies that fall within the range of each standard frequency (the standard frequency itself and the six fine frequencies on either side of it), the largest is stored in the priority array as the correlation value of that standard frequency. For example, for note number n (standard frequency f(n)), the maximum of the 13 correlation values from note number n−6/13 to note number n+6/13 is set. When the frequency having the maximum correlation value lies at the end of the standard frequency range, such as note number n−6/13 or note number n+6/13, a component may actually exist near the boundary with the adjacent standard frequency (the former case), but it is more likely that the value reflects the influence of a component of the adjacent standard frequency and that no component actually exists within the target standard frequency range (the latter case). To distinguish the former from the latter, attention is paid to the fine frequencies of the adjacent standard frequency: if the maximum for the adjacent standard frequency is also located at the end of its range, on the side adjoining the target standard frequency, the case is regarded as the former; otherwise it is regarded as the latter. In the latter case, the correlation value of the target standard frequency in the priority array is set to “0”. In the former case, the component is counted redundantly for both the target standard frequency and the adjacent standard frequency, so either the correlation value of the target standard frequency or that of the adjacent standard frequency in the priority array is set to “0”. In this embodiment, the lower of the two correlation values is set to “0”.
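The fine frequencies and the search for the maximum within one standard frequency range can be sketched as follows. Only the first part of the procedure is covered: the sketch reports whether the maximum lies at the end of the range and leaves the subsequent comparison with the adjacent standard frequency to the caller. The function names and the dictionary layout (offset i from −6 to 6 mapped to a correlation value) are assumptions of this sketch.

```python
def fine_frequency(n: int, i: int, steps: int = 13) -> float:
    """Fine frequency f(n + i/steps) in Hz, on the same geometric scale as [Formula 1]."""
    return 440.0 * 2.0 ** ((n + i / steps - 69) / 12)

def peak_in_range(fine_correlations: dict):
    """Find the largest correlation among the fine frequencies of one standard frequency.

    fine_correlations maps the offset i (-6 .. 6) to the correlation value at
    f(n + i/13). Returns (peak value, offset of the peak, True if the peak lies
    at the end of the range).
    """
    offset = max(fine_correlations, key=fine_correlations.get)
    edge = max(abs(i) for i in fine_correlations)
    return fine_correlations[offset], offset, abs(offset) == edge
```

The peak value would be stored in the priority array for note number n; when the third element of the result is true, the rule described above for end-of-range maxima applies.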
[0050]
Unit phoneme data whose correlation value in the priority array has been set to “0” in this way is naturally judged to have a correlation value equal to or lower than the predetermined reference, and is therefore deleted in step S14. In other words, setting fine frequencies allows the priority of the phoneme data to be determined more accurately.
[0051]
Although the preferred embodiments of the present invention have been described above, the frequency analysis method and the acoustic signal encoding method are, naturally, executed by a computer or the like. Specifically, a program that executes the steps shown in the flowcharts of FIGS. 7 and 9 according to the above procedure is installed in the computer. The acoustic signal is digitized by the PCM method or the like and taken into the computer, where frequency analysis is performed through the processing of steps S1 to S4 or steps S11 to S16. When the resulting frequency components are then encoded, code data in a format such as MIDI is output from the computer. When the encoding produces MIDI data, for example, the output code data can be reproduced as sound using a MIDI sequencer and a MIDI sound source.
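As a rough illustration of the final encoding step, the Python sketch below turns phoneme data into MIDI-style note events. It is only a sketch under stated assumptions: the `Phoneme` record, the function name, the tuple layout of the events, and the linear mapping of intensity to a velocity of 1 to 127 are hypothetical and are not specified in this document.

```python
from typing import NamedTuple


class Phoneme(NamedTuple):
    """Hypothetical record for one piece of connected phoneme data."""
    note: int         # note number (pitch information)
    intensity: float  # intensity value
    start: float      # sounding start time in seconds
    end: float        # sounding end time in seconds


def to_note_events(phonemes, max_intensity):
    """Convert phoneme data into time-ordered note-on/note-off tuples.

    Assumption: velocity scales linearly with intensity, clamped to 1..127.
    """
    events = []
    for p in phonemes:
        velocity = max(1, min(127, round(127 * p.intensity / max_intensity)))
        events.append((p.start, "note_on", p.note, velocity))
        events.append((p.end, "note_off", p.note, 0))
    # At equal times, emit note_off before note_on.
    events.sort(key=lambda e: (e[0], e[1] != "note_off"))
    return events
```

Writing these events to an actual MIDI file or sending them to a sequencer is outside the scope of the sketch.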
[0052]
[Effects of the invention]
As described above, according to the present invention, a plurality of frequencies are set in the frequency range to be analyzed, and a set of periodic functions corresponding to the respective frequencies is prepared; a correlation array, a priority array, and an intensity array, which are arrays for storing the relationship between each frequency and its correlation value, are prepared; a plurality of unit sections are set on the time axis of the time-series signal, and a section signal is extracted for each unit section; the correlation between each of the periodic functions and the section signal is computed to obtain a correlation value for each periodic function, and that value is set in the correlation array at the position corresponding to the frequency of the periodic function; a correlation value for each frequency is calculated by the method of generalized harmonic analysis using the set of periodic functions, and that value is set at the corresponding position of the priority array; and the values of the intensity array are determined by weighting the correlation array with the values of the priority array. Because this suppresses the extraction of spurious components, the signal components contained in the time-series signal can be extracted accurately. Furthermore, by using an acoustic signal as the time-series signal and performing this frequency analysis on it, highly accurate encoding can be achieved.
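The flow summarized above can be sketched in Python for a single unit section. This is an illustrative sketch, not the implementation of the invention. The function names, the sine/cosine correlation with 2/N normalization, the rule that each frequency is selected only once during the generalized harmonic analysis, and the `keep` parameter (the number of highest-priority frequencies retained) are all assumptions.

```python
import math


def make_basis(freqs, fs, n):
    """Sine and cosine tables over n samples for each analysis frequency."""
    return [([math.sin(2 * math.pi * f * k / fs) for k in range(n)],
             [math.cos(2 * math.pi * f * k / fs) for k in range(n)])
            for f in freqs]


def correlate(x, sin_t, cos_t):
    """Return (A, B, E); E is the magnitude (assumed 2/N normalization)."""
    n = len(x)
    a = 2.0 / n * sum(v * s for v, s in zip(x, sin_t))
    b = 2.0 / n * sum(v * c for v, c in zip(x, cos_t))
    return a, b, math.hypot(a, b)


def analyze_section(x, freqs, fs, keep):
    """Return (correlation array, priority array, intensity array)."""
    basis = make_basis(freqs, fs, len(x))
    # Correlation array: direct correlation with the unmodified section.
    corr = [correlate(x, s, c)[2] for s, c in basis]

    # Priority array: generalized harmonic analysis on a residual signal.
    priority = [0.0] * len(freqs)
    residual = list(x)
    remaining = set(range(len(freqs)))
    while remaining:
        j = max(remaining, key=lambda i: correlate(residual, *basis[i])[2])
        a, b, e = correlate(residual, *basis[j])
        priority[j] = e
        s, c = basis[j]
        # Subtract the contained signal to obtain the next residual.
        residual = [r - (a * sk + b * ck) for r, sk, ck in zip(residual, s, c)]
        remaining.discard(j)

    # Intensity array: keep correlation values only for the `keep`
    # frequencies with the highest priority; all others become zero.
    ranked = sorted(range(len(freqs)), key=lambda i: priority[i], reverse=True)
    intensity = [0.0] * len(freqs)
    for i in ranked[:keep]:
        intensity[i] = corr[i]
    return corr, priority, intensity
```

In this sketch, weighting by the priority array is reduced to an all-or-nothing selection by rank; a graded weighting would be another possible reading.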
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic principle of an audio signal encoding method according to the present invention.
FIG. 2 is a diagram showing an example of a periodic function used in the present invention.
FIG. 3 is a diagram showing a relational expression between the frequency of each periodic function shown in FIG. 2 and a MIDI note number n.
FIG. 4 is a diagram illustrating a method of calculating a correlation between a signal to be analyzed and a periodic signal.
FIG. 5 is a diagram showing a calculation formula for performing the correlation calculation shown in FIG. 4;
FIG. 6 is a diagram showing a basic method of generalized harmonic analysis.
FIG. 7 is a flowchart of the frequency analysis method of the present invention.
FIG. 8 is a diagram showing an effective example of the frequency analysis method of the present invention.
FIG. 9 is a flowchart of an audio signal encoding method according to the present invention.
FIG. 10 is a conceptual diagram for explaining connection of unit phoneme data.
FIG. 11 is a diagram illustrating a state in which twelve fine frequencies are set between the standard frequencies.
[Explanation of symbols]
A (n), B (n) ... correlation value
d, d1 to d5 ... unit interval
E (n) ... correlation value
G (j) ... Inclusion signal
n, n1 to n6 ... note number
S (j), S (j + 1) ... difference signal
X, X (k) ... section signal

Claims

A frequency analysis method for separating a plurality of signal components from a time series signal,
A periodic function preparation stage that sets a plurality of frequencies in a frequency range to be analyzed and prepares a plurality of periodic function sets corresponding to each frequency;
An array preparation stage for preparing a correlation array, a priority array, and an intensity array, which are arrays for storing the relationship between each frequency and the correlation value for each frequency;
A section signal extraction stage for setting a plurality of unit sections on the time axis of the time series signal and extracting a section signal for each unit section;
A correlation calculation step of calculating a correlation value corresponding to each periodic function by computing the correlation between the plurality of periodic functions and the interval signal, and setting that value in the correlation array at the position corresponding to the frequency of each periodic function;
A priority determining step of calculating a correlation value corresponding to each frequency by the method of generalized harmonic analysis using the periodic function set, and setting it as the value corresponding to each frequency in the priority array; and
An intensity calculation step of extracting, according to the values of the priority array, the corresponding correlation values from the correlation array, and determining those correlation values as the values of the intensity array,
By executing the correlation calculation step, the priority setting step, and the intensity calculation step for all unit intervals set in the interval signal extraction step,
A frequency analysis method characterized in that a plurality of sets of frequencies and intensity values are obtained for each unit section.

The priority determination step includes:
A priority setting step of selecting the periodic function corresponding to the frequency whose correlation value in the correlation array is maximum, recalculating its correlation with the interval signal, and setting the result as the correlation value corresponding to that frequency in the priority array; and
An interval signal update step in which the inclusion signal generated by the product of the recalculated correlation value and the periodic function is subtracted from the interval signal, and the difference signal is newly set as the interval signal;
2. The frequency analysis method, wherein the correlation values of the priority array are determined based on the generalized harmonic analysis method by repeatedly executing the priority setting step and the interval signal update step.

Fine frequencies are set at finer intervals between the standard frequencies of the standard periodic functions, and fine periodic functions having those fine frequencies are set;
The priority setting step obtains a correlation value for each fine frequency and sets the maximum correlation value in the priority array as the correlation value of the standard frequency closest to the fine frequency at which the maximum occurs, and, when the fine frequency at which the correlation value is maximum lies near the center between adjacent standard frequencies, sets the correlation value of the standard frequency closest to that fine frequency to 0. The frequency analysis method described in 1.

The frequency analysis method according to any one of claims 1 to 3, wherein the intensity calculation step ranks all the defined frequencies by priority based on the determined values of the priority array, and determines the intensity values in the intensity array after modifying the priority array so that the correlation values corresponding to frequencies ranked after a predetermined rank become zero.

The frequency analysis method according to claim 1, wherein
The time series signal is an acoustic signal, and
The acoustic signal is encoded by selecting the frequencies at which the intensity value in the intensity array determined in the intensity calculation step reaches a predetermined value, and generating four pieces of information: pitch information corresponding to each selected standard frequency, sound intensity information corresponding to the intensity value of that standard frequency, a sounding start time corresponding to the start point of each unit section, and a sounding end time corresponding to the end point of each unit section.

A section setting stage for setting a plurality of unit sections on the time axis for a given acoustic signal,
A unit phoneme data calculation stage of calculating a first correlation value corresponding to the frequency of each periodic function by obtaining the correlation between the acoustic signal in the unit section and a plurality of periodic functions using the short-time Fourier transform, calculating a second correlation value corresponding to the frequency of each periodic function by obtaining the correlation between the acoustic signal in the unit section and the plurality of periodic functions using generalized harmonic analysis, and calculating unit phoneme data composed of the first correlation value corresponding to each periodic function, the second correlation value corresponding to each periodic function, a section start time corresponding to the start point of the unit section, and a section end time corresponding to the end point of the unit section;
A priority determination stage of performing deletion on all the unit phoneme data obtained by executing the unit phoneme data calculation stage for all unit sections, and marking a predetermined number of the remaining unit phoneme data in each unit section with a priority mark;
A phoneme data connection stage of connecting, among the remaining unit phoneme data, unit phoneme data that have the same frequency and occupy continuous sections into connected phoneme data, provided that at least one of the unit phoneme data to be connected is marked with a priority, and giving, as attributes of the connected phoneme data, the maximum of the first correlation values of the constituent unit phoneme data as the intensity value, the section start time of the first effective phoneme data as the start time, and the section end time of the last effective phoneme data as the end time;
An encoding step of expressing an acoustic signal by a set of phoneme data after the connection processing;
A method for encoding an acoustic signal, comprising:

A program for causing a computer to execute:
A periodic function preparation stage of setting a plurality of frequencies in a frequency range to be analyzed and preparing a plurality of periodic function sets corresponding to each frequency;
An array preparation stage of preparing a correlation array, a priority array, and an intensity array, which are arrays for storing the relationship between each frequency and the correlation value for each frequency;
A section signal extraction stage of setting a plurality of unit sections on the time axis of the time series signal and extracting a section signal for each unit section;
A correlation calculation step of calculating a correlation value corresponding to each periodic function by computing the correlation between the plurality of periodic functions and the interval signal, and setting that value in the correlation array at the position corresponding to the frequency of each periodic function;
A priority determination step of calculating a correlation value corresponding to each frequency by the method of generalized harmonic analysis using the periodic function set, and setting it as the value corresponding to each frequency in the priority array; and
An intensity calculation step of extracting, according to the values of the priority array, the corresponding correlation values from the correlation array, and determining those correlation values as the values of the intensity array.