JP4331373B2 - Time-series signal analysis method and acoustic signal encoding method

Info

Publication number: JP4331373B2
Application number: JP2000068521A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2000-03-13
Filing date: 2000-03-13
Publication date: 2009-09-16
Anticipated expiration: 2020-03-13
Also published as: JP2001255870A

Abstract

PROBLEM TO BE SOLVED: To perform more accurate frequency analysis of a time-series signal and to encode an original acoustic signal with high quality. SOLUTION: A time-series signal is input (S1) and divided into unit sections (S2). A function group consisting of plural periodic functions is defined (S3). A section signal X is extracted from one unit section and defined as the first difference signal S(1) (S4). From the function group, the periodic function that has the highest correlation value E with the section signal X, and whose correlation value EE with the j-th difference signal S(j) is close to E, is selected as the j-th element function (S6). This element function is excluded from the function group (S7). A content signal G(j), given by the product of the element function and the correlation value EE, is obtained, and a new difference signal S(j+1) is obtained by subtracting G(j) from S(j) (S8). This processing is repeated, and MIDI data is generated from the J content signals obtained for each section.

Description

【０００１】
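[Technical Field of the Invention]
The present invention relates to a method for analyzing time-series signals and a method for encoding acoustic signals. It concerns a technique that analyzes a time-series signal, given as a time series of intensity values, by extracting the plural periodic signals that constitute it, and that further uses this analysis method to encode acoustic signals. The invention is particularly suited to efficiently converting general acoustic signals into code data in MIDI format. Applications are expected in the various industries that produce audio content for broadcast media (radio, television), communication media (CS video and audio distribution, Internet distribution), and packaged media (CD, MD, cassette, video, LD, CD-ROM, game cartridges), and in the analysis and diagnosis of acoustic signals such as medical auscultation sounds (for example, heart sounds).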
【０００２】
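[Prior Art]
A time-series signal, of which an acoustic signal is a typical example, contains plural periodic signals as its constituent elements. Techniques for analyzing which periodic signals a given time-series signal contains have therefore long been known. Fourier analysis, for example, is widely used to analyze the frequency components contained in a given time-series signal.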
【０００３】
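Such analysis techniques can also be used to encode acoustic signals. With the spread of computers, it has become easy to sample an original analog acoustic signal at a given sampling frequency, quantize the signal intensity at each sample, and capture it as digital data. If Fourier analysis or a similar technique is applied to the captured digital data and the frequency components contained in the original signal are extracted, the original signal can be encoded by codes representing those frequency components.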
【０００４】
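Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sound of electronic musical instruments, has come into widespread use along with the personal computer. Code data under this standard (hereinafter, MIDI data) is basically data describing performance operations: which key of the instrument was played and how strongly. MIDI data itself contains no actual sound waveform. Reproducing actual sound therefore requires a separate MIDI sound source that stores instrument waveforms. The high coding efficiency of MIDI has nevertheless attracted attention, and MIDI encoding and decoding are now widely adopted in software for instrument performance, instrument practice, and composition on personal computers.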
【０００５】
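Accordingly, proposals have been made to analyze a time-series signal such as an acoustic signal by a given technique, extract the periodic signals that constitute it, and encode the extracted periodic signals as MIDI data. For example, Japanese Unexamined Patent Publication Nos. H10-247099, H11-73199, H11-73200, H11-95753, 2000-099009, 2000-099093, 2000-261322, 2001-005450, and 2001-148633 propose various methods that analyze the constituent frequency components of an arbitrary time-series signal and create MIDI data from the analysis result.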
【０００６】
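[Problems to Be Solved by the Invention]
The analysis methods disclosed in the publications cited above all follow the same procedure: plural unit sections are set along the time axis of the time-series signal to be analyzed, periodic functions with high correlation are matched to each unit section, and the signals corresponding to those periodic functions are extracted as the constituent periodic signals. In this analysis, however, periodic functions that are not actually present in the original time-series signal are picked up. In particular, when these methods are applied to an original acoustic signal consisting of instrument sounds, they pick up not only the true frequencies of the notes actually played but also neighboring frequencies and the frequencies of harmonic components, so that the MIDI data finally created contains codes that differ from the notes actually played. To address this problem, the cited Publication No. 2000-261322 discloses using generalized harmonic analysis instead of Fourier analysis, the cited Publication No. 2001-005450 discloses setting the periodic functions used in the correlation calculation more finely, and the cited Publication No. 2001-148633 discloses frequency correction using phase differences. These techniques, however, do not always give an accurate frequency analysis, and encoding based on them suffers quality problems such as distortion in the reproduced sound.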
【０００７】
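An object of the present invention is therefore to provide a time-series signal analysis method and an acoustic signal encoding method that perform more accurate frequency analysis of a time-series signal and allow an original acoustic signal to be encoded with high quality.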
【０００８】
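[Means for Solving the Problems]
(1) A first aspect of the present invention is a time-series signal analysis method for extracting, from a time-series signal given as a time series of intensity values, the plural periodic signals that constitute it, the method comprising:
an input step of capturing the time-series signal to be analyzed as digital data;
a section setting step of setting plural unit sections on the time axis of the captured time-series signal;
a function group definition step of setting plural frequencies and defining, for each frequency, a pair consisting of a sine function and a cosine function of that frequency as the periodic function for that frequency, thereby defining a function group made up of the periodic functions for the respective frequencies;
a section signal extraction step of extracting the time-series signal within one unit section as a section signal X and defining this section signal X as the first difference signal S(1); and
an element function selection step of repeating the following processing J times, for j = 1 to J (J being an arbitrary integer): selecting from the defined function group, as the j-th element function, the periodic function that has the highest correlation value E with the section signal X and for which the relationship between E and the correlation value EE with the j-th difference signal S(j) satisfies a predetermined condition; excluding the j-th element function from the function group; obtaining the j-th content signal G(j), given by the product of the j-th element function and the correlation value EE; and taking the signal obtained by subtracting G(j) from S(j) as the new difference signal S(j+1);
wherein, in the function group definition step, for each of α standard frequencies spaced a semitone apart and corresponding to MIDI note numbers, β proximate frequencies spaced 1/β semitone apart (which may include the standard frequency itself) are set within a range in which the proximate frequency bands of adjacent standard frequencies do not overlap, a proximate periodic function is defined for each proximate frequency, and the function group is made up of the α × β proximate periodic functions in total;
in the element function selection step, the effective value of the correlation value for the sine function and the correlation value for the cosine function is used as the correlation value between a given periodic function and a given signal;
in the element function selection step, when one proximate periodic function is excluded from the function group, all β proximate periodic functions, including those that are its variations, are excluded; and
the section signal extraction step and the element function selection step are carried out for each individual unit section, so that plural content signals are obtained for each unit section and the content signals obtained for each unit section are extracted as the periodic signals constituting the time-series signal within that unit section.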
【０００９】
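(2) A second aspect of the present invention is the time-series signal analysis method according to the first aspect, wherein,
in the element function selection step, when selecting from the function group, as the j-th element function, the periodic function that has the highest correlation value E with the section signal X and for which the relationship between E and the correlation value EE with the j-th difference signal S(j) satisfies the predetermined condition,
the periodic function with the highest correlation value E with the section signal X is selected as a provisional element function, the correlation value EE of this provisional element function with the j-th difference signal S(j) is calculated, and the condition is tested;
if the condition is satisfied, the provisional element function is selected as the j-th element function and this j-th element function is excluded from the function group; and
if the condition is not satisfied, the provisional element function is excluded from the function group and a new provisional element function is selected, this being repeated until the condition is satisfied.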
【００１０】
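(3) A third aspect of the present invention is the time-series signal analysis method according to the first or second aspect, wherein,
in the element function selection step, with a predetermined threshold Δ, the condition |E − EE| < Δ or the condition EE > E − Δ is used as the predetermined condition on the relationship between the correlation value EE and the correlation value E.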
【００１２】
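(4) A fourth aspect of the present invention is the time-series signal analysis method according to any of the first to third aspects, wherein,
in the element function selection step, one or more proximate periodic functions that have a frequency equal to an integer multiple of the frequency of the selected element function and that are included in the function group are selected as harmonic component functions; each harmonic content signal, given by the product of a harmonic component function and the correlation value EE of that harmonic component function with the difference signal S(j), is obtained; and the signal obtained by subtracting the content signal G(j) and each harmonic content signal from the difference signal S(j) is taken as the new difference signal S(j+1).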
【００１３】
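(5) A fifth aspect of the present invention is the time-series signal analysis method according to any of the first to fourth aspects, wherein
the frequency of each content signal obtained for each unit section is replaced by the nearby standard frequency, so that every periodic signal finally extracted as a constituent of the time-series signal within a unit section has one of the standard frequencies.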
【００１６】
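(6) A sixth aspect of the present invention is an acoustic signal encoding method using the time-series signal analysis method according to any of the first to fifth aspects, wherein
the acoustic signal to be encoded is treated as the time-series signal to be analyzed, so that content signals are obtained for each individual unit section, and
code data is created that contains information indicating the amplitude of the content signals obtained for a particular unit section, information indicating their frequency or a nearby frequency, and information indicating the position of that unit section on the time axis, and the acoustic signal within that unit section is encoded by this code data.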
【００１７】
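(7) A seventh aspect of the present invention is the acoustic signal encoding method according to the sixth aspect, wherein,
for each individual unit section, M content signals (M < J) are selected from the J content signals obtained, in descending order of amplitude, and the code data is created from these M content signals.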
【００１８】
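(8) An eighth aspect of the present invention is the acoustic signal encoding method according to the sixth or seventh aspect, wherein
velocity is used as the information indicating the amplitude of the content signals obtained for a particular unit section, note number is used as the information indicating their frequency or a nearby frequency, note-on time is used as the information indicating the start position of the unit section on the time axis, note-off time is used as the information indicating its end position, and MIDI data is created as the code data.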
【００１９】
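(9) A ninth aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the time-series signal analysis method or the acoustic signal encoding method according to any of the first to eighth aspects.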
【００２０】
[Embodiments of the Invention]
The present invention is described below on the basis of the illustrated embodiments.
【００２１】
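§1. Basic principle of the analysis method and encoding method according to the present invention
First, the basic principle of the time-series signal analysis method and the acoustic signal encoding method according to the present invention is described. Because this principle is disclosed in the publications and specifications cited above, only an outline is given here.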
【００２２】
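Suppose that an analog acoustic signal is given as the time-series signal, as shown in FIG. 1(a). In the illustrated example, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. First, this analog acoustic signal is captured as digital acoustic data. This can be done with ordinary PCM: the analog signal is sampled at a given sampling period, and each amplitude is converted to digital data with a given number of quantization bits.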
【００２３】
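Next, plural unit sections are set on the time axis of the acoustic signal to be analyzed. In the example of FIG. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 are set with these times as their start and end points. In the illustrated example all unit sections have the same length, but the length may differ from section to section. Adjacent unit sections may also be set so that they partly overlap on the time axis.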
【００２４】
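Once the unit sections are set, representative frequencies are selected for the acoustic signal in each unit section (hereinafter, the section signal). A section signal usually contains many frequency components; the components with large amplitude, for example, can be selected as representative frequencies. A single representative frequency may be selected, but selecting several allows more accurate encoding. FIG. 1(b) shows an example in which three representative frequencies are selected for each unit section and each is encoded as one representative code (shown as a musical note for convenience). Three tracks T1, T2, T3 are provided so that the three representative codes selected for each unit section can be stored on separate tracks. Note that "code" here means a symbol ("code"), not a musical "chord".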
【００２５】
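For example, the representative codes n(d1,1), n(d1,2), n(d1,3) selected for unit section d1 are stored on tracks T1, T2, T3 respectively. Each of these codes is a MIDI note number. A MIDI note number takes one of 128 values from 0 to 127, each denoting one key of a piano keyboard. Specifically, if 440 Hz is selected as a representative frequency, this frequency corresponds to note number n = 69 (the A near the middle of the piano keyboard, the note A3), so n = 69 is selected as the representative code. FIG. 1(b) is a conceptual diagram that shows the representative codes as musical notes; in practice, intensity data is also attached to each note. Track T1, for example, stores the pitch data n(d1,1), n(d2,1), … together with the intensity data e(d1,1), e(d2,1), …. The intensity data is determined by how strongly each representative frequency component was contained in the original section signal; specifically, it is determined from the correlation value between the periodic function having that representative frequency and the section signal. In the conceptual diagram of FIG. 1(b), the horizontal position of a note indicates the position of its unit section on the time axis; in practice, data giving this position as an exact number is attached to each note.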
【００２６】
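The format used to encode the acoustic signal need not be MIDI, but because MIDI is the most widespread format of this kind, MIDI code data is the most practical choice. In the MIDI format, "note-on" data and "note-off" data appear with "delta time" data between them. "Note-on" data specifies a note number N and a velocity V and instructs the start of a particular note. "Note-off" data specifies a note number N and a velocity V and instructs the end of a particular note. "Delta time" data indicates a time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed (note-on velocity) or released (note-off velocity), and so indicates the strength of the operation that starts or ends a particular note.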
【００２７】
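With the method described above, J note numbers n(di,1), n(di,2), …, n(di,J) are obtained as representative codes for the i-th unit section di, together with their intensities e(di,1), e(di,2), …, e(di,J). MIDI code data can then be created as follows. The note numbers n(di,1), n(di,2), …, n(di,J) are used directly as the note number N written in the "note-on" or "note-off" data. The intensities e(di,1), e(di,2), …, e(di,J), normalized by a given method, are used as the velocity V written in the "note-on" or "note-off" data. The "delta time" data is set according to the length of each unit section.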
【００２８】
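§2. A concrete method of obtaining the correlation with a periodic function
In the method based on the above principle, one or more representative frequencies are selected for a section signal, and the section signal is represented by periodic signals having those representative frequencies. The representative frequencies are, literally, frequencies that represent the signal components within the unit section. As described in §3, there are two concrete ways of selecting them: one using the short-time Fourier transform and one using generalized harmonic analysis. The underlying idea is the same in both: plural periodic functions of different frequencies are prepared in advance, the periodic functions with high correlation with the section signal are found among them, and the frequencies of those periodic functions are selected as representative frequencies. Selecting representative frequencies therefore requires computing the correlation between the prepared periodic functions and the section signal. A concrete method of computing this correlation is described here.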
【００２９】
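Suppose that the trigonometric functions shown in FIG. 2 are prepared as the periodic functions. These consist of pairs of a sine function and a cosine function of the same frequency, one pair for each of 128 standard frequencies f(0) to f(127). Here, the pair consisting of a sine function and a cosine function of the same frequency is defined as the periodic function for that frequency. In other words, the periodic function for a particular frequency consists of a sine function and a cosine function. The periodic function is defined as such a pair so that the correlation value between a signal and the periodic function is not affected by phase. The variables F and k in the trigonometric functions of FIG. 2 correspond to the sampling frequency F and the sample number k of the section signal X. For example, the sine wave for frequency f(0) is sin(2πf(0)k/F); given any sample number k, it yields the amplitude of the periodic function at the same time position as the k-th sample of the section signal.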
【００３０】
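Here, an example is shown in which the 128 standard frequencies f(0) to f(127) are defined by the formulas of FIG. 3. That is, the n-th standard frequency f(n) (0 ≤ n ≤ 127) is defined by
f(n) = 440 · 2^(γ(n))
γ(n) = (n − 69) / 12
Defining the standard frequencies in this way is convenient when encoding with MIDI data, because the 128 standard frequencies f(0) to f(127) then form a geometric progression and coincide with the frequencies of the note numbers used in MIDI data. For example, note number n = 69 denotes, as noted above, the A near the middle of the piano keyboard (A3) and corresponds to 440 Hz; substituting n = 69 into the formula of FIG. 3 gives f(n) = 440. In other words, the 128 standard frequencies f(0) to f(127) defined by FIG. 3 are the frequencies of the 128 MIDI note numbers n = 0 to 127. Because the note number n expresses pitch on a logarithmic scale, in which the frequency doubles for each octave, it does not correspond linearly to the frequency axis f. The 128 standard frequencies f(0) to f(127) of FIG. 2 are therefore frequencies set at equal intervals (MIDI semitones) on a logarithmic frequency axis. For this reason, the note-number axis of every graph in the drawings of this application is shown on a logarithmic scale.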
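The definition of the standard frequencies can be checked with a minimal sketch. The function name `standard_frequency` and the list `table` are illustrative names, not taken from the source; only the formula f(n) = 440 · 2^((n − 69)/12) comes from the text.

```python
def standard_frequency(n: int) -> float:
    """Return f(n) = 440 * 2**((n - 69) / 12) in Hz for MIDI note number n."""
    if not 0 <= n <= 127:
        raise ValueError("note number must be in 0..127")
    return 440.0 * 2.0 ** ((n - 69) / 12)

# The 128 standard frequencies f(0) .. f(127).
table = [standard_frequency(n) for n in range(128)]
```

Adjacent entries differ by a constant ratio of 2^(1/12), which is the geometric progression the paragraph describes, and entries twelve notes apart differ by a factor of two.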
【００３１】
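Next, the way the correlation of each periodic function with an arbitrary section signal is obtained is described in more detail. Suppose, as shown in FIG. 4, that a section signal X is given for a unit section d. The unit section d has length L and is sampled at sampling frequency F, giving w samples in total, numbered 0, 1, 2, 3, …, k, …, w−2, w−1 as illustrated (the w-th sample, shown as a white circle, belongs to the start of the next unit section on the right). For any sample number k, an amplitude value X(k) is then given as digital data.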
【００３２】
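The principle of obtaining the correlation value between such a section signal X and the sine function Rn having the n-th standard frequency f(n) is as follows. The correlation value A(n) is defined by the first formula of FIG. 5. Here X(k) is the amplitude of the section signal X at sample number k, as shown in FIG. 4, and sin(2πf(n)·k/F) is the amplitude of the sine function Rn at the same position on the time axis. The first formula takes, for every sample number k = 0 to w−1 in the unit section d, the product of the amplitude of the section signal X and the amplitude of the sine function Rn, and sums these products. Amplitudes are signed, so their products are signed as well. If there were no correlation at all between the section signal X and the sine function Rn, the sign of the product would be positive or negative at random, and the sum would be 0. Conversely, if the two are correlated, the absolute value of the sum grows with the degree of correlation. For example, if the sine function Rn is always positive when the section signal X is positive and always negative when X is negative (a positive correlation: X and Rn have the same frequency and the same phase), the sum takes its maximum positive value. If instead Rn is always negative when X is positive and always positive when X is negative (a negative correlation: X and Rn have the same frequency and opposite phase), the sum takes its maximum negative value.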
【００３３】
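Similarly, the second formula of FIG. 5 gives the correlation value B(n) between the section signal X and the cosine function having the n-th standard frequency f(n). Both the first formula for A(n) and the second formula for B(n) are finally multiplied by the coefficient 2/w, which normalizes the correlation value. The denominator w is the total number of samples in the unit section d; dividing the sum over all w samples, k = 0 to w−1, by w gives the average per sample. The numerator 2 is a constant that makes the correlation values A(n) and B(n) fall between −1 and +1.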
【００３４】
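The overall correlation between the section signal X and the periodic function having standard frequency f(n) can be expressed, as in the third formula of FIG. 5, by the effective value of the sine correlation A(n) and the cosine correlation B(n), that is, the square root of the sum of their squares, E(n). Using the root sum of squares yields an overall correlation that reflects both positive and negative correlations and gives an accurate correlation free of the influence of phase. For example, if the signal correlates positively with the sine function and negatively with the cosine function, A(n) is positive and B(n) is negative, but E(n) reflects the absolute values of both.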
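As a minimal sketch of the three formulas described for FIG. 5 (the figure itself is not reproduced here, so the code follows the verbal description: a sum of products scaled by 2/w, then the root sum of squares), the correlation can be computed as follows. The function name `correlation` and the test signal are illustrative assumptions.

```python
import math

def correlation(x, freq, fs):
    """Return (A, B, E) for section signal x against frequency freq.

    A and B are the sine and cosine correlations normalized by 2/w;
    E is the square root of A**2 + B**2.
    """
    w = len(x)
    a = (2.0 / w) * sum(v * math.sin(2 * math.pi * freq * k / fs) for k, v in enumerate(x))
    b = (2.0 / w) * sum(v * math.cos(2 * math.pi * freq * k / fs) for k, v in enumerate(x))
    return a, b, math.hypot(a, b)

# A 100 Hz sinusoid of amplitude 0.5 with an arbitrary phase offset of 0.7 rad,
# sampled so that the section holds a whole number of periods.
fs, w = 8000, 800
x = [0.5 * math.sin(2 * math.pi * 100 * k / fs + 0.7) for k in range(w)]
a, b, e = correlation(x, 100, fs)
```

In this example A and B individually depend on the phase offset, while E recovers the amplitude 0.5 regardless of phase, which is the reason the text gives for pairing the sine and cosine functions.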
【００３５】
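The formulas of FIG. 5 are for the case where trigonometric functions are used as the periodic functions (that is, functions with a sinusoidal waveform). The waveform of the periodic functions used in carrying out the invention is not limited to a sine wave; periodic functions with triangular, rectangular, sawtooth, or other waveforms may be used. For example, several kinds of periodic function with sinusoidal, triangular, rectangular, and sawtooth waveforms may be defined, and the function with the appropriate waveform selected manually (at the operator's instruction) according to the characteristics of the captured acoustic data. A function may of course also be provided that analyzes the characteristics of the captured acoustic data and selects the most suitable periodic function automatically, without waiting for the operator.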
【００３６】
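The formulas of FIG. 6 define the correlation when a general periodic function Rn having standard frequency f(n) is used instead of trigonometric functions. The formula for A(n) uses the periodic function Rn(k), whereas the formula for B(n) uses Rn(k + F/4f(n)); the two have the same standard frequency f(n) but differ in phase by π/2. As stated above, F is the sampling frequency of the section signal X, and F/f(n) is the number of samples in one period. F/4f(n) is therefore the number of samples in a quarter period, that is, the phase difference π/2 expressed in units of sample number. If the correlation values A(n) and B(n) are obtained for such a pair of periodic functions of standard frequency f(n) differing in phase by π/2, their root sum of squares E(n) serves as the parameter expressing the overall correlation with the periodic function of standard frequency f(n).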
【００３７】
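§3. Two methods of selecting representative frequencies
As described in §2, the basic principle of selecting representative frequencies for a section signal is to compute the correlation between the prepared periodic functions and the section signal and to select the frequencies of the highly correlated periodic functions. Two concrete methods have been proposed: one using the short-time Fourier transform and one using generalized harmonic analysis. Both are described here.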
【００３８】
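In the method using the Fourier transform, the correlation value with the section signal X is first obtained for every prepared periodic function. For example, if 128 periodic functions are prepared as in FIG. 2, the calculation of FIG. 5 is carried out for all of them to obtain the correlation values E(0) to E(127). Plotting these values as the correlation intensity E(n) gives an intensity graph like FIG. 7. This graph shows the frequency components of the section signal X in unit section d1 of the acoustic signal of FIG. 1, that is, its Fourier spectrum. Obtaining such a spectrum is the Fourier transform. When the transform is applied to the signal in a short time interval such as unit section d1 (the short-time Fourier transform), the section signal X is usually first filtered with a weighting function such as the Hanning window. The Fourier transform proper rests on a theory that assumes the same signal continues indefinitely before and after the extracted section, so without a weighting function the resulting spectrum often carries high-frequency noise. A weighting function such as the Hanning window, whose weight falls to 0 at both ends of the section, suppresses this to some extent. With L as the unit section length, the Hanning window function H(k) for k = 1…L (k being the parameter indicating position within the unit section) is
H(k) = 0.5 − 0.5 · cos(2πk/L)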
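The window formula H(k) = 0.5 − 0.5 · cos(2πk/L) can be applied to a section signal as in the sketch below. The names `hanning` and `apply_window` are illustrative; the 1-based position k = 1…L follows the text, so the sample at list index i is weighted by H(i + 1).

```python
import math

def hanning(k: int, length: int) -> float:
    """H(k) = 0.5 - 0.5 * cos(2*pi*k / L) for position k = 1 .. L."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * k / length)

def apply_window(x):
    """Weight each sample of section signal x by the window value at its position."""
    length = len(x)
    return [v * hanning(i + 1, length) for i, v in enumerate(x)]

windowed = apply_window([1.0] * 8)
```

The weight is 1 at the middle of the section (k = L/2) and falls to 0 at k = L.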
【００３９】
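The intensity graph obtained in this way for unit section d1 shows, as the correlation intensity E(n), the proportion of each frequency component corresponding to note numbers n = 0 to 127 in the section signal X. On the basis of these correlation intensities, J note numbers are selected from the 128 and taken as the representative frequencies of unit section d1. For example, under the rule "extract the three codes with the largest correlation intensity E(n)", the example of FIG. 7 yields note number n(d1,1) as the first representative frequency, n(d1,2) as the second, and n(d1,3) as the third.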
【００４０】
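Once the J representative frequencies are selected, the section signal X of unit section d1 can be encoded by these frequencies and their correlation intensities. In the example above, if the correlation intensities of note numbers n(d1,1), n(d1,2), n(d1,3) in the graph of FIG. 7 are e(d1,1), e(d1,2), e(d1,3), the section signal X in unit section d1 can be represented by the following three data pairs.
n(d1,1), e(d1,1)
n(d1,2), e(d1,2)
n(d1,3), e(d1,3)
The codes placed at the position of unit section d1 on each track of FIG. 1(b) are the codes obtained in this way. This is the short-time Fourier transform method.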
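The selection of the J strongest note numbers from an intensity graph can be sketched as follows. The function name `top_notes` and the sample spectrum values are made up for illustration; only the rule "take the J largest E(n)" comes from the text.

```python
def top_notes(spectrum, j):
    """Return the j (note number, intensity) pairs with the largest intensity."""
    order = sorted(range(len(spectrum)), key=lambda n: spectrum[n], reverse=True)
    return [(n, spectrum[n]) for n in order[:j]]

# Hypothetical intensity graph E(0) .. E(127) with a few non-zero entries.
spectrum = [0.0] * 128
spectrum[60], spectrum[61], spectrum[64], spectrum[67] = 0.8, 0.1, 0.5, 0.6
pairs = top_notes(spectrum, 3)
```

Each returned pair corresponds to one data pair n(d1,i), e(d1,i) of the section.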
【００４１】
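Next, generalized harmonic analysis is described. The details are disclosed in, for example, Japanese Unexamined Patent Publication No. 2000-261322; the basic principle is as follows. Suppose that a signal S(j) exists for a unit section d, as shown in FIG. 8(a). Here j is a parameter for the iteration described below (j = 1 to J). First, the Fourier spectrum of S(j) is obtained; as in the short-time Fourier transform method, this gives an intensity graph like FIG. 7. In the short-time Fourier transform method, J frequencies (J = 3 in the example above) were selected from this graph in descending order of correlation intensity. Here, only the single frequency with the largest correlation intensity is selected as a representative frequency. Next, a content signal G(j) as shown in FIG. 8(b) is defined. The content signal G(j) is a periodic function having the selected representative frequency, with an amplitude that depends on its correlation intensity with S(j). More specifically, if a pair of sine and cosine functions is used as the periodic function as in FIG. 2 and frequency f(n) is selected as the representative frequency, G(j) is the sum of the sine function A(n) sin(2πf(n)k/F) with amplitude A(n) and the cosine function B(n) cos(2πf(n)k/F) with amplitude B(n) (FIG. 8(b) shows only one of the two for convenience). Because A(n) and B(n) are the normalized correlation values given by FIG. 5, the content signal G(j) is the signal component of frequency f(n) contained in S(j).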
【００４２】
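Once G(j) is obtained, the difference signal S(j+1) is found by subtracting G(j) from S(j). FIG. 8(c) shows the difference signal S(j+1) obtained in this way. It consists of the signal components that remain after the component of frequency f(n) is removed from S(j). The parameter j is then incremented by 1 so that S(j+1) is treated as the new S(j), and the same processing is repeated J times, with j increasing from 1 to J, to select J representative frequencies.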
【００４３】
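In short, to select J representative frequencies for the section signal X of a unit section by generalized harmonic analysis, the parameter j is set to its initial value 1, the section signal X is defined as the first difference signal S(1), and the processing above is repeated J times with j increasing from 1 to J. (The difference signal S(1) for j = 1 is the section signal X itself and is not strictly a "difference signal", but because S(j) is called a difference signal here, S(1) is also called one for convenience.) In the Fourier transform method, all J representative frequencies are selected on the basis of correlation with the original section signal X. In generalized harmonic analysis, each time a representative frequency is determined, a difference signal is formed by subtracting that frequency component, and the next representative frequency is determined from the correlation with this difference signal; this procedure is repeated J times.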
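The iteration just described (pick the most strongly correlated frequency, form the content signal, subtract it, repeat) can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation of the cited publication: the names `correlate` and `generalized_harmonic_analysis`, the candidate frequency list, and the test signal are all hypothetical, and no window function is applied.

```python
import math

def correlate(signal, freq, fs):
    """Sine and cosine correlations A, B, normalized by 2/w."""
    w = len(signal)
    a = (2.0 / w) * sum(v * math.sin(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    b = (2.0 / w) * sum(v * math.cos(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    return a, b

def generalized_harmonic_analysis(x, freqs, fs, count):
    """Select `count` frequencies one at a time from successive difference signals."""
    residual = list(x)                      # S(1) is the section signal X itself
    picked = []
    for _ in range(count):
        best = max(freqs, key=lambda f: math.hypot(*correlate(residual, f, fs)))
        a, b = correlate(residual, best, fs)
        picked.append((best, math.hypot(a, b)))
        # Subtract the content signal G(j) = A sin(...) + B cos(...) to get S(j+1).
        residual = [v - a * math.sin(2 * math.pi * best * k / fs)
                      - b * math.cos(2 * math.pi * best * k / fs)
                    for k, v in enumerate(residual)]
    return picked, residual

fs = 8000
x = [math.sin(2 * math.pi * 100 * k / fs) + 0.3 * math.cos(2 * math.pi * 250 * k / fs)
     for k in range(800)]
picked, residual = generalized_harmonic_analysis(x, [100, 150, 250, 400], fs, 2)
```

With this two-component test signal, the 100 Hz component is selected first and the 250 Hz component second, and the final difference signal is essentially zero.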
【００４４】
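§4. Characteristics of the two methods and the basic concept of the present invention
§3 described two methods of selecting representative frequencies for a section signal: the short-time Fourier transform and generalized harmonic analysis. Each has its own characteristics. In general, generalized harmonic analysis requires more computation than the short-time Fourier transform but selects representative frequencies more accurately.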
【００４５】
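FIG. 9 shows the result of spectral analysis of an instrument sound by the two methods. FIG. 9(a) shows the spectrum of the original instrument sound, FIG. 9(b) the spectrum obtained by the short-time Fourier transform, and FIG. 9(c) the spectrum obtained by generalized harmonic analysis. In each, the horizontal axis is the note number (logarithmic scale), with one division corresponding to a semitone. As FIG. 9(a) shows, the original signal is a chord of note numbers N1 and N2 (a consonant major third); on a piano, it corresponds to striking the keys for N1 and N2 at the same time. Because the original signal is built on the components N1 and N2, MIDI encoding of the resulting acoustic signal should properly produce a MIDI code for note number N1 and a MIDI code for note number N2.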
【００４６】
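When this acoustic signal is analyzed by the short-time Fourier transform, however, many harmonic components appear in addition to the fundamental for note number N1 (the N1 fundamental) and the fundamental for note number N2 (the N2 fundamental), and neighboring tones appear around them as well, as shown in FIG. 9(b). In the figure, fundamentals and harmonics are drawn with thick lines and their neighboring tones with thin lines. Harmonics of a fundamental appear in the spectrum because, owing to the physical characteristics of acoustic instruments (traditional instruments producing natural sound, as opposed to electronic instruments producing artificial sound), the sound contains not only the frequency of the note intentionally played but also integer multiples of it. In the illustrated example, the second to sixth harmonics appear for each of the fundamentals N1 and N2. Because note numbers are frequency parameters defined at equal intervals on a logarithmic frequency axis, in terms of the note numbers themselves the second harmonic corresponds to the fundamental's note number plus 12, the third harmonic to +19, the fourth to +24, the fifth to +28, and the sixth to +31.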
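The note-number offsets quoted for the harmonics follow from the logarithmic definition of the note number: multiplying the frequency by h raises the note number by 12 · log2(h), rounded to the nearest semitone. A short check, using an illustrative function name `harmonic_offset`:

```python
import math

def harmonic_offset(h: int) -> int:
    """Nearest note-number offset of the h-th harmonic above its fundamental."""
    return round(12 * math.log2(h))

offsets = {h: harmonic_offset(h) for h in range(2, 7)}
```

The result reproduces the offsets +12, +19, +24, +28, +31 given in the text for the second to sixth harmonics.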
【００４７】
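As noted above, the Fourier transform rests on a theory that assumes the same signal continues indefinitely before and after the extracted unit section. In reality, the section signal in a unit section of finite length is not such an ideal signal. The short-time Fourier transform therefore produces neighboring-tone components in addition to the harmonics. These neighboring tones can blur the true fundamentals. In FIG. 9(b), for instance, the fundamentals can be identified in the drawing because they are drawn with thick lines, but in fact the N2 fundamental is hidden by the neighboring tones of the N1 fundamental. If the three note numbers with the largest correlation intensity are selected from the Fourier spectrum of FIG. 9(b), the fundamental N1 is selected but the fundamental N2 is buried among the neighboring tones and missed.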
【００４８】
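When the same acoustic signal is analyzed by generalized harmonic analysis, the harmonic components drawn with thick lines remain, but the neighboring-tone components drawn with thin lines are considerably suppressed, as shown in FIG. 9(c). If the three note numbers with the largest correlation intensity are selected from the spectrum of FIG. 9(c), both fundamentals N1 and N2 are selected. Generalized harmonic analysis does not, however, remove neighboring-tone and harmonic components completely. A fundamental can still be buried by neighboring tones and fail to be selected correctly during encoding. Moreover, if neighboring-tone components are encoded along with the true fundamentals, unpleasant noise results on playback. As is known from harmony theory, adjacent notes a semitone apart sounding together are dissonant, and many of them together become noise. Incidentally, the notes N1 and N2 of FIG. 9(a) form a consonant major third, in which the fourth and fifth harmonics resonate.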
【００４９】
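Thus, in the time-series signal analysis methods proposed so far, not only the fundamentals but also their harmonics and neighboring tones appear in the spectrum. As a result, a true fundamental whose original intensity was low may be buried and not selected, or unwanted components may be selected by mistake. The present invention is based on a new technique for solving this problem.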
【００５０】
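First, the basic principle of a method for removing neighboring-tone components is described. Comparing the spectra of FIG. 9(b) and (c), the N1 and N2 fundamentals appear with roughly the same correlation intensity in both, whereas the neighboring tones drawn with thin lines differ in intensity between the two. In the illustrated example, the neighboring-tone components in the generalized harmonic analysis spectrum of FIG. 9(c) are generally smaller than those in the short-time Fourier transform spectrum of FIG. 9(b). Comparing the two analysis results thus reveals the following property: the correlation intensity of a fundamental is almost the same in both, whereas that of a neighboring tone differs considerably. Conversely, given both results, a frequency (here, a note number) whose correlation intensity differs greatly between them can be judged to be a neighboring-tone component rather than a true fundamental.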
【００５１】
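The basic concept of the present invention rests on this principle. Given a section signal to be analyzed, both the short-time Fourier transform analysis and the generalized harmonic analysis are carried out, and any frequency whose correlation intensity differs greatly between the two is judged not to be a true frequency and is not selected as a representative frequency. On this basis, none of the neighboring-tone components drawn with thin lines in FIG. 9 is selected. Concretely, a threshold Δ is set on the difference in correlation intensity, the condition is that the difference lie within Δ, and only frequencies satisfying this condition are selected as representative frequencies. That is, for a predetermined threshold Δ, the condition |E − EE| < Δ must hold. Δ is positive and is set to, for example, about E/2. The N1 and N2 fundamentals of FIG. 9 both satisfy this condition (their correlation intensities in FIG. 9(b) and FIG. 9(c) are almost the same, and the difference is within Δ), so they are selected as representative frequencies.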
【００５２】
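With this technique alone, harmonic components are still selected as representative frequencies; an additional technique for removing them is described in §6. Experiments by the present inventor on various acoustic signals also showed that, for neighboring-tone components, the correlation intensity EE in the generalized harmonic analysis spectrum is smaller than the correlation intensity E in the short-time Fourier transform spectrum in the overwhelming majority of cases. The condition EE > E − Δ may therefore be used in place of |E − EE| < Δ when deciding whether to select a frequency as a representative frequency.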
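The two alternative conditions can be written as a small predicate. The function name `satisfies_condition` and the `one_sided` flag are illustrative; the two inequalities are the ones stated in the text.

```python
def satisfies_condition(e: float, ee: float, delta: float, one_sided: bool = False) -> bool:
    """Test |E - EE| < delta, or EE > E - delta when one_sided is True."""
    if one_sided:
        return ee > e - delta
    return abs(e - ee) < delta
```

The two forms differ only when EE exceeds E by delta or more: the symmetric condition rejects such a frequency, while the one-sided condition accepts it, on the grounds that neighboring tones almost always show EE smaller than E.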
【００５３】
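§5. Basic procedure of the time-series signal analysis method according to the present invention
Next, the basic procedure of the time-series signal analysis method according to the present invention is described with reference to the flowchart of FIG. 10.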
【００５４】
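First, in step S1, the time-series signal to be analyzed is captured as digital data. If an analog acoustic signal is to be analyzed, it may be sampled at, for example, 44.1 kHz and captured as digital data. In step S2, plural unit sections are set on the time axis of the captured signal. In the example of FIG. 1, unit sections d1 to d5 of a given length are set. With sampling at 44.1 kHz, if one unit section contains, for example, 2048 samples, the section length is about 46 ms (2048 / 44100 ≈ 0.046 s).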
【００５５】
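In the next step S3, plural frequencies are set and a function group made up of periodic functions having those frequencies is defined. In the example above, as shown in FIG. 2, 128 standard frequencies f(0) to f(127) corresponding to MIDI note numbers 0 to 127 are defined, giving 128 pairs of sine and cosine functions. As already stated, the pair of a sine function and a cosine function of the same frequency is taken as the periodic function for that frequency so that the correlation is not affected by phase. As shown in FIG. 5, the correlation value between this periodic function and a given signal is the effective value E(n) of the sine correlation A(n) and the cosine correlation B(n).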
【００５６】
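Next, in step S4, the time-series signal in one unit section is extracted as the section signal X, and X is defined as the first difference signal S(1). This difference signal S(1) corresponds to the signal S(j) of FIG. 8(a) and is the first signal analyzed in the generalized harmonic analysis (as noted earlier, S(1) for j = 1 is not strictly a "difference signal" but is called one for convenience). In step S5, the parameter j, which counts the iterations of the generalized harmonic analysis, is set to its initial value 1. Steps S6 to S8 are then repeated, with j incremented by 1 in step S10, until the condition of step S9 is met, that is, until j reaches the predetermined count J. This iteration is basically the generalized harmonic analysis described earlier. In ordinary generalized harmonic analysis, however, representative frequencies are selected unconditionally in order of correlation value, whereas here it is judged whether the relationship between the correlation value EE with the difference signal S(j) and the correlation value E with the section signal X satisfies a predetermined condition, and a frequency is selected as a representative frequency only if it does.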
【００５７】
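That is, in step S6, the periodic function that has the highest correlation value E with the section signal X, and for which the relationship between E and the correlation value EE with the j-th difference signal S(j) satisfies the predetermined condition, is selected from the function group defined in step S3 as the j-th element function. An element function is a periodic function that can be judged to be a constituent of the section signal X, that is, a periodic function whose frequency is suitable for selection as a representative frequency. In conventional generalized harmonic analysis, the periodic function with the highest correlation value was selected unconditionally as the element function. Here, the periodic function with the highest correlation value E is first taken as a provisional element function, its correlation value EE with the j-th difference signal S(j) is obtained, and it is selected as the formal element function only if the relationship between E and EE satisfies the predetermined condition. If the provisional element function does not satisfy the condition, the periodic function with the next largest correlation value E with the section signal X is selected as the next provisional element function, its correlation value EE with S(j) is obtained in the same way, and it is judged whether the relationship between E and EE satisfies the condition.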
【００５８】
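Because the periodic function with the highest correlation value E with the section signal X does not necessarily satisfy the predetermined condition, the phrase "has the highest correlation value E with the section signal X" in step S6 means "the highest among the periodic functions that satisfy the predetermined condition". The "predetermined condition" is, as described in §4, a condition for excluding neighboring tones. Specifically, the condition may be that the correlation value EE (corresponding to the correlation intensity in the generalized harmonic analysis spectrum of FIG. 9(c)) and the correlation value E (corresponding to the correlation intensity in the short-time Fourier transform spectrum of FIG. 9(b)) are close to each other within a given range, or that the former is not smaller than the latter by more than the threshold. Selecting as the element function the periodic function that satisfies this condition and has the highest correlation value E with the section signal X prevents periodic functions at the frequencies of the neighboring tones, drawn with thin lines in FIG. 9, from being selected. At j = 1, the difference signal S(j) equals the section signal X, so E = EE and the condition is always satisfied. The test in step S6 therefore matters only from j = 2 onward. So that E and EE can be compared under conditions as similar as possible, the Hanning window described in §3 is not applied when computing the correlation values.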
【００５９】
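Once the j-th element function is selected, it is excluded from the function group in step S7. This prevents a periodic function that has already been selected as an element function from being selected again later.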
【００６０】
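In the following step S8, the j-th content signal G(j) is obtained from the selected j-th element function. The element function is a function selected from the 128 periodic functions of FIG. 2 (a pair of sine and cosine functions of the same frequency) and carries no amplitude coefficient. The content signal G(j) is the signal obtained by giving this element function an amplitude coefficient, specifically by multiplying the element function by the correlation value EE. EE expresses the correlation between the element function and the j-th difference signal S(j); the larger EE is, the more strongly the frequency component of that element function is contained. Because normalized correlation values are used, the content signal given by the product of the element function and EE is the signal component, at the frequency of that element function, contained in the difference signal S(j). In practice, since the element function is a pair of sine and cosine functions of the same frequency, the content signal G(j) is the sum of a sine signal with the correlation value A(n) of FIG. 5 as its coefficient and a cosine signal with the correlation value B(n) as its coefficient.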
【００６１】
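Next, the new difference signal S(j+1) is obtained by subtracting the j-th content signal G(j) from the j-th difference signal S(j). S(j+1) is what remains after the component G(j) is removed from S(j), so the processing of step S8 amounts to extracting the content signal G(j) from the difference signal S(j).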
【００６２】
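Repeating steps S6 to S8 J times, for j = 1 to J (J being an arbitrary integer), extracts J content signals G(1) to G(J) in total. These J content signals are signal components that were contained in the original section signal X and that are not neighboring-tone components. The J content signals G(1) to G(J) have thus been extracted as the periodic signals constituting the section signal X under analysis.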
【００６３】
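The above is the processing for one unit section. Returning to step S3 via step S11 carries out exactly the same processing for the next unit section. When step S11 determines that all unit sections have been processed, the procedure is complete. As a result, J content signals G(1) to G(J) are extracted for every unit section. Each content signal carries information on a frequency (the representative frequency) and on an amplitude (intensity), and each unit section carries information on its position on the time axis. Creating code data from this information encodes the acoustic signal input as the time-series signal in step S1. If MIDI data is created as the code data, velocity is used for the amplitude of the content signals obtained for each unit section, note number for their frequency, note-on time for the start position of the unit section on the time axis, and note-off time for its end position.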
【００６４】
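Carrying out J iterations up to step S9 extracts J content signals G(1) to G(J) for each unit section, but not all J need be used for encoding. For example, if M content signals (M < J) are selected from the J obtained for each unit section in descending order of amplitude, and the code data is created from these M content signals, only the main frequency components with large amplitude are encoded.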
【００６５】
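The flow of the basic concept of the time-series signal analysis method and acoustic signal encoding method according to the present invention has been described above with reference to the flowchart of FIG. 10. Considerations for actually carrying out this processing on a computer are described next.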
【００６６】
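The processing of step S6 amounts to selecting one element function from the function group. With a function group like that of FIG. 2, one element function is selected from the 128 periodic functions. The selection condition is "the correlation value E with the section signal X is the highest" and "the relationship between the correlation value EE with the difference signal S(j) and the correlation value E with the section signal X satisfies the predetermined condition". In practice, this selection can be carried out as follows.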
【００６７】
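First, the correlation values E(n) between the section signal X and all 128 periodic functions are calculated (n = 0 to 127). E(n) is the effective value given by the formula of FIG. 5. Because the section signal X itself does not change as the iteration parameter j increases, these 128 correlation values need be computed only once, at j = 1, and then stored.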
【００６８】
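Next, the periodic function with the highest correlation value E(n) is selected as the provisional element function, and its correlation value EE(n) with the difference signal S(j) is calculated. Because S(j) changes each time j increases, EE(n) for the selected provisional element function must be computed every time. It is then judged whether the relationship between EE(n) and E(n) satisfies the predetermined condition. If it does, the provisional element function is formally selected as the j-th element function. If it does not, a new provisional element function is selected and the same processing is carried out. Repeating this until the condition is satisfied eventually yields a formal element function.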
【００６９】
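A periodic function formally selected as an element function is excluded from the function group in step S7 to avoid duplicate selection. In practice, it is preferable also to exclude from the function group any periodic function that was selected as a provisional element function but failed the predetermined condition and so was never formally selected. As already stated, a periodic function that fails the condition has the frequency of a neighboring-tone component and is unsuitable as an element function. Such a function is better removed from the function group as soon as it is selected as a provisional element function. In practice, therefore, the exclusion is best carried out not in step S7 but in step S6, at the point where a provisional element function is selected. That is, every periodic function selected as a provisional element function in step S6 is excluded from the function group at that point, whether or not it goes on to be formally selected. When a new provisional element function is then to be selected, it suffices to pick, from the periodic functions remaining in the function group at that time, the one with the highest correlation value E(n), which simplifies the selection.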
【００７０】
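When "excluding a particular periodic function from the function group" is actually carried out on a computer, ON/OFF flags are convenient. When the function group of 128 periodic functions is defined in step S3, each of the 128 periodic functions is given a flag in the ON state, and a periodic function is excluded from the function group by setting its flag to OFF. The periodic functions remaining in the function group at any point are those whose flags are ON at that point.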
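Steps S4 to S10 for one unit section, including the flag-based exclusion and the provisional element function test, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the names `correlate`, `analyze_section`, and `delta_ratio` are hypothetical, the threshold is taken as Δ = `delta_ratio` · E (the text suggests about E/2), the condition used is |E − EE| < Δ, and a short list of candidate frequencies stands in for the full function group.

```python
import math

def correlate(signal, freq, fs):
    """Sine and cosine correlations A, B, normalized by 2/w (no window applied)."""
    w = len(signal)
    a = (2.0 / w) * sum(v * math.sin(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    b = (2.0 / w) * sum(v * math.cos(2 * math.pi * freq * k / fs) for k, v in enumerate(signal))
    return a, b

def analyze_section(x, freqs, fs, num_elements, delta_ratio=0.5):
    """Return up to num_elements (frequency, amplitude) content signals for section x."""
    # E(n) against the section signal X: computed once and stored.
    e = [math.hypot(*correlate(x, f, fs)) for f in freqs]
    flag_on = [True] * len(freqs)           # ON = still in the function group
    residual = list(x)                      # S(1) = X
    contents = []
    for _ in range(num_elements):
        selected = None
        while selected is None and any(flag_on):
            # Provisional element function: highest E(n) among the remaining functions.
            n = max((i for i, on in enumerate(flag_on) if on), key=lambda i: e[i])
            flag_on[n] = False              # excluded as soon as it is picked
            a, b = correlate(residual, freqs[n], fs)
            ee = math.hypot(a, b)           # EE(n) against the difference signal S(j)
            if abs(e[n] - ee) < delta_ratio * e[n]:
                selected = (n, a, b, ee)
        if selected is None:
            break                           # function group exhausted
        n, a, b, ee = selected
        # S(j+1) = S(j) - G(j)
        residual = [v - a * math.sin(2 * math.pi * freqs[n] * k / fs)
                      - b * math.cos(2 * math.pi * freqs[n] * k / fs)
                    for k, v in enumerate(residual)]
        contents.append((freqs[n], ee))
    return contents

fs = 8000
x = [math.sin(2 * math.pi * 100 * k / fs) + 0.3 * math.cos(2 * math.pi * 250 * k / fs)
     for k in range(800)]
contents = analyze_section(x, [100, 150, 250, 400], fs, 2)
```

In this clean two-component example both components pass the test. With a real signal, a candidate whose EE falls well below its E after earlier components are subtracted would be dropped and the next-ranked candidate tried.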
【００７１】
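§6. Procedure for removing harmonic components
The basic procedure of §5 prevents element functions at the frequencies of the neighboring tones, drawn with thin lines in FIG. 9, from being selected, but it does not prevent the selection of element functions at the frequencies of the harmonic components drawn with thick lines. If such harmonic components are encoded as they are when the acoustic signal is encoded as MIDI data, then even though only the two keys for N1 and N2 were struck on the original instrument (a piano, for example), MIDI data is created for the many keys corresponding to the thick-line harmonic components.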
【００７２】
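A method of removing such harmonic components during analysis or encoding is disclosed in Japanese Unexamined Patent Publication No. H11-95753. Its basic principle is that, once the frequency of a fundamental has been selected as a representative frequency, frequency components at integer multiples of that fundamental are simply excluded. With this method, in FIG. 9(b) for example, the harmonics of N1 are removed when the N1 fundamental is selected, and the harmonics of N2 are removed when the N2 fundamental is selected, so harmonic frequencies are no longer selected as representative frequencies. In the example of FIG. 9 this is indeed an effective way of removing harmonics. In practice, however, the method often causes problems. When, for example, an octave chord (two notes in a harmonic relationship) is played on one instrument, the frequency of the second fundamental coincides with a harmonic of the first. In FIG. 9(a), N2 is not a harmonic of N1, so the problem does not arise, but if N2 were at a harmonic position of N1, the harmonic removal carried out when the N1 fundamental is selected would also remove the N2 fundamental.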
【００７３】
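The task, then, is to determine, when a harmonic of a fundamental appears, whether it comes from a note actually played together with the fundamental as a chord or is merely a harmonic accompanying the fundamental. In the former case the component must not be excluded from the analysis; in the latter case it should be.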
【００７４】
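The present inventor noticed that this distinction is difficult with coarse frequency analysis in units of note numbers (MIDI semitones) but becomes possible with finer frequency analysis. This is explained with the example of FIG. 11. FIG. 11(a) shows the frequency spectrum obtained when a single note is played on a single instrument. The horizontal axis is the note-number axis, corresponding to frequency on a logarithmic scale. For convenience in showing detail near the frequencies of note numbers in a harmonic relationship, the axis is broken in several places and is discontinuous. When a single note is sounded on a single instrument (for example, one piano key is struck), signal intensity appears, as in FIG. 11(a), at note number N of the fundamental (the true note of the struck key) and, as harmonic components, at note number N+12 for the second harmonic, N+19 for the third, N+24 for the fourth, N+28 for the fifth, and N+31 for the sixth. Each of these harmonics has a frequency that is exactly an integer multiple of the fundamental frequency.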
【００７５】
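FIG. 11(b) shows the frequency spectrum obtained when an octave chord is played on a single instrument, for example when a C and the C one octave above it are struck together on a piano. If the first C is note number N, the second C is note number N+12. On actual instruments, however, tuning does not make an octave exactly a factor of two in frequency; strictly speaking there is a small deviation. This does not mean that "ordinary instruments are not tuned correctly" but rather that "correct tuning does not set an octave to exactly twice the frequency". On a fine frequency axis, therefore, as FIG. 11(b) shows, if the first C is note number N drawn with a solid line, the second C is note number N+12 drawn with a broken line. That is, the solid-line N+12 is the second harmonic of the first C, whereas the broken-line N+12 is the entirely independent second C. The second C of course has harmonics of its own, each at exactly an integer multiple of the frequency of the second C. On a fine frequency axis, as FIG. 11(b) shows, the first C and its harmonics (solid lines) and the second C and its harmonics (broken lines) are offset from each other by a certain frequency and never coincide.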
【００７６】
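The same holds for different instruments. FIG. 11(c) shows the frequency spectrum obtained when two kinds of instrument play the same note together, for example a C on a piano and a C of the same pitch on a xylophone. If the piano C has the frequency of note number N drawn with a solid line, the xylophone C has the frequency of note number N drawn with a broken line, and the two frequencies differ slightly even though both are note number N. Moreover, the harmonics of the piano note are exactly integer multiples of the piano fundamental, and the harmonics of the xylophone note are exactly integer multiples of the xylophone fundamental. As illustrated, the piano harmonics (solid lines) and the xylophone harmonics (broken lines) are therefore offset from each other by a certain frequency and never coincide.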
【００７７】
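Taking this phenomenon into account makes it possible to remove only the harmonics that accompany a fundamental while leaving the harmonic-position components produced by chords. Specifically, the embodiment described up to §5 defined 128 frequencies f(0) to f(127) as in FIG. 2 and analyzed with a frequency precision of a semitone, corresponding to MIDI note numbers; this precision is now raised. For example, defining frequencies at intervals of 1/10 semitone gives ten times the frequency precision of FIG. 2 and 1280 frequencies. Because 1280 periodic functions must then be prepared, the computational load increases considerably, but the improved resolution of the frequency axis makes it possible to tell the solid-line components of FIG. 11(b) and (c) from the broken-line components. In the example of FIG. 11(b), when the frequency for the solid-line note number N is selected as a representative frequency and its harmonics are removed, only the solid-line components, whose frequencies are exactly integer multiples of the representative frequency, are removed; the broken-line components are not. The frequency for the broken-line note number N+12 is consequently selected as a representative frequency afterwards.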
【００７８】
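When the final code data is to be MIDI data, however (in practice probably the most common case), simply raising the frequency precision is not appropriate. Ordinary MIDI data is a code that presupposes semitone frequency precision and represents frequency with the 128 note numbers 0 to 127. MIDI encoding therefore ultimately requires encoding with semitone frequencies corresponding to note numbers.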
【００７９】
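In this embodiment, therefore, frequencies are defined in a hierarchical structure. When the function group is defined, α standard frequencies are first defined as the upper layer, and for each standard frequency β proximate frequencies are defined as variations ("proximate" meaning located near the standard frequency; the standard frequency itself may be included). They are set within a range in which the proximate frequency bands of adjacent standard frequencies do not overlap. Specifically, the 128 frequencies corresponding to note numbers are defined as the upper-layer standard frequencies (α = 128), and for each standard frequency, for example, 13 proximate frequencies are defined (β = 13).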
【００８０】
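FIG. 12 shows an example of such a definition, with 13 proximate frequencies defined for the standard frequency of the i-th note number N(i). Because the standard frequencies correspond to note numbers, they are a semitone apart: in the figure, the frequencies of note numbers N(i−1) and N(i) are a semitone apart, as are those of N(i) and N(i+1). The 13 proximate frequencies, by contrast, are spaced in units of 1/13 semitone; the 13 proximate frequencies marked with circled numbers in the figure are all 1/13 semitone apart. The fractions marked with + or − in the figure (all with denominator 13) show the frequency difference from the standard frequency of note number N(i). The central proximate frequency, circled number 7, is identical to the standard frequency.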
【００８１】
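Thus β proximate frequencies are defined as variations of each standard frequency, and the proximate frequency bands of adjacent standard frequencies must not overlap. The figure shows the proximate frequency band of the i-th standard frequency; this band must overlap neither the band of the (i−1)-th standard frequency on its left nor the band of the (i+1)-th standard frequency on its right. In the illustrated example, 13 proximate frequencies in units of 1/13 semitone are defined, so the interval between standard frequencies is divided into 13 and the proximate frequency bands lie neatly side by side.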
【００８２】
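Defining β proximate frequencies for each of the α standard frequencies gives α × β proximate frequencies in total. A proximate periodic function is defined for every one of them, giving α × β proximate periodic functions in total; in the example above, 128 × 13. FIG. 13 lists the proximate periodic functions defined in this way. For example, the proximate periodic function for the b-th variation of the a-th note number (standard frequency) is f(a, b). In practice, each proximate periodic function f(a, b) consists of a pair of sine and cosine functions.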
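The α × β grid of proximate frequencies can be generated as in the sketch below. FIG. 12 and FIG. 13 are not reproduced here, so the formula is an assumption consistent with the description: variations b = 1 … β spaced 1/β semitone apart, with the central variation (b = 7 for β = 13) equal to the standard frequency. The names `proximate_frequency` and `function_group` are illustrative.

```python
import math

def proximate_frequency(a: int, b: int, beta: int = 13) -> float:
    """Frequency in Hz of variation b (1..beta) of note number a (0..127).

    Assumes an odd beta so that the central variation coincides with the
    standard frequency 440 * 2**((a - 69) / 12).
    """
    if not 0 <= a <= 127 or not 1 <= b <= beta:
        raise ValueError("index out of range")
    center = (beta + 1) / 2
    semitones = (a - 69) + (b - center) / beta
    return 440.0 * 2.0 ** (semitones / 12)

# All 128 x 13 proximate frequencies, keyed by (a, b) as in f(a, b).
function_group = {(a, b): proximate_frequency(a, b)
                  for a in range(128) for b in range(1, 14)}
```

With this layout, the last variation of one note number and the first variation of the next are also 1/13 semitone apart, so the bands of adjacent note numbers adjoin without overlapping.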
【００８３】
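The analysis and encoding procedure with harmonic removal added, using the 128 × 13 proximate periodic functions defined in this way, is now described with reference again to the flowchart of FIG. 10. The input processing of step S1 and the unit section setting of step S2 are exactly as in §5. In the function group definition of step S3, however, a function group of 128 × 13 proximate periodic functions is defined, as described above. In step S4, the section signal X is extracted and defined as the first difference signal S(1), as in §5. In step S5 the parameter j is set to 1, and the processing of step S6 follows.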
【００８４】
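In step S6, the element function is selected from the 128 × 13 proximate periodic functions of the function group. That is, at a frequency resolution of 1/13 semitone, the proximate periodic function that has the highest correlation value E with the section signal X, and for which the relationship between E and the correlation value EE with the difference signal S(j) satisfies the predetermined condition, is selected as the element function. The element function is therefore defined by an accurate frequency with a resolution of 1/13 semitone. In the actual computation, as described in §5, if the correlation values with the section signal X are calculated in advance for all 128 × 13 proximate periodic functions and stored, the same calculation need not be repeated each time j changes.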
【００８５】
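In the following step S7, the selected element function is removed from the function group. When one proximate periodic function is excluded from the function group, all β proximate periodic functions, including those that are its variations, are excluded. As described below, the final encoding is carried out at the coarse frequency resolution of note numbers, so once any one variation of a standard frequency has been selected as an element function, there is no longer any need to select another variation of the same standard frequency.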
【００８６】
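Next, in step S8, the content signal G(j) is obtained. As already stated, G(j) is obtained by multiplying the selected element function (a proximate periodic function) by its correlation value EE with the difference signal S(j). G(j) is therefore also defined by an accurate frequency with a resolution of 1/13 semitone. In the procedure of §5, the new difference signal S(j+1) was obtained by subtracting G(j) from S(j). Here, to remove harmonic components, one or more proximate periodic functions that have a frequency equal to an integer multiple of the frequency of the selected element function and that are included in the function group are selected as harmonic component functions; each harmonic content signal, given by the product of a harmonic component function and its correlation value EE with the difference signal S(j), is obtained; and the signal obtained by subtracting G(j) and each harmonic content signal from S(j) is taken as the new difference signal S(j+1).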
【００８７】
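For example, in the case shown in FIG. 11(b), suppose that a particular proximity periodic function, one of the 13 variations of the standard frequency corresponding to note number N, has been selected as the element function. In this case, overtone component functions whose frequencies are exact integer multiples of the frequency of that proximity periodic function are selected. The variation number of each overtone component function should be the same as that of the proximity periodic function selected as the element function (because the overtone component function has a frequency that is an exact integer multiple). For example, if the proximity periodic function f(a, b) in the table of FIG. 13 is selected as the element function, the overtone component function for the 2nd harmonic is f(a+12, b), for the 3rd harmonic f(a+19, b), for the 4th harmonic f(a+24, b), for the 5th harmonic f(a+28, b), and for the 6th harmonic f(a+31, b). The correlation value EE with the difference signal S(j) is then obtained for each of these overtone component functions. If the correlation values obtained for f(a, b), f(a+12, b), f(a+19, b), f(a+24, b), f(a+28, b), and f(a+31, b) are EE1, EE2, EE3, EE4, EE5, and EE6, respectively, then the contained signal is G(j) = EE1·f(a, b), and the overtone-containing signals are EE2·f(a+12, b), EE3·f(a+19, b), EE4·f(a+24, b), EE5·f(a+28, b), and EE6·f(a+31, b) (the correlation value EE is an effective value; in practice, separate correlation values are used for the sine function and the cosine function). Not only G(j) but all of these overtone-containing signals are subtracted from the difference signal S(j), which yields the new difference signal S(j+1).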
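The note number offsets 12, 19, 24, 28, and 31 quoted above are the semitone counts nearest to the 2nd through 6th harmonics. A minimal sketch that reproduces them, using the hypothetical helper names `harmonic_offset` and `harmonic_residual`:

```python
import math


def harmonic_offset(h: int) -> int:
    """Nearest whole number of semitones between a fundamental and its h-th harmonic."""
    return round(12 * math.log2(h))


def harmonic_residual(h: int) -> float:
    """Exact harmonic interval minus the nearest whole semitone, in semitones."""
    return 12 * math.log2(h) - harmonic_offset(h)


# Offsets for the 2nd to 6th harmonics, as used in f(a+12, b) ... f(a+31, b).
offsets = [harmonic_offset(h) for h in range(2, 7)]
```

In twelve-tone equal temperament only the octave harmonics (2nd and 4th) fall exactly on a whole semitone; `harmonic_residual` reports the small remaining deviation for the other harmonics, which an implementation may want to take into account when choosing the variation number b.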
【００８８】
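In short, the new difference signal S(j+1) is the original difference signal S(j) minus the component of the contained signal G(j) and its overtone components; in the example of FIG. 11(b), it is the signal with all components drawn with solid lines removed. The components drawn with broken lines, however, are not subtracted, so after the parameter j is updated they can be selected as element functions in the next round of element function selection. Thus, according to the embodiment described above, overtone components that accompany a fundamental tone and need not be encoded are removed, whereas overtone components that were played as part of a chord and must be encoded can be left in place.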
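The update from S(j) to S(j+1) with overtone removal can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment itself: frequencies are given directly in Hz rather than through the f(a, b) table, "included in the function group" is approximated by "below half the sampling frequency", and the names `corr` and `next_difference` are hypothetical. As in the description, every correlation value is taken against the same difference signal S(j), and separate sine and cosine correlation values are used for the subtraction.

```python
import math


def corr(s, f, F):
    """Correlation values (A, B) of signal s with the sine and cosine of frequency f."""
    w = len(s)
    A = (2.0 / w) * sum(s[k] * math.sin(2 * math.pi * f * k / F) for k in range(w))
    B = (2.0 / w) * sum(s[k] * math.cos(2 * math.pi * f * k / F) for k in range(w))
    return A, B


def next_difference(s, f0, F, harmonics=(2, 3, 4, 5, 6)):
    """Return S(j+1): s minus the component at f0 and at its integer multiples."""
    freqs = [f0] + [h * f0 for h in harmonics if h * f0 < F / 2]
    # All correlation values are computed against the same signal S(j).
    coeffs = [(f,) + corr(s, f, F) for f in freqs]
    out = list(s)
    for f, A, B in coeffs:
        for k in range(len(out)):
            phase = 2 * math.pi * f * k / F
            out[k] -= A * math.sin(phase) + B * math.cos(phase)
    return out


# Test signal: a 400 Hz tone with its 2nd harmonic, plus an independent 500 Hz tone.
F, w = 8000, 800
s = [0.5 * math.sin(2 * math.pi * 400 * k / F)
     + 0.2 * math.sin(2 * math.pi * 800 * k / F + 0.3)
     + 0.3 * math.cos(2 * math.pi * 500 * k / F)
     for k in range(w)]
s_next = next_difference(s, 400.0, F)
```

After the call, the 400 Hz and 800 Hz components are removed from `s_next` while the independent 500 Hz tone remains available for selection in the next round.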
【００８９】
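The contained signals finally obtained by this procedure all have accurate frequencies with a resolution of 1/13 semitone, and as such are not suitable for MIDI encoding. Therefore, for encoding, the frequency of each contained signal finally obtained for each unit section is replaced by the nearby standard frequency, so that every periodic signal finally extracted as a component of the time-series signal in each unit section has one of the standard frequencies, and the result is then converted to MIDI data. For example, if the contained signal G(j) = EE1·f(a, b) is obtained in the example above, it is changed to G(j) = EE1·f(a), and MIDI data corresponding to note number N = a is generated.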
【００９０】
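After the MIDI codes have finally been created, it is preferable in practice to merge, as necessary, multiple MIDI codes that are adjacent on the time axis. For example, when there are two MIDI codes for which the note-off time of one is close to the note-on time of the other, whose note numbers are identical or similar, and whose velocities are identical or similar, the two are preferably merged into one. Specifically, the note-off time of the preceding MIDI code is changed to the note-off time of the following MIDI code, and the following MIDI code is deleted. If the velocity of the following MIDI code is greater than that of the preceding MIDI code, the velocity of the preceding MIDI code is preferably replaced by that of the following MIDI code. In addition, among the merged MIDI notes, those whose length (the time from note-on to note-off) is shorter than a predetermined value, or whose velocity is smaller than a predetermined value, are preferably deleted.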
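A minimal sketch of this merging step is shown below. The `Note` record, the function name `merge_notes`, and all threshold values are hypothetical choices made for illustration; the description only states that predetermined values are used.

```python
from dataclasses import dataclass


@dataclass
class Note:
    number: int      # MIDI note number
    velocity: int    # MIDI velocity
    on: float        # note-on time in seconds
    off: float       # note-off time in seconds


def merge_notes(notes, max_gap=0.01, max_note_diff=0, max_vel_diff=8,
                min_len=0.05, min_vel=10):
    """Merge notes that follow one another closely, then drop short or quiet notes."""
    merged = []
    for n in sorted(notes, key=lambda note: note.on):
        target = None
        for m in reversed(merged):
            if (abs(n.on - m.off) <= max_gap
                    and abs(n.number - m.number) <= max_note_diff
                    and abs(n.velocity - m.velocity) <= max_vel_diff):
                target = m
                break
        if target is None:
            merged.append(Note(n.number, n.velocity, n.on, n.off))
        else:
            # Extend the preceding note and keep the larger velocity.
            target.off = n.off
            target.velocity = max(target.velocity, n.velocity)
    return [m for m in merged
            if m.off - m.on >= min_len and m.velocity >= min_vel]


notes = [Note(69, 80, 0.0, 0.1), Note(69, 84, 0.1, 0.2),
         Note(72, 90, 0.0, 0.02), Note(60, 5, 0.3, 0.5)]
result = merge_notes(notes)
```

In this example the two consecutive notes on number 69 are merged into one note lasting from 0.0 s to 0.2 s with velocity 84, while the very short note and the very quiet note are removed.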
【００９１】
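According to the embodiment described above, encoding with peripheral sound components and unnecessary overtone components removed becomes possible. Although the computational load increases compared with conventional methods, the accuracy of representative frequency selection improves and unnecessary overtone components are no longer extracted, so encoding quality and encoding efficiency improve. The method is particularly effective for automatic transcription, in which a score is produced automatically from a recorded performance. In encoding vocals, the mixing-in of unnecessary sound components that cause noise and distortion on playback is suppressed and, conversely, important sound components that conventional methods dropped because of bandwidth constraints are faithfully encoded, so more natural and realistic playback quality is obtained.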
【００９２】
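The present invention has been described above with reference to several illustrated embodiments, but it is not limited to these embodiments. The present invention can be implemented by computer processing; the various processes for carrying out the invention can be written as programs, which can be recorded on a computer-readable recording medium and distributed.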
【００９３】
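[Effects of the Invention]
As described above, according to the present invention, more accurate frequency analysis can be performed on a time-series signal, and an original acoustic signal can be encoded with high quality.
[Brief Description of the Drawings]
[FIG. 1] A diagram showing the basic principle of the time-series signal analysis method and acoustic signal encoding method according to the present invention.
[FIG. 2] A diagram showing an example of the periodic functions used in the method according to the present invention.
[FIG. 3] A diagram showing the relation between the frequency of each periodic function shown in FIG. 2 and the MIDI note number n.
[FIG. 4] A diagram showing the method of calculating the correlation between the signal to be analyzed and a periodic signal.
[FIG. 5] A diagram showing the formulas for the correlation calculation shown in FIG. 4.
[FIG. 6] A diagram showing the formulas of FIG. 5 extended to general periodic functions.
[FIG. 7] A diagram showing a spectrum intensity graph obtained by ordinary Fourier analysis.
[FIG. 8] A diagram showing the basic method of generalized harmonic analysis.
[FIG. 9] A diagram comparing an analysis result based on the short-time Fourier transform with one based on generalized harmonic analysis.
[FIG. 10] A flowchart showing the basic procedure of the time-series signal analysis method according to the present invention.
[FIG. 11] Graphs showing the spectra of various overtone components.
[FIG. 12] A diagram showing the concept of the proximity frequencies defined in the time-series signal analysis method according to the present invention.
[FIG. 13] A diagram showing an example of the many proximity periodic functions defined in the time-series signal analysis method according to the present invention.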
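[Explanation of Symbols]
A(n), B(n) ... correlation values
d, d1 to d5 ... unit sections
E(n) ... correlation value (effective value)
e(d1,1), e(d1,2), e(d1,3), e(d2,1), e(d2,2), e(d2,3) ... amplitude intensities
F ... sampling frequency
f(0) to f(127), f(n) ... standard frequencies
f(a, b) ... proximity periodic function
G(j) ... contained signal
i ... parameter indicating a note number or a unit section
j ... parameter indicating the number of repetitions
k ... parameter indicating a sample number
L ... section length
N1, N2, N(i−1), N(i), N(i+1) ... note numbers
n, n(d1,1), n(d1,2), n(d1,3), n(d2,1), n(d2,2), n(d2,3) ... note numbers / representative codes
Rn ... periodic function
S(j), S(j+1) ... difference signals
T1 to T3 ... tracks
t1 to t6 ... times
w ... sample number
X, X(k) ... section signal
[0001]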
BACKGROUND OF THE INVENTION
The present invention relates to a time-series signal analysis method and an acoustic signal encoding method. It concerns a technique that analyzes a time-series signal given as a time-series intensity signal by extracting the plurality of periodic signals that constitute it, and that further uses this analysis method to encode an acoustic signal. In particular, the present invention is suited to efficiently converting general acoustic signals into MIDI-format code data. Applications are expected in the various industries that produce audio content for broadcast media (radio, television), communication media (CS video/audio distribution, Internet distribution), and package media (CD, MD, cassette, video, LD, CD-ROM, game cassette), and in the analysis and diagnosis of acoustic signals such as medical auscultation sounds (for example, heart sounds).
[0002]
[Prior art]
A time-series signal represented by an acoustic signal includes a plurality of periodic signals as its constituent elements. For this reason, a method for analyzing what kind of periodic signal is included in a given time-series signal has been known for a long time. For example, Fourier analysis is widely used as a method for analyzing a frequency component included in a given time series signal.
[0003]
By using such a time-series signal analysis method, it is possible to encode an acoustic signal. With the spread of computers, it has become easy to sample an analog audio signal as the original sound at a predetermined sampling frequency, quantize the signal intensity at each sampling, and capture it as digital data. If a method such as Fourier analysis is applied to the data and the frequency components included in the original sound signal are extracted, the original sound signal can be encoded by a code indicating each frequency component.
[0004]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding instrument sounds produced by electronic musical instruments, has come into active use with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) basically describes the operations of an instrument performance, such as which key of which instrument is played and how strongly; the data itself does not contain the actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of the instrument sounds. Nevertheless, the high encoding efficiency of the format has drawn attention, and encoding and decoding techniques based on the MIDI standard are now widely adopted in personal-computer software for instrument performance, instrument practice, composition, and the like.
[0005]
Accordingly, methods have been proposed in which a time-series signal such as an acoustic signal is analyzed by a predetermined technique to extract its constituent periodic signals, and the extracted periodic signals are encoded as MIDI data. For example, JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP 2000-090909 A, JP 2000-090993 A, JP 2000-261322 A, JP 2001-005450 A, and JP 2001-148633 A propose various methods that analyze the frequency components constituting an arbitrary time-series signal and create MIDI data from the analysis result.
[0006]
[Problems to be solved by the invention]
Each of the analysis methods disclosed in the above publications sets a plurality of unit sections along the time axis of the time-series signal to be analyzed, associates with each unit section predetermined periodic functions that correlate highly with it, and extracts the signals corresponding to those periodic functions as the constituent periodic signals. In this analysis process, however, periodic functions that are not actually present in the original time-series signal may be picked up. In particular, when these methods are applied to an original acoustic signal consisting of instrument sounds, not only the frequencies of the notes actually played but also neighboring frequencies and overtone components are picked up, so the MIDI data finally created contains codes that differ from the original performance. To address these problems, JP 2000-261322 A discloses a method that uses generalized harmonic analysis instead of Fourier analysis, JP 2001-005450 A discloses a method that sets the periodic functions used for the correlation calculation more finely, and JP 2001-148633 A discloses a method that corrects the frequency using a phase difference. Even with these previously proposed methods, however, accurate frequency analysis is not always possible, and encoding can degrade quality, for example by introducing distortion into the reproduced sound.
[0007]
Accordingly, an object of the present invention is to provide a time-series signal analysis method and an acoustic signal encoding method that can perform more accurate frequency analysis of a time-series signal and can encode an original acoustic signal with high quality.
[0008]
[Means for Solving the Problems]
  (1) A first aspect of the present invention is a time-series signal analysis method for extracting a plurality of periodic signals as constituent elements from a time-series signal given as a time-series intensity signal, the method comprising:
  an input stage of capturing the time-series signal to be analyzed as digital data;
  a section setting stage of setting a plurality of unit sections on the time axis of the captured time-series signal;
  a function group definition stage of setting a plurality of frequencies and defining, for each frequency, a pair of functions consisting of a sine function and a cosine function of that frequency as the periodic function for that frequency, thereby defining a function group composed of the periodic functions for the respective frequencies;
  a section signal extraction stage of extracting the time-series signal in one unit section as a section signal X and defining the section signal X as a first difference signal S(1); and
  an element function selection stage of repeating the following process J times, for j = 1 to J (J being an arbitrary integer): selecting from the function group, as the j-th element function, the periodic function whose correlation value E with the section signal X is highest and for which the relationship between the correlation value EE with the j-th difference signal S(j) and the correlation value E satisfies a predetermined condition; excluding the j-th element function from the function group; obtaining the j-th contained signal G(j) as the product of the j-th element function and the correlation value EE; and taking the signal obtained by subtracting the j-th contained signal G(j) from the j-th difference signal S(j) as the new difference signal S(j+1);
  wherein, in the function group definition stage, for each of a plurality α of standard frequencies in semitone units corresponding to MIDI note numbers, a plurality β of variations of proximity frequencies in units of 1/β semitone (which may include the standard frequency itself) are set within a range in which the proximity frequency bands of neighboring standard frequencies do not overlap, a proximity periodic function is defined for each proximity frequency, and the function group is composed of the total of α × β proximity periodic functions;
  in the element function selection stage, the effective value of the correlation value for the sine function and the correlation value for the cosine function is used as the correlation value between a given periodic function and a given signal;
  in the element function selection stage, when a proximity periodic function is excluded from the function group, all β proximity periodic functions, consisting of that function and the proximity periodic functions that are its variations, are excluded; and
  the section signal extraction stage and the element function selection stage are performed for each unit section to obtain a plurality of contained signals for each unit section, and the contained signals obtained for each unit section are extracted as the periodic signals constituting the time-series signal in that unit section.
[0009]
(2) According to a second aspect of the present invention, in the time-series signal analysis method according to the first aspect described above,
in the element function selection stage, the process of selecting from the function group, as the j-th element function, the periodic function whose correlation value E with the section signal X is highest and for which the relationship between the correlation value EE with the j-th difference signal S(j) and the correlation value E satisfies the predetermined condition is carried out as follows:
the periodic function having the highest correlation value E with the section signal X is selected as a provisional element function, its correlation value EE with the j-th difference signal S(j) is calculated, and the condition is tested;
if the condition is satisfied, the provisional element function is adopted as the j-th element function and excluded from the function group; and
if the condition is not satisfied, the provisional element function is excluded from the function group and a new provisional element function is selected, this being repeated until the condition is satisfied.
[0010]
(3) According to a third aspect of the present invention, in the time-series signal analysis method according to the first or second aspect described above,
in the element function selection stage, a predetermined threshold Δ is set, and the condition |E − EE| < Δ or the condition EE > E − Δ is used as the predetermined condition on the relationship between the correlation value EE and the correlation value E.
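The provisional selection described in the second and third aspects can be sketched as follows. This is only an illustration under assumptions: the correlation values E are supplied as a precomputed dictionary, the correlation with the current difference signal is supplied as a callable, the condition |E − EE| < Δ is used, and the exclusion of the β variations required by the first aspect is omitted for brevity. The names `select_element`, `E`, and `ee_of` are hypothetical.

```python
def select_element(E, ee_of, delta, candidates):
    """Pick the j-th element function from the remaining candidates.

    E          -- dict: function id -> correlation value with section signal X
    ee_of      -- callable: function id -> correlation value with S(j)
    delta      -- threshold for the condition |E - EE| < delta
    candidates -- set of function ids still in the function group (modified in place)

    Returns (function id, EE), or None if no candidate satisfies the condition.
    """
    while candidates:
        provisional = max(candidates, key=lambda fid: E[fid])
        ee = ee_of(provisional)
        # The provisional function leaves the function group in either case.
        candidates.discard(provisional)
        if abs(E[provisional] - ee) < delta:
            return provisional, ee
    return None


# Toy values: "a" correlates with X but hardly with S(j), so it is rejected.
E = {"a": 0.9, "b": 0.7, "c": 0.4}
EE = {"a": 0.2, "b": 0.65, "c": 0.4}
group = set(E)
chosen = select_element(E, EE.get, 0.1, group)
```

With these toy values, "a" is rejected and removed, "b" is adopted, and only "c" remains in the function group.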
[0012]
  (4) According to a fourth aspect of the present invention, in the time-series signal analysis method according to any of the first to third aspects,
  in the element function selection stage, one or more proximity periodic functions that have a frequency equal to an integer multiple of the frequency of the selected element function and that are included in the function group are selected as overtone component functions; each overtone-containing signal, given by the product of an overtone component function and its correlation value EE with the difference signal S(j), is obtained; and the signal obtained by subtracting the contained signal G(j) and each overtone-containing signal from the difference signal S(j) is taken as the new difference signal S(j+1).
[0013]
  (5) According to a fifth aspect of the present invention, in the time-series signal analysis method according to any of the first to fourth aspects,
  the frequency of each contained signal obtained for each unit section is replaced by the nearby standard frequency, so that every periodic signal finally extracted as a component of the time-series signal in each unit section is a signal having one of the standard frequencies.
[0016]
  (6) A sixth aspect of the present invention is an acoustic signal encoding method that uses the time-series signal analysis method according to any of the first to fifth aspects, wherein
  the acoustic signal to be encoded is treated as the time-series signal to be analyzed, and the contained signals are obtained for each unit section; and
  code data is created that includes information indicating the amplitude of a contained signal obtained for a specific unit section, information indicating its frequency or a nearby frequency, and information indicating the position of that unit section on the time axis, and the acoustic signal in that unit section is encoded by this code data.
[0017]
  (7) According to a seventh aspect of the present invention, in the acoustic signal encoding method according to the sixth aspect,
  for each unit section, M contained signals (M < J) are selected from the J contained signals obtained, in descending order of amplitude, and the code data is created from these M contained signals.
[0018]
  (8) According to an eighth aspect of the present invention, in the acoustic signal encoding method according to the sixth or seventh aspect,
  velocity is used as the information indicating the amplitude of a contained signal obtained for a specific unit section, a note number is used as the information indicating its frequency or a nearby frequency, a note-on time is used as the information indicating the start position of that unit section on the time axis, and a note-off time is used as the information indicating its end position, and MIDI data is created as the code data.
[0019]
  (9) A ninth aspect of the present invention is a computer-readable recording medium on which is recorded a program that causes a computer to execute the time-series signal analysis method or the acoustic signal encoding method according to any of the first to eighth aspects.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described based on the illustrated embodiments.
[0021]
§1. Basic principle of analysis method and encoding method according to the present invention
First, the basic principle of the time-series signal analysis method and acoustic signal encoding method according to the present invention will be described. Since this basic principle is disclosed in the above-mentioned publications or specifications, only the outline will be briefly described here.
[0022]
Suppose that an analog acoustic signal is given as a time-series signal, as shown in FIG. 1(a). In the illustrated example, the acoustic signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. First, this analog acoustic signal is captured as digital acoustic data. This may be done with an ordinary PCM technique, sampling the analog acoustic signal at a predetermined sampling period and converting the amplitude to digital data with a predetermined number of quantization bits.
[0023]
Subsequently, a plurality of unit sections are set on the time axis of the acoustic signal to be analyzed. In the example shown in FIG. 1A, six times t1 to t6 are defined at equal intervals on the time axis t, and five unit intervals d1 to d5 having these times as the start point and the end point are set. In the example shown in the figure, all unit sections having the same section length are set. However, the section length may be changed for each unit section. Alternatively, the section setting may be performed such that adjacent unit sections partially overlap on the time axis.
[0024]
Once the unit sections have been set in this way, representative frequencies are selected for the acoustic signal in each unit section (hereinafter, the section signal). Each section signal normally contains various frequency components; for example, frequency components with large amplitude may be selected as representative frequencies. A single representative frequency could be selected, but selecting several allows more accurate encoding. FIG. 1(b) shows an example in which three representative frequencies are selected for each unit section and each is encoded as one representative code (drawn as a musical note in the figure for convenience). Three tracks T1, T2, and T3 are provided to hold the representative codes (notes), so that the three representative codes selected for each unit section can be placed in different tracks. Here, "code" means a symbol, not a musical "chord".
[0025]
For example, the representative codes n(d1,1), n(d1,2), n(d1,3) selected for unit section d1 are placed in tracks T1, T2, T3, respectively. Each of these codes indicates a note number in the MIDI code. A MIDI note number takes one of 128 values from 0 to 127, each corresponding to one key of a piano keyboard. Specifically, when 440 Hz is selected as a representative frequency, for example, this frequency corresponds to note number n = 69 (the "la" (A3) at the center of the piano keyboard), so n = 69 is selected as the representative code. Note that FIG. 1(b) is a conceptual diagram that shows the representative codes obtained by the above method in the form of musical notes; in practice, intensity data is also attached to each note. For example, track T1 holds the note numbers n(d1,1), n(d2,1), ..., which indicate pitch, together with the data e(d1,1), e(d2,1), ..., which indicate intensity. The intensity data is determined by the degree to which the component of each representative frequency is contained in the original section signal; specifically, it is determined from the correlation value between the section signal and the periodic function having that representative frequency. Also, although the conceptual diagram of FIG. 1(b) indicates the position of each unit section on the time axis by the horizontal position of the note, in practice the position on the time axis is attached to each note as an exact numerical value.
[0026]
The format used to encode the acoustic signal need not be MIDI, but since MIDI is the most widespread format for this kind of encoding, MIDI-format code data is the most practical choice. In the MIDI format, "note-on" data and "note-off" data occur with "delta time" data interposed between them. "Note-on" data specifies a particular note number N and velocity V and instructs that performance of a particular sound begin; "note-off" data specifies a particular note number N and velocity V and instructs that performance of that sound end. "Delta time" data indicates a time interval. Velocity V is a parameter that indicates, for example, the speed at which a piano key is pressed (note-on velocity) or the speed at which the finger is released from the key (note-off velocity); it represents the strength of the operation that starts or ends the performance of a particular sound.
[0027]
In the method described above, J note numbers n(di,1), n(di,2), ..., n(di,J) are obtained as representative codes for the i-th unit section di, and intensities e(di,1), e(di,2), ..., e(di,J) are obtained for each of them. MIDI-format code data can therefore be created as follows. The note numbers n(di,1), n(di,2), ..., n(di,J) are used as they are for the note number N written in the "note-on" or "note-off" data. For the velocity V written in the "note-on" or "note-off" data, values obtained by normalizing the intensities e(di,1), e(di,2), ..., e(di,J) by a predetermined method are used. The "delta time" data is set according to the length of each unit section.
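As a rough sketch of this mapping, the following code builds a list of abstract events for one unit section. The linear scaling of intensity to a velocity of 1 to 127, the tuple layout (delta time, event type, note number, velocity), and the names `to_velocity` and `section_events` are all assumptions for illustration; the description leaves the normalization method open, and real MIDI files additionally require the byte-level encoding defined by the standard.

```python
def to_velocity(e: float, e_max: float = 1.0) -> int:
    """Scale an intensity in 0..e_max linearly to a MIDI velocity in 1..127.

    The lower bound of 1 is chosen because a note-on with velocity 0 is
    conventionally interpreted as a note-off.
    """
    ratio = min(max(e / e_max, 0.0), 1.0)
    return max(1, round(127 * ratio))


def section_events(note_numbers, intensities, section_ticks):
    """Return (delta_time, event, note_number, velocity) tuples for one unit section."""
    velocities = [to_velocity(e) for e in intensities]
    # All notes of the section start together.
    events = [(0, "note_on", n, v) for n, v in zip(note_numbers, velocities)]
    # They end together after the section length has elapsed.
    for i, (n, v) in enumerate(zip(note_numbers, velocities)):
        delta = section_ticks if i == 0 else 0
        events.append((delta, "note_off", n, v))
    return events


events = section_events([69, 72], [0.25, 1.0], 480)
```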
[0028]
§2. Specific method for obtaining correlation with periodic function
In the method based on the basic principle described above, one or a plurality of representative frequencies are selected for the section signal, and the section signal is represented by a periodic signal having this representative frequency. The representative frequency selected here is literally the frequency representing the signal component in the unit section. Specific methods for selecting the representative frequency include a method using a short-time Fourier transform and a method using a generalized harmonic analysis method, as described in §3. Both methods have the same basic concept, and a plurality of periodic functions having different frequencies are prepared in advance, and a periodic function having a high correlation with the section signal in the unit section is selected from the plurality of periodic functions. And a method of selecting the frequency of the highly correlated periodic function as a representative frequency is adopted. That is, when a representative frequency is selected, an operation for obtaining a correlation between a plurality of periodic functions prepared in advance and a section signal in a unit section is performed. Therefore, here, a specific method for obtaining the correlation with the periodic function will be described.
[0029]
Assume that trigonometric functions as shown in FIG. 2 are prepared as the plurality of periodic functions. These consist of pairs of a sine function and a cosine function having the same frequency; one such pair is defined for each of the 128 standard frequencies f(0) to f(127). Here, the pair of functions consisting of a sine function and a cosine function of the same frequency is defined as the periodic function for that frequency; in other words, the periodic function for a given frequency consists of a sine-cosine pair. The periodic function is defined as such a pair so that, when its correlation value with a signal is computed, the influence of phase on the correlation value is eliminated. The variables F and k in the trigonometric functions of FIG. 2 correspond to the sampling frequency F and the sample number k of the section signal X. For example, the sine wave for frequency f(0) is expressed as sin(2πf(0)k/F); given an arbitrary sample number k, it yields the amplitude of the periodic function at the same time position as the k-th sample of the section signal.
[0030]
Here, an example is described in which the 128 standard frequencies f(0) to f(127) are defined by the equations shown in FIG. 3. That is, the n-th (0 ≤ n ≤ 127) standard frequency f(n) is defined by
f(n) = 440 · 2^{γ(n)}
γ(n) = (n − 69) / 12
Defining the standard frequencies in this way is convenient when the result is finally encoded as MIDI data, because the 128 standard frequencies f(0) to f(127) so defined form a geometric progression and coincide with the frequencies of the note numbers used in MIDI data. For example, note number n = 69 represents the "la" (A3) at the center of the piano keyboard, as noted above, and corresponds to a 440 Hz tone; substituting n = 69 into the definition of the n-th standard frequency f(n) indeed gives f(n) = 440. In other words, the 128 standard frequencies f(0) to f(127) defined by the equations of FIG. 3 are the frequencies corresponding to the 128 note numbers n = 0 to 127 of MIDI data. The note number n is a logarithmic scale on which the frequency doubles with each octave, so it does not correspond linearly to the frequency axis f. The 128 standard frequencies f(0) to f(127) shown in FIG. 2 are therefore equally spaced (in semitone units in MIDI) on a logarithmically scaled frequency axis. For this reason, the note number axis in every graph in the figures of this application is drawn on a logarithmic scale.
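The definition of the standard frequencies can be checked with a few lines of code. The following is a minimal sketch; the function name `standard_frequency` and the list `freqs` are hypothetical.

```python
def standard_frequency(n: int) -> float:
    """Standard frequency f(n) in Hz for MIDI note number n (0..127)."""
    if not 0 <= n <= 127:
        raise ValueError("note number must be in 0..127")
    return 440.0 * 2.0 ** ((n - 69) / 12)


# The 128 standard frequencies f(0) .. f(127).
freqs = [standard_frequency(n) for n in range(128)]
```

Each frequency in `freqs` is a constant factor of 2^(1/12) above the previous one, which is the geometric progression referred to in the text, and twelve steps double the frequency.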
[0031]
Next, the way the correlation of each periodic function with an arbitrary section signal is obtained will be described more concretely. Suppose, as shown in FIG. 4, that a section signal X is given for a unit section d. Assume that the unit section d of length L is sampled at sampling frequency F, yielding w sample values in total, numbered 0, 1, 2, 3, ..., k, ..., w−2, w−1 from the start of the section (the w-th sample, shown as a white circle, belongs to the beginning of the next unit section adjacent on the right). For an arbitrary sample number k, the amplitude value X(k) is then given as digital data.
[0032]
The principle of obtaining the correlation value between such a section signal X and the sine function Rn having the n-th standard frequency f(n) is as follows. The correlation value A(n) between the two can be defined by the first formula of FIG. 5. Here, X(k) is the amplitude of sample number k of the section signal X, as shown in FIG. 4, and sin(2πf(n)·k/F) is the amplitude of the sine function Rn at the same position on the time axis. The first formula takes the product of the amplitude of the section signal X and the amplitude of the sine function Rn at every sample position k = 0 to w−1 in the unit section d and sums these products. Since the amplitudes are signed, so are the products. If there is no correlation between the section signal X and the sine function Rn, the sign of the product is positive or negative at random and the sum approaches zero. Conversely, if the two are correlated, the absolute value of the sum grows with the degree of correlation. For example, if there is a positive correlation such that the amplitude of Rn is always positive when that of X is positive and always negative when that of X is negative (X and Rn have the same frequency and the same phase), the sum of the products takes its maximum positive value. If instead there is a negative correlation such that the amplitude of Rn is always negative when that of X is positive and always positive when that of X is negative (X and Rn have the same frequency and opposite phase), the sum takes its maximum negative value.
[0033]
Similarly, the second formula of FIG. 5 gives the correlation value B(n) between the section signal X and the cosine function having the n-th standard frequency f(n). Both the first formula, for A(n), and the second, for B(n), end with multiplication by the coefficient 2/w, which normalizes the correlation value. The denominator w is the total number of samples in the unit section d; dividing the sum over all w samples, k = 0 to w−1, by w gives the average per sample. The numerator 2 is a constant that makes the correlation values A(n) and B(n) fall between −1 and +1.
[0034]
The overall correlation between the section signal X and the periodic function having the standard frequency f(n) can be expressed, as in the third formula of FIG. 5, by the effective value of the correlation value A(n) with the sine function and the correlation value B(n) with the cosine function, that is, by the root sum square E(n). Using the root sum square yields an overall correlation that reflects both positive and negative correlation and that is free of the influence of phase. For example, when the signal correlates positively with the sine function and negatively with the cosine function, A(n) is positive and B(n) is negative, but the root sum square E(n) reflects the absolute values of both.
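The three correlation values described for FIG. 5 can be written out as a short sketch. The formulas in the code follow the textual description (products summed over k = 0 to w−1, coefficient 2/w, root sum square); the function name `correlation` and the test signal are hypothetical.

```python
import math


def correlation(x, f, F):
    """Return (A, B, E) for section signal x against frequency f (Hz), sampled at F (Hz)."""
    w = len(x)
    A = (2.0 / w) * sum(x[k] * math.sin(2 * math.pi * f * k / F) for k in range(w))
    B = (2.0 / w) * sum(x[k] * math.cos(2 * math.pi * f * k / F) for k in range(w))
    E = math.sqrt(A * A + B * B)  # root sum square of A and B
    return A, B, E


# A 440 Hz sinusoid of amplitude 0.5 with an arbitrary phase of 1 radian.
# 800 samples at 8000 Hz hold exactly 44 periods.
F, w = 8000, 800
x = [0.5 * math.sin(2 * math.pi * 440 * k / F + 1.0) for k in range(w)]
A, B, E = correlation(x, 440.0, F)
```

For this signal, A and B individually depend on the phase (A = 0.5·cos 1 and B = 0.5·sin 1), whereas E recovers the amplitude 0.5 regardless of the phase, which is the property the text attributes to the root sum square.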
[0035]
The arithmetic expressions shown in FIG. 5 are an example in which trigonometric functions are used as the periodic functions (in other words, functions whose waveform is a sine wave), but the waveform of the periodic functions used in carrying out the present invention is not limited to a sine wave; periodic functions whose waveform is a triangular wave, a rectangular wave, or a sawtooth wave may also be used. For example, a plurality of periodic functions whose waveforms are a sine wave, a triangular wave, a rectangular wave, and a sawtooth wave may be defined, and the function having a particular waveform may be selected manually (on the operator's instruction) according to the characteristics of the acquired acoustic data. Of course, it is also possible to provide a function that analyzes the characteristics of the acquired acoustic data and automatically selects the most appropriate periodic function without waiting for the operator's selection instruction.
[0036]
The expressions shown in FIG. 6 define the correlation when a general periodic function Rn having the standard frequency f(n) is used instead of the trigonometric functions. The expression for the correlation value A(n) uses the periodic function Rn(k), whereas the expression for the correlation value B(n) uses the periodic function Rn(k + F/(4f(n))), so that the two periodic functions have the same standard frequency f(n) but differ in phase by π/2. As described above, F is the sampling frequency of the section signal X, and F/f(n) is the total number of samples in one cycle. F/(4f(n)) is therefore the number of samples in a quarter period, that is, the phase difference π/2 expressed as a number of samples. When the correlation values A(n) and B(n) are obtained in this way for a pair of periodic functions that have the standard frequency f(n) and differ in phase by π/2, their root-sum-square value E(n) is a parameter indicating the overall correlation with the periodic function having the standard frequency f(n).
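The quarter-period shift can be illustrated with a triangular wave. The following is only a sketch: the names triangle and general_correlation are hypothetical, the waveform is expressed as a function of phase in cycles, and the same 2/w coefficient as in FIG. 5 is assumed to apply in FIG. 6.

```python
import math

def triangle(t):
    """Unit-amplitude triangular wave with period 1, aligned with sin(2*pi*t)."""
    t = t % 1.0
    if t < 0.25:
        return 4 * t
    if t < 0.75:
        return 2 - 4 * t
    return 4 * t - 4

def general_correlation(x, f, F, wave=triangle):
    """Return (A, B, E) using Rn(k) and the quarter-period-shifted Rn(k + F/(4f))."""
    w = len(x)
    shift = F / (4 * f)  # quarter period expressed as a number of samples
    A = (2.0 / w) * sum(x[k] * wave(f * k / F) for k in range(w))
    B = (2.0 / w) * sum(x[k] * wave(f * (k + shift) / F) for k in range(w))
    return A, B, math.sqrt(A * A + B * B)
```

Passing wave=lambda t: math.sin(2 * math.pi * t) reduces this to the sine and cosine pair, since a sine advanced by a quarter period is a cosine. For non-sinusoidal waveforms the coefficient 2/w no longer guarantees values within −1 to +1, so a different normalization may be preferable in practice.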
[0037]
§3. Two methods for selecting a representative frequency
As already described in §2, the basic principle of selecting a representative frequency for a specific section signal is to obtain the correlation between the section signal and each of a plurality of periodic functions prepared in advance, and to select the frequency of a highly correlated periodic function as a representative frequency. Two specific methods for selecting the representative frequency have been proposed: a method using the short-time Fourier transform and a method using generalized harmonic analysis. These two methods are described here.
[0038]
In the method using the Fourier transform, correlation values with the section signal X are first obtained for all the prepared periodic functions. For example, when 128 periodic functions are prepared as shown in FIG. 2, the calculation shown in the expressions of FIG. 5 is performed for all of these periodic functions, and correlation values E(0) to E(127) are obtained. Plotting the correlation values obtained in this way as correlation strengths E(n) yields an intensity graph as shown in FIG. 7. This intensity graph shows the frequency components of the section signal X in the unit section d1 of the acoustic signal shown in FIG. 1, and is a so-called Fourier spectrum. The processing for obtaining such a Fourier spectrum is the Fourier transform. When the Fourier transform is applied to a signal in a short time interval such as the unit section d1 (short-time Fourier transform), the calculation is generally performed after the section signal X has been filtered with a weight function such as a Hanning window. The Fourier transform is originally based on the premise that similar signals continue infinitely before and after the extracted section; if no weight function is used in the short-time Fourier transform, high-frequency noise is therefore often added to the resulting spectrum. Using a weight function such as the Hanning window function, whose weights are 0 at both ends of the section, suppresses this harmful effect to some extent. With the unit section length denoted L and k = 1 ... L (k being a parameter indicating the position within the unit section), the Hanning window function H(k) is given by
H(k) = 0.5 − 0.5 * cos(2πk / L)
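The weight function above can be written down directly. This is a minimal sketch with hypothetical function names, using the index range k = 1 ... L exactly as stated in the text.

```python
import math

def hanning(L):
    """Weights H(k) = 0.5 - 0.5*cos(2*pi*k/L) for k = 1 .. L."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / L) for k in range(1, L + 1)]

def apply_window(x):
    """Multiply each sample of the section signal by its window weight."""
    return [h * s for h, s in zip(hanning(len(x)), x)]
```

With this index range the weight is exactly 0 at k = L and close to 0 at k = 1, and it reaches 1 at the centre k = L/2.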
[0039]
The intensity graph of the unit section d1 obtained in this way can be regarded as a graph showing, as the correlation strength E(n), the proportion of each frequency component corresponding to note numbers n = 0 to 127 contained in the section signal X of the unit section d1. Therefore, based on the correlation strengths E(n) shown in the intensity graph, J note numbers may be selected from the total of 128 note numbers, and these J note numbers may be taken as the representative frequencies that represent the unit section d1. For example, if extraction is performed on the basis that "three codes are extracted in descending order of correlation strength E(n)", then in the example shown in FIG. 7 the note number n(d1,1) is selected as the first representative frequency, the note number n(d1,2) as the second representative frequency, and the note number n(d1,3) as the third representative frequency.
[0040]
Once J representative frequencies have been selected in this way, the section signal X of the unit section d1 can be encoded based on each representative frequency and its correlation strength. In the example above, if the correlation strengths of note numbers n(d1,1), n(d1,2), and n(d1,3) in the intensity graph shown in FIG. 7 are e(d1,1), e(d1,2), and e(d1,3), the section signal X in the unit section d1 can be expressed by the following three pairs of data.
n (d1,1), e (d1,1)
n (d1,2), e (d1,2)
n (d1,3), e (d1,3)
The codes arranged at the position corresponding to the unit section d1 on each track in FIG. 1(b) are the codes obtained in this way. This completes the description of the method using the short-time Fourier transform.
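The selection of J note numbers in descending order of correlation strength may be sketched as follows. The function name and the spectrum values are hypothetical and serve only to illustrate the rule.

```python
def select_representatives(E, J):
    """Return J (note_number, strength) pairs in descending order of strength.

    E: list of correlation strengths indexed by note number n.
    """
    ranked = sorted(range(len(E)), key=lambda n: E[n], reverse=True)
    return [(n, E[n]) for n in ranked[:J]]

# Hypothetical intensity graph with four non-zero components.
E = [0.0] * 128
E[60], E[64], E[67], E[72] = 0.9, 0.5, 0.7, 0.2
codes = select_representatives(E, 3)
# codes == [(60, 0.9), (67, 0.7), (64, 0.5)]
```

Each returned pair corresponds to one of the data pairs n(d1,i), e(d1,i) listed above.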
[0041]
Next, the method using generalized harmonic analysis will be described. Details of this technique are disclosed in, for example, JP 2000-261322 A; the basic principle is as follows. Assume that a signal S(j) is given for a unit section d, as shown in FIG. 8(a). Here, j is a parameter for iterative processing (j = 1 to J), as will be described later. First, a Fourier spectrum of the signal S(j) is obtained; that is, an intensity graph as shown in FIG. 7 is obtained in the same manner as in the short-time Fourier transform method described above. In the short-time Fourier transform method, however, J frequencies (J = 3 in the above example) are selected from this intensity graph as representative frequencies in descending order of correlation strength, whereas here only the one frequency with the highest correlation strength is selected as the representative frequency. Subsequently, an inclusion signal G(j) as shown in FIG. 8(b) is defined. This inclusion signal G(j) is a periodic function having the selected representative frequency, with an amplitude corresponding to its correlation strength with the signal S(j). More specifically, when a pair of a sine function and a cosine function is used as the periodic function as shown in FIG. 2 and the frequency f(n) is selected as the representative frequency, the inclusion signal G(j) is the sum of the sine function A(n) sin(2πf(n)k/F) having the amplitude A(n) and the cosine function B(n) cos(2πf(n)k/F) having the amplitude B(n) (FIG. 8(b) shows only one of the functions for convenience of illustration). Since A(n) and B(n) are the normalized correlation values obtained by the expressions of FIG. 5, the inclusion signal G(j) can be regarded as the signal component of frequency f(n) contained in the signal S(j).
[0042]
Once the inclusion signal G(j) has been obtained, the difference signal S(j+1) is obtained by subtracting the inclusion signal G(j) from the signal S(j). FIG. 8(c) shows the difference signal S(j+1) obtained in this way. The difference signal S(j+1) consists of the signal components that remain after the signal component of frequency f(n) has been removed from the original signal S(j). Therefore, if this difference signal S(j+1) is treated as a new signal S(j) and the same processing is repeated J times while the parameter j is increased by 1 from j = 1 to J, J representative frequencies can be selected.
[0043]
In summary, to select a total of J representative frequencies by applying generalized harmonic analysis to the section signal X of a given unit section, the parameter j is first set to the initial value 1, the section signal X is defined as the first difference signal S(1), and the processing described above is repeated J times while the parameter j is increased by 1 from j = 1 to J. (The difference signal S(1) at j = 1 is the section signal X itself and is not, strictly speaking, a "difference signal"; but because the signals S(j) are called "difference signals" here, S(1) is also called a "difference signal" for convenience.) In the method using the Fourier transform described above, all J representative frequencies are selected based on the correlation with the original section signal X. In generalized harmonic analysis, by contrast, each time one representative frequency is determined, a difference signal is obtained by subtracting the component of that representative frequency, and the next representative frequency is determined based on the correlation with this difference signal; this processing is repeated J times.
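The iteration described above (find the strongest component, subtract it, repeat) may be sketched as follows. This is a simplified illustration of ordinary generalized harmonic analysis over a fixed list of candidate frequencies, not the procedure of JP 2000-261322 A itself; all function and variable names are hypothetical.

```python
import math

def correlate(s, f, F):
    """Normalized sine and cosine correlation values (A, B) of signal s at frequency f."""
    w = len(s)
    A = (2.0 / w) * sum(s[k] * math.sin(2 * math.pi * f * k / F) for k in range(w))
    B = (2.0 / w) * sum(s[k] * math.cos(2 * math.pi * f * k / F) for k in range(w))
    return A, B

def generalized_harmonic_analysis(x, freqs, F, J):
    """Pick J components one at a time, subtracting each from the difference signal."""
    s = list(x)  # S(1) is the section signal itself
    picked = []
    for _ in range(J):
        # Frequency with the highest correlation strength for the current S(j).
        best = max(freqs, key=lambda f: math.hypot(*correlate(s, f, F)))
        A, B = correlate(s, best, F)
        picked.append((best, math.hypot(A, B)))
        # S(j+1) = S(j) - G(j), where G(j) = A*sin + B*cos at the chosen frequency.
        s = [s[k] - A * math.sin(2 * math.pi * best * k / F)
                  - B * math.cos(2 * math.pi * best * k / F) for k in range(len(s))]
    return picked, s
```

The function returns the selected (frequency, strength) pairs together with the final difference signal, so the amount of signal left unexplained after J iterations can be inspected.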
[0044]
§4. Characteristics of the two methods and basic concept of the present invention
In §3, two methods for selecting a representative frequency for a specific section signal were described, that is, a method using short-time Fourier transform and a method using generalized harmonic analysis. Each of these two methods has unique characteristics. In general, the method based on the generalized harmonic analysis is more computationally expensive than the method based on the short-time Fourier transform, but can select a more accurate representative frequency.
[0045]
For example, FIG. 9 shows the results of spectrum analysis of an instrument sound by the two methods. FIG. 9(a) shows the spectrum of the instrument sound serving as the original signal, FIG. 9(b) shows the spectrum analyzed by the short-time Fourier transform, and FIG. 9(c) shows the spectrum analyzed by generalized harmonic analysis. In each case, the horizontal axis indicates the note number (logarithmic scale), and one scale division corresponds to a frequency difference of one semitone. As shown in FIG. 9(a), the original signal is a chord of note numbers N1 and N2 (an interval of a third); on a piano, for example, this corresponds to striking the key for note number N1 and the key for note number N2 simultaneously. Since the original signal basically consists of the components of note numbers N1 and N2, MIDI-encoding the acoustic signal produced from this original signal should properly generate a MIDI code corresponding to note number N1 and a MIDI code corresponding to note number N2.
[0046]
However, when this acoustic signal is analyzed by the short-time Fourier transform, as shown in FIG. 9(b), many overtone components appear in addition to the basic sound corresponding to note number N1 (referred to as the N1 basic sound) and the basic sound corresponding to note number N2 (referred to as the N2 basic sound), and peripheral sounds around each of them also appear. In the figure, the basic sound and overtone components are indicated by thick lines, and their peripheral sound components by thin lines. The overtone components of a basic sound inevitably appear in the spectrum because of the physical characteristics of acoustic instruments (traditional natural instruments, as opposed to electronic instruments): the sound contains not only the frequency component of the note being played but also frequency components at integer multiples of that frequency. In the illustrated example, overtone components at the second, third, fourth, fifth, and sixth harmonics appear for each of the basic sounds N1 and N2. Because note numbers are frequency parameters defined at equal intervals on a logarithmic frequency axis, the second harmonic corresponds, in terms of note number values, to the note number obtained by adding 12 to the note number of the basic sound. Likewise, the third harmonic corresponds to +19, the fourth harmonic to +24, the fifth harmonic to +28, and the sixth harmonic to +31.
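The note number offsets quoted above follow from the fact that one octave (a doubling of frequency) spans 12 note numbers, so the h-th harmonic lies 12·log2(h) semitones above the basic sound, rounded to the nearest note number. A short check, with a hypothetical function name:

```python
import math

def harmonic_offset(h):
    """Note number offset of the h-th harmonic above its basic sound, rounded."""
    return round(12 * math.log2(h))

offsets = {h: harmonic_offset(h) for h in range(2, 7)}
# offsets == {2: 12, 3: 19, 4: 24, 5: 28, 6: 31}
```

Only the second and fourth harmonics fall exactly on a note number; the others are rounded (for example, the fifth harmonic lies about 27.86 semitones above the basic sound).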
[0047]
As described above, the Fourier transform is originally based on the premise that the same signal continues infinitely before and after the unit section from which the signal was cut out. In reality, however, a section signal in a unit section of finite length is not such an ideal signal. For this reason, when the short-time Fourier transform is performed, peripheral sound components appear in addition to the overtone components described above. In some cases these peripheral sounds obscure the original basic sound. For example, in FIG. 9(b) each basic sound is drawn with a thick line and can therefore be identified in the figure, but in reality the N2 basic sound is hidden by the peripheral sounds of the N1 basic sound. That is, when three note numbers are selected from the Fourier spectrum shown in FIG. 9(b) in descending order of correlation strength, the basic sound N1 is selected, but the basic sound N2 is buried in the peripheral sounds and is left out of the selection.
[0048]
On the other hand, when generalized harmonic analysis is performed on the same acoustic signal, as shown in FIG. 9(c), the overtone components indicated by thick lines still remain, but the peripheral sound components indicated by thin lines are considerably suppressed. Therefore, if three note numbers are selected from the spectrum shown in FIG. 9(c) in descending order of correlation strength, both the basic sounds N1 and N2 are selected. Even with generalized harmonic analysis, however, the peripheral sound components and the overtone components cannot be removed completely. A basic sound may therefore still be buried by peripheral sound components and fail to be selected correctly when encoding is performed. Moreover, if such peripheral sound components are encoded together with the original basic sound components, unpleasant noise is produced during reproduction. As is known from the theory of harmony, adjacent sounds a semitone apart produce a dissonance when played simultaneously, and when many such sounds are gathered together the result is noise. Incidentally, the N1 sound and the N2 sound shown in FIG. 9(a) form a chord at an interval of a third, a chord in which the fourth harmonic and the fifth harmonic resonate.
[0049]
Thus, in the conventionally proposed time-series signal analysis methods, not only the basic sound components but also their overtone components and peripheral sound components appear in the spectrum, so that a basic sound component whose signal intensity is low may be buried and fail to be selected, or an unnecessary component may be selected by mistake. The present invention is based on a new technique for solving this problem.
[0050]
First, the basic principle of a method for completely removing peripheral sound components will be described. Comparing the spectra shown in FIGS. 9(b) and 9(c), the N1 basic sound and the N2 basic sound have almost the same correlation strength in both, whereas the correlation strength of the peripheral sounds indicated by thin lines differs between the two. In the illustrated example, the peripheral sound components in the spectrum obtained by generalized harmonic analysis shown in FIG. 9(c) are smaller overall than those in the spectrum obtained by the short-time Fourier transform shown in FIG. 9(b). Thus, when the analysis result of the short-time Fourier transform is compared with that of generalized harmonic analysis, the correlation strength of a basic sound is almost the same in both, whereas the correlation strength of a peripheral sound differs considerably. Conversely, when both analysis results are given and a frequency (a note number in this example) shows a large difference in correlation strength between them, that frequency component can be judged not to be an original basic sound component but a peripheral sound component.
[0051]
The basic concept of the present invention is based on this principle. When a section signal to be analyzed is given, both analysis by the short-time Fourier transform and analysis by generalized harmonic analysis are performed on it, and any frequency for which the correlation strengths in the two results differ greatly is judged not to be an original frequency that should be selected as a representative frequency, and is not selected. Following this idea, none of the peripheral sound components indicated by thin lines in the example of FIG. 9 is selected as a representative frequency. Specifically, a predetermined threshold Δ is set for the difference between the two correlation strengths, and a frequency is selected only if the difference falls within the threshold Δ. That is, with E denoting the correlation strength obtained by the short-time Fourier transform and EE the correlation strength obtained by generalized harmonic analysis, it is sufficient that the condition |E − EE| < Δ is satisfied for the predetermined threshold Δ. Δ is a positive value, for example about E/2. The N1 basic sound and the N2 basic sound shown in FIG. 9, for instance, both satisfy this setting condition (the correlation strength of each basic sound in FIG. 9(b) is almost the same as that in FIG. 9(c), and the difference is within the predetermined threshold Δ), and are therefore selected as representative frequencies.
[0052]
With this method alone, each overtone component is still selected as a representative frequency; an additional method for removing the overtone components will be described in §6. In addition, experiments conducted by the present inventor on various acoustic signals showed that, for peripheral sound components, the correlation strength EE on the spectrum obtained by generalized harmonic analysis is smaller than the correlation strength E on the spectrum obtained by the short-time Fourier transform in the overwhelming majority of cases. Therefore, the condition EE > E − Δ may be used instead of the condition |E − EE| < Δ as the setting condition for deciding whether to select a representative frequency.
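The two forms of the setting condition can be expressed as a single predicate. This is a sketch only: the function name and the one_sided flag are hypothetical, and the default Δ = E/2 is the example value mentioned in the text.

```python
def satisfies_condition(E, EE, delta=None, one_sided=False):
    """Return True if a frequency with strengths E and EE may be selected.

    E: correlation strength against the section signal X.
    EE: correlation strength against the difference signal S(j).
    """
    if delta is None:
        delta = E / 2            # example threshold from the text
    if one_sided:
        return EE > E - delta    # condition EE > E - delta
    return abs(E - EE) < delta   # condition |E - EE| < delta
```

For example, E = 0.8 and EE = 0.75 pass (a basic sound whose strength barely changes), while E = 0.6 and EE = 0.1 fail (a peripheral sound whose strength collapses once the neighbouring basic sound is removed). The one-sided form differs only when EE exceeds E by the threshold or more.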
[0053]
§5. Basic procedure of time series signal analysis method according to the present invention
Next, a basic procedure of the time series signal analysis method according to the present invention will be described based on the flowchart of FIG.
[0054]
First, in step S1, a time series signal to be analyzed is captured as digital data. When an analog acoustic signal is an analysis target, for example, sampling may be performed at a sampling frequency of 44.1 kHz and captured as digital data. In the subsequent step S2, a plurality of unit sections are set on the time axis of the captured time series signal. For example, in the example shown in FIG. 1, unit sections d1 to d5 having a predetermined section length are set. When sampling is performed at a sampling frequency of 44.1 kHz, if the number of samples included in one unit section is, for example, 2048, the section length of the unit section is about 46 ms.
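The section length quoted above can be checked, and step S2 sketched, as follows. The function name is hypothetical, and the sketch assumes consecutive, non-overlapping unit sections, which the text does not specify.

```python
def split_into_sections(samples, w=2048):
    """Cut samples into consecutive, non-overlapping unit sections of w samples.

    Trailing samples that do not fill a whole section are dropped.
    """
    return [samples[i:i + w] for i in range(0, len(samples) - w + 1, w)]

F = 44100                          # sampling frequency in Hz
section_ms = 1000.0 * 2048 / F     # section length in milliseconds, about 46.4
```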
[0055]
In the next step S3, a plurality of frequencies are set, and a function group consisting of periodic functions having these frequencies is defined. In the above example, as shown in FIG. 2, 128 standard frequencies f(0) to f(127) corresponding to MIDI note numbers 0 to 127 are defined, so that 128 pairs of sine and cosine functions are defined. As described above, a pair consisting of a sine function and a cosine function of the same frequency is used as the periodic function for that frequency in order to avoid the influence of phase when the correlation is obtained. As the correlation value between this periodic function and a given signal, the effective value E(n) of the correlation value A(n) for the sine function and the correlation value B(n) for the cosine function is used, as shown in FIG. 5.
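The 128 standard frequencies may be generated as follows. This sketch assumes the conventional equal-tempered tuning in which MIDI note number 69 is 440 Hz; the definition actually used in FIG. 2 is not reproduced in this passage, so the formula is an assumption.

```python
def standard_frequency(n):
    """Equal-tempered frequency in Hz for MIDI note number n (assumes note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

# Standard frequencies f(0) .. f(127), one per note number.
freqs = [standard_frequency(n) for n in range(128)]
```

Under this assumption f(0) is about 8.18 Hz, f(60) (middle C) about 261.63 Hz, and f(127) about 12543.85 Hz, all below the Nyquist frequency of 22.05 kHz for sampling at 44.1 kHz.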
[0056]
Subsequently, in step S4, the time-series signal in one unit section is extracted as the section signal X, and this section signal X is defined as the first difference signal S(1). This difference signal S(1) corresponds to the signal S(j) shown in FIG. 8(a) and is the first signal to be analyzed in the generalized harmonic analysis (as described above, the difference signal S(1) at j = 1 is not, strictly speaking, a "difference signal", but is called one here for convenience). In the next step S5, the parameter j indicating the number of iterations of the generalized harmonic analysis is set to the initial value 1, and the processing of steps S6 to S8 is repeated, with the parameter j increased by 1 in step S10, until the condition of step S9 is met, that is, until the parameter j reaches the predetermined number J. This iterative processing is basically the generalized harmonic analysis described above. In ordinary generalized harmonic analysis, however, representative frequencies are selected unconditionally in order of correlation value, whereas here it is determined whether the relationship between the correlation value EE with the difference signal S(j) and the correlation value E with the section signal X satisfies a predetermined setting condition, and a frequency is selected as a representative frequency only when the condition is satisfied.
[0057]
That is, in step S6, a periodic function is selected from the function group defined in step S3 as the j-th element function, namely the periodic function that has the highest correlation value E with the section signal X and for which the relationship between the correlation value EE with the j-th difference signal S(j) and the correlation value E satisfies the predetermined setting condition. Here, an element function is a periodic function that can be judged to be a constituent element of the section signal X, that is, a periodic function whose frequency is suitable for selection as a representative frequency. As described above, conventional generalized harmonic analysis unconditionally selects the periodic function with the highest correlation value as the element function. Here, by contrast, the periodic function with the highest correlation value E is first taken as a temporary element function, the correlation value EE with the j-th difference signal S(j) is obtained for this temporary element function, and the temporary element function is selected as the formal element function only when the relationship between the correlation value E and the correlation value EE satisfies the predetermined setting condition. If the temporary element function does not satisfy the predetermined setting condition, the periodic function with the next highest correlation value E with the section signal X is taken as the next temporary element function, its correlation value EE with the j-th difference signal S(j) is obtained in the same way, and it is determined whether the relationship between the correlation value E and the correlation value EE satisfies the predetermined setting condition.
[0058]
Since the periodic function having the highest correlation value E with the section signal X in the function group does not necessarily satisfy the predetermined setting condition, the phrase "the correlation value E with the section signal X is the highest" in step S6 means "the highest among the periodic functions that satisfy the predetermined setting condition". The "predetermined setting condition" is the condition for excluding peripheral sounds described in §4. Specifically, it is sufficient to use the condition that the correlation value EE (corresponding to the correlation strength of the generalized harmonic analysis spectrum shown in FIG. 9(c)) and the correlation value E (corresponding to the correlation strength of the short-time Fourier transform spectrum shown in FIG. 9(b)) approximate each other within a predetermined range, or the condition that the former is not smaller than the latter by the threshold or more (EE > E − Δ, as described in §4). In step S6, selecting as the element function the periodic function that satisfies this setting condition and has the highest correlation value E with the section signal X prevents a periodic function corresponding to the frequency of a peripheral sound component, indicated by thin lines in FIG. 9, from being selected as an element function. At the stage of j = 1, however, the difference signal S(j) equals the section signal X, so the correlation value E equals the correlation value EE and the "predetermined setting condition" is always satisfied. The condition judgment in step S6 is therefore meaningful only from j = 2 onward. Note that the Hanning window described in §3 is not applied when the correlation values are calculated, so that the correlation value E and the correlation value EE can be compared under conditions that are as similar as possible.
[0059]
Once the j-th element function has been selected in this way, processing for excluding the j-th element function from the function group is performed in step S7. This prevents a periodic function that has once been selected as an element function from being selected as an element function again later.
[0060]
In the subsequent step S8, the j-th inclusion signal G(j) is obtained based on the selected j-th element function. The element function is a function selected from the 128 periodic functions shown in FIG. 2 (each a pair of a sine function and a cosine function of the same frequency) and has no coefficient representing amplitude. The inclusion signal G(j) is the signal obtained by giving this element function a coefficient indicating amplitude; specifically, it is the signal obtained by multiplying the element function by the correlation value EE. The correlation value EE indicates the correlation between the element function and the j-th difference signal S(j): the larger the correlation value EE, the more of the frequency component of the element function the difference signal contains. As described above, since normalized values are used as correlation values, the inclusion signal given by the product of the element function and the correlation value EE is the signal component, contained in the difference signal S(j), that has the frequency of the element function. In practice, since the element function is a pair of a sine function and a cosine function of the same frequency, the inclusion signal G(j) is the sum of a sine signal whose coefficient is the correlation value A(n) obtained by the expressions shown in FIG. 5 and a cosine signal whose coefficient is the correlation value B(n).
[0061]
Next, a new difference signal S(j+1) is obtained by subtracting the j-th inclusion signal G(j) from the j-th difference signal S(j). This difference signal S(j+1) is the signal that remains after the component of the inclusion signal G(j) has been removed from the original difference signal S(j). In short, the processing of step S8 extracts the inclusion signal G(j) from the difference signal S(j).
[0062]
Thus, when the processing of steps S6 to S8 is repeated J times from j = 1 to J (J being an arbitrary integer), a total of J inclusion signals G(1) to G(J) are extracted. These J inclusion signals are signal components contained in the original section signal X and are not peripheral sound components. In this way, the J inclusion signals G(1) to G(J) are extracted from the section signal X to be analyzed as the periodic signals that constitute it.
[0063]
The above is the processing for one unit section; by returning to step S3 via step S11, the same processing is executed for the next unit section. When it is determined in step S11 that the processing has been completed for all unit sections, the entire procedure ends. As a result, J inclusion signals G(1) to G(J) are extracted for each of the unit sections. Each inclusion signal carries information on a frequency (representative frequency) and information on an amplitude (intensity), and each unit section carries information indicating a position on the time axis. Therefore, by creating code data based on this information, the acoustic signal input as a time-series signal in step S1 can be encoded. For example, when MIDI data is created as the code data, velocity may be used as the information indicating the amplitude of each inclusion signal obtained for a unit section, note number as the information indicating its frequency, note-on time as the information indicating the start position of the unit section on the time axis, and note-off time as the information indicating its end position.
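The mapping from inclusion signals to MIDI-style note events may be sketched as follows. The event dictionary layout, the function name, and the linear scaling of amplitude to a velocity of 1 to 127 are all assumptions made for illustration; the sketch also assumes consecutive unit sections of equal length and does not write an actual MIDI file.

```python
def to_note_events(sections, section_len_s, max_amp=1.0):
    """Convert per-section components into note events.

    sections: list with one entry per unit section, each a list of
              (note_number, amplitude) pairs.
    section_len_s: length of one unit section in seconds.
    """
    events = []
    for i, components in enumerate(sections):
        on = i * section_len_s           # note-on time: start of the unit section
        off = (i + 1) * section_len_s    # note-off time: end of the unit section
        for note, amp in components:
            # Assumed linear amplitude-to-velocity mapping, clamped to 1..127.
            velocity = max(1, min(127, round(127 * amp / max_amp)))
            events.append({"note": note, "velocity": velocity, "on": on, "off": off})
    return events
```

For instance, to_note_events([[(60, 1.0)], [(64, 0.25)]], 0.05) yields two events: note 60 sounding over the first section and note 64 over the second.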
[0064]
If J iterations are performed according to step S9, a total of J inclusion signals G(1) to G(J) are extracted for each unit section, but the information on all of these inclusion signals does not necessarily have to be used. For example, for each unit section, M (M < J) inclusion signals may be selected from the J inclusion signals obtained, in descending order of amplitude, and code data may be generated based on these M inclusion signals. This makes it possible to selectively encode only the main frequency components having large amplitudes.
[0065]
The basic flow of the time-series signal analysis method and the acoustic signal encoding method according to the present invention has been described above based on the flowchart. Consideration is now given to how processing based on this basic concept is actually executed using a computer.
[0066]
First, step S6 selects one element function from the function group. If the function group shown in FIG. 2 is prepared, one element function is selected from its 128 periodic functions. The selection conditions are that “the correlation value E with respect to the section signal X is the highest” and that “the relationship between the correlation value EE with respect to the difference signal S(j) and the correlation value E with respect to the section signal X satisfies a predetermined setting condition”. To select an element function under these conditions, the following processing may actually be performed.
[0067]
First, the correlation values E(n) between the section signal X and all 128 periodic functions are calculated (n = 0 to 127). Here, the correlation value E(n) is an effective value, obtained by the equation shown in the drawings from the correlation values for the sine function and the cosine function. Since the section signal X itself does not change when the parameter j indicating the number of repetitions is incremented, it is sufficient to calculate the 128 correlation values E(n) only once, when j = 1, and to record the results.
[0068]
Next, the periodic function with the highest correlation value E(n) is selected as a temporary element function, and the correlation value EE(n) of this temporary element function with respect to the difference signal S(j) is calculated. Since the difference signal S(j) changes each time the parameter j is incremented, EE(n) must be recalculated for the selected temporary element function every time. It is then determined whether the relationship between EE(n) and E(n) satisfies the predetermined setting condition. If the setting condition is satisfied, the temporary element function is formally selected as the j-th element function. If it is not satisfied, a new temporary element function is selected and the same determination is made. By repeating this processing until the setting condition is satisfied, a formal element function is eventually selected.
[0069]
The periodic function formally selected as an element function is excluded from the function group in step S7, which avoids duplicate selection. In practice, however, it is preferable to also exclude any periodic function that was once selected as a temporary element function but was not formally selected because it did not satisfy the predetermined setting condition. As already described, a periodic function that does not satisfy the setting condition has the frequency of a peripheral sound component and is therefore inappropriate as an element function, so it is better to remove it from the function group at the time it is selected as a temporary element function. In practice, therefore, the exclusion from the function group may be performed when the temporary element function is selected in step S6 rather than in step S7. That is, in step S6, a periodic function selected as a temporary element function is excluded from the function group at that point, regardless of whether it is later formally selected. Then, whenever a new temporary element function is needed, it suffices to select the periodic function with the highest correlation value E(n) among those remaining in the function group at that time, which simplifies the selection.
[0070]
When the process of “excluding a specific periodic function from the function group” is actually performed on a computer, it is convenient to use a flag. That is, when a function group consisting of 128 periodic functions is defined in step S3, a flag in the ON state is set for each of the 128 periodic functions, and when a specific periodic function is excluded from the function group, its flag is turned OFF. The periodic functions remaining in the function group at any point are those whose flags are ON at that point.
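The selection of step S6 with flag-based exclusion can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `select_element`, the callable `ee_of` that returns EE(n) for the current difference signal, and the example condition `EE >= 0.8 * E` are all assumptions, since the description only speaks of a “predetermined setting condition”.

```python
def select_element(e_values, ee_of, active, condition):
    """Pick the j-th element function.

    e_values  : precomputed correlation values E(n) with the section signal X
    ee_of     : callable n -> EE(n), correlation with the difference signal S(j)
    active    : list of ON/OFF flags (True = still in the function group)
    condition : callable (EE, E) -> bool, the assumed setting condition
    Returns the index n of the selected function, or None if none qualifies.
    """
    while any(active):
        # Temporary element function: highest E(n) among flags still ON.
        n = max((i for i, on in enumerate(active) if on),
                key=lambda i: e_values[i])
        # Turn the flag OFF whether or not the candidate is formally selected.
        active[n] = False
        if condition(ee_of(n), e_values[n]):
            return n
    return None


# Toy data: function 1 has the highest E but a small EE (peripheral component).
E = [0.2, 0.9, 0.5]
EE = {0: 0.2, 1: 0.1, 2: 0.45}
flags = [True, True, True]
chosen = select_element(E, EE.get, flags, lambda ee, e: ee >= 0.8 * e)
```

In this toy run, function 1 is tried first and rejected, function 2 is then formally selected, and both have their flags turned OFF.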
[0071]
§6. Procedure to remove overtone components
Executing the basic procedure of §5 described above prevents the selection of element functions having the frequencies of the peripheral sound components shown by the thin lines in FIG. 9, but it cannot prevent the selection of element functions having the frequencies of the overtone components shown by the thick lines in FIG. 9. When an acoustic signal is encoded as MIDI data, such overtone components are encoded as they are: even though only two keys, the N1 sound and the N2 sound, were struck on the original instrument (for example, a piano), MIDI data is created for a large number of keys corresponding to the overtone components indicated by the thick lines.
[0072]
A method for removing such overtone components at the time of analysis or encoding is disclosed in Japanese Patent Application Laid-Open No. 11-95753. Its basic principle is that, once the frequency of a fundamental tone has been selected as a representative frequency, the frequency components that are integral multiples of that fundamental frequency are simply excluded. With this method, for example in FIG. 9(b), when the N1 fundamental tone is selected as a representative frequency, each overtone component of N1 is removed, and when the N2 fundamental tone is selected as a representative frequency, each overtone component of N2 is removed, so that no overtone frequency is selected as a representative frequency. For the example shown in FIG. 9, this is certainly an effective way of removing overtones. In practice, however, the method often causes problems. This is because, when for example an octave chord (two sounds in an overtone relationship with each other) is played on the same instrument, the frequency of the second fundamental tone coincides with an overtone frequency of the first fundamental tone. In the example shown in FIG. 9(a), the N2 sound is not at an overtone position of the N1 sound, so the problem does not occur. If the N2 sound were at an overtone position of the N1 sound, however, the N2 fundamental tone would also be removed by the overtone removal performed when the N1 fundamental tone is selected as a representative frequency.
[0073]
In the end, the problem is to determine, when an overtone component of a fundamental tone appears, whether it is a component of a sound actually played as a chord together with the fundamental tone, or merely an overtone component generated incidentally by the fundamental tone. In the analysis, the component should not be excluded in the former case but should be excluded in the latter.
[0074]
The inventor of the present application has found that this determination is difficult with a coarse frequency analysis in note number units (MIDI semitone units), but becomes possible with a finer frequency analysis. This will be described with reference to the example of FIG. 11. FIG. 11(a) shows the frequency spectrum obtained when a single sound is played on a single instrument. The horizontal axis is the note number axis and corresponds to frequency on a logarithmic scale. For convenience, so that the frequencies around the note numbers that are in an overtone relationship with each other can be shown in detail on the graph, the horizontal axis is drawn with breaks. When a single instrument plays a single sound (for example, when one piano key is struck), as shown in FIG. 11(a), a signal intensity for the fundamental tone (the original sound corresponding to the struck key) is obtained at the position of note number N, and signal intensities for the overtone components are obtained at note number N + 12 (second harmonic), note number N + 19 (third harmonic), note number N + 24 (fourth harmonic), note number N + 28 (fifth harmonic) and note number N + 31 (sixth harmonic). Each of these overtone components has a frequency that is exactly an integral multiple of the fundamental frequency.
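The note number offsets listed above can be checked numerically. Assuming the usual equal-tempered scale of 12 semitones per octave, the k-th harmonic lies 12 · log2(k) semitones above the fundamental; rounding to the nearest semitone gives the offsets in the text. The helper names below are hypothetical.

```python
import math


def harmonic_semitones(k):
    """Exact distance, in semitones, of the k-th harmonic above the fundamental."""
    return 12.0 * math.log2(k)


def harmonic_offset(k):
    """Nearest note-number offset for the k-th harmonic."""
    return round(harmonic_semitones(k))


offsets = [harmonic_offset(k) for k in range(2, 7)]
# Deviation of each exact harmonic from its rounded note number, in semitones.
deviations = [harmonic_semitones(k) - harmonic_offset(k) for k in range(2, 7)]
```

The rounded offsets are 12, 19, 24, 28 and 31. The deviations show that only the octave harmonics fall exactly on a note number, which is one reason a resolution finer than a semitone carries useful information.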
[0075]
In contrast, FIG. 11(b) shows the frequency spectrum obtained when an octave chord is played on a single instrument, for example when a “do” on a piano and the “do” one octave higher are struck simultaneously. If the first “do” has note number N, the second “do” corresponds to note number N + 12. However, an actual instrument is not tuned so that one octave is exactly a factor of two in frequency; strictly speaking, a slight error occurs. This does not mean that ordinary instruments are tuned incorrectly; rather, correct tuning itself does not set the frequencies so that one octave is exactly doubled. Therefore, when the frequency axis is resolved finely, as shown in FIG. 11(b), if the first “do” is the note number N indicated by a solid line, the second “do” is the note number N + 12 indicated by a broken line. That is, the note number N + 12 indicated by a solid line is the second overtone component of the first “do”, whereas the note number N + 12 indicated by a broken line is the entirely independent second “do”. Overtone components are of course also generated for the second “do”, but each of them has a frequency that is exactly an integral multiple of the frequency of the second “do”. Consequently, on the fine frequency axis, as shown in FIG. 11(b), the first “do” and its overtone components indicated by solid lines and the second “do” and its overtone components indicated by broken lines are shifted from each other by a certain frequency and never overlap.
[0076]
The situation is exactly the same for different instruments. FIG. 11(c) shows the frequency spectrum obtained when the same sound is played on two kinds of instruments, for example when “do” is played on a piano and a “do” of the same pitch is played on a xylophone. In this case, if the piano “do” has the frequency of the note number N indicated by a solid line, the xylophone “do” has the frequency of the note number N indicated by a broken line, and the two frequencies differ slightly even though both correspond to note number N. Moreover, the overtone components of the piano sound have frequencies that are exactly integral multiples of the piano's fundamental frequency, and the overtone components of the xylophone sound have frequencies that are exactly integral multiples of the xylophone's fundamental frequency. As shown in the figure, the piano overtone components indicated by solid lines and the xylophone overtone components indicated by broken lines are therefore shifted from each other by a certain frequency and do not overlap.
[0077]
In view of this phenomenon, it is possible to perform overtone removal in which only the overtone components generated incidentally by a fundamental tone are removed, while components that resemble overtones but are actually played as part of a chord are left in place. Specifically, in the embodiments described up to §5, 128 frequencies f(0) to f(127) were defined as shown in FIG. 2, and the analysis was performed at the semitone frequency accuracy corresponding to the MIDI note numbers; this frequency accuracy may be increased further. For example, if frequencies are defined at intervals of 1/10 semitone, ten times the frequency accuracy of FIG. 2 is obtained and 1280 frequencies are defined. Since 1280 periodic functions must then be prepared, the computational burden increases considerably, but because the resolution of the frequency axis improves, the components indicated by solid lines and those indicated by broken lines in FIGS. 11(b) and 11(c) can be distinguished from each other. Thus, in the example of FIG. 11(b), when the frequency corresponding to the note number N indicated by the solid line is selected as the representative frequency and its overtone components are removed, the components removed are only those indicated by solid lines, whose frequencies are exactly integral multiples of the representative frequency; the components indicated by broken lines are not removed. The frequency corresponding to the note number N + 12 indicated by the broken line can therefore be selected subsequently as a representative frequency.
[0078]
However, when encoding is performed to obtain MIDI data as the final code data (in practice, this is considered the most common case), simply increasing the frequency accuracy is not appropriate. This is because general MIDI data is a code that presupposes frequency accuracy in semitone units, with frequency expressed using the 128 note numbers 0 to 127. When MIDI encoding is performed, the final encoding must therefore use the semitone frequencies corresponding to the note numbers.
[0079]
In this embodiment, therefore, the frequencies are defined with a hierarchical structure. That is, when defining the function group, α standard frequencies are first defined as an upper layer, and for each standard frequency, proximity frequencies in β variations are defined (“proximity frequency” is used here in the sense of a frequency located in the vicinity of the standard frequency, and may include the standard frequency itself). The proximity frequency bands for mutually adjacent standard frequencies are set so that they do not overlap. Specifically, the 128 frequencies corresponding to the note numbers are defined as the upper-layer standard frequencies (α = 128), and for each standard frequency, proximity frequencies in, for example, 13 variations are defined (β = 13).
[0080]
FIG. 12 illustrates an example of such a proximity frequency definition, in which 13 proximity frequencies are defined for the standard frequency corresponding to the i-th note number N(i). Since the standard frequencies correspond to note numbers, they are spaced in semitone units: in the figure, the interval between the frequency of note number N(i - 1) and that of note number N(i) is a semitone, as is the interval between the frequency of note number N(i) and that of note number N(i + 1). The 13 proximity frequencies, in contrast, are spaced in units of 1/13 semitone, and the intervals between the 13 proximity frequencies indicated by the circled numbers in the figure are all 1/13 semitone. Each fraction shown with + or - in the figure (with denominator 13) indicates the frequency difference from the standard frequency corresponding to note number N(i). The central proximity frequency (7) is identical to the standard frequency.
[0081]
As described above, proximity frequencies in β variations are defined for each standard frequency, but the proximity frequency bands for mutually adjacent standard frequencies must not overlap. The figure shows the proximity frequency band for the i-th standard frequency; this band overlaps neither the proximity frequency band for the (i - 1)-th standard frequency on its left nor that for the (i + 1)-th standard frequency on its right. In the illustrated example, 13 proximity frequencies at 1/13-semitone intervals are defined, so each semitone is divided into 13 parts and the proximity frequency bands are arranged side by side.
[0082]
In this way, when β proximity frequencies are defined for each of the α standard frequencies, α × β proximity frequencies are defined in total. For each proximity frequency, a proximity periodic function having that frequency is defined, so that α × β proximity periodic functions are defined in total; in the above example, 128 × 13. FIG. 13 lists the proximity periodic functions defined in this way. For example, the proximity periodic function corresponding to the b-th variation of the a-th note number (standard frequency) is f(a, b). Each proximity periodic function f(a, b) actually consists of a pair of a sine function and a cosine function.
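The hierarchical frequency grid can be sketched as follows. Two assumptions are made that are not confirmed by this passage: note number 69 is taken as 440 Hz (the common MIDI convention; the relation actually used in the description is the one in FIG. 3), and the variation index b runs from 1 to 13 with b = 7 at the standard frequency, as in FIG. 12. The name `proximity_frequency` is hypothetical.

```python
ALPHA = 128                # number of standard frequencies (note numbers 0..127)
BETA = 13                  # variations per standard frequency
CENTER = (BETA + 1) // 2   # variation 7 coincides with the standard frequency


def proximity_frequency(a, b):
    """Frequency in Hz of variation b (1..BETA) of note number a (0..ALPHA-1)."""
    if not (0 <= a < ALPHA and 1 <= b <= BETA):
        raise ValueError("index out of range")
    # Offset from the standard frequency in units of 1/BETA semitone.
    semitones = (a - 69) + (b - CENTER) / BETA
    return 440.0 * 2.0 ** (semitones / 12.0)


# One frequency per proximity periodic function f(a, b): ALPHA * BETA entries.
table = {(a, b): proximity_frequency(a, b)
         for a in range(ALPHA) for b in range(1, BETA + 1)}
```

Because each band spans ±6/13 semitone around its standard frequency, the highest variation of one note number and the lowest variation of the next are again 1/13 semitone apart, so the bands adjoin without overlapping.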
[0083]
Next, the analysis and encoding procedure in which 128 × 13 proximity periodic functions are defined and overtone component removal is added will be described, again with reference to the flowchart of FIG. 10. The input process in step S1 and the unit section setting process in step S2 are exactly the same as described in §5. In the function group definition process in step S3, however, a function group consisting of 128 × 13 proximity periodic functions is defined as described above. In the subsequent step S4, the section signal X is extracted and defined as the first difference signal S(1), as described in §5. In step S5, the parameter j is set to 1, and the process of step S6 is then performed.
[0084]
In step S6, an element function is selected from the 128 × 13 proximity periodic functions constituting the function group. That is, at a frequency resolution of 1/13 semitone, the proximity periodic function whose correlation value E with respect to the section signal X is the highest, and for which the relationship between the correlation value EE with respect to the difference signal S(j) and the correlation value E satisfies the predetermined setting condition, is selected as the element function. This element function is therefore defined by an accurate frequency with a resolution of 1/13 semitone. In the actual calculation, as described in §5, the correlation values with the section signal X are calculated in advance for all 128 × 13 proximity periodic functions and the results are recorded, so that the same calculation need not be repeated every time the parameter j changes.
[0085]
In the subsequent step S7, the selected element function is removed from the function group. Here, when a certain proximity periodic function is excluded from the function group, all β proximity periodic functions that are variations for the same standard frequency, including that function itself, are excluded together. This is because, as described later, the final encoding stage uses the coarse frequency resolution corresponding to the note numbers: once any one variation has been selected as an element function for a standard frequency, there is no longer any need to select other variations for the same standard frequency.
[0086]
Next, in step S8, the inclusion signal G(j) is obtained. As described above, the inclusion signal G(j) is the signal obtained by multiplying the selected element function (a proximity periodic function) by the correlation value EE of that element function with respect to the difference signal S(j). The inclusion signal G(j) is therefore also defined by an accurate frequency with a resolution of 1/13 semitone. In the procedure described in §5, the new difference signal S(j + 1) was obtained by subtracting the inclusion signal G(j) from the difference signal S(j). Here, in order to remove the overtone components, the following is done instead. “One or more proximity periodic functions that have a frequency that is an integral multiple of the frequency of the selected element function and that are included in the function group” are selected as overtone component functions; an overtone inclusion signal is obtained for each as the product of “the overtone component function” and “the correlation value EE of that overtone component function with respect to the difference signal S(j)”; and the signal obtained by subtracting the inclusion signal G(j) and each overtone inclusion signal from the difference signal S(j) is used as the new difference signal S(j + 1).
[0087]
For example, in the example shown in FIG. 11(b), suppose that a specific proximity periodic function corresponding to one of the 13 variations of the standard frequency for note number N is selected as the element function. In this case, overtone component functions whose frequencies are exactly integral multiples of the frequency of that proximity periodic function are selected. The variation number of each overtone component function selected here should be the same as the variation number of the proximity periodic function selected as the element function (because the overtone component function has a frequency that is exactly an integral multiple). For example, when the proximity periodic function f(a, b) in the table of FIG. 13 is selected as the element function, the overtone component function of the second harmonic is f(a + 12, b), that of the third harmonic is f(a + 19, b), that of the fourth harmonic is f(a + 24, b), that of the fifth harmonic is f(a + 28, b), and that of the sixth harmonic is f(a + 31, b). A correlation value EE with respect to the difference signal S(j) is then obtained for each of these functions. If the correlation values obtained for f(a, b), f(a + 12, b), f(a + 19, b), f(a + 24, b), f(a + 28, b) and f(a + 31, b) are EE1, EE2, EE3, EE4, EE5 and EE6 respectively, the inclusion signal is G(j) = EE1 · f(a, b), and the overtone inclusion signals are EE2 · f(a + 12, b), EE3 · f(a + 19, b), EE4 · f(a + 24, b), EE5 · f(a + 28, b) and EE6 · f(a + 31, b) (the correlation value EE is an effective value; in practice, separate correlation values are used for the sine function and the cosine function). Not only the inclusion signal G(j) but also all of these overtone inclusion signals are subtracted from the difference signal S(j), and the result is the new difference signal S(j + 1).
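The subtraction of step S8 with overtone removal can be sketched as follows. This is an illustration under several assumptions, not the method as actually implemented: the correlation is taken as the usual sine and cosine projections scaled by 2/L (the description's own formula is in its drawings), note number 69 is assumed to be 440 Hz, the overtone functions are addressed by the index offsets f(a + 12, b) to f(a + 31, b) given in the text, and the check that each overtone function is still in the function group is omitted. All function names are hypothetical.

```python
import math

BETA = 13
CENTER = (BETA + 1) // 2
HARMONIC_OFFSETS = (12, 19, 24, 28, 31)  # note-number offsets of harmonics 2..6


def proximity_frequency(a, b):
    """Assumed grid: note 69 = 440 Hz, variation CENTER = standard frequency."""
    return 440.0 * 2.0 ** ((a - 69 + (b - CENTER) / BETA) / 12.0)


def correlate(signal, freq, rate):
    """Correlation of the signal with the sine and cosine of one periodic
    function; the effective value would be sqrt(A*A + B*B)."""
    n = len(signal)
    w = 2.0 * math.pi * freq / rate
    a_sin = 2.0 / n * sum(s * math.sin(w * k) for k, s in enumerate(signal))
    b_cos = 2.0 / n * sum(s * math.cos(w * k) for k, s in enumerate(signal))
    return a_sin, b_cos


def subtract_element_and_overtones(diff, a, b, rate):
    """Return S(j+1): S(j) minus the inclusion signal of f(a, b) and the
    overtone inclusion signals of f(a+12, b) ... f(a+31, b)."""
    residual = list(diff)
    for offset in (0,) + HARMONIC_OFFSETS:
        note = a + offset
        if note > 127:
            continue  # no such function in the group
        freq = proximity_frequency(note, b)
        w = 2.0 * math.pi * freq / rate
        # Every correlation is taken against S(j), not the running residual.
        a_sin, b_cos = correlate(diff, freq, rate)
        for k in range(len(residual)):
            residual[k] -= a_sin * math.sin(w * k) + b_cos * math.cos(w * k)
    return residual


# Toy section: a fundamental at f(60, 9) plus its second harmonic.
RATE, LENGTH = 8000, 4000
f1 = proximity_frequency(60, 9)
s_j = [math.sin(2 * math.pi * f1 * k / RATE)
       + 0.5 * math.sin(2 * math.pi * 2 * f1 * k / RATE + 0.3)
       for k in range(LENGTH)]
s_next = subtract_element_and_overtones(s_j, 60, 9, RATE)
```

In this toy case the fundamental and its second harmonic are both removed in a single iteration, so almost no energy remains in the new difference signal.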
[0088]
As a result, the new difference signal S(j + 1) is the original difference signal S(j) minus the component of the inclusion signal G(j) and its overtone components; in this example, it is the signal from which all the components indicated by solid lines have been subtracted. Since the components indicated by broken lines are not subtracted, once the parameter j is updated they can be selected as element functions in the next element function selection. Thus, according to this embodiment, overtone components that merely accompany a fundamental tone and need not be encoded are removed, while components played as part of a chord, which do need to be encoded, are left in place.
[0089]
Note that the inclusion signals finally obtained by this procedure have accurate frequencies with a resolution of 1/13 semitone and are not suitable for MIDI encoding as they are. At encoding time, therefore, the frequency of each inclusion signal finally obtained for a unit section is replaced with its nearby standard frequency, so that every periodic signal finally extracted as a constituent of the time-series signal in the unit section has one of the standard frequencies, and the result is then converted into MIDI data. For instance, in the above example, when the inclusion signal G(j) = EE1 · f(a, b) is obtained, it is changed to G(j) = EE1 · f(a), and MIDI data corresponding to note number N = a is generated.
[0090]
After the MIDI codes have been created, it is preferable in practice to integrate, as necessary, MIDI codes that are adjacent on the time axis. For example, when two MIDI codes exist such that the note-off time of one is close to the note-on time of the other, their note numbers are the same or similar, and their velocities are the same or similar, the two MIDI codes should be merged into one. Specifically, the note-off time of the preceding MIDI code is changed to the note-off time of the following MIDI code, and the following MIDI code is deleted. If the velocity of the following MIDI code is larger than that of the preceding one, it is preferable to replace the velocity of the preceding MIDI code with that of the following one. It is also preferable to delete, from the MIDI codes after integration, those whose note length (the time from note-on to note-off) is shorter than a predetermined value and those whose velocity is smaller than a predetermined value.
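The integration of adjacent MIDI codes can be sketched as follows. The thresholds (`max_gap`, `max_vel_diff`, `min_length`, `min_velocity`) and their default values are assumptions, as the description only refers to “predetermined” values, and for simplicity this sketch merges only codes with identical note numbers, although the description also allows similar ones. The names `Note` and `merge_notes` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class Note:
    note: int       # MIDI note number
    velocity: int
    on: float       # note-on time in seconds
    off: float      # note-off time in seconds


def merge_notes(notes, max_gap=0.02, max_vel_diff=8,
                min_length=0.05, min_velocity=10):
    """Merge time-adjacent codes of the same pitch, then drop codes that are
    too short or too quiet. The input list is not modified."""
    merged = []
    last_by_pitch = {}
    for cur in sorted(notes, key=lambda n: n.on):
        prev = last_by_pitch.get(cur.note)
        if (prev is not None
                and abs(cur.on - prev.off) <= max_gap
                and abs(cur.velocity - prev.velocity) <= max_vel_diff):
            # Extend the preceding code and keep the larger velocity.
            prev.off = cur.off
            prev.velocity = max(prev.velocity, cur.velocity)
        else:
            kept = replace(cur)  # copy so the caller's objects stay intact
            merged.append(kept)
            last_by_pitch[cur.note] = kept
    return [n for n in merged
            if n.off - n.on >= min_length and n.velocity >= min_velocity]


raw = [Note(60, 100, 0.0, 0.5), Note(60, 104, 0.5, 1.0),
       Note(62, 5, 0.0, 1.0), Note(64, 90, 0.0, 0.01)]
cleaned = merge_notes(raw)
```

Here the two codes for note 60 become one code lasting from 0.0 to 1.0 with velocity 104, while the quiet code for note 62 and the very short code for note 64 are deleted.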
[0091]
According to the embodiment described above, encoding can be performed with peripheral sound components and unnecessary overtone components removed. Although the computational load is higher than in the conventional method, the representative frequencies are selected more accurately and unnecessary overtone components are not extracted, so both encoding quality and encoding efficiency improve. The method is particularly effective for automatic music transcription, in which a musical score is generated automatically from a recorded performance. In vocal coding, moreover, the mixing in of unnecessary sound components that cause noise and distortion during playback is suppressed, while important sound components that were dropped in the conventional method because of band restrictions are faithfully encoded, so that a more natural and realistic reproduction quality is obtained.
[0092]
Although several embodiments illustrating the present invention have been described above, the present invention is not limited to these embodiments. The present invention can be realized by computer processing, and the various processes for carrying it out can be described as a program and recorded and distributed on a computer-readable recording medium.
[0093]
[Effect of the invention]
As described above, according to the present invention, it is possible to perform more accurate frequency analysis on a time series signal, and it is possible to perform encoding of an original sound signal with high quality.
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic principle of a time-series signal analysis method and an acoustic signal encoding method according to the present invention.
FIG. 2 is a diagram showing an example of a periodic function used in the method according to the present invention.
FIG. 3 is a diagram showing a relational expression between the frequency of each periodic function shown in FIG. 2 and a MIDI note number n.
FIG. 4 is a diagram illustrating a method of calculating a correlation between a signal to be analyzed and a periodic signal.
FIG. 5 is a diagram showing a calculation formula for performing the correlation calculation shown in FIG. 4;
FIG. 6 is a diagram showing an expression obtained by extending the calculation expression shown in FIG. 5 to a general periodic function.
FIG. 7 is a diagram showing a spectrum intensity graph obtained by a general Fourier analysis.
FIG. 8 is a diagram showing a basic method of generalized harmonic analysis.
FIG. 9 is a diagram for comparing an analysis result based on a short-time Fourier transform and an analysis result based on a generalized harmonic analysis.
FIG. 10 is a flowchart showing a basic procedure of a time-series signal analysis method according to the present invention.
FIG. 11 is a graph showing spectra of various overtone components.
FIG. 12 is a diagram showing the concept of a proximity frequency defined in the time-series signal analysis method according to the present invention.
FIG. 13 is a diagram showing an example of a number of proximity periodic functions defined in the time-series signal analysis method according to the present invention.
[Explanation of symbols]
A (n), B (n) ... correlation value
d, d1 to d5 ... unit interval
E (n) ... correlation value (effective value)
e (d1,1), e (d1,2), e (d1,3), e (d2,1), e (d2,2), e (d2,3) ... amplitude intensity
F ... Sampling frequency
f (0) to f (127), f (n) ... standard frequency
f (a, b) ... proximity periodic function
G (j) ... Inclusion signal
i ... Parameter indicating note number or unit section
j ... Parameter indicating the number of repetitions
k ... Parameter indicating the sample number
L ... Section length
N1, N2, N (i-1), N (i), N (i + 1) ... Note number
n, n (d1, 1), n (d1, 2), n (d1, 3), n (d2, 1), n (d2, 2), n (d2, 3) ... note number / representative code
Rn ... periodic function
S (j), S (j + 1) ... difference signal
T1-T3 ... track
t1-t6 ... Time
w ... Sample number
X, X (k) ... Section signal

Claims

A time-series signal analysis method for extracting a plurality of periodic signals as constituent elements from a time-series signal given as a time-series intensity signal,
An input stage for capturing time series signals to be analyzed as digital data,
A section setting stage for setting a plurality of unit sections on the time axis of the captured time series signal,
By defining multiple frequencies and defining a pair of functions consisting of a sine function and cosine function with the same frequency as a periodic function for that frequency, a function group consisting of periodic functions for each frequency is defined. A function group definition stage to perform,
Extracting a time-series signal in one unit section as a section signal X, and defining a section signal X as a first difference signal S (1);
Among the function groups, the correlation value E with respect to the interval signal X is the highest, and the relationship between the correlation value EE with respect to the j-th differential signal S (j) and the correlation value E satisfies a predetermined setting condition. The periodic function is selected as the j-th element function, the j-th element function is excluded from the function group, and a product of the j-th element function and the correlation value EE is given. A jth inclusion signal G (j) is obtained, and a signal obtained by subtracting the jth inclusion signal G (j) from the jth difference signal S (j) is used as a new difference signal S (j + 1). ) To repeat element processing J times from j = 1 to J (J is an arbitrary integer),
Have
In the function group definition stage, a plurality of α standard frequencies in semitone units corresponding to MIDI note numbers are adjacent frequencies (same as the standard frequency) having a plurality of β variations in 1 / β semitone units. May be included) within a range in which adjacent frequency bands for standard frequencies adjacent to each other do not overlap each other, and a proximity period function with each adjacent frequency is defined, respectively, and a total of α × β adjacent periods Make a function group by function,
In the element function selection step, the effective value of the correlation value for the sine function and the correlation value for the cosine function is used as the correlation value between the predetermined periodic function and the predetermined signal,
In the element function selection stage, when a given proximity periodic function is excluded from the function group, all β proximity periodic functions, namely that function together with the proximity periodic functions that are its variations, are excluded, and
The section signal extraction stage and the element function selection stage are executed for each individual unit section so that a plurality of contained signals are obtained for each unit section, and the contained signals obtained for each unit section are extracted as the periodic signals that are constituent elements of the time-series signal in that unit section, thereby analyzing the time-series signal.
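
As a rough illustration of the stages recited above, the following Python sketch builds an α × β function group and runs the repeated selection and subtraction on one unit section. It is a minimal sketch under stated assumptions, not the implementation described in this publication: the function names, the 2/w normalization of the correlation, the use of sqrt(A² + B²) as the "effective value", the reference tuning of note 69 = 440 Hz, and the centring of the β variations within half a semitone of each note are all assumptions. The setting condition is passed in as a callable because the claim leaves it open.

```python
import math

def build_function_group(notes, beta):
    """Map (note, variation) -> frequency in Hz, beta variations per MIDI note."""
    group = {}
    for note in notes:
        for var in range(beta):
            # Assumed layout: offsets in 1/beta-semitone steps inside
            # [-0.5, +0.5) semitone, so bands of adjacent notes never overlap.
            offset = (var - beta // 2) / beta
            group[(note, var)] = 440.0 * 2.0 ** ((note + offset - 69) / 12.0)
    return group

def correlate(signal, freq, rate):
    """Return (A, B, E): sine and cosine correlations and their combined value."""
    w = len(signal)
    step = 2.0 * math.pi * freq / rate
    a = 2.0 / w * sum(x * math.sin(step * k) for k, x in enumerate(signal))
    b = 2.0 / w * sum(x * math.cos(step * k) for k, x in enumerate(signal))
    return a, b, math.hypot(a, b)  # assumed "effective value": sqrt(A^2 + B^2)

def analyze_section(x, rate, group, J, condition):
    """Extract up to J contained signals, as (frequency, amplitude), from x."""
    e_vs_x = {key: correlate(x, f, rate)[2] for key, f in group.items()}
    remaining = set(group)
    s = list(x)                                  # first difference signal S(1) = X
    contained = []
    while len(contained) < J and remaining:
        key = max(remaining, key=e_vs_x.get)     # highest E against X
        freq = group[key]
        a, b, ee = correlate(s, freq, rate)      # EE against S(j)
        # Excluding one function excludes all beta variations of its note.
        remaining -= {q for q in remaining if q[0] == key[0]}
        if not condition(e_vs_x[key], ee):
            continue                             # rejected: try the next candidate
        step = 2.0 * math.pi * freq / rate
        # S(j+1) = S(j) - G(j), with G(j) = A*sin + B*cos
        s = [val - (a * math.sin(step * k) + b * math.cos(step * k))
             for k, val in enumerate(s)]
        contained.append((freq, ee))
    return contained
```

Calling `analyze_section(x, rate, build_function_group(range(60, 80), 4), 2, lambda e, ee: abs(e - ee) < 0.1)` on a section containing two sinusoids would return the two (frequency, amplitude) pairs in order of selection.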

The time-series signal analysis method according to claim 1, wherein,
In the element function selection stage, the processing of selecting as the j-th element function the periodic function in the function group that has the highest correlation value E with respect to the section signal X, and for which the relationship between the correlation value EE with respect to the j-th difference signal S(j) and the correlation value E satisfies the predetermined setting condition, is carried out as follows:
The periodic function having the highest correlation value E with respect to the section signal X is selected as a provisional element function, the correlation value EE of this provisional element function with respect to the j-th difference signal S(j) is calculated, and a condition determination is made;
If the condition determination finds that the setting condition is satisfied, the provisional element function is selected as the j-th element function and the j-th element function is excluded from the function group; and
If the condition determination finds that the setting condition is not satisfied, the provisional element function is excluded from the function group and a new provisional element function is then selected, this processing being repeated until the setting condition is satisfied.
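
The provisional selection loop of this claim can be sketched in isolation as follows. This is only an illustrative sketch: the function name `select_element`, the dictionary representation of the function group, and the numeric values in the usage note are hypothetical, and the correlation values are assumed to have been computed elsewhere.

```python
def select_element(e_vs_x, ee_vs_s, condition):
    """Return (chosen key or None, list of rejected keys).

    e_vs_x:    dict mapping each function in the group to its correlation E
               with the section signal X
    ee_vs_s:   callable returning the correlation EE of a function with the
               current difference signal S(j)
    condition: callable (E, EE) -> bool, the setting condition
    """
    candidates = dict(e_vs_x)        # working copy of the function group
    rejected = []
    while candidates:
        key = max(candidates, key=candidates.get)   # provisional element function
        del candidates[key]          # excluded whether accepted or rejected
        if condition(e_vs_x[key], ee_vs_s(key)):
            return key, rejected
        rejected.append(key)
    return None, rejected            # group exhausted without a match
```

With hypothetical values `e = {"f1": 0.9, "f2": 0.6, "f3": 0.2}` and `ee = {"f1": 0.3, "f2": 0.58, "f3": 0.2}`, the call `select_element(e, ee.get, lambda E, EE: abs(E - EE) < 0.1)` rejects `"f1"` and selects `"f2"`. The caller would then remove both the selected and the rejected functions from the function group before the next repetition.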

The time-series signal analysis method according to claim 1 or 2, wherein,
As the predetermined setting condition on the relationship between the correlation value EE and the correlation value E in the element function selection stage, a predetermined threshold value Δ is set and the condition |E − EE| < Δ or the condition EE > E − Δ is used.
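
The two alternative conditions differ only in whether EE is bounded from above: |E − EE| < Δ is equivalent to E − Δ < EE < E + Δ, whereas EE > E − Δ keeps only the lower bound. A minimal sketch, with hypothetical function names, makes the difference concrete:

```python
def within_band(e, ee, delta):
    """Two-sided condition |E - EE| < delta."""
    return abs(e - ee) < delta

def not_much_lower(e, ee, delta):
    """One-sided condition EE > E - delta (EE may exceed E by any amount)."""
    return ee > e - delta
```

For E = 0.5 and Δ = 0.1, a value EE = 0.45 satisfies both conditions, EE = 0.3 satisfies neither, and EE = 0.8 satisfies only the one-sided condition.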

The time-series signal analysis method according to any one of claims 1 to 3, wherein,
In the element function selection stage, one or more proximity periodic functions that are included in the function group and whose frequency is an integer multiple of the frequency of the selected element function are selected as overtone component functions; overtone contained signals, each given by the product of an overtone component function and the correlation value EE of that overtone component function with respect to the difference signal S(j), are obtained; and the signal obtained by subtracting the contained signal G(j) and each overtone contained signal from the difference signal S(j) is set as the new difference signal S(j+1).
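
The overtone handling of this claim can be sketched as follows. This is a sketch under assumptions rather than the method as implemented: the function names, the 2/w correlation normalization, and in particular the relative tolerance `tol` used to decide whether a group frequency counts as an integer multiple are not taken from the claim. In an equal-tempered function group only octave multiples coincide exactly with group frequencies, so some tolerance of this kind would presumably be needed for other harmonics.

```python
import math

def correlate(signal, freq, rate):
    """Return (A, B, E): sine and cosine correlations and sqrt(A^2 + B^2)."""
    w = len(signal)
    step = 2.0 * math.pi * freq / rate
    a = 2.0 / w * sum(x * math.sin(step * k) for k, x in enumerate(signal))
    b = 2.0 / w * sum(x * math.cos(step * k) for k, x in enumerate(signal))
    return a, b, math.hypot(a, b)

def subtract_with_overtones(s, rate, f0, group_freqs, tol=1e-3):
    """Subtract the element at f0 and its overtones found in group_freqs.

    Returns (new difference signal, list of (frequency, EE) that were removed).
    """
    targets = [f0]
    for f in group_freqs:
        ratio = f / f0
        # Overtone candidate: ratio close to an integer 2, 3, 4, ...
        if round(ratio) >= 2 and abs(ratio - round(ratio)) < tol:
            targets.append(f)
    # Every correlation is taken against the same S(j), before any subtraction.
    parts = [(f,) + correlate(s, f, rate) for f in targets]
    out = list(s)
    for f, a, b, _ in parts:
        step = 2.0 * math.pi * f / rate
        out = [val - (a * math.sin(step * k) + b * math.cos(step * k))
               for k, val in enumerate(out)]
    return out, [(f, ee) for f, _, _, ee in parts]
```

In this sketch the fundamental and all overtone contained signals are removed in a single repetition, so a note with strong harmonics would not consume several of the J repetitions.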

The time-series signal analysis method according to any one of claims 1 to 4, wherein,
The frequency of each contained signal obtained for each unit section is replaced with the standard frequency nearest to it, so that every periodic signal finally extracted as a constituent element of the time-series signal in each unit section has one of the standard frequencies.
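
Replacing a proximity frequency with a nearby standard frequency amounts to rounding on a semitone scale. The sketch below assumes the common tuning convention in which MIDI note number 69 corresponds to 440 Hz; the function name is hypothetical.

```python
import math

def nearest_standard(freq_hz):
    """Return (MIDI note number, standard frequency in Hz) nearest to freq_hz."""
    # Position on the semitone scale, then round to the nearest whole semitone.
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    return note, 440.0 * 2.0 ** ((note - 69) / 12.0)
```

For example, a contained signal at about 446.4 Hz (a quarter of a semitone above 440 Hz) maps back to note number 69 and the standard frequency 440 Hz.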

An acoustic signal encoding method using the time-series signal analysis method according to claim 1, wherein,
The acoustic signal to be encoded is treated as the time-series signal to be analyzed, and contained signals are thereby obtained for each individual unit section, and
Code data is created that includes information indicating the amplitude of a contained signal obtained for a specific unit section, information indicating the frequency of that contained signal or a frequency in its vicinity, and information indicating the position of the specific unit section on the time axis, and the acoustic signal in the specific unit section is encoded by this code data.

The acoustic signal encoding method according to claim 6, wherein,
For each unit section, M contained signals (M < J) are selected from the J contained signals obtained, in descending order of amplitude, and the code data is created on the basis of these M contained signals.

The acoustic signal encoding method according to claim 6 or 7, wherein,
MIDI data is created as the code data, using a velocity as the information indicating the amplitude of a contained signal obtained for a specific unit section, a note number as the information indicating its frequency or a frequency in its vicinity, a note-on time as the information indicating the section start position of the specific unit section on the time axis, and a note-off time as the information indicating the section end position.
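
The encoding recited in claims 6 to 8 can be illustrated with the following sketch, which selects the M strongest contained signals of one unit section and turns each into a note event. The `NoteEvent` type, the function name, the linear amplitude-to-velocity mapping with its `full_scale` parameter, and the note 69 = 440 Hz tuning are assumptions for illustration; the claims do not specify how amplitude is scaled to a velocity, and no MIDI file is written here.

```python
import math
from typing import NamedTuple

class NoteEvent(NamedTuple):
    note: int         # MIDI note number, 0..127
    velocity: int     # MIDI velocity, 1..127
    on_time: float    # section start position, in seconds
    off_time: float   # section end position, in seconds

def encode_section(contained, start, end, m, full_scale=1.0):
    """Encode one unit section.

    contained: list of (frequency_hz, amplitude) pairs for the section
    m:         number of contained signals to keep (M < J)
    """
    # Keep the M contained signals with the largest amplitude.
    strongest = sorted(contained, key=lambda c: c[1], reverse=True)[:m]
    events = []
    for freq, amp in strongest:
        note = round(69 + 12 * math.log2(freq / 440.0))
        # Assumed linear amplitude-to-velocity mapping, clamped to 1..127.
        velocity = max(1, min(127, round(127 * amp / full_scale)))
        if 0 <= note <= 127:          # skip frequencies outside the MIDI range
            events.append(NoteEvent(note, velocity, start, end))
    return events
```

Because every event of a section shares the same note-on and note-off times, the events of one unit section form a chord whose duration equals the section length.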

A computer-readable recording medium having recorded thereon a program for causing a computer to execute the time-series signal analysis method or the acoustic signal encoding method according to claim 1.