JP4132362B2 - Acoustic signal encoding method and program recording medium

Info

Publication number: JP4132362B2
Application number: JP05843199A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1999-03-05
Filing date: 1999-03-05
Publication date: 2008-08-13
Anticipated expiration: 2019-03-05
Also published as: JP2000261322A

Abstract

PROBLEM TO BE SOLVED: To convert an acoustic signal into code data such as MIDI data with high quality. SOLUTION: The acoustic signal to be encoded is digitized by PCM and captured as acoustic data. A plurality of unit sections (d) are set on the time axis, and a section signal (x) is extracted from each unit section. Element signals consisting of trigonometric functions corresponding to the 128 MIDI note numbers are prepared in advance, and the element signal having the highest correlation value with the section signal (x) is selected as a harmony signal. A contained signal is defined as the harmony signal multiplied by the correlation value, and the same processing is repeated using the residual, obtained by subtracting the contained signal from the section signal (x), as a new section signal. Once the original section signal can be approximated by the sum of a plurality of contained signals, MIDI codes are generated from the note number and the correlation value corresponding to each contained signal.

Description

【０００１】
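【Technical Field of the Invention】
The present invention relates to a method for encoding an acoustic signal, and more specifically to a technique for encoding an acoustic signal given as a time-series intensity signal and then decoding and reproducing it. The invention is particularly suited to efficiently converting general acoustic signals into MIDI-format code data, and is expected to find application in the various industries that produce audio content for broadcast media (radio, television), communication media (CS video and audio distribution, Internet distribution), and packaged media (CD, MD, cassette, video, LD, CD-ROM, game cartridge).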
【０００２】
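【Prior Art】
PCM (Pulse Code Modulation) is the most widespread technique for encoding acoustic signals and is now widely used as the recording format for audio CDs, DAT, and the like. Its basic principle is to sample an analog acoustic signal at a predetermined sampling frequency and to quantize the signal intensity at each sampling instant into digital data; the higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced. The amount of information required, however, grows accordingly. ADPCM (Adaptive Differential Pulse Code Modulation), which encodes only the differences between successive signal values, is therefore also used to reduce the amount of information as far as possible.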
【０００３】
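Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into wide use with the spread of personal computers. Code data conforming to this standard (hereinafter, MIDI data) basically describes the operations of a performance, namely which key of the instrument was played and how hard; the MIDI data itself contains no actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores instrument waveforms. Nevertheless, compared with recording sound by PCM, MIDI requires far less information, and this high coding efficiency has attracted attention. MIDI encoding and decoding are now widely built into personal-computer software for instrument performance, instrument practice, and composition, and are also widely used in fields such as karaoke and game sound effects.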
【０００４】
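【Problem to Be Solved by the Invention】
As described above, when an acoustic signal is encoded by PCM, securing sufficient sound quality makes the amount of information enormous and the data-processing load heavy. In practice, sound quality must therefore be compromised to some degree in order to keep the amount of information manageable. MIDI encoding can of course reproduce sound of sufficient quality with very little information, but because the MIDI standard was originally designed to encode the operations of an instrumental performance, it cannot be applied to general sounds at large. In other words, creating MIDI data requires either actually playing an instrument or preparing score information.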
【０００５】
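Thus both PCM and MIDI have their strengths and weaknesses as acoustic-signal encoding methods, and neither can secure sufficient sound quality for general sounds with a small amount of information. Yet the demand for efficient encoding of general sounds keeps growing. Such demand has long been strong in fields dealing with human speech and singing, so-called vocal sound; in language education, vocal-music education, and criminal investigation, for example, a technique for efficiently encoding vocal acoustic signals is eagerly awaited. To meet this demand, the specifications of Japanese Patent Application Nos. H9-273949 and H10-283453 propose novel encoding methods that can make use of MIDI data. In these methods, a plurality of unit sections are set along the time axis of the acoustic signal, a Fourier transform is performed on each unit section to obtain a spectrum, and MIDI data is created according to that spectrum. However, frequency analysis based on the Fourier transform is defined on the premise that a signal of constant frequency continues indefinitely along the time axis, so when it is applied to unit sections of finite width it cannot always yield a faithful encoding. This was a problem from the standpoint of high-quality encoding.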
【０００６】
Accordingly, an object of the present invention is to provide an acoustic-signal encoding method capable of high-quality conversion into code data such as MIDI data.
【０００７】
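【Means for Solving the Problem】
(1) A first aspect of the present invention is an acoustic-signal encoding method for encoding an acoustic signal given as a time-series intensity signal, the method comprising:
a section signal extraction step of setting a plurality of unit sections on the time axis of the acoustic signal to be encoded and extracting a section signal from each unit section;
an element signal preparation step of preparing a plurality of element signals that can serve as components of the section signal;
a harmony signal selection step of selecting, from the prepared element signals, the element signal having the highest correlation value with the section signal as a harmony signal;
a difference signal calculation step of obtaining a difference signal by subtracting from the section signal a contained signal given by the product of the harmony signal and its correlation value; and
an encoding step of obtaining a plurality of contained signals by repeatedly treating the difference signal as a new section signal and executing the harmony signal selection step and the difference signal calculation step to obtain a new contained signal and a new difference signal, and of generating, on the basis of the contained signals thus obtained, a plurality of codes representing the section signal,
whereby the acoustic signal is represented by the set of codes generated for the individual unit sections.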
【０００８】
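(2) A second aspect of the present invention is the acoustic-signal encoding method according to the first aspect, wherein
in the element signal preparation step, a plurality of element signals of mutually different frequencies are prepared, and
in the harmony signal selection step, a Fourier transform is performed on the section signal and the element signal corresponding to the peak frequency of the resulting Fourier spectrum is selected as the harmony signal.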
【０００９】
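(3) A third aspect of the present invention is the acoustic-signal encoding method according to the first aspect, wherein
in the harmony signal selection step, a simplified correlation calculation that computes correlation values using only information on the peak positions of the section signal is performed, and the harmony signal is selected on the basis of the correlation values so obtained, and
in the difference signal calculation step, the correlation value is recomputed using all of the information of the selected harmony signal, and the contained signal is calculated using the recomputed correlation value.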
【００１０】
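(4) A fourth aspect of the present invention is the acoustic-signal encoding method according to the first aspect, wherein
when the first harmony signal is selected for the section signal of each unit section, Y candidates (Y < X), ranked first to Y-th in descending order of correlation value with the section signal, are selected in advance from the X element signals, and the first-ranked candidate is selected as the first harmony signal; and when the second and subsequent harmony signals are selected, the element signal having the highest correlation value with the section signal is selected from among the Y candidates already selected.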
【００１１】
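(5) A fifth aspect of the present invention is the acoustic-signal encoding method according to the first aspect, wherein
in the section signal extraction step, the unit sections are set so that adjacent unit sections partially overlap on the time axis.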
【００１２】
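(6) A sixth aspect of the present invention is the acoustic-signal encoding method according to the fifth aspect, wherein
when the harmony signal is selected for the section signal of a first unit section, Z candidates (Z < X), ranked first to Z-th in descending order of correlation value with the section signal, are selected in advance from the X element signals, and the harmony signal is selected from among these Z candidates, and
when the harmony signal is selected for the section signal of a second unit section that overlaps the first unit section on the time axis for at least a predetermined time, the harmony signal is selected from among the Z candidates already selected.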
【００１３】
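(7) A seventh aspect of the present invention is the acoustic-signal encoding method according to any of the first to sixth aspects, wherein
in the element signal preparation step, a composite function of a sine function and a cosine function having the same frequency constitutes one element signal, and the composite functions for X frequencies forming a geometric progression are used as the respective element signals.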
【００１４】
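(8) An eighth aspect of the present invention is the acoustic-signal encoding method according to any of the first to sixth aspects, wherein
in the element signal preparation step, X frequencies forming a geometric progression are defined and, for the n-th frequency f(n) (n = 1, 2, ..., X), there are defined:
a first composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) within that section;
a second composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function whose frequency changes continuously within that section from a section-start frequency f(n) to a section-end frequency f(n-1); and
a third composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function whose frequency changes continuously within that section from a section-start frequency f(n) to a section-end frequency f(n+1),
so that 3X composite functions are defined in total; correlation values are calculated using each of these composite functions as an element signal; and when the correlation value for a second or third composite function is judged to be the highest, the first composite function corresponding to that composite function is selected as the harmony signal.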
【００１５】
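(9) A ninth aspect of the present invention is the acoustic-signal encoding method according to any of the first to sixth aspects, wherein
in the element signal preparation step, X frequencies forming a geometric progression with common ratio α are defined and, for the n-th frequency f(n) (n = 1, 2, ..., X), there are defined:
a first composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) within that section;
a second composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n)*β within that section; and
a third composite function defined within the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n)/β within that section,
so that 3X composite functions are defined in total (where 1 < β < √α); correlation values are calculated using each of these composite functions as an element signal; and when the correlation value for a second or third composite function is judged to be the highest, the first composite function corresponding to that composite function is selected as the harmony signal.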
【００１６】
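(10) A tenth aspect of the present invention is the acoustic-signal encoding method according to any of the seventh to ninth aspects, wherein
the frequencies corresponding to the note numbers used in MIDI data are used as the X frequencies, and
in the encoding step, the acoustic signal of each unit section is represented by MIDI-format code data consisting of data indicating a note number corresponding to the frequency of each contained signal, a velocity determined from its amplitude, and a delta time determined from the length of the unit section.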
【００１７】
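(11) An eleventh aspect of the present invention is the acoustic-signal encoding method according to any of the first to tenth aspects, wherein
instead of calculating the correlation with an element signal having a predetermined frequency f, the correlation with an element signal having the frequency f/2^q (q being a predetermined integer) is calculated by using the double-angle formulas for the sine and cosine functions.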
【００１８】
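(12) A twelfth aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic-signal encoding method according to any of the first to eleventh aspects.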
【００１９】
【Embodiments of the Invention】
The present invention is described below on the basis of the illustrated embodiments.
【００２０】
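§1. Basic principle of the acoustic-signal encoding method using the Fourier transform
First, the basic principle of the Fourier-transform-based encoding method proposed in the specification of Japanese Patent Application No. H10-283453, a prior application with respect to the present invention, is described. Suppose that an analog acoustic signal is given as a time-series intensity signal, as shown in Fig. 1(a). In the illustrated example the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. The analog acoustic signal is first captured as digital acoustic data. This is done by the conventional PCM technique: the analog signal is sampled at a predetermined sampling period and its amplitude is converted to digital data with a predetermined number of quantization bits. For convenience of explanation, the waveform of the PCM-digitized acoustic data is treated here as identical to that of the analog signal in Fig. 1(a).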
【００２１】
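Next, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded. In the example of Fig. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5, each starting and ending at these times, are set (a more practical way of setting the sections is described later).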
【００２２】
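Once the unit sections have been set, a Fourier transform is performed on the acoustic signal of each unit section (called a section signal here) to create a spectrum. Before the transform, the extracted section signal is filtered with a weighting function such as the Hanning window. The Fourier transform generally assumes that the same signal continues indefinitely before and after the extracted section, so without a weighting function high-frequency noise often appears in the resulting spectrum. A weighting function whose weight is 0 at both ends of the section, such as the Hanning window, suppresses this problem to some extent. With L denoting the unit section length, the Hanning window function H(k) is given, for k = 1 ... L, by
H(k) = 0.5 - 0.5 * cos(2πk / L)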
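As an illustration of the weighting described above, the following is a minimal sketch of the window H(k) = 0.5 - 0.5 * cos(2πk / L) applied to a section signal. The function names `hanning_window` and `apply_window` are hypothetical and do not appear in the specification.

```python
import math

def hanning_window(length):
    """Return [H(1), ..., H(L)] with H(k) = 0.5 - 0.5*cos(2*pi*k/L)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / length)
            for k in range(1, length + 1)]

def apply_window(section):
    """Multiply each sample of a section signal by the window weight."""
    window = hanning_window(len(section))
    return [x * h for x, h in zip(section, window)]
```

The weight reaches 1 at the centre of the section and falls to 0 at its end, which is what suppresses the discontinuity at the section boundary.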
【００２３】
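Fig. 1(b) shows an example of the spectrum created for unit section d1. In this spectrum, the frequency f on the horizontal axis indicates the frequency components contained in the section signal of unit section d1 (0 to F, where F is the sampling frequency), and the complex intensity A on the vertical axis indicates the complex intensity of each frequency component.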
【００２４】
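Next, X codes are defined discretely in correspondence with the frequency axis f of this spectrum. In this example the note numbers n used in MIDI data serve as the codes, and 128 codes, n = 0 to 127, are defined. The note number n is a parameter indicating the pitch of a note; note number n = 69, for example, denotes the A at the centre of the piano keyboard (called A3 in the source) and corresponds to a 440 Hz tone. Since each of the 128 note numbers is thus associated with a specific frequency, the 128 note numbers n are defined discretely at specific positions on the frequency axis f of the spectrum.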
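The correspondence between note numbers and frequencies can be sketched as follows. This assumes the usual equal-tempered mapping, in which 12 note numbers span one octave (a factor of two in frequency) and note number 69 is 440 Hz; the function names are hypothetical.

```python
import math

def note_to_frequency(n):
    """Frequency in Hz of MIDI note number n (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def nearest_note(freq_hz):
    """Note number (0..127) whose frequency is closest on a log scale."""
    n = round(69 + 12 * math.log2(freq_hz / 440.0))
    return max(0, min(127, n))
```

For example, `note_to_frequency(81)` is one octave above note 69, and `nearest_note` can be used to snap a spectral peak to the closest of the 128 codes.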
【００２５】
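The note number n represents a logarithmic pitch scale in which the frequency doubles with each octave, so it does not correspond linearly to the frequency axis f. The frequency axis f is therefore expressed on a logarithmic scale, and an intensity graph is created with the note numbers n defined on this logarithmic axis. Fig. 1(c) shows the intensity graph created in this way for unit section d1. The horizontal axis of this graph is the horizontal axis of the spectrum of Fig. 1(b) converted to a logarithmic scale, with note numbers n = 0 to 127 plotted at equal intervals. The vertical axis is the complex intensity A of the spectrum of Fig. 1(b) converted to an effective intensity E, and indicates the intensity at the position of each note number n. The complex intensity A obtained by a Fourier transform is generally expressed by a real part R and an imaginary part I, and the effective intensity E can be obtained as E = (R^2 + I^2)^(1/2).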
【００２６】
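The intensity graph of unit section d1 thus obtained can be regarded as a graph showing, as effective intensities, the proportions of the vibration components corresponding to note numbers n = 0 to 127 contained in the section signal of unit section d1. On the basis of these effective intensities, P note numbers are selected from all X note numbers (X = 128 in this example), and these P note numbers n are extracted as the representative codes of unit section d1. For convenience of explanation, the case P = 3 is shown here, in which three note numbers are extracted from the 128 candidates as representative codes. If, for example, extraction follows the criterion "extract P codes from the candidates in descending order of intensity", then in the example of Fig. 1(c) note number n(d1,1) is extracted as the first representative code, n(d1,2) as the second, and n(d1,3) as the third.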
【００２７】
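Once the P representative codes have been extracted, the section signal of unit section d1 can be represented by these representative codes and their effective intensities. In the above example, if the effective intensities of note numbers n(d1,1), n(d1,2), n(d1,3) in the intensity graph of Fig. 1(c) are e(d1,1), e(d1,2), e(d1,3), the acoustic signal of unit section d1 can be represented by the following three data pairs:
n(d1,1), e(d1,1)
n(d1,2), e(d1,2)
n(d1,3), e(d1,3)
The processing for unit section d1 has been described above; the same processing is performed separately for unit sections d2 to d5, yielding data indicating the representative codes and their intensities. For unit section d2, for example, the three data pairs
n(d2,1), e(d2,1)
n(d2,2), e(d2,2)
n(d2,3), e(d2,3)
are obtained. The original acoustic signal can be encoded by the data obtained in this way for each unit section.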
【００２８】
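Fig. 2 is a conceptual diagram of encoding by the above method. Fig. 2(a), like Fig. 1(a), shows five unit sections d1 to d5 set on the original acoustic signal, and Fig. 2(b) shows the code data obtained for each unit section in the form of musical notes. In this example three representative codes are extracted for each unit section (P = 3), and the data on these representative codes are stored in three separate tracks T1 to T3. For example, the representative codes n(d1,1), n(d1,2), n(d1,3) extracted for unit section d1 are stored in tracks T1, T2, T3, respectively. Fig. 2(b) is only a conceptual diagram showing the code data as notes; in practice, intensity data is attached to each note. Track T1, for example, stores the pitch data n(d1,1), n(d2,1), n(d3,1), ... together with the intensity data e(d1,1), e(d2,1), e(d3,1), ....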
【００２９】
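The encoding format adopted here need not be MIDI, but since MIDI is the most widespread format of this kind, MIDI-format code data is the most preferable in practice. In the MIDI format, "note on" and "note off" data appear with "delta time" data interposed between them. "Note on" data specifies a note number N and a velocity V and instructs the start of a specific sound; "note off" data specifies a note number N and a velocity V and instructs the end of a specific sound. "Delta time" data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed (note-on velocity) or released (note-off velocity), and thus indicates the strength of the operation that starts or ends a specific sound.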
【００３０】
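In the method described above, P note numbers n(di,1), n(di,2), ..., n(di,P) are obtained as representative codes for the i-th unit section di, together with their effective intensities e(di,1), e(di,2), ..., e(di,P). MIDI-format code data can then be created as follows. The note number N written in the "note on" or "note off" data is simply the obtained note number n(di,1), n(di,2), ..., n(di,P). The velocity V written in the "note on" or "note off" data is obtained by normalizing the effective intensities e(di,1), e(di,2), ..., e(di,P) to the range 0 to 1 and multiplying the square root of the normalized effective intensity E by, for example, 127. That is, with Emax denoting the maximum value of the effective intensity E, the value V given by
V = (E / Emax)^(1/2) * 127
is used as the velocity. Alternatively, the logarithm may be taken and the value V given by
V = log(E / Emax) * 127 + 127
(with V = 0 when V < 0)
may be used as the velocity. The "delta time" data is set according to the length of each unit section.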
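The two velocity formulas above can be sketched as follows. The specification does not state the base of the logarithm or how V is converted to an integer; base 10 and rounding to the nearest integer are assumptions made only for this illustration, and the function names are hypothetical.

```python
import math

def velocity_sqrt(e, e_max):
    """V = (E / Emax)^(1/2) * 127, rounded to an integer (assumption)."""
    return round(math.sqrt(e / e_max) * 127)

def velocity_log(e, e_max):
    """V = log(E / Emax) * 127 + 127, clamped to 0 when negative.

    Base-10 logarithm is an assumption; the source only writes "log".
    """
    if e <= 0:
        return 0
    v = math.log10(e / e_max) * 127 + 127
    return max(0, round(v))
```

Under the base-10 assumption, the logarithmic variant maps intensities below one tenth of Emax to velocity 0, so it discards weak components more aggressively than the square-root variant.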
【００３１】
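In the end, the embodiment described above yields MIDI code data consisting of three tracks. If this MIDI code data is played back using three MIDI sound sources, the acoustic signal is reproduced as six-channel stereo sound.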
【００３２】
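§2. A more practical method of setting sections
§1 described a very simple example of section setting; a more practical technique is described here. In the example of Fig. 2(a), five unit sections d1 to d5 are set with six times t1 to t6, defined at equal intervals on the time axis t, as boundaries. When encoding is based on such a setting, discontinuities in the sound tend to occur at the boundary times during playback. In practice it is therefore preferable to set the sections so that adjacent unit sections partially overlap on the time axis.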
【００３３】
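Fig. 3(a) shows an example of such partially overlapping sections. The illustrated unit sections d1 to d4 all overlap partially, and performing the above processing on this setting results in the encoding shown conceptually in Fig. 3(b). In this example the centre of each unit section is taken as its reference position and each note is placed at the corresponding reference position, but the reference position relative to the unit section need not be the centre. Comparing Fig. 3(b) with Fig. 2(b) shows that the density of notes has increased. Overlapping sections thus increase the amount of code data created, but enable natural encoding without discontinuities in the reproduced sound.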
【００３４】
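Fig. 4 shows a concrete way of setting partially overlapping sections on the time axis. In this example the acoustic signal is captured as digital acoustic data by sampling at 22 kHz, the section length L of each unit section is set to 1024 samples (about 47 msec), and the offset length ΔL, which indicates the shift between successive unit sections, is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start of the i-th unit section and the start of the (i+1)-th unit section is set to the offset length ΔL. For example, the first unit section d1 contains samples 1 to 1024, and the second unit section d2, shifted by 20 samples, contains samples 21 to 1044.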
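The overlapping section setting of Fig. 4 can be sketched as a simple framing routine. The function name `unit_sections` and its defaults are illustrative; only the section length of 1024 samples and the offset of 20 samples come from the text.

```python
def unit_sections(samples, section_length=1024, offset=20):
    """Split a sample sequence into overlapping unit sections.

    Each section holds `section_length` samples and starts `offset`
    samples after the previous one; a trailing partial section is dropped.
    """
    sections = []
    start = 0
    while start + section_length <= len(samples):
        sections.append(samples[start:start + section_length])
        start += offset
    return sections
```

With these defaults, two neighbouring sections share 1004 of their 1024 samples, which is the property the later candidate-narrowing technique relies on.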
【００３５】
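§3. Efficient calculation of spectral intensity
The basic procedure of the encoding method explained with Fig. 1 is as follows: a plurality of unit sections d1, d2, d3, ... are set on the time axis of the acoustic data as in Fig. 1(a); a Fourier transform is performed on the acoustic data within section d1 to obtain a spectrum as in Fig. 1(b); and, as in Fig. 1(c), the acoustic signal of section d1 is represented by a few codes n(d1,1), n(d1,2), n(d1,3) corresponding to the peak frequencies of the spectrum. This section describes an efficient way to calculate the spectrum of Fig. 1(b).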
【００３６】
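To obtain a spectrum like that of Fig. 1(b) from a signal with vibration components like that of Fig. 1(a), the Fourier transform is generally used, and in practice the calculation is done by the fast Fourier transform (FFT). The ordinary Fourier transform, however, presupposes a spectrum on a linear frequency axis and is not necessarily suited to conversion into nonlinear code data such as MIDI data, for the following reason.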
【００３７】
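Consider a Fourier spectrum on a linear scale, as shown in Fig. 5. The graph has the linear-scale frequency f on the horizontal axis and the spectral intensity on the vertical axis. M measurement points are defined discretely at equal intervals on the horizontal (frequency) axis, and the spectral intensity at each point is shown as a bar. Row (1) below the graph gives the number of each measurement point, and row (2) gives the corresponding frequency. In this example the acoustic signal is captured at a sampling frequency F = 22.05 kHz and the number of measurement points is M = 1024. Spectral intensities, corresponding to the bar lengths, are therefore obtained at 1024 measurement points in total, from the 0th point at frequency f = 0 to the 1023rd point at frequency f = 11014 Hz (almost half the sampling frequency F). In the ordinary Fourier transform, spectral intensities are thus obtained for many measurement points defined at equal intervals on a linear frequency axis.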
【００３８】
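However, it is inefficient to use a spectrum whose intensities are obtained at equally spaced points on a linear frequency axis, as in Fig. 5, for conversion into a code system that is nonlinear in frequency, such as MIDI data. Fig. 6 redraws the spectrum of Fig. 5 with the frequency axis on a logarithmic scale. Row (1) below the graph gives the number of each measurement point, and row (2) gives the note number (corresponding to log f) associated with each point. The number of measurement points, M = 1024, is the same as in Fig. 5, but because the frequency axis is logarithmic the points are not equally spaced along it. In other words, the measurement points are sparse in the low-frequency region and become denser toward the high-frequency region.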
【００３９】
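In the low-frequency region of the example in Fig. 6, note number n = 4 is assigned to the first measurement point, n = 16 to the second, and n = 24 to the third, but no measurement point corresponds to the note numbers in between, so no spectral intensity is obtained for them; the result resembles a comb with missing teeth. With the setting F = 22.05 kHz and M = 1024, intensities therefore cannot be defined for note numbers n = 5 to 15 and 17 to 23. The gaps could of course be closed by increasing the number of measurement points beyond M = 1024, but calculating that many measurement points is itself inefficient.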
【００４０】
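Conversely, in the high-frequency region a total of 54 measurement points, from the 970th to the 1023rd, are assigned to the same note number n = 124. Defining the intensity for note number n = 124 as the average of the spectral intensities at all 54 points poses no problem in itself, but calculating 54 measurement points to obtain the intensity of a single note number is inefficient.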
【００４１】
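In short, to convert efficiently into nonlinear codes such as MIDI data, it suffices to define M measurement points discretely on the frequency axis to match the required codes, and to obtain the spectral intensity only for the frequency components of the acoustic signal corresponding to those M points. For conversion into MIDI data in particular, the M measurement points are defined discretely so that they are equally spaced on a logarithmic frequency axis; in other words, so that their frequencies form a geometric progression. Fig. 7 shows part of the measurement points defined in this way. Note numbers n = 60 to 65 are assigned to the illustrated points, which are equally spaced on the logarithmic frequency axis, and their specific frequencies 262, 278, 294, ... form a geometric progression. When the spectral intensity is calculated by the Fourier transform, calculating it only at these measurement points eliminates wasted computation. A concrete method for such efficient calculation is detailed in the specification of Japanese Patent Application No. H10-283453 cited above, so a detailed description is omitted here.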
【００４２】
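§4. Encoding method using generalized harmonic analysis
§1 to §3 outlined the encoding method using Fourier analysis proposed in the prior application. The encoding method proposed in the present application is broadly the same: a plurality of unit sections are set on the time axis of the acoustic signal to be encoded, a section signal (the part of the acoustic signal located within each unit section) is extracted from each unit section, and each section signal is replaced by predetermined codes. The difference is that the prior application used Fourier analysis to replace each section signal with codes, whereas the present invention uses generalized harmonic analysis.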
【００４３】
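Suppose, for example, that a section signal x is given for a unit section d, as shown in the upper part of Fig. 8. The unit section d has section length L and is sampled at the sampling frequency F, giving w sample values in total, numbered 0, 1, 2, 3, ..., k, ..., w-2, w-1 as illustrated. For any sample number k, an amplitude value x(k) is given as digital data.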
【００４４】
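In the Fourier-analysis-based encoding method proposed in the prior application, the Fourier spectrum of this section signal x was obtained, a predetermined number of note numbers corresponding to frequencies of high spectral intensity were selected, and MIDI encoding was performed on the basis of the selected note numbers and their spectral intensities. Fourier analysis, however, is inherently a technique for signal waveforms that continue indefinitely along the time axis; applied to a section signal x that exists only within the finite time of section length L, as in Fig. 8, it cannot give an accurate frequency analysis. As already noted, this is a problem for high-quality encoding.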
【００４５】
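The basic idea of the generalized harmonic analysis applied in the present invention is to decompose the section signal x into I harmonic functions defined in advance. A general acoustic signal can be regarded as containing a continuum of harmonic functions over the audible range of 20 Hz to 20 kHz, but the aim here is to force the given acoustic signal to be expressed by the 128 discrete frequencies defined in MIDI. That is, a seemingly random signal waveform like that of Fig. 8 is to be expressed as the sum of several signal waveforms defined by formulas. To this end, a plurality of element signals, which are candidates for the components of the section signal x, are first prepared. Here the 128 element signals shown in the table in the lower part of Fig. 8 are prepared. Each element signal is a composite function of a sine function and a cosine function of the same frequency, and the element signals correspond to note numbers 0 to 127. For example, the element signal corresponding to note number n is given as the composite of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) of frequency f(n). The variable k is the sample number shown in the upper part of Fig. 8, F is the sampling frequency, and the term (k/F) corresponds to the time t measured from the left end of unit section d. The coefficients A(n) and B(n) prefixed to the trigonometric functions in the table of Fig. 8 indicate amplitudes. Each element signal is defined only within the same section as the unit section d in which the section signal x exists. If the frequencies corresponding to note numbers 0 to 127 are denoted f(0) to f(127), these frequencies form a geometric progression (note numbers 12 apart are one octave apart, that is, differ in frequency by a factor of two).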
【００４６】
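The purpose of the generalized harmonic analysis performed here is to find, for the function x(k) corresponding to the section signal x, an approximating function ξ(k) that minimizes the error value Error given by the formula of Fig. 9. Error is the sum, over the w sample positions (0 to w-1), of the squared difference between x(k) and ξ(k); the smaller Error is, the better ξ(k) approximates x(k). As the formula of Fig. 9 shows, ξ(k) is the sum of the 128 element signals (those listed in the table of Fig. 8) and is determined by specifying the coefficients A(n) and B(n) of each element signal. In other words, once the coefficients A(0) to A(127) and B(0) to B(127) of the trigonometric functions in the table of Fig. 8 are fixed, the sum of all these trigonometric functions is the approximating function ξ(k). Finding the ξ(k) that minimizes Error is nothing other than finding the individual coefficient values of its element signals. One way to find them would be to calculate Error for the vast number of combinations of all possible coefficient values and to take the combination that gives the minimum, but the computational load of such a method is enormous and impractical. Moreover, the number of sounds that currently available MIDI sound sources can synthesize simultaneously is 16 under the standard, so when the given acoustic signal is decomposed into I harmonic functions, I must be 16 or less or the result cannot be played back. The present application therefore performs generalized harmonic analysis by the following simple technique.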
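The formula of Fig. 9 is not reproduced in this text. Based only on the verbal description above (a sum of squared differences over the w samples, and an approximating function that is the sum of the 128 element signals), it can be written out as follows; the exact typography of the original figure is an assumption.

```latex
\mathrm{Error} = \sum_{k=0}^{w-1} \bigl( x(k) - \xi(k) \bigr)^{2},
\qquad
\xi(k) = \sum_{n=0}^{127} \left[ A(n)\,\sin\!\left(\frac{2\pi f(n)\,k}{F}\right)
       + B(n)\,\cos\!\left(\frac{2\pi f(n)\,k}{F}\right) \right]
```

Here x(k) is the section signal, f(n) the frequency of note number n, F the sampling frequency, and A(n), B(n) the coefficients to be determined.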
【００４７】
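First, when a section signal x like that in the upper part of Fig. 8 is given, a Fourier transform is performed on it to obtain a Fourier spectrum like that of Fig. 10. With techniques such as the FFT, the computational load of the Fourier transform is within reach of an ordinary personal computer, and all the more so with the efficient calculation described in §3, which obtains spectral intensities only at the frequency positions corresponding to the 128 note numbers. (The term "Fourier transform" generally refers to obtaining a frequency spectrum on a linear frequency axis, but in this specification it is used in a broad sense that includes obtaining a frequency spectrum on a logarithmic frequency axis as described in §3.) Next, the element signal corresponding to the peak frequency of the Fourier spectrum is selected as the harmony signal. The harmony signal is the element signal that, among the plurality of element signals, has the highest correlation value with the section signal x. In the example of Fig. 10, the element signal corresponding to the peak frequency f(n) of the Fourier spectrum is selected as the harmony signal (if the peak position does not coincide exactly with any of the 128 frequencies, the nearest frequency on the frequency axis is taken). In this example the composite function of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) of frequency f(n) is selected as the element signal with the highest correlation value according to Fourier analysis. As stated earlier, f(n) is the frequency corresponding to note number n; the relation shown in the lower part of Fig. 10 holds between f(n) and n, and the frequency f(69) corresponding to note number n = 69 is 440 Hz.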
【００４８】
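Once the composite function of sin(2πf(n)k/F) and cos(2πf(n)k/F) of frequency f(n) has been selected as the harmony signal, the coefficients A(n) and B(n) are obtained from the formulas of Fig. 11. These coefficients are in fact the correlation values between the harmony signal and the section signal x: A(n) indicates the correlation between the sine function sin(2πf(n)k/F) and the section signal x(k), and B(n) indicates the correlation between the cosine function cos(2πf(n)k/F) and the section signal x(k).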
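The formulas of Fig. 11 are not reproduced in this text, but the description (a sum over k = 0 to w-1 of the product of x(k) and the trigonometric function, preceded by the factor 2/w mentioned later in the specification) suggests the following minimal sketch. The function name `correlation_coefficients` is hypothetical.

```python
import math

def correlation_coefficients(x, freq, sample_rate):
    """Return (A, B): correlation of section signal x with sin and cos.

    A = (2/w) * sum x(k) * sin(2*pi*f*k/F)
    B = (2/w) * sum x(k) * cos(2*pi*f*k/F)
    """
    w = len(x)
    a = b = 0.0
    for k, xk in enumerate(x):
        phase = 2.0 * math.pi * freq * k / sample_rate
        a += xk * math.sin(phase)
        b += xk * math.cos(phase)
    return 2.0 * a / w, 2.0 * b / w
```

When the section happens to hold a whole number of periods of a sinusoid at `freq`, the returned A and B equal that sinusoid's sine and cosine amplitudes, which is why the product of the harmony signal and these coefficients can be subtracted directly.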
【００４９】
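Consider, for example, the right-hand side of the formula for A(n): at the k-th sample position, the product of the section signal value x(k) and the sine value sin(2πf(n)k/F) is taken. If the two functions were identical (that is, maximally correlated), their values would have the same sign at every sample position k, so the product would always be positive, and the sum over k = 0 to w-1, namely A(n), would be a large positive value. If, on the other hand, there were no correlation at all between the two functions, their values would have the same sign at some positions and opposite signs at others, so the product would be positive or negative at random, and the sum over k = 0 to w-1, namely A(n), would be close to 0.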
【００５０】
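As stated above, the composite function of sin(2πf(n)k/F) and cos(2πf(n)k/F) of frequency f(n) was selected as the harmony signal because the frequency f(n) showed a peak in the Fourier spectrum of Fig. 10. The section signal x is therefore expected to contain more of the element signal of frequency f(n) than of any other, so the coefficients A(n) and B(n), which indicate correlation, should be comparatively large.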
【００５１】
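Now define the signal G(k) shown in Fig. 12. G(k) is given by the product of the harmony signal (the trigonometric functions above) and the correlation values obtained for it (the coefficients A(n), B(n)); it is, so to speak, the selected harmony signal given an amplitude corresponding to its correlation values. In other words, G(k) can be regarded as one of the main constituent signals contained in the section signal x(k). As stated earlier, the purpose of generalized harmonic analysis is to find an approximating function ξ(k) for the section signal x, and G(k) is one component of ξ(k). In this specification G(k) is therefore called a "contained signal", in the sense of one of the signals contained in the section signal x(k). The section signal x(k) of course contains many other signals, and further contained signals must be found besides the first contained signal G(k) obtained above.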
【００５２】
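For this purpose the difference calculation of Fig. 13 is performed: a difference signal is obtained by subtracting the contained signal G from the section signal x. Specifically, x(k) - G(k) is calculated for every value of k (k = 0 to w-1). The difference signal thus obtained consists of the signal components other than the first contained signal G(k). If this difference signal is treated as a new section signal (calling the original section signal x(k) the first section signal, the difference signal x(k) - G(k) is the second section signal) and the same technique is applied again, a second contained signal is obtained. Together with the first contained signal, this second contained signal was a component of the first section signal x(k). Further, a second difference signal is obtained by subtracting the second contained signal from the second section signal; treating it as a new, third section signal and repeating the same technique yields a third contained signal.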
【００５３】
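Repeating this processing yields P contained signals, from which P codes can be generated. Suppose, for example, that repeating the above processing three times yields the three contained signals
G1(k) = A(n1) sin(2πf(n1)k/F) + B(n1) cos(2πf(n1)k/F)
G2(k) = A(n2) sin(2πf(n2)k/F) + B(n2) cos(2πf(n2)k/F)
G3(k) = A(n3) sin(2πf(n3)k/F) + B(n3) cos(2πf(n3)k/F)
(where n1, n2, n3 are each one of the note numbers 0 to 127). The section signal x is then encoded by a MIDI code whose note number is n1 and whose velocity is the square root of A(n1)^2 + B(n1)^2 (the effective amplitude), a MIDI code whose note number is n2 and whose velocity is the square root of A(n2)^2 + B(n2)^2, and a MIDI code whose note number is n3 and whose velocity is the square root of A(n3)^2 + B(n3)^2.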
【００５４】
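The above processing is described by the general formulas of Figs. 14 to 16 as follows. Given the i-th section signal xi(k), its Fourier spectrum is obtained and its peak frequency f(ni) is determined. The element signal corresponding to this peak frequency f(ni) is selected as the i-th harmony signal, and its coefficients A(ni), B(ni) are calculated from the formulas of Fig. 14. Next, the i-th contained signal Gi(k) is defined as in the formula of Fig. 15, and a difference signal is obtained by subtracting the i-th contained signal Gi(k) from the i-th section signal xi(k) as in the formula of Fig. 16; this difference signal becomes the (i+1)-th section signal x(i+1)(k). This processing is repeated as many times as necessary, starting from i = 1 and increasing i by 1 each time.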
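The iteration described above can be sketched end to end as follows. This is an illustrative sketch rather than the patented implementation: the harmony signal is chosen by directly evaluating the correlation at each candidate note frequency (standing in for the Fourier-spectrum peak search), the note-to-frequency mapping assumes equal temperament with note 69 = 440 Hz, and all function names are hypothetical.

```python
import math

def note_frequency(n):
    # Assumed equal-tempered mapping: note 69 = 440 Hz, 12 notes per octave.
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def coefficients(x, freq, sample_rate):
    # A(n), B(n): correlation with sin and cos, scaled by 2/w.
    w = len(x)
    a = b = 0.0
    for k, xk in enumerate(x):
        phase = 2.0 * math.pi * freq * k / sample_rate
        a += xk * math.sin(phase)
        b += xk * math.cos(phase)
    return 2.0 * a / w, 2.0 * b / w

def decompose(section, sample_rate, iterations, notes=range(128)):
    """Return ([(note, A, B), ...], residual) after `iterations` passes."""
    residual = list(section)          # x_i(k), starting with x_1 = x
    contained = []
    for _ in range(iterations):
        # Harmony signal: the candidate with the largest sqrt(A^2 + B^2).
        n, a, b = max(
            ((m,) + coefficients(residual, note_frequency(m), sample_rate)
             for m in notes),
            key=lambda t: math.hypot(t[1], t[2]),
        )
        f = note_frequency(n)
        # x_{i+1}(k) = x_i(k) - G_i(k), with G_i = A*sin + B*cos.
        for k in range(len(residual)):
            phase = 2.0 * math.pi * f * k / sample_rate
            residual[k] -= a * math.sin(phase) + b * math.cos(phase)
        contained.append((n, a, b))
    return contained, residual
```

Each returned triple gives a note number and the coefficients from which the effective amplitude sqrt(A^2 + B^2) follows. The `notes` argument can be restricted, for instance to exclude candidates whose frequency exceeds half the sampling frequency.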
【００５５】
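The above is the frequency analysis method known as generalized harmonic analysis. Although the Fourier transform is used when selecting the element signal with the highest correlation value as the harmony signal, the method fundamentally expresses the original signal as a sum of element signals and takes a different approach from Fourier analysis. In the formulas of Fig. 11 and Fig. 14, the right-hand side begins with the factor 2/w: the denominator w indicates division by the total number of samples, and the numerator 2 is a value known empirically to be the most suitable coefficient for this generalized harmonic analysis (it can also be explained theoretically, but a detailed explanation is omitted here).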
【００５６】
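Finally, the basic procedure of the encoding method of the present invention using generalized harmonic analysis is described with reference to the flowchart of Fig. 17. In step S1 the acoustic signal to be encoded is input; as already described, it is sampled at a predetermined sampling frequency F and captured as digital data by PCM. In step S2 a plurality of unit sections are set on the time axis and a section signal x is extracted from each unit section. As described in §2, the unit sections are preferably set so that adjacent unit sections partially overlap on the time axis. In step S3 the parameter i is set to 1. This parameter counts the iterations of the repeated processing described above.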
【００５７】
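In step S4 the i-th section signal xi is Fourier transformed. When i = 1, the section signal x extracted in step S2 is the section signal xi of step S4. In step S5 the frequency f(ni) corresponding to the peak of the resulting Fourier spectrum is determined from among 128 candidates. The 128 candidates are the frequencies f(0) to f(127) shown in the table in the lower part of Fig. 8, which correspond to the 128 MIDI note numbers. Determining f(ni) in step S5 amounts to determining, among the 128 element signals, the one with the highest correlation value (here, the Fourier spectral intensity) with the section signal xi; the element signal of frequency f(ni) is what is called the harmony signal here.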
【００５８】
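In step S6 the coefficients A(ni), B(ni) of this harmony signal are calculated (formulas of Fig. 14) and the i-th contained signal Gi is obtained (formula of Fig. 15). As already stated, the A(ni), B(ni) calculated here correspond to the correlation values of the harmony signal with the section signal xi. These correlation values have already been computed when the Fourier spectrum was obtained in step S4, so those results may be reused as they are.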
【００５９】
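In step S7 a difference signal is obtained by subtracting the i-th contained signal Gi from the i-th section signal xi, and this difference signal becomes the (i+1)-th section signal x(i+1). In step S8 it is determined whether the parameter i has reached the predetermined count I; if not, the process proceeds to step S9, where i is incremented by 1, and returns to step S4, where the Fourier transform is now performed on the (i+1)-th section signal x(i+1). The predetermined count I is a parameter indicating how many items of code data represent one unit section. In the example of Fig. 3, one unit section is represented by three items of MIDI code data placed on tracks T1 to T3; in that case I = 3 is set, three contained signals G1, G2, G3 are obtained, and MIDI code data is derived from each. In practice it is preferable to set I to about 8 and generate MIDI code data for eight tracks.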
【００６０】
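In the example of Fig. 17 the processing of steps S4 to S8 is always repeated I times, but even while i < I the repetition may be terminated once an approximating function ξ(k) is obtained whose Error value (Fig. 9) is smaller than a predetermined threshold. For example, repeating the above processing three times yields three contained signals G1, G2, G3. If the Error value of Fig. 9 calculated with ξ(k) = G1 + G2 + G3 is smaller than the threshold, the sum of the three contained signals already gives a signal quite close to the section signal x(k). A step may therefore be added immediately before step S8 that computes ξ(k) = ΣGi, calculates the Error value of Fig. 9, and compares it with the threshold, proceeding to step S10 when the Error value is at or below the threshold.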
【００６１】
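When one frequency f(ni) is selected from the 128 candidates in step S5, a frequency that has already been selected may be selected again (although this is generally unlikely, because the signal containing a once-selected frequency is subtracted in step S7 and the remaining difference signal contains little of that frequency component). Such a case can be handled in two ways: by permitting or by forbidding reselection of the same frequency. In the former case, step S5 simply selects the frequency corresponding to the peak of the Fourier spectrum without checking for duplicates; the final contained signals then include ones with the same frequency, and MIDI code data of the same pitch is placed on different tracks. In the latter case, step S5 checks for duplicate selection and, if a duplicate would result, selects the next candidate (the frequency corresponding to the next peak of the Fourier spectrum).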
【００６２】
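When the required number of iterations has been completed, encoding of the unit section is finished, and the process proceeds from step S10 to step S11, where the unit section is updated. With the section setting of Fig. 4, for example, a new unit section shifted by the offset length ΔL (20 samples) is set in step S2, and the 1024 samples taken from this new unit section are extracted as a new section signal x. When this processing has been completed for all sections, the encoding procedure ends via step S10.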
【００６３】
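§5. Measures for reducing the computational load of generalized harmonic analysis
As described in §4, the essence of the present invention is to decompose a section signal into a plurality of contained signals by generalized harmonic analysis and to convert each contained signal into code data. As the flowchart of Fig. 17 shows, however, generalized harmonic analysis requires correlation calculations among many signals, so its computational load is enormous compared with Fourier analysis, and for this reason it has not yet come into general use. The present inventor has therefore devised several measures for reducing this load. With these measures, actual acoustic signals can be encoded at a practical level on a personal computer. The measures are described in turn below. Each can be implemented on its own, but in practice it is preferable to combine them all.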
【００６４】
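(1). Introduction of simplified correlation calculation
In the technique of §4, the correlation between two signals (functions) is calculated in two places. The first is the step of selecting one harmony signal (the signal with the highest correlation value with the section signal) from the 128 element signals; in the flowchart of Fig. 17 this corresponds to Fourier transforming the section signal xi in step S4 and determining the frequency corresponding to the spectral peak from the 128 candidates in step S5. The second is the step of obtaining the coefficients A(n), B(n) of the selected harmony signal; in Fig. 17 this corresponds to the calculation in step S6. In fact, steps S4 and S6 of Fig. 17 essentially do the same thing.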
【００６５】
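The Fourier transform performed in step S4 is, as shown in Fig. 18, a calculation of correlation values with specific trigonometric functions. Suppose that a section signal x is given within a unit section d, as in Fig. 18(a). The unit section d has section length L, and the section signal x is data sampled at the sampling frequency F. The sample value of x at sample number k (k = 0, 1, 2, ..., w-1) is x(k). For this section signal x, a sine wave sin(2πf(n)k/F) of frequency f(n), defined on the same unit section, is prepared as in Fig. 18(b). The correlation value S1(n) between the section signal x and this sine wave can be calculated by the formula of Fig. 18(c). On the right-hand side of this formula, the product of the section signal value x(k) and the sine value sin(2πf(n)k/F) is taken at the k-th sample position. If the two functions were identical (that is, maximally correlated), their values would have the same sign at every sample position k, so the product would always be positive, and the sum over k = 0 to w-1, namely S1(n), would be a large positive value. If there were no correlation at all, the two values would have the same sign at some positions and opposite signs at others, the product would be positive or negative at random, and the sum over k = 0 to w-1, namely S1(n), would be close to 0.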
【００６６】
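The correlation value S2(n) using the cosine wave cos(2πf(n)k/F) in place of the sine wave sin(2πf(n)k/F) can likewise be calculated by the formula of Fig. 18(c). When obtaining the correlation with a periodic signal having a component of frequency f(n), both the correlation with the sine wave and the correlation with the cosine wave must be considered in order to avoid the influence of phase differences (whatever the phase, considering both the sine and the cosine ensures that the correlation is detected in one or the other). In practice, therefore, as in the bottom formula of Fig. 18(c), the square root E(n) of the sum of the squares of S1(n) and S2(n) is taken as the correlation value with a periodic signal having a component of frequency f(n), and the Fourier spectrum is obtained from it. The effective intensity E in Fig. 1(c) corresponds to this value E(n).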
【００６７】
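The formulas for the correlation values S1(n), S2(n) in Fig. 18(c) are almost identical to the formulas for the coefficients A(n), B(n) in Fig. 11, because both give the correlation value between a periodic signal with a component of frequency f(n) and the section signal x (the factor 2/w in the formulas of Fig. 11 is, as stated earlier, a coefficient obtained empirically for harmonic analysis). Steps S4 and S6 of Fig. 17 therefore carry out almost the same calculation. The purpose of step S4, however, is to select as the harmony signal the one of the 128 element functions with the highest correlation value with the section signal x, whereas the purpose of step S6 is to obtain the correlation values of the selected harmony signal and to multiply the harmony signal by them to obtain the contained signal Gi.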
【００６８】
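This difference in purpose brings out two characteristics. First, step S4 must calculate the correlation for all 128 element functions, whereas step S6 need only calculate it for the harmony function (that is, for the one element function selected from the 128). Second, step S4 only needs to determine the relative magnitudes of the correlations of the 128 element functions, so little precision is required, whereas step S6 must determine the coefficients A(n), B(n) that correspond to the amplitude of the contained signal Gi, so correlation values of reasonable precision are required.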
【００６９】
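Taking these two characteristics together, step S4 must obtain correlations for all 128 element functions but rough values suffice, while step S6 need only obtain a high-precision correlation value for a single element function (the harmony function).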
【００７０】
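For the rough correlation calculation of step S4, the simplified correlation calculation shown in Fig. 19 can be used, for example. First, the amplitude peak positions of the section signal x of Fig. 19(a) are detected. The peak positions are determined here on the premise that positive and negative peaks alternate; when peaks of the same polarity occur in succession, only the larger one is recognized as a peak. In the illustrated example five peak positions P1 to P5 (appearing at times t(P1) to t(P5)) are detected. Once the peak positions of the section signal x have been detected, the correlation values are calculated using only the information on these peak positions.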
【００７１】
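For example, to calculate the correlation with the sine wave of frequency f(n) shown in Fig. 19(b), the products of the amplitude values are calculated only at the five positions at times t(P1) to t(P5) and summed. In other words, whereas the ordinary correlation calculation (Fourier analysis) obtains the correlation values S1(n), S2(n) from the formulas of Fig. 18(c) and from them the final correlation value (effective value) E(n), the simplified correlation calculation obtains simplified correlation values SS1(n), SS2(n) from the formulas of Fig. 19(c) and from them the final simplified correlation value (effective value) EE(n). Here the parameter j is the peak position number, x(Pj) is the value of the section signal x at the j-th peak position, t(Pj) is the time of the j-th peak position, and J is the total number of peak positions.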
【００７２】
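Once the 128 simplified correlation values EE(n) have been obtained, the element signal whose frequency f(n) corresponds to the largest of them is selected as the harmony signal. Then, in step S6, the ordinary correlation calculation (using all of the information, that is, all sample positions, rather than only the peak positions) is performed for this harmony signal alone on the basis of the formulas of Fig. 11, and the contained signal G is obtained using the recomputed correlation values (coefficients A(n), B(n)).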
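The simplified correlation calculation can be sketched as follows. The formulas of Fig. 19(c) are not reproduced in this text, so the absence of any normalization factor, the local-extremum test used to find peaks, and the names `find_peaks` and `simple_correlation` are assumptions made for illustration.

```python
import math

def find_peaks(x):
    """Return [(index, value), ...] of alternating positive/negative peaks.

    When two peaks of the same polarity occur in succession, only the one
    with the larger magnitude is kept, as described in the text.
    """
    peaks = []
    for k in range(1, len(x) - 1):
        is_max = x[k] > 0 and x[k] >= x[k - 1] and x[k] > x[k + 1]
        is_min = x[k] < 0 and x[k] <= x[k - 1] and x[k] < x[k + 1]
        if not (is_max or is_min):
            continue
        if peaks and (peaks[-1][1] > 0) == (x[k] > 0):
            if abs(x[k]) > abs(peaks[-1][1]):
                peaks[-1] = (k, x[k])
        else:
            peaks.append((k, x[k]))
    return peaks

def simple_correlation(peaks, freq, sample_rate):
    """EE = sqrt(SS1^2 + SS2^2), summing over the peak positions only."""
    ss1 = ss2 = 0.0
    for k, value in peaks:
        phase = 2.0 * math.pi * freq * k / sample_rate
        ss1 += value * math.sin(phase)
        ss2 += value * math.cos(phase)
    return math.hypot(ss1, ss2)
```

Because only the peak positions enter the sums, the cost per candidate frequency scales with the number of peaks J rather than with the number of samples w; the full-precision coefficients are then recomputed for the single selected candidate.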
【００７３】
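The flowchart of Fig. 20 shows the procedure when this simplified correlation calculation is introduced. Steps identical to those of Fig. 17 carry the same numbers; the steps that differ are S41 and S51. Whereas step S4 of Fig. 17 performed a Fourier transform (the ordinary correlation calculation for all 128 element signals), step S41 of Fig. 20 performs the simplified correlation calculation for the 128 element signals. And whereas step S5 of Fig. 17 selected as the harmony signal the element signal corresponding to the peak position of the Fourier spectrum, step S51 of Fig. 20 selects the element signal showing the strongest correlation among the 128 simplified correlation results.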
【００７４】
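To give concrete figures: if the number of samples w in one unit section is 1024 and the number of amplitude peak positions J is about 100, the simplified correlation calculation reduces the computational load to about 1/10.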
【００７５】
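(2). Narrowing down the element signal candidates (part 1)
In the processing of the flowchart of Fig. 20, the i-th contained signal Gi is obtained in step S6, the (i+1)-th section signal x(i+1) is obtained in step S7, the value of i is updated in step S9, and then the correlation calculation (simplified correlation calculation) between the new section signal and the 128 element signals is performed again in step S41. Since updating the parameter i also updates the section signal xi (the difference signal of step S7 becomes the new section signal), performing the correlation calculation for this new section signal xi in step S41 is meaningful; to raise computational efficiency further, however, part of the correlation calculation performed each time in step S41 can be omitted.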
【００７６】
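The flowchart of Fig. 21 shows the processing with this omission. The steps that differ from Fig. 20 are S42, S52, and S92. Whereas step S41 of Fig. 20 performed the simplified correlation calculation between the 128 element signals and the section signal xi, step S42 of Fig. 21 additionally extracts the top 16 candidates in descending order of the correlation obtained (the number of candidates extracted is preferably a multiple of the value I of step S8; in this embodiment I = 8, so twice that number, 16 candidates, are extracted). In step S52 the element signal with the strongest correlation among these 16 candidates is selected as the harmony signal. At this point the processing of step S52 is no different from that of step S51 (the first of the 128 candidates is naturally the same as the first of the 16 candidates). Since the element signal selected as the harmony signal is always the one with the strongest correlation, only the first-ranked candidate is selected in step S52 even though the top 16 were extracted in step S42, and the candidates ranked 2nd to 16th have no significance at this point.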
【００７７】
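In the procedure of Fig. 21, however, after the parameter i has been updated in step S9, step S92 is executed instead of step S42. Step S92 performs the simplified correlation calculation between the 16 element signals already extracted and the section signal xi. Whereas step S42 calculated the correlation for all 128 element signals, step S92 need only calculate it for 16. This technique rests on the idea that the frequency components contained in the section signal xi will not change greatly even when the parameter i is updated. Thus, once the top 16 candidates have been extracted for the section signal x1 (the original acoustic signal) at i = 1, element signals outside these 16 are not considered at all for the section signals xi from i = 2 onward (the signals remaining after the contained signals have been subtracted one after another), but this causes no serious problem.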
【００７８】
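With this technique, the correlation calculation for all 128 element signals is performed only at i = 1, that is, only when the first harmony signal (giving G1) is selected; when the second and subsequent harmony signals (giving G2, G3, ...) are selected, the correlation calculation is needed only for the 16 extracted candidates, so the computational load is reduced to about 1/8.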
【００７９】
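This candidate-narrowing technique can also be applied to the processing of Fig. 17, which does not use the simplified correlation calculation. In short, when the first harmony signal is selected for the section signal of each unit section, Y candidates (Y < X; Y = 16 in the above example), ranked first to Y-th in descending order of correlation value with the section signal, are selected in advance from the X element signals (X = 128 in the above example), and the first-ranked candidate is selected as the first harmony signal; when the second and subsequent harmony signals are selected, the element signal with the highest correlation value with the section signal is selected from among the Y candidates.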
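The candidate-narrowing logic can be sketched independently of how the correlation itself is computed. In the sketch below, `score` and `update` are hypothetical callbacks supplied by the caller: `score(c)` returns the correlation of candidate `c` with the current section signal, and `update(c)` subtracts the contained signal of `c` from it.

```python
def select_with_shortlist(score, update, all_candidates, iterations, shortlist_size):
    """Select `iterations` harmony signals, scoring all X candidates only once.

    The first pass ranks every candidate and keeps the top `shortlist_size`
    (Y < X); later passes only re-score those Y candidates.
    """
    ranked = sorted(all_candidates, key=score, reverse=True)
    pool = ranked[:shortlist_size]
    chosen = [pool[0]]            # rank 1 is the first harmony signal
    update(pool[0])
    for _ in range(iterations - 1):
        best = max(pool, key=score)
        chosen.append(best)
        update(best)
    return chosen, pool
```

With X = 128 and Y = 16 as in the text, every pass after the first evaluates one eighth as many candidates.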
【００８０】
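(3). Narrowing down the element signal candidates (part 2)
In the narrowing technique described above, the narrowed calculation was used after the parameter i had been updated to 2. In the technique described here, the narrowed calculation is used after the unit section has been updated. It is effective when, as described in §2, adjacent unit sections are set to overlap partially on the time axis. In the example of Fig. 4, comparing the 1024 samples in unit section d1 with the 1024 samples in unit section d2 shows that only 20 samples differ; the remaining 1004 samples are identical. Yet in the procedures of Figs. 17, 20, and 21 the same processing is repeated from scratch after the unit section is updated in step S11. The technique described here exploits this fact to improve computational efficiency.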
【００８１】
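Fig. 22 is a flowchart of this technique. The steps that differ from Fig. 21 are S43 to S45. When a unit section has been set in step S2, the parameter i is set to its initial value 1 in step S3, and step S43 decides whether to perform the detailed calculation. If the section signal extracted in step S2 is the first section signal, the detailed calculation is performed and the process proceeds from step S43 to step S44. Step S44 does almost the same as step S42 of Fig. 21: the simplified correlation calculation is performed between the 128 element signals and the section signal xi, and candidates are extracted in descending order of the correlation obtained. In the example of Fig. 22, however, the top 32 candidates are extracted (here too, the number of candidates is preferably a multiple of the value I of step S8; in this embodiment I = 8, and four times that number, 32 candidates, are extracted).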
【００８２】
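The following step S52 is exactly the same as step S52 of Fig. 21: the element signal with the strongest correlation among the extracted 16 candidates is selected as the harmony signal. Although step S44 extracts the top 32 candidates, step S52 uses only the top 16 of them. The subsequent procedure is exactly the same as in Fig. 21: after the parameter i is updated in step S9, step S92 is executed instead of step S44, and the correlation calculation is always performed for 16 candidates only. At this point, therefore, the candidates ranked 17th to 32nd among the top 32 extracted in step S44 are not used at all.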
【００８３】
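When the parameter i reaches the set value I, processing of the unit section is complete, and the process returns to step S2 via steps S10 and S11. A new unit section is set and a new section signal is extracted, but as noted above the new and old section signals overlap for the most part on the time axis. When the correlation calculation for the 128 element signals has already been performed on the old section signal in step S44, and the new and old section signals overlap on the time axis for at least a predetermined time, step S43 decides not to perform the detailed calculation and the process proceeds to step S45. In step S45, instead of calculating the correlation for the 128 element signals, the correlation is calculated for the 32 candidate element signals previously extracted in step S44, and the top 16 candidates are extracted from the result. From step S52 onward, the harmony signal is selected from among these 16 candidates.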
【００８４】
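With this technique it is no longer necessary to repeat the correlation calculation for all 128 element signals every time the unit section is updated, and the computational load is reduced to roughly 1/4. The technique is applicable not only to the processing of Fig. 21 but also to that of Figs. 17 and 20. In short, when the harmony signal is selected for the section signal of a first unit section, Z candidates (Z < X; Z = 32 in the above example), ranked first to Z-th in descending order of correlation value with the section signal, are selected in advance from the X element signals (X = 128 in the above example) and the harmony signal is selected from among them; when the harmony signal is selected for the section signal of a second unit section that overlaps the first unit section on the time axis for at least a predetermined time, the harmony signal is selected from among the same Z candidates.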
【００８５】
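The criterion used in step S43 to decide whether to perform the detailed calculation may be set to a suitable value in view of the shift between unit sections; for example, the detailed calculation may be performed each time the section has shifted by at least half the section length L (L/2). In the example of Fig. 4, one unit section contains 1024 samples, so the detailed calculation is performed when the shift reaches half of that, 512 samples, or more. Since each update of the unit section shifts it by 20 samples (512 / 20 = 25.6), the detailed calculation is performed about once every 25 updates of the unit section in step S11.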
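The arithmetic behind this schedule can be checked with a short sketch. The function name and the `shift_fraction` parameter are hypothetical; rounding up reflects the condition that the shift must reach at least L/2 before the detailed calculation is repeated.

```python
import math

def updates_between_detailed_passes(section_length=1024, offset=20, shift_fraction=0.5):
    """Number of section updates until the accumulated shift reaches
    `shift_fraction` of the section length."""
    return math.ceil(section_length * shift_fraction / offset)
```

With the values of Fig. 4 the accumulated shift first reaches 512 samples on the 26th update (26 x 20 = 520), consistent with the "about 25" stated in the text.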
【００８６】
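§6. Dealing with frequency fluctuation
Instrument sounds with vibrato, the human voice (vocal sound), and the like contain minute frequency fluctuations. In the techniques described so far, however, every element signal is a harmonic function (a function with a single frequency, such as a sine or cosine function), so a correct correlation may not be obtained for acoustic signals containing minute frequency fluctuations. For example, if within one unit section the frequency varies from f(n), corresponding to note number n, to f(n+1), corresponding to note number (n+1), then the correlation of the section signal with the element signal of frequency f(n) and with the element signal of frequency f(n+1) will each come out at only about 50%. Ways of dealing with such frequency fluctuation are described here.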
【００８７】
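(1). Approach using non-harmonic functions
First, a method that uses non-harmonic functions as well as harmonic functions as element signals is described. Fig. 23(a) shows the waveform of a sine wave, a typical harmonic function. This sine wave is a harmonic function with a single frequency f(n) and, with F the sampling frequency and k the sample number, is expressed as sin(2πf(n)k/F). Consider, by contrast, a non-harmonic function like that of Fig. 23(b). Its frequency changes gradually over the section length L: the frequency is low on the left side of the section and high on the right side, so the frequency depends on the sample number k. This non-harmonic function is expressed as sin(2πfj(n,k)k/F), where fj(n,k) is the function given by the formula of Fig. 23(c) and j = -1, 0, +1.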
【００８８】
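Fig. 24 explains the meaning of the function fj(n,k) for the three cases j = -1, 0, +1. For j = -1, fj(n,k) = (f(n-1) - f(n))k/w + f(n). Substituting k = 0 gives fj(n,0) = f(n), and substituting k = w gives fj(n,w) = f(n-1), so, as shown in the upper part of Fig. 24, this is a non-harmonic function with frequency f(n) at the left end of the section length L and frequency f(n-1) at the right end, whose frequency decreases gradually from left to right. For j = 0, fj(n,k) = f(n), which, as shown in the middle part of Fig. 24, is a harmonic function of constant frequency (the sine function of Fig. 23(a)). For j = +1, fj(n,k) = (f(n+1) - f(n))k/w + f(n). Substituting k = 0 gives fj(n,0) = f(n), and substituting k = w gives fj(n,w) = f(n+1), so, as shown in the lower part of Fig. 24, this is a non-harmonic function (like that of Fig. 23(b)) with frequency f(n) at the left end of the section length L and frequency f(n+1) at the right end, whose frequency increases gradually from left to right.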
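The three cases of fj(n,k) can be written as a single expression, (f(n+j) - f(n))k/w + f(n), which reduces to the formulas above for j = -1, 0, +1. The sketch below follows the text's expression sin(2πfj(n,k)k/F) literally; the equal-tempered `note_frequency` mapping and the function names are assumptions for illustration.

```python
import math

def note_frequency(n):
    # Assumed equal-tempered mapping: note 69 = 440 Hz, 12 notes per octave.
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def swept_frequency(n, k, w, j):
    """f_j(n, k) = (f(n + j) - f(n)) * k / w + f(n), for j in (-1, 0, +1)."""
    return (note_frequency(n + j) - note_frequency(n)) * k / w + note_frequency(n)

def swept_element(n, w, sample_rate, j):
    """Sine and cosine samples of the element signal for note n and sweep j."""
    samples = []
    for k in range(w):
        phase = 2.0 * math.pi * swept_frequency(n, k, w, j) * k / sample_rate
        samples.append((math.sin(phase), math.cos(phase)))
    return samples
```

For each note number this yields three element signals (j = -1, 0, +1), and a best match with j = -1 or j = +1 is mapped back to the j = 0 element signal of the same note.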
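[0089]
In the method described in §4, 128 harmonic functions (composite functions of a sine function and a cosine function) with frequencies f(0) to f(127), corresponding to note numbers 0 to 127, were prepared as element signals, as shown in the table in the lower part of FIG. 8. Here, for each of these 128 element signals, non-harmonic functions are additionally prepared by setting j in the expression of FIG. 23(c) to −1 and +1. For the frequency f(n), for example, three element signals are prepared: a first composite function (the harmonic function corresponding to j = 0) obtained by combining the sine function sin(2πf(n)k/F) shown in FIG. 23(a) with the cosine function cos(2πf(n)k/F); a second composite function defined by setting j = −1 (a non-harmonic function obtained by combining a sine function and a cosine function whose frequency changes continuously from the section start frequency f(n) to the section end frequency f(n−1)); and a third composite function defined by setting j = +1 (a non-harmonic function obtained by combining a sine function and a cosine function whose frequency changes continuously from the section start frequency f(n) to the section end frequency f(n+1)). In total, 128 × 3 element signals are prepared.
[0090]
In the correlation calculations used to select the harmonic signal (step S4 of FIG. 17; step S41 of FIG. 20; steps S42 and S92 of FIG. 21; steps S44, S45, and S92 of FIG. 22), a total of 128 × 3 correlation calculations are performed (16 × 3 or 32 × 3 when the candidates are narrowed down). If the correlation value is largest for an element signal with a non-harmonic function corresponding to j = −1 or j = +1, the element signal with the corresponding harmonic function (j = 0) is selected as the harmonic signal. This approach triples the computational load, but it allows more accurate correlation calculation, and hence more accurate encoding, even for acoustic signals containing minute frequency fluctuations.
[0091]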
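(2) Using intermediate frequencies
The 128 frequencies f(0) to f(127) shown in the table in the lower part of FIG. 8 correspond to the note numbers of MIDI codes and form a geometric series with common ratio α (α is the twelfth root of 2). For the n-th frequency f(n), three composite functions are therefore defined: a first composite function (the function listed in the table in the lower part of FIG. 8) obtained by combining a sine function and a cosine function of frequency f(n); a second composite function obtained by combining a sine function and a cosine function of frequency f(n)·β; and a third composite function obtained by combining a sine function and a cosine function of frequency f(n)/β. This gives a total of 128 × 3 composite functions, which are used as element signals. Here β is set so that 1 < β < √α. In the correlation calculations used to select the harmonic signal (step S4 of FIG. 17; step S41 of FIG. 20; steps S42 and S92 of FIG. 21; steps S44, S45, and S92 of FIG. 22), a total of 128 × 3 correlation calculations are performed (16 × 3 or 32 × 3 when the candidates are narrowed down). If the correlation value is judged to be highest for the second or third composite function, the first composite function corresponding to it is selected as the harmonic signal. This approach also triples the computational load, but it likewise allows more accurate correlation calculation, and hence more accurate encoding, for acoustic signals containing minute frequency fluctuations.
[0092]
FIG. 25 illustrates this technique more concretely, with β set to the cube root of α. When the frequency f is plotted on a logarithmic axis, the note numbers lie at equal intervals on the frequency axis, as illustrated (the interval between adjacent note numbers corresponds to the common ratio α of the geometric series). With β equal to the cube root of α, the frequencies f(n)·β and f(n)/β are plotted at positions that divide the interval between note numbers into three equal parts. As a result, element signals with three frequencies, corresponding to note number n, note number (n + 1/3), and note number (n − 1/3), are prepared in the vicinity of note number n. Whenever the correlation for any of these three is found to be high, the element signal corresponding to note number n is selected as the harmonic signal. (MIDI code data defines no codes for note numbers such as (n + 1/3) or (n − 1/3), so note number n is used to represent them.)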
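The intermediate-frequency scheme with β set to the cube root of α can be sketched as follows. This is a hedged illustration only: the tuning f(n) = 440 · 2^((n − 69)/12), the 2/w normalization of the correlation, and all function names are assumptions rather than details taken from the figures.

```python
import math

ALPHA = 2.0 ** (1.0 / 12.0)   # ratio between adjacent note frequencies
BETA = ALPHA ** (1.0 / 3.0)   # cube root of alpha, as in the FIG. 25 example


def note_freq(n):
    """Assumed tuning: note number 69 is 440 Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


def element_bank():
    """Return (frequency, note number) pairs for the 128 x 3 element signals."""
    bank = []
    for n in range(128):
        f = note_freq(n)
        for freq in (f, f * BETA, f / BETA):
            bank.append((freq, n))  # every variant is labeled with note n
    return bank


def correlation(x, freq, F):
    """Magnitude of the correlation of x with a sine/cosine pair at freq."""
    w = len(x)
    s = sum(x[k] * math.sin(2.0 * math.pi * freq * k / F) for k in range(w))
    c = sum(x[k] * math.cos(2.0 * math.pi * freq * k / F) for k in range(w))
    return math.hypot(s, c) * 2.0 / w


def select_note(x, F):
    """Pick the best-correlated element signal and report its note number."""
    freq, n = max(element_bank(), key=lambda e: correlation(x, e[0], F))
    return n


# A tone one third of a semitone above note 60 is still reported as note 60.
F = 22050.0
x = [math.sin(2.0 * math.pi * note_freq(60) * BETA * k / F) for k in range(1024)]
print(select_note(x, F))
```

Labeling all three variants with the same note number is what implements the rule that note number n represents (n + 1/3) and (n − 1/3).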
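[0093]
§7. Method using octave descent
In general, the accuracy of a correlation calculation decreases for element signals of high frequency. Consider, as in the example of FIG. 26, correlating the section signal x with a sine wave of frequency f(n): the higher the frequency f(n), the fewer samples correspond to each period. At frequencies this high, little difference appears between the correlation value for frequency f(n) and the correlation value for frequency f(n+1).
[0094]
To address this problem, instead of calculating the correlation with an element signal of a given frequency f, the double-angle formulas for the sine and cosine functions can be used so that the calculation involves an element signal of frequency f/2^q (q is a predetermined integer). In other words, the correlation is calculated using functions whose frequency is q octaves lower. FIG. 27 shows the standard double-angle formulas for trigonometric functions: 2 sinθ·cosθ can be used in place of sin 2θ, and cos²θ − sin²θ can be used in place of cos 2θ. This allows substitutions such as those shown in FIG. 28. The functions on the left-hand side, sin(2πf(n)k/F) and cos(2πf(n)k/F), are those used in the various expressions described so far; replacing them with the expressions on the right-hand side replaces the frequency f(n) with the frequency f(n−12). In MIDI, note numbers that are 12 apart are one octave apart (12 semitones make one octave), which corresponds to a factor of two in frequency.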
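The double-angle substitution can be illustrated with a short sketch. It evaluates the sine and cosine at f(n) starting from values at f(n − 12q) and applying the double-angle formulas q times. The tuning formula and the function names are assumptions made for this illustration.

```python
import math


def note_freq(n):
    """Assumed tuning: note number 69 is 440 Hz, so f(n) = 2 * f(n - 12)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


def sin_cos_via_octave_descent(n, k, F, q=1):
    """Return (sin, cos) of 2*pi*f(n)*k/F using only functions of f(n - 12q)."""
    theta = 2.0 * math.pi * note_freq(n - 12 * q) * k / F
    s, c = math.sin(theta), math.cos(theta)
    for _ in range(q):
        # sin(2t) = 2 sin(t) cos(t), cos(2t) = cos(t)^2 - sin(t)^2
        s, c = 2.0 * s * c, c * c - s * s
    return s, c


# Compare with direct evaluation at f(n) for note 81, two octaves of descent.
s, c = sin_cos_via_octave_descent(81, 37, 22050.0, q=2)
direct = 2.0 * math.pi * note_freq(81) * 37 / 22050.0
print(s - math.sin(direct), c - math.cos(direct))
```

The two printed differences should be at the level of floating-point rounding, since the substitution is an exact identity.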
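[0095]
Thus, as shown in FIG. 29, if the trigonometric functions with the 12 frequencies corresponding to note numbers 0 to 11 are called basic trigonometric functions, then every trigonometric function with a frequency corresponding to note number 12 or higher can be replaced by a calculation using these basic trigonometric functions. Applying this octave-descent technique in the present invention makes it possible always to perform correlation calculations against trigonometric functions of low frequency, so that correlations can be obtained with higher accuracy.
[0096]
The present invention has been described above on the basis of the illustrated embodiments, but it is not limited to these embodiments and can be implemented in various other forms. In particular, the encoding processes described above are in practice executed on a computer. A program implementing the encoding process of the present invention can be supplied recorded on a computer-readable recording medium such as a magnetic disk or an optical disk, and code data encoded by the encoding process of the present invention can likewise be supplied recorded on such a medium.
[0097]
[Effects of the invention]
As described above, the acoustic signal encoding method according to the present invention makes it possible to convert an acoustic signal into code data such as MIDI data with high quality.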
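[Brief description of the drawings]
FIG. 1 is a diagram showing the basic principle of the acoustic signal encoding method according to the prior application invention.
FIG. 2 is a diagram showing code codes created on the basis of the intensity graph shown in FIG. 1(c).
FIG. 3 is a diagram showing code codes created by setting unit sections that partially overlap on the time axis.
FIG. 4 is a diagram showing a specific example of setting unit sections that partially overlap on the time axis.
FIG. 5 is a graph showing an example of a Fourier spectrum with the frequency axis on a linear scale.
FIG. 6 is a graph showing an example of a Fourier spectrum with the frequency axis on a logarithmic scale.
FIG. 7 is a graph showing the correspondence between note numbers and a Fourier spectrum with the frequency axis on a logarithmic scale.
FIG. 8 is a diagram showing the section signal x to be encoded and the 128 element signals prepared to decompose it.
FIG. 9 is a diagram showing expressions that explain the basic policy of harmonic analysis.
FIG. 10 is a diagram showing the concept of selecting the most highly correlated element signal as the harmonic signal on the basis of the peaks of the Fourier spectrum.
FIG. 11 is a diagram showing expressions for obtaining the correlation value of the selected harmonic signal.
FIG. 12 is a diagram showing an expression that defines the contained signal G(k) on the basis of the selected harmonic signal.
FIG. 13 is a graph showing an example of obtaining the difference signal between the section signal x and the contained signal G.
FIG. 14 is a diagram showing a general expression for obtaining the correlation value of the selected harmonic signal.
FIG. 15 is a diagram showing a general expression that defines the contained signal Gi(k) on the basis of the selected harmonic signal.
FIG. 16 is a diagram showing a general expression in which the difference signal between the section signal xi(k) and the contained signal Gi(k) becomes the new section signal x(i+1)(k).
FIG. 17 is a flowchart showing the basic procedure of the acoustic signal encoding method according to the present invention.
FIG. 18 is a diagram showing the principle by which correlation values are determined in a general Fourier transform.
FIG. 19 is a diagram showing the basic principle of the simple correlation calculation method used in the present invention.
FIG. 20 is a flowchart showing the basic procedure of an acoustic signal encoding method using the simple correlation calculation method.
FIG. 21 is a flowchart showing the basic procedure of an acoustic signal encoding method using the technique of narrowing down element signal candidates.
FIG. 22 is a flowchart showing another basic procedure of an acoustic signal encoding method using the technique of narrowing down element signal candidates.
FIG. 23 is a diagram showing an example of non-harmonic functions prepared together with harmonic functions.
FIG. 24 is a diagram explaining the relationship between harmonic functions and non-harmonic functions.
FIG. 25 is a diagram explaining an example of preparing element signals with intermediate frequencies.
FIG. 26 is a diagram showing a correlation calculation against a sine wave of relatively high frequency.
FIG. 27 is a diagram showing the double-angle formulas for trigonometric functions.
FIG. 28 is a diagram showing a method of substituting expressions using the double-angle formulas for trigonometric functions.
FIG. 29 is a diagram explaining the octave-descent method applicable to the present invention.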
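[Explanation of symbols]
A … complex intensity
A(n), B(n) … coefficients (correlation values)
d1 to d5 … unit sections
E, E(n), EE(n) … effective intensity
Error … error value
e(i,j) … effective intensity of code code n(i,j)
F … sampling frequency
f, f(n) … frequency
G(k) … contained signal
i … parameter indicating the number of iterations
I … predetermined iteration count
j … parameter indicating the peak position number / parameter indicating the non-harmonic function
J … total number of peak positions
k … sample number within one unit section
L … section length of a unit section
ΔL … offset length
M … number of measurement points
n, n1, n2, n3 … note numbers
n(i,j) … j-th code code extracted for unit section di
P1 to P5 … peak position numbers
S1(n), S2(n) … correlation values with trigonometric functions
SS1(n), SS2(n) … simple correlation values with trigonometric functions
T1 to T3 … tracks
t1 to t6 … times
w … number of samples in one unit section
x, xi … section signals
ξ(k) … approximation function
[0001]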
BACKGROUND OF THE INVENTION
The present invention relates to a method for encoding an acoustic signal, and more specifically to a technique for encoding an acoustic signal given as a time-series intensity signal and then decoding and reproducing it. In particular, the present invention is suited to efficiently converting a general acoustic signal into MIDI-format code data, and is expected to find application in the various industrial fields that produce audio content for broadcast media (radio, television), communication media (CS video/audio distribution, Internet distribution), and package media (CD, MD, cassette, video, LD, CD-ROM, game cartridge).
[0002]
[Prior art]
Among techniques for encoding acoustic signals, PCM (Pulse Code Modulation) is the most widespread and is currently widely used as the recording format for audio CDs, DAT, and the like. The basic principle of PCM is to sample an analog acoustic signal at a predetermined sampling frequency and to quantize the signal intensity at each sampling instant and express it as digital data. The higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced, but the amount of information required also increases. ADPCM (Adaptive Differential Pulse Code Modulation), which encodes only the differences between successive signal values, is therefore also used as a way of reducing the amount of information as far as possible.
[0003]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding instrument sounds for electronic musical instruments, has come into active use with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) is basically data describing the operations of an instrument performance, such as which key of which instrument is played and how strongly; the data itself does not contain the actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of instrument sounds. Compared with recording sound by the PCM method described above, however, the amount of information is extremely small, and this high coding efficiency has attracted attention. Encoding and decoding based on the MIDI standard is now widely used in personal-computer software for playing instruments, practicing instruments, and composing, and in fields such as karaoke and game sound effects.
[0004]
[Problems to be solved by the invention]
As described above, when an acoustic signal is encoded by the PCM method, ensuring sufficient sound quality makes the amount of information enormous and the data-processing burden correspondingly heavy. In practice, therefore, some sound quality has to be sacrificed to keep the amount of information within bounds. An encoding method based on the MIDI standard can reproduce sound of sufficient quality with a very small amount of information, but, as noted above, the MIDI standard was originally designed to encode the operation of musical instruments and so cannot be applied broadly to general sound. In other words, creating MIDI data requires either actually playing an instrument or preparing score information.
[0005]
Thus both the conventional PCM method and the MIDI method have advantages and disadvantages as ways of encoding acoustic signals, and neither can ensure sufficient sound quality for general sound with a small amount of information. Nevertheless, demand for efficient encoding of general sound continues to grow. For the human voice and singing voice, so-called vocal sound, this demand has been strong for some time: in fields such as language education, vocal music education, and criminal investigation, there is a strong need for techniques that efficiently encode vocal acoustic signals. To meet this demand, Japanese Patent Application No. 9-273949 and Japanese Patent Application No. 10-283453 propose new encoding methods that can make use of MIDI data. In these methods, a plurality of unit sections are set along the time axis of the acoustic signal, a spectrum is obtained by performing a Fourier transform on each unit section, and MIDI data corresponding to the spectrum is created. However, frequency analysis by Fourier transform is defined on the premise that a signal of constant frequency continues infinitely on the time axis, so when it is applied to unit sections of finite width on the time axis, faithful encoding is not always possible. High-quality encoding has therefore remained a problem.
[0006]
Therefore, an object of the present invention is to provide an audio signal encoding method capable of performing conversion into code data such as MIDI data with high quality.
[0007]
[Means for Solving the Problems]
(1) A first aspect of the present invention is an acoustic signal encoding method for encoding an acoustic signal given as a time-series intensity signal, comprising:
a section signal extraction stage that sets a plurality of unit sections on the time axis of the acoustic signal to be encoded and extracts a section signal for each unit section;
an element signal preparation stage that prepares a plurality of element signals that can be components of the section signal;
a harmonic signal selection stage that selects, as a harmonic signal, the element signal having the highest correlation value with the section signal from among the prepared element signals;
a difference signal calculation stage that obtains a difference signal by subtracting from the section signal a contained signal given by the product of the harmonic signal and its correlation value; and
an encoding stage that, using the difference signal as a new section signal, executes the harmonic signal selection stage and the difference signal calculation stage again to obtain a new contained signal and a new difference signal, obtains a plurality of contained signals by repeating this processing, and generates a plurality of code codes for expressing the section signal on the basis of these contained signals;
wherein the acoustic signal is expressed by the set of code codes generated for each unit section.
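The select-and-subtract loop of this first aspect can be sketched for one unit section as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: each element signal is taken to be a sine/cosine pair at the assumed frequency f(n) = 440 · 2^((n − 69)/12), the correlation values are assumed to be normalized by 2/w, and the function names are hypothetical.

```python
import math


def note_freq(n):
    """Assumed tuning: note number 69 is 440 Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


def correlate(x, freq, F):
    """Correlation values of x with the sine and cosine at freq (assumed 2/w scaling)."""
    w = len(x)
    a = 2.0 / w * sum(x[k] * math.sin(2.0 * math.pi * freq * k / F) for k in range(w))
    b = 2.0 / w * sum(x[k] * math.cos(2.0 * math.pi * freq * k / F) for k in range(w))
    return a, b


def encode_section(x, F, iterations):
    """Repeat harmonic-signal selection and subtraction on one section signal."""
    residual = list(x)
    codes = []
    for _ in range(iterations):
        best_n, best_a, best_b, best_e = 0, 0.0, 0.0, -1.0
        for n in range(128):  # harmonic signal selection stage
            a, b = correlate(residual, note_freq(n), F)
            e = math.hypot(a, b)
            if e > best_e:
                best_n, best_a, best_b, best_e = n, a, b, e
        f = note_freq(best_n)
        for k in range(len(residual)):  # difference signal calculation stage
            angle = 2.0 * math.pi * f * k / F
            residual[k] -= best_a * math.sin(angle) + best_b * math.cos(angle)
        codes.append((best_n, best_e))  # note number and amplitude
    return codes, residual


# Two-tone test signal: 440 Hz at amplitude 0.8 plus 880 Hz at amplitude 0.3.
F = 22050.0
x = [0.8 * math.sin(2.0 * math.pi * 440.0 * k / F)
     + 0.3 * math.sin(2.0 * math.pi * 880.0 * k / F) for k in range(1024)]
codes, residual = encode_section(x, F, iterations=2)
print([n for n, _ in codes])
```

The residual left after each pass plays the role of the new section signal, so stronger components are removed before weaker ones are searched for.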
[0008]
(2) According to a second aspect of the present invention, in the audio signal encoding method according to the first aspect described above,
In the element signal preparation stage, prepare multiple kinds of element signals with different frequencies,
In the harmonic signal selection stage, the Fourier transform is performed on the section signal, and the element signal corresponding to the peak frequency of the obtained Fourier spectrum is selected as the harmonic signal.
[0009]
(3) A third aspect of the present invention is the acoustic signal encoding method according to the first aspect described above,
In the harmonic signal selection stage, simple correlation calculation is performed to calculate the correlation value using only information on the peak position of the section signal, and the harmonic signal is selected based on the correlation value obtained as a result of the simple correlation calculation,
In the difference signal calculation stage, the correlation value is recalculated using all the information of the selected harmonic signal, and the contained signal is obtained using the recalculated correlation value.
[0010]
(4) According to a fourth aspect of the present invention, in the audio signal encoding method according to the first aspect described above,
When selecting the first harmonic signal for the section signal of each unit section, Y candidates (Y < X), ranked first through Y-th in descending order of correlation value with the section signal, are selected from among the X element signals, and the first-ranked candidate is selected as the first harmonic signal. When selecting the second and subsequent harmonic signals, the element signal having the highest correlation value with the section signal is selected as the harmonic signal from among the Y candidates already selected.
[0011]
(5) According to a fifth aspect of the present invention, in the audio signal encoding method according to the first aspect described above,
In the section signal extraction stage, settings are made such that adjacent unit sections partially overlap on the time axis.
[0012]
(6) A sixth aspect of the present invention is the acoustic signal encoding method according to the fifth aspect described above,
When selecting the harmonic signal for the section signal of a first unit section, Z candidates (Z < X), ranked first through Z-th in descending order of correlation value with the section signal, are selected from among the X element signals, and the harmonic signal is selected from these Z candidates.
When selecting a harmonic signal for the section signal of a second unit section that overlaps the first unit section on the time axis for at least a predetermined time, the harmonic signal is selected from among the Z candidates already selected.
[0013]
(7) A seventh aspect of the present invention is the acoustic signal encoding method according to the first to sixth aspects described above,
In the element signal preparation stage, a composite function of a sine function and a cosine function having the same frequency is treated as one element signal, and the composite functions for X frequencies forming a geometric series are used as the respective element signals.
[0014]
(8) An eighth aspect of the present invention is the acoustic signal encoding method according to the first to sixth aspects described above,
In the element signal preparation stage, X frequencies forming a geometric series are defined, and for the n-th frequency f(n) (n = 1, 2, ..., X) the following are defined:
a first composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function having the frequency f(n) in this section;
a second composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function whose frequency changes continuously in this section from the section start frequency f(n) to the section end frequency f(n−1); and
a third composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function whose frequency changes continuously in this section from the section start frequency f(n) to the section end frequency f(n+1).
A total of 3X composite functions are thus defined, correlation values are calculated using each of them as an element signal, and when the correlation value for a second or third composite function is judged to be the highest, the first composite function corresponding to it is selected as the harmonic signal.
[0015]
(9) A ninth aspect of the present invention is the acoustic signal encoding method according to the first to sixth aspects described above,
In the element signal preparation stage, X frequencies forming a geometric series with common ratio α are defined, and for the n-th frequency f(n) (n = 1, 2, ..., X) the following are defined:
a first composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function having the frequency f(n) in this section;
a second composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function having the frequency f(n)·β in this section; and
a third composite function, defined over the same section as the unit section, obtained by combining a sine function and a cosine function having the frequency f(n)/β in this section.
A total of 3X composite functions are thus defined (where 1 < β < √α), correlation values are calculated using each of them as an element signal, and when the correlation value for a second or third composite function is judged to be the highest, the first composite function corresponding to it is selected as the harmonic signal.
[0016]
(10) A tenth aspect of the present invention is the acoustic signal encoding method according to the seventh to ninth aspects described above.
The frequency corresponding to each note number used in MIDI data is used as a plurality of X frequencies,
In the encoding stage, the acoustic signal of each unit section is expressed by MIDI-format code data consisting of a note number corresponding to the frequency of each contained signal, a velocity determined from its amplitude, and data indicating a delta time determined from the length of the unit section.
[0017]
(11) An eleventh aspect of the present invention is the acoustic signal encoding method according to the first to tenth aspects described above,
Instead of calculating the correlation with an element signal having a predetermined frequency f, the double-angle formulas for the sine and cosine functions are used so that the correlation calculation is performed with an element signal having the frequency f/2^q (q is a predetermined integer).
[0018]
(12) A twelfth aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal encoding method according to any of the first to eleventh aspects.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described based on the illustrated embodiments.
[0020]
§1. Basic principle of audio signal encoding method using Fourier transform
First, the basic principle of the acoustic signal encoding method using the Fourier transform, proposed in Japanese Patent Application No. 10-283453 (an invention prior to the present one), will be described. Assume that an analog acoustic signal is given as a time-series intensity signal, as shown in FIG. 1(a). In the illustrated example, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. The analog acoustic signal is first captured as digital acoustic data. This can be done with the conventional PCM method, by sampling the analog acoustic signal at a predetermined sampling period and converting the amplitude into digital data with a predetermined number of quantization bits. For convenience of explanation, the waveform of the acoustic data digitized by PCM is represented here by the same waveform as the analog acoustic signal of FIG. 1(a).
[0021]
Subsequently, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded. In the example shown in FIG. 1 (a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit intervals d1 to d5 having these times as start points and end points are set ( A more practical section setting method will be described later).
[0022]
Once the unit sections have been set, a Fourier transform is performed on the acoustic signal of each unit section (hereinafter, section signal) to create a spectrum. The extracted section signal is first filtered with a weight function such as a Hanning window and then Fourier transformed. The Fourier transform generally assumes that the same signal continues infinitely before and after the extracted section, so without a weight function, high-frequency noise often appears in the resulting spectrum. A weight function whose weights are 0 at both ends of the section, such as the Hanning window function, suppresses this effect to some extent. With L as the section length of the unit section, the Hanning window function H(k) is given by
H(k) = 0.5 − 0.5 cos(2πk/L).
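As a small illustration of the weight function, the sketch below builds H(k) = 0.5 − 0.5 cos(2πk/L) and multiplies a section signal by it. Indexing k from 0 to L − 1 and the function names are assumptions made for this example.

```python
import math


def hanning_window(L):
    """Weights H(k) = 0.5 - 0.5*cos(2*pi*k/L) for k = 0 .. L-1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / L) for k in range(L)]


def apply_window(section):
    """Multiply each sample of a section signal by the corresponding weight."""
    weights = hanning_window(len(section))
    return [s * h for s, h in zip(section, weights)]


weights = hanning_window(1024)
print(weights[0], weights[512])  # weight at the start and at the center
```

The weight is 0 at the start of the section and rises to 1 at the center, which is what tapers the section ends before the transform.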
[0023]
FIG. 1(b) shows an example of the spectrum created for the unit section d1. In this spectrum, the frequency components contained in the section signal for unit section d1 (0 to F, where F is the sampling frequency) are indicated by the frequency f on the horizontal axis, and the complex intensity A of each frequency component is indicated on the vertical axis.
[0024]
Next, X code codes are defined discretely along the frequency axis f of this spectrum. In this example, the note numbers n used in MIDI data serve as the code codes, and 128 code codes, n = 0 to 127, are defined. The note number n is a parameter indicating the pitch of a note; for example, note number n = 69 denotes the "A" at the center of the piano keyboard (the A3 sound) and corresponds to a sound of 440 Hz. Since a predetermined frequency is thus associated with each of the 128 note numbers, the 128 note numbers n are defined discretely at predetermined positions on the frequency axis f of the spectrum.
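The correspondence between note numbers and frequencies can be written out as a short sketch. It assumes the usual equal-tempered mapping, anchored at the 440 Hz value given for note number 69 and doubling every 12 note numbers; the function name is hypothetical.

```python
def note_freq(n):
    """Frequency in Hz assumed for MIDI note number n (69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


# The 128 frequencies at which the code codes sit on the frequency axis.
frequencies = [note_freq(n) for n in range(128)]
print(frequencies[69], frequencies[81])
```

Adjacent entries differ by a constant ratio rather than a constant difference, which is why the note numbers are not evenly spaced on a linear frequency axis.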
[0025]
The note numbers n form a logarithmic scale in which the frequency doubles with each octave, so they do not correspond linearly to the frequency axis f. An intensity graph is therefore created in which the frequency axis f is expressed on a logarithmic scale and the note numbers n are defined on this logarithmic axis. FIG. 1(c) shows the intensity graph created in this way for the unit section d1. The horizontal axis of the intensity graph is the horizontal axis of the spectrum of FIG. 1(b) converted to a logarithmic scale, with note numbers n = 0 to 127 plotted at equal intervals. The vertical axis is the complex intensity A of the spectrum of FIG. 1(b) converted into the effective intensity E, and indicates the intensity at the position of each note number n. In general, the complex intensity A obtained by a Fourier transform is expressed by a real part R and an imaginary part I, and the effective intensity E can be obtained by the calculation E = (R² + I²)^(1/2).
[0026]
The intensity graph thus obtained for unit section d1 shows, as effective intensities, the proportions of the vibration components corresponding to note numbers n = 0 to 127 among the vibration components contained in the section signal of unit section d1. P note numbers are therefore selected from all X note numbers (X = 128 in this example) on the basis of the effective intensities shown in the intensity graph, and these P note numbers n are extracted as the representative code codes of unit section d1. For convenience of explanation, assume P = 3, so that three note numbers are extracted as representative code codes from the 128 candidates. If, for example, extraction follows the criterion "extract P code codes from the candidates in descending order of intensity," then in the example of FIG. 1(c) note number n(d1,1) is extracted as the first representative code code, note number n(d1,2) as the second, and note number n(d1,3) as the third.
[0027]
Once P representative code codes have been extracted in this way, the section signal of unit section d1 can be expressed by these representative code codes and their effective intensities. In the example above, if the effective intensities of note numbers n(d1,1), n(d1,2), and n(d1,3) in the intensity graph of FIG. 1(c) are e(d1,1), e(d1,2), and e(d1,3), respectively, the acoustic signal of unit section d1 can be expressed by the following three data pairs.
n (d1,1), e (d1,1)
n (d1,2), e (d1,2)
n (d1,3), e (d1,3)
Although the processing for the unit section d1 has been described above, the same processing is performed separately for each of the unit sections d2 to d5, and data representing the representative code code and its strength is obtained. For example, for the unit section d2,
n (d2,1), e (d2,1)
n (d2,2), e (d2,2)
n (d2,3), e (d2,3)
These three data pairs are obtained for the unit section d2. In this way, the original acoustic signal can be encoded by the data obtained for each unit section.
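The extraction of P representative code codes in descending order of intensity can be sketched as follows. The intensity values are made up for illustration, and the function name is hypothetical.

```python
def representative_codes(intensity, P=3):
    """Return the P (note number, effective intensity) pairs with the largest intensity."""
    ranked = sorted(range(len(intensity)), key=lambda n: intensity[n], reverse=True)
    return [(n, intensity[n]) for n in ranked[:P]]


# Hypothetical intensity graph for one unit section: 128 values, four non-zero.
intensity = [0.0] * 128
intensity[60], intensity[64], intensity[67], intensity[72] = 0.9, 0.5, 0.7, 0.1
print(representative_codes(intensity))
```

Each returned pair corresponds to one data pair n(di,j), e(di,j) for the unit section.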
[0028]
FIG. 2 is a conceptual diagram of encoding by the method described above. FIG. 2(a) shows five unit sections d1 to d5 set for the original acoustic signal, as in FIG. 1(a), and FIG. 2(b) shows the code data obtained for each unit section in musical-note form. In this example, three representative code codes are extracted for each unit section (P = 3), and the data for these representative code codes are stored in three tracks T1 to T3. For example, the representative code codes n(d1,1), n(d1,2), and n(d1,3) extracted for unit section d1 are stored in tracks T1, T2, and T3, respectively. FIG. 2(b) is, however, only a conceptual diagram showing the code data in note form; in practice, intensity data is attached to each note. For example, track T1 stores the data e(d1,1), e(d2,1), e(d3,1), ... together with the data indicating the pitches of note numbers n(d1,1), n(d2,1), n(d3,1), ....
[0029]
The encoding format adopted here need not be MIDI, but since MIDI is the most widespread format of this kind, MIDI-format code data is the most preferable in practice. In the MIDI format, "note-on" data and "note-off" data appear with "delta time" data interposed between them. "Note-on" data specifies a particular note number N and velocity V and instructs the start of a particular sound, while "note-off" data specifies a particular note number N and velocity V and instructs the end of a particular sound. "Delta time" data indicates a predetermined time interval. Velocity V is a parameter indicating, for example, the speed at which a piano key is pressed (velocity at note-on) or the speed at which the finger is released from the key (velocity at note-off), and thus represents the strength of the operation that starts or ends a particular sound.
[0030]
With the method described above, P note numbers n(di,1), n(di,2), ..., n(di,P) are obtained as representative code codes for the i-th unit section di, and effective intensities e(di,1), e(di,2), ..., e(di,P) are obtained for each of them. MIDI-format code data can therefore be created as follows. The obtained note numbers n(di,1), n(di,2), ..., n(di,P) are used as they are for the note number N written in the "note-on" or "note-off" data. For the velocity V written in the "note-on" or "note-off" data, the obtained effective intensities e(di,1), e(di,2), ..., e(di,P) are normalized to the range 0 to 1, and the square root of the normalized effective intensity is multiplied by, for example, 127. That is, with Emax as the maximum value of the effective intensity E, the value V obtained by
V = (E / Emax)^(1/2) · 127
is used as the velocity. Alternatively, the logarithm may be taken and the value V obtained by
V = log (E / Emax) · 127 + 127
(with V = 0 if V < 0)
used as the velocity. The "delta time" data may be set according to the length of each unit section.
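Both velocity mappings can be sketched as below. This is an illustrative sketch: the base of the logarithm is not stated in the text and is assumed here to be 10, rounding to an integer is an added assumption, and the function names are hypothetical.

```python
import math


def velocity_sqrt(E, Emax):
    """V = (E / Emax)^(1/2) * 127, rounded to an integer velocity."""
    return round(math.sqrt(E / Emax) * 127)


def velocity_log(E, Emax):
    """V = log(E / Emax) * 127 + 127, with V = 0 if the result is negative.

    Assumption: log is taken to base 10.
    """
    if E <= 0.0:
        return 0  # the logarithm is undefined here; treat as silence
    v = math.log10(E / Emax) * 127 + 127
    return max(0, round(v))


print(velocity_sqrt(1.0, 1.0), velocity_log(1.0, 1.0))
```

Both mappings give the maximum velocity of 127 when E equals Emax; they differ in how quickly the velocity falls off for weaker components.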
[0031]
Eventually, in the above-described embodiment, MIDI code data composed of three tracks is obtained. When this MIDI code data is reproduced using three MIDI sound sources, an audio signal is reproduced as a stereo reproduction sound of 6 channels.
[0032]
§2. More practical section setting method
In §1 described above, a very simple section setting example has been described. Here, a more practical technique for setting a section will be described. In the example shown in FIG. 2A, five unit intervals d1 to d5 are set with six times t1 to t6 defined at equal intervals on the time axis t as boundaries. When encoding is performed based on such a section setting, discontinuity of sound tends to occur at the time that becomes a boundary during reproduction. Therefore, in practice, it is preferable to set a section in which adjacent unit sections partially overlap on the time axis.
[0033]
FIG. 3(a) shows an example in which such partially overlapping sections are set. The unit sections d1 to d4 shown all partially overlap one another, and when the processing described above is performed with this section setting, encoding is performed as shown in the conceptual diagram of FIG. 3(b). In this example, the center of each unit section is used as a reference position, and each note is placed at its reference position, although the reference position relative to the unit section need not be the center. Comparing the conceptual diagram of FIG. 3(b) with that of FIG. 2(b) shows that the density of notes has increased. Setting overlapping sections in this way increases the amount of code data created, but it enables natural encoding that does not cause discontinuities in the sound during reproduction.
[0034]
FIG. 4 shows a specific method of setting partially overlapping sections on the time axis. In this example, the acoustic signal is sampled at a sampling frequency of 22 kHz and captured as digital acoustic data, the section length L of each unit section is set to 1024 samples (about 47 msec), and the shift between successive unit sections, the offset length ΔL, is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start of the i-th unit section and the start of the (i+1)-th unit section is the offset length ΔL. For example, the first unit section d1 contains samples 1 to 1024, and the second unit section d2 contains samples 21 to 1044, shifted by 20 samples.
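The section layout of this example can be sketched as follows, using 0-based, end-exclusive sample indices (so the pair (0, 1024) corresponds to samples 1 to 1024 in the text). The function name is hypothetical.

```python
def unit_sections(num_samples, L=1024, offset=20):
    """Return (start, end) index pairs of overlapping unit sections.

    Each section holds L samples, and successive sections start `offset` samples apart.
    """
    return [(s, s + L) for s in range(0, num_samples - L + 1, offset)]


sections = unit_sections(2000)
print(sections[0], sections[1], len(sections))
```

Only sections that fit entirely inside the signal are returned; how a trailing partial section should be handled is not specified in the text and is left out here.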
[0035]
§3. Efficient calculation method of spectral intensity
The basic procedure of the encoding method following the principle explained with FIG. 1 is as follows. First, as shown in FIG. 1(a), a plurality of unit sections d1, d2, d3, ... are set. A Fourier transform is then performed on the acoustic data in section d1 to obtain a spectrum such as that shown in FIG. 1(b), and, as shown in FIG. 1(c), the acoustic signal in section d1 is expressed by several code codes n(d1,1), n(d1,2), and n(d1,3) corresponding to the peak frequencies of this spectrum. This section describes an efficient calculation method for obtaining a spectrum such as that of FIG. 1(b).
[0036]
For a signal having vibration components as shown in FIG. 1 (a), a spectrum as shown in FIG. 1 (b) is generally obtained by a Fourier transform, typically computed with the fast Fourier transform (FFT) method. However, a general Fourier transform presupposes a spectrum on a linear frequency axis, and is not necessarily suitable for conversion to code data with a nonlinear frequency scale such as MIDI data. The reason is as follows.
[0037]
Consider a Fourier spectrum with a linear scale as shown in FIG. 5. This Fourier spectrum is a graph in which the horizontal axis represents the frequency f on a linear scale and the vertical axis represents the spectrum intensity. On the horizontal axis (frequency axis), M measurement points are defined discretely at equal intervals, and the spectrum intensity at each measurement point is shown as a bar. Row (1) below the graph shows the number of each measurement point, and row (2) shows the frequency value corresponding to each measurement point. In this example, the acoustic signal is captured as data at a sampling frequency F = 22.05 kHz, and the number M of measurement points is set to 1024. Therefore, for each of the 1024 measurement points in total, from the 0th measurement point at frequency f = 0 to the 1023rd measurement point at frequency f = 11014 Hz (approximately half of the sampling frequency F), a spectrum intensity corresponding to the length of the bar is obtained. In this way, a general Fourier transform yields the spectrum intensity for each of a large number of measurement points defined at equal intervals on a linear frequency axis.
[0038]
However, a spectrum such as that of FIG. 5, in which intensities are obtained at measurement points defined at equal intervals on a linear frequency axis, is not efficient to use for conversion into a code system having nonlinear characteristics with respect to frequency, such as MIDI data. FIG. 6 redraws the spectrum of FIG. 5 with the frequency axis on a logarithmic scale. Row (1) below the graph shows the number of each measurement point, and row (2) shows the note number (corresponding to log f) associated with each measurement point. The number of measurement points M = 1024 is the same as in FIG. 5, but since the frequency axis is a logarithmic scale, the measurement points are not arranged at equal intervals on the horizontal axis. That is, the measurement points are sparse in the low frequency region and dense in the high frequency region.
[0039]
In the low frequency region of the example of FIG. 6, note number n = 4 is assigned to the first measurement point, n = 16 to the second, and n = 24 to the third, but no measurement point corresponds to the note numbers lying between them, so their spectrum intensity cannot be obtained; the result is, so to speak, like a comb with missing teeth. Thus, with the settings of sampling frequency F = 22.05 kHz and number of measurement points M = 1024, the intensity for note numbers n = 5 to 15 and 17 to 23 cannot be defined. Of course, increasing the number of measurement points beyond M = 1024 can fill in the missing teeth, but performing calculations for such a large number of measurement points is inefficient.
[0040]
Conversely, in the high frequency region, a total of 54 measurement points, from the 970th to the 1023rd, are assigned to the same note number n = 124. In this case there is no difficulty in defining, for example, the average of the spectrum intensities over all 54 measurement points as the intensity for note number n = 124, but it is inefficient to perform calculations on as many as 54 measurement points merely to obtain the intensity for the single note number n = 124.
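The mismatch between equally spaced measurement points and note numbers can be checked numerically. The sketch below is illustrative only: it assumes the standard equal-tempered relation in which note number 69 is 440 Hz, and it assigns each measurement point to the nearest note by rounding. That rounding convention is an assumption, so the individual note numbers it produces need not agree with those read off FIG. 6.

```python
import math
from collections import Counter


def note_of(freq_hz):
    """Nearest note number on a logarithmic axis (assumes note 69 = 440 Hz)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def bins_per_note(sampling_hz=22050.0, points=1024):
    """Count how many equally spaced measurement points fall on each note."""
    spacing = sampling_hz / 2 / points  # spacing of the linear frequency axis
    # Measurement point 0 (f = 0) has no note number and is skipped.
    return Counter(note_of(m * spacing) for m in range(1, points))


counts = bins_per_note()
missing = [n for n in range(min(counts), max(counts)) if n not in counts]
print("notes with no measurement point:", missing)
print("most crowded note:", counts.most_common(1))
```

Low note numbers appear in the list of notes without any measurement point, while the highest notes each collect dozens of points, which is the inefficiency described above.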
[0041]
After all, in order to convert efficiently to code with a nonlinear frequency scale such as MIDI data, it suffices to define M measurement points discretely on the frequency axis in accordance with the required codes, and to obtain the spectrum intensity only for the frequency components of the acoustic signal that correspond to these M measurement points. In particular, for conversion to MIDI data, the M measurement points may be defined discretely so as to be equally spaced on a logarithmic frequency axis; in other words, so that the frequencies of the measurement points form a geometric progression. FIG. 7 shows a part of the measurement points defined in this way. Note numbers n = 60 to 65 are assigned to the measurement points shown in the figure, and these measurement points are equally spaced on the logarithmic frequency axis. Moreover, the specific frequency values 262, 278, 294, ... of the measurement points form a geometric progression. When the spectrum intensity is calculated by Fourier transform, unnecessary calculation can be omitted by calculating the spectrum intensity only for these measurement points. A specific method for performing such efficient calculation is described in detail in the above-mentioned Japanese Patent Application No. 10-283453, and is not described further here.
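The geometric progression of the measurement frequencies can be illustrated with a short sketch. It assumes the standard equal-tempered tuning in which note number 69 is 440 Hz; the function name `note_frequency` and its default arguments are hypothetical and chosen only for this example.

```python
def note_frequency(note, reference_note=69, reference_hz=440.0):
    """Frequency in Hz of a note number under equal temperament (assumed tuning)."""
    return reference_hz * 2.0 ** ((note - reference_note) / 12.0)


# Measurement points for note numbers 60 to 65, as in the FIG. 7 example.
frequencies = [note_frequency(n) for n in range(60, 66)]
ratios = [high / low for low, high in zip(frequencies, frequencies[1:])]

for n, f in zip(range(60, 66), frequencies):
    print(n, f)
print("common ratio between neighbours:", ratios[0])
```

Every neighbouring pair has the same ratio, the twelfth root of 2, so the points are equally spaced on a logarithmic axis.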
[0042]
§4. Coding method using generalized harmonic analysis
§1 to §3 above outlined the encoding method using Fourier analysis proposed in the invention of the prior application. The encoding method proposed in the present application is basically the same as that of the prior invention. That is, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded, a section signal (the portion of the acoustic signal located in each unit section) is extracted for each unit section, and each section signal is replaced with predetermined codes. The difference is that, in replacing each section signal with codes, the prior invention uses Fourier analysis, whereas the present invention uses generalized harmonic analysis.
[0043]
For example, assume that a section signal x is given for a certain unit section d as shown in the upper part of FIG. 8. Here, it is assumed that the unit section d having the section length L is sampled at the sampling frequency F so that w sample values are obtained in total, and that the sample numbers are 0, 1, 2, 3, ..., k, ..., w−2, w−1. For an arbitrary sample number k, an amplitude value x(k) is given as digital data.
[0044]
In the encoding method using Fourier analysis proposed in the prior invention, the Fourier spectrum of this section signal x is obtained, a predetermined number of note numbers corresponding to frequencies having high spectrum intensities are selected, and MIDI encoding is performed based on the selected note numbers and their spectrum intensities. However, Fourier analysis is originally a method for analyzing a signal waveform that continues infinitely on the time axis, so when it is applied to a section signal x that exists only within a finite period of section length L, as in the example of FIG. 8, accurate frequency analysis cannot be performed. For this reason, as described above, a problem arises in performing high-quality encoding.
[0045]
The basic concept of the generalized harmonic analysis applied in the present invention is to decompose the section signal x into I harmonic functions defined in advance. A general acoustic signal is considered to contain harmonic functions continuously over the audible range of 20 Hz to 20 kHz, but the present application attempts to forcibly express a given acoustic signal using only the 128 discrete frequencies defined by MIDI. In other words, it attempts to express a seemingly random signal waveform, as shown in FIG. 8, as the sum of a plurality of signal waveforms defined by mathematical expressions. For this purpose, a plurality of element signals that are candidates for components of the section signal x are first prepared. Here, 128 element signals as shown in the lower table of FIG. 8 are prepared. Each element signal is a combination of a sine function and a cosine function having the same frequency, and the element signals correspond to note numbers 0 to 127, respectively. For example, the element signal corresponding to note number n is given as a combination of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F), both having the frequency f(n). The variable k is the sample number shown in the upper part of FIG. 8, F is the sampling frequency, and the term k/F in the trigonometric functions corresponds to the time t measured from the left end of the unit section d. A(n) and B(n), which precede the trigonometric functions in the lower table of FIG. 8, are coefficients indicating amplitude. Each element signal is defined only within the same section as the unit section d in which the section signal x exists.
If the frequencies corresponding to note numbers 0 to 127 are denoted f(0) to f(127), these frequencies form a geometric progression (note numbers 12 apart are one octave apart, so the frequency doubles).
[0046]
The purpose of the generalized harmonic analysis performed here is to obtain, for the function x(k) corresponding to the section signal x, an approximate function ξ(k) that minimizes the error value Error given by the equation of FIG. 9. The error value Error is the sum of the squared errors between the function x(k) and the approximate function ξ(k) at each of the w sample positions (0 to (w−1)); the smaller Error is, the closer the approximate function ξ(k) is to the function x(k). As shown in the equation of FIG. 9, the approximate function ξ(k) is the sum of the 128 element signals listed in the lower table of FIG. 8, with the coefficients A(n) and B(n) of each element signal set to specific values. In other words, once the values of the coefficients A(0) to A(127) and B(0) to B(127) of the trigonometric functions listed in the lower table of FIG. 8 are determined, the sum of all these trigonometric functions is the approximate function ξ(k). Obtaining the approximate function ξ(k) that minimizes Error is therefore nothing other than obtaining the individual coefficient values of the element signals that make up ξ(k). In principle, the coefficient values minimizing Error could be found by calculating Error for an enormous number of combinations covering all possible values of the individual coefficients and taking the combination that gives the minimum. Such a method is not realistic, however, because the calculation burden becomes enormous. Moreover, a currently available standard MIDI sound source can sound at most 16 notes simultaneously, so when a given acoustic signal is decomposed into I harmonic functions, it cannot be reproduced unless I is 16 or less.
Therefore, in the present application, generalized harmonic analysis is performed by the following simple method.
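Since FIG. 9 is not reproduced in this text, the two relations it is described as containing can be written out from the definitions above. The following is a reconstruction from the prose (the sum of squared errors over the w samples, and the approximate function as the sum of the 128 element signals), not a copy of the figure:

```latex
\mathrm{Error} = \sum_{k=0}^{w-1} \bigl( x(k) - \xi(k) \bigr)^{2},
\qquad
\xi(k) = \sum_{n=0}^{127} \left[ A(n)\,\sin\frac{2\pi f(n)\,k}{F}
                               + B(n)\,\cos\frac{2\pi f(n)\,k}{F} \right]
```

In the simplified method that follows, all but at most I of the coefficient pairs A(n), B(n) are in effect left at zero.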
[0047]
First, when a section signal x as shown in the upper part of FIG. 8 is given, a Fourier transform is performed on it to obtain a Fourier spectrum as shown in FIG. 10. With a technique such as FFT, the computational burden of the Fourier transform is light enough for a general personal computer, and this is all the more so if the efficient calculation method described in §3, which obtains the spectrum intensity only at the frequency positions corresponding to the 128 note numbers, is adopted. (The term “Fourier transform” generally refers to processing that obtains a frequency spectrum having a linear frequency axis, but in this specification it is used in a broad sense that includes processing for obtaining a frequency spectrum having a logarithmic frequency axis as described in §3.) Subsequently, the element signal corresponding to the peak frequency of the Fourier spectrum thus obtained is selected as the harmonic signal. Here, the harmonic signal is the element signal having the highest correlation value with respect to the section signal x among the plurality of element signals. In the example shown in FIG. 10, the element signal corresponding to the peak frequency f(n) of the Fourier spectrum is selected as the harmonic signal (if the peak position does not exactly match one of the 128 frequencies, the nearest frequency on the frequency axis is taken). In this example, the combination of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) having the frequency f(n) is selected as the element signal having the highest correlation value according to the Fourier analysis. As described above, the frequency f(n) is the frequency corresponding to note number n, and the relational expression shown in the lower part of FIG. 10 holds between the frequency f(n) and the note number n.
The frequency f(69) corresponding to note number n = 69 is 440 Hz.
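When a spectral peak does not fall exactly on one of the 128 frequencies, the nearest one has to be chosen. A minimal sketch of that step is given below. It assumes the usual equal-tempered relation anchored at f(69) = 440 Hz (the expression in FIG. 10 is not reproduced here) and takes "nearest" to mean nearest on a logarithmic axis; both the function name `nearest_note` and that interpretation are assumptions.

```python
import math


def nearest_note(freq_hz):
    """Note number (0..127) whose frequency is nearest to freq_hz on a log axis.

    Assumes f(n) = 440 * 2 ** ((n - 69) / 12); results are clamped to 0..127.
    """
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    return min(127, max(0, note))


for peak_hz in (440.0, 450.0, 455.0, 880.0):
    print(peak_hz, "->", nearest_note(peak_hz))
```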
[0048]
When the combination of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) having the frequency f(n) has been selected as the harmonic signal, the coefficients A(n) and B(n) are obtained based on the equations shown in FIG. 11. These coefficients are in fact correlation values between the harmonic signal and the section signal x. That is, the coefficient A(n) is a value indicating the correlation between the sine function sin(2πf(n)k/F) and the section signal x(k), and the coefficient B(n) is a value indicating the correlation between the cosine function cos(2πf(n)k/F) and the section signal x(k).
[0049]
For example, on the right side of the equation for the coefficient A(n), the product of the value of the section signal x(k) and the value of the sine function sin(2πf(n)k/F) is taken at the k-th sample position. If the two functions are identical (in other words, have the greatest correlation), their values always have the same sign regardless of the sample position k, so the product is never negative. Therefore, the sum over k = 0 to (w−1), that is, the value of the coefficient A(n), is a large positive value. If, on the other hand, there is no correlation between the two functions, their values have the same sign at some sample positions k and opposite signs at others, so the product may be positive or negative. Therefore, the sum over k = 0 to (w−1), that is, the value of the coefficient A(n), is close to 0.
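The coefficient calculation can be sketched as below. FIG. 11 is not reproduced in this text, so the formulas are reconstructed from the description: a sum over k = 0 to w−1 of the product of x(k) with the sine or cosine, preceded by the factor 2/w that the specification mentions. The function name and the test signal are hypothetical.

```python
import math


def harmonic_coefficients(x, freq_hz, sampling_hz):
    """Return (A, B): correlation of x with sin and cos at freq_hz, scaled by 2/w."""
    w = len(x)
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    a = 2.0 / w * sum(v * math.sin(omega * k) for k, v in enumerate(x))
    b = 2.0 / w * sum(v * math.cos(omega * k) for k, v in enumerate(x))
    return a, b


# Hypothetical test signal: 3*sin + 4*cos at 64 Hz, sampled at 1024 Hz for
# 1024 samples, so the section holds a whole number of cycles.
F = 1024.0
x = [3.0 * math.sin(2 * math.pi * 64 * k / F) + 4.0 * math.cos(2 * math.pi * 64 * k / F)
     for k in range(1024)]
print(harmonic_coefficients(x, 64.0, F))   # correlated frequency
print(harmonic_coefficients(x, 128.0, F))  # uncorrelated frequency
```

With a whole number of cycles in the section, the correlated frequency returns the amplitudes 3 and 4 up to rounding error, and the uncorrelated frequency returns values near zero, which matches the sign argument given above.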
[0050]
As described above, the combination of the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) having the frequency f(n) was selected as the harmonic signal because the frequency f(n) showed a peak in the Fourier spectrum shown in FIG. 10. The section signal x is therefore expected to contain the element signal of frequency f(n) as its largest component, and the coefficients A(n) and B(n), which indicate the correlation, should be relatively large.
[0051]
Here, a signal G(k) as shown in FIG. 12 is defined. This signal G(k) is given by the product of the harmonic signal (the above trigonometric functions) and the correlation values obtained for it (the above coefficients A(n) and B(n)); in other words, the selected harmonic signal is given an amplitude corresponding to the correlation value. The signal G(k) can thus be regarded as one of the main constituent signals contained in the section signal x(k). As described above, the purpose of the generalized harmonic analysis is to obtain an approximate function ξ(k) that approximates the section signal x, and the signal G(k) is one component of this approximate function ξ(k). In this specification, the signal G(k) is therefore called a “contained signal”, in the sense that it is one of the signals contained in the section signal x(k). Of course, the section signal x(k) contains many other signals as well, so further contained signals must be found in addition to the first contained signal G(k) obtained by the above method.
[0052]
For this purpose, a difference calculation as shown in FIG. 13 is performed. That is, a difference signal is obtained by subtracting the contained signal G from the section signal x. Specifically, x(k) − G(k) is calculated for all k (k = 0 to (w−1)). The difference signal thus obtained consists of the signal components other than the first contained signal G(k). If this difference signal is treated as a new section signal (calling the original section signal x(k) the first section signal and the difference signal x(k) − G(k) the second section signal) and the same procedure is applied to it, a second contained signal is obtained. This second contained signal, together with the first, is a component of the first section signal x(k). Further, a second difference signal is obtained by subtracting the second contained signal from the second section signal, and applying the same procedure to this second difference signal as a new section signal, that is, the third section signal, yields a third contained signal.
[0053]
By repeating such processing, P contained signals can be obtained, and P codes can be generated, one from each contained signal. For example, if three contained signals are obtained by repeating the above process three times, with note numbers n1, n2, and n3 (each any note number from 0 to 127), the section signal x is encoded by three MIDI codes: one whose note number is n1 and whose velocity is the square root of A(n1)² + B(n1)², one whose note number is n2 and whose velocity is the square root of A(n2)² + B(n2)², and one whose note number is n3 and whose velocity is the square root of A(n3)² + B(n3)² (in each case the effective amplitude value).
[0054]
The above process can be described using the general formulas shown in FIGS. 14 to 16 as follows. When the i-th section signal xi(k) is given, its Fourier spectrum is obtained and the peak frequency f(ni) is determined. The element signal corresponding to the peak frequency f(ni) is selected as the i-th harmonic signal, and the coefficients A(ni) and B(ni) for this harmonic signal are calculated based on the equations of FIG. 14. Subsequently, the i-th contained signal Gi(k) is defined as in the equation of FIG. 15, and, as in the equation of FIG. 16, a difference signal is obtained by subtracting the i-th contained signal Gi(k) from the i-th section signal xi(k); this difference signal is the (i + 1)-th section signal x(i+1)(k). Such processing is repeated as many times as necessary while increasing i by 1 from the initial value i = 1.
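The repeated select, correlate and subtract procedure for one section can be sketched as follows. This is an illustration under stated assumptions, not the specification's implementation: the harmonic signal is chosen by direct correlation with each candidate frequency rather than via an FFT, the note frequencies assume f(69) = 440 Hz with equal temperament, candidates at or above half the sampling frequency are skipped, and all function names are hypothetical.

```python
import math


def note_frequency(note):
    """Assumed equal-tempered frequency in Hz (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def coefficients(x, freq_hz, sampling_hz):
    """A and B: correlations of x with sin and cos, scaled by 2/w."""
    w = len(x)
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    a = 2.0 / w * sum(v * math.sin(omega * k) for k, v in enumerate(x))
    b = 2.0 / w * sum(v * math.cos(omega * k) for k, v in enumerate(x))
    return a, b


def decompose(x, sampling_hz, iterations):
    """Return ([(note, amplitude), ...], final residual) for one section signal."""
    residual = list(x)
    codes = []
    # Assumption: notes at or above the Nyquist frequency are not candidates.
    candidates = [n for n in range(128) if note_frequency(n) < sampling_hz / 2]
    for _ in range(iterations):
        scored = []
        for n in candidates:
            a, b = coefficients(residual, note_frequency(n), sampling_hz)
            scored.append((math.hypot(a, b), n, a, b))
        amplitude, note, a, b = max(scored)  # harmonic signal: highest correlation
        omega = 2.0 * math.pi * note_frequency(note) / sampling_hz
        # Subtract the contained signal to form the next section signal.
        residual = [v - (a * math.sin(omega * k) + b * math.cos(omega * k))
                    for k, v in enumerate(residual)]
        codes.append((note, amplitude))
    return codes, residual


# Hypothetical input: 440 Hz at amplitude 1.0 plus 880 Hz at amplitude 0.5.
F = 22050.0
x = [math.sin(2 * math.pi * 440 * k / F) + 0.5 * math.sin(2 * math.pi * 880 * k / F)
     for k in range(1024)]
codes, residual = decompose(x, F, 2)
print(codes)
```

The amplitude returned with each note is the square root of A² + B², the quantity the text uses as the velocity.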
[0055]
The above is the frequency analysis method known as generalized harmonic analysis. Although a Fourier transform is used to select the element signal with the highest correlation value as the harmonic signal, the method basically takes a different approach from Fourier analysis, namely expressing the original signal as the sum of a plurality of element signals. In the equations of FIG. 11 and FIG. 14, the right side begins with the factor 2/w. The denominator w indicates division by the total number of samples w, and the numerator 2 is a value known empirically to be the most suitable coefficient for this generalized harmonic analysis (it can also be explained theoretically, but a detailed explanation is omitted here).
[0056]
Finally, the basic procedure of the encoding method according to the present invention using generalized harmonic analysis will be described based on the flowchart of FIG. 17. First, in step S1, the acoustic signal to be encoded is input. Specifically, as already described, it is sampled at a predetermined sampling frequency F and captured as digital data by the PCM method. Subsequently, in step S2, a plurality of unit sections are set on the time axis, and the section signal x is extracted for each unit section. As described in §2, the unit sections are preferably set so that adjacent unit sections partially overlap on the time axis. Then, in step S3, the parameter i is set to 1. This parameter i counts the number of repetitions of the processing described above.
[0057]
Next, in step S4, the i-th interval signal xi is Fourier transformed. When i = 1, the section signal x extracted in step S2 becomes the section signal xi in step S4. In step S5, the frequency f (ni) corresponding to the peak of the obtained Fourier spectrum is determined from 128 candidates. Here, the 128 candidates are the frequencies f (0) to f (127) shown in the lower table of FIG. 8 and correspond to the 128 note numbers in MIDI. The process of determining the frequency f (ni) in step S5 corresponds to the process of determining the element signal having the highest correlation value (in this case, the intensity of the Fourier spectrum) with respect to the section signal xi from among 128 element signals. The element signal having the frequency f (ni) is called a harmonic signal here.
[0058]
Subsequently, in step S6, coefficients A (ni) and B (ni) for this harmonic signal are calculated (formula of FIG. 14), and the i-th contained signal Gi is obtained (formula of FIG. 15). As described above, A (ni) and B (ni) calculated here correspond to the correlation value of the harmonic signal with respect to the section signal xi. Since this correlation value is also calculated when the Fourier spectrum is obtained in step S4, it may be used as it is.
[0059]
Next, in step S7, a difference signal is obtained by subtracting the i-th contained signal Gi from the i-th section signal xi, and this difference signal is defined as the (i + 1)-th section signal x(i+1). In step S8, it is determined whether the parameter i has reached the predetermined number I. If not, the process proceeds to step S9, where i is incremented by 1, and then returns to step S4, where this time the Fourier transform is performed on the (i + 1)-th section signal x(i+1). The predetermined number I is a parameter indicating how many code data items represent one unit section. For example, in the example shown in FIG. 3, one unit section is represented by three MIDI code data items, which are arranged on tracks T1 to T3. In this case, I = 3 is set, three contained signals G1, G2, and G3 are obtained, and MIDI code data is obtained from each of them. In practice, it is preferable to set I = 8 and generate MIDI code data for 8 tracks.
[0060]
In the example shown in FIG. 17, the processing of steps S4 to S8 is always repeated I times. However, even when i < I, the repetition may be terminated as soon as an approximate function ξ(k) is obtained that makes the error value Error shown in FIG. 9 smaller than a predetermined set value. For example, if the above process is repeated three times, three contained signals G1, G2, and G3 are obtained. If the error value Error of FIG. 9, calculated with the approximate function ξ(k) = G1 + G2 + G3, is smaller than the predetermined set value, the sum of the three contained signals G1, G2, and G3 already gives a signal quite close to the section signal x(k). Accordingly, a step may be added immediately before step S8 that calculates ξ(k) = ΣGi, calculates the error value Error of FIG. 9, and compares it with the predetermined set value; if Error is smaller than the set value, the process proceeds to step S10.
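The early-termination check can be sketched briefly. Because each new section signal is the previous one minus a contained signal, the current difference signal equals x(k) − ΣGi(k), so Error is simply the sum of squares of the current difference signal and ξ(k) need not be rebuilt. The function names and the sample values below are hypothetical.

```python
def squared_error(residual):
    """Error = sum over k of (x(k) - xi(k))**2, computed from the difference signal."""
    return sum(v * v for v in residual)


def should_stop(residual, set_value):
    """True when the approximation is already closer than the predetermined set value."""
    return squared_error(residual) < set_value


# Hypothetical three-sample illustration with two contained signals.
x = [1.0, 2.0, 3.0]
g1 = [0.5, 1.0, 1.0]
g2 = [0.25, 0.5, 1.5]
residual = [xv - a - b for xv, a, b in zip(x, g1, g2)]
xi = [a + b for a, b in zip(g1, g2)]
explicit = sum((xv - v) ** 2 for xv, v in zip(x, xi))
print(squared_error(residual), explicit)
```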
[0061]
In step S5, when one frequency f(ni) is selected from the 128 candidates, a frequency that has already been selected may be selected again (in general, since the contained signal of a frequency once selected is subtracted in step S7, the remaining difference signal contains little of that frequency component, and it is unlikely that the same frequency will be selected again). Such a case can be handled in two ways: reselection of the same frequency is either allowed or not allowed. In the former case, the frequency corresponding to the peak of the Fourier spectrum is simply selected in step S5 without checking for duplicate selection. The contained signals finally obtained then include some having the same frequency, and MIDI code data of the same pitch is arranged on different tracks. In the latter case, step S5 checks whether the selection is a duplicate and, if it is, selects the next candidate (the frequency corresponding to the next-highest peak of the Fourier spectrum).
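The two ways of handling a repeated frequency can be sketched as a single selection function. This is only an illustration: ranking all candidates by spectrum intensity is used here as a stand-in for "the next peak of the Fourier spectrum", and the function name, argument names and sample intensities are hypothetical.

```python
def select_note(intensities, already_selected, allow_repeat):
    """Pick the note with the highest intensity, optionally skipping earlier picks.

    intensities: list indexed by note number.
    already_selected: set of note numbers chosen in earlier repetitions.
    """
    ranked = sorted(range(len(intensities)),
                    key=lambda n: intensities[n], reverse=True)
    for note in ranked:
        if allow_repeat or note not in already_selected:
            return note
    raise ValueError("no selectable note remains")


intensities = [0.1, 0.9, 0.5]
print(select_note(intensities, {1}, allow_repeat=True))
print(select_note(intensities, {1}, allow_repeat=False))
```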
[0062]
When the necessary number of repetitions has thus been completed, the encoding of the unit section is complete, so the process proceeds from step S10 to step S11, where the unit section is updated. For example, with the section setting shown in FIG. 4, a unit section shifted from the current one by the offset length ΔL (20 samples) is newly set in step S2, and the data of the 1024 samples in this new unit section is extracted as a new section signal x. When this processing has been completed for all sections, the encoding procedure ends via step S10.
[0063]
§5. Measures for reducing the computational burden of generalized harmonic analysis
As described in §4, the gist of the present invention is to decompose the section signal into a plurality of contained signals by generalized harmonic analysis and to convert each contained signal into code data. However, as the flowchart of FIG. 17 shows, generalized harmonic analysis requires calculating correlations between a large number of signals, so its calculation burden is enormous compared with Fourier analysis. For this reason, it has not yet come into general use. The inventor of the present application has therefore devised several measures for reducing the calculation burden of generalized harmonic analysis. With these measures, an actual acoustic signal can be encoded at a practical level using a personal computer. These measures are described in order below. Each can be implemented independently, but in practice it is preferable to combine all of them.
[0064]
(1). Introduction of simple correlation calculation
The method described in §4 contains two steps that calculate the correlation between two signals (functions). The first is the step of selecting one harmonic signal (the signal having the highest correlation value with respect to the section signal) from the 128 element signals; in the flowchart of FIG. 17, this corresponds to Fourier-transforming the section signal xi in step S4 and determining the frequency corresponding to the spectrum peak from the 128 candidates in step S5. The second is the step of obtaining the coefficients A(n) and B(n) for the selected harmonic signal, which corresponds to the arithmetic processing of step S6 in the flowchart of FIG. 17. In fact, step S4 and step S6 of FIG. 17 perform essentially the same operation.
[0065]
In the first place, the Fourier transform performed in step S4 is an operation for obtaining correlation values with specific trigonometric functions, as shown in FIG. 18. For example, assume that a predetermined section signal x is given in the unit section d as shown in FIG. 18 (a). Here, the unit section d is a section having the section length L, and the section signal x is data sampled at the sampling frequency F. The sample value of the section signal x at sample number k (k = 0, 1, 2, ..., w−1) is x(k). For this section signal x, a sine wave sin(2πf(n)k/F) having the frequency f(n) and defined over the same unit section, as shown in FIG. 18 (b), is prepared. The correlation value S1(n) between the section signal x and this sine wave can then be calculated by the equation of FIG. 18 (c). On the right side of this equation, the product of the value of the section signal x(k) and the value of the sine function sin(2πf(n)k/F) is taken at the k-th sample position. If the two functions are exactly the same (in other words, have the maximum correlation), their values always have the same sign regardless of the sample position k, so the product is never negative. Therefore, the sum over k = 0 to (w−1), that is, the correlation value S1(n), is a large positive value. If, on the other hand, there is no correlation between the two functions, their values have the same sign at some sample positions k and opposite signs at others, so the product may be positive or negative. Therefore, the sum over k = 0 to (w−1), that is, the correlation value S1(n), is close to zero.
[0066]
Similarly, the correlation value S2(n) using the cosine wave cos(2πf(n)k/F) instead of the sine wave sin(2πf(n)k/F) can also be calculated by the equation of FIG. 18 (c). In obtaining the correlation with a periodic signal having a component of frequency f(n), both the correlation with the sine wave and the correlation with the cosine wave must be considered in order to avoid the influence of phase (whatever the phase of the signal, the correlation is detected by at least one of the two if both the sine wave and the cosine wave are taken into account). In practice, therefore, as shown in the lowermost equation of FIG. 18 (c), the square root E(n) of the sum of the squares of the correlation value S1(n) for the sine wave and the correlation value S2(n) for the cosine wave is taken as the correlation value with the periodic signal having a component of frequency f(n), and the Fourier spectrum is obtained from it. The effective intensity E in FIG. 1 (c) corresponds to this value E(n).
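The reason for combining the sine and cosine correlations can be checked with a small sketch. The equations of FIG. 18 (c) are not reproduced in this text, so unnormalized sums are assumed for S1(n) and S2(n); the function name and the test signals are hypothetical.

```python
import math


def correlation_strength(x, freq_hz, sampling_hz):
    """Return (S1, S2, E): sine correlation, cosine correlation, sqrt(S1^2 + S2^2)."""
    omega = 2.0 * math.pi * freq_hz / sampling_hz
    s1 = sum(v * math.sin(omega * k) for k, v in enumerate(x))
    s2 = sum(v * math.cos(omega * k) for k, v in enumerate(x))
    return s1, s2, math.hypot(s1, s2)


# The same 64 Hz tone with two different phases (1024 samples at 1024 Hz).
F = 1024.0
in_phase = [math.sin(2 * math.pi * 64 * k / F) for k in range(1024)]
shifted = [math.sin(2 * math.pi * 64 * k / F + 1.0) for k in range(1024)]
print(correlation_strength(in_phase, 64.0, F))
print(correlation_strength(shifted, 64.0, F))
```

S1 and S2 individually change with the phase, while E stays the same for both signals.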
[0067]
The expressions for the correlation values S1(n) and S2(n) shown in FIG. 18 (c) are almost the same as the expressions for the coefficients A(n) and B(n) shown in FIG. 11. This is because both obtain a correlation value between a periodic signal having a component of frequency f(n) and the section signal x (the factor 2/w in the equations of FIG. 11 is, as described above, a coefficient obtained empirically for performing harmonic analysis). As a result, almost the same arithmetic processing is executed in step S4 and step S6 of FIG. 17. Their purposes differ, however: the purpose of step S4 is to select, as the harmonic signal, the element function having the highest correlation value with the section signal x among the 128 element functions, whereas the purpose of step S6 is to obtain the correlation value for the selected harmonic signal and to multiply the harmonic signal by it to obtain the contained signal Gi.
[0068]
Paying attention to this difference in purpose, two features become clear. First, step S4 requires correlations to be calculated for all 128 element functions, whereas step S6 requires the correlation to be calculated only for the harmonic function (that is, the one element function selected from the 128). Second, step S4 only needs to rank the 128 element functions by correlation, so high calculation accuracy is not required, whereas step S6 must determine the coefficients A(n) and B(n) that correspond to the amplitude of the contained signal Gi, so a correlation value with a certain degree of calculation accuracy is required.
[0069]
Considering these two features, step S4 must obtain correlations for all 128 element functions, but coarse correlation values suffice; step S6 needs to obtain a high-accuracy correlation value for only one element function (the harmonic function).
[0070]
As the coarse correlation calculation performed in step S4, a simple correlation calculation method as shown in FIG. 19 can be used, for example. First, the amplitude peak positions of the section signal x are detected as shown in FIG. Here, the peak positions are determined on the assumption that positive and negative peaks appear alternately; when peaks of the same polarity appear in succession, only the peak with the larger peak value is recognized. In the illustrated example, five peak positions P1 to P5 (appearing at times t(P1) to t(P5), respectively) are detected. Once the peak positions of the section signal x have been detected in this way, the correlation value is calculated using only the information at the peak positions.
[0071]
For example, to calculate the correlation value with the sine wave of frequency f(n) shown in FIG. 19(b), the product of the amplitude values is calculated at only the five positions from time t(P1) to t(P5), and the sum of these products is obtained. In other words, the normal correlation calculation (Fourier analysis) obtains the correlation values S1(n) and S2(n) based on the equation shown in FIG. 18(c), and from them the final correlation value (effective value) E(n). The simple correlation calculation instead obtains the simple correlation value (effective value) EE(n) from the simple correlation values SS1(n) and SS2(n), which are based on the equation shown in FIG. Here, the parameter j is the peak position number, x(Pj) is the value of the section signal x at the j-th peak position, t(Pj) is the time of the j-th peak position, and J is the total number of peak positions.
[0072]
Once the 128 simple correlation values EE(n) have been obtained in this way, the element signal having the frequency f(n) that corresponds to the largest simple correlation value is selected as the harmonic signal. Then, in step S6, the normal correlation calculation based on the equation shown in FIG. 11 (a calculation of the correlation values that uses all of the information, that is, all sample positions and not only the peak positions) is performed again for this harmonic signal only, and the contained signal G is obtained using the correlation values (coefficients A(n) and B(n)) obtained by this recalculation.
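The peak-only calculation can be sketched as follows. This is a hedged Python illustration rather than the equation of the figure: the helper names, the local-extremum test used for peak detection, the unnormalized sums, and the note-to-frequency mapping (standard MIDI tuning with note 69 at 440 Hz) are assumptions.

```python
import math

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def find_peaks(x):
    """Return sample indices of alternating positive and negative peaks.

    When consecutive peaks have the same polarity, only the one with the
    larger absolute value is kept.
    """
    peaks = []
    for k in range(1, len(x) - 1):
        is_pos = x[k] > 0 and x[k] >= x[k - 1] and x[k] > x[k + 1]
        is_neg = x[k] < 0 and x[k] <= x[k - 1] and x[k] < x[k + 1]
        if not (is_pos or is_neg):
            continue
        if peaks and (x[peaks[-1]] > 0) == (x[k] > 0):
            if abs(x[k]) > abs(x[peaks[-1]]):
                peaks[-1] = k  # same polarity: keep the larger peak
        else:
            peaks.append(k)
    return peaks

def simple_correlation(x, peaks, freq, F):
    """EE for one frequency, summing over the J peak positions only."""
    ss1 = sum(x[p] * math.sin(2 * math.pi * freq * p / F) for p in peaks)
    ss2 = sum(x[p] * math.cos(2 * math.pi * freq * p / F) for p in peaks)
    return math.hypot(ss1, ss2)

def select_harmonic_note(x, F, notes=range(128)):
    """Return the note number whose element signal has the largest EE(n)."""
    peaks = find_peaks(x)
    return max(notes, key=lambda n: simple_correlation(x, peaks, note_freq(n), F))
```

Here the sample index p divided by F plays the role of the time t(Pj). The accurate coefficients for the selected note would then be recomputed over all samples, as the paragraph above describes for step S6.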
[0073]
FIG. 20 is a flowchart illustrating the processing procedure when this simple correlation calculation is introduced. Here, the same steps as those in the flowchart of FIG. The steps that differ from the flowchart of FIG. 17 are steps S41 and S51. In step S4 of FIG. 17, a Fourier transform (the normal correlation calculation for all 128 element signals) is performed, whereas in step S41 of FIG. 20, the simple correlation calculation described above is performed for the 128 element signals. Further, in step S5 of FIG. 17, the element signal corresponding to the peak position of the Fourier spectrum is selected as the harmonic signal, whereas in step S51 of FIG. 20, the element signal for which the strongest correlation was obtained in the 128 simple correlation calculations is selected as the harmonic signal.
[0074]
To give specific numbers: if the number of samples w in one unit section is 1024 and the number J of amplitude peak positions is about 100, adopting the simple correlation calculation reduces the calculation burden to about 1/10.
[0075]
(2). Narrowing down element signal candidates (part 1)
In the process shown in the flowchart of FIG. 20 described above, the i-th contained signal Gi is obtained in step S6, the (i+1)-th section signal x(i+1) is obtained in step S7, and the value of i is updated in step S9, after which the correlation calculation (simple correlation calculation) between the new section signal and the 128 element signals is performed again in step S41. Since updating the parameter i also updates the section signal xi (the difference signal calculated in step S7 becomes the new section signal), performing the correlation calculation on the new section signal xi in step S41 is of course meaningful. To improve calculation efficiency further, however, part of the correlation calculation performed in step S41 can be omitted.
[0076]
The flowchart shown in FIG. 21 shows processing in which this omission is made. The steps that differ from the flowchart of FIG. 20 are steps S42, S52, and S92. In step S41 of FIG. 20, the simple correlation calculation is performed between the 128 element signals and the section signal xi. In step S42 of FIG. 21, a process is added that extracts the top 16 candidates in descending order of the correlation obtained by the simple correlation calculation (the number of candidates to extract is preferably a multiple of the value of I in step S8; in this embodiment I = 8, so 16 candidates, twice that number, are extracted). In step S52, the element signal with the strongest correlation among the 16 extracted candidates is selected as the harmonic signal. At this point, however, the processing of step S52 is no different from that of step S51 (first place among 128 candidates and first place among 16 candidates are naturally the same). Because the element signal selected as the harmonic signal is always the one with the strongest correlation, even though the top 16 candidates are extracted in step S42, only the first-place candidate becomes the harmonic signal in step S52, and the candidates from second to 16th place have no meaning at this point.
[0077]
In the procedure shown in FIG. 21, however, after the value of the parameter i is updated in step S9, the process of step S92 is executed instead of the process of step S42. Step S92 performs the simple correlation calculation between the 16 candidate element signals already extracted and the section signal xi. Whereas step S42 calculates correlations for all 128 element signals, step S92 needs to calculate correlations for only 16 element signals. This method is based on the idea that the frequency components contained in the section signal xi do not change greatly even when the value of the parameter i is updated. That is, once the top 16 candidates have been extracted for the section signal x1 (the original sound signal) at parameter i = 1, the element signals outside the top 16 are not considered at all for the section signals xi at i = 2 and later (the residual signals from which the contained signals have been subtracted one after another), but this causes no serious problem.
[0078]
With this method, the correlation calculation for all 128 element signals is performed only when the parameter i = 1, that is, when the harmonic signal for the first contained signal G1 is selected. When the harmonic signals for the second and subsequent contained signals G2, G3, ... are selected, the correlation calculation needs to be performed only for the 16 extracted candidate element signals, so the calculation burden for those selections is reduced to about 1/8.
[0079]
Note that this method of narrowing down element signal candidates can also be applied to the processing shown in FIG. 17, in which the simple correlation calculation is not performed. In short, in this method, when the first harmonic signal is selected for the section signal of each unit section, Y candidates (Y < X; Y = 16 in the above example), from the first to the Y-th in descending order of correlation value with the section signal, are selected from among the X element signals (X = 128 in the above example), and the first-place candidate is used as the first harmonic signal. When the second and subsequent harmonic signals are selected, the element signal with the highest correlation value with the section signal is selected as the harmonic signal from among the Y candidates.
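The narrowing-down loop can be sketched in Python as below, using the normal (all-sample) correlation as in the FIG. 17 variant. It is an illustration under stated assumptions, not the patented procedure verbatim: the names `decompose` and `coefficients`, the standard MIDI tuning in `note_freq`, and the 2/w scaling of the coefficients are assumptions.

```python
import math

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def coefficients(x, freq, F):
    """Sine coefficient A and cosine coefficient B, scaled by the assumed 2/w."""
    w = len(x)
    a = 2.0 / w * sum(x[k] * math.sin(2 * math.pi * freq * k / F) for k in range(w))
    b = 2.0 / w * sum(x[k] * math.cos(2 * math.pi * freq * k / F) for k in range(w))
    return a, b

def decompose(x, F, X=128, Y=16, I=8):
    """Pick I harmonic signals; after the first pick, search only the top Y notes."""
    residual = list(x)
    candidates = list(range(X))
    picked = []
    for i in range(I):
        scored = []
        for n in candidates:
            a, b = coefficients(residual, note_freq(n), F)
            scored.append((math.hypot(a, b), n, a, b))
        scored.sort(reverse=True)
        if i == 0:
            # Keep the Y best-correlated notes for all later iterations.
            candidates = [n for _, n, _, _ in scored[:Y]]
        strength, n, a, b = scored[0]
        picked.append((n, strength))
        f = note_freq(n)
        # Subtract the contained signal to form the next section signal.
        residual = [
            residual[k]
            - (a * math.sin(2 * math.pi * f * k / F) + b * math.cos(2 * math.pi * f * k / F))
            for k in range(len(residual))
        ]
    return picked, residual
```

With X = 128 and Y = 16, the first iteration scores 128 notes and each later iteration scores 16, which matches the 1/8 reduction described above for the second and subsequent selections.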
[0080]
(3). Narrowing down element signal candidates (part 2)
In the candidate narrowing-down method described above, the calculation over the narrowed-down candidates is performed after the parameter i is updated to 2. In the method described here, by contrast, the calculation over narrowed-down candidates is performed after the unit section is updated. This method is effective when adjacent unit sections are set to partially overlap on the time axis, as described in §2. For example, in the example shown in FIG. 4, when the 1024 samples in the unit section d1 are compared with the 1024 samples in the unit section d2, only 20 samples differ, and the remaining 1004 samples are exactly the same. Nevertheless, in all of the procedures shown in FIGS. 17, 20, and 21, the same processing as before is repeated from scratch after the unit section is updated in step S11. The narrowing-down method described here improves calculation efficiency by paying attention to this point.
[0081]
FIG. 22 is a flowchart showing the processing of this method. The steps that differ from the flowchart of FIG. 21 are steps S43 to S45. First, when a specific unit section is set in step S2, the parameter i is set to the initial value 1 in step S3, and step S43 determines whether to perform the detailed calculation. When the section signal extracted in step S2 is the first section signal, the detailed calculation is performed, and the process proceeds from step S43 to step S44. Step S44 performs substantially the same process as step S42 in FIG. 21: the simple correlation calculation is performed between the 128 element signals and the section signal xi, and a plurality of candidates are extracted in descending order of the resulting correlation. In the example shown in FIG. 22, however, the top 32 candidates are extracted (again, the number of candidates to extract is preferably a multiple of the value of I in step S8; in this embodiment I = 8, so 32 candidates, four times that number, are extracted).
[0082]
The subsequent step S52 is exactly the same as step S52 shown in FIG. 21: the element signal with the strongest correlation among 16 candidates is selected as the harmonic signal. Although the top 32 candidates are extracted in step S44, only the top 16 of them are used in step S52. The following procedure is exactly the same as the procedure of FIG. That is, after the parameter i is updated in step S9, step S92 is executed instead of step S44, and the correlation calculation is always performed for only 16 candidates. At this stage, therefore, the candidates from 17th to 32nd place among the top 32 extracted in step S44 are not used at all.
[0083]
When the parameter i reaches the set value I, the processing for the unit section is complete, and the process returns from step S10 through step S11 to step S2. There, a new unit section is set and a new section signal is extracted. As described above, the new section signal and the old section signal largely overlap on the time axis. If the correlation calculation for the 128 element signals has already been performed on the old section signal in step S44, and the new and old section signals overlap for at least a predetermined time on the time axis, step S43 decides not to perform the detailed calculation, and the process proceeds to step S45. Instead of calculating correlations for the 128 element signals, step S45 performs the correlation calculation for the 32 candidate element signals previously extracted in step S44 and extracts the top 16 candidates from the result. Thereafter, in step S52 and subsequent steps, the harmonic signal is selected from these 16 candidates.
[0084]
With this method, the correlation calculation for all 128 element signals need not be repeated every time the unit section is updated, and the calculation load is reduced to about 1/4. This method can be applied not only to the process shown in FIG. 21, but also to the processes shown in FIGS. In short, in this method, when a harmonic signal is selected for the section signal of a first unit section, Z candidates (Z < X; Z = 32 in the above example), from the first to the Z-th in descending order of correlation value with the section signal, are selected from among the X element signals (X = 128 in the above example), and the harmonic signal is selected from among these Z candidates. When a harmonic signal is then selected for the section signal of a second unit section that overlaps the first unit section for at least a predetermined time on the time axis, the harmonic signal is again selected from among the same Z candidates.
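A hedged Python sketch of the reuse of the Z candidates across overlapping unit sections follows. The function names, the use of the normal correlation for ranking, and the standard MIDI tuning in `note_freq` are assumptions for illustration; passing `None` as the previous pool stands in for the decision in step S43 to perform the detailed calculation.

```python
import math

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def strength(x, freq, F):
    """Phase-independent correlation magnitude of x with frequency freq."""
    w = len(x)
    s1 = sum(x[k] * math.sin(2 * math.pi * freq * k / F) for k in range(w))
    s2 = sum(x[k] * math.cos(2 * math.pi * freq * k / F) for k in range(w))
    return math.hypot(s1, s2) * 2.0 / w

def rank_notes(x, F, notes):
    """Return the given notes sorted by descending correlation with x."""
    return sorted(notes, key=lambda n: strength(x, note_freq(n), F), reverse=True)

def candidates_for_section(x, F, previous_pool, X=128, Z=32, Y=16):
    """Return (pool of Z notes, top Y notes) for one unit section.

    previous_pool is None when the detailed calculation is required
    (as in step S44); otherwise only the Z notes in the pool are
    re-ranked (as in step S45).
    """
    if previous_pool is None:
        pool = rank_notes(x, F, range(X))[:Z]
        return pool, pool[:Y]
    return previous_pool, rank_notes(x, F, previous_pool)[:Y]
```

In the reuse path only Z = 32 of the X = 128 element signals are scored, which is the source of the roughly 1/4 load mentioned above.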
[0085]
The criterion used in step S43 to decide whether to perform the detailed calculation may be set to an appropriate value in consideration of the shift amount of the unit sections. For example, it may be set so that the detailed calculation is performed each time a shift of half the section length (L/2) has accumulated. In the example shown in FIG. 4, the number of samples in one unit section is 1024, so the detailed calculation is performed once the shift reaches 512 samples or more. Specifically, since each unit section update shifts the section by 20 samples, the detailed calculation is performed once for about every 25 unit section updates in step S11.
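As a worked example of this criterion, the short Python sketch below lists the unit sections at which the detailed calculation would run when the threshold is half the section length. The function name and the exact bookkeeping are assumptions; with a strict "512 samples or more" test and a 20-sample shift, the interval comes out as 26 updates, consistent with the "about 25" stated above.

```python
def detailed_calc_sections(num_sections, shift, w):
    """Return indices of unit sections that get the full 128-signal calculation."""
    threshold = w // 2
    detailed = []
    accumulated = threshold  # forces the detailed calculation for the first section
    for index in range(num_sections):
        if accumulated >= threshold:
            detailed.append(index)
            accumulated = 0
        accumulated += shift  # each unit section update shifts by `shift` samples
    return detailed

print(detailed_calc_sections(80, 20, 1024))
```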
[0086]
§6. How to deal with frequency fluctuations
Instrumental sounds with vibrato and human voices (vocal sounds) contain minute frequency fluctuations. In the methods described so far, however, the element signals are all harmonic functions (functions with a single frequency, such as a sine or cosine function), so a correct correlation is not always obtained for acoustic signals containing minute frequency fluctuations. For example, if the frequency varies within one unit section from the frequency f(n) corresponding to note number n to the frequency f(n+1) corresponding to note number (n+1), then examining the correlation of the section signal with the element signal of frequency f(n) and with the element signal of frequency f(n+1) yields a correlation of only about 50%. A method for dealing with such frequency fluctuations is described here.
[0087]
(1). Coping method using anharmonic function
First, a method that uses anharmonic functions as well as harmonic functions as element signals is described. FIG. 23(a) shows the waveform of a sine wave, a typical harmonic function. This sine wave is a harmonic function with a single frequency f(n); with sampling frequency F and sample number k, it is expressed as sin(2πf(n)k/F). In contrast, consider an anharmonic function as shown in FIG. The frequency of this anharmonic function changes gradually over the section length L: it is low on the left side of the section and high on the right side, so the frequency depends on the sample number k. This anharmonic function is expressed as sin(2πfj(n,k)k/F), where fj(n,k) is the function given by the equation shown in FIG. 23(c), with j = -1, 0, +1.
[0088]
FIG. 24 explains the meaning of the function fj(n,k) for the three cases j = -1, 0, +1. When j = -1, fj(n,k) = (f(n-1) - f(n))k/w + f(n). Substituting k = 0 gives fj(n,0) = f(n), and substituting k = w gives fj(n,w) = f(n-1). As shown, the result is an anharmonic function whose frequency is f(n) at the left end of the section length L and f(n-1) at the right end, decreasing gradually from left to right. When j = 0, fj(n,k) = f(n), which, as shown in the middle row of FIG. 24, is a harmonic function whose frequency is constant (the sine function of FIG. 23(a)). When j = +1, fj(n,k) = (f(n+1) - f(n))k/w + f(n). Substituting k = 0 gives fj(n,0) = f(n), and substituting k = w gives fj(n,w) = f(n+1). The result is an anharmonic function whose frequency is f(n) at the left end of the section length L and f(n+1) at the right end, increasing gradually from left to right (as shown in FIG. 23(b)).
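The function fj(n,k) and the resulting anharmonic sine can be written directly from the expressions above. In this Python sketch the names `f_j` and `anharmonic_sine` are hypothetical, and f(n) is assumed to follow standard MIDI tuning (note 69 at 440 Hz), since the table of FIG. 8 is not reproduced here.

```python
import math

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def f_j(n, k, w, j):
    """fj(n,k) = (f(n+j) - f(n)) * k / w + f(n), for j in (-1, 0, +1)."""
    return (note_freq(n + j) - note_freq(n)) * k / w + note_freq(n)

def anharmonic_sine(n, w, F, j):
    """Samples of sin(2*pi*fj(n,k)*k/F) for k = 0 .. w-1."""
    return [math.sin(2 * math.pi * f_j(n, k, w, j) * k / F) for k in range(w)]

w, F = 1024, 44100  # F is an assumed sampling frequency
for j in (-1, 0, +1):
    print(j, round(f_j(69, 0, w, j), 2), round(f_j(69, w, w, j), 2))
```

For j = 0 the function reduces to the ordinary sine element signal, so the three cases can share one code path.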
[0089]
In the method described in §4, as shown in the lower table of FIG. 8, 128 harmonic functions (sine and cosine functions) having the frequencies f(0) to f(127) corresponding to note numbers 0 to 127 are prepared as element signals. Here, for each of these 128 element signals, anharmonic functions are additionally prepared by setting the value of j in the equation shown in FIG. For example, three types of element signal are prepared for the frequency f(n). The first synthesis function (the harmonic function corresponding to j = 0) is obtained by combining the sine function sin(2πf(n)k/F) and the cosine function cos(2πf(n)k/F) as shown in FIG. The second synthesis function, defined by setting j = -1, is an anharmonic function obtained by combining a sine function and a cosine function whose frequency changes continuously from the section start frequency f(n) to the section end frequency f(n-1). The third synthesis function, defined by setting j = +1, is an anharmonic function obtained by combining a sine function and a cosine function whose frequency changes continuously from the section start frequency f(n) to the section end frequency f(n+1). In total, 128 × 3 element signals are prepared.
[0090]
In the correlation calculation for selecting the harmonic signal (step S4 in FIG. 17, step S41 in FIG. 20, steps S42 and S92 in FIG. 21, and steps S44, S45, and S92 in FIG. 22), the correlation is calculated for a total of 128 × 3 element signals (16 × 3 or 32 × 3 when the candidates are narrowed down). If the correlation value is largest for an element signal consisting of an anharmonic function corresponding to j = -1 or j = +1, the element signal consisting of the corresponding harmonic function for j = 0 is selected as the harmonic signal. With this method the calculation burden triples, but a more accurate correlation calculation becomes possible even for acoustic signals containing minute frequency fluctuations, and more accurate encoding can be realized.
[0091]
(2). How to deal with intermediate frequencies
The 128 frequencies f(0) to f(127) shown in the lower table of FIG. 8 are the frequencies corresponding to the note numbers of the MIDI code, and they form a geometric series with common ratio α (α is the 12th root of 2). For the n-th frequency f(n), therefore, three synthesis functions are defined: a first synthesis function obtained by combining a sine function and a cosine function of frequency f(n) (the function listed in the lower table of FIG. 8), a second synthesis function obtained by combining a sine function and a cosine function of frequency f(n)*β, and a third synthesis function obtained by combining a sine function and a cosine function of frequency f(n)/β. A total of 128 × 3 synthesis functions are thus prepared and used as element signals. Here, β is set so that 1 < β < √α. In the correlation calculation for selecting the harmonic signal (step S4 in FIG. 17, step S41 in FIG. 20, steps S42 and S92 in FIG. 21, and steps S44, S45, and S92 in FIG. 22), the correlation is calculated for a total of 128 × 3 element signals (16 × 3 or 32 × 3 when the candidates are narrowed down). If the correlation value for the second or third synthesis function is found to be the highest, the first synthesis function corresponding to that synthesis function is selected as the harmonic signal. With this method the calculation burden triples, but a more accurate correlation calculation becomes possible even for acoustic signals containing minute frequency fluctuations, and more accurate encoding can be realized.
[0092]
FIG. 25 shows this method more specifically. Here, β is set to the cube root of α. When the frequency f is plotted on a logarithmic axis, the note numbers are equally spaced on the frequency axis as shown in the figure (the interval between adjacent note numbers corresponds to the common ratio α of the geometric series), so when β is set to the cube root of α, the frequencies f(n)*β and f(n)/β are plotted at positions that divide the interval between note numbers into three equal parts. As a result, element signals with three frequencies, corresponding to note number n, note number (n+1/3), and note number (n-1/3), are prepared in the vicinity of note number n. Whichever of the three shows the highest correlation, the element signal corresponding to note number n is always selected as the harmonic signal (this is because MIDI code data does not define note numbers such as (n+1/3) or (n-1/3), so they are represented by note number n).
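The intermediate-frequency scheme can be illustrated with the following Python sketch. It assumes standard MIDI tuning for f(n) and uses the normal correlation magnitude for scoring; the names `select_note` and `strength` are hypothetical, and the code is an illustration rather than the procedure of FIG. 25 itself.

```python
import math

ALPHA = 2 ** (1 / 12)    # ratio between adjacent note frequencies
BETA = ALPHA ** (1 / 3)  # cube root of alpha, so 1 < BETA < sqrt(ALPHA)

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def strength(x, freq, F):
    """Phase-independent correlation magnitude of x with frequency freq."""
    w = len(x)
    s1 = sum(x[k] * math.sin(2 * math.pi * freq * k / F) for k in range(w))
    s2 = sum(x[k] * math.cos(2 * math.pi * freq * k / F) for k in range(w))
    return math.hypot(s1, s2) * 2.0 / w

def select_note(x, F, notes=range(128)):
    """Score f(n)/BETA, f(n), f(n)*BETA per note; report the note n itself."""
    best = None
    for n in notes:
        f = note_freq(n)
        for variant in (f / BETA, f, f * BETA):
            e = strength(x, variant, F)
            if best is None or e > best[0]:
                best = (e, n, variant)
    return best[1], best[2]
```

A tone about 30 cents above note 69, for example, correlates best with the f(69)*β variant but is still reported as note number 69.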
[0093]
§7. Method using octave descent
In general, the calculation accuracy of a correlation calculation decreases for element signals of high frequency. Consider the case where the correlation between the section signal x and a sine wave of frequency f(n) is taken, as in the example shown in FIG. The accuracy decreases because the number of samples per period of the sine wave becomes small. At such high frequencies, a difference is less likely to appear between the correlation value for the frequency f(n) and the correlation value for the frequency f(n+1).
[0094]
To deal with this problem, the correlation may be calculated with an element signal of frequency f/2^q (q is a predetermined integer) instead of frequency f. In other words, the correlation calculation may use a function whose frequency is q octaves lower. FIG. 27 shows the general double-angle formulas for trigonometric functions: the calculation of 2 sinθ · cosθ can be used in place of the calculation of sin2θ, and the calculation of cos²θ − sin²θ can be used in place of the calculation of cos2θ. Therefore, for example, the substitution shown in FIG. 28 becomes possible. Here, sin(2πf(n)k/F) and cos(2πf(n)k/F) on the left-hand side are the functions used in the various expressions described so far; substituting the right-hand side replaces the frequency f(n) with the frequency f(n-12). In MIDI, note numbers 12 apart are one octave apart (12 semitones make one octave), which corresponds to a factor of two in frequency.
[0095]
In the end, as shown in FIG. 29, if the trigonometric functions having the 12 frequencies corresponding to note numbers 0 to 11 are called basic trigonometric functions, every trigonometric function having a frequency corresponding to note number 12 or higher can be replaced by a calculation using the basic trigonometric functions. Therefore, if this octave descent method is used in the present invention, the correlation calculation can always be performed on trigonometric functions of low frequency, and the correlation can be obtained with higher accuracy.
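The substitution by double-angle formulas can be sketched in Python as follows. The function name is hypothetical, f(n) is assumed to follow standard MIDI tuning, and the sketch only demonstrates that the values for any note number can be derived from the basic trigonometric functions of notes 0 to 11; it makes no claim about the accuracy gain itself.

```python
import math

def note_freq(n):
    # Assumption: standard MIDI tuning (note 69 = 440 Hz).
    return 440.0 * 2 ** ((n - 69) / 12)

def sin_cos_via_octave_descent(n, k, F):
    """Return (sin, cos) of 2*pi*f(n)*k/F using only a note in 0..11.

    The note is lowered by q octaves (n = base + 12*q), the basic
    trigonometric functions are evaluated at the low frequency, and the
    double-angle formulas are applied q times to climb back up.
    """
    q, base = divmod(n, 12)
    theta = 2 * math.pi * note_freq(base) * k / F
    s, c = math.sin(theta), math.cos(theta)
    for _ in range(q):
        # sin(2t) = 2 sin(t) cos(t), cos(2t) = cos(t)^2 - sin(t)^2
        s, c = 2 * s * c, c * c - s * s
    return s, c

s, c = sin_cos_via_octave_descent(69, 100, 44100)
print(s, math.sin(2 * math.pi * note_freq(69) * 100 / 44100))
```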
[0096]
The present invention has been described above based on the illustrated embodiments, but the present invention is not limited to these embodiments and can be implemented in various other forms. In particular, the various encoding processes described above are in practice executed using a computer, and a program for realizing the encoding process according to the present invention can be recorded on and supplied via a computer-readable recording medium such as a magnetic disk or an optical disk. Likewise, the code data encoded by the encoding process according to the present invention can be recorded on and supplied via a computer-readable recording medium such as a magnetic disk or an optical disk.
[0097]
[Effect of the invention]
As described above, according to the audio signal encoding method of the present invention, it is possible to perform conversion into code data such as MIDI data with high quality.
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic principle of a method for encoding an acoustic signal according to a prior invention.
FIG. 2 is a diagram showing a code code created based on the intensity graph shown in FIG. 1 (c).
FIG. 3 is a diagram showing a code code created by setting unit sections so as to partially overlap on a time axis.
FIG. 4 is a diagram showing a specific example of unit section setting that partially overlaps on the time axis.
FIG. 5 is a graph showing an example of a Fourier spectrum in which a frequency axis is displayed on a linear scale.
FIG. 6 is a graph showing an example of a Fourier spectrum in which the frequency axis is displayed on a logarithmic scale.
FIG. 7 is a graph showing a correspondence relationship between a Fourier spectrum and a note number in which a frequency axis is displayed on a logarithmic scale.
FIG. 8 is a diagram showing a section signal x to be encoded and 128 element signals prepared for decomposing the section signal x.
FIG. 9 is a diagram illustrating an expression for explaining a basic policy of harmonic analysis.
FIG. 10 is a diagram showing a concept of selecting an element signal having the highest correlation as a harmonic signal based on a peak of a Fourier spectrum.
FIG. 11 is a diagram illustrating an equation for obtaining a correlation value for a selected harmonic signal.
FIG. 12 is a diagram showing an expression for defining a content signal G (k) based on a selected harmonic signal.
FIG. 13 is a graph showing an example of obtaining a difference signal between a section signal x and a content signal G.
FIG. 14 is a diagram showing a general formula for obtaining a correlation value for a selected harmonic signal.
FIG. 15 is a diagram showing a general formula for defining a content signal Gi (k) based on a selected harmonic signal.
FIG. 16 is a diagram illustrating a general formula in which a difference signal between a section signal xi (k) and a contained signal Gi (k) is a new section signal x (i + 1) (k).
FIG. 17 is a flowchart showing a basic procedure of an audio signal encoding method according to the present invention.
FIG. 18 is a diagram illustrating a general principle of determining a correlation value in Fourier transform.
FIG. 19 is a diagram showing a basic principle of a simple correlation calculation method used in the present invention.
FIG. 20 is a flowchart showing a basic procedure of an audio signal encoding method using a simple correlation calculation method.
FIG. 21 is a flowchart showing a basic procedure of an audio signal encoding method using a method of narrowing down element signal candidates.
FIG. 22 is a flowchart showing another basic procedure of an audio signal encoding method using a method of narrowing down element signal candidates.
FIG. 23 is a diagram illustrating an example of an anharmonic function prepared together with a harmonic function.
FIG. 24 is a diagram for explaining a relationship between a harmonic function and an anharmonic function.
FIG. 25 is a diagram for explaining an example of preparing an element signal having an intermediate frequency.
FIG. 26 is a diagram showing a correlation calculation for a sine wave having a relatively high frequency.
FIG. 27 is a diagram showing a double angle formula of a trigonometric function.
FIG. 28 is a diagram showing a method for replacing an expression using a double angle formula of a trigonometric function.
FIG. 29 is a diagram for explaining an octave lowering method applicable to the present invention.
[Explanation of symbols]
A ... Complex intensity
A (n), B (n) ... coefficient (correlation value)
d1 to d5: Unit section
E, E (n), EE (n) ... Effective strength
Error ... error value
e (i, j): Effective strength of code code n (i, j)
F ... Sampling frequency
f, f (n) ... frequency
G (k) ... Inclusion signal
i: Parameter indicating the number of repetitions
I: Predetermined number of times
j: Parameter indicating peak position number / parameter indicating anharmonic function
J ... Total number of peak positions
k ... Sample number in one unit section
L: Section length of unit section
ΔL: Offset length
M ... Number of measurement points
n, n1, n2, n3 ... note number
n (i, j) ... jth code code extracted for the unit interval di
P1 to P5: Peak position number
S1 (n), S2 (n) ... correlation values with trigonometric functions
SS1 (n), SS2 (n) ... Simple correlation values with trigonometric functions
T1-T3 ... track
t1-t6 ... Time
w: Number of samples in one unit section
x, xi ... section signal
ξ (k) ... Approximate function

Claims

An encoding method for encoding an acoustic signal given as a time-series intensity signal,
A section signal extraction stage that sets a plurality of unit sections on the time axis of the acoustic signal to be encoded and extracts a section signal for each unit section,
An element signal preparation step of preparing a plurality of element signals to be components of the section signal;
A harmonic signal selection step of selecting, as a harmonic signal, an element signal having the highest correlation value with respect to the section signal from the plurality of element signals;
A difference signal calculation step of obtaining a difference signal by subtracting the inclusion signal given by the product of the harmonic signal and the correlation value obtained for the harmonic signal from the interval signal;
Using the difference signal as a new interval signal, the harmonic signal selection step and the difference signal calculation step are executed to obtain a new inclusion signal and a new difference signal, thereby repeatedly obtaining a plurality of inclusion signals. An encoding step for generating a plurality of code codes for expressing the section signal based on the obtained inclusion signal;
A method for encoding an acoustic signal, wherein the acoustic signal is expressed by a set of code codes generated for each unit section.

The encoding method according to claim 1, wherein
in the element signal preparation step, a plurality of element signals having mutually different frequencies are prepared, and
in the harmonic signal selection step, a Fourier transform is performed on the section signal, and the element signal corresponding to a peak frequency of the resulting Fourier spectrum is selected as the harmonic signal.
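A hedged sketch of the spectrum-based selection in claim 2: it computes a direct discrete Fourier transform of the section signal, finds the bin with the largest magnitude, and maps that peak to the nearest prepared element frequency. The names `peak_frequency` and `select_by_spectrum`, and the choice of "nearest in frequency ratio" as the matching rule, are assumptions for illustration; a practical implementation would use an FFT rather than this O(n²) loop.

```python
import cmath
import math

def peak_frequency(x, fs):
    """Return the frequency (Hz) of the largest DFT magnitude, skipping DC."""
    n = len(x)
    best_bin, best_mag = 1, -1.0
    for m in range(1, n // 2):
        c = sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        if abs(c) > best_mag:
            best_bin, best_mag = m, abs(c)
    return best_bin * fs / n

def select_by_spectrum(x, fs, freqs):
    """Pick the prepared element frequency closest (in ratio) to the spectral peak."""
    fp = peak_frequency(x, fs)
    return min(freqs, key=lambda f: abs(math.log(f / fp)))
```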

The encoding method according to claim 1, wherein
in the harmonic signal selection step, a simple correlation calculation is performed that computes a correlation value using only the information at the peak positions of the section signal, and the harmonic signal is selected based on the correlation value obtained from this simple correlation calculation, and
in the difference signal calculation step, the correlation value for the selected harmonic signal is recalculated using all the information, and the contained signal is obtained using the recalculated correlation value.

The encoding method according to claim 1, wherein
when the first harmonic signal is selected for the section signal of each unit section, Y candidates (Y < X) are selected from among the X element signals, namely the first through the Y-th in descending order of correlation value with respect to the section signal, and the first-ranked candidate is selected as the first harmonic signal; and when the second and subsequent harmonic signals are selected, the element signal having the highest correlation value with respect to the section signal is selected as the harmonic signal from among the Y candidates.
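The candidate narrowing of claim 4 might look like the sketch below: all X element frequencies are ranked once against the original section signal, and every later pass searches only the top Y. The helper names `correlate` and `decompose_with_candidates` and the fixed `passes` count are hypothetical, and the sine and cosine pair per frequency is an assumption carried into this example.

```python
import math

def correlate(x, freq, fs):
    """Sine and cosine correlation of x at freq (Hz), normalized by 2/w."""
    w = len(x)
    a = 2.0 / w * sum(x[k] * math.sin(2 * math.pi * freq * k / fs) for k in range(w))
    b = 2.0 / w * sum(x[k] * math.cos(2 * math.pi * freq * k / fs) for k in range(w))
    return a, b

def decompose_with_candidates(x, freqs, fs, y, passes):
    """Rank all frequencies once, keep the top y, then search only those."""
    residual = list(x)
    ranked = sorted(freqs, key=lambda f: math.hypot(*correlate(residual, f, fs)),
                    reverse=True)
    candidates = ranked[:y]
    chosen = []
    for _ in range(passes):
        # On the first pass this is the top-ranked candidate by construction.
        f = max(candidates, key=lambda g: math.hypot(*correlate(residual, g, fs)))
        a, b = correlate(residual, f, fs)
        chosen.append(f)
        residual = [
            r - (a * math.sin(2 * math.pi * f * k / fs)
                 + b * math.cos(2 * math.pi * f * k / fs))
            for k, r in enumerate(residual)
        ]
    return candidates, chosen
```

The saving is that passes after the first evaluate Y correlations instead of X, at the cost of never selecting a frequency that ranked below Y on the first pass.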

The encoding method according to claim 1, wherein
in the section signal extraction step, the unit sections are set so that adjacent unit sections partially overlap on the time axis.
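As an illustration of overlapping unit sections, the sketch below lays out sections of length L spaced by an offset ΔL, following the reference symbols listed above. The function name `unit_sections` and the choice to drop a trailing partial section are assumptions made for the example.

```python
def unit_sections(total_samples, section_len, offset_len):
    """Return (start, end) sample ranges; sections overlap when offset_len < section_len."""
    if not 0 < offset_len <= section_len:
        raise ValueError("offset_len must be in (0, section_len]")
    return [
        (s, s + section_len)
        for s in range(0, total_samples - section_len + 1, offset_len)
    ]
```

With `section_len=4` and `offset_len=2`, each section shares half of its samples with the next one; setting `offset_len` equal to `section_len` gives non-overlapping sections.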

The encoding method according to claim 5, wherein
when a harmonic signal is selected for the section signal of a first unit section, Z candidates (Z < X) are selected from among the X element signals, namely the first through the Z-th in descending order of correlation value with respect to the section signal, and the harmonic signal is selected from among the Z candidates; and
when a harmonic signal is selected for the section signal of a second unit section that overlaps the first unit section over a predetermined time on the time axis, the harmonic signal is selected from among the same Z candidates.

The encoding method according to any one of claims 1 to 6, wherein
in the element signal preparation step, a composite function of a sine function and a cosine function having the same frequency is used as one element signal, and the composite functions for X frequencies forming a geometric series are used as the respective element signals.
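To illustrate why a sine and cosine pair is combined into one element signal, the sketch below builds a geometric series of frequencies and shows that the combined amplitude does not depend on the phase of the input. The names `geometric_frequencies` and `combined_amplitude`, and the 2/w normalization, are assumptions for this example.

```python
import math

def geometric_frequencies(f0, alpha, count):
    """Frequencies f0, f0*alpha, f0*alpha**2, ... (a geometric series)."""
    return [f0 * alpha ** n for n in range(count)]

def combined_amplitude(x, freq, fs):
    """Amplitude of the sine/cosine composite at freq; independent of input phase."""
    w = len(x)
    a = sum(x[k] * math.sin(2 * math.pi * freq * k / fs) for k in range(w))
    b = sum(x[k] * math.cos(2 * math.pi * freq * k / fs) for k in range(w))
    return 2.0 / w * math.hypot(a, b)
```

Because A·sin θ + B·cos θ equals √(A² + B²)·sin(θ + φ) for a suitable φ, correlating against the pair recovers the amplitude of a sinusoid whatever its phase, which a sine-only element signal could not do.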

The encoding method according to any one of claims 1 to 6, wherein
in the element signal preparation step, X frequencies forming a geometric series are defined, and for the n-th (n = 1, 2, ..., X) frequency f(n), a total of 3X composite functions are defined, consisting of:
a first composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) in that section;
a second composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function whose frequency changes continuously from a section start frequency f(n) to a section end frequency f(n-1) within that section; and
a third composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function whose frequency changes continuously from a section start frequency f(n) to a section end frequency f(n+1) within that section;
and a correlation value is calculated using each of these composite functions as an element signal, and when the correlation value for a second or third composite function is determined to be the highest, the first composite function corresponding to that composite function is selected as the harmonic signal.

The encoding method according to any one of claims 1 to 6, wherein
in the element signal preparation step, X frequencies forming a geometric series with common ratio α are defined, and for the n-th (n = 1, 2, ..., X) frequency f(n), a total of 3X composite functions are defined (where 1 < β < √α), consisting of:
a first composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) in that section;
a second composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) * β in that section; and
a third composite function defined over the same section as the unit section and obtained by combining a sine function and a cosine function having the frequency f(n) / β in that section;
and a correlation value is calculated using each of these composite functions as an element signal, and when the correlation value for a second or third composite function is determined to be the highest, the first composite function corresponding to that composite function is selected as the harmonic signal.
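The detuned matching of claim 9 can be sketched as below: each frequency f(n) is tested at f(n), f(n)·β and f(n)/β, and whichever variant correlates best is reported as f(n) itself. The function names `amplitude` and `select_with_detuning` are hypothetical, and returning the index n rather than the element signal is a simplification for the example.

```python
import math

def amplitude(x, freq, fs):
    """Amplitude of the sine/cosine composite at freq (Hz), normalized by 2/w."""
    w = len(x)
    a = sum(x[k] * math.sin(2 * math.pi * freq * k / fs) for k in range(w))
    b = sum(x[k] * math.cos(2 * math.pi * freq * k / fs) for k in range(w))
    return 2.0 / w * math.hypot(a, b)

def select_with_detuning(x, freqs, beta, fs):
    """Correlate against f, f*beta and f/beta; return the index n of the winning f(n)."""
    best_n, best_amp = None, -1.0
    for n, f in enumerate(freqs):
        for variant in (f, f * beta, f / beta):
            amp = amplitude(x, variant, fs)
            if amp > best_amp:
                best_n, best_amp = n, amp
    return best_n
```

The point of the two extra variants is that a tone slightly sharp or flat of f(n) still correlates strongly with one of them, so it is attributed to f(n) instead of being weakened or assigned to a neighbor.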

The encoding method according to any one of claims 7 to 9, wherein
the frequencies corresponding to the note numbers used in MIDI data are used as the X frequencies, and
in the encoding step, the acoustic signal of each unit section is expressed by MIDI-format code data comprising a note number corresponding to the frequency of each contained signal, a velocity determined based on its amplitude, and data indicating a delta time determined based on the length of the unit section.
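A rough sketch of the MIDI mapping in claim 10 follows. The note-to-frequency relation (note 69 at 440 Hz, twelve equal-tempered steps per octave) is the usual MIDI convention; everything else is an assumption for illustration: the linear amplitude-to-velocity mapping, the `ticks_per_second` resolution, the tuple layout of the events, and the function names.

```python
def note_frequency(n):
    """Equal-tempered frequency (Hz) of MIDI note n, with note 69 at 440 Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def to_velocity(amplitude, full_scale=1.0):
    """Map an amplitude to a velocity in 0..127 (linear mapping is an assumption)."""
    return max(0, min(127, int(round(127 * amplitude / full_scale))))

def encode_section(codes, section_seconds, ticks_per_second=960):
    """codes: list of (note_number, amplitude) pairs for one unit section.

    Returns (delta_time, event, note, velocity) tuples.
    """
    ticks = int(round(section_seconds * ticks_per_second))
    # All notes of the section start together, so every note-on has delta 0.
    events = [(0, "note_on", n, to_velocity(a)) for n, a in codes]
    for i, (n, _) in enumerate(codes):
        # Only the first note-off waits for the section length; the rest follow at delta 0.
        events.append((ticks if i == 0 else 0, "note_off", n, 0))
    return events
```

Delta times are relative to the previous event, which is why the section length appears only once per section in the output.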

The encoding method according to any one of claims 1 to 10, wherein
instead of directly computing the correlation with an element signal having a predetermined frequency f, the correlation is computed by way of an element signal having a frequency f/2^q (q being a predetermined integer), using the double-angle formulas for the sine and cosine functions.
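The double-angle idea of claim 11 can be illustrated as follows, reading the lower frequency as f divided by 2 to the power q (an assumption, since q applications of the double-angle formulas multiply the angle by 2^q). The function names are hypothetical, and the sketch only shows how sine and cosine values at f can be derived from values at the lower frequency; it does not reproduce the patent's full correlation procedure.

```python
import math

def double_angle(s, c):
    """Given sin(t) and cos(t), return sin(2t) and cos(2t)."""
    return 2.0 * s * c, c * c - s * s

def element_values(freq, q, k, fs):
    """Sine and cosine at freq for sample k, derived from values at freq / 2**q."""
    theta = 2.0 * math.pi * (freq / 2 ** q) * k / fs
    s, c = math.sin(theta), math.cos(theta)
    for _ in range(q):
        s, c = double_angle(s, c)
    return s, c
```

One design consequence is that a table of sine and cosine values prepared for a low frequency can serve the frequencies one or more octaves above it, using only multiplications and subtractions.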

A computer-readable recording medium storing an acoustic signal encoding program that causes a computer to execute the encoding method according to claim 1.