JP3795201B2

JP3795201B2 - Acoustic signal encoding method and computer-readable recording medium

Info

Publication number: JP3795201B2
Application number: JP27394997A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1997-09-19
Filing date: 1997-09-19
Publication date: 2006-07-12
Anticipated expiration: 2017-09-19
Also published as: JPH1195753A

Abstract

PROBLEM TO BE SOLVED: To code acoustic signals efficiently. SOLUTION: The acoustic signal to be coded is PCM-coded and taken in as acoustic data, and a plurality of unit sections d1 to d5 are set on the time axis (a). A Fourier transform is then performed for each unit section to obtain a spectrum (b). Next, 128 note numbers (0 to 127) are defined discretely along the frequency axis f, and an effective intensity is obtained for each note number (c). P note numbers Np(d1, 1) to Np(d1, P) are then extracted in descending order of effective intensity and placed on P tracks at the time positions corresponding to the respective unit sections. The note numbers on each track are expressed as MIDI data, and the original acoustic signal is reproduced as P-channel stereo sound.

Description

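[0001]
[Technical Field of the Invention]
The present invention relates to a method for encoding acoustic signals, and more specifically to a technique for encoding an acoustic signal given as a time-series intensity signal, then decoding and reproducing it. The invention is particularly suited to efficiently converting vocal acoustic signals (human speech and singing) into MIDI-format code data. It is expected to find application in the various industrial fields that record voice.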
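[0002]
[Prior Art]
PCM (Pulse Code Modulation) is the most widespread technique for encoding acoustic signals. It is currently used as the recording format for audio CDs, DAT and similar media. Its basic principle is to sample an analog acoustic signal at a predetermined sampling frequency and quantize the signal level at each sample into digital data. The higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced, but the amount of information also grows with both. To keep the amount of information as small as possible, ADPCM (Adaptive Differential Pulse Code Modulation) is also used; it encodes only the differences between successive signal values.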
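[0003]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard grew out of the idea of encoding the sounds of electronic musical instruments, and it has come into wide use as personal computers have spread. Code data under the MIDI standard (hereinafter "MIDI data") essentially describes performance operations: which key of an instrument was struck, and how hard. The MIDI data itself contains no actual sound waveform, so reproducing actual sound requires a separate MIDI sound source that stores the waveforms of instrument sounds. Compared with recording sound by PCM, however, the amount of information is extremely small, and this high coding efficiency has attracted attention. MIDI encoding and decoding are now widely built into personal-computer software for playing instruments, practising and composing. They are also used extensively for karaoke and game sound effects.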
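[0004]
[Problems to Be Solved by the Invention]
As described above, encoding an acoustic signal by PCM with adequate sound quality produces an enormous amount of information and an unavoidably heavy data-processing load. In practice, sound quality is therefore compromised to keep the amount of information manageable. MIDI encoding, by contrast, reproduces sound of adequate quality from very little information. However, the MIDI standard was designed to encode instrument-playing operations, so it cannot be applied to general sound. In other words, creating MIDI data requires either actually playing an instrument or preparing musical-score information.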
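[0005]
Both conventional approaches, PCM and MIDI, therefore have strengths and weaknesses as methods of encoding acoustic signals. Neither can deliver adequate sound quality from a small amount of information for general sound, yet demand for efficient encoding of general sound keeps growing. This demand has long been especially strong in fields that handle so-called vocal sound, that is, human speech and singing. Language education, vocal training and criminal investigation, for example, all need a technique for efficiently encoding vocal acoustic signals.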
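[0006]
An object of the present invention is therefore to provide an acoustic signal encoding method that can efficiently encode acoustic signals, including human speech and singing.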
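[0007]
[Means for Solving the Problems]
(1) A first aspect of the present invention is an acoustic signal encoding method for encoding an acoustic signal given as a time-series intensity signal, comprising:
a section setting stage of setting a plurality of unit sections on the time axis of the acoustic signal to be encoded;
a spectrum creation stage of creating, for each unit section, a spectrum whose first axis is the frequency components contained in the acoustic signal within that unit section and whose second axis is the intensity of each frequency component;
an intensity graph creation stage of defining, discretely along the first axis of the spectrum, Q codes indicating Q pitch steps, and creating from the spectrum of each unit section an intensity graph whose first axis is the Q codes and whose second axis is the intensity of each code;
an encoding stage of extracting, for each unit section and on the basis of the intensity of each code in the intensity graph, P representative codes for that unit section from among all Q codes, and expressing the acoustic signal of each unit section by the extracted representative codes and their intensities; and
an integration processing stage of replacing a series of representative codes in adjacent unit sections whose pitch differences lie within a predetermined range with an integrated code spanning those unit sections.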
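[0008]
(2) A second aspect of the present invention is the acoustic signal encoding method according to the first aspect, wherein, in the section setting stage, adjacent unit sections are set to partially overlap on the time axis and a reference position on the time axis is defined for each unit section. The representative codes obtained for a given unit section are output as codes encoding the acoustic signal at that unit section's reference position.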
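[0009]
(3) A third aspect of the present invention is the acoustic signal encoding method according to the second aspect, wherein a section length L and an offset length ΔL (where ΔL < L) are defined. The length of each unit section on the time axis is set to L, and for every i the distance on the time axis between the start of the i-th unit section and the start of the (i+1)-th unit section is set to ΔL. The representative codes obtained for each unit section are output as codes encoding the acoustic signal at reference positions that recur with period ΔL.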
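[0010]
(4) A fourth aspect of the present invention is the acoustic signal encoding method according to any of the first to third aspects, wherein, in the spectrum creation stage, the acoustic signal to be encoded is sampled at a predetermined sampling period and captured as digital acoustic data. The spectrum is created by performing a Fourier transform on the captured data for each unit section.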
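[0011]
(5) A fifth aspect of the present invention is the acoustic signal encoding method according to the third aspect, wherein, in the spectrum creation stage, a weighting function determined from the offset length ΔL is set as a window function. The spectrum is created by applying this window function to each unit section of the acoustic signal to be encoded and then performing a Fourier transform.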
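[0012]
(6) A sixth aspect of the present invention is the acoustic signal encoding method according to the fourth aspect, wherein, in the spectrum creation stage, a Fourier transform is performed on the digital acoustic data captured at the predetermined sampling period both without decimation and after decimation at a predetermined ratio. The resulting spectra are then combined.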
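[0013]
(7) A seventh aspect of the present invention is the acoustic signal encoding method according to any of the first to sixth aspects, wherein, in the intensity graph creation stage, the note numbers used in MIDI data serve as the Q codes. In the encoding stage, the acoustic signal of each unit section is expressed by MIDI-format code data consisting of the note numbers extracted as representative codes, velocities determined from their intensities, and a delta time determined from the length of the unit section.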
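[0014]
(8) An eighth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein, when representative codes are extracted in the encoding stage, P codes are taken from the candidates in the intensity graph to be encoded in descending order of intensity and used as the representative codes.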
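[0015]
(9) A ninth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein representative codes are extracted in the encoding stage as follows. For i = 1 to (P−1), the code with the greatest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code. That code and the codes corresponding to its harmonic components are then deleted from the candidates. Finally, the code with the greatest intensity among the remaining candidates is extracted as the P-th representative code, giving a total of P representative codes.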
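[0016]
(10) A tenth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein the sound source to be used for reproducing sound from each code is specified in advance. A correction table is defined from the frequency characteristics of the sound that this source reproduces for each code; the table substitutes a code of a given pitch with a lower-pitched code of which the given pitch is a harmonic.
When representative codes are extracted in the encoding stage, the following is repeated for i = 1 to P, giving a total of P representative codes. The code with the greatest intensity among the current candidates in the intensity graph to be encoded is taken as the i-th reference code. The code obtained by applying the correction table to the i-th reference code is extracted as the i-th representative code. Both the i-th reference code and the i-th representative code are then excluded from the candidates.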
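[0017]
(11) An eleventh aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein the sound source to be used for reproducing sound from each code is specified in advance. An intrinsic intensity graph is obtained in advance for each code by actually reproducing that code with the sound source and executing the spectrum creation stage and the intensity graph creation stage on the resulting acoustic signal.
When representative codes are extracted in the encoding stage, the following is repeated for i = 1 to (P−1). The code with the greatest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code. Each intensity value of that code's intrinsic intensity graph is then subtracted from the corresponding intensity value of the intensity graph to be encoded. Finally, the code with the greatest intensity among the remaining candidates is extracted as the P-th representative code, giving a total of P representative codes.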
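[0018]
(12) A twelfth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein the sound source to be used for reproducing sound from each code is specified in advance. The intrinsic waveform of the acoustic signal obtained by actually reproducing each code with that source is obtained in advance.
A code extraction process is defined that determines the i-th representative code as follows:
- the waveform information of the i-th acoustic signal is input;
- the spectrum creation stage and the intensity graph creation stage are performed on it;
- in the subsequent encoding stage, the code with the greatest intensity among the candidates in the resulting intensity graph is extracted as the i-th representative code;
- each intensity value of that code's intrinsic waveform is subtracted from the intensity values of the i-th acoustic signal, and the resulting acoustic signal becomes the (i+1)-th acoustic signal.
The section setting stage is performed on the original acoustic signal to be encoded, and the original acoustic signal of each unit section is taken as the first acoustic signal. The code extraction process is repeated for i = 1 to (P−1). Finally, the waveform information of the P-th acoustic signal is input and the spectrum creation and intensity graph creation stages are performed on it. In the subsequent encoding stage, the code with the greatest intensity among the candidates in the resulting intensity graph is extracted as the P-th representative code, so that a total of P representative codes is extracted for each unit section.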
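[0019]
(13) A thirteenth aspect of the present invention is a computer-readable recording medium on which is recorded a program that executes the acoustic signal encoding method according to any of the first to twelfth aspects.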
[0023]
[Embodiments of the Invention]
The present invention is described below on the basis of the illustrated embodiments.
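[0024]
§1. Basic principle of the acoustic signal encoding method according to the present invention
First, the basic principle of the acoustic signal encoding method according to the present invention is described with reference to FIG. 1. Suppose that an analog acoustic signal is given as a time-series intensity signal, as shown in FIG. 1(a). In the illustrated example, the horizontal axis is time t and the vertical axis is amplitude (intensity). The first step is to capture this analog acoustic signal as digital acoustic data. This can be done with the conventional PCM technique: the analog signal is sampled at a predetermined sampling period, and each amplitude is converted to digital data with a predetermined number of quantization bits. For convenience, the waveform of the PCM-digitized acoustic data is drawn identical to the analog acoustic signal in FIG. 1(a).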
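[0025]
Next, a plurality of unit sections is set on the time axis of the acoustic signal to be encoded. In the example of FIG. 1(a), six equally spaced times t1 to t6 are defined on the time axis t, and five unit sections d1 to d5 are set with these times as their start and end points (a more practical section-setting method is described later).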
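[0026]
Once the unit sections have been set, the acoustic signal in each unit section is Fourier-transformed to create a spectrum. It is preferable to cut out the acoustic signal with a well-known window function, such as the Hanning window, and apply the Fourier transform to the windowed signal. The Fourier transform generally assumes that the cut-out section repeats indefinitely before and after itself. A rectangular window (no window) therefore often leaves high-frequency noise on the resulting spectrum, and in such cases it is preferable to use a function whose weights fall to 0 at both ends of the section, such as the Hanning window. With a unit section length of L, the Hanning window function H(k) is given by
$$H(k) = 0.5 - 0.5\cos\left(\frac{2\pi k}{L}\right), \qquad k = 1, \dots, L$$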
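As a minimal sketch, the Hanning window above can be evaluated for k = 1 … L and applied to one unit section before transforming it. The sketch assumes NumPy and uses `numpy.fft.rfft` as the Fourier transform; the text prescribes neither.

```python
import numpy as np

def hanning_window(L: int) -> np.ndarray:
    """H(k) = 0.5 - 0.5*cos(2*pi*k/L), evaluated for k = 1..L."""
    k = np.arange(1, L + 1)
    return 0.5 - 0.5 * np.cos(2 * np.pi * k / L)

def windowed_spectrum(segment: np.ndarray) -> np.ndarray:
    """Weight one unit section by H(k) and return its complex spectrum A."""
    return np.fft.rfft(segment * hanning_window(len(segment)))
```

With this indexing the weight is 1 at k = L/2 and 0 at k = L, so samples near the section edges contribute little to the spectrum.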
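[0027]
FIG. 1(b) shows an example of the spectrum created for unit section d1. In this spectrum, the frequency f on the horizontal axis covers the frequency components contained in the acoustic signal of unit section d1 (0 to Fs, where Fs is the sampling frequency). The complex intensity A on the vertical axis gives the complex intensity of each frequency component. Besides the Fourier transform, various other methods of obtaining such a spectrum are known, and any of them may be used. If a method that creates the spectrum directly from the analog acoustic signal is used, the acoustic signal need not be digitized by PCM.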
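[0028]
Next, Q codes are defined discretely in correspondence with the frequency axis f of this spectrum. In this example, the note numbers N used in MIDI data serve as the codes, and 128 codes from N = 0 to 127 are defined. A note number N indicates the pitch of a note; for example, N = 69 denotes the A (A3) at the center of a piano keyboard, which corresponds to a 440 Hz tone. Because each of the 128 note numbers is associated with a specific frequency in this way, the 128 note numbers N are defined discretely at specific positions on the frequency axis f of the spectrum.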
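The text fixes only one anchor point (N = 69 at 440 Hz) and states in [0029] that an octave doubles the frequency. Assuming twelve equally tempered note numbers per octave, the mapping between note numbers and frequencies can be sketched as follows. The function names are illustrative only.

```python
import math

A_NOTE, A_FREQ = 69, 440.0  # anchor from the text: N = 69 corresponds to 440 Hz

def note_to_freq(n: float) -> float:
    """Centre frequency of note number n, assuming 12 equal steps per octave."""
    return A_FREQ * 2.0 ** ((n - A_NOTE) / 12.0)

def freq_to_note(f: float) -> float:
    """Fractional note number of frequency f (the log-scale axis of the intensity graph)."""
    return A_NOTE + 12.0 * math.log2(f / A_FREQ)
```

Under this assumption, the 128 note numbers span roughly 8.2 Hz (N = 0) to 12.5 kHz (N = 127).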
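[0029]
Note numbers N follow a logarithmic scale in which the frequency doubles with each octave, so they do not map linearly onto the frequency axis f. The frequency axis f is therefore expressed on a logarithmic scale, and an intensity graph is created with the note numbers N defined on that log axis. FIG. 1(c) shows the intensity graph created in this way for unit section d1. Its horizontal axis is the horizontal axis of the spectrum in FIG. 1(b) converted to a logarithmic scale, with note numbers N = 0 to 127 plotted at equal intervals. Its vertical axis is the complex intensity A of the spectrum in FIG. 1(b) converted to an effective intensity E, giving the intensity at the position of each note number N. The complex intensity A obtained by the Fourier transform generally has a real part R and an imaginary part I, and the effective intensity is obtained as E = (R^2 + I^2)^(1/2).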
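The sketch below covers the two conversions in this paragraph. It assumes NumPy's real FFT and the same 12-step-per-octave mapping with N = 69 at 440 Hz. The effective intensity E is the modulus of each complex bin, and each bin's frequency is placed on the logarithmic note-number axis. The DC bin is skipped because its logarithm is undefined.

```python
import numpy as np

def effective_intensity_on_log_axis(segment: np.ndarray, fs: float):
    """Return (note_positions, E) for every FFT bin above DC."""
    A = np.fft.rfft(segment)                       # complex intensity A = R + jI
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    E = np.sqrt(A.real ** 2 + A.imag ** 2)         # E = (R^2 + I^2)^(1/2), same as np.abs(A)
    notes = 69 + 12 * np.log2(freqs[1:] / 440.0)   # log-frequency axis in note numbers
    return notes, E[1:]
```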
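[0030]
The intensity graph of unit section d1 obtained in this way shows, as effective intensities, the proportion of each vibration component corresponding to note numbers N = 0 to 127 in the acoustic signal of unit section d1. Using the effective intensities in this graph, P note numbers are selected from all Q (here Q = 128) note numbers and extracted as the representative codes of unit section d1. For convenience, the case P = 3 is described here, in which three note numbers are extracted as representative codes from the 128 candidates. If extraction follows the criterion "extract P codes from the candidates in descending order of intensity", then in the example of FIG. 1(c) note number Np(d1, 1) is extracted as the first representative code, Np(d1, 2) as the second and Np(d1, 3) as the third.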
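[0031]
Once the P representative codes have been extracted, the acoustic signal of unit section d1 can be expressed by these codes and their effective intensities. In the example above, let the effective intensities of note numbers Np(d1, 1), Np(d1, 2) and Np(d1, 3) in the intensity graph of FIG. 1(c) be Ep(d1, 1), Ep(d1, 2) and Ep(d1, 3). The acoustic signal of unit section d1 can then be expressed by the following three data pairs.
[0032]
Np(d1, 1), Ep(d1, 1)
Np(d1, 2), Ep(d1, 2)
Np(d1, 3), Ep(d1, 3)
The processing above for unit section d1 is performed separately in the same way for each of unit sections d2 to d5, yielding data on their representative codes and intensities. For unit section d2, for example, the following three data pairs are obtained:
Np(d2, 1), Ep(d2, 1)
Np(d2, 2), Ep(d2, 2)
Np(d2, 3), Ep(d2, 3)
The original acoustic signal is encoded by the data obtained for each unit section in this way.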
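[0033]
FIG. 2 is a conceptual diagram of encoding by the method above. FIG. 2(a), like FIG. 1(a), shows five unit sections d1 to d5 set on the original acoustic signal. FIG. 2(b) shows the code data obtained for each unit section in the form of musical notes. In this example, three representative codes are extracted per unit section (P = 3), and their data is distributed over three tracks T1 to T3. For example, the representative codes Np(d1, 1), Np(d1, 2) and Np(d1, 3) extracted for unit section d1 are stored in tracks T1, T2 and T3 respectively. FIG. 2(b) is only a conceptual diagram that shows the code data of the present invention as notes; in reality, intensity data is attached to each note. Track T1, for example, holds the pitch data Np(d1, 1), Np(d2, 1), Np(d3, 1), … together with the intensity data Ep(d1, 1), Ep(d2, 1), Ep(d3, 1), ….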
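[0034]
The encoding format of the present invention need not be MIDI, but MIDI is the most widespread format of this kind, so MIDI-format code data is the most preferable in practice. In the MIDI format, "note-on" and "note-off" data occur with "delta time" data interposed between them. Note-on data specifies a note number N and a velocity V and instructs a particular sound to start. Note-off data specifies a note number N and a velocity V and instructs a particular sound to end. Delta-time data indicates a given time interval. The velocity V represents, for example, the speed at which a piano key is pressed (note-on velocity) or released (note-off velocity), and so expresses the strength of the operation that starts or ends a particular sound.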
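The note-on and note-off messages and the delta time can be serialized as in a Standard MIDI File track:
- a status byte of 0x90 (note-on) or 0x80 (note-off), ORed with the channel;
- a 7-bit note number byte and a 7-bit velocity byte;
- delta times written as variable-length quantities (7 bits per byte, high bit set on every byte except the last).

The sketch below is an illustration under those conventions and leaves out running status, the track header and the tick resolution. `segment_events` is a hypothetical helper, not part of the described method.

```python
def vlq(value: int) -> bytes:
    """Encode a delta time as a Standard MIDI File variable-length quantity."""
    if value < 0:
        raise ValueError("delta time must be non-negative")
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))

def note_on(channel: int, note: int, velocity: int) -> bytes:
    return bytes([0x90 | channel, note & 0x7F, velocity & 0x7F])

def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    return bytes([0x80 | channel, note & 0x7F, velocity & 0x7F])

def segment_events(channel: int, note: int, velocity: int, duration_ticks: int) -> bytes:
    """Hypothetical helper: one code sounding for one unit section on one channel."""
    return (vlq(0) + note_on(channel, note, velocity)
            + vlq(duration_ticks) + note_off(channel, note))
```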
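[0035]
In this embodiment, as described above, P note numbers Np(di, 1), Np(di, 2), …, Np(di, P) are obtained as representative codes for the i-th unit section di, together with their effective intensities Ep(di, 1), Ep(di, 2), …, Ep(di, P). MIDI-format code data is then created as follows. The note numbers N written in the note-on and note-off data are the obtained note numbers Np(di, 1), Np(di, 2), …, Np(di, P), unchanged. The velocities V written in the note-on and note-off data are obtained by normalizing the effective intensities Ep(di, 1), Ep(di, 2), …, Ep(di, P) to the range 0 to 1 and multiplying the square root of each normalized intensity by 127. That is, with Emax the maximum effective intensity, the velocity is
$$V = 127\sqrt{E / E_{\max}}$$
Alternatively, a logarithmic form may be used:
$$V = 127\log\left(E / E_{\max}\right) + 127$$
with V = 0 whenever this gives V < 0. The delta-time data may be set according to the length of each unit section.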
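Both velocity formulas are easy to evaluate. In this sketch the results are rounded to integers because MIDI velocities are integers; that rounding is our assumption. The text leaves the logarithm base unspecified, so base 10 is used purely for illustration.

```python
import math

def velocity_sqrt(E: float, E_max: float) -> int:
    """V = 127 * sqrt(E / E_max), rounded to an integer velocity."""
    return round(127 * math.sqrt(E / E_max))

def velocity_log(E: float, E_max: float) -> int:
    """V = 127 * log(E / E_max) + 127, clipped at 0 (base-10 log assumed)."""
    if E <= 0:
        return 0  # log is undefined; treat silence as velocity 0
    return max(0, round(127 * math.log10(E / E_max) + 127))
```

With the base-10 assumption, the logarithmic form reaches 0 at one tenth of E_max, while the square-root form still gives about 40 there.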
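[0036]
The embodiment above thus yields MIDI code data consisting of three tracks. Playing this data with a given MIDI sound source reproduces the original acoustic signal as three-channel stereo sound. General devices capable of playing MIDI code data can play 8 or 16 channels in stereo, so in practice it is preferable to set P = 8 or P = 16 and create MIDI code data with 8 or 16 tracks.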
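[0037]
In practice, the encoding procedure above is executed by a computer. A program implementing the encoding of the present invention can be supplied on a computer-readable recording medium such as a magnetic or optical disk. The code data produced by the encoding can likewise be supplied on such a medium.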
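[0038]
§2. A more practical section-setting method
The basic principle of the encoding method of the present invention has been described above. More practical encoding techniques follow, starting with a more practical way of setting the sections. In the example of FIG. 2(a), five unit sections d1 to d5 are set with the six equally spaced times t1 to t6 on the time axis t as their boundaries. Encoding based on such sections tends to produce audible discontinuities at the boundary times on playback. In practice it is therefore preferable to set the sections so that adjacent unit sections partially overlap on the time axis.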
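[0039]
FIG. 3(a) shows an example of such partially overlapping sections. Unit sections d1 to d4 all partially overlap one another, and performing the processing described above with these sections yields the encoding shown conceptually in FIG. 3(b). In this example, each note is placed at a reference position at the center of its unit section, but the reference position need not be the center. Comparing FIG. 3(b) with FIG. 2(b) shows that the note density has increased. Overlapping sections increase the amount of code data produced, but they allow natural encoding without discontinuities on playback.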
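[0040]
FIG. 4 shows a concrete technique for setting sections that partially overlap on the time axis. In this example, the acoustic signal is sampled at 22 kHz and captured as digital acoustic data. The section length L of each unit section is set to 1024 samples (about 47 msec), and the offset length ΔL, the shift between successive unit sections, is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start of the i-th unit section and the start of the (i+1)-th unit section is ΔL. The first unit section d1 thus contains samples 1 to 1024, and the second unit section d2, shifted by 20 samples, contains samples 21 to 1044.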
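The overlapping layout of FIG. 4 amounts to a sliding window with hop ΔL. The generator below is a minimal sketch:
- it uses 0-based half-open sample ranges, so the text's samples 1 to 1024 become [0, 1024);
- it places each reference position at the section centre, as in FIG. 3(b);
- it drops a final partial section, which is our assumption.

```python
def unit_sections(num_samples: int, L: int = 1024, dL: int = 20):
    """Yield (start, end, reference) for each overlapping unit section.

    Section i covers samples [i*dL, i*dL + L); its reference position is the
    section centre. Trailing samples that do not fill a whole section are
    ignored (an assumption; zero-padding would be another option).
    """
    if not 0 < dL < L:
        raise ValueError("require 0 < dL < L")
    start = 0
    while start + L <= num_samples:
        yield start, start + L, start + L // 2
        start += dL

sections = list(unit_sections(22050))  # one second of audio at an assumed 22.05 kHz
```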
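[0041]
When sections overlap in this way, adjacent unit sections share many samples, so the spectra obtained for them can be expected to differ very little. In the example above, the first unit section d1 and the second unit section d2 share samples 21 to 1024 and differ by only 20 samples. If adjacent unit sections do not yield sufficiently different spectra, the encoding cannot follow rapidly changing acoustic signals and the time resolution effectively drops. The remedy is a measure that makes a difference of only 20 samples produce a large change at the input of the Fourier transform.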
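[0042]
The present inventor therefore devised a modification of the window function mentioned in §1 that emphasizes the 20 samples that change. The well-known Hanning window described earlier tends to suppress variation between adjacent sections, which is counterproductive here. A function was therefore devised, and tried in practice, that keeps the Hanning window's decreasing weight toward both ends of the section while emphasizing the 20 changing samples. Specifically, with section length L and offset length ΔL, define
$$\alpha = \frac{L}{2} - \frac{\Delta L}{2}, \qquad \beta = \frac{L}{2} + \frac{\Delta L}{2},$$
so that the interval [α, β] is a central interval of width ΔL at the middle of the unit section. Then use the improved window function
$$H(k) = \begin{cases} 0.5 - 0.5\cos\left(\dfrac{\pi k}{2\alpha}\right), & k = 1, \dots, \alpha \\[4pt] 0.5 - 0.5\cos\left(\dfrac{\pi (k - \alpha)}{\Delta L} + \dfrac{\pi}{2}\right), & k = \alpha, \dots, \beta \\[4pt] 0.5 - 0.5\cos\left(\dfrac{\pi (k - \beta)}{2\alpha} + \dfrac{3\pi}{2}\right), & k = \beta, \dots, L \end{cases}$$
This improved window H(k) is a distribution function narrowed so that its full width at half maximum is exactly ΔL: H = 0.5 at k = α and k = β, and H = 1 at the center. Experiments with this function confirmed a sufficient effect.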
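The piecewise window can be evaluated directly. The sketch below assumes NumPy and implements the three branches above for k = 1 … L. At the shared boundaries k = α and k = β the adjacent branches agree, both giving 0.5.

```python
import numpy as np

def improved_window(L: int, dL: int) -> np.ndarray:
    """Piecewise raised-cosine window with half-maximum width dL, for k = 1..L.

    Rises from 0 to 0.5 over [1, alpha], from 0.5 up to 1 and back to 0.5
    over [alpha, beta], then falls from 0.5 to 0 over [beta, L].
    """
    alpha = L / 2 - dL / 2
    beta = L / 2 + dL / 2
    k = np.arange(1, L + 1, dtype=float)
    H = np.empty(L)
    left, right = k <= alpha, k >= beta
    mid = ~left & ~right
    H[left] = 0.5 - 0.5 * np.cos(np.pi * k[left] / (2 * alpha))
    H[mid] = 0.5 - 0.5 * np.cos(np.pi * (k[mid] - alpha) / dL + np.pi / 2)
    H[right] = 0.5 - 0.5 * np.cos(np.pi * (k[right] - beta) / (2 * alpha) + 3 * np.pi / 2)
    return H

w = improved_window(1024, 20)  # the L = 1024, dL = 20 case of [0040]
```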
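[0043]
With 22 kHz sampling and a unit section length of 1024 samples, as in the example above, the logarithmic conversion was found to yield continuous data only for the upper half of the 128 note numbers. The data for the bass range had gaps ("missing teeth"), and the spectrum as a whole was biased toward high notes. Covering all 128 note numbers would require a section length L of at least 8192 samples, eight times as long. Multiplying L by 8, however, would increase the computation time per section 64-fold and aggravate the loss of time resolution described above, which is impractical.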
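[0044]
The present inventor therefore devised a technique that obtains a separate, bass-focused spectrum with the same section length and combines it with the normal spectrum. The bass-focused spectrum can be obtained easily, at the same computational load as the normal spectrum. In FIG. 1(b), for example, keeping the section length L the same but setting the sampling frequency to Fs/8, one eighth of the normal value, gives a spectrum in which the frequency components below Fs/8 are magnified. This is equivalent to decimating the acoustic signal to 1/8 of its samples, taking a section of the same length and Fourier-transforming it; the section then spans eight times as much time. Fortunately, although raising the sampling frequency of an already discrete signal is difficult, lowering it is easy. Combining the resulting 1/8-decimated spectrum with the normal spectrum was confirmed to cover all note numbers from 24 upward. (Note number 24 is the lowest note of the piano; notes below it cannot be produced by ordinary instruments and are unnecessary in practice.) Moreover, the computational load of this technique amounts to no more than two 1024-sample Fourier transforms.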
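The following sketch computes the two transforms for one unit section and assumes NumPy. The normal spectrum uses L consecutive samples. The bass-focused spectrum uses every eighth sample, so the same L-point transform spans 8L original samples and its bin spacing shrinks from Fs/L to Fs/(8L). The text does not say whether a low-pass filter precedes the decimation or how the two spectra are merged. Plain decimation is used here, and merging is left to the caller; one possibility is to take low note numbers from the decimated spectrum.

```python
import numpy as np

def two_resolution_spectra(x, start, fs, L=1024, factor=8):
    """Return ((freqs, E) normal, (freqs, E) decimated) for one unit section.

    The decimated spectrum takes every `factor`-th sample without an
    anti-aliasing filter, a simplification not specified by the text.
    """
    seg = x[start:start + L]
    low = x[start:start + factor * L:factor]
    if len(seg) < L or len(low) < L:
        raise ValueError("not enough samples for both transforms")
    normal = (np.fft.rfftfreq(L, 1 / fs), np.abs(np.fft.rfft(seg)))
    bass = (np.fft.rfftfreq(L, factor / fs), np.abs(np.fft.rfft(low)))
    return normal, bass
```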
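[0045]
To obtain the effective intensity E for each of the 128 note numbers on the horizontal axis of the intensity graph, a predetermined frequency range may, for example, be assigned to each note number N. The average of the effective intensities of the frequencies within that range is then taken as the effective intensity of N. FIG. 5 is a graph illustrating this concept. Converting the horizontal axis of the spectrum obtained by the Fourier transform to a logarithmic scale, and its vertical axis to effective intensity, yields a graph like FIG. 5. The frequency values 259, 280, 291, … (Hz) on the horizontal axis correspond to note numbers N = 60, 61, 62, …. To obtain the effective intensity of note number N = 61, for example, a predetermined frequency range near 280 (the hatched region in the figure) is assigned to N = 61. The maximum effective intensity within that range is then taken as the effective intensity of N = 61.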
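The paragraph mentions both the average and, in its FIG. 5 example, the maximum over each note's range, so the sketch below takes the reducer as a parameter. It assumes NumPy and uses the range between the half-semitone points around each note's centre frequency (12-TET, N = 69 at 440 Hz) as the "predetermined range"; that range is our choice. Notes whose range contains no FFT bin stay at 0, which reproduces the gaps described in [0043].

```python
import numpy as np

def intensity_graph(freqs, E, reducer=np.max, n_notes=128):
    """Collapse a spectrum (freqs, E) into one effective intensity per note number."""
    graph = np.zeros(n_notes)
    for n in range(n_notes):
        lo = 440.0 * 2 ** ((n - 0.5 - 69) / 12)  # half a semitone below the centre
        hi = 440.0 * 2 ** ((n + 0.5 - 69) / 12)  # half a semitone above the centre
        in_range = (freqs >= lo) & (freqs < hi)
        if in_range.any():
            graph[n] = reducer(E[in_range])
    return graph
```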
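[0046]
§3. Integration of codes
As described in §2, partially overlapping sections considerably increase the number of codes produced. This section describes an integration process that effectively reduces the final number of codes.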
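[0047]
Consider, for example, codes represented by the notes shown in FIG. 6(a). In the illustrated example all codes are eighth notes, because the section length L is constant and every code produced therefore has the same length. The notes in FIG. 6(a) can, however, be rewritten as shown in FIG. 6(b): consecutive notes of the same pitch can be integrated into a single note. In other words, the notes of the individual unit sections can be replaced by a note spanning several unit sections.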
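[0048]
In the example of FIG. 6 only notes of the same pitch are integrated, but integration is not limited to notes of the same pitch; notes with some degree of similarity may also be integrated. For example, a series of notes that differ from one another by only one step may be replaced by a single note, for instance the lower note of the series. More generally, when adjacent unit sections contain representative codes that are similar to one another under a predetermined condition, replacing them with an integrated code spanning those unit sections reduces the number of notes.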
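[0049]
FIG. 6 illustrates the concept of integrating codes with notes, but each code produced by the encoding of the present invention also carries intensity data (velocity, in the case of MIDI data). When codes are integrated, their intensity data must therefore be integrated as well. If the codes to be integrated have different intensities, the largest may, for example, be used as the intensity of the integrated code. With MIDI data, however, merging two codes sounds unnatural when the following code is considerably stronger than the preceding one. The reason is that an ordinary MIDI sound source plays instrument sounds, whose intensity generally decays over time. Replacing two codes with a single integrated code is therefore natural if the following code is weaker than the preceding one, but unnatural in the opposite case. It is accordingly preferable to set a condition such as: do not integrate when the intensity difference between the two codes is at least a predetermined threshold and the following code is the stronger one.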
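The integration rules of [0047] to [0049] can be sketched for a single track as follows. Several details are our assumptions and are labelled in the code:
- Pitch tolerance: `max_step` sets the largest pitch difference that is still merged (0 means same pitch only, 1 is the one-step variant), and the merged note keeps the lower pitch.
- Intensity: the merged velocity is the largest one seen.
- Loudness rule: a following code at least `max_rise` louder than the note built so far starts a new note. The threshold value and the comparison against the running maximum are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Note:
    start: int     # index of the first unit section covered
    length: int    # number of consecutive unit sections covered
    number: int    # MIDI note number
    velocity: int  # MIDI velocity

def integrate_track(codes, max_step=0, max_rise=20):
    """Merge per-section codes [(note_number, velocity), ...] on one track."""
    notes = []
    for i, (num, vel) in enumerate(codes):
        cur = notes[-1] if notes else None
        if (cur is not None
                and abs(num - cur.number) <= max_step
                and vel - cur.velocity < max_rise):  # hypothetical loudness rule
            cur.length += 1
            cur.number = min(cur.number, num)      # keep the lower pitch
            cur.velocity = max(cur.velocity, vel)  # keep the largest intensity
        else:
            notes.append(Note(i, 1, num, vel))
    return notes
```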
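[0050]
Integration reduces the number of codes, so it is desirable to promote it as far as possible. The most effective way to do so is to sort the codes by frequency before storing them in the tracks. The notes in FIG. 6(a) are codes stored on a single track, and notes can normally be integrated only if they are on the same track. In reality, however, P tracks are defined as shown in FIG. 2(b) (P = 3 in that example), and the P codes extracted for each unit section are distributed among these tracks. The probability that integrable notes appear therefore depends heavily on how the codes are distributed. For example, when three codes are distributed among tracks T1, T2 and T3 as in FIG. 2(b), a fixed rule can be used: the lowest-frequency code goes to T1, the next lowest to T2 and the highest to T3. This rule should make integrable notes more likely than distributing the codes without regard to frequency.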
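[0051]
In short, sorting the P codes of each unit section by frequency before distributing them among the P tracks increases the number of codes that can be integrated. FIG. 7 is a conceptual diagram of frequency sorting for P = 8. With P = 8, eight note numbers Np(d1, 1) to Np(d1, 8) are extracted as the representative codes of a unit section d1. They are extracted, for example, in order of effective intensity, and are therefore sorted by effective intensity (left column of FIG. 7). Sorting them by frequency rearranges them as in, for example, the middle column of FIG. 7. The sorted note numbers are then stored in the eight tracks T1 to T8 as in the right column of FIG. 7. The lowest-frequency (smallest) of the eight note numbers then always goes to track T1, and the highest-frequency (largest) always to track T8. As a result, integrable note numbers appear more often on every track.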
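The frequency sort amounts to ordering each section's codes by note number before assigning them to tracks. A minimal sketch, assuming every unit section supplies exactly P (note_number, velocity) pairs:

```python
def assign_to_tracks(sections):
    """Distribute per-section codes over P tracks, lowest pitch -> first track.

    sections: list of lists of (note_number, velocity), each inner list holding
    the P codes of one unit section in extraction (intensity) order.
    Returns P tracks, each holding one (note_number, velocity) per section.
    """
    P = len(sections[0])
    tracks = [[] for _ in range(P)]
    for codes in sections:
        for t, code in enumerate(sorted(codes)):  # sort by note number, i.e. by frequency
            tracks[t].append(code)
    return tracks
```

Because a higher note number always means a higher frequency, sorting by note number is equivalent to sorting by frequency.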
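[0052]
§4. Methods of extracting representative codes
In the example of FIG. 1(c), three note numbers Np(d1, 1), Np(d1, 2) and Np(d1, 3) are extracted as representative codes from the 128 note numbers on the horizontal axis of the intensity graph of unit section d1. Each is stored in a separate track, T1, T2 or T3. In general, when P tracks T1 to TP are prepared, P note numbers Np(d1, 1), Np(d1, 2), …, Np(d1, P) must be extracted as representative codes. Five concrete methods of extracting these representative codes are described below.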
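[0053]
<<< 4.1 First extraction method >>>
The first method takes P codes from the candidates in the intensity graph to be encoded in descending order of intensity and uses them as the representative codes. The three note numbers Np(d1, 1), Np(d1, 2) and Np(d1, 3) in FIG. 1(c) were extracted by this method. In the intensity graph of FIG. 1(c), the note number with the largest effective intensity E, Np(d1, 1), is extracted as the first representative code. The note number with the second largest, Np(d1, 2), is extracted as the second, and the one with the third largest, Np(d1, 3), as the third.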
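[0054]
FIG. 8 illustrates the principle of the first method. For convenience, consider a simple case in which the five note numbers Na, Nb, Nc, Nd and Ne have the effective intensities shown in FIG. 8(a) and all other note numbers have zero effective intensity. (In reality, all 128 note numbers generally have some effective intensity.) By the first method, the note number Nb, which has the largest effective intensity of the five candidates, is extracted as the first representative code. An extracted note number is deleted from the candidates, so Nb is deleted here; FIG. 8(b) shows its graph as a broken line. Next, of the remaining four candidates shown as solid lines in FIG. 8(b), the note number Nc with the largest effective intensity is extracted as the second representative code and deleted from the candidates. This is repeated until the P-th representative code has been extracted.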
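The first method is a repeated "take the maximum, then delete it" over the candidates, which is equivalent to taking the top P by intensity. A minimal sketch over an intensity graph indexed by note number (ties go to the lower note number, an arbitrary choice):

```python
def extract_top_p(graph, P):
    """First method: the P note numbers with the largest intensities, in extraction order."""
    candidates = set(range(len(graph)))
    chosen = []
    for _ in range(P):
        best = max(candidates, key=lambda n: (graph[n], -n))  # ties -> lower note number
        chosen.append(best)
        candidates.remove(best)  # an extracted code is deleted from the candidates
    return chosen
```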
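[0055]
The representative codes of each unit section are meant to indicate the principal frequency components of the original acoustic signal within that section. In principle, therefore, the first method, which extracts P representative codes in descending order of effective intensity, appears to be the most appropriate. When encoding was actually performed with this method, however, the pitch of the reproduced sound was found to shift upward overall. For example, a male speaking voice was encoded with the first method as the original acoustic signal, and the resulting code data was played with an ordinary MIDI sound source. The reproduced sound resembled a somewhat shrill female voice rather than the original male voice.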
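[0056]
The present inventor believes this occurs because the instrument sounds used by MIDI sound sources contain harmonic components, that is, components at integer multiples of the fundamental frequency. For example, the fundamental frequency of the A (A3) at the center of a piano keyboard is 440 Hz. Actually playing that key, however, produces not only the 440 Hz fundamental but also a component at twice the frequency, 880 Hz (the A one octave higher, A4), and components at three, four, … times the frequency (harmonics). Thus if note number N = 69 (A3) is extracted as a representative code, playback produces harmonics at 880 Hz, 1320 Hz and so on in addition to its 440 Hz fundamental. When the first method extracts P representative codes in descending order of effective intensity, playback with a MIDI sound source adds these harmonics to the fundamental of each representative code. The whole signal is therefore reproduced with the high range boosted, which the inventor considers to be the cause of the overall upward pitch shift on playback.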
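[0057]
Building on this explanation, the present inventor arrived at methods of extracting representative codes that suppress the overall upward pitch shift on playback. Each of the extraction methods described below is based on this idea.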
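[0058]
<<< 4.2 Second extraction method >>>
The second method repeats the following for i = 1 to (P−1). The code with the greatest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code. That code and the codes corresponding to its harmonic components are then deleted from the candidates. Finally, the code with the greatest intensity among the remaining candidates is extracted as the P-th representative code, giving a total of P representative codes.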
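[0059]
Consider, for example, the case in FIG. 9(a), where the five note numbers Na, Nb, Nc, Nd and Ne have the effective intensities shown. First, with i = 1, the note number Nb, which at this point has the greatest intensity among the candidates, is extracted as the first representative code. Nb and the codes corresponding to its harmonic components are then deleted from the candidates. If, for example, Nc is a harmonic of Nb, Nc is deleted together with the already extracted Nb; FIG. 9(b) shows the deleted note numbers Nb and Nc as broken lines. Next, i is updated to 2, and the note number Na, which has the greatest intensity among the remaining candidates, is extracted as the second representative code. Na and the codes corresponding to its harmonics are then deleted from the candidates.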
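[0060]
Repeating this process with i = 3, 4, … up to i = P−1 completes extraction of the first (P−1) representative codes. Finally, extracting the code with the greatest intensity among the remaining candidates as the P-th representative code yields a total of P representative codes.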
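The text does not say which harmonics count as harmonic components. The sketch below assumes 12-TET and treats harmonics 2 to 8 as harmonics, each mapped to its nearest note number. The offsets are therefore 12, 19, 24, 28, 31, 34 and 36 semitones; the upper limit of 8 is our choice. The pick sequence follows the second method described above.

```python
import math

def harmonic_offsets(max_harmonic=8):
    """Note-number offsets of harmonics 2..max_harmonic, rounded to the nearest semitone."""
    return [round(12 * math.log2(h)) for h in range(2, max_harmonic + 1)]

def extract_without_harmonics(graph, P, max_harmonic=8):
    """Second method: after each of the first P-1 picks, drop the pick and its harmonics."""
    candidates = set(range(len(graph)))
    offsets = harmonic_offsets(max_harmonic)
    chosen = []
    for i in range(P):
        if not candidates:
            break  # fewer than P candidates survived the deletions
        best = max(candidates, key=lambda n: (graph[n], -n))
        chosen.append(best)
        candidates.discard(best)
        if i < P - 1:  # the P-th pick needs no further deletion
            candidates -= {best + d for d in offsets}
    return chosen
```

Unlike the first method, a strong second harmonic (here note 57 above a fundamental at 45) can no longer be picked as a representative code.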
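[0061]
In the second method, the codes corresponding to the harmonics of each extracted representative code are deleted from the candidates. The final P representative codes therefore contain no pair of codes in a harmonic relationship. This alleviates the shrill reproduced sound caused by emphasized harmonics. The second method cannot, however, completely suppress the shrillness, presumably because ordinary MIDI sound sources include sounds whose harmonic components are stronger than their fundamental.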
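[0062]
FIG. 10 is a table of the measured peak frequencies for note numbers N = 24 to 84 of an ordinary piano MIDI sound source. Note number N = 24, for example, nominally corresponds to the pitch C0. When this note is reproduced by the MIDI sound source, however, its peak frequency is 129 Hz, higher than the fundamental frequency of the nominal pitch. The "corresponding pitch" column of the table gives the pitch that corresponds to this peak frequency, which for N = 24 is C2. In other words, when note N = 24 (nominally C0) is reproduced, its fourth harmonic, at the frequency of C2 (129 Hz), is actually stronger than the fundamental at C0. This tendency appears mainly for notes at or below N = 57. Among those notes, N = 41, 45, 46, 48, 49, 52, 54 and 56 have the fundamental as their strongest component, so the nominal pitch matches the pitch of the peak frequency. For all the others, a harmonic is stronger than the fundamental and the pitch of the peak frequency does not match the nominal pitch. For notes at or above N = 58, the fundamental is always the strongest component and the nominal pitch matches the pitch of the peak frequency.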
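The reading of the N = 24 entry can be checked with a little arithmetic, again assuming equal temperament with N = 69 at 440 Hz. The nominal fundamental of N = 24 is about 32.7 Hz, and the measured 129 Hz peak is close to four times that. The nearest note number to 129 Hz is 48, two octaves (24 note numbers) above 24.

```python
import math

def nearest_note(freq_hz: float) -> int:
    """Nearest note number to a frequency (12-TET, N = 69 at 440 Hz, assumed)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def harmonic_number(nominal_note: int, peak_hz: float) -> int:
    """Which harmonic of the nominal fundamental the measured peak is closest to."""
    fundamental = 440.0 * 2 ** ((nominal_note - 69) / 12)
    return round(peak_hz / fundamental)
```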
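[0063]
Because of this characteristic, the second method cannot completely suppress the overall upward pitch shift on playback. When notes at or below N = 57 are extracted as representative codes, their harmonic components are stronger than their nominal pitches, so the high range is still emphasized on playback.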
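[0064]
<<< 4.3 Third extraction method >>>
The third extraction method takes characteristics like those of FIG. 10 into account. The sound source used to reproduce sound from each code is specified in advance, and the frequency characteristics of the sound this source reproduces for each code (for example, those shown in FIG. 10) are obtained. A predetermined correction table is then defined from these characteristics; specifically, a table that substitutes the sound of a given note number with that of a lower note number. For playback with a source having the characteristics of FIG. 10, for example, note number N = 48 (C2) can be substituted by the lower note number N = 24 (C0). The reason is that when N = 24 is played, its harmonic C2 is stronger than its nominal pitch C0.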
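[0065]
For each note number that can be substituted by a lower note in this way, the substitute note number is determined in advance, and the corresponding correction instructions are prepared in the form of a correction table. When representative codes are extracted, the code actually extracted is corrected by referring to this table. Suppose, for example, that note number N = 48 (C2) would normally be extracted because it has the greatest intensity at that point. If the correction table instructs that N = 48 (C2) be corrected to N = 24 (C0), then N = 24 (C0) is extracted as the representative code instead.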
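[0066]
In summary, the following is repeated for i = 1 to P, giving a total of P representative codes. The code with the greatest intensity among the current candidates in the intensity graph to be encoded is taken as the i-th reference code. The code obtained by applying the prepared correction table to the i-th reference code is extracted as the i-th representative code. Both the i-th reference code and the i-th representative code are then excluded from the candidates.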
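A minimal sketch of this loop is shown below. The correction table is modelled as a dict from reference note to substitute note, and notes absent from it pass through unchanged. `EXAMPLE_TABLE` is hypothetical: it holds only the single N = 48 to N = 24 entry discussed in the text, whereas a real table would be derived from measurements such as FIG. 10.

```python
def extract_with_correction(graph, P, correction):
    """Third method: map each strongest remaining code through a correction table."""
    candidates = set(range(len(graph)))
    chosen = []
    for _ in range(P):
        if not candidates:
            break
        ref = max(candidates, key=lambda n: (graph[n], -n))  # i-th reference code
        rep = correction.get(ref, ref)                       # i-th representative code
        chosen.append(rep)
        candidates -= {ref, rep}  # exclude both from further selection
    return chosen

EXAMPLE_TABLE = {48: 24}  # hypothetical: C2 reproduced via the strong 4th harmonic of C0
```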
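[0067]
Consider, for example, the case in FIG. 11(a), where the five note numbers Na, Nb, Nc, Nd and Ne have the effective intensities shown. First, with i = 1, the note number Nb, which at this point has the greatest intensity among the candidates, is extracted as the first reference code, and the prepared correction table is applied to it. If the table instructs that Nb be corrected to Nb*, the corrected note number Nb* is extracted as the first representative code, as shown in FIG. 11(b). At this time, the first reference code Nb and the corrected note number Nb* are excluded from the candidates. FIG. 11(c) shows the excluded note number Nb as a broken line; Nb* is not shown because its intensity is close to 0 to begin with.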
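[0068]
Next, i is updated to 2, and the note number Nc, which has the greatest intensity among the remaining candidates, is extracted as the second reference code, as shown in FIG. 11(c). The prepared correction table is then applied to it. If the table instructs that Nc be corrected to Nc*, the corrected note number Nc* is extracted as the second representative code, as shown in FIG. 11(d). At this time, the second reference code Nc is excluded from the candidates. Repeating this process with i = 3, 4, … up to i = P completes extraction of all P representative codes.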
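[0069]
How the correction table is created is important for the third method: an inappropriate table shifts the corrected pitch far from the correct one. Strictly speaking, the correction table depends on the sound source used for playback. Ordinary MIDI sound sources, however, often have similar frequency characteristics, so a table prepared for one source can be reused with other sources to a certain degree.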
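[0070]
According to experiments by the present inventor, the third method reduces the overall upward pitch shift on playback to some extent. To suppress the shift more effectively, however, it is preferable to combine the third method with the second. That is, after note number Nb* is extracted as the first representative code as shown in FIG. 11(b), the first reference code Nb and its harmonic components are deleted from the candidates. If, for example, Nc is a harmonic of Nb, both Nb and Nc are deleted, as shown by the broken lines in FIG. 11(e), and Na is extracted as the second reference code. Suppose the correction table instructs that Na be corrected to Na*; the corrected Na* is then extracted as the second representative code, as shown in FIG. 11(f). At this time, the second reference code Na and its harmonic components are deleted from the candidates. Repeating this process with i = 3, 4, … up to i = P completes extraction of all P representative codes.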
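[0071]
<<< 4.4 Fourth extraction method >>>
In the fourth method, the sound source used for reproducing sound is specified in advance, and the waveforms of the acoustic signals obtained by actually playing each code with that source are measured. The spectrum creation stage and intensity graph creation stage described in §1 are then executed on these waveforms to obtain an intensity graph for each code in advance. Concretely, the 128 sounds of note numbers N = 0 to 127 are played with an actual MIDI sound source, and a spectrum like that of FIG. 1(b) is obtained for each reproduced waveform. For example, a section of the same length as the unit section of §1 can be set as a representative section and its spectrum obtained. The representative section should avoid the rising and falling portions of the signal as far as possible; alternatively, an average spectrum over several suitable sections may be obtained. An intensity graph like that of FIG. 1(c) is then obtained for each code (note number); it is called that code's intrinsic intensity graph. This yields 128 intrinsic intensity graphs for note numbers N = 0 to 127, which completes the preparation stage of the fourth method.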
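[0072]
When the acoustic signal to be encoded is actually encoded, representative codes are extracted as follows. For i = 1 to (P−1), the code with the greatest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code. Each intensity value of that code's intrinsic intensity graph is then subtracted from the corresponding intensity value of the intensity graph to be encoded. Finally, the code with the greatest intensity among the remaining candidates is extracted as the P-th representative code, giving a total of P representative codes.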
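[0073]
Consider, for example, the case in FIG. 12(a), where the five note numbers Na, Nb, Nc, Nd and Ne have the effective intensities shown. First, with i = 1, the note number Nb, which at this point has the greatest intensity among the candidates, is extracted as the first representative code. The intrinsic intensity graph of Nb was obtained in the preparation stage described above. Suppose that playing Nb on a particular MIDI sound source gives the reproduced waveform of FIG. 12(b). Executing the spectrum creation and intensity graph creation stages of §1 on that waveform then produced the intrinsic intensity graph of Nb shown in FIG. 12(c). Each intensity value of this intrinsic intensity graph is subtracted from the corresponding intensity value of the intensity graph to be encoded in FIG. 12(a). FIG. 12(d) shows the result of the subtraction, with the portions it removed drawn as broken lines. As a result, the "intensity graph to be encoded" at this point becomes the graph of FIG. 12(e).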
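[0074]
Next, i is updated to 2, and the note number Na, which has the greatest intensity among the remaining candidates, is extracted as the second representative code, as shown in FIG. 12(e). Each intensity value of the intrinsic intensity graph of Na (not shown) is then subtracted from the corresponding intensity value of the graph of FIG. 12(e), the current "intensity graph to be encoded". The resulting graph becomes the new "intensity graph to be encoded".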
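[0075]
Repeating this process with i = 3, 4, … up to i = P−1 completes extraction of the first (P−1) representative codes. Finally, extracting the code with the greatest intensity among the remaining candidates as the P-th representative code yields a total of P representative codes.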
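[0076]
The fourth method depends heavily on the characteristics of the sound source used for playback, so it suits cases where that source is known in advance. Each time a representative code is extracted, the method subtracts the frequency components contained in that code's actual reproduced sound, which enables very faithful reproduction.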
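[0077]
In the example above, the intensity values of the intrinsic intensity graph are subtracted as they are, but they may instead be normalized before subtraction. For example, let X be the intensity value of the extracted note number Nb in the "intensity graph to be encoded" of FIG. 12(a). Let Y be the intensity value of the same note number Nb in the intrinsic intensity graph of FIG. 12(c). Multiplying each intensity value of the intrinsic intensity graph by X/Y before subtracting makes the intensity of Nb zero in the resulting new "intensity graph to be encoded". This prevents the same note number Nb from being extracted repeatedly as a representative code.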
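The following is a compact sketch of the fourth method, including the optional X/Y scaling. It assumes NumPy and a precomputed 128 × 128 array of intrinsic intensity graphs, with one row per note measured beforehand from the target sound source. Clipping negative residues to zero is our own assumption.

```python
import numpy as np

def extract_by_graph_subtraction(graph, intrinsic, P, normalize=True):
    """Fourth method: subtract each pick's intrinsic intensity graph.

    graph: intensity graph of the unit section (length 128).
    intrinsic: array of shape (128, 128); row n is the intrinsic graph of note n.
    normalize: scale row n by X/Y so the picked note drops to exactly zero.
    """
    g = np.asarray(graph, dtype=float).copy()
    chosen = []
    for i in range(P):
        n = int(np.argmax(g))
        chosen.append(n)
        if i == P - 1:
            break
        row = intrinsic[n]
        scale = g[n] / row[n] if normalize and row[n] > 0 else 1.0  # X / Y
        g = np.clip(g - scale * row, 0.0, None)  # clipping is an assumption
    return chosen
```

In the test data below, each synthetic intrinsic graph has a fundamental plus a 0.6-strength octave. The octave of note 48 is therefore removed together with it, and note 55 is picked second.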
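[0078]
<<< 4.5 Fifth extraction method >>>
The preparation stage of the fifth method is the same as that of the fourth: the sound source used for reproducing sound is specified in advance, and the waveforms of the acoustic signals obtained by actually playing each code with that source are measured. In the fifth method, however, the measured waveforms themselves are stored and later used for subtraction. Specifically, the 128 sounds of note numbers N = 0 to 127 are played with an actual MIDI sound source, and the 128 reproduced waveforms are stored unchanged. These reproduced waveforms are called the intrinsic waveforms of the respective codes.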
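[0079]
When the acoustic signal to be encoded is actually encoded, the following "code extraction process" is defined and executed repeatedly to determine the i-th representative code:
- the waveform information of the i-th acoustic signal is input;
- the spectrum creation stage and intensity graph creation stage described in §1 are performed on it;
- in the subsequent encoding stage, the code with the greatest intensity among the candidates in the resulting intensity graph is extracted as the i-th representative code;
- each intensity value of the intrinsic waveform of the i-th representative code is subtracted from the intensity values of the i-th acoustic signal, and the resulting acoustic signal is taken as the (i+1)-th acoustic signal.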
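[0080]
In the fifth method, section setting is first applied to the original acoustic signal to be encoded, setting a plurality of unit sections on the time axis. The acoustic signal of each unit section is taken as the first acoustic signal, and the "code extraction process" above is repeated for each unit section for i = 1 to (P−1). Finally, the waveform information of the P-th acoustic signal is input and the spectrum creation and intensity graph creation stages of §1 are performed on it. In the subsequent encoding stage, the code with the greatest intensity among the candidates in the resulting intensity graph is extracted as the P-th representative code, so that a total of P representative codes is extracted for each unit section. This is the basic procedure of the fifth method; it is described below with reference to the concrete example of FIG. 13.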
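The following is a minimal sketch of the fifth method for one unit section, assuming NumPy. The intensity graph here is a simplified stand-in for §1: a rectangular window, each FFT bin assigned to its nearest note number under 12-TET with N = 69 at 440 Hz, and the per-note maximum. The stored intrinsic waveform is subtracted sample by sample as-is, with no level matching or time alignment, because the text specifies none.

```python
import numpy as np

def note_graph(x, fs):
    """Simplified intensity graph: max |FFT| over bins whose nearest note is n."""
    E = np.abs(np.fft.rfft(x))
    f = np.fft.rfftfreq(len(x), 1 / fs)
    notes = np.full(E.shape, -1)  # the DC bin belongs to no note
    notes[1:] = np.round(69 + 12 * np.log2(f[1:] / 440.0)).astype(int)
    g = np.zeros(128)
    for n in range(128):
        sel = notes == n
        if sel.any():
            g[n] = E[sel].max()
    return g

def extract_by_waveform_subtraction(section, intrinsic_waveforms, P, fs):
    """Fifth method: intrinsic_waveforms maps note -> waveform (at least section-long)."""
    signal = np.asarray(section, dtype=float)
    chosen = []
    for i in range(P):
        n = int(np.argmax(note_graph(signal, fs)))
        chosen.append(n)
        if i < P - 1:  # form the (i+1)-th acoustic signal
            signal = signal - intrinsic_waveforms[n][:len(signal)]
    return chosen
```

Because each synthetic "intrinsic waveform" in the test carries a 0.8-strength octave, subtracting the picked note's waveform also removes that octave before the second pick.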
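[0081]
First, the section setting stage is applied to the original acoustic signal to be encoded, and the original acoustic signal of each unit section becomes the first acoustic signal. The following processing is performed separately for each unit section. First, i is set to 1 and the first "code extraction process" is executed. Suppose that a signal with the waveform shown in FIG. 13(a) is input as the first acoustic signal of a unit section d; it is, so to speak, the original signal of unit section d. This signal is Fourier-transformed to obtain a spectrum, and an intensity graph is obtained from that spectrum. Assume that the intensity graph of FIG. 13(b) results.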
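[0082]
Next, the note number Nb, which has the greatest intensity in this intensity graph, is extracted as the first representative code. Each intensity value of the first representative code's intrinsic waveform, shown in FIG. 13(c), is then subtracted from the intensity values of the first acoustic signal in FIG. 13(a); this intrinsic waveform was obtained and stored in the preparation stage. Suppose this yields the acoustic signal of FIG. 13(d). This subtraction result becomes the second acoustic signal.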
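[0083]
Next, i is updated to 2. The second acoustic signal of FIG. 13(d) is Fourier-transformed to obtain a spectrum, and an intensity graph is obtained from it; assume that the intensity graph of FIG. 13(e) results. The note number Nc, which has the greatest intensity in this graph, is extracted as the second representative code. Each intensity value of the second representative code's intrinsic waveform (not shown) is then subtracted from the intensity values of the second acoustic signal of FIG. 13(d) to obtain the third acoustic signal.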
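[0084]
Repeating this process with i = 3, 4, … up to i = P−1 completes extraction of the first (P−1) representative codes. Finally, extracting the code with the greatest intensity among the remaining candidates as the P-th representative code yields a total of P representative codes.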
[0085]
Executing the above processing for each unit section yields P representative codes for each unit section.
[0086]
[Effects of the Invention]
As described above, the encoding method of the present invention enables efficient encoding of acoustic signals.
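[Brief Description of the Drawings]
[FIG. 1] A diagram showing the basic principle of the acoustic signal encoding method according to the present invention.
[FIG. 2] A diagram showing the codes created from the intensity graph shown in FIG. 1(c).
[FIG. 3] A diagram showing the codes created by setting unit sections that partially overlap on the time axis.
[FIG. 4] A diagram showing a concrete example of setting unit sections that partially overlap on the time axis.
[FIG. 5] A graph showing the correspondence between the frequency axis and note numbers.
[FIG. 6] A diagram showing an example in which the amount of code data is reduced by integration across unit sections.
[FIG. 7] A diagram showing the concept of sorting a plurality of note numbers by frequency before storing them in tracks.
[FIG. 8] A diagram showing the first method of extracting representative codes from an intensity graph.
[FIG. 9] A diagram showing the second method of extracting representative codes from an intensity graph.
[FIG. 10] A table showing the frequency characteristics obtained when each note number is played with a MIDI sound source.
[FIG. 11] A diagram showing the third method of extracting representative codes from an intensity graph.
[FIG. 12] A diagram showing the fourth method of extracting representative codes from an intensity graph.
[FIG. 13] A diagram showing the fifth method of extracting representative codes from an intensity graph.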
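[Explanation of Symbols]
A … complex intensity
d1 to d5 … unit sections
E … effective intensity
Fs … sampling frequency
f … frequency
L … section length of a unit section
ΔL … offset length
N … note number
Na to Ne … note numbers
Na*, Nb*, Nc* … note numbers obtained by correction
Np(dj, i) … i-th representative code (note number) extracted for unit section dj
Ep(dj, i) … effective intensity of representative code Np(dj, i)
T1 to T8 … tracks
t1 to t6 … times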
BACKGROUND OF THE INVENTION
The present invention relates to a method for encoding an acoustic signal, and relates to a technique for encoding an acoustic signal given as a time-series intensity signal, decoding it, and reproducing it. In particular, the present invention is suitable for a process for efficiently converting vocal acoustic signals (human speech and singing voice signals) into MIDI-format code data, and is expected to be applied to various industrial fields for recording speech. The
[0002]
[Prior art]
As a technique for encoding an acoustic signal, a PCM (Pulse Code Modulation) technique is the most popular technique, and is currently widely used as a recording system for audio CDs, DAT, and the like. The basic principle of this PCM method is that analog audio signals are sampled at a predetermined sampling frequency, and the signal intensity at each sampling is quantized and expressed as digital data. The sampling frequency and the number of quantization bits can be increased. The more you play, the more faithfully the original sound can be played. However, the higher the sampling frequency and the number of quantization bits, the more information is required. Therefore, as a technique for reducing the amount of information as much as possible, an ADPCM (Adaptive Differential Pulse Code Modulation) technique that encodes only a signal change difference is also used.
[0003]
On the other hand, the MIDI (Musical Instrument Digital Interface) standard, which was born from the idea of encoding musical instrument sounds by electronic musical instruments, has been actively used with the spread of personal computers. The code data according to the MIDI standard (hereinafter referred to as MIDI data) is basically data that describes the operation of the musical instrument performance such as which keyboard key of the instrument is played with what strength. The data itself does not include the actual sound waveform. Therefore, when reproducing actual sound, a separate MIDI sound source storing the waveform of the instrument sound is required. However, compared to the case where sound is recorded by the PCM method described above, the amount of information is extremely small, and the high coding efficiency is attracting attention. The encoding and decoding technology based on the MIDI standard is widely used in software for performing musical instruments, practicing musical instruments, composing music, etc. using a personal computer, and is widely used in fields such as karaoke and game sound effects. Has been.
[0004]
[Problems to be solved by the invention]
As described above, when an acoustic signal is encoded by the PCM method, if an attempt is made to ensure sufficient sound quality, the amount of information becomes enormous and the burden of data processing must be increased. Therefore, normally, in order to limit the amount of information to a certain level, a certain level of sound quality must be compromised. Of course, if the encoding method based on the MIDI standard is adopted, it is possible to reproduce a sound having a sufficient sound quality with a very small amount of information. However, as described above, the MIDI standard itself originally performed the operation of the musical instrument. Since it is for encoding, it cannot be widely applied to general sound. In other words, in order to create MIDI data, it is necessary to actually play a musical instrument or prepare information on a musical score.
[0005]
Thus, the conventional PCM and MIDI methods each have advantages and disadvantages for encoding acoustic signals, and neither can ensure sufficient sound quality with a small amount of information for general sound. Nevertheless, demand for efficient encoding of general sound is increasing. Such demand has long been especially strong for the human speaking and singing voice, so-called vocal sound. For example, in fields such as language education, vocal music education, and criminal investigation, there is a strong need for a technique that encodes vocal acoustic signals efficiently.
[0006]
SUMMARY OF THE INVENTION An object of the present invention is to provide an audio signal encoding method capable of efficiently encoding an audio signal including a human voice or singing voice.
[0007]
[Means for Solving the Problems]
  (1) A first aspect of the present invention is an acoustic signal encoding method for encoding an acoustic signal given as a time-series intensity signal, in which the following stages are performed:
  a section setting stage of setting a plurality of unit sections on the time axis of the acoustic signal to be encoded;
  a spectrum creation stage of creating, for each unit section, a spectrum whose first axis is the frequency components contained in the acoustic signal within that unit section and whose second axis is the intensity of each frequency component;
  an intensity graph creation stage of defining a plurality of Q code codes, indicating Q steps of a musical scale, discretely along the first axis of the spectrum, and creating for each unit section, on the basis of its spectrum, an intensity graph whose first axis is the Q code codes and whose second axis is the intensity of each code code;
  an encoding stage of extracting, for each unit section, P representative code codes representing that unit section from among all Q code codes on the basis of the intensity of each code code in the intensity graph, and representing the acoustic signal of each unit section by these extracted representative code codes and their intensities; and
  an integration stage of replacing, when a series of representative code codes whose scale differences fall within a predetermined range exists across a plurality of adjacent unit sections, that series with an integrated code code spanning the plurality of unit sections.
[0008]
  (2) According to a second aspect of the present invention, in the audio signal encoding method according to the first aspect described above,
  In the section setting stage, the unit sections are set so that adjacent unit sections partially overlap on the time axis, and a reference position on the time axis is defined for each unit section; the representative code codes obtained for a specific unit section are output as codes encoding the acoustic signal at the reference position of that unit section.
[0009]
  (3) A third aspect of the present invention is the acoustic signal encoding method according to the second aspect described above,
  A section length L and an offset length ΔL are defined (where ΔL < L), the length of each unit section on the time axis is set to the section length L, and, for any i, the offset on the time axis between the start point of the i-th unit section and the start point of the (i + 1)-th unit section is set to the offset length ΔL. The representative code codes obtained for each unit section are output as codes encoding the acoustic signal at its reference position, so that one reference position appears in each time period of offset length ΔL.
[0010]
(4) According to a fourth aspect of the present invention, in the acoustic signal encoding method according to any of the first to third aspects described above,
in the spectrum creation stage, the acoustic signal to be encoded is sampled at a predetermined sampling period and captured as digital acoustic data, and a spectrum is created by performing a Fourier transform on the captured acoustic data for each unit section.
[0011]
(5) According to a fifth aspect of the present invention, in the audio signal encoding method according to the third aspect described above,
In the spectrum creation stage, a weighting function determined on the basis of the offset length ΔL is set as a window function, and a spectrum is created by superimposing this window function on each unit section of the acoustic signal to be encoded and then performing the Fourier transform.
[0012]
  (6) According to a sixth aspect of the present invention, in the audio signal encoding method according to the fourth aspect described above,
  In the spectrum creation stage, a plurality of spectra are prepared by performing the Fourier transform on the digital acoustic data captured at the predetermined sampling period both without decimation and with decimation at a predetermined rate, and these spectra are synthesized.
[0013]
(7) A seventh aspect of the present invention is the acoustic signal encoding method according to any of the first to sixth aspects described above, wherein
in the intensity graph creation stage, the note numbers used in MIDI data are used as the plurality of Q code codes, and
in the encoding stage, the acoustic signal of each unit section is expressed by MIDI-format code data consisting of the note number extracted as the representative code code, a velocity determined on the basis of its intensity, and delta-time data determined on the basis of the length of the unit section.
[0014]
(8) According to an eighth aspect of the present invention, in the acoustic signal encoding method according to any of the first to seventh aspects described above,
When the representative code code is extracted at the encoding stage, P code codes are extracted from the candidates in the strength graph to be encoded in descending order of the strength to obtain the representative code code.
[0015]
(9) According to a ninth aspect of the present invention, in the acoustic signal encoding method according to any of the first to seventh aspects described above,
when the representative code codes are extracted in the encoding stage, a process of extracting the code code having the highest intensity among the current candidates in the intensity graph being encoded as the i-th representative code code, and then deleting the i-th representative code code and the code codes corresponding to its harmonic components from the candidates, is repeated for i = 1 to (P − 1); finally, the code code having the highest intensity among the remaining candidates is extracted as the P-th representative code code, so that a total of P representative code codes are extracted.
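The extraction with harmonic deletion described in the ninth aspect can be sketched in Python as follows. The text does not say how harmonic components are located or how many are deleted; this sketch assumes 12-tone equal temperament (the k-th harmonic lies 12·log2(k) note numbers above the fundamental, rounded to the nearest note) and deletes harmonics 2 through 8. The function names and the `max_harmonic` parameter are hypothetical.

```python
import math


def harmonic_notes(n, max_harmonic=8, q=128):
    """Note numbers nearest to harmonics 2..max_harmonic of note n (12-TET assumed)."""
    notes = set()
    for k in range(2, max_harmonic + 1):
        h = n + round(12 * math.log2(k))  # k-th harmonic, in note-number steps
        if h < q:
            notes.add(h)
    return notes


def extract_with_harmonic_removal(graph, p, max_harmonic=8):
    """Pick P representative notes, deleting each pick's harmonics from the candidates.

    graph: intensity graph of one unit section, indexed by note number.
    """
    candidates = set(range(len(graph)))
    reps = []
    for _ in range(p):
        if not candidates:
            break
        # Highest intensity wins; ties go to the lower note number.
        best = max(candidates, key=lambda n: (graph[n], -n))
        reps.append(best)
        candidates.discard(best)
        candidates -= harmonic_notes(best, max_harmonic, len(graph))
    return reps


graph = [0.0] * 128
graph[48], graph[60], graph[67], graph[52] = 10.0, 8.0, 6.0, 5.0
reps = extract_with_harmonic_removal(graph, 2)  # [48, 52]
```

Notes 60 and 67 are the 2nd and 3rd harmonics of note 48, so they are deleted after the first pick and the second pick falls on the separate component at note 52. Deleting harmonics after the P-th pick as well is harmless, since no further notes are extracted.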
[0016]
  (10) A tenth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects described above, wherein
  a sound source used to reproduce sound based on each code code is specified in advance, and a correction table is defined, on the basis of the frequency characteristics of the sound reproduced for each code code with this sound source, for replacing the code code of a specific scale step with the code code of a lower scale step of which that specific scale step is a harmonic; and
  when the representative code codes are extracted in the encoding stage, a process of taking the code code having the highest intensity among the current candidates in the intensity graph being encoded as the i-th reference code, extracting the code code obtained by applying the correction table to this i-th reference code as the i-th representative code code, and excluding the i-th reference code and the i-th representative code code from the candidates, is repeated for i = 1 to P, so that a total of P representative code codes are extracted.
[0017]
(11) An eleventh aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects described above, wherein
a sound source used to reproduce sound based on each code code is specified in advance; the spectrum creation stage and the intensity graph creation stage are executed on the acoustic signal obtained by actually reproducing each code code with this sound source, so that an intrinsic intensity graph is obtained in advance for each code code; and
when the representative code codes are extracted in the encoding stage, a process of extracting the code code having the highest intensity among the current candidates in the intensity graph being encoded as the i-th representative code code, and then subtracting each intensity value of the intrinsic intensity graph of the i-th representative code code from the corresponding intensity value of the intensity graph being encoded, is repeated for i = 1 to (P − 1); finally, the code code having the highest intensity among the remaining candidates is extracted as the P-th representative code code, so that a total of P representative code codes are extracted.
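The subtraction of per-note intrinsic ("inherent") intensity graphs in the eleventh aspect can be sketched as follows. This is a minimal illustration under assumptions: the intrinsic graph of every note is taken as already measured and passed in as `profiles`; the subtraction is unscaled, as the text describes; and explicitly excluding notes already extracted from the candidates is an added safeguard. All names are hypothetical.

```python
def extract_with_profile_subtraction(graph, profiles, p):
    """Pick P representative notes, subtracting each pick's intrinsic intensity graph.

    graph: intensity graph of the unit section (list indexed by note number).
    profiles: profiles[n] is the intrinsic intensity graph of note n, assumed to
        have been obtained beforehand by playing note n on the chosen sound source.
    """
    residual = list(graph)
    reps = []
    for i in range(p):
        # Highest remaining intensity among notes not yet extracted (ties -> lower note).
        best = max((n for n in range(len(residual)) if n not in reps),
                   key=lambda n: (residual[n], -n))
        reps.append(best)
        if i < p - 1:  # the text subtracts only for i = 1 .. P-1
            residual = [r - s for r, s in zip(residual, profiles[best])]
    return reps


# Toy 16-note source whose every note also puts 0.6 at the note one octave up.
profiles = [[1.0 if k == n else 0.6 if k == n + 12 else 0.0 for k in range(16)]
            for n in range(16)]
graph = [0.0] * 16
graph[2], graph[14], graph[5] = 1.0, 0.7, 0.5
reps = extract_with_profile_subtraction(graph, profiles, 2)  # [2, 5]
```

Without the subtraction, note 14 (mostly the octave overtone of note 2 produced by the source itself) would be chosen second; after subtracting the intrinsic graph of note 2, note 5 is chosen instead.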
[0018]
(12) A twelfth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects described above, wherein
a sound source used to reproduce sound based on each code code is specified in advance, and the intrinsic waveform of the acoustic signal obtained by actually reproducing each code code with this sound source is obtained in advance;
a code extraction process for determining the i-th representative code code is defined as follows: the waveform information of the i-th acoustic signal is input; the spectrum creation stage and the intensity graph creation stage are performed on the input waveform information; in the subsequent encoding stage, the code code having the highest intensity among the candidates in the created intensity graph is extracted as the i-th representative code code; and each intensity value of the intrinsic waveform of the i-th representative code code is subtracted from the corresponding intensity value of the i-th acoustic signal, the resulting acoustic signal being taken as the (i + 1)-th acoustic signal; and
the section setting stage is performed on the original acoustic signal to be encoded, and, for each unit section, with the original acoustic signal in that unit section taken as the first acoustic signal, the code extraction process is repeated for i = 1 to (P − 1); finally, the waveform information of the P-th acoustic signal is input, the spectrum creation stage and the intensity graph creation stage are performed on it, and in the subsequent encoding stage the code code having the highest intensity among the candidates in the created intensity graph is extracted as the P-th representative code code, so that a total of P representative code codes are extracted for each unit section.
[0019]
  (13) According to a thirteenth aspect of the present invention, an acoustic signal encoding program for executing the acoustic signal encoding method according to any of the first to twelfth aspects is recorded on a computer-readable recording medium.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described based on the illustrated embodiments.
[0024]
§1. Basic principle of encoding method of acoustic signal according to the present invention
First, the basic principle of the acoustic signal encoding method according to the present invention will be described with reference to FIG. 1. Assume that an analog acoustic signal is given as a time-series intensity signal, as shown in FIG. 1(a). In the illustrated example, the acoustic signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. First, the analog acoustic signal is processed as digital acoustic data. This may be done with the conventional, general PCM method: the analog acoustic signal is sampled at a predetermined sampling period, and its amplitude is converted into digital data with a predetermined number of quantization bits. For convenience of explanation, the waveform of the acoustic data digitized by the PCM method is drawn with the same waveform as the analog acoustic signal in FIG. 1(a).
[0025]
Next, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded. In the example shown in FIG. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 having these times as their start and end points are set (a more practical section setting method is described later).
[0026]
When the unit sections have been set in this way, a Fourier transform is performed on the acoustic signal of each unit section to create a spectrum. It is desirable here to apply the Fourier transform to the acoustic signal cut out with a known window function such as the Hanning window. The Fourier transform generally assumes that the same signal repeats infinitely before and after the cut-out section, so with a rectangular window (no window), high-frequency noise often appears in the resulting spectrum. In such a case, it is desirable to use a function whose weights are 0 at both ends of the section, such as the Hanning window. The Hanning window function H(k) is given by
$$H(k) = 0.5 - 0.5\cos\left(\frac{2\pi k}{L}\right)$$
where L is the section length of the unit section and k indexes the samples within it.
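The windowing and transform of one unit section can be sketched in Python as follows. This is an illustrative sketch only: it uses a direct O(L²) discrete Fourier transform for clarity (a practical encoder would use an FFT), and the function names are hypothetical.

```python
import cmath
import math


def hanning(L):
    """Hanning window H(k) = 0.5 - 0.5*cos(2*pi*k/L) for k = 0 .. L-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / L) for k in range(L)]


def windowed_spectrum(section):
    """Complex spectrum of one unit section after Hanning windowing.

    Direct DFT; bin m corresponds to frequency m * Fs / L.
    """
    L = len(section)
    x = [s * w for s, w in zip(section, hanning(L))]
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / L) for n in range(L))
            for m in range(L)]


# A sinusoid completing exactly 4 cycles in a 64-sample section.
section = [math.sin(2 * math.pi * 4 * n / 64) for n in range(64)]
spectrum = windowed_spectrum(section)
peak_bin = max(range(32), key=lambda m: abs(spectrum[m]))  # 4
```

Because the window falls to 0 at both ends of the section, the windowed sinusoid's energy stays in bin 4 and its two neighbours instead of leaking across the spectrum.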
[0027]
FIG. 1(b) shows an example of the spectrum created for unit section d1. In this spectrum, the frequency components contained in the acoustic signal of unit section d1 (0 to Fs, where Fs is the sampling frequency) are indicated by the frequency f on the horizontal axis, and the complex intensity A of each frequency component is indicated on the vertical axis. Various techniques other than the Fourier transform are known for obtaining such a spectrum, and any of them may be used. Further, if a technique that creates a spectrum directly from the analog acoustic signal is used, the acoustic signal need not be digitized by the PCM method.
[0028]
Next, a plurality of Q code codes are defined discretely along the frequency axis f of this spectrum. In this example, the note numbers N used in MIDI data are used as the code codes, and 128 code codes, N = 0 to 127, are defined. The note number N is a parameter indicating the pitch of a note; for example, note number N = 69 indicates the "la" (A3) near the center of the piano keyboard and corresponds to a sound of 440 Hz. Since a predetermined frequency is associated with each of the 128 note numbers in this way, the 128 note numbers N are defined discretely at predetermined positions on the frequency axis f of the spectrum.
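The correspondence between note numbers and frequencies can be sketched as follows. The text only says that each note number has a predetermined frequency; this sketch assumes the usual 12-tone equal temperament anchored at N = 69 = 440 Hz, and the helper names are hypothetical.

```python
import math


def note_to_freq(n, a_ref=440.0):
    """Frequency in Hz of note number n (12-TET, N = 69 at a_ref Hz assumed)."""
    return a_ref * 2.0 ** ((n - 69) / 12.0)


def freq_to_note(f, a_ref=440.0):
    """Nearest note number for a frequency f > 0 (inverse of note_to_freq)."""
    return round(69 + 12 * math.log2(f / a_ref))
```

Each step of 12 note numbers doubles the frequency, which is why the note numbers are not evenly spaced on a linear frequency axis.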
[0029]
Here, the note number N forms a logarithmic scale on which the frequency doubles with each octave, so it does not correspond linearly to the frequency axis f. Therefore, an intensity graph is created in which the frequency axis f is expressed on a logarithmic scale and the note numbers N are defined on this logarithmic axis. FIG. 1(c) shows the intensity graph created in this way for unit section d1. The horizontal axis of the intensity graph is obtained by converting the horizontal axis of the spectrum shown in FIG. 1(b) to a logarithmic scale, and note numbers N = 0 to 127 are plotted on it at equal intervals. The vertical axis of the intensity graph is obtained by converting the complex intensity A of the spectrum shown in FIG. 1(b) into an effective intensity E, and indicates the intensity at the position of each note number N. In general, the complex intensity A obtained by the Fourier transform is represented by a real part R and an imaginary part I, and the effective intensity E can be obtained as
$$E = \sqrt{R^2 + I^2}$$
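The two conversions behind the intensity graph (complex intensity to effective intensity, and linear frequency bin to a position on the logarithmic note-number axis) can be sketched as follows. The 12-tone equal temperament anchored at N = 69 = 440 Hz and the function names are assumptions for illustration.

```python
import math


def effective_intensity(c):
    """E = (R^2 + I^2)^(1/2) for a complex Fourier coefficient c = R + iI."""
    return math.sqrt(c.real ** 2 + c.imag ** 2)  # same value as abs(c)


def note_axis_position(m, L, fs):
    """Fractional note-number position of DFT bin m (m >= 1) on the log axis.

    Bin m lies at frequency m * fs / L; 12-TET with N = 69 at 440 Hz is assumed.
    """
    f = m * fs / L
    return 69 + 12 * math.log2(f / 440.0)
```

With fs = 44000 Hz and L = 1000, bin 10 (440 Hz) sits at position 69 and bin 20 (880 Hz) at position 81: equally spaced bins crowd together toward the high end of the note-number axis.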
[0030]
The intensity graph of unit section d1 obtained in this way can be regarded as a graph showing, as effective intensities, the proportion of each vibration component corresponding to note numbers N = 0 to 127 among the vibration components contained in the acoustic signal of unit section d1. Therefore, on the basis of the effective intensities shown in the intensity graph, P note numbers are selected from all Q note numbers (Q = 128 in this example), and these P note numbers N are extracted as representative code codes representing unit section d1. For convenience of explanation, assume here that P = 3, so that three note numbers are extracted as representative code codes from a total of 128 candidates. For example, if extraction follows the criterion "extract P code codes from the candidates in descending order of intensity", then in the example shown in FIG. 1(c), note number Np(d1,1) is extracted as the first representative code code, note number Np(d1,2) as the second, and note number Np(d1,3) as the third.
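The "descending order of intensity" criterion can be sketched in a few lines of Python. The intensity graph is assumed to be a plain list of Q effective intensities indexed by note number; `top_p_notes` is a hypothetical name.

```python
def top_p_notes(graph, p):
    """Return the P (note number, effective intensity) pairs with the largest intensity.

    graph: intensity graph of one unit section, a list of Q effective
    intensities indexed by note number. Pairs are returned strongest first.
    """
    order = sorted(range(len(graph)), key=lambda n: graph[n], reverse=True)
    return [(n, graph[n]) for n in order[:p]]


graph = [0.0] * 128
graph[60], graph[64], graph[67], graph[72] = 5.0, 3.0, 4.0, 1.0
pairs = top_p_notes(graph, 3)  # [(60, 5.0), (67, 4.0), (64, 3.0)]
```

Each returned pair corresponds to one (Np, Ep) data pair for the unit section.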
[0031]
Once the P representative code codes have been extracted in this way, the acoustic signal of unit section d1 can be expressed by these representative code codes and their effective intensities. For example, in the above example, if the effective intensities of note numbers Np(d1,1), Np(d1,2), and Np(d1,3) in the intensity graph shown in FIG. 1(c) are Ep(d1,1), Ep(d1,2), and Ep(d1,3), respectively, the acoustic signal of unit section d1 can be expressed by the following three data pairs.
[0032]
Np (d1,1), Ep (d1,1)
Np (d1,2), Ep (d1,2)
Np (d1,3), Ep (d1,3)
Although the processing for the unit section d1 has been described above, the same processing is performed separately for each of the unit sections d2 to d5, and data representing the representative code code and its strength is obtained. For example, for the unit section d2,
Np (d2,1), Ep (d2,1)
Np (d2,2), Ep (d2,2)
Np (d2,3), Ep (d2,3)
Three sets of data pairs are obtained. In this way, the original sound signal can be encoded by the data obtained for each unit section.
[0033]
FIG. 2 is a conceptual diagram of encoding by the method described above. FIG. 2(a) shows five unit sections d1 to d5 set for the original acoustic signal, as in FIG. 1(a), and FIG. 2(b) shows, in musical-note form, the code data obtained for each unit section. In this example, three representative code codes are extracted for each unit section (P = 3), and the data for these representative code codes are stored on three tracks T1 to T3. For example, the representative code codes Np(d1,1), Np(d1,2), and Np(d1,3) extracted for unit section d1 are stored on tracks T1, T2, and T3, respectively. Note that FIG. 2(b) is only a conceptual diagram showing the code data obtained by the present invention in the form of musical notes; in fact, intensity data is attached to each note. For example, track T1 stores data indicating the pitches of note numbers Np(d1,1), Np(d2,1), Np(d3,1), ..., together with data indicating the intensities Ep(d1,1), Ep(d2,1), Ep(d3,1), ....
[0034]
The present invention does not necessarily require the MIDI format as its encoding format; however, since the MIDI format is the most widespread format of this type, MIDI-format code data is the most preferable in practice. In the MIDI format, "note-on" and "note-off" data appear interleaved with "delta time" data. "Note-on" data designates a specific note number N and velocity V and instructs the start of performance of a specific sound; "note-off" data designates a specific note number N and velocity V and instructs the end of performance of a specific sound. "Delta time" data indicates a predetermined time interval. Velocity V is a parameter indicating, for example, the speed at which a piano key is pressed down (note-on velocity) or the speed at which the finger is released from the key (note-off velocity), that is, the strength of the operation that starts or ends the performance of a sound.
[0035]
In this embodiment, as described above, P note numbers Np(di,1), Np(di,2), ..., Np(di,P) are obtained as representative code codes for the i-th unit section di, together with their effective intensities Ep(di,1), Ep(di,2), ..., Ep(di,P). MIDI-format code data is therefore created in this embodiment as follows. First, the obtained note numbers Np(di,1), Np(di,2), ..., Np(di,P) are used as they are for the note number N described in the "note-on" and "note-off" data. For the velocity V described in the "note-on" and "note-off" data, the obtained effective intensities Ep(di,1), Ep(di,2), ..., Ep(di,P) are normalized to values in the range 0 to 1, and the square root of the normalized effective intensity multiplied by 127 is used. That is, with Emax denoting the maximum value of the effective intensity E, the value
$$V = 127\sqrt{E / E_{\max}}$$
is used as the velocity. Alternatively, taking the logarithm, the value
$$V = 127\log\left(E / E_{\max}\right) + 127 \qquad (V = 0 \text{ if } V < 0)$$
may be used as the velocity. The "delta time" data may be set according to the length of each unit section.
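The two velocity mappings can be sketched in Python as follows. The text does not state the base of the logarithm, so base 10 is assumed here; rounding to the nearest integer and clipping to 127 are also assumptions. `velocity_sqrt` and `velocity_log` are hypothetical names.

```python
import math


def velocity_sqrt(e, e_max):
    """V = 127 * sqrt(E / Emax), rounded to an integer MIDI velocity (0..127)."""
    return min(127, round(127 * math.sqrt(e / e_max)))


def velocity_log(e, e_max):
    """V = 127 * log(E / Emax) + 127, with V = 0 when the result is negative.

    The logarithm base is not given in the text; base 10 is assumed here.
    """
    if e <= 0:
        return 0
    v = 127 * math.log10(e / e_max) + 127
    return min(127, max(0, round(v)))
```

With base 10, the logarithmic mapping gives V = 0 to any component weaker than one tenth of Emax. Note also that in MIDI a note-on message with velocity 0 is interpreted as a note-off, so components mapped to V = 0 produce no sound.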
[0036]
Consequently, the embodiment described above yields MIDI code data consisting of three tracks. When this MIDI code data is reproduced with a predetermined MIDI sound source, the original acoustic signal is reproduced as three-channel stereo sound. Since ordinary devices with a MIDI playback function can perform 8-channel or 16-channel stereo playback, in practice it is preferable to set P = 8 or P = 16 and create MIDI code data consisting of 8 or 16 tracks.
[0037]
In practice, the encoding process according to the procedure described above is executed with a computer. A program implementing the encoding process according to the present invention can be supplied recorded on a computer-readable recording medium such as a magnetic disk or an optical disk, and the code data obtained by the encoding process according to the present invention can likewise be supplied recorded on such a recording medium.
[0038]
§2. More practical section setting method
The basic principle of the acoustic signal encoding method according to the present invention has been described above. A more practical encoding method is described below, beginning with a more practical way of setting the sections. In the example shown in FIG. 2(a), five unit sections d1 to d5 are set with six times t1 to t6, defined at equal intervals on the time axis t, as boundaries. When encoding is based on such section setting, the sound tends to become discontinuous at the boundary times during reproduction. In practice, it is therefore preferable to set the sections so that adjacent unit sections partially overlap on the time axis.
[0039]
FIG. 3(a) shows an example in which such partially overlapping sections are set: each of the unit sections d1 to d4 shown in the figure partially overlaps its neighbors. Performing the processing described above on the basis of such section setting yields the encoding shown in the conceptual diagram of FIG. 3(b). In this example, the center of each unit section is used as its reference position, and each note is placed at that reference position; however, the reference position need not be the center of the unit section. Comparing FIG. 3(b) with FIG. 2(b) shows that the notes are denser. Setting overlapping sections in this way increases the amount of code data created, but enables natural encoding without discontinuities in the reproduced sound.
[0040]
FIG. 4 illustrates a concrete method of setting sections that partially overlap on the time axis. In this concrete example, the acoustic signal is sampled at a sampling frequency of 22 kHz and captured as digital acoustic data, the section length L of each unit section is set to 1024 samples (about 47 msec), and the shift between successive unit sections (the offset length ΔL) is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start point of the i-th unit section and the start point of the (i + 1)-th unit section is set to the offset length ΔL. For example, the first unit section d1 contains samples 1 to 1024, and the second unit section d2 contains samples 21 to 1044, shifted by 20 samples.
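The overlapping section layout can be sketched as follows. Indices here are 0-based with an exclusive end, so d1 = (0, 1024) covers samples 1 to 1024 in the text's 1-based numbering; taking the reference position at the section center follows the FIG. 3 example. `unit_sections` is a hypothetical name.

```python
def unit_sections(num_samples, L=1024, dL=20):
    """Overlapping unit sections as (start, end, reference) sample indices.

    Indices are 0-based and end is exclusive; consecutive sections start dL
    samples apart, and the reference position is the section center.
    """
    return [(s, s + L, s + L // 2) for s in range(0, num_samples - L + 1, dL)]


sections = unit_sections(1064)
# [(0, 1024, 512), (20, 1044, 532), (40, 1064, 552)]
```

Adjacent sections share L − ΔL = 1004 samples, and one reference position (one set of P notes) is produced every ΔL samples.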
[0041]
When sections that partially overlap on the time axis are set in this way, adjacent unit sections share a considerable number of samples, so no significant difference can be expected between the spectra obtained for them. In the above example, for instance, the first unit section d1 and the second unit section d2 share samples 21 to 1024, so the difference between them depends on only 20 samples. Unless the spectra of adjacent unit sections differ sufficiently, a rapidly changing acoustic signal cannot be followed, and time resolution decreases. To deal with this problem, a measure is needed that makes a difference of only 20 samples produce a large change at the input of the Fourier transform.
[0042]
The present inventor therefore devised a modification of the window function mentioned in §1 that emphasizes the 20 changing samples. The known Hanning window described above tends, if anything, to suppress differences between adjacent sections, and is therefore counterproductive for the problem described above. A function was therefore devised and applied that emphasizes those 20 samples while retaining the feature of the Hanning window that the weights at both ends of the section are small. Specifically, with L denoting the section length of the unit section and ΔL the offset length, α and β are defined as
$$\alpha = \frac{L}{2} - \frac{\Delta L}{2}, \qquad \beta = \frac{L}{2} + \frac{\Delta L}{2}$$
so that the interval [α, β] is the central neighborhood section (a section of width ΔL located at the center of the unit section), and the following improved window function H(k) may be used:
$$
H(k) = \begin{cases}
0.5 - 0.5\cos\left(\dfrac{\pi k}{2\alpha}\right), & 1 \le k \le \alpha \\[4pt]
0.5 - 0.5\cos\left(\dfrac{\pi (k - \alpha)}{\Delta L} + \dfrac{\pi}{2}\right), & \alpha \le k \le \beta \\[4pt]
0.5 - 0.5\cos\left(\dfrac{\pi (k - \beta)}{2\alpha} + \dfrac{3\pi}{2}\right), & \beta \le k \le L
\end{cases}
$$
This improved window function H(k) is a distribution function deformed so that its half-value width (full width at half maximum) is exactly ΔL; experiments using this function confirmed a sufficient effect.
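The three-branch improved window can be sketched in Python as follows; the branches follow the piecewise definition in the text directly, and `improved_window` is a hypothetical name.

```python
import math


def improved_window(L, dL):
    """Improved window H(k) for k = 1 .. L, with half-value width dL.

    Branch 1 rises from 0 to 0.5 over [1, alpha]; branch 2 rises to 1 at the
    section center and falls back to 0.5 over [alpha, beta]; branch 3 falls
    from 0.5 to 0 over [beta, L].
    """
    alpha = L / 2 - dL / 2
    beta = L / 2 + dL / 2
    h = []
    for k in range(1, L + 1):
        if k <= alpha:
            v = 0.5 - 0.5 * math.cos(math.pi * k / (2 * alpha))
        elif k <= beta:
            v = 0.5 - 0.5 * math.cos(math.pi * (k - alpha) / dL + math.pi / 2)
        else:
            v = 0.5 - 0.5 * math.cos(math.pi * (k - beta) / (2 * alpha) + 3 * math.pi / 2)
        h.append(v)
    return h


h = improved_window(1024, 20)  # h[k - 1] holds H(k)
```

For comparison, a Hanning window of the same length reaches half its peak at L/4 and 3L/4, so its half-value width is L/2 (512 samples here), whereas this window concentrates its weight on the central ΔL = 20 samples that differ between adjacent sections.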
[0043]
When sampling at 22 kHz with the section length L of the unit section set to 1024 samples, as in the concrete example above, it was found that after logarithmic scale conversion, continuous data could be obtained only for the upper half of the 128 note numbers; the data for the bass part were sparse, with gaps like missing teeth, and the spectrum as a whole was biased toward high pitches. To cover all 128 note numbers, the section length L would have to be increased eightfold, to 8192 samples or more. However, increasing the section length L eightfold multiplies the computation time per section by 64 (for a direct Fourier transform, whose cost grows with the square of the section length) and aggravates the loss of time resolution described above, so it is not practical.
[0044]
The present inventor therefore devised a method in which a spectrum focused on the bass part is obtained separately with the same section length and synthesized with the normal spectrum. The bass-focused spectrum can be obtained easily, with the same computational load as the normal spectrum, as follows. For example, in FIG. 1(b), if the sampling frequency is set to Fs/8, one eighth of the normal value, while the section length L is kept the same, a spectrum in which the frequency components below Fs/8 are expanded can be obtained. This is equivalent to thinning the samples of the acoustic signal to 1/8, extracting the same number of samples per section, and performing the Fourier transform (so that each section spans eight times as long a stretch of the time axis). Fortunately, although it is difficult to raise the sampling frequency of an acoustic signal that is already discrete data, it is easy to lower it. It was confirmed that synthesizing the 1/8-thinned spectrum obtained in this way with the normal spectrum covers all note numbers from 24 upward (note number 24 is close to the lowest note of the piano, and lower notes cannot be reproduced by ordinary musical instruments, so they are unnecessary in practice). Moreover, the computational load of this method is at most two Fourier transforms of 1024 samples per section.
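The normal-plus-thinned spectrum can be sketched as follows. With the text's values (22 kHz, L = 1024) the bin spacing is about 21.5 Hz, while adjacent note numbers differ by only about 6 % in frequency, so bins are finer than one note step only above roughly 360 Hz; thinning by 8 divides the spacing by 8. The text does not say how the two spectra are synthesized, so the merge rule below (thinned bins below their Nyquist frequency, normal bins above it) is an assumption, as are the function names. A direct DFT is used for clarity, and the toy parameters keep the example small.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0 .. len(x)//2 - 1 (direct DFT, for clarity)."""
    L = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * m * n / L) for n in range(L)))
            for m in range(L // 2)]


def two_resolution_spectrum(signal, start, L, fs, factor=8):
    """Merge a normal spectrum with the spectrum of the 1/factor-thinned signal.

    Both transforms use L samples; the thinned section spans factor * L original
    samples, so its bins are factor times finer. Assumed merge rule: thinned bins
    below fs / (2 * factor), normal bins at or above it.
    Returns (frequency in Hz, magnitude) pairs in ascending frequency order.
    """
    normal = dft_magnitudes(signal[start:start + L])
    # Plain thinning, as in the text; a practical encoder would low-pass filter
    # first so that components above fs / (2 * factor) do not alias.
    thinned = dft_magnitudes(signal[start:start + L * factor:factor])
    split = fs / (2 * factor)
    merged = [(m * fs / (L * factor), a) for m, a in enumerate(thinned)]
    merged += [(m * fs / L, a) for m, a in enumerate(normal) if m * fs / L >= split]
    return merged


# Toy check: a 12.5 Hz tone is unresolvable with 50 Hz bins (fs = 800, L = 16)
# but falls exactly on a bin of the 4x-thinned spectrum.
fs, L, factor = 800, 16, 4
sig = [math.sin(2 * math.pi * 12.5 * n / fs) for n in range(L * factor)]
merged = two_resolution_spectrum(sig, 0, L, fs, factor)
```

Both transforms have the same length, which matches the text's observation that the extra cost is one additional Fourier transform of L samples per section.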
[0045]
To obtain the effective intensity E for each of the 128 note numbers defined on the horizontal axis of the intensity graph, for example, a predetermined frequency range may be assigned to each note number N, and the average of the effective intensities of the frequencies within the assigned range may be taken as the effective intensity of that note number N. FIG. 5 is a graph illustrating how the effective intensity is obtained by such a method. First, converting the horizontal axis of the spectrum obtained by the Fourier transform to a logarithmic scale and the vertical axis to effective intensity yields a graph like that shown in FIG. 5. The frequency values 259, 280, 291, ... shown on the horizontal axis are the frequencies corresponding to note numbers N = 60, 61, 62, .... Here, to obtain the effective intensity for note number N = 61, for example, a predetermined frequency range around the frequency value 280 (the hatched area in the figure) is assigned to note number N = 61, and the maximum effective intensity of the frequencies within this range may be taken as the effective intensity for note number N = 61.
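Building the intensity graph from per-bin effective intensities can be sketched as follows. The text leaves the width of each note's frequency range open; this sketch assumes each bin belongs to its nearest note number (so each note owns half a note step on either side of its 12-TET frequency, with N = 69 = 440 Hz), and supports both the average and the maximum mentioned above. `intensity_graph` is a hypothetical name.

```python
import math


def intensity_graph(mags, fs, L, q=128, use_max=True):
    """Intensity graph over note numbers 0 .. q-1 from per-bin effective intensities.

    mags[m] is the effective intensity of DFT bin m (frequency m * fs / L).
    Each bin is assigned to its nearest note number (12-TET, N = 69 at 440 Hz).
    Notes that receive no bin keep intensity 0.
    """
    buckets = [[] for _ in range(q)]
    for m in range(1, len(mags)):  # skip the DC bin (log2 of 0 is undefined)
        n = round(69 + 12 * math.log2(m * fs / L / 440.0))
        if 0 <= n < q:
            buckets[n].append(mags[m])
    return [(max(v) if use_max else sum(v) / len(v)) if v else 0.0 for v in buckets]


mags = [0.0] * 500                             # fs = 44000 Hz, L = 1000: bins 44 Hz apart
mags[10], mags[11] = 3.0, 1.0                  # 440 Hz -> note 69, 484 Hz -> note 71
mags[59], mags[60], mags[61] = 1.0, 4.0, 1.0   # three bins that all round to note 100
graph_max = intensity_graph(mags, 44000, 1000)
graph_avg = intensity_graph(mags, 44000, 1000, use_max=False)
```

Note 70 receives no bin at all in this example: below a few hundred hertz the bins are coarser than the note steps, which is the "missing teeth" effect in the bass described earlier.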
[0046]
§3. Code code integration processing
As described in §2 above, when a partially overlapping section is set, the number of code codes to be created increases considerably. Here, an effective integration process for reducing the number of code codes finally created as much as possible will be described.
[0047]
For example, consider the case where code codes represented by the musical notes shown in FIG. 6(a) are obtained. In the illustrated example, all code codes are eighth notes, because the section length L is constant and the individual code codes created therefore all have the same length. However, the group of notes shown in FIG. 6(a) can be rewritten as shown in FIG. 6(b): when a plurality of notes indicating the same pitch are arranged consecutively, they can be integrated into one note. In other words, notes for individual unit sections can be replaced with a note spanning a plurality of unit sections.
[0048]
In the example shown in FIG. 6, only notes of the same pitch are integrated, but the notes to be integrated need not be limited to the same pitch; notes with a certain degree of similarity may also be integrated. For example, a series of notes differing from each other by one scale step can be integrated and replaced with one note; in this case, the series may be replaced with, for example, its lower note. In general, if representative code codes that are similar to each other under a predetermined condition exist in a plurality of adjacent unit sections, replacing these similar representative code codes with an integrated code code spanning those unit sections reduces the number of notes.
[0049]
FIG. 6 illustrates the concept of code integration using musical notes. However, each code created by the encoding process according to the present invention also carries data indicating intensity (velocity, in the case of MIDI data), so when codes are integrated, their intensity data must be integrated as well. When the codes to be integrated have different intensity data, the largest of them may, for example, be adopted as the intensity data of the integrated code. With MIDI data, however, integrating two codes sounds unnatural if the following code is considerably more intense than the preceding one. This is because a normal MIDI sound source reproduces instrument tones, whose intensity generally decays over time. Thus, when the following code is weaker than the preceding code, replacing both with one integrated code causes no unnaturalness, but in the opposite case it does. It is therefore preferable to set the condition that two codes are not integrated when their intensity difference is equal to or greater than a predetermined reference and the following code is more intense than the preceding code.
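The intensity rule for integrating two adjacent codes can be written as a small predicate. In this sketch the intensities are MIDI velocities, `threshold` stands for the "predetermined reference" (the source gives no value), and the larger velocity is adopted for the integrated code, as the text suggests.

```python
from typing import Optional


def merged_velocity(preceding: int, following: int, threshold: int) -> Optional[int]:
    """Velocity for the code replacing two adjacent codes, or None to refuse integration.

    Integration is refused when the following code is louder than the preceding one by
    at least `threshold`: instrument tones decay, so a louder continuation would sound
    unnatural if merged. Otherwise the larger of the two velocities is adopted.
    """
    if following > preceding and following - preceding >= threshold:
        return None
    return max(preceding, following)
```

A decaying pair such as (100, 80) merges at velocity 100, while a rising pair (80, 100) with `threshold=20` stays as two separate codes.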
[0050]
As described above, code integration reduces the number of codes, so it is desirable to arrange the processing so that integration occurs as often as possible. The most effective way to do so is to sort the codes by frequency before placing them on the tracks. The note group shown in FIG. 6(a) consists of codes placed on the same track, and notes to be integrated normally need to be on the same track. In practice, however, P tracks are defined as shown in FIG. 2(b) (P = 3 in that example), and the P codes extracted for each unit section are distributed among these P tracks, so the probability that integrable notes appear depends heavily on how the codes are distributed among the tracks. For example, when three codes are distributed among the three tracks T1, T2, and T3 as shown in FIG. 2(b), placing the lowest-frequency code on track T1, the next lowest on track T2, and the highest on track T3 is expected to make integrable notes appear more often than distributing them without regard to frequency.
[0051]
In short, when the P codes extracted for each unit section are distributed among P tracks, sorting them by frequency before placing them on the tracks increases the number of codes that can be integrated. FIG. 7 is a conceptual diagram of frequency sorting for P = 8. With P = 8, eight note numbers Np(d1,1) to Np(d1,8) are extracted as representative codes for a unit section d1. In the extraction process, the eight note numbers are extracted, for example, in descending order of effective intensity, so they are initially ordered by effective intensity (left column of FIG. 7). Sorting them by frequency changes the order as shown, for example, in the middle column of FIG. 7. If the sorted note numbers are then placed on the eight tracks T1 to T8 as shown in the right column of FIG. 7, the lowest-frequency (smallest) note number among the eight is always placed on track T1 and the highest-frequency (largest) note number on track T8. As a result, integrable note numbers appear more frequently on every track.
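A minimal sketch of the frequency sort, assuming the P note numbers of one unit section arrive in order of decreasing effective intensity and relying on the fact that a higher MIDI note number always means a higher frequency. The note numbers in the example are hypothetical.

```python
def assign_tracks(extracted: list[int]) -> dict[str, int]:
    """Place one unit section's P note numbers on tracks T1..TP, lowest frequency on T1."""
    # Sorting by note number is sorting by frequency, since MIDI pitch rises with the number.
    return {f"T{k}": n for k, n in enumerate(sorted(extracted), start=1)}
```

Two consecutive sections whose extraction order differs, such as `[72, 60, 67]` and `[60, 72, 67]`, still put note 60 on T1 in both, which is what makes it integrable across the sections.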
[0052]
§4. Representative code extraction methods
In the example shown in FIG. 1(c), three note numbers Np(d1,1), Np(d1,2), and Np(d1,3) are extracted as representative codes from the 128 note numbers defined on the horizontal axis of the intensity graph of unit section d1, and each extracted representative code is stored on a separate track T1, T2, or T3. In general, when P tracks T1 to TP are prepared, P note numbers Np(d1,1), Np(d1,2), ..., Np(d1,P) must be extracted as representative codes. Five specific methods for extracting the representative codes are described below.
[0053]
<< 4.1 First Extraction Method >>
The first method extracts P codes, in descending order of intensity, from the candidates in the intensity graph to be encoded and uses them as the representative codes. The three note numbers Np(d1,1), Np(d1,2), and Np(d1,3) shown in FIG. 1(c) were extracted by this first method: in the intensity graph of FIG. 1(c), the note number Np(d1,1) with the largest effective intensity E is extracted as the first representative code, the note number Np(d1,2) with the second largest effective intensity E as the second representative code, and the note number Np(d1,3) with the third largest effective intensity E as the third representative code.
[0054]
FIG. 8 illustrates the principle of the first method. For convenience of explanation, consider the simple case in which the effective intensities shown in FIG. 8(a) are defined for five note numbers Na, Nb, Nc, Nd, and Ne and the effective intensities of all other note numbers are zero (in practice, all 128 note numbers usually have some effective intensity). In the first method, the note number Nb, which has the largest effective intensity of the five candidates, is extracted as the first representative code and then deleted from the candidates; in FIG. 8(b), the graph of the deleted note number Nb is shown by a broken line. Next, the note number Nc, which has the largest effective intensity of the four remaining candidates (solid lines in FIG. 8(b)), is extracted as the second representative code and deleted from the candidates. This processing is repeated until the P-th representative code has been extracted.
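The first method is a greedy top-P selection. The sketch below takes the intensity graph of one unit section as a dictionary from note number to effective intensity; the note numbers standing in for Na to Ne in the example are hypothetical.

```python
def extract_first_method(intensity: dict[int, float], p: int) -> list[int]:
    """First extraction method: repeatedly take the candidate with the largest effective
    intensity and delete it from the candidates until p codes have been extracted."""
    candidates = dict(intensity)  # copy so the caller's intensity graph is not modified
    extracted = []
    for _ in range(min(p, len(candidates))):
        best = max(candidates, key=candidates.get)
        extracted.append(best)
        del candidates[best]
    return extracted
```

With `{60: 2.0, 64: 5.0, 67: 4.0, 72: 1.0, 76: 3.0}` and `p = 3`, the extracted codes are 64, 67, and 76, regardless of whether any of them is a harmonic of another.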
[0055]
The representative codes extracted for each unit section are intended to indicate the representative frequency components contained in the original acoustic signal in that unit section, so the first method, which extracts P representative codes in descending order of intensity, appears to be the most appropriate. However, actual encoding with this first method showed that the pitch of the reproduced sound was shifted upward as a whole. For example, when a male speaking voice was used as the original acoustic signal, encoded with the first method, and the resulting code data reproduced with a general MIDI sound source, the reproduced sound resembled a female voice rather than the original male voice.
[0056]
The inventor attributes this phenomenon to the fact that the instrument sounds used in a MIDI sound source contain harmonic components (components whose frequencies are integer multiples of the fundamental). For example, the fundamental frequency of the A (A3) at the center of the piano keyboard is 440 Hz, but it is known that when this A3 is actually played, the sound contains, besides the 440 Hz fundamental, components at 2 times that frequency (880 Hz, the A4 one octave higher), 3 times, 4 times, and so on (harmonic components). Therefore, when note number N = 69 (A3) is extracted as a representative code, harmonic components at 880 Hz, 1320 Hz, ... are mixed into the reproduced sound in addition to the 440 Hz fundamental of note number N = 69. Consequently, when P representative codes are extracted in descending order of effective intensity by the first method, reproduction with a MIDI sound source adds these harmonic components to the fundamental of each representative code, strengthening the high-frequency side as a whole. This appears to be why the pitch is shifted upward during reproduction.
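To see which codes the harmonics of an extracted note fall on, the harmonic frequencies can be mapped back to note numbers. A sketch, assuming equal temperament with N = 69 at 440 Hz (consistent with the A3 = 440 Hz example above) and rounding to the nearest note; how many harmonics to consider is a free parameter that the source does not fix.

```python
import math


def nearest_note(freq_hz: float) -> int:
    """Note number whose equal-tempered pitch is closest to freq_hz (69 -> 440 Hz assumed)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def harmonic_notes(n: int, max_harmonic: int = 4) -> list[int]:
    """Note numbers nearest to the 2nd..max_harmonic-th harmonics of note n, within 0-127."""
    fundamental = 440.0 * 2.0 ** ((n - 69) / 12.0)
    notes = [nearest_note(k * fundamental) for k in range(2, max_harmonic + 1)]
    return [m for m in notes if 0 <= m <= 127]
```

For N = 69, the 880 Hz, 1320 Hz, and 1760 Hz harmonics fall on note numbers 81, 88, and 93, so those codes are strengthened whenever N = 69 is played.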
[0057]
With this cause in mind, the inventor devised representative code extraction methods that suppress the upward shift of pitch during reproduction. Each of the extraction methods described below is based on this idea.
[0058]
<< 4.2 Second Extraction Method >>
In the second method, the code with the highest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code, and then the i-th representative code and the codes corresponding to its harmonic components are deleted from the candidates. This process is repeated for i = 1 to (P − 1), and finally the code with the highest intensity among the remaining candidates is extracted as the P-th representative code, giving a total of P representative codes.
[0059]
For example, consider the case in which the effective intensities shown in FIG. 9(a) are defined for five note numbers Na, Nb, Nc, Nd, and Ne. First, with i = 1, the note number Nb, which currently has the highest intensity, is extracted from the candidates as the first representative code. Then the extracted note number Nb and the codes corresponding to its harmonic components are deleted from the candidates. For example, if note number Nc is a harmonic component of note number Nb, both the harmonic Nc and the already extracted Nb are deleted from the candidates; in FIG. 9(b), the graphs of the deleted note numbers Nb and Nc are shown by broken lines. Next, i is updated to 2, and the note number Na, which has the highest intensity among the remaining candidates, is extracted as the second representative code. Then Na and the codes corresponding to its harmonic components are deleted from the candidates.
[0060]
Repeating this processing while updating i to 3, 4, ..., up to i = P − 1 completes the extraction of the first through (P − 1)-th representative codes. Finally, extracting the code with the highest intensity among the remaining candidates as the P-th representative code yields a total of P representative codes.
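A sketch of the second method follows. It assumes equal temperament, identifies harmonics by rounding k times the fundamental to the nearest note number, and considers harmonics up to the 8th; that limit is an assumption, since the source does not say how many harmonics are deleted. Deleting candidates after the P-th extraction is harmless, so the loop does not special-case it.

```python
import math


def harmonic_notes(n: int, max_harmonic: int = 8) -> set[int]:
    """Note numbers nearest to the 2nd..max_harmonic-th harmonics of note n (equal temperament)."""
    # The k-th harmonic lies 12 * log2(k) semitones above the fundamental.
    notes = {round(n + 12 * math.log2(k)) for k in range(2, max_harmonic + 1)}
    return {m for m in notes if 0 <= m <= 127}


def extract_second_method(intensity: dict[int, float], p: int, max_harmonic: int = 8) -> list[int]:
    """Second extraction method: after each extraction, delete the extracted note and its
    harmonics from the candidates, so no two extracted codes are harmonically related."""
    candidates = dict(intensity)
    extracted = []
    while len(extracted) < p and candidates:
        best = max(candidates, key=candidates.get)
        extracted.append(best)
        for m in {best} | harmonic_notes(best, max_harmonic):
            candidates.pop(m, None)
    return extracted
```

With the hypothetical graph `{57: 3.0, 60: 1.0, 64: 4.0, 69: 6.0, 81: 5.0}` and P = 3, the first method would return 69, 81, 64; here 81 (the octave of 69) is deleted with 69, and the result is 69, 64, 57.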
[0061]
In the second method, whenever a representative code is extracted, the codes corresponding to its harmonic components are deleted from the candidates, so the P representative codes finally extracted contain no codes that are harmonically related to one another. This mitigates the phenomenon in which harmonic components are emphasized during reproduction and the sound becomes higher pitched. However, the second method cannot completely suppress this phenomenon. The reason appears to be that a general MIDI sound source includes sounds in which a harmonic component is more intense than the fundamental component.
[0062]
FIG. 10 is a chart showing the results of measuring the peak frequency contained in the reproduced sounds of note numbers N = 24 to 84 for a general piano MIDI sound source. For example, note number N = 24 originally corresponds to the pitch C0, but when the frequency components of its reproduced sound on a MIDI sound source are examined, the peak frequency is 129 Hz, higher than the fundamental frequency of the original pitch. The "corresponding scale" column of the chart gives the pitch corresponding to this peak frequency; for N = 24 it is C2. In other words, when the sound of note number N = 24, whose original pitch is C0, is reproduced, the intensity at the frequency corresponding to C2 (129 Hz), its fourth harmonic, is actually greater than the intensity of the fundamental corresponding to C0. This tendency is observed mainly for note numbers N = 57 and below: among those sounds, only N = 41, 45, 46, 48, 49, 52, 54, and 56 have the fundamental as their most intense component, so that the peak frequency corresponds to the original pitch; in all the others, a harmonic component is more intense than the fundamental, and the pitch corresponding to the peak frequency does not match the original pitch. By contrast, all sounds with note numbers N = 58 and above have the fundamental as their most intense component, and the pitch corresponding to the peak frequency matches the original pitch.
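Reading a row of FIG. 10 amounts to converting the measured peak frequency to the nearest note number and naming it. The sketch below assumes equal temperament with N = 69 at 440 Hz and the octave numbering used in this document (N = 24 is C0, N = 69 is A3); the sharp-based pitch names are an assumption. Under these assumptions, the 129 Hz peak measured for N = 24 maps to N = 48, i.e. C2, in agreement with the chart.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz: float) -> int:
    """Note number closest to freq_hz, assuming equal temperament with 69 -> 440 Hz."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def note_name(n: int) -> str:
    """Pitch name in this document's octave numbering (24 -> C0, 48 -> C2, 69 -> A3)."""
    return f"{PITCH_CLASSES[n % 12]}{n // 12 - 2}"


def peak_matches_pitch(n: int, peak_hz: float) -> bool:
    """True if the measured peak of note n corresponds to its own pitch (as for N >= 58)."""
    return nearest_note(peak_hz) == n
```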
[0063]
Because of this characteristic, the second method cannot completely suppress the upward shift of pitch during reproduction: when a note number N = 57 or below is extracted as a representative code, its harmonic components are more intense than its original pitch, so the high-frequency side is still emphasized during reproduction.
[0064]
<< 4.3 Third Extraction Method >>
The third extraction method takes characteristics such as those shown in FIG. 10 into account. The sound source to be used for reproducing sound from the codes is specified in advance, and the frequency characteristics of the sound reproduced from each code with this sound source (for example, the characteristics shown in FIG. 10) are determined. A predetermined correction table is then defined on the basis of these frequency characteristics. Specifically, the correction table may specify that the sound of a particular note number be substituted with the sound of a lower note number. For example, when reproduction uses a sound source with the frequency characteristics of FIG. 10, note number N = 48 (C2) can be substituted with the lower note number N = 24 (C0), because when note number N = 24 is reproduced, its harmonic component C2 is more intense than its original pitch C0.
[0065]
For each note number that can be substituted with a lower sound in this way, the substitute is determined in advance, and the correction instructions replacing each note number with its substitute are prepared in the form of a correction table. When a representative code is extracted, the code actually extracted is corrected with reference to this table. For example, even when note number N = 48 (C2) currently has the highest intensity and would otherwise be extracted, the correction from N = 48 (C2) to N = 24 (C0) is applied and N = 24 (C0) is extracted as the representative code.
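One way to derive such a correction table from measurements like those in FIG. 10 is sketched below. The rule (map the note at the measured peak back to the lower note that produced it), the equal-temperament conversion, and the conflict handling are assumptions for illustration; the only measured value used in the example is the 129 Hz peak reported for N = 24.

```python
import math


def nearest_note(freq_hz: float) -> int:
    """Note number closest to freq_hz, assuming equal temperament with 69 -> 440 Hz."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def build_correction_table(peaks: dict[int, float]) -> dict[int, int]:
    """Derive {substituted note: lower substitute} from measured peak frequencies.

    peaks[n] is the peak frequency measured when note n is played on the chosen sound
    source. If that peak lies at a higher note m, playing n already sounds strongly at m,
    so m is mapped to the lower n. When several n share the same m, this sketch keeps the
    lowest n (an assumption; the source does not say how such conflicts are resolved).
    """
    table: dict[int, int] = {}
    for n, peak_hz in sorted(peaks.items()):
        m = nearest_note(peak_hz)
        if m > n and m not in table:
            table[m] = n
    return table


def correct(n: int, table: dict[int, int]) -> int:
    """Apply the correction table to one note number (unchanged if there is no entry)."""
    return table.get(n, n)
```

With the FIG. 10 entry for N = 24, the table becomes `{48: 24}`, so an extracted N = 48 (C2) is corrected to N = 24 (C0) as in the example above.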
[0066]
In short, the code with the highest intensity among the current candidates in the intensity graph to be encoded is taken as the i-th reference code, the code obtained by applying the prepared correction table to the i-th reference code is extracted as the i-th representative code, and both the i-th reference code and the i-th representative code are excluded from the candidates. Repeating this process for i = 1 to P extracts a total of P representative codes.
[0067]
For example, consider the case in which the effective intensities shown in FIG. 11(a) are defined for five note numbers Na, Nb, Nc, Nd, and Ne. First, with i = 1, the note number Nb, which currently has the highest intensity, is extracted from the candidates as the first reference code, and the prepared correction table is applied to it. If, for example, the correction table contains an instruction to correct note number Nb to note number Nb*, the corrected note number Nb* is extracted as the first representative code, as shown in FIG. 11(b). At this time, the first reference code Nb and the corrected note number Nb* are both excluded from the candidates. In FIG. 11(c), the graph of the excluded note number Nb is shown by a broken line (Nb* is not shown because its intensity is close to 0 to begin with).
[0068]
Next, i is updated to 2, and as shown in FIG. 11(c), the note number Nc, which has the highest intensity among the remaining candidates, is extracted as the second reference code, and the prepared correction table is applied to it. If, for example, the correction table contains an instruction to correct note number Nc to note number Nc*, the corrected note number Nc* is extracted as the second representative code, as shown in FIG. 11. At this time, the second reference code Nc is excluded from the candidates. Repeating this processing while updating i to 3, 4, ..., up to i = P completes the extraction of all P representative codes.
[0069]
In the third method, the way the correction table is created is important: an inappropriate correction table causes the correction itself to shift the pitch considerably. Strictly speaking, the correction table depends on the sound source used for reproduction, but since general MIDI sound sources often have similar frequency characteristics, a correction table prepared for one specific sound source can be used, with a certain degree of generality, with other sound sources as well.
[0070]
According to experiments by the inventor, the third method reduces the upward shift of pitch during reproduction. To suppress it effectively, however, it is preferable to combine the second and third methods. That is, after note number Nb* is extracted as the first representative code as shown in FIG. 11(b), the harmonic components of the first reference code Nb are deleted from the candidates together with Nb. For example, if note number Nc is a harmonic component of Nb, both Nb and Nc are deleted from the candidates, as shown by the broken lines in FIG. 11, so that note number Na is extracted as the second reference code. If applying the prepared correction table to this second reference code yields an instruction to correct Na to Na*, the corrected note number Na* is extracted as the second representative code, as shown in FIG. 11(f). At this time, the second reference code Na and its harmonic components are deleted from the candidates. Repeating this processing while updating i to 3, 4, ..., up to i = P completes the extraction of all P representative codes.
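The third method, with an option to combine it with the second, might be sketched as follows. The correction table is a dictionary from reference note to substitute note; harmonics are identified by rounding to the nearest equal-tempered note up to an assumed 8th harmonic, and, following the text, it is the harmonics of the reference code (not of the substitute) that are deleted. The intensity values in the example are hypothetical.

```python
import math


def harmonic_notes(n: int, max_harmonic: int = 8) -> set[int]:
    """Note numbers nearest to the 2nd..max_harmonic-th harmonics of note n."""
    notes = {round(n + 12 * math.log2(k)) for k in range(2, max_harmonic + 1)}
    return {m for m in notes if 0 <= m <= 127}


def extract_third_method(intensity: dict[int, float], p: int, table: dict[int, int],
                         remove_harmonics: bool = False, max_harmonic: int = 8) -> list[int]:
    """Third extraction method, optionally combined with the second.

    intensity: {note: effective intensity} for one unit section.
    table: correction table {reference note: substitute note}.
    With remove_harmonics=True, the harmonics of each reference code are also deleted.
    """
    candidates = dict(intensity)
    extracted = []
    while len(extracted) < p and candidates:
        reference = max(candidates, key=candidates.get)  # i-th reference code
        representative = table.get(reference, reference)  # i-th representative code
        extracted.append(representative)
        removed = {reference, representative}
        if remove_harmonics:
            removed |= harmonic_notes(reference, max_harmonic)
        for m in removed:
            candidates.pop(m, None)
    return extracted
```

With `{24: 0.5, 48: 6.0, 55: 3.0, 60: 5.0, 64: 4.0}` and the table `{48: 24}`, the third method alone yields 24, 60, 64; the combined version also deletes 60 (the octave of the reference 48) and yields 24, 64, 55.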
[0071]
<< 4.4 Fourth Extraction Method >>
In the fourth extraction method, the sound source to be used for reproducing sound is specified in advance, and the waveform of the acoustic signal obtained by actually reproducing each code with this sound source is measured. The spectrum creation stage and intensity graph creation stage described in §1 are then executed on this waveform to obtain an intensity graph for each code in advance. That is, the 128 sounds of note numbers N = 0 to 127 are reproduced with an actual MIDI sound source, and a spectrum like that of FIG. 1(b) is obtained from each reproduced waveform. (For example, a section of the same length as the unit section described in §1 may be set as a representative section, avoiding the rising and falling portions of the signal as far as possible, and the spectrum of this representative section obtained; alternatively, an average spectrum over several suitable sections may be obtained.) From this spectrum, an intensity graph like that of FIG. 1(c), referred to here as the inherent intensity graph of each code (note number), is obtained. In the end, 128 inherent intensity graphs are obtained, one for each of note numbers N = 0 to 127. This completes the preparation stage of the fourth method.
[0072]
At the stage of actually encoding the acoustic signal, the representative codes are extracted as follows. The code with the highest intensity among the current candidates in the intensity graph to be encoded is extracted as the i-th representative code, and each intensity value of the inherent intensity graph of the i-th representative code is subtracted from the corresponding intensity value of the intensity graph to be encoded. This process is repeated for i = 1 to (P − 1), and the code with the highest intensity among the remaining candidates is then extracted as the P-th representative code, giving a total of P representative codes.
[0073]
For example, consider the case in which the effective intensities shown in FIG. 12(a) are defined for five note numbers Na, Nb, Nc, Nd, and Ne. First, with i = 1, the note number Nb, which currently has the highest intensity, is extracted from the candidates as the first representative code. An inherent intensity graph for note number Nb was obtained in the preparation stage described above: for example, if reproducing Nb with a specific MIDI sound source yields the reproduced signal waveform of FIG. 12(b), executing the spectrum creation stage and intensity graph creation stage of §1 on this waveform yields the inherent intensity graph of Nb shown in FIG. 12(c). Each intensity value of this inherent intensity graph is then subtracted from the corresponding intensity value of the intensity graph to be encoded shown in FIG. 12(a). FIG. 12(d) shows the result of the subtraction; the portions indicated by broken lines are those removed by the subtraction. As a result, the "intensity graph to be encoded" at this point becomes the graph shown in FIG. 12(e).
[0074]
Next, i is updated to 2, and as shown in FIG. 12(e), the note number Na, which has the highest intensity among the remaining candidates, is extracted as the second representative code. Each intensity value of the inherent intensity graph of note number Na (not shown) is then subtracted from the corresponding intensity value of the graph of FIG. 12(e), the current "intensity graph to be encoded", and the graph resulting from this subtraction becomes the new "intensity graph to be encoded".
[0075]
Repeating this processing while updating i to 3, 4, ..., up to i = P − 1 completes the extraction of the first through (P − 1)-th representative codes. Finally, extracting the code with the highest intensity among the remaining candidates as the P-th representative code yields a total of P representative codes.
[0076]
Because the fourth method depends heavily on the characteristics of the sound source used for reproduction, it is suited to cases in which that sound source can be specified in advance. Since the frequency components actually contained in the reproduced sound of each representative code are subtracted every time a representative code is extracted, extremely faithful reproduction is possible.
[0077]
In the above example, the intensity values of the inherent intensity graph are subtracted as they are, but they may instead be normalized before subtraction. For example, let X be the intensity value of the note number Nb extracted as the representative code in the "intensity graph to be encoded" of FIG. 12(a), and let Y be the intensity value of the same note number Nb in the inherent intensity graph of FIG. 12(c). If each intensity value of the inherent intensity graph is multiplied by X/Y before subtraction, the intensity value of note number Nb in the new "intensity graph to be encoded" becomes zero, which prevents the same note number Nb from being extracted repeatedly as a representative code.
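A sketch of the fourth method, including the optional X/Y normalization. Intensity graphs are dictionaries from note number to effective intensity, and `inherent` holds the pre-measured inherent intensity graph of each note; the data in the example are made up. The source does not say whether negative results of the subtraction are clipped, so this sketch leaves them as they are.

```python
def extract_fourth_method(intensity: dict[int, float], inherent: dict[int, dict[int, float]],
                          p: int, normalize: bool = True) -> list[int]:
    """Fourth extraction method.

    intensity: {note: effective intensity} for the unit section being encoded.
    inherent:  {note: {note: effective intensity}}, the inherent intensity graph of each
               note, measured beforehand by playing it on the chosen sound source.
    normalize: scale each inherent graph by X / Y before subtracting it.
    """
    graph = dict(intensity)
    extracted = []
    for i in range(p):
        best = max(graph, key=graph.get)
        extracted.append(best)
        if i == p - 1:
            break  # the P-th code is extracted without a further subtraction
        own = inherent[best]
        scale = 1.0
        if normalize and own.get(best, 0.0) > 0.0:
            scale = graph[best] / own[best]  # X / Y: drives the extracted note to zero
        for n, e in own.items():
            if n in graph:
                graph[n] -= scale * e  # values may go negative; no clipping is applied
    return extracted
```

In the example below, note 60 (X = 8) has an inherent graph with Y = 4 and overtones at 67 and 72. With normalization, the second code is 64; without it, 60 keeps an intensity of 4 and is extracted a second time, which is exactly the repetition the X/Y scaling prevents.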
[0078]
<< 4.5 Fifth Extraction Method >>
In the fifth extraction method, as in the fourth, the sound source to be used for reproducing sound is specified in advance in a preparation stage, and the waveform of the acoustic signal obtained by actually reproducing each code with this sound source is measured. In the fifth method, however, the measured waveform itself is stored and used for subtraction in later processing. Specifically, the 128 sounds of note numbers N = 0 to 127 are reproduced with an actual MIDI sound source, and the 128 reproduced waveforms are stored as they are. The reproduced waveform of each note number is referred to here as the inherent waveform of that code.
[0079]
When actually encoding the acoustic signal, the following "code extraction process" is defined and executed repeatedly: "The waveform information of the i-th acoustic signal is input in order to determine the i-th representative code; the spectrum creation stage and intensity graph creation stage described in §1 are executed on this waveform information; in the subsequent encoding stage, the code with the highest intensity among the candidates in the created intensity graph is extracted as the i-th representative code; and each intensity value of the inherent waveform of the i-th representative code is subtracted from the corresponding intensity value of the i-th acoustic signal, the resulting acoustic signal being used as the (i + 1)-th acoustic signal."
[0080]
In the fifth method, section setting is first performed on the original acoustic signal to be encoded, setting a plurality of unit sections on the time axis. The acoustic signal of each unit section is taken as the first acoustic signal, and the "code extraction process" above is executed repeatedly for i = 1 to (P − 1) for each unit section. Finally, the waveform information of the P-th acoustic signal is input, the spectrum creation stage and intensity graph creation stage of §1 are executed on it, and in the subsequent encoding stage the code with the highest intensity among the candidates in the created intensity graph is extracted as the P-th representative code. A total of P representative codes is thus extracted for each unit section. This is the basic procedure of the fifth method, which is described below with reference to the specific example of FIG. 13.
[0081]
First, the section setting stage is performed on the original acoustic signal to be encoded, and the original acoustic signal of each unit section becomes the first acoustic signal. The following processing is performed for each unit section. First, i is set to 1 and the first "code extraction process" is executed. Suppose that a signal with the waveform shown in FIG. 13(a) is input as the first acoustic signal for a unit section d; this is, in effect, the original signal for unit section d. The signal of FIG. 13(a) is Fourier-transformed to obtain a spectrum, and an intensity graph is obtained from this spectrum. Suppose the intensity graph shown in FIG. 13(b) is obtained.
[0082]
Next, the note number Nb, which has the highest intensity in this intensity graph, is extracted as the first representative code. Each intensity value of the inherent waveform of the first representative code shown in FIG. 13(c) (obtained and stored in advance in the preparation stage) is then subtracted from the corresponding intensity value of the first acoustic signal shown in FIG. 13(a). This yields, for example, the acoustic signal shown in FIG. 13(d), which is the second acoustic signal.
[0083]
Next, i is updated to 2, and this time the second acoustic signal shown in FIG. 13(d) is Fourier-transformed to obtain a spectrum, from which an intensity graph is obtained. Suppose the intensity graph shown in FIG. 13(e) is obtained. The note number Nc, which has the highest intensity in this graph, is extracted as the second representative code. Each intensity value of the inherent waveform of the second representative code (not shown) is then subtracted from the corresponding intensity value of the second acoustic signal shown in FIG. 13(d) to obtain the third acoustic signal.
[0084]
Repeating this processing while updating i to 3, 4, ..., up to i = P − 1 completes the extraction of the first through (P − 1)-th representative codes. Finally, extracting the code with the highest intensity in the intensity graph of the P-th acoustic signal as the P-th representative code yields a total of P representative codes.
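The fifth method can be sketched as repeated analyse-and-subtract on the waveform itself. To keep the example short and dependency-free, the spectrum and intensity graph stages of §1 are replaced by a single-frequency DFT evaluated at each candidate note's equal-tempered fundamental (a simplification, not the patent's procedure), the inherent waveforms are assumed to be sample-aligned with the unit section, and the example uses pure sine tones as hypothetical inherent waveforms instead of recorded MIDI notes.

```python
import cmath
import math


def note_frequency(n: int) -> float:
    """Equal-tempered fundamental of note n (assumption: 69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


def intensity_graph(signal: list[float], sr: int, candidates) -> dict[int, float]:
    """Simplified intensity graph: DFT magnitude of `signal` at each candidate's fundamental."""
    graph = {}
    for n in candidates:
        w = -2j * math.pi * note_frequency(n) / sr
        graph[n] = abs(sum(x * cmath.exp(w * k) for k, x in enumerate(signal)))
    return graph


def extract_fifth_method(signal: list[float], sr: int, inherent: dict[int, list[float]],
                         p: int, candidates) -> list[int]:
    """Fifth extraction method for one unit section.

    signal:   samples of the unit section (the first acoustic signal).
    inherent: {note: samples}, the stored inherent waveform of each note.
    """
    current = list(signal)
    extracted = []
    for i in range(p):
        graph = intensity_graph(current, sr, candidates)
        best = max(graph, key=graph.get)  # i-th representative code
        extracted.append(best)
        if i < p - 1:  # the (i + 1)-th acoustic signal is the sample-wise difference
            current = [x - y for x, y in zip(current, inherent[best])]
    return extracted
```

In the example, a 0.1 s section at 8 kHz is the sum of the inherent waveforms of notes 69 and 76; the first pass finds 69, its waveform is subtracted, and the second pass finds 76.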
[0085]
If the above processing is executed for each unit section, P representative codes are obtained for each unit section.
[0086]
【Effects of the Invention】
As described above, according to the encoding method of the present invention, it is possible to perform efficient encoding on an acoustic signal.
[Brief description of the drawings]
FIG. 1 is a diagram showing the basic principle of the acoustic signal encoding method according to the present invention.
FIG. 2 is a diagram showing codes created on the basis of the intensity graph shown in FIG. 1(c).
FIG. 3 is a diagram showing codes created by setting unit sections that partially overlap on the time axis.
FIG. 4 is a diagram showing a specific example of setting unit sections that partially overlap on the time axis.
FIG. 5 is a graph showing the correspondence between the frequency axis and note numbers.
FIG. 6 is a diagram showing an example in which the amount of code data is reduced by integration processing.
FIG. 7 is a diagram showing the concept of sorting a plurality of note numbers by frequency before storing them on tracks.
FIG. 8 is a diagram illustrating the first method of extracting representative codes from an intensity graph.
FIG. 9 is a diagram illustrating the second method of extracting representative codes from an intensity graph.
FIG. 10 is a chart showing the frequency characteristics obtained when each note number is reproduced by a MIDI sound source.
FIG. 11 is a diagram illustrating the third method of extracting representative codes from an intensity graph.
FIG. 12 is a diagram illustrating the fourth method of extracting representative codes from an intensity graph.
FIG. 13 is a diagram illustrating the fifth method of extracting representative codes from an intensity graph.
[Explanation of symbols]
A ... Complex intensity
d1 to d5 ... Unit sections
E ... Effective strength
Fs ... Sampling frequency
f ... Frequency
L ... Section length of a unit section
ΔL ... Offset length
N ... Note number
Na to Ne ... Note numbers
Na*, Nb*, Nc* ... Note numbers obtained by correction
Np(dj, i) ... i-th representative code (note number) extracted for unit section dj
Ep(dj, i) ... Effective strength of representative code Np(dj, i)
T1 to T8 ... Tracks
t1 to t6 ... Times

Claims

1. A method for encoding an acoustic signal given as a time-series intensity signal, the method comprising:
a section setting stage of setting a plurality of unit sections on the time axis of the acoustic signal to be encoded;
a spectrum creation stage of creating, for each unit section, a spectrum whose first axis is the frequency components contained in the acoustic signal within that unit section and whose second axis is the intensity of each frequency component;
an intensity graph creation stage of defining Q codes that discretely indicate Q scale steps in correspondence with the first axis of the spectrum, and creating, on the basis of the spectrum of each unit section, an intensity graph whose first axis is the Q codes and whose second axis is the intensity of each code;
an encoding stage of extracting, for each unit section, P representative codes representing that unit section from among all Q codes on the basis of the intensity of each code in the intensity graph, and representing the acoustic signal of each unit section by the extracted representative codes and their intensities; and
an integration stage of, when a series of representative codes whose scale differences lie within a predetermined range exists across a plurality of adjacent unit sections, replacing that series with an integrated code spanning the plurality of unit sections.
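Two stages of claim 1 can be illustrated with a minimal sketch, assuming that the Q codes are MIDI note numbers 0 to 127 (as in claim 7) tuned in equal temperament with note 69 at 440 Hz. The names `intensity_graph` and `integrate`, the direct DFT evaluation at each note's centre frequency, and the rule that an integrated code keeps the note its run started with are illustrative assumptions, not the patented implementation.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def note_frequency(note: int) -> float:
    """Centre frequency of a MIDI note number (equal temperament, note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def intensity_graph(section: Sequence[float], fs: float, q: int = 128) -> Dict[int, float]:
    """Map each of the Q codes (note numbers 0..q-1) to an intensity.

    Assumption: the DFT is evaluated directly at each note's centre frequency
    rather than by binning an FFT spectrum; the claim leaves this detail open.
    """
    n_samples = len(section)
    graph: Dict[int, float] = {}
    for note in range(q):
        f = note_frequency(note)
        if f >= fs / 2:  # at or above the Nyquist frequency: not representable
            graph[note] = 0.0
            continue
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs) for k, x in enumerate(section))
        graph[note] = abs(acc) / n_samples
    return graph


def integrate(sections: Sequence[Sequence[int]], max_diff: int = 0) -> List[Tuple[int, int, int]]:
    """Merge representative codes that continue across adjacent unit sections.

    Returns (note, first_section, end_section_exclusive) triples. A run continues
    while some code in the next section lies within max_diff semitones of the
    run's most recent note; the run keeps the note it started with.
    """
    open_runs: List[List[int]] = []  # each run: [start_note, first_section, last_note]
    finished: List[Tuple[int, int, int]] = []
    for j, codes in enumerate(sections):
        remaining = list(codes)
        still_open: List[List[int]] = []
        for run in open_runs:
            match = next((c for c in remaining if abs(c - run[2]) <= max_diff), None)
            if match is None:
                finished.append((run[0], run[1], j))  # run ends before section j
            else:
                remaining.remove(match)
                run[2] = match
                still_open.append(run)
        still_open.extend([c, j, c] for c in remaining)  # unmatched codes start new runs
        open_runs = still_open
    finished.extend((r[0], r[1], len(sections)) for r in open_runs)
    return sorted(finished, key=lambda t: (t[1], t[0]))


# Toy usage: a 440 Hz sine sampled at 8 kHz, 800 samples forming one unit section.
fs = 8000.0
tone = [math.sin(2 * math.pi * 440.0 * k / fs) for k in range(800)]
graph = intensity_graph(tone, fs)
print(max(graph, key=graph.get))  # strongest code in the intensity graph
print(integrate([[60, 64], [60, 65], [60, 67]], max_diff=1))
```

With `max_diff=1`, the 64 to 65 step is merged into one integrated code spanning two sections, while the jump from 65 to 67 starts a new code.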

2. The encoding method according to claim 1,
wherein, in the section setting stage, adjacent unit sections are set so as to partially overlap on the time axis, a reference position on the time axis is defined for each unit section, and the representative codes obtained for a specific unit section are output as codes that encode the acoustic signal at the reference position of that unit section.

3. The encoding method according to claim 2, wherein
a section length L and an offset length ΔL (where ΔL < L) are defined; the length of each unit section on the time axis is set to L; for any i, the distance on the time axis between the start point of the i-th unit section and the start point of the (i + 1)th unit section is set to ΔL; and the representative codes obtained for each unit section are output as codes that encode the acoustic signal at a reference position occupying a time period of length ΔL.
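The section layout of claim 3 can be sketched in sample units as follows. Section i starts at i·ΔL and has length L, so consecutive sections overlap by L − ΔL. The claim does not say where the ΔL-long reference period sits inside a section; this sketch assumes it is centred, which makes consecutive reference periods tile the time axis. `UnitSection` and `unit_sections` are hypothetical names.

```python
from typing import List, NamedTuple


class UnitSection(NamedTuple):
    start: int      # first sample of the unit section
    end: int        # one past the last sample (end - start == L)
    ref_start: int  # first sample of the reference period (length dL)
    ref_end: int


def unit_sections(total: int, L: int, dL: int) -> List[UnitSection]:
    """Lay out unit sections of length L whose start points are dL apart.

    Assumption: the reference period is the dL-long slot centred in each
    section, so consecutive reference periods neither gap nor overlap.
    """
    if not 0 < dL < L:
        raise ValueError("claim 3 requires 0 < dL < L")
    margin = (L - dL) // 2
    sections: List[UnitSection] = []
    start = 0
    while start + L <= total:
        sections.append(UnitSection(start, start + L, start + margin, start + margin + dL))
        start += dL
    return sections


for s in unit_sections(total=16, L=8, dL=4):
    print(s)
```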

4. The encoding method according to any one of claims 1 to 3,
wherein, in the spectrum creation stage, the acoustic signal to be encoded is sampled at a predetermined sampling period and captured as digital acoustic data, and a spectrum is created by performing a Fourier transform on the captured acoustic data for each unit section.

5. The encoding method according to claim 3,
wherein, in the spectrum creation stage, a weighting function determined on the basis of the offset length ΔL is set as a window function, and the spectrum is created by applying this window function to each unit section of the acoustic signal to be encoded and then performing a Fourier transform.

6. The encoding method according to claim 4,
wherein, in the spectrum creation stage, a plurality of spectra are prepared by performing a Fourier transform on the digital acoustic data captured at the predetermined sampling period without decimation and by performing a Fourier transform after decimating the data at a predetermined ratio, and these spectra are synthesized.

7. The encoding method according to any one of claims 1 to 6,
wherein, in the intensity graph creation stage, the note numbers used in MIDI data are used as the Q codes, and
in the encoding stage, the acoustic signal of each unit section is expressed by MIDI-format code data including the note number extracted as a representative code, a velocity determined on the basis of its intensity, and data indicating a delta time determined on the basis of the length of the unit section.
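The three MIDI quantities named in claim 7 can be computed as in the sketch below. The note-number formula follows the usual MIDI convention (note 69 = A4 = 440 Hz, 12 notes per octave), and the delta-time conversion uses the Standard MIDI File notions of ticks per quarter note and tempo in microseconds per quarter note. The linear intensity-to-velocity scaling and the default `division` and `tempo_us` values are hypothetical choices, not taken from the patent.

```python
import math


def frequency_to_note(f_hz: float) -> int:
    """Nearest MIDI note number for a frequency (note 69 = A4 = 440 Hz)."""
    return int(round(69 + 12 * math.log2(f_hz / 440.0)))


def intensity_to_velocity(intensity: float, max_intensity: float) -> int:
    """Hypothetical linear mapping of an intensity onto MIDI velocity 1..127."""
    if max_intensity <= 0 or intensity <= 0:
        return 0  # velocity 0 in a note-on message acts as a note-off (silence)
    return max(1, min(127, round(127 * intensity / max_intensity)))


def section_delta_ticks(section_samples: int, fs: float,
                        division: int = 480, tempo_us: int = 500_000) -> int:
    """Delta time, in ticks, covering one unit section.

    division: ticks per quarter note; tempo_us: microseconds per quarter note
    (500 000 us corresponds to 120 BPM). Both defaults are assumptions.
    """
    seconds = section_samples / fs
    return round(seconds * 1_000_000 / tempo_us * division)


print(frequency_to_note(261.63), intensity_to_velocity(0.8, 1.0),
      section_delta_ticks(4410, 44100.0))
```

A 0.1 s unit section at 120 BPM with 480 ticks per quarter note therefore corresponds to 96 ticks.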

8. The encoding method according to any one of claims 1 to 7,
wherein, when representative codes are extracted in the encoding stage, P codes are extracted from the candidates in the intensity graph to be encoded in descending order of intensity and used as the representative codes.
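The extraction rule of claim 8 amounts to a top-P selection over the intensity graph. A minimal sketch, assuming the intensity graph is held as a hypothetical `{code: intensity}` dictionary; claims 9 to 12 refine this rule.

```python
import heapq
from typing import Dict, List, Tuple


def extract_top_p(graph: Dict[int, float], p: int) -> List[Tuple[int, float]]:
    """Return the P (code, intensity) pairs with the largest intensities, strongest first."""
    return heapq.nlargest(p, graph.items(), key=lambda item: item[1])


example = {60: 0.9, 64: 0.5, 67: 0.7, 72: 0.2}  # hypothetical intensity graph
print(extract_top_p(example, 2))
```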

9. The encoding method according to any one of claims 1 to 7,
wherein, when representative codes are extracted in the encoding stage, a process of extracting the code with the highest intensity among the current candidates in the intensity graph to be encoded as the i-th representative code and then deleting the i-th representative code and the codes corresponding to its harmonic components from the candidates is repeated for i = 1 to (P − 1), and the code with the highest intensity among the remaining candidates is then extracted as the P-th representative code, so that a total of P representative codes are extracted.
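Claim 9 can be sketched by noting that the k-th harmonic of a note lies about 12·log2(k) semitones above it (+12 for the octave, +19 for the twelfth, +24 for the double octave). The cutoff of eight harmonics and the function names are hypothetical choices for illustration; the claim does not say how many harmonics are deleted.

```python
import math
from typing import Dict, List, Set


def harmonic_notes(note: int, max_harmonic: int = 8) -> Set[int]:
    """Note numbers nearest to harmonics 2..max_harmonic of `note` (assumed cutoff)."""
    return {note + round(12 * math.log2(k)) for k in range(2, max_harmonic + 1)}


def extract_with_harmonic_removal(graph: Dict[int, float], p: int,
                                  max_harmonic: int = 8) -> List[int]:
    candidates = dict(graph)
    representatives: List[int] = []
    for i in range(1, p + 1):
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        representatives.append(best)
        if i < p:  # for i = 1 .. P-1, drop the code and its harmonics
            del candidates[best]
            for h in harmonic_notes(best, max_harmonic):
                candidates.pop(h, None)
    return representatives


# Notes 60 and 67 are the 2nd and 3rd harmonics of 48, so they are skipped.
print(extract_with_harmonic_removal({48: 1.0, 60: 0.8, 67: 0.6, 52: 0.5, 55: 0.4}, 2))
```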

10. The encoding method according to any one of claims 1 to 7,
wherein a sound source used to reproduce sound based on each code is specified in advance, and, on the basis of the frequency characteristics of the sound reproduced from each code by that sound source, a correction table is defined for replacing a code of a specific scale with a code of a lower scale of which that specific scale is a harmonic; and
when representative codes are extracted in the encoding stage, a process of taking the code with the highest intensity among the current candidates in the intensity graph to be encoded as the i-th reference code, extracting the code obtained by applying the correction table to the i-th reference code as the i-th representative code, and excluding the i-th reference code and the i-th representative code from the candidates is repeated for i = 1 to P, so that a total of P representative codes are extracted.
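A minimal sketch of the correction-table method of claim 10. The table used here is hypothetical: it supposes a sound source whose notes 60 and 64 sound strongest one octave up, so that observed peaks at 72 and 76 are mapped back to 60 and 64. Codes absent from the table are assumed to map to themselves.

```python
from typing import Dict, List


def extract_with_correction(graph: Dict[int, float], p: int,
                            correction: Dict[int, int]) -> List[int]:
    """Take the strongest code as reference, map it through the correction table."""
    candidates = dict(graph)
    representatives: List[int] = []
    for _ in range(p):  # i = 1 .. P
        if not candidates:
            break
        reference = max(candidates, key=candidates.get)
        rep = correction.get(reference, reference)  # unlisted codes map to themselves
        representatives.append(rep)
        candidates.pop(reference, None)  # exclude the i-th reference code
        candidates.pop(rep, None)        # and the i-th representative code
    return representatives


table = {72: 60, 76: 64}  # hypothetical correction table for one sound source
print(extract_with_correction({72: 1.0, 60: 0.7, 64: 0.5, 76: 0.4}, 2, table))
```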

11. The encoding method according to any one of claims 1 to 7,
wherein a sound source used to reproduce sound based on each code is specified in advance, and the spectrum creation stage and the intensity graph creation stage are executed in advance on the acoustic signal obtained by actually reproducing each code with this sound source, so that an intrinsic intensity graph is obtained for each code; and
when representative codes are extracted in the encoding stage, a process of extracting the code with the highest intensity among the current candidates in the intensity graph to be encoded as the i-th representative code and then subtracting each intensity value of the intrinsic intensity graph of the i-th representative code from the corresponding intensity value of the intensity graph to be encoded is repeated for i = 1 to (P − 1), and the code with the highest intensity among the remaining candidates is extracted as the P-th representative code, so that a total of P representative codes are extracted.
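Claim 11 works in the intensity-graph domain rather than on waveforms. In this minimal sketch, the intrinsic intensity graphs are hypothetical toy dictionaries standing in for graphs measured in advance from the chosen sound source; note 48's intrinsic graph includes a component at its octave, 60, which is removed by the subtraction.

```python
from typing import Dict, List


def extract_with_graph_subtraction(graph: Dict[int, float], p: int,
                                   intrinsic: Dict[int, Dict[int, float]]) -> List[int]:
    current = dict(graph)
    representatives: List[int] = []
    for i in range(1, p + 1):
        best = max(current, key=current.get)
        representatives.append(best)
        if i < p:  # for i = 1 .. P-1, subtract the intrinsic graph of the chosen code
            profile = intrinsic[best]
            current = {n: e - profile.get(n, 0.0) for n, e in current.items()}
    return representatives


intrinsic = {48: {48: 1.0, 60: 0.5}, 52: {52: 0.6}, 60: {60: 1.0}}  # toy values
print(extract_with_graph_subtraction({48: 1.0, 60: 0.9, 52: 0.6}, 2, intrinsic))
```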

12. The encoding method according to any one of claims 1 to 7,
wherein a sound source used to reproduce sound based on each code is specified in advance, and the intrinsic waveform of the acoustic signal obtained by actually reproducing each code with this sound source is obtained in advance;
a code extraction process for determining the i-th representative code is defined, in which the waveform information of an i-th acoustic signal is input, the spectrum creation stage and the intensity graph creation stage are performed on the input waveform information, the code with the highest intensity among the candidates in the resulting intensity graph is extracted as the i-th representative code in the subsequent encoding stage, and each intensity value of the intrinsic waveform of the i-th representative code is subtracted from the corresponding intensity value of the i-th acoustic signal, the resulting acoustic signal being taken as the (i + 1)th acoustic signal; and
the section setting stage is performed on the original acoustic signal to be encoded, the code extraction process is repeated for i = 1 to (P − 1) with the original acoustic signal of each unit section as the first acoustic signal, and finally the waveform information of the P-th acoustic signal is input, the spectrum creation stage and the intensity graph creation stage are performed on it, and in the subsequent encoding stage the code with the highest intensity among the candidates is extracted as the P-th representative code, so that a total of P representative codes are extracted for each unit section.

13. A computer-readable recording medium on which is recorded an acoustic signal encoding program for executing the encoding method according to claim 1.