JP4220108B2

JP4220108B2 - Acoustic signal coding system

Info

Publication number: JP4220108B2
Application number: JP2000190391A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2000-06-26
Filing date: 2000-06-26
Publication date: 2009-02-04
Anticipated expiration: 2020-06-26
Also published as: JP2002006841A

Description

【０００１】
【産業上の利用分野】
本発明は、放送メディア（ラジオ、テレビ）、通信メディア（ＣＳ映像・音声配信、インターネット音楽配信、通信カラオケ）、パッケージメディア（ＣＤ、ＭＤ、カセット、ビデオ、ＬＤ、ＣＤ−ＲＯＭ、ゲームカセット、携帯音楽プレーヤ向け固体メモリ媒体）などで提供する各種オーディオコンテンツの制作、並びに、専用携帯音楽プレーヤ、携帯電話・ＰＨＳ・ポケベルなどに向けたボーカルを含む音楽コンテンツ、歌舞伎・能・読経・詩歌など文芸作品の音声素材または語学教育音声教材のＭＩＤＩ伝送に利用するのに好適な音響信号の符号化技術に関する。
【０００２】
【従来の技術】
音響信号に代表される時系列信号には、その構成要素として複数の周期信号が含まれている。このため、与えられた時系列信号にどのような周期信号が含まれているかを解析する手法は、古くから知られている。例えば、フーリエ解析は、与えられた時系列信号に含まれる周波数成分を解析するための方法として広く利用されている。
【０００３】
このような時系列信号の解析方法を利用すれば、音響信号を符号化することも可能である。コンピュータの普及により、原音となるアナログ音響信号を所定のサンプリング周波数でサンプリングし、各サンプリング時の信号強度を量子化してデジタルデータとして取り込むことが容易にできるようになってきており、こうして取り込んだデジタルデータに対してフーリエ解析などの手法を適用し、原音信号に含まれていた周波数成分を抽出すれば、各周波数成分を示す符号によって原音信号の符号化が可能になる。
【０００４】
一方、電子楽器による楽器音を符号化しようという発想から生まれたＭＩＤＩ（Musical Instrument Digital Interface）規格も、パーソナルコンピュータの普及とともに盛んに利用されるようになってきている。このＭＩＤＩ規格による符号データ（以下、ＭＩＤＩデータという）は、基本的には、楽器のどの鍵盤キーを、どの程度の強さで弾いたか、という楽器演奏の操作を記述したデータであり、このＭＩＤＩデータ自身には、実際の音の波形は含まれていない。そのため、実際の音を再生する場合には、楽器音の波形（歪み波形パターン）を記憶したＭＩＤＩ音源が別途必要になるが、その符号化効率の高さが注目を集めており、ＭＩＤＩ規格による符号化および復号化の技術は、現在、パーソナルコンピュータを用いて楽器演奏、楽器練習、作曲などを行うソフトウェアに広く採り入れられている。
【０００５】
そこで、音響信号に代表される時系列信号に対して、所定の手法で解析を行うことにより、その構成要素となる周期信号を抽出し、抽出した周期信号をＭＩＤＩデータを用いて符号化しようとする提案がなされている。例えば、特開平１０−２４７０９９号公報、特開平１１−７３１９９号公報、特開平１１−７３２００号公報、特開平１１−９５７５３号公報、特開平１２−９９００９号公報、特開平１２−９９０９３号公報、特願平１１−５８４３１号明細書、特願平１１−１７７８７５号明細書、特願平１１−３２９２９７号明細書には、任意の時系列信号について、構成要素となる周波数を解析し、その解析結果からＭＩＤＩデータを作成することができる種々の方法が提案されている。
【０００６】
【発明が解決しようとする課題】
上記各公報または明細書において提案してきたＭＩＤＩ符号化方式により、ボーカルを含む音楽の再現が可能となったが、オーディオ符号化一般に適用するためには、複数の楽器音色を再生できるようなマルチトラック形式符号化を実現する必要がある。これを実現するには、「音源分離技術」、「単音楽器のＭＩＤＩ符号化」の２つの問題があり、楽器音から演奏された音符を認識してＭＩＤＩ楽器で再現するという理想的なＭＩＤＩ符号化形式の実現は現状極めて困難である。
【０００７】
音源分離技術に関しては、ミキシングされた演奏録音から楽器パート、ボーカルパート別に音源分離し、パートごとにＭＩＤＩ符号化を行う必要があるが、これは現状では技術的に不可能といわれ、ある程度実現できるようになっても不完全さが残るという問題がある。単音楽器のＭＩＤＩ符号化に関しては、ピアノ、ギター、バイオリンなど弦楽器系の楽器が倍音の割合が少なく、ＭＩＤＩ符号化を行い易いが、倍音を多く含むブラスなど金管楽器系やノイズ音源である打楽器系に対してはＭＩＤＩ符号化が難しいという問題がある。
【０００８】
一方、音響信号を正弦波にスペクトル分解し、スペクトル情報を符号化する分析合成符号化方式が既にＭＰ３（MPEG-1 layer3）などで実用化されている。ＰＣＭやＡＤＰＣＭといった波形符号化方式に比べ情報量が少なく、ＭＩＤＩと同様に再生速度やピッチを変更できるという特徴がある。しかし、この方式には次の３つの問題がある。
【０００９】
第１に、圧縮率をあまり上げられず、ＭＰ３でも実用的圧縮率は１／１０程度であるという問題がある。その理由として楽器音などは無限に続く倍音成分を可聴周波数の最大まで忠実に符号化する必要があるためで、信号を正弦波に分解する考え方のデメリットである。第２に、分解した正弦波の合成処理負荷の問題がある。復号側では、スペクトル分解された１００以上の正弦波をリアルタイムに合成することが必要となっている。この技術については、既に確立されているが、復号処理は煩雑なものとなっている。第３に、ハイパーソニック領域の音再現の問題がある。人間の最大可聴周波数は２２ｋＨｚであり、そのためＣＤのサンプリング周波数は４４．１ｋＨｚと決められている。しかし、最近では２２ｋＨｚ以上のハイパーソニック領域もある程度知覚できるという説が濃厚になり、次世代のＤＶＤ−ａｕｄｉｏなどではサンプリング周波数を９６ｋＨｚに上げる方向にある。（アナログのＬＰレコードでは２２ｋＨｚ以上も再現される。）そうなるとスペクトル符号量は４倍になる。
【００１０】
一方、ＭＩＤＩ音源には、ＧＭ（General MIDI）標準で１２８種、拡張音源では数千種類以上の歪み波形が蓄積され、１６〜３２種の歪み波形をリアルタイムに合成できる。そこで、本発明は上記のような点に鑑み、音響信号から、ＭＩＤＩ音源があらかじめ用意している歪み波形に近いものを抽出して符号化することにより、楽器パートに対応した符号化データを得ることが可能な音響信号符号化システムを提供することを課題とする。
【００１１】
【課題を解決するための手段】
上記課題を解決するため、本発明では、音響信号符号化システムを、複数の歪み波形パターンを蓄積する歪み波形パターン蓄積手段と、与えられた音響信号に対して所定の解析区間を複数個定義するための区間定義手段と、前記歪み波形パターン蓄積手段から前記音響信号に対応する歪み波形パターンを基本歪み波形パターンとして入力する手段と、前記解析区間の１つの信号切片波形に対して、周波数解析を行って正弦波の集合に分解し、前記分解された正弦波を所定のルールに従って複数のグループに分け、前記各グループに属する正弦波を合成することにより各グループの歪み波を得る一方、前記基本歪み波形パターンの周波数と振幅を変化させた歪み波形バリエーションを複数作成し、前記各グループの歪み波に近い波形形状の歪み波形バリエーションを各グループの歪み波形バリエーションとして選択する信号解析手段と、前記信号解析手段により選択された各々の歪み波形バリエーションの周波数と振幅、および前記解析区間の開始時刻と終了時刻の情報を所定の符号セットで符号化することにより音響符号化データを作成する符号形成手段を有する構成としたことを特徴とする。
【００１２】
本発明によれば、上記のような構成とし、音響信号に対応した基本歪み波形パターンを入力し、この基本歪み波形パターンの周波数と振幅を変化させた複数の歪み波形バリエーションを作成し、この歪み波形バリエーションの中から、音響信号から得られる信号切片波形を周波数解析してグループ化した各グループの歪み波に最も近いものを選択して符号化するようにしたので、あらかじめ音源として用意している楽器パートに近い成分ごとに符号化を行うことが可能となる。
【００１３】
【発明の実施の形態】
以下、本発明の実施形態について図面を参照して詳細に説明する。
まず、最初に本発明による音響信号符号化システムにおける基本的な考え方について説明する。ＰＣＭなどの手法によりデジタル化された音響信号は、周知の手法である短時間フーリエ変換、一般化調和解析などにより以下の（数式１）に示すように正弦波分解できる。
【００１４】
（数式１）
ｇ（ｔ） ≒ Σ_i｛Ａ_iｓｉｎ（２πｆ_iｔ）＋Ｂ_iｃｏｓ（２πｆ_iｔ）｝
【００１５】
この（数式１）における［ｆ_i，Ａ_i，Ｂ_i］によりｇ（ｔ）を再現するのが、従来より行なわれている分析合成符号化方式の原理である。上記（数式１）を更に整理すると、以下の（数式２）のように変形される。
【００１６】
（数式２）
ｇ（ｔ）≒Σ_nα_n［Σ_j｛Ｐ_njｓｉｎ(２πｊｆ_nｔ)＋Ｑ_njｃｏｓ(２πｊｆ_nｔ)｝］
【００１７】
この（数式２）において、ｊは周波数を整数倍することを示す整数であり、ある周波数ｆ_nの整数倍の周波数ｊｆ_nを有する正弦波がグループ化された状態を示している。上記（数式２）を更に整理すると、以下の（数式３）のように変形される。
【００１８】
（数式３）
ｇ（ｔ） ≒ Σ_nα_nｕ_n（２πｆ_nｔ）
【００１９】
この（数式３）において、ｕ_n（２πｆ_nｔ）は基本周波数ｆ_nをもつ歪み波であり、この歪み波としてＭＩＤＩ音源に用意されているものの中から最も形状が近いものを選べば、理論上分析合成符号化方式と同様にｇ（ｔ）を再現することができる。本発明による利点は、（数式１）におけるｉでΣ計算を行うのに比べ、（数式３）におけるｎでΣ計算を行う方が、計算回数が圧倒的に少なく、符号化するパラメータ［ｆ_n，α_n］も少なくなる。ここで問題となるのは、この歪み波ｕ_n（２πｆ_nｔ）の決定方式であり、入力音響信号ｇ（ｔ）から直接歪み波ｕ_n（２πｆ_nｔ）を求めるのは困難である。そこで、本発明では、あらかじめＭＩＤ音源等の歪み波形パターン蓄積手段に蓄積されている歪み波形パターンから、入力音響信号に対応するものを選択する手法を採る。
【００２０】
続いて、本システムの具体的な構成について説明する。図１は、本発明による音響信号符号化システムの構成を示す機能ブロック図である。図１において、音響信号入力部１は、アナログ信号である音響信号をＰＣＭ等の手法によりデジタル化したデジタル音響信号を入力する機能を有する。区間定義部２は、音響信号の解析のために、時系列の音響信号を所定の区間で区切る機能を有する。信号解析部３は、区間定義部２で定義された区間単位で音響信号の解析を行う機能を有する。歪み波形パターン蓄積部４は、複数の歪み波形パターンを蓄積したものであり、例えば、１２８種の歪み波形を有するＭＩＤＩ音源で実現される。歪み波形パターン検索部５は、歪み波形パターン蓄積部４に蓄積された歪み波形パターンの中から、入力された音響信号に対応するものを検索し、基本歪み波形パターンとして入力するためのものである。符号形成部６は、信号解析部３により解析された音響信号をＭＩＤＩなどのデータに符号化する機能を有する。
【００２１】
次に、図１に示した音響信号符号化システムの処理動作について説明する。まず、音響信号入力部１よりＰＣＭによりデジタル化された音響信号を入力する。ここでは、例えば、図２に示すような波形で表される音響信号が入力されたものとする。図２の例では、横軸に時間ｔ、縦軸に振幅（強度）をとって、この音響信号を示している。
【００２２】
続いて、この解析対象となる音響信号の時間軸上に対して、区間定義部２が複数の解析区間を設定する。図２に示す例では、時間軸ｔ上に等間隔に６つの時刻ｔ１〜ｔ６が定義され、これら各時刻を始点および終点とする５つの解析区間ｄ１〜ｄ５が設定されている。図２の例では、全て同一の区間長をもった解析区間が設定されているが、個々の解析区間ごとに区間長を変えるようにしてもかまわない。あるいは、隣接する解析区間が時間軸上で部分的に重なり合うような区間設定を行ってもかまわない。こうして解析区間が設定されたら、各解析区間ごとの音響信号（以下、区間信号と呼ぶことにする）について、信号解析部３により解析が行われる。
【００２３】
信号解析部３においては、まず各解析区間ごとの一般化調和解析が行われ、区間信号を複数の正弦波に分解する。ここでは、分解により１２８０種の異なる正弦波が得られる。各正弦波の振幅は、区間信号と各周波数を有する正弦関数、余弦関数との相関に基づいて決定される。一般化調和解析は、周知の周波数解析手法であり、前掲の公報および明細書にも記載があるので、詳細な説明は省略する。
【００２４】
続いて、分解された正弦波の中から振幅値が大きいものを選択する。選択する正弦波の数は、あらかじめその数自体を設定しておくこともできるし、振幅値の閾値を設定しておき、閾値を超えた数とすることもできる。ここで、選択される正弦波の数は最終的に和音を構成する単音の数となる。正弦波が選択されたら、各正弦波の周波数を基本周波数とし、この基本周波数の整数倍の周波数を有する正弦波をグループ化する。次に、同一グループに属する正弦波を合成し、歪み波を作成する。図３に分解された正弦波と合成された歪み波の一例を示す。図３中、左側には周波数の異なる６つの正弦波が示されている。このうち、上から３番目と４番目の正弦波は、それぞれ１番上の正弦波の２倍、３倍という整数倍の周波数となっている。このため、これら３つの正弦波は１つのグループとして合成され、図３中、右上のような歪み波となる。同様に、上から５番目と６番目の正弦波は、それぞれ上から２番目の正弦波の３倍、５倍という整数倍の周波数となっているので、１つのグループとして合成され、図３中、右下のような歪み波となる。これらの処理を各グループについて行うことにより、グループ数分の歪み波が作成されることになる。グループ化の手法としては、このようなある特定の周波数を有する正弦波の整数倍の周波数を有する正弦波でグループ化する手法に限定せず、他の手法で行っても良いが、このように、整数倍の周波数の正弦波をグループ化することにより、上記（数式２）、（数式３）で示したように、演算処理の負荷を軽減することが可能となる。また、上記手法によりグループ化する際、基本周波数となるべき正弦波がグループ内に含まれない場合がある。これは基本周波数となるべき正弦波の振幅が小さいために最初の選択から外れてしまうためである。このような場合は、他のグループに属する適当な正弦波の周波数を基本周波数として、その整数倍の周波数を有する正弦波をグループ化する。
【００２５】
一方、ＭＩＤＩ音源で実現される歪み波形パターン蓄積部４からは、音響信号入力部１より入力された音響信号に対応する歪み波形パターンが、歪み波形パターン検索部５により検索され、基本歪み波形パターンとして入力される。この歪み波形パターンの検索は、解析される音響信号を別途音声として出力し、これを検索者が聴いて、どの歪み波形パターンにするかを決定し、歪み波形パターンを識別する情報を検索キーとして入力することにより行なわれる。例えば、検索者がこの音響信号を音声で聴いたときに、ピアノの音で表現するのが最適だと感じたら、ピアノの音色を表現した歪み波形パターンを検索する。
【００２６】
信号解析部３では、歪み波形パターン検索部５により検索され、入力された歪み波形パターンを基本歪み波形パターンとし、この基本歪み波形パターンの周波数、振幅を変化させた歪み波形バリエーションを作成する。作成する歪み波形バリエーションの数は、設定により変更することができるが、周波数、振幅ともに１２８通り設定したとすると、１２８×１２８通りの歪み波形バリエーションが作成されることになる。
【００２７】
この１２８×１２８通りの歪み波形バリエーションの中から、上記のようにグループ化された正弦波を合成することにより得られた各歪み波に近い波形形状を有するものを１つずつ選択する。この結果、所定数の歪み波形バリエーションが選択されることになる。
【００２８】
次に、符号形成部６が、選択された各歪み波形バリエーションの周波数と振幅、および解析区間の開始時刻と終了時刻の情報を所定の符号セットで出力する。ここで、符号化される周波数ｆ_nおよび振幅α_nは、上記（数式３）に対応する。また、解析区間については、例えば図２に示す解析区間ｄ１の場合、開始時刻はｔ１、終了時刻はｔ２となる。
【００２９】
以上の処理を音響信号の全解析区間に対して行い、各解析区間ごとに符号セットを出力する。この際、前の解析区間の終了時刻が、後の解析区間の開始時刻に近接しており、前の解析区間における符号セットの周波数および振幅が、後の解析区間における符号セットの周波数および振幅に類似している場合、２つの解析区間における符号セットの統合を行う。具体的には、前の解析区間における符号セットの終了時刻を後の解析区間における符号セットの終了時刻に変更し、後の終了時刻の符号セットを削除する。これは、近接する解析区間における音が類似している場合、同一の音であるとみなし、符号化の際に１つの音にまとめるために行なわれる。
【００３０】
符号化データとしては、特に限定する必要はないが、ＭＩＤＩ規格を利用することが好ましい。ＭＩＤＩ規格を利用した場合、上記符号セットの周波数情報は、ＭＩＤＩ規格で定義されているノートナンバーに対応する１２８種の周波数から選択され、振幅情報は、ＭＩＤＩ規格で定義されている１２８段階のベロシティで記述され、開始時刻および終了時刻がＳＭＦ規格のデルタタイムで記述され、符号セットはノートオンイベントおよびノートオフイベントの１対で記述される。
【００３１】
このようにして得られた音響符号化データは、蓄積手段または伝送手段を介して音響データ復号化システムにて復号化され、音響信号として発せられる。図４に音響データ復号化システムの機能ブロック図を示す。図４において、歪み波形パターン蓄積部４は、図１に示したものと同一の機能を有するものであるため、図１と同一符号で示している。符号化データ入力部７は音響符号化データを入力するためのものである。歪み波形バリエーション再現部８は、入力された音響符号化データに基づいて、歪み波形パターン検索部９から得られる基本歪み波形パターンに変更を加えて、歪み波形バリエーションを再現する機能を有する。歪み波形パターン検索部９は、歪み波形バリエーション再現部８の指示に従い、歪み波形パターン蓄積部４より歪み波形パターンを検索する機能を有する。信号合成部１０は符号セットごとに再現された歪み波形バリエーションを合成して音響信号を再現する機能を有する。
【００３２】
次に、図４に示した音響データ復号化システムの処理動作について説明する。音響信号符号化システムにより得られた音響符号化データが符号化データ入力部７より入力されると、歪み波形バリエーション再現部８は、音響符号化データから符号セット単位で符号データを抽出し、歪み波形バリエーションの再現処理を行う。歪み波形バリエーションの再現処理としては、まず、符号セットから基本歪み波形パターンを識別する情報を抽出する。続いて、この識別情報を用いて歪み波形パターン検索部９が歪み波形パターン蓄積部４から対応する歪み波形パターンを抽出する。歪み波形バリエーション再現部８は、抽出された歪み波形パターンを基本歪み波形パターンとして取得し、この基本歪み波形パターンに対して、符号セットが有する周波数、振幅になるように変形を行い、歪み波形バリエーションを作成する。この歪み波形バリエーションは、１つの符号セットについて、和音を構成する単音の数だけ作成されることになる。
【００３３】
歪み波形バリエーション再現部８による処理を全ての符号セットに対して行い、歪み波形バリエーションが全て再現されたら、信号合成部１０が再現された歪み波形バリエーションの合成を行う。この合成処理は、各符号セットの開始時刻と終了時刻の情報を基に時間軸上で行なわれる。このような符号セットの開始時刻と終了時刻から、符号セットに対応する歪み波を合成する処理は、ＭＩＤＩ符号データをＭＩＤＩ音源を用いて再生する際に従来より行われている手法であるので、詳細な説明は省略する。信号合成部１０により合成されることにより再現された音響信号は、例えば、ＰＣＭの形式で出力することができる。
【００３４】
以上、本発明の好適な実施形態について説明したが、本発明は上記実施形態に限定されず、種々の変形が可能である。例えば、上記実施形態では、音響符号化データが１つのトラックにより構成される場合について説明したが、音響符号化データが複数のトラックにより構成されるようにしても良い。１つのトラックは、１つの楽器パートに対応させることができるため、複数のトラックを用意することにより、複数の楽器で演奏された楽曲を再現することができる。複数のトラックを有する音響符号化データを作成する場合は、図１に示した歪み波形パターン蓄積部４より複数の歪み波形パターンを、歪み波形パターン入力部５により検索して入力するようにすれば良い。これにより、信号解析部３が複数の基本歪み波形パターンを認識し、それぞれについてトラックを作成する。そして各トラックに割り当てられるべき符号セットが、各基本歪み波形パターンについて作成されることになる。ＭＩＤＩ規格の場合、トラックはチャンネルに対応し、基本歪みパターンを識別するための情報は、楽器音色を示すプログラム番号、バンク番号に対応する。
【００３５】
また、上記実施形態では、歪み波形パターン蓄積部４にあらかじめ用意されている歪み波形パターンを検索し、基本歪み波形パターンとして入力するようにしていたが、ユーザが入力された音響に最も適合すると思われる歪み波形パターンを歪み波形パターン蓄積部４に随時追加する方法をとることも可能である。その場合、ＭＩＤＩ規格で定義されるバンク番号の基本歪み波形パターン未定義領域をユーザが追加した基本歪み波形パターンとして定義すれば、ＭＩＤＩ規格を逸脱せずに符号化および復号化を行うことが可能であり、サンプラー機能をもつＭＩＤＩ規格音源を用いてこのような波形追加を実現できる。
【００３６】
【発明の効果】
以上、説明したように本発明によれば、複数の歪み波形パターンを蓄積する歪み波形パターン蓄積手段と、与えられた音響信号に対して所定の解析区間を複数個定義するための区間定義手段と、前記歪み波形パターン蓄積手段から前記音響信号に対応する歪み波形パターンを基本歪み波形パターンとして入力する手段と、前記解析区間の１つの信号切片波形に対して、周波数解析を行って正弦波の集合に分解し、前記分解された正弦波を所定のルールに従って複数のグループに分け、前記各グループに属する正弦波を合成することにより各グループの歪み波を得る一方、前記基本歪み波形パターンの周波数と振幅を変化させた歪み波形バリエーションを複数作成し、前記各グループの歪み波に近い波形形状の歪み波形バリエーションを各グループの歪み波形バリエーションとして選択する信号解析手段と、前記信号解析手段により選択された各々の歪み波形バリエーションの周波数と振幅、および前記解析区間の開始時刻と終了時刻の情報を所定の符号セットで符号化することにより音響符号化データを作成する符号形成手段を有するようにしたので、あらかじめ音源として用意している楽器パートに近い成分ごとに符号化を行うことが可能となるという効果を奏する。
【図面の簡単な説明】
【図１】本発明による音響信号符号化システムの構成を示す機能ブロック図である。
【図２】区間定義部２による解析区間の定義を説明するための図である。
【図３】分解された正弦波と、正弦波を合成することに得られる歪み波を示す図である。
【図４】音響データ復号化システムの構成を示す機能ブロック図である。
【符号の説明】
１・・・音響信号入力部
２・・・区間定義部
３・・・信号解析部
４・・・歪み波形パターン蓄積部
５・・・歪み波形パターン検索部
６・・・符号形成部
７・・・符号化データ入力部
８・・・歪み波形バリエーション再現部
９・・・歪み波形パターン検索部
１０・・・信号合成部
[0001]
[Industrial application fields]
The present invention relates to an acoustic signal encoding technique suitable for the production of various audio content provided through broadcast media (radio, television), communication media (CS video and audio distribution, Internet music distribution, online karaoke), and package media (CD, MD, cassette, video, LD, CD-ROM, game cartridges, solid-state memory media for portable music players), and for MIDI transmission of music content including vocals for dedicated portable music players, mobile phones, PHS handsets, and pagers, of audio material from literary and performing arts such as Kabuki, Noh, sutra chanting, and poetry, or of audio teaching material for language education.
[0002]
[Prior art]
A time-series signal represented by an acoustic signal includes a plurality of periodic signals as its constituent elements. For this reason, a method for analyzing what kind of periodic signal is included in a given time-series signal has been known for a long time. For example, Fourier analysis is widely used as a method for analyzing frequency components included in a given time series signal.
[0003]
By using such a time-series signal analysis method, an acoustic signal can be encoded. With the spread of computers, it has become easy to sample an analog audio signal as the original sound at a predetermined sampling frequency, quantize the signal intensity at each sampling, and capture it as digital data. If a method such as Fourier analysis is applied to the data and the frequency components included in the original sound signal are extracted, the original sound signal can be encoded by a code indicating each frequency component.
[0004]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into widespread use along with the spread of personal computers. Code data based on the MIDI standard (hereinafter, MIDI data) basically describes the operations of a musical performance, namely which key of the instrument was played and with what strength; the MIDI data itself does not contain the actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of instrument sounds (distortion waveform patterns). Nevertheless, the high encoding efficiency of MIDI has attracted attention, and encoding and decoding techniques based on the MIDI standard are now widely used in software for instrument performance, instrument practice, composition, and the like on personal computers.
[0005]
Accordingly, proposals have been made to analyze a time-series signal, typified by an acoustic signal, by a predetermined method so as to extract the periodic signals that constitute it, and to encode the extracted periodic signals as MIDI data. For example, JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP-A-12-99009, JP-A-12-99093, and the specifications of Japanese Patent Application Nos. 11-58431, 11-177875, and 11-329297 propose various methods that analyze the constituent frequencies of an arbitrary time-series signal and create MIDI data from the analysis results.
[0006]
[Problems to be solved by the invention]
The MIDI encoding methods proposed in the above publications and specifications have made it possible to reproduce music including vocals. To apply them to audio encoding in general, however, multitrack-format encoding capable of reproducing multiple instrument timbres must be realized. Two problems stand in the way, "sound source separation" and "MIDI encoding of monophonic instruments," and realizing an ideal MIDI encoding format, in which the notes played are recognized from the instrument sounds and reproduced on MIDI instruments, is extremely difficult at present.
[0007]
As for sound source separation, a mixed performance recording must be separated into instrument parts and vocal parts and each part MIDI-encoded separately, but this is said to be technically impossible at present, and even if it becomes feasible to some extent, imperfections will remain. As for MIDI encoding of monophonic instruments, string instruments such as piano, guitar, and violin have a small proportion of overtones and are easy to MIDI-encode, whereas brass instruments, which contain many overtones, and percussion instruments, which are noise sources, are difficult to MIDI-encode.
[0008]
On the other hand, analysis/synthesis coding, which spectrally decomposes an acoustic signal into sine waves and encodes the spectral information, has already been put to practical use in MP3 (MPEG-1 Layer 3) and the like. Compared with waveform coding methods such as PCM and ADPCM, it requires less information and, like MIDI, allows the playback speed and pitch to be changed. However, this method has the following three problems.
[0009]
First, the compression ratio cannot be raised very far; even with MP3 the practical compression ratio is about 1/10. This is because, for instrument sounds and the like, overtone components that continue indefinitely must be encoded faithfully up to the maximum audible frequency, which is a drawback of the idea of decomposing a signal into sine waves. Second, there is the processing load of synthesizing the decomposed sine waves. The decoding side must synthesize 100 or more spectrally decomposed sine waves in real time. The technology for this is already established, but the decoding process is complicated. Third, there is the problem of reproducing sound in the hypersonic range. The maximum frequency audible to humans is 22 kHz, which is why the CD sampling frequency was set at 44.1 kHz. Recently, however, the view that the hypersonic range above 22 kHz can also be perceived to some extent has gained strong support, and next-generation formats such as DVD-Audio are moving toward raising the sampling frequency to 96 kHz. (Analog LP records also reproduce frequencies above 22 kHz.) In that case the amount of spectral code becomes four times as large.
[0010]
Meanwhile, a MIDI sound source stores 128 distortion waveforms under the GM (General MIDI) standard, and several thousand or more in extended sound sources, and can synthesize 16 to 32 distortion waveforms in real time. In view of the above, an object of the present invention is to provide an acoustic signal encoding system that can obtain encoded data corresponding to instrument parts by extracting from an acoustic signal, and encoding, components close to the distortion waveforms that a MIDI sound source provides in advance.
[0011]
[Means for Solving the Problems]
To solve the above problem, the acoustic signal encoding system of the present invention is characterized by comprising: distortion waveform pattern storage means for storing a plurality of distortion waveform patterns; section defining means for defining a plurality of predetermined analysis sections for a given acoustic signal; means for inputting, from the distortion waveform pattern storage means, a distortion waveform pattern corresponding to the acoustic signal as a basic distortion waveform pattern; signal analysis means for performing frequency analysis on the signal segment waveform of one of the analysis sections to decompose it into a set of sine waves, dividing the decomposed sine waves into a plurality of groups according to a predetermined rule, and synthesizing the sine waves belonging to each group to obtain a distorted wave for each group, while creating a plurality of distortion waveform variations by varying the frequency and amplitude of the basic distortion waveform pattern, and selecting, for each group, the distortion waveform variation whose waveform shape is close to that group's distorted wave; and code forming means for creating acoustic encoded data by encoding, as a predetermined code set, the frequency and amplitude of each distortion waveform variation selected by the signal analysis means together with the start time and end time of the analysis section.
[0012]
According to the present invention, with the above configuration, a basic distortion waveform pattern corresponding to the acoustic signal is input, a plurality of distortion waveform variations are created by varying the frequency and amplitude of this basic distortion waveform pattern, and from among these variations the one closest to the distorted wave of each group, obtained by frequency-analyzing and grouping the signal segment waveform taken from the acoustic signal, is selected and encoded. This makes it possible to perform encoding for each component close to an instrument part provided in advance as a sound source.
[0013]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
First, the basic concept of the acoustic signal encoding system according to the present invention will be described. An acoustic signal digitized by a technique such as PCM can be decomposed into sine waves, as shown in (Formula 1) below, by well-known techniques such as the short-time Fourier transform and generalized harmonic analysis.
[0014]
(Formula 1)
$$g(t) \approx \sum_i \{A_i \sin(2\pi f_i t) + B_i \cos(2\pi f_i t)\}$$
[0015]
Reproducing $g(t)$ from $[f_i, A_i, B_i]$ in (Formula 1) is the principle of the conventional analysis/synthesis coding method. Rearranging (Formula 1) further gives (Formula 2) below.
[0016]
(Formula 2)
$$g(t) \approx \sum_n \alpha_n \Bigl[\sum_j \{P_{nj} \sin(2\pi j f_n t) + Q_{nj} \cos(2\pi j f_n t)\}\Bigr]$$
[0017]
In (Formula 2), $j$ is an integer by which the frequency is multiplied, and the formula expresses a state in which the sine waves whose frequencies $j f_n$ are integer multiples of a given frequency $f_n$ have been grouped together. Rearranging (Formula 2) further gives (Formula 3) below.
[0018]
(Formula 3)
$$g(t) \approx \sum_n \alpha_n u_n(2\pi f_n t)$$
[0019]
In (Formula 3), $u_n(2\pi f_n t)$ is a distorted wave with fundamental frequency $f_n$. If, for each such distorted wave, the waveform closest in shape is selected from those provided in a MIDI sound source, $g(t)$ can in theory be reproduced just as in the analysis/synthesis coding method. The advantage of the present invention is that summing over $n$ in (Formula 3) requires far fewer calculations than summing over $i$ in (Formula 1), and there are also fewer parameters $[f_n, \alpha_n]$ to encode. The issue here is how to determine the distorted wave $u_n(2\pi f_n t)$; it is difficult to obtain it directly from the input acoustic signal $g(t)$. Therefore, the present invention adopts a method of selecting the pattern corresponding to the input acoustic signal from among the distortion waveform patterns stored in advance in distortion waveform pattern storage means such as a MIDI sound source.
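The relationship between (Formula 1) and (Formula 3) can be checked numerically. The following is a minimal sketch, not taken from the specification: the frequencies, weights, and harmonic coefficients in `GROUPS` are hypothetical example values, and the function names are illustrative. It evaluates the same signal once as a flat sum over individual sine and cosine terms and once as a sum over grouped distorted waves.

```python
import math

# Hypothetical example data: each fundamental f_n has a weight alpha_n
# and harmonic coefficients j -> (P_nj, Q_nj).
GROUPS = {
    220.0: {"alpha": 0.8, "harmonics": {1: (1.0, 0.0), 2: (0.5, 0.1), 3: (0.25, 0.0)}},
    330.0: {"alpha": 0.5, "harmonics": {1: (1.0, 0.0), 3: (0.3, 0.2)}},
}


def flat_terms(groups):
    """Expand the groups into the (f_i, A_i, B_i) terms of Formula 1."""
    terms = []
    for f_n, g in groups.items():
        for j, (p, q) in g["harmonics"].items():
            terms.append((j * f_n, g["alpha"] * p, g["alpha"] * q))
    return terms


def g_formula1(terms, t):
    """Evaluate the flat sum over i of Formula 1 at time t."""
    return sum(a * math.sin(2 * math.pi * f * t) + b * math.cos(2 * math.pi * f * t)
               for f, a, b in terms)


def u_n(harmonics, phase):
    """Distorted wave of one group as a function of the phase 2*pi*f_n*t."""
    return sum(p * math.sin(j * phase) + q * math.cos(j * phase)
               for j, (p, q) in harmonics.items())


def g_formula3(groups, t):
    """Evaluate the sum over n of Formula 3 at time t."""
    return sum(g["alpha"] * u_n(g["harmonics"], 2 * math.pi * f_n * t)
               for f_n, g in groups.items())
```

In this example the flat form carries five (f, A, B) triples, whereas the grouped form needs only two (f, alpha) pairs once the shape of each distorted wave is supplied by a stored pattern.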
[0020]
Next, the specific configuration of the system will be described. FIG. 1 is a functional block diagram showing the configuration of the acoustic signal encoding system according to the present invention. In FIG. 1, the acoustic signal input unit 1 has a function of inputting a digital acoustic signal obtained by digitizing an analog acoustic signal by a technique such as PCM. The section definition unit 2 has a function of dividing the time-series acoustic signal into predetermined sections for analysis. The signal analysis unit 3 has a function of analyzing the acoustic signal in units of the sections defined by the section definition unit 2. The distortion waveform pattern storage unit 4 stores a plurality of distortion waveform patterns and is realized by, for example, a MIDI sound source having 128 distortion waveforms. The distortion waveform pattern search unit 5 searches the distortion waveform patterns stored in the distortion waveform pattern storage unit 4 for the one corresponding to the input acoustic signal and inputs it as the basic distortion waveform pattern. The code forming unit 6 has a function of encoding the acoustic signal analyzed by the signal analysis unit 3 into data such as MIDI.
[0021]
Next, the processing operation of the acoustic signal encoding system shown in FIG. 1 will be described. First, an acoustic signal digitized by PCM is input from the acoustic signal input unit 1. Here, for example, an acoustic signal represented by a waveform as shown in FIG. 2 is input. In the example of FIG. 2, this acoustic signal is shown with time t on the horizontal axis and amplitude (intensity) on the vertical axis.
[0022]
Subsequently, the section definition unit 2 sets a plurality of analysis sections on the time axis of the acoustic signal to be analyzed. In the example shown in FIG. 2, six times t1 to t6 are defined at equal intervals on the time axis t, and five analysis sections d1 to d5 having these times as a start point and an end point are set. In the example of FIG. 2, analysis sections having the same section length are set, but the section length may be changed for each analysis section. Alternatively, section settings may be made such that adjacent analysis sections partially overlap on the time axis. When the analysis section is set in this way, the signal analysis unit 3 analyzes the acoustic signal for each analysis section (hereinafter referred to as section signal).
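The section layout of FIG. 2 can be sketched as follows. This is an illustrative helper under stated assumptions, not the specification's implementation: the function name `define_sections`, the use of sample indices instead of times, and the `hop` parameter for overlapping sections are all hypothetical. Trailing samples that do not fill a whole section are simply dropped, and per-section variable lengths are not covered.

```python
def define_sections(num_samples, section_len, hop=None):
    """Return (start, end) sample-index pairs for fixed-length analysis sections.

    hop is the distance between successive section starts; a hop smaller
    than section_len makes adjacent sections overlap.
    """
    if hop is None:
        hop = section_len  # back-to-back sections, as in FIG. 2
    if section_len <= 0 or hop <= 0:
        raise ValueError("section_len and hop must be positive")
    sections = []
    start = 0
    while start + section_len <= num_samples:
        sections.append((start, start + section_len))
        start += hop
    return sections
```

With 50 samples and a section length of 10, this yields five sections bounded by six positions, matching the t1 to t6 and d1 to d5 arrangement described above.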
[0023]
In the signal analysis unit 3, generalized harmonic analysis is first performed on each analysis section to decompose the section signal into a plurality of sine waves. Here, the decomposition yields 1280 different sine waves. The amplitude of each sine wave is determined from the correlation between the section signal and the sine and cosine functions at each frequency. Generalized harmonic analysis is a well-known frequency analysis technique and is also described in the publications and specifications cited above, so a detailed description is omitted.
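The correlation step can be illustrated with a short sketch. It is a simplification and not the full generalized harmonic analysis of the cited publications: it assumes the section contains a whole number of periods of each probed frequency, in which case correlating against sine and cosine recovers the coefficients directly. The function name `correlate` and its arguments are hypothetical.

```python
import math


def correlate(section, sample_rate, freq):
    """Estimate (A, B, amplitude) of the component at freq in one section signal.

    A and B are the sine and cosine coefficients obtained by correlation;
    the result is exact only when the section spans whole periods of freq.
    """
    n = len(section)
    a = 0.0
    b = 0.0
    for k, x in enumerate(section):
        phase = 2 * math.pi * freq * k / sample_rate
        a += x * math.sin(phase)
        b += x * math.cos(phase)
    a *= 2 / n
    b *= 2 / n
    return a, b, math.hypot(a, b)
```

Scanning `freq` over a grid of candidate frequencies and keeping the resulting amplitudes gives the set of sine waves from which the large-amplitude ones are later selected.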
[0024]
Subsequently, the sine waves with large amplitude values are selected from among the decomposed sine waves. The number of sine waves to select can be fixed in advance, or an amplitude threshold can be set and all sine waves exceeding it selected. The number of sine waves selected here ultimately becomes the number of single notes making up a chord. Once the sine waves are selected, the frequency of each is taken as a fundamental frequency, and the sine waves whose frequencies are integer multiples of that fundamental frequency are grouped with it. Next, the sine waves belonging to the same group are synthesized to create a distorted wave. FIG. 3 shows an example of decomposed sine waves and the distorted waves synthesized from them. On the left side of FIG. 3, six sine waves of different frequencies are shown. Of these, the third and fourth sine waves from the top have frequencies that are integer multiples, two times and three times respectively, of the frequency of the top sine wave. These three sine waves are therefore synthesized as one group, giving the distorted wave shown at the upper right of FIG. 3. Similarly, the fifth and sixth sine waves from the top have frequencies three times and five times that of the second sine wave from the top, so they are synthesized as one group, giving the distorted wave shown at the lower right of FIG. 3. Performing this processing for each group creates as many distorted waves as there are groups. Grouping is not limited to this method of grouping the sine waves whose frequencies are integer multiples of that of a particular sine wave, and other methods may be used; however, grouping sine waves at integer-multiple frequencies in this way reduces the computational load, as shown by (Formula 2) and (Formula 3) above. When grouping by the above method, the sine wave that should serve as the fundamental frequency is sometimes not included in the group, because its amplitude is small and it was dropped in the initial selection. In such a case, the frequency of a suitable sine wave belonging to another group is taken as the fundamental frequency, and the sine waves whose frequencies are integer multiples of it are grouped.
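The grouping and synthesis just described might look like the following sketch, using six components arranged like those of FIG. 3. Several points are assumptions rather than part of the specification: the function names, the tolerance `tol`, the tie-break rule (a component that is a multiple of more than one fundamental goes to the highest such fundamental), the omission of phase information, and the omission of the missing-fundamental fallback. Components that match no fundamental are dropped.

```python
import math


def group_harmonics(components, num_fundamentals, tol=1e-3):
    """Group (freq, amplitude) components under the strongest sine waves.

    Returns {fundamental: [(multiple, freq, amplitude), ...]}.
    """
    strongest = sorted(components, key=lambda c: c[1], reverse=True)[:num_fundamentals]
    fundamentals = sorted((f for f, _ in strongest), reverse=True)
    groups = {f0: [] for f0 in fundamentals}
    for freq, amp in components:
        for f0 in fundamentals:  # highest fundamental first (assumed tie-break)
            ratio = freq / f0
            multiple = round(ratio)
            if multiple >= 1 and abs(ratio - multiple) <= tol:
                groups[f0].append((multiple, freq, amp))
                break
    return groups


def synthesize_group(group, f0, t):
    """Sum the sine waves of one group to obtain its distorted wave at time t."""
    return sum(amp * math.sin(2 * math.pi * multiple * f0 * t)
               for multiple, _, amp in group)
```

For components at 100, 130, 200, 300, 390, and 650 Hz with the two strongest at 100 Hz and 130 Hz, the first group collects multiples 1, 2, 3 and the second collects multiples 1, 3, 5, mirroring the upper-right and lower-right waves of FIG. 3.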
[0025]
Meanwhile, the distortion waveform pattern search unit 5 searches the distortion waveform pattern storage unit 4, realized by a MIDI sound source, for the distortion waveform pattern corresponding to the acoustic signal input from the acoustic signal input unit 1, and this pattern is input as the basic distortion waveform pattern. The search is carried out as follows: the acoustic signal to be analyzed is played back separately as audio, the searcher listens to it and decides which distortion waveform pattern to use, and information identifying that distortion waveform pattern is entered as a search key. For example, if the searcher, on listening to the acoustic signal, feels that it is best expressed by a piano sound, a distortion waveform pattern representing the piano timbre is retrieved.
[0026]
The signal analysis unit 3 takes the distortion waveform pattern retrieved and input by the distortion waveform pattern search unit 5 as the basic distortion waveform pattern, and creates distortion waveform variations by varying the frequency and amplitude of this basic distortion waveform pattern. The number of variations to create can be changed by setting; if, for example, 128 values are set for each of frequency and amplitude, 128 × 128 distortion waveform variations are created.
[0027]
From among these 128 × 128 distortion waveform variations, one variation with a waveform shape close to each distorted wave obtained by synthesizing the grouped sine waves as described above is selected for each distorted wave. As a result, a predetermined number of distortion waveform variations are selected.
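One way to picture the variation search is the sketch below. The specification does not state how closeness of waveform shape is measured, so the squared-error criterion here is an assumption, as are the function names, the representation of a pattern as one stored period of samples, the nearest-sample lookup, and the absence of any phase alignment.

```python
def make_variation(pattern, freq, amp, sample_rate, n):
    """Play one stored period (a list of samples) at freq, scaled by amp, for n samples."""
    m = len(pattern)
    out = []
    for k in range(n):
        pos = (freq * k / sample_rate) % 1.0  # position within the current period
        out.append(amp * pattern[int(pos * m) % m])
    return out


def select_variation(target, pattern, freqs, amps, sample_rate):
    """Return the (freq, amp) whose variation has the smallest squared error to target."""
    best = None
    for f in freqs:
        for a in amps:
            cand = make_variation(pattern, f, a, sample_rate, len(target))
            err = sum((x - y) ** 2 for x, y in zip(target, cand))
            if best is None or err < best[0]:
                best = (err, f, a)
    return best[1], best[2]
```

Passing 128 candidate frequencies and 128 candidate amplitudes reproduces the 128 × 128 search space mentioned above; `select_variation` would be called once per group's distorted wave.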
[0028]
Next, the code forming unit 6 outputs, as a predetermined code set, the frequency and amplitude of each selected distortion waveform variation together with the start time and end time of the analysis section. The encoded frequency $f_n$ and amplitude $\alpha_n$ correspond to those in (Formula 3) above. As for the analysis section, in the case of analysis section d1 shown in FIG. 2, for example, the start time is t1 and the end time is t2.
[0029]
The above processing is performed for all analysis sections of the acoustic signal, and a code set is output for each analysis section. If the end time of the preceding analysis section is close to the start time of the following analysis section, and the frequency and amplitude of the code set in the preceding section are similar to those of the code set in the following section, the code sets of the two sections are merged. Specifically, the end time of the code set in the preceding section is changed to the end time of the code set in the following section, and the code set of the following section is deleted. This is done so that, when the sounds in adjacent analysis sections are similar, they are regarded as the same sound and combined into a single note at encoding time.
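The merging rule can be sketched as follows. The tuple layout `(start, end, freq, amp)` and the three tolerance parameters are illustrative assumptions; the specification does not give concrete thresholds for "close" or "similar". The merged code set keeps the frequency and amplitude of the preceding section.

```python
def merge_code_sets(code_sets, max_gap, freq_tol, amp_tol):
    """Merge code sets of adjacent sections that describe the same sound.

    code_sets is a list of (start, end, freq, amp) tuples.
    """
    merged = []
    for start, end, freq, amp in sorted(code_sets):
        for i, (p_start, p_end, p_freq, p_amp) in enumerate(merged):
            if (abs(start - p_end) <= max_gap
                    and abs(freq - p_freq) <= freq_tol
                    and abs(amp - p_amp) <= amp_tol):
                # Extend the earlier code set and discard the later one.
                merged[i] = (p_start, end, p_freq, p_amp)
                break
        else:
            merged.append((start, end, freq, amp))
    return merged
```

For example, two consecutive 440 Hz code sets with nearly equal amplitude collapse into one longer code set, while a following 660 Hz code set stays separate.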
[0030]
The format of the encoded data need not be particularly limited, but use of the MIDI standard is preferable. When the MIDI standard is used, the frequency information of the code set is selected from the 128 frequencies corresponding to the note numbers defined in the MIDI standard, the amplitude information is described by the 128-level velocity defined in the MIDI standard, the start time and end time are described as delta times of the SMF standard, and each code set is described as a pair consisting of a note-on event and a note-off event.
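A rough sketch of this mapping is given below. The note-number formula (A4 = 440 Hz = note 69, equal temperament) and the note-on/note-off status bytes 0x90 and 0x80 are standard MIDI facts; the linear amplitude-to-velocity mapping, the default of 480 ticks per quarter note at 500000 microseconds per quarter, and all function names are assumptions made for illustration. Variable-length encoding of delta times in an SMF file is not shown.

```python
import math


def freq_to_note(freq_hz):
    """Nearest MIDI note number (0-127) for a frequency, with A4 = 440 Hz = 69."""
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    return max(0, min(127, note))


def amp_to_velocity(amp, full_scale=1.0):
    """Assumed linear mapping of amplitude to velocity 1-127 (0 would mean note-off)."""
    return max(1, min(127, round(127 * amp / full_scale)))


def note_events(note, velocity, channel=0):
    """Return the note-on and note-off channel messages for one code set."""
    note_on = bytes([0x90 | channel, note, velocity])
    note_off = bytes([0x80 | channel, note, 0])
    return note_on, note_off


def seconds_to_ticks(seconds, ticks_per_quarter=480, tempo_us_per_quarter=500000):
    """Convert a duration to SMF ticks under an assumed resolution and tempo."""
    return round(seconds * 1_000_000 / tempo_us_per_quarter * ticks_per_quarter)
```

A code set's start and end times would be converted with `seconds_to_ticks`, and the differences between successive event times written as the delta times preceding the note-on and note-off messages.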
[0031]
The acoustic encoded data obtained in this way is decoded by an acoustic data decoding system, via storage means or transmission means, and output as an acoustic signal. FIG. 4 shows a functional block diagram of the acoustic data decoding system. In FIG. 4, the distortion waveform pattern storage unit 4 has the same function as the one shown in FIG. 1 and is therefore denoted by the same reference numeral as in FIG. 1. The encoded data input unit 7 is for inputting acoustic encoded data. The distortion waveform variation reproduction unit 8 has a function of reproducing distortion waveform variations by modifying, based on the input acoustic encoded data, the basic distortion waveform pattern obtained from the distortion waveform pattern search unit 9. The distortion waveform pattern search unit 9 has a function of retrieving a distortion waveform pattern from the distortion waveform pattern storage unit 4 in accordance with instructions from the distortion waveform variation reproduction unit 8. The signal synthesis unit 10 has a function of reproducing the acoustic signal by synthesizing the distortion waveform variations reproduced for each code set.
[0032]
Next, the processing operation of the acoustic data decoding system shown in FIG. 4 will be described. When the acoustic encoded data obtained by the acoustic signal encoding system is input from the encoded data input unit 7, the distortion waveform variation reproduction unit 8 extracts the code data from the acoustic encoded data one code set at a time and performs distortion waveform variation reproduction processing. In this processing, information identifying the basic distortion waveform pattern is first extracted from the code set. Using this identification information, the distortion waveform pattern search unit 9 then retrieves the corresponding distortion waveform pattern from the distortion waveform pattern storage unit 4. The distortion waveform variation reproduction unit 8 takes the retrieved pattern as the basic distortion waveform pattern and deforms it so that it has the frequency and amplitude of the code set, thereby creating a distortion waveform variation. One distortion waveform variation is created per code set, that is, as many as there are single notes making up a chord.
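One possible way to deform a basic distortion waveform pattern to a given frequency and amplitude is sketched below. It assumes that the pattern is stored as one period of samples and that the variation is produced by reading this period cyclically with linear interpolation. The text does not specify the storage format or the deformation method, so the function `make_variation` and its parameters are hypothetical.

```python
import math


def make_variation(basic_pattern, frequency, amplitude, duration, sample_rate=44100):
    """Create a distortion waveform variation from a single-period pattern.

    basic_pattern: list of samples covering exactly one period (assumption).
    Returns a list of samples at the requested frequency and peak amplitude.
    """
    n = len(basic_pattern)
    # Normalize so that the output peak equals the requested amplitude.
    peak = max(abs(s) for s in basic_pattern) or 1.0
    out = []
    for i in range(int(round(duration * sample_rate))):
        phase = (i * frequency / sample_rate) % 1.0  # position within one period
        pos = phase * n
        k = int(pos) % n                             # guard against rounding up to n
        frac = pos - int(pos)
        a = basic_pattern[k]
        b = basic_pattern[(k + 1) % n]
        out.append(amplitude * (a + frac * (b - a)) / peak)
    return out


# Example pattern: one period of a sine wave sampled at 64 points.
sine_pattern = [math.sin(2 * math.pi * k / 64) for k in range(64)]
variation = make_variation(sine_pattern, frequency=441.0, amplitude=0.8, duration=0.01)
```

The duration passed in would be the difference between the end time and the start time of the code set.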
[0033]
The processing by the distortion waveform variation reproduction unit 8 is performed on all the code sets, and once all the distortion waveform variations have been reproduced, the signal synthesis unit 10 synthesizes them. This synthesis is performed on the time axis on the basis of the start time and end time of each code set. Because synthesizing the distortion waves corresponding to code sets from their start and end times is the conventional method used when reproducing MIDI code data with a MIDI sound source, a detailed description is omitted here. The acoustic signal reproduced through synthesis by the signal synthesis unit 10 can be output in, for example, PCM format.
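The synthesis on the time axis can be pictured as adding each distortion waveform variation into a common sample buffer at the offset given by its start time. The sketch below assumes that each variation is already a list of samples whose length corresponds to the end time minus the start time. The function names and the peak normalization before conversion to 16-bit PCM are illustrative choices, not part of the described embodiment.

```python
def synthesize(variations, sample_rate=44100):
    """Mix variations on a common time axis.

    variations: list of (start_time_in_seconds, samples) pairs.
    Returns the mixed signal as a list of floats.
    """
    total = max((round(start * sample_rate) + len(samples)
                 for start, samples in variations), default=0)
    mix = [0.0] * total
    for start, samples in variations:
        offset = round(start * sample_rate)
        for i, sample in enumerate(samples):
            mix[offset + i] += sample  # overlapping notes simply add up
    return mix


def to_pcm16(mix):
    """Convert to signed 16-bit PCM values, scaling down only if the mix clips."""
    peak = max((abs(x) for x in mix), default=0.0)
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return [int(round(x * scale * 32767)) for x in mix]
```

Simultaneous code sets, such as the notes of a chord, overlap in the buffer and are summed sample by sample.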
[0034]
The preferred embodiments of the present invention have been described above. However, the present invention is not limited to these embodiments, and various modifications are possible. For example, the above embodiment describes the case where the acoustic encoded data consists of a single track, but the acoustic encoded data may consist of a plurality of tracks. Since one track can correspond to one instrument part, preparing a plurality of tracks makes it possible to reproduce music played by a plurality of instruments. To create acoustic encoded data having a plurality of tracks, it suffices to retrieve a plurality of distortion waveform patterns from the distortion waveform pattern storage unit 4 and input them as basic distortion waveform patterns. The signal analysis unit 3 then recognizes the plurality of basic distortion waveform patterns and creates a track for each of them. The code sets assigned to each track are created for the corresponding basic distortion waveform pattern. In the MIDI standard, a track corresponds to a channel, and the information identifying a basic distortion waveform pattern corresponds to the program number and bank number that indicate a musical instrument tone color.
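The correspondence between basic distortion waveform patterns, tracks, and MIDI channels can be illustrated as follows. This is a sketch under stated assumptions: the dictionary layout of a code set, the key `pattern_id`, the lookup table `pattern_table`, and the function `assign_tracks` are all hypothetical. The limit of 16 reflects the 16 channels available in the MIDI standard.

```python
def assign_tracks(code_sets, pattern_table):
    """Separate code sets into tracks, one per basic distortion waveform pattern.

    code_sets: list of dicts, each with a 'pattern_id' key (assumed layout).
    pattern_table: maps pattern_id to a (bank_number, program_number) pair.
    Returns a list of track dicts in order of first appearance.
    """
    tracks = {}
    for code_set in code_sets:
        pattern_id = code_set["pattern_id"]
        if pattern_id not in tracks:
            if len(tracks) >= 16:
                raise ValueError("MIDI provides only 16 channels")
            bank, program = pattern_table[pattern_id]
            tracks[pattern_id] = {
                "channel": len(tracks),  # one channel per track
                "bank": bank,            # identification information of the pattern
                "program": program,
                "code_sets": [],
            }
        tracks[pattern_id]["code_sets"].append(code_set)
    return list(tracks.values())
```

Each returned track carries the bank and program numbers once, so the identification information is attached per track rather than per code set.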
[0035]
In the above embodiment, a distortion waveform pattern prepared in advance in the distortion waveform pattern storage unit 4 is retrieved and input as the basic distortion waveform pattern. However, a distortion waveform pattern that the user considers best suited to the input sound may also be added to the distortion waveform pattern storage unit 4 as needed. In that case, if an area of the bank numbers that the MIDI standard leaves undefined is assigned to the basic distortion waveform patterns added by the user, encoding and decoding can be performed without departing from the MIDI standard. Such waveform addition can be realized using a MIDI sound source that has a sampler function.
[0036]
[Effect of the invention]
As described above, the present invention comprises: distortion waveform pattern storage means for storing a plurality of distortion waveform patterns; section definition means for defining a plurality of predetermined analysis sections for a given acoustic signal; means for inputting, from the distortion waveform pattern storage means, a distortion waveform pattern corresponding to the acoustic signal as a basic distortion waveform pattern; signal analysis means that performs frequency analysis on the signal section waveform of one analysis section to decompose it into a set of sine waves, divides the decomposed sine waves into a plurality of groups according to a predetermined rule, synthesizes the sine waves belonging to each group to obtain a distortion wave of each group, creates a plurality of distortion waveform variations in which the frequency and amplitude of the basic distortion waveform pattern are changed, and selects, as the distortion waveform variation of each group, the variation whose waveform shape is close to the distortion wave of that group; and code forming means that encodes the frequency and amplitude of each distortion waveform variation selected by the signal analysis means, together with the start time and end time of the analysis section, as a predetermined code set. Consequently, the acoustic signal can be encoded component by component, with each component close to an instrument part prepared in advance as a sound source.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing a configuration of an acoustic signal encoding system according to the present invention.
FIG. 2 is a diagram for explaining the definition of analysis sections by the section definition unit 2.
FIG. 3 is a diagram showing a decomposed sine wave and a distorted wave obtained by synthesizing the sine wave.
FIG. 4 is a functional block diagram showing a configuration of an acoustic data decoding system.
[Explanation of symbols]
1 ... Acoustic signal input unit
2 ... Section definition unit
3 ... Signal analysis unit
4 ... Distortion waveform pattern storage unit
5 ... Distortion waveform pattern search unit
6 ... Code forming unit
7 ... Encoded data input unit
8 ... Distortion waveform variation reproduction unit
9 ... Distortion waveform pattern search unit
10 ... Signal synthesis unit

Claims

An acoustic signal encoding system comprising:
distortion waveform pattern storage means for storing a plurality of distortion waveform patterns;
section definition means for defining a plurality of predetermined analysis sections for a given acoustic signal;
means for inputting, from the distortion waveform pattern storage means, a distortion waveform pattern corresponding to the acoustic signal as a basic distortion waveform pattern;
signal analysis means for performing frequency analysis on the signal section waveform of one analysis section to decompose it into a set of sine waves, dividing the decomposed sine waves into a plurality of groups according to a predetermined rule, synthesizing the sine waves belonging to each group to obtain a distortion wave of each group, creating a plurality of distortion waveform variations in which the frequency and amplitude of the basic distortion waveform pattern are changed, and selecting, as the distortion waveform variation of each group, the variation whose waveform shape is close to the distortion wave of that group; and
code forming means for generating acoustic encoded data by encoding the frequency and amplitude of each distortion waveform variation selected by the signal analysis means, together with information on the start time and end time of the analysis section, as a predetermined code set.

The acoustic signal encoding system according to claim 1, wherein a plurality of the basic distortion waveform patterns are input, the acoustic encoded data is composed of a plurality of tracks, each code set is separated into one of the tracks according to the basic distortion waveform pattern it uses, and identification information of the basic distortion waveform pattern selected for each track is added to that track.

The acoustic signal encoding system according to claim 1 or 2, wherein, when the end time of a preceding code set is close to the start time of a following code set and the frequency and amplitude of the preceding code set are similar to those of the following code set, the code forming means changes the end time of the preceding code set to the end time of the following code set and deletes the following code set.

The acoustic signal encoding system according to any one of claims 1 to 3, wherein the grouping in the signal analysis means is performed such that the frequency of each sine wave belonging to a group is an integer multiple of the frequency of one or more specific sine waves in that group.

The acoustic signal encoding system according to claim 4, wherein the grouping in the signal analysis means is performed such that the frequency of each sine wave belonging to a group is an integer multiple of the frequency of one or more specific sine waves belonging to another group.

The acoustic signal encoding system according to any one of claims 1 to 5, wherein the frequency information of the code set is selected from the 128 frequencies corresponding to the note numbers defined in the MIDI standard, the amplitude information is described as the 128-level velocity defined in the MIDI standard, the start time and end time are described as SMF delta times, the code set is described as a pair of a MIDI note-on event and a MIDI note-off event, and the distortion waveform pattern is realized by a MIDI sound source.

The acoustic signal encoding system according to claim 2, wherein the frequency information of the code set is selected from the 128 frequencies corresponding to the note numbers defined in the MIDI standard, the amplitude information is described as the 128-level velocity defined in the MIDI standard, the start time and end time are described as SMF delta times, the code set is described as a pair of a MIDI note-on event and a MIDI note-off event, the distortion waveform pattern is realized by a MIDI sound source, the track is a MIDI channel, and the identification information of the basic distortion waveform pattern is described by the program number and bank number that indicate a musical instrument tone color in the MIDI standard.

An acoustic data decoding system for reproducing an acoustic signal from acoustic encoded data obtained by encoding the given acoustic signal in advance by a predetermined method, the system comprising:
distortion waveform pattern storage means for storing a plurality of distortion waveform patterns;
encoded data input means for inputting the acoustic encoded data;
distortion waveform pattern search means for retrieving, from the distortion waveform pattern storage means, the basic distortion waveform pattern corresponding to the input acoustic encoded data on the basis of the identification information of the code sets of the acoustic encoded data;
distortion waveform variation reproduction means for creating a distortion waveform variation by changing the frequency and amplitude of the retrieved basic distortion waveform pattern on the basis of the frequency and amplitude information of each code set of the acoustic encoded data; and
signal synthesis means for reproducing the acoustic signal by synthesizing, on the time axis, the plurality of distortion waveform variations created for the respective code sets on the basis of the start time and end time information of each code set.