JP4344438B2

JP4344438B2 - Audio signal waveform processing device

Info

Publication number: JP4344438B2
Application number: JP30061499A
Authority: JP
Inventors: 智日下部
Original assignee: Roland Corp
Current assignee: Roland Corp
Priority date: 1999-10-22
Filing date: 1999-10-22
Publication date: 2009-10-14
Anticipated expiration: 2019-10-22
Also published as: JP2001117595A

Abstract

PROBLEM TO BE SOLVED: To prevent, in an audio signal waveform processor that performs time stretching and pitch shifting by the phase vocoder method, the noise caused by the pre-echo that band division generates. SOLUTION: The audio signal waveform processor holds, as waveform data, the amplitude and phase of each band component obtained by dividing an audio signal into bands, and restores the audio signal waveform by reproducing and combining the band components of the respective bands while applying time compression or expansion according to the waveform data. At the beginning of a pre-echo section generated in the audio signal waveform by the band division, it performs a phase reset so that each restored band component is in phase with the corresponding original band component indicated by the waveform data.

Description

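[0001]
[Technical Field of the Invention]
The present invention relates to an audio signal waveform processing device that performs time stretching (time compression/expansion) and pitch shifting (pitch conversion) by the phase vocoder method.
[0002]
In the phase vocoder method, the analysis stage divides the audio signal waveform of the original sound into a plurality of frequency bands using band-pass filters, analyzes the band component of each band, and extracts and stores its output amplitude and phase as feature parameters. The synthesis stage reproduces the original band component of each band from its output amplitude and phase, then adds the band components of all bands together to restore the original audio signal waveform.
[0003]
FIG. 10 illustrates the concept of an audio signal waveform processing device based on this phase vocoder method. As shown, the audio signal waveform X(n) is input to a plurality of analysis units 60. In this example, one analysis unit 60 is provided for each of the 100 bands into which the frequency range of the audio signal waveform is divided. As shown in FIG. 11, the analysis units 60 cover bands 0 to 99, each centered on an approximate fundamental frequency of the audio signal waveform, and each unit is configured as shown in FIG. 13. That is, the analysis unit for band k, for example, multiplies the input audio signal waveform X(n) by the complex frequency sin(ωk), cos(ωk) at the band center (synchronous detection) to obtain the output amplitude value of band k, and differentiates the phase value of the detection output to obtain instantaneous-frequency information. The instantaneous frequency is the amount of phase change per unit time (the derivative) at each point in time (each position on the time axis of the waveform), and indicates the frequency deviation from the center frequency.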
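[0004]
In the waveform processing device of FIG. 10, the waveform data (output amplitude and instantaneous frequency) of each band of the audio signal waveform X(n) is stored in the waveform memory 61. As shown in FIG. 12, amplitude data A and instantaneous frequency data f are stored for each of bands 0 to 99 at each address addr(0) to addr(n) on the time axis of the audio signal waveform x(n).
[0005]
The synthesis stage consists of band component reproduction units 62, one per band, and each band component reproduction unit 62 consists of a time-frequency conversion processing unit 31, a cosine oscillator 32, and a multiplier 33. The time-frequency conversion processing unit 31 is configured as shown in FIG. 14. For the input output-amplitude values, an interpolation unit skips or adds (interpolates) sample points according to the time-stretch ratio (time compression/expansion ratio), and the unit outputs amplitude values whose amplitude envelope (the envelope showing how the amplitude value changes over time) has been compressed or expanded. For the input instantaneous-frequency values, the center angular frequency ωk is added to the instantaneous-frequency value; when pitch conversion is performed, this value is also multiplied by a frequency conversion ratio (a ratio corresponding to the degree of pitch shift); and the interpolation unit skips or adds sample points according to the time-stretch ratio, so that the unit outputs instantaneous-frequency values whose frequency envelope (the envelope showing how the assumed frequency value changes over time) has been compressed or expanded.
[0006]
FIG. 15 shows this interpolation of the amplitude values and instantaneous frequencies. For time expansion, as shown in FIG. 15(a), the original amplitude envelope and frequency envelope are both stretched to produce amplitude values and instantaneous frequencies on an expanded time axis. For time compression, as shown in FIG. 15(b), the original amplitude envelope and frequency envelope are both shrunk to produce amplitude values and instantaneous frequencies on a compressed time axis. This interpolation allows the time axis of the original audio signal waveform to be compressed or expanded arbitrarily.
[0007]
The instantaneous-frequency values processed by the time-frequency conversion processing unit 31 (time-stretched as appropriate) are supplied to the cosine oscillator, which generates a cosine wave at the frequency of that band; the amplitude envelope processed by the time-frequency conversion processing unit 31 is then applied to that cosine wave and the result is output. The component signal of the band is thereby reproduced. Adding together the band components of bands 0 to 99 then restores the original audio signal waveform.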
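[0008]
[Problem to Be Solved by the Invention]
When the audio signal waveform of a musical tone is reproduced by the above phase vocoder method, the waveform of a piano or similar instrument changes greatly in level over time at its attack portion. When such an audio signal waveform is divided into bands using digital filters, a trace of the attack, the so-called "pre-echo", appears in each individual band ahead of the attack.
[0009]
FIG. 9 illustrates this pre-echo phenomenon. For simplicity, FIG. 9 shows the case in which the original audio signal waveform is divided into two bands, a high-frequency band and a low-frequency band. A typical attack waveform (a) is separated into a low-frequency component waveform (b) and a high-frequency component waveform (c), showing how pre-echo arises. In both the low-frequency component waveform (b) and the high-frequency component waveform (c), pre-echo appears in the silent portion that precedes the original attack waveform (a). The pre-echo of waveform (b) and the pre-echo of waveform (c) are in opposite phase to each other, so when waveforms (b) and (c) are finally combined with their phases unchanged, the two pre-echoes cancel and the silent portion is restored. The same holds when the number of bands is large: if the component waveforms of all bands are added together after reproduction, the pre-echoes cancel one another and the portion becomes silent.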
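The cancellation described for FIG. 9 can be reproduced numerically. The following is a minimal sketch, not the filter bank of the patent: it assumes a two-band split made from a hypothetical linear-phase windowed-sinc low-pass filter and its complementary high-pass filter, and it models a phase change in one band simply by inverting that band. All names (`lowpass_taps`, `convolve`, `DELAY`, `ATTACK`) are illustrative.

```python
import math

def lowpass_taps(delay, cutoff):
    """Linear-phase windowed-sinc low-pass with 2*delay+1 taps.

    cutoff is in cycles per sample; the filter delays its input by `delay` samples.
    """
    taps = []
    for n in range(2 * delay + 1):
        t = n - delay
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (2 * delay + 2))
        taps.append(ideal * hann)
    return taps

def convolve(x, h):
    """Causal FIR filtering: y[n] = sum over m of h[m] * x[n - m]."""
    return [sum(hm * x[n - m] for m, hm in enumerate(h) if n - m >= 0)
            for n in range(len(x))]

DELAY, ATTACK = 16, 40
low = lowpass_taps(DELAY, 0.1)
high = [-t for t in low]           # complementary high-pass: delta(n - DELAY) - low
high[DELAY] += 1.0

# Silence followed by an abrupt attack at sample ATTACK.
x = [0.0] * ATTACK + [math.cos(0.9 * n) for n in range(160)]
band_low = convolve(x, low)
band_high = convolve(x, high)

# In the filtered bands the attack arrives at ATTACK + DELAY; what precedes it is pre-echo.
pre_echo = range(ATTACK, ATTACK + DELAY)
peak_in_band = max(abs(band_low[n]) for n in pre_echo)
peak_in_sum = max(abs(band_low[n] + band_high[n]) for n in pre_echo)
# Inverting one band stands in for a phase change introduced by time stretching.
peak_mismatched = max(abs(band_low[n] - band_high[n]) for n in pre_echo)
```

Under these assumptions `peak_in_band` is clearly nonzero, `peak_in_sum` vanishes because the two bands are in opposite phase before the attack, and `peak_mismatched` shows the residue that remains once the phase relationship is disturbed.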
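[0010]
Such pre-echo becomes more pronounced as the number of taps of the digital filter (FIR filter) used increases. Its extent is bounded by the number of taps of the filter itself, and in general the size of the pre-echo can be estimated from the delay of the filter.
[0011]
When time stretching or pitch shifting is performed with the phase vocoder method, however, the waveform of each band generally does not return to the phase of the original waveform; its phase changes. Consequently, even if the band components of the bands are reproduced and then added together, the pre-echo portions of the band waveforms have also changed in phase, so they no longer cancel one another, and a pre-echo preceding the attack remains in the synthesized signal.
[0012]
For this reason, waveform reproduction using a phase vocoder normally produces noise due to pre-echo immediately before the attack, so the attack sound is not crisp and the so-called sense of attack is severely impaired.
[0013]
The present invention has been made in view of this problem, and its object is to prevent the noise caused by the pre-echo generated by band division while still using the phase vocoder method.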
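[0014]
[Means for Solving the Problem and Operation]
The audio signal waveform processing device to which the present invention is applied holds, as waveform data, the amplitude and phase of each band component obtained by dividing an audio signal into a plurality of bands, and restores the audio signal waveform by reproducing the band component of each band from this waveform data, with time compression/expansion as needed, and combining the results.
To solve the above problem in this device, the present invention is configured so that, for a pre-echo section generated in the audio signal waveform by band division, a phase reset is performed at the beginning of the pre-echo section so that the phase of each band component being restored becomes the phase of the corresponding original band component indicated by the waveform data.
It is desirable not to perform time compression/expansion of the restored waveform in this pre-echo section.
With this arrangement, the phase reset is performed at the point where the pre-echo begins, and the phase of the reproduced waveform of each band at that point is set to be the same as that of the original sound, so that when these band components are combined the pre-echo is canceled in the restored audio signal waveform.
[0015]
It is even more desirable to configure the above phase reset so that the phase of the waveform reproduced so far is replaced, by means of a crossfade, with the phase value to be substituted, which is based on the waveform data.
With this arrangement, the phase of the reproduced waveform changes continuously during the phase reset, which prevents the noise that would result from a discontinuous phase change caused by abruptly changing the phase value.
[0016]
When waveform reproduction involves pitch conversion, it is desirable to match the degree of phase change due to the pitch conversion with the degree of advance of the read address used to read out the waveform data.
As a result, the phase of the pitch-converted waveform becomes the same as that of the original waveform, the original phase is reproduced in the reproduced waveform of each band component, and the pre-echo can be canceled by combining the band components.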
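[0017]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 shows an audio signal waveform processing device as one embodiment of the present invention. In this embodiment the invention is applied to a keyboard-type electronic musical instrument. In the figure, 2 is a CPU (central processing unit) that controls the entire device, 3 is a ROM (read-only memory) that stores various table data, the control program for the CPU, and the like, and 4 is a RAM (random-access memory) that stores the waveform data of a plurality of audio signal waveforms and is also used as working memory for the CPU. 5 is an operator group consisting of various controls, including selection buttons for selecting the waveform to be reproduced from among various waveforms and controls for setting the flexibility described later and for instructing the start and end of waveform reproduction. 6 is a keyboard device, which has on its panel a lever 61 for adjusting the playback speed. 1 is a waveform synthesis unit consisting of a waveform memory 10 and band component reproduction units 11-0 to 11-99 provided for the respective bands; it restores and outputs the original audio signal waveform based on the waveform data selected for reproduction.
[0018]
The waveform memory 10 of the waveform synthesis unit 1 has an area for storing the waveform data of an audio signal waveform and an area for storing reset address information.
[0019]
The waveform data of an audio signal waveform consists of output amplitude values and phase values. The output amplitude value is the same as described for the prior art above. The phase value replaces the instantaneous frequency value of the prior art and is the native feature parameter of the phase vocoder. In the device of this embodiment, the waveform data (amplitude values and phase values) of several kinds of audio signal waveforms is obtained in advance by analysis and stored in the RAM 4. The waveform data of the audio signal waveform selected for reproduction with the waveform selection button is then stored in the waveform data area of the waveform memory 10.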
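[0020]
FIG. 2 shows the conceptual configuration of the analysis unit that analyzes an audio signal waveform and extracts the waveform data, that is, the output amplitude value and phase value of each band's waveform. As shown, the audio signal waveform X(n) is multiplied by the complex frequency components cos(ωk, n) and sin(ωk, n) at the band center, each product is passed through an analysis filter with impulse response W(n), and the two outputs are squared, summed, and square-rooted to obtain the amplitude value. The analysis filter outputs Xcos and Xsin are also passed through a calculation unit to obtain the phase value. Depending on the analysis filter outputs Xcos and Xsin, the calculation unit computes
Arctan(Xsin/Xcos) when Xcos > 0
Arctan(Xsin/Xcos) + π when Xcos < 0
and outputs the result as the phase value.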
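The analysis of one band can be sketched as follows. This is an illustration under stated assumptions rather than the analysis unit of FIG. 2: the analysis filter W(n) is replaced by a hypothetical 64-tap moving average, and `math.atan2` is used in place of the two-case Arctan rule, with which it agrees up to a multiple of 2π. The function name `analyze_band` and the test signal are assumptions.

```python
import math

def analyze_band(x, omega_k, w):
    """Return per-sample (amplitude, phase) lists for one band by synchronous detection."""
    half = len(w) // 2
    amplitude, phase = [], []
    for n in range(len(x)):
        x_cos = x_sin = 0.0
        for m, wm in enumerate(w):
            j = n + m - half           # analysis filter centred on sample n
            if 0 <= j < len(x):
                x_cos += wm * x[j] * math.cos(omega_k * j)
                x_sin += wm * x[j] * math.sin(omega_k * j)
        # Amplitude: square both filter outputs, add, take the square root.
        amplitude.append(math.hypot(x_cos, x_sin))
        # atan2 equals Arctan(Xsin/Xcos) for Xcos > 0 and matches
        # Arctan(Xsin/Xcos) + pi up to a multiple of 2*pi for Xcos < 0.
        phase.append(math.atan2(x_sin, x_cos))
    return amplitude, phase

omega = math.pi / 4                    # band-centre angular frequency (assumed)
signal = [math.cos(omega * n + 0.5) for n in range(200)]
window = [1.0 / 64] * 64               # assumed stand-in for W(n)
amp, ph = analyze_band(signal, omega, window)
```

With this sign convention a unit-amplitude cosine at the band center yields an amplitude of 0.5 and a phase equal to the negative of the signal's phase offset; a practical implementation would fix the scaling and sign to suit its synthesis stage.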
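[0021]
The reset address information area of the waveform memory 10 stores the reset address information (phase reset information) of the selected audio signal waveform. Each reset address is set to a position that precedes the address marking an attack of the audio signal waveform by the extent of the pre-echo. One reset address is set for each attack waveform occurring in time sequence in the audio signal waveform; they are numbered consecutively as RstAdr(0), RstAdr(1), RstAdr(2), ... and arranged in order of magnitude. How far the pre-echo precedes the attack can be estimated from the delay of the band-dividing filter.
[0022]
In the device of this embodiment, the reset address information for several kinds of audio signal waveforms is obtained in advance by analysis and stored in the RAM 4. The reset address information of the audio signal waveform selected for reproduction with the waveform selection button is then stored in the reset address information area of the waveform memory 10.
[0023]
The band component reproduction units 11-0 to 11-99 receive the amplitude values, phase values, and reset addresses from the waveform memory 10, and the time position information (read address DataAdr) and pitch information from the CPU 2. Based on these data, each unit reproduces and outputs the band component of its band. The band components output from the band component reproduction units 11-0 to 11-99 are added together and output as the restored audio signal waveform.
[0024]
FIG. 3 shows a configuration example of the band component reproduction unit 11. As shown, the phase value read from the waveform memory 10 is split into two branches: one is input directly to the cosine oscillator 32 and the other to the differentiator 30. The phase value input to the differentiator 30 is differentiated into an instantaneous frequency and input to the time-frequency conversion processing unit 31. The amplitude value read from the waveform memory 10 is also input to the time-frequency conversion processing unit 31. This unit has the same configuration as described in the prior-art section and, as there, sends the amplitude value to the multiplier 33 and the instantaneous-frequency value to the cosine oscillator 32.
[0025]
The reset address information read from the waveform memory 10 is input to the phase reset signal generator 34, which also receives the read address DataAdr (time position information) used to read waveform data from the waveform memory 10. The phase reset signal generator 34 compares the reset address RstAdr read from the waveform memory 10 with the read address DataAdr and, at the point where DataAdr exceeds RstAdr, generates a phase reset signal and sends it to the cosine oscillator 32.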
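[0026]
FIG. 4 is a flowchart of the procedure by which the phase reset signal generator 34 sends out the phase reset signal. As shown, when the read address DataAdr is updated as the waveform is reproduced (step S1), it is determined whether the read address has exceeded the reset address RstAdr(k), that is, whether
RstAdr(k) < DataAdr
holds (step S2). If it has not, processing ends and waits for the next update of the read address. If it has, the parameter k indicating the position in the sequence of reset addresses is updated (k = k + 1) and a phase reset signal is generated and sent (step S3).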
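Steps S1 to S3 amount to a small state machine. The sketch below is one possible rendering; the class name `PhaseResetSignalGenerator`, the bounds check on k, and the example addresses are assumptions added for illustration and are not taken from the flowchart.

```python
class PhaseResetSignalGenerator:
    """Compares each updated read address with the next pending reset address."""

    def __init__(self, reset_addresses):
        # The reset addresses are kept in ascending order, as in the description.
        self.reset_addresses = sorted(reset_addresses)
        self.k = 0                     # index of the next reset address to test

    def update(self, data_adr):
        """Call on every read-address update (S1); True means a reset signal is sent."""
        pending = self.k < len(self.reset_addresses)   # bounds check: an added assumption
        if pending and self.reset_addresses[self.k] < data_adr:   # S2: RstAdr(k) < DataAdr
            self.k += 1                                            # S3: k = k + 1
            return True
        return False

generator = PhaseResetSignalGenerator([10, 30])
fired_at = [adr for adr in range(0, 40, 4) if generator.update(adr)]
```

Because the test is a strict inequality on a stepped address, the signal fires at the first read address past each reset address, which also works when the step amount is fractional and the addresses never coincide exactly.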
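[0027]
FIG. 5 shows a configuration example of the cosine oscillator 32. The phase value data from the waveform memory 10 is input to the adder 42, where the rotation (angular frequency) ωk corresponding to the center frequency of the block is added, and the result is input to the summer 40. The summer 40 also receives the instantaneous-frequency value from the time-frequency conversion processing unit 31; it accumulates this value to convert it into an instantaneous phase value and supplies the instantaneous phase value to the cosine calculator 41, which generates and outputs a cosine wave having the frequency regarded as that of the block's component signal. The summer also receives the phase reset signal; when the phase reset signal arrives, the summer replaces the phase it has accumulated from the instantaneous frequency with the absolute phase from the adder 42.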
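The oscillator of FIG. 5 can be modeled as an accumulator with an override. The following is a minimal sketch under assumptions: the class and parameter names are hypothetical, the absolute phase is taken as the stored phase plus ωk·n (the form given in paragraph [0046]), and the instantaneous frequency passed in is assumed to already include the center angular frequency.

```python
import math

class CosineOscillator:
    """Summer (phase accumulator) followed by a cosine calculator."""

    def __init__(self, omega_k):
        self.omega_k = omega_k
        self.phase = 0.0               # value held by the summer

    def step(self, inst_freq, stored_phase, n, reset=False):
        """Produce one output sample for read position n."""
        if reset:
            # Adder: stored phase plus the centre-frequency rotation gives the absolute phase.
            self.phase = stored_phase + self.omega_k * n
        out = math.cos(self.phase)     # cosine calculator
        self.phase += inst_freq        # summation of the instantaneous frequency
        return out

osc = CosineOscillator(omega_k=0.3)
drifted = [osc.step(inst_freq=0.45, stored_phase=0.0, n=n) for n in range(10)]
after_reset = osc.step(inst_freq=0.3, stored_phase=0.2, n=10, reset=True)
```

After the ten drifting samples the accumulated phase no longer matches the stored data, and the reset brings the output back to cos(0.2 + 0.3 × 10) regardless of the drift.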
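[0028]
Because this embodiment uses a time compression/expansion ratio and a flexibility to perform time stretching (time compression/expansion) of the audio signal waveform, these are explained next. The compression/expansion ratio expresses the amount of time compression/expansion applied to the audio signal waveform as the ratio of the time lengths before and after compression/expansion. The flexibility expresses the degree to which the compression/expansion ratio is modified in each section of the audio signal waveform.
[0029]
[Flexibility]
For example, as shown in FIG. 6, the musical tone waveform is divided in time into a plurality of sections (the total number of sections being m), and a flexibility E(i) is set for each section. It is desirable to set these sections so that they coincide with, for example, the pre-echo section, attack section, decay section, sustain section, release section, and silent section.
[0030]
When the flexibility E(i) is "0", the playback speed of that section is kept at that of the original waveform regardless of the setting of the lever 61 or the value of the compression/expansion ratio. When the flexibility is "1", the playback speed is the one set with the lever 61. When the flexibility is less than "1", the playback speed is lower than the one set with the lever 61, and when the flexibility is greater than "1", the playback speed is higher than the one set with the lever 61.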
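[0031]
The flexibility of each section is set so that the sum, over all sections, of the product of the section length and its flexibility equals the total length of all sections. Expressed as a formula:
Σ E(i) × L(i) = Σ L(i) ... (1)
where Σ is the sum from i = 0 to (m − 1),
L(i) is the length of each section,
E(i) is the flexibility of each section,
m is the total number of sections, and
i is a section counter indicating the section being reproduced among sections 0 to m − 1.
Hence, even when the compression/expansion ratio of each section is modified by the flexibility, the total compressed or expanded duration of the waveform data equals the duration of the original waveform data multiplied by the compression/expansion ratio, that is, the duration obtained when the whole of the waveform data is simply compressed or expanded by that ratio.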
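Equation (1) is a normalization constraint on the per-section flexibilities. As a hedged illustration, the helper below rescales a set of raw flexibility weights so that the constraint holds while sections set to 0 stay at 0. The function name, the uniform rescaling strategy, and the section lengths are assumptions, not something the description prescribes.

```python
import math

def rescale_flexibility(raw_e, lengths):
    """Scale raw flexibilities so that sum(E(i) * L(i)) equals sum(L(i))."""
    total = sum(lengths)
    weighted = sum(e * length for e, length in zip(raw_e, lengths))
    if weighted <= 0:
        raise ValueError("at least one section needs a positive flexibility")
    scale = total / weighted
    return [e * scale for e in raw_e]

# Hypothetical sections: pre-echo (kept rigid), attack, sustain.
lengths = [100, 200, 700]
flexibility = rescale_flexibility([0.0, 1.0, 1.0], lengths)
lhs = sum(e * length for e, length in zip(flexibility, lengths))
constraint_holds = math.isclose(lhs, sum(lengths))
```

Here the rigid pre-echo section keeps E = 0 and the other two sections receive E = 10/9, so they absorb the stretch that the rigid section does not take.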
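[0032]
As described later, the present invention makes its processing possible by setting the flexibility E(i) of the pre-echo section to "0".
[0033]
[Compression/Expansion Ratio]
The compression/expansion ratio is a coefficient for compressing or expanding the time length of the reproduced waveform relative to the time length of the original waveform. In this embodiment, the playback speed of the waveform can be varied by operating the lever 61 provided on the keyboard 6. At its center position, which is the automatic return point, the lever 61 gives the same playback speed as the original waveform, and rocking it forward or backward speeds up or slows down playback. The playback speed is the reciprocal of the compression/expansion ratio. For example, if a waveform is played at twice the playback speed, playback takes 1/2 the time, which means that the compression/expansion ratio is 1/2. If the waveform is played at 1/2 the playback speed, playback takes twice the time, which likewise means that the compression/expansion ratio is 2.
[0034]
The playback speed is determined by the step amount (tcomp, tcomp', tcomp"). The step amount is the amount by which the time position (read address DataAdr) used to reproduce the audio signal waveform is advanced; in other words it is the speed of waveform reproduction, that is, the reciprocal of the time compression/expansion ratio. The step amount is set by operating the lever 61.
[0035]
There are three step amounts: the basic step amount tcomp, which is determined by the position of the lever 61; the flexibility-corrected step amount tcomp', obtained by correcting the basic step amount tcomp for the flexibility; and the pitch-corrected step amount tcomp", obtained by further correcting the flexibility-corrected step amount tcomp' for pitch conversion.
[0036]
The basic step amount tcomp is, specifically, "1" when the lever 61 is at its center position and varies between, for example, "+2" and "0" when the lever is tilted forward or backward. When the read address DataAdr is updated with the basic step amount tcomp, the audio signal waveform is reproduced at its original time length if tcomp is "1", at half its original time length if tcomp is "2", and at the same position repeatedly if tcomp is "0".
[0037]
The flexibility-corrected step amount tcomp' is obtained by correcting the basic step amount tcomp with the flexibility E(i) of the corresponding section according to the following formula.
tcomp' = 1 / [E(i) × {(1 / tcomp) − 1} + 1]
When the flexibility E(i) is 1, the flexibility-corrected step amount tcomp' equals the basic step amount tcomp, giving the playback speed set with the lever 61. When E(i) is less than 1, tcomp' is smaller than tcomp, giving a playback speed lower than the one set with the lever 61. When E(i) is greater than 1, tcomp' is larger than tcomp, giving a playback speed higher than the one set with the lever 61. When E(i) is set to "0", tcomp' is "1", which is the playback speed of the original waveform. Thus, by setting the flexibility E(i) to "0", the section M(i) can be reproduced at the playback speed of the original waveform even though compression/expansion is being performed. In this case, tcomp' stays at "1" even if the value of the basic step amount tcomp set with the lever 61 is changed during reproduction. The CPU 2 performs this correction of the basic step amount tcomp.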
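The correction formula can be checked directly, together with the property stated for equation (1). In the sketch below the function names, the guard conditions, and the three example sections are assumptions; the formula itself is the one given above. Rearranged, it reads 1/tcomp' − 1 = E(i) × (1/tcomp − 1), so the flexibility scales the change in time length of the section.

```python
import math

def flexibility_corrected_step(tcomp, e):
    """tcomp' = 1 / [E * {(1 / tcomp) - 1} + 1]."""
    if tcomp <= 0:
        raise ValueError("tcomp must be positive for this formula")
    denominator = e * (1.0 / tcomp - 1.0) + 1.0
    if denominator <= 0:
        raise ValueError("this combination of E and tcomp gives no positive step")
    return 1.0 / denominator

def playback_duration(lengths, flexibilities, tcomp):
    """Output length when section i is read with its own corrected step."""
    return sum(length / flexibility_corrected_step(tcomp, e)
               for length, e in zip(lengths, flexibilities))

# Hypothetical sections whose flexibilities satisfy equation (1).
lengths = [100, 200, 700]
flexibilities = [0.0, 10 / 9, 10 / 9]
slow = playback_duration(lengths, flexibilities, tcomp=0.5)
fast = playback_duration(lengths, flexibilities, tcomp=2.0)
```

With these values the rigid first section always takes 100 output samples, yet the totals still come to the original length divided by tcomp, as the description states.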
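[0038]
The pitch-corrected step amount tcomp", described in detail later, is obtained by further correcting the flexibility-corrected step amount tcomp' as
tcomp" = W × tcomp'
where W is the correction factor
W = (L / tcomp − L0) / (L / tcomp − L0 / P)
Here, "L0" is the length of the flexibility-0 section that starts at the reset address, and "L" is the length from the reset address of interest to the next reset address.
[0039]
[Generation of the Read Address DataAdr]
When neither time compression/expansion nor pitch shifting is performed, the read address DataAdr is generated by advancing through the addresses of the audio signal waveform in order. Its rate is 44.1 kHz when the sampling frequency of the audio signal waveform data is 44.1 kHz. When the playback speed has been changed with the lever 61, the read address DataAdr is advanced by the basic step amount tcomp determined by the lever 61. For example, if the basic step amount tcomp is "2", read addresses DataAdr are generated that skip every other sampling point (address Addr) of the original audio signal waveform (that is, the waveform is time-compressed). If the basic step amount tcomp is "1/2", read addresses DataAdr are also generated at the midpoints between the sampling points of the original audio signal waveform (that is, the waveform is time-expanded).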
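Advancing a fractional read address through stored data can be sketched as follows. The description does not say how values between sampling points are obtained, so the linear interpolation used here is an assumption, as are the function name `read_with_step` and the sample values.

```python
def read_with_step(values, step):
    """Read stored values at addresses 0, step, 2*step, ... up to the last sample."""
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    out = []
    data_adr = 0.0
    last = len(values) - 1
    while data_adr <= last:
        i = int(data_adr)
        frac = data_adr - i
        nxt = values[min(i + 1, last)]
        # Linear interpolation between neighbouring sampling points (assumed).
        out.append(values[i] * (1.0 - frac) + nxt * frac)
        data_adr += step
    return out

stored = [0.0, 10.0, 20.0, 30.0, 40.0]
compressed = read_with_step(stored, 2.0)    # every other sampling point
expanded = read_with_step(stored, 0.5)      # sampling points plus midpoints
```

A step of 2 returns every other stored value, and a step of 1/2 returns each stored value plus the value halfway to its successor, matching the two cases described above.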
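[0040]
The operation of the device of this embodiment is described below.
As stated above, for each of the audio signal waveforms, the start of each pre-echo portion in the waveform is set in advance as a reset address, and these reset addresses are stored in advance in the RAM 4 together with the waveform data of the corresponding audio signal waveform. The flexibility E(i) information for these audio signal waveforms is also stored in the RAM 4 in association with each waveform. In this embodiment, the flexibility E(i) is set to "0" for each pre-echo section.
[0041]
When any audio signal waveform is selected with the waveform selection button in the operator group 5, the waveform data and reset address information of that waveform are transferred to and stored in the waveform memory 10 of the waveform synthesis unit 1. The flexibility information is referred to by the CPU 2 in order to create the read address DataAdr.
[0042]
When the start of waveform reproduction is instructed, the CPU 2 generates the read address DataAdr by referring to the current setting of the lever 61, the flexibility information for the current reproduction position, the pitch information, and so on, and supplies it to the band component reproduction units 11-0 to 11-99 as time position information.
[0043]
The band component reproduction units 11-0 to 11-99 read the waveform data of each band from the waveform memory 10 according to this read address DataAdr and reproduce the band components. The processing in the time-frequency conversion processing unit 31 of the band component reproduction unit 11 is the same as described for the prior art, except that the phase value read from the waveform memory 10 is differentiated by the differentiator 30 into an instantaneous frequency before being sent to the time-frequency conversion processing unit 31.
[0044]
The reset address RstAdr(n) is also read from the waveform memory 10 and input to the phase reset signal generator 34, where the read address DataAdr is compared with the reset address RstAdr(n); when the two match, a phase reset signal is generated and sent to the cosine oscillator 32. On receiving the phase reset signal, the cosine oscillator 32 resets the phase of the cosine wave it generates to the phase value of the waveform data being input from the waveform memory 10 at that moment (that is, the phase of the original waveform).
[0045]
As a result, at the time position of the reset address RstAdr(n), that is, at the start of the pre-echo section, the phase of each band component signal reproduced by the band component reproduction units 11-0 to 11-99 matches that of the corresponding band component signal obtained by band-dividing the original audio signal waveform. When these signals are added together, the pre-echo waveforms of the band component signals therefore cancel one another and disappear from the restored audio signal waveform. Because the flexibility is set to "0" in the pre-echo section, no time compression/expansion is performed anywhere in that section, so no phase shift due to time compression/expansion arises in the reproduced signal. The state in which the pre-echo waveforms of the reproduced band component signals cancel one another therefore continues throughout the entire pre-echo section.
[0046]
Within the cosine oscillator 32, when the phase reset signal is input, the summer 40 resets the phase it holds and replaces it with the phase value sent directly from the waveform memory 10 plus the rotation ωk·n of the center frequency. The phase thus replaced is given to the cosine calculator 41, which computes and outputs a cosine wave of that phase and frequency. As stated above, at least the span from each reset address RstAdr(n) to the start address of the attack section is set in advance to be a flexibility-0 section, so until the attack begins, the time position information (read address) takes values that reproduce the original sound faithfully. In this section the pre-echo waveform is reproduced exactly as it was when first band-divided, and the pre-echo is therefore canceled on synthesis.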
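[0047]
The operation described above is for the case without pitch conversion (pitch shifting), and the present invention is most effective in that case. It can nevertheless be applied when pitch conversion is involved by carrying out the following processing.
[0048]
First, the flexibility-0 section that follows a reset address is provisionally called the rigid section. For convenience, the length of the rigid section is denoted "L0" and the length from the reset address of interest to the next reset address is denoted "L". The flexibility-corrected step amount tcomp' described earlier is then corrected further. That is, in the flexibility-0 section that follows the reset address,
tcomp" = P
is used, where P is the pitch information P supplied to the band component reproduction unit (information indicating the amount of pitch shift).
[0049]
In each of the sections up to the next reset address, the step amount used is the pitch-corrected step amount tcomp" obtained by further correcting the flexibility-corrected step amount tcomp',
tcomp" = W × tcomp'
where W is the correction factor
W = (L / tcomp − L0) / (L / tcomp − L0 / P)
[0050]
The reason is as follows. In the rigid section, the phase advances P times as fast because pitch conversion is applied. If the basic step amount tcomp is set to the same value, the phase advance becomes the same as in the original waveform, and the original phase is reproduced even in the pitch-converted waveform. Since the rigid section then becomes 1/P as long, the step amount for the addresses in the remaining sections up to the next reset address is adjusted by the correction factor W, so that the length from the reset address to the next reset address is corrected to the desired duration. FIG. 7 shows this situation; the black portions are the rigid sections.
The time position information is calculated from this pitch-corrected step amount tcomp".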
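The role of W can be verified with a short calculation. The sketch below relies on one labeled assumption: that equation (1) holds within the interval between two reset addresses, so that without pitch conversion the non-rigid part would occupy L / tcomp − L0 output samples. The function names and the numeric values are illustrative only.

```python
import math

def pitch_correction_factor(length, rigid_length, tcomp, pitch):
    """W = (L / tcomp - L0) / (L / tcomp - L0 / P)."""
    return (length / tcomp - rigid_length) / (length / tcomp - rigid_length / pitch)

def interval_duration(length, rigid_length, tcomp, pitch):
    """Output length of one reset-to-reset interval with the pitch-corrected step."""
    w = pitch_correction_factor(length, rigid_length, tcomp, pitch)
    rigid = rigid_length / pitch       # rigid section read with step P
    # Assumption: the remaining sections would take L / tcomp - L0 output samples
    # with tcomp'; multiplying the step by W divides that time by W.
    rest = (length / tcomp - rigid_length) / w
    return rigid + rest

w = pitch_correction_factor(1000.0, 100.0, 0.5, 2.0)
total = interval_duration(1000.0, 100.0, 0.5, 2.0)
```

For L = 1000, L0 = 100, tcomp = 0.5, and P = 2 the rigid section shrinks from 100 to 50 output samples, W = 1900/1950 slows the remaining sections slightly, and the interval still lasts L / tcomp = 2000 samples.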
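[0051]
Various modifications are possible in carrying out the present invention. For example, in the embodiment described above, the phase generally becomes discontinuous when the cosine oscillator performs a phase reset, which may produce noise. In general the phase reset is performed slightly before the attack, where the amplitude is usually extremely small, so even if the phase is discontinuous there, noise either does not occur or is too small to matter. It is nevertheless possible to improve the device so as to reduce the possibility of even this small noise.
[0052]
That is, the cosine oscillator is configured not to reset the phase abruptly when the phase reset signal arrives, but, starting from the arrival of the phase reset signal, to combine the accumulated frequency value and the absolute phase (the one corresponding to the phase of the waveform data in the waveform memory 10) while gradually changing their proportions, in other words to bring the phase continuously toward the absolute phase by a crossfade. A crossfade time that is short compared with the length of the pre-echo from the reset address is sufficient, but the reset address can also be set earlier in advance by that amount.
[0053]
FIG. 8 shows a configuration example of the cosine oscillator for performing this crossfade. In this configuration, a crossfade calculator 43 is newly provided; it receives the phase reset signal, the absolute phase, and the output of the summer 40 (the instantaneous phase value), and it sends a crossfade-end notification to the summer 40. Normally, the crossfade calculator 43 outputs the data from the summer 40 unchanged; when the phase reset signal arrives, it starts the crossfade. During the crossfade that follows the phase reset, the output of the crossfade calculator 43 moves away from the instantaneous phase value of the summer 40, with the proportion of the absolute phase value increasing gradually over time. At the end of the crossfade, the crossfade-end notification is sent to the summer 40, the phase of the summer 40 is rewritten to the absolute phase value, and the output is switched so that the output signal of the summer 40 is again output as the instantaneous phase value.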
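The blending performed by the crossfade calculator can be illustrated with a linear ramp. This is only a sketch: the description does not specify the fade curve or how phase wrap-around is handled, so the linear weighting and the choice to fade along the shorter way around the circle (via `math.remainder`) are assumptions, as is the function name.

```python
import math

def crossfade_phase(summed_phase, absolute_phase, progress):
    """Blend from the summer's phase (progress 0) to the absolute phase (progress 1)."""
    # Assumed: take the phase difference modulo 2*pi so the fade follows the shorter arc.
    difference = math.remainder(absolute_phase - summed_phase, 2.0 * math.pi)
    return summed_phase + progress * difference

# Fade over four steps from an accumulated phase of 1.0 rad to an absolute phase of 7.0 rad.
faded = [crossfade_phase(1.0, 7.0, i / 4) for i in range(5)]
largest_jump = max(abs(b - a) for a, b in zip(faded, faded[1:]))
```

The output starts at the summer's phase, ends at a value equivalent to the absolute phase, and changes by only a small amount per step, which is the continuity the crossfade is meant to provide.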
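[0054]
In the embodiment described above, the pre-echo is removed by generating the phase reset signal at the start of the pre-echo section. However, the phase reset signal may also be generated at the start of waveform reproduction or at any point during reproduction, for example when the reproduced audio signal waveform is to be a faithful reproduction of the original sound (that is, when no time compression/expansion is performed). In that case, at the point of the phase reset the phases of the band component signals of the restored audio signal waveform are matched to those of the original sound, so the reproduced waveform becomes even more faithful to the original sound.
[0055]
In the embodiment described above, removal of the pre-echo is made possible by setting the flexibility-0 section to the pre-echo section, but the flexibility-0 section can also be set to, for example, the attack section. When the attack section is given a flexibility of "0" in this way, the attack portion is reproduced with the waveform of the original sound (that is, a waveform that is not time-compressed or time-expanded) even when the audio signal waveform is time-compressed or time-expanded. The attack portion generally expresses the character of a musical tone best, so time-compressing or time-expanding it makes the tone sound like a different timbre from the original. Setting the attack portion to a flexibility of "0" as described preserves the character of the musical tone expressed by the attack portion even when the waveform as a whole is time-compressed or time-expanded.
[0056]
[Effects of the Invention]
As described above, according to the present invention a phase reset signal is generated at the point where the pre-echo begins and the phase at that point is set to be the same as that of the original sound. This prevents the noise caused by the pre-echo, thereby keeping the attack sound crisp and preventing the sense of attack from being impaired.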
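[Brief Description of the Drawings]
[FIG. 1] A diagram showing an audio signal waveform processing device as one embodiment of the present invention.
[FIG. 2] A diagram explaining the concept of the analysis unit that extracts phase values and amplitude values from the audio signal waveform in the device of the embodiment.
[FIG. 3] A diagram showing a configuration example of the band component reproduction unit in the device of the embodiment.
[FIG. 4] A flowchart showing the procedure by which the phase reset signal generator in the band component reproduction unit of the device of the embodiment sends out the phase reset signal.
[FIG. 5] A diagram showing a configuration example of the cosine oscillator in the band component reproduction unit of the device of the embodiment.
[FIG. 6] A diagram explaining the concept of flexibility in the device of the embodiment.
[FIG. 7] A diagram explaining the waveform shape when pitch conversion is involved in the device of the embodiment.
[FIG. 8] A diagram showing another configuration example of the cosine oscillator in the band component reproduction unit of the device of the embodiment.
[FIG. 9] A diagram explaining the relationship between the attack waveform and the pre-echo in a band-divided audio signal waveform.
[FIG. 10] A diagram explaining the overall configuration concept of a conventional audio signal waveform processing device based on the phase vocoder method.
[FIG. 11] A diagram explaining the relationship between the components of the audio signal waveform of the original sound and the bands into which it is divided.
[FIG. 12] A diagram explaining the waveform data for the audio signal waveform of the original sound.
[FIG. 13] A diagram explaining the concept of analyzing an audio signal waveform to extract amplitude values and phase values in the device of the embodiment.
[FIG. 14] A diagram showing a configuration example of the time-frequency conversion processing unit in a conventional audio signal waveform processing device.
[FIG. 15] A diagram showing an example of signal waveform processing in the time-frequency conversion processing unit of a conventional audio signal waveform processing device.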
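[Explanation of Reference Numerals]
1 Waveform synthesis unit
2 CPU (central processing unit)
3 ROM (read-only memory)
4 RAM (random-access memory)
5 Operator group
6 Keyboard device
61 Speed adjustment lever
10 Waveform memory
11 Band component reproduction unit
30 Differentiator
31 Time-frequency conversion processing unit
32 Cosine oscillator
33 Multiplier
34 Phase reset signal generator
40 Summer
41 Cosine calculator
42 Adder
43 Crossfade calculator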
BACKGROUND OF THE INVENTION
The present invention relates to an audio signal waveform processing apparatus that performs time stretch (time compression / expansion) and pitch shift (pitch conversion) by a phase vocoder method.
[0002]
In the phase vocoder method, the analysis system divides the audio signal waveform of the original sound into a plurality of frequency bands (bands) using a band filter, analyzes the band components of each band individually, and outputs the output amplitude and phase as characteristic parameters. In the synthesis system, the original band components are reproduced using the output amplitude and phase for each band, and the band components of each band are added and synthesized to produce the original audio signal waveform. Restore.
[0003]
FIG. 10 explains the concept of an audio signal waveform processing apparatus using this phase vocoder method. As shown in the figure, the audio signal waveform X (n) is input to a plurality of analysis units 60. In this example, the analysis unit 60 is provided for each band obtained by dividing the frequency of the audio signal waveform into 100 bands. As shown in FIG. 11, each analysis unit 60 has bands 0 to 99 each having an approximate fundamental frequency of an audio signal waveform as a center frequency, and has a configuration as shown in FIG. That is, for example, the analysis unit of the band k multiplies the input audio signal waveform X (n) by the complex frequency sin (ωk) and cos (ωk) at the center (synchronous detection) to obtain the output amplitude value of the band k At the same time, the phase value of the detection output is differentiated to obtain instantaneous frequency information. The instantaneous frequency is a phase change amount (differential value) per unit time at each time point (each position on the time axis of the waveform), and is information indicating a frequency deviation from the center frequency.
[0004]
In the waveform processing apparatus of FIG. 10, waveform data (output amplitude and instantaneous frequency) of each band of the audio signal waveform X (n) is stored in the waveform memory 61. As shown in FIG. 12, the waveform data is stored in the waveform memory 61 with respect to each address addr (0) to addr (n) on the time axis of the audio signal waveform x (n). Each time, amplitude data A and instantaneous frequency data f are stored.
[0005]
The synthesis system includes a band component reproduction unit 62 provided for each band, and each band component reproduction unit 62 includes a time-frequency conversion processing unit 31, a cosine oscillator 33, and a multiplier 33. The time-frequency conversion processing unit 31 has a configuration as shown in FIG. That is, for the input output amplitude value, the interpolation point interpolates / adds the sample point according to the time stretch ratio (time compression / expansion ratio), and the amplitude envelope (change of the amplitude value over time). The amplitude value obtained by compressing / decompressing the indicated envelope is output. For the input instantaneous frequency value, the central angular frequency ωk is added to the instantaneous frequency value. When pitch conversion is performed, the frequency conversion ratio (depending on the degree of pitch shift) is added to the instantaneous frequency value. Value), and interpolated / additionally interpolated sample points according to the time stretch ratio and compressed / expanded the frequency envelope (envelope showing the change over time of the assumed frequency value) Is output.
[0006]
FIG. 15 is a diagram showing how the amplitude value and the instantaneous frequency are interpolated. In the case of time extension, as shown in FIG. 15A, both the original amplitude envelope and the frequency envelope are extended to generate an amplitude value and an instantaneous frequency obtained by extending the time axis. In the case of time compression, as shown in FIG. 15B, both the original amplitude envelope and the frequency envelope are contracted to generate an amplitude value and an instantaneous frequency obtained by compressing the time axis. By this interpolation processing, the time axis of the original audio signal waveform can be arbitrarily compressed / expanded.
[0007]
The instantaneous frequency value processed by the time-frequency conversion processing unit 31 (appropriately time-stretched) is supplied to the cosine oscillator, and the cosine oscillator generates a cosine wave having the frequency of the band. The amplitude envelope processed by the time frequency conversion processing unit 31 is added and output. Thereby, the component signal of the band is reproduced. Furthermore, the original audio signal waveform can be restored by adding and synthesizing the band components of these bands 0 to 99.
[0008]
[Problems to be solved by the invention]
When reproducing an audio signal waveform of a musical tone by the above-described phase vocoder method, the level of an audio signal waveform of a piano or the like greatly changes in time in the attack portion. When such an audio signal waveform is band-divided using a digital filter, a trace, so-called “pre-echo”, appears before each attack in each band.
[0009]
FIG. 9 is a diagram for explaining this pre-echo phenomenon. For the sake of simplicity, FIG. 9 shows a case where the original audio signal waveform is divided into a high frequency band and a low frequency band. A typical attack waveform (a) is separated into a low-frequency component waveform (b) and a high-frequency component waveform (c), and the appearance of pre-echo is shown. It can be seen from the low-frequency component waveform (b) and the high-frequency component waveform (c) that pre-echo occurs in the silent part preceding the original attack waveform (a). The pre-echo of the low-frequency component waveform (b) and the pre-echo of the high-frequency component waveform (c) are out of phase with each other. Therefore, when the low-frequency component waveform (b) and the high-frequency component waveform are finally synthesized with the same phase, The two pre-echoes cancel each other, and the silent part is restored. This is the same even when the number of bands (bands) to be divided is large. If the component waveforms of each band are added and combined after reproduction, the pre-echo cancels each other and becomes a silent part.
[0010]
Such pre-echoes become more severe as the number of taps of the digital filter (FIR filter) used increases, and the generation limit is the number of taps of the filter itself. In general, the pre-echo size is estimated from the delay amount of the filter. Can do.
[0011]
By the way, when time stretch or pitch shift is performed using the phase vocoder method, generally, the waveform of each band does not return to the phase of the original waveform, and the phase changes. For this reason, even after adding and synthesizing after reproducing the band components of each band, the phase of the pre-echo part of the waveform of each band also changes, so these pre-echo parts do not cancel each other. A pre-echo preceding the attachment remains in the synthesized signal.
[0012]
For this reason, normally, in waveform reproduction using a phase vocoder, noise due to pre-echo is generated immediately before the attack, and the attack sound is not crisp and the so-called attack feeling is significantly impaired.
[0013]
The present invention has been made in view of such problems, and an object of the present invention is to prevent generation of noise due to pre-echo generated by band division while using a phase vocoder method.
[0014]
[Means and Actions for Solving the Problems]
The audio signal waveform processing apparatus to which the present invention is applied has, as waveform data, the amplitude and phase of each band component obtained by dividing the audio signal into a plurality of bands, and the band components of each band are used as needed based on this waveform data. The audio signal waveform is restored by reproducing and synthesizing while compressing / decompressing time.
In the audio signal waveform processing apparatus, in order to solve the above-described problem, in the present invention, each band to be restored at the beginning of the pre-echo period with respect to the pre-echo period generated in the audio signal waveform by band division. The phase is reset so that the phase of the component becomes the phase of the original band component corresponding to each of the waveform data.
It is desirable not to perform time compression / expansion of the waveform to be restored in the pre-echo section.
In this way, the phase reset is performed when pre-echo occurs, and the phase of the playback waveform of each band at that point is set to the same as that of the original sound. Pre-echo is canceled in the audio signal waveform.
[0015]
In the above-described phase reset, it is more desirable that the phase of the waveform that has been reproduced so far and the value of the phase to be replaced based on the waveform data are replaced by a cross fade.
In this way, the phase of the reproduced waveform changes continuously at the time of phase reset, so that it is possible to prevent the occurrence of noise due to a discontinuous change in phase due to a sudden change in phase value. .
[0016]
In addition, when the waveform reproduction involves pitch conversion, it is desirable that the degree of phase change based on the pitch conversion and the stepping degree of the read address for reading the waveform data are matched.
Thereby, the phase of the pitch-converted waveform becomes the same as the original waveform, the original phase is reproduced in the reproduction waveform of the band component, and the pre-echo can be canceled by synthesizing each band component.
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 shows an audio signal waveform processing apparatus as an embodiment of the present invention. In this embodiment, the present invention is applied to a keyboard-type electronic musical instrument. In the figure, 2 is a CPU (central processing unit) that controls the entire apparatus, 3 is a ROM (read only memory) that stores various table data, a control program for the CPU, and the like, 4 is a plurality of audio signal waveforms. It is a RAM (Random Access Memory) that stores waveform data and is used as a working memory for the CPU. Reference numeral 5 denotes an operator group composed of various operators, such as a selection button for selecting a playback waveform from various waveforms, setting a flexibility to be described later, and instructing start / end of waveform playback. There are various controls. A keyboard device 6 has a lever 61 on the panel for adjusting the reproduction speed. Reference numeral 1 denotes a waveform synthesizing unit, which comprises a waveform memory 10 and band component reproducing units 11-0 to 11-99 provided corresponding to the respective bands. The original audio signal is based on the waveform data selected for reproduction. Restore the waveform and output it.
[0018]
The waveform memory 10 of the waveform synthesizer 1 has an area for storing waveform data of an audio signal waveform and an area for storing reset address information.
[0019]
The waveform data of the audio signal waveform consists of an output amplitude value and a phase value. The output amplitude value is the same as that described in the prior art. The phase value replaces the aforementioned instantaneous frequency value of the prior art and is an original characteristic parameter of the phase vocoder. In the present embodiment apparatus, the waveform data (amplitude value and phase value) of a plurality of types of audio signal waveforms is obtained by analyzing in advance and stored in the RAM 4. Then, the waveform data of the audio signal waveform selected for reproduction by the selection button for waveform selection is stored in the waveform data area of the waveform memory 10.
[0020]
FIG. 2 shows a conceptual configuration of an analysis unit that analyzes an audio signal waveform and extracts waveform data of an output amplitude value and a phase value of each band waveform. As shown in the figure, the audio signal waveform X (n) is multiplied by the band center complex frequencies cos (ωk, n) and sin (ωk, n), respectively, and passed through the analysis filter of the impulse response W (n), The output is squared and then added to calculate a square root to obtain an amplitude value. Further, the outputs Xcos and Xsin of the analysis filter are passed through a calculation unit to obtain a phase value. In the calculation unit, according to the output Xcos and Xsin of the analysis filter,
Arctan (Xsin / Xcos) when Xcos> 0
Arctan (Xsin / Xcos) + π when Xcos <0
And outputs the result as a phase value.
[0021]
The reset address information area of the waveform memory 10 stores reset address information (phase reset information) of the selected audio signal waveform. This reset address information is set to an address at a position before the address indicating the attack of the audio signal waveform by the amount of pre-echo, and is set for each attack waveform existing in time series in the audio signal waveform. Are sequentially assigned as RstAdr (0), RstAdr (1), RstAdr (2)..., And are arranged in the order of their sizes. Note that how far the pre-echo precedes the attack can be estimated from the delay amount of the filter that performs band division.
[0022]
In this embodiment, the reset address information is obtained by analyzing in advance for a plurality of types of audio signal waveforms, and stored in the RAM 4. Then, the reset address information of the audio signal waveform selected for reproduction by the waveform selection button is stored in the reset address information area of the waveform memory 10.
[0023]
The band component reproducing units 11-0 to 11-99 are supplied with an amplitude value, a phase value, and a reset address from the waveform memory 10, and with time position information (reading address DataAdr) and pitch information from the CPU 2 side. Based on these data, the band component of the corresponding band is reproduced and output. The band components output from each of the band component reproducing units 11-0 to 11-99 are added and synthesized with each other to be restored and output as an audio signal waveform.
[0024]
FIG. 3 shows a configuration example of the band component reproduction unit 11. As shown in the figure, the phase value read from the waveform memory 10 is branched into two, one being directly input to the cosine oscillator 32 and the other being input to the differentiating unit 30. The phase value input to the differentiating unit 30 is differentiated to be converted into an instantaneous frequency and input to the time-frequency conversion processing unit 31. The amplitude value read from the waveform memory 10 is input to the time frequency conversion processing unit 31. The time-frequency conversion processing unit 31 has the same configuration as that described in the section of the prior art, and sends the amplitude value to the multiplier 33 and the instantaneous frequency value to the cosine oscillator 32 as in the conventional configuration.
[0025]
The reset address information read from the waveform memory 10 is input to the phase reset signal generator 34. A read address DataAdr (time position information) for reading waveform data from the waveform memory 10 is input to the phase reset signal generator 34. The phase reset signal generator 34 compares the reset address RstAdr read from the waveform memory 10 with the read address DataAdr, generates a phase reset signal when the read address DataAdr exceeds the reset address RstAdr, and sends it to the cosine oscillator 32. To do.
[0026]
FIG. 4 shows a flowchart of the phase reset signal transmission procedure in the phase reset signal generator 34. As shown in the figure, when the read address DataAdr is updated as the waveform is reproduced (step S1), it is determined whether the read address exceeds the reset address RstAdr(k), that is, whether
RstAdr(k) < DataAdr
holds (step S2). If it does not, the process ends and waits for the next read address update. If it does, the parameter k indicating the order of the reset addresses is updated (k = k + 1) and, at the same time, a phase reset signal is generated and transmitted (step S3).
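The procedure of steps S1 to S3 can be sketched in a few lines of Python. This is only an illustrative reading of the flowchart, not the patented circuit: the class name, the method name, and the assumption that the reset addresses are held as an ascending list are all hypothetical.

```python
class PhaseResetSignalGenerator:
    """Illustrative sketch of steps S1-S3 of FIG. 4 (all names are hypothetical)."""

    def __init__(self, reset_addresses):
        # RstAdr(0), RstAdr(1), ... assumed to be given in ascending order
        self.reset_addresses = list(reset_addresses)
        self.k = 0  # order of the reset address currently being watched

    def update(self, data_adr):
        """Call on every read-address update (S1); True means 'send a phase reset'."""
        if self.k >= len(self.reset_addresses):
            return False  # no reset address left to compare against
        if self.reset_addresses[self.k] < data_adr:  # S2: RstAdr(k) < DataAdr
            self.k += 1                              # S3: k = k + 1
            return True                              # S3: emit the phase reset signal
        return False                                 # wait for the next update


gen = PhaseResetSignalGenerator([10, 50])
signals = [gen.update(adr) for adr in [0, 8, 12, 16, 48, 52]]
```

With these example addresses, a reset is signalled only on the first update after each reset address has been passed (at 12 and at 52).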
[0027]
FIG. 5 shows a configuration example of the cosine oscillator 32. The phase value data from the waveform memory 10 is input to the adder 42, where the rotation amount of the band's center frequency (angular frequency ωk) is added, and the resulting absolute phase is input to the summer 40. The instantaneous frequency value from the time-frequency conversion processing unit 31 is also input to the summer 40, where it is converted into an instantaneous phase value by summation, and this instantaneous phase value is supplied to the cosine calculator 41. A cosine wave with the frequency regarded as that of the band's component signal is thereby generated and output. The phase reset signal is also input to the summer 40. When the phase reset signal arrives, the phase accumulated in the summer 40 is replaced with the absolute phase from the adder 42.
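A minimal Python sketch of this oscillator follows. It is a simplified model under stated assumptions rather than the circuit of FIG. 5: the class and method names are hypothetical, phases are in radians, and the instantaneous frequency passed to `step` is assumed to be the total phase increment per output sample.

```python
import math


class CosineOscillator:
    """Simplified model of the cosine oscillator 32 (names are hypothetical)."""

    def __init__(self, omega_k):
        self.omega_k = omega_k  # center angular frequency of band k, rad/sample
        self.phase = 0.0        # phase accumulated in the summer 40

    def absolute_phase(self, stored_phase, n):
        # Adder 42: stored phase value plus the rotation of the center frequency
        return stored_phase + self.omega_k * n

    def step(self, inst_freq, stored_phase, n, reset=False):
        if reset:
            # Phase reset: replace the accumulated phase with the absolute phase
            self.phase = self.absolute_phase(stored_phase, n)
        else:
            # Summer 40: accumulate instantaneous frequency into instantaneous phase
            self.phase += inst_freq
        # Cosine calculator 41
        return math.cos(self.phase)


osc = CosineOscillator(omega_k=0.1)
osc.step(inst_freq=0.1, stored_phase=0.0, n=1)
sample = osc.step(inst_freq=0.1, stored_phase=0.2, n=5, reset=True)
```

After the reset call, the accumulated phase equals the stored phase plus ωk·n, regardless of what had been accumulated before.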
[0028]
Next, because the apparatus of this embodiment uses a time compression / expansion rate and a flexibility to perform time stretching (time compression / expansion) of the audio signal waveform, these two quantities are described. The compression / expansion rate expresses the amount of time compression / expansion applied to the audio signal waveform as the ratio of its time length after compression / expansion to that before. The flexibility, on the other hand, indicates the degree to which the compression / expansion rate is corrected in each section of the audio signal waveform.
[0029]
[Flexibility]
For example, as shown in FIG. 6, the musical sound waveform is divided into a plurality of sections in time (the total number of sections is m), and the flexibility E (i) is set for each section. For example, it is desirable to set this section so as to coincide with a pre-echo section, an attack section, a decay section, a sustain section, a release section, a silent section, and the like.
[0030]
Here, when the flexibility E(i) is “0”, the reproduction speed of the section is kept at that of the original waveform regardless of the setting of the lever 61 and the value of the compression / expansion rate. When the flexibility is “1”, the playback speed set by the lever 61 is obtained. If the flexibility is less than “1”, the playback speed becomes lower than the playback speed set by the lever 61, and if the flexibility is greater than “1”, it becomes higher than the playback speed set by the lever 61.
[0031]
The flexibility of each section is set so that the sum, over all sections of the musical sound waveform, of the product of each section's length and its flexibility equals the total length of all sections. Expressed as a formula:
Σ E(i) × L(i) = Σ L(i)   (1)
where Σ denotes summation over i = 0 to (m − 1),
L(i) is the length of each section,
E(i) is the flexibility in each section,
m is the total number of sections, and
i is a section counter indicating the section being played, from 0 to (m − 1).
Therefore, even though the compression / expansion rate of each section is corrected by the flexibility, the total time of the compressed or expanded waveform data is the time of the original waveform data multiplied by the compression / expansion rate, that is, exactly the time that would result from compressing or expanding the entire waveform data by the compression / expansion rate alone.
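As a small illustration of expression (1), the following Python sketch checks whether a set of flexibilities satisfies the constraint. The function name, the tolerance, and the section lengths and flexibility values are hypothetical and chosen only for the example; here a first section (such as a pre-echo section) has flexibility 0 and a later section has a flexibility above 1 to compensate.

```python
def satisfies_flexibility_constraint(flex, lengths, tol=1e-9):
    """Check expression (1): sum of E(i) * L(i) equals sum of L(i)."""
    weighted = sum(e * length for e, length in zip(flex, lengths))
    return abs(weighted - sum(lengths)) <= tol


# Hypothetical example: three sections of 100, 300 and 600 addresses.
lengths = [100.0, 300.0, 600.0]
flex = [0.0, 1.0, 7.0 / 6.0]  # 0*100 + 1*300 + (7/6)*600 = 1000

ok = satisfies_flexibility_constraint(flex, lengths)
```

Setting one section to 0 without raising another (for example `[0.0, 1.0, 1.0]`) would violate the constraint, so the overall time length would no longer follow the compression / expansion rate alone.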
[0032]
As will be described later, in the present invention, the processing according to the present invention is made possible by setting the flexibility E (i) of the pre-echo period to “0”.
[0033]
[Compression / expansion rate]
The compression / expansion rate is a coefficient by which the time length of the reproduced waveform is compressed or expanded relative to the time length of the original waveform. In this embodiment, the waveform reproduction speed can be varied by operating the lever 61 provided on the keyboard 6. At its midpoint, which is the automatic return position, the lever 61 gives the same playback speed as the original waveform, and tilting it forward or backward raises or lowers the playback speed. The reproduction speed is the reciprocal of the compression / expansion rate. For example, when a waveform is reproduced at twice the reproduction speed, the time required for reproduction is 1/2, which means that the compression / expansion rate is 1/2. When the waveform is reproduced at a reproduction speed of 1/2, the time required for reproduction is doubled, which likewise means that the compression / expansion rate is 2.
[0034]
The above playback speed is determined based on the step amount (tcomp, tcomp', tcomp"). The step amount is the amount by which the time position (read address DataAdr) for reproducing the audio signal waveform is advanced. In other words, it corresponds to the waveform reproduction speed, that is, the reciprocal of the time compression / expansion rate, and is set by operating the lever 61.
[0035]
The step amounts are: a basic step amount tcomp determined by the set position of the lever 61; a flexibility correction step amount tcomp', obtained by correcting the basic step amount tcomp in consideration of the flexibility; and a pitch correction step amount tcomp", obtained by further correcting the flexibility correction step amount tcomp' in consideration of pitch conversion.
[0036]
First, the basic step amount tcomp is specifically a value that is “1” when the lever 61 is in the middle position and that varies, for example, between “0” and “+2” when the lever 61 is tilted back and forth. When the read address DataAdr is updated with the basic step amount tcomp, the audio signal waveform is reproduced with its original time length if tcomp is “1”, with half its original time length if tcomp is “2”, and repeatedly at the same position if tcomp is “0”.
[0037]
The flexibility correction step amount tcomp' is obtained by correcting the basic step amount tcomp with the following expression, based on the flexibility E(i) of the corresponding section.
tcomp' = 1 / [E(i) × {(1 / tcomp) − 1} + 1]
Here, when the flexibility E(i) is 1, the flexibility correction step amount tcomp' equals the basic step amount tcomp, giving the playback speed set by the lever 61. When the flexibility E(i) is smaller than 1, tcomp' becomes smaller than the basic step amount tcomp, giving a playback speed lower than that set by the lever 61. When the flexibility E(i) is larger than 1, tcomp' becomes larger than the basic step amount tcomp, giving a playback speed higher than that set by the lever 61. If the flexibility E(i) is set to “0”, tcomp' becomes “1”, which is the step amount of the original waveform. Therefore, by setting the flexibility E(i) to “0”, the section M(i) can be reproduced at the reproduction speed of the original waveform even while compression / expansion is being performed. In this case, even if the basic step amount tcomp set by the lever 61 is changed during reproduction, the flexibility correction step amount tcomp' remains “1”. The CPU 2 corrects the basic step amount tcomp in this way.
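The correction formula can be tried out with the short Python sketch below. The function name and the example section data are hypothetical, and the sketch assumes tcomp > 0 and a positive denominator. It also checks numerically that, when the flexibilities satisfy expression (1), the total reproduction time equals the original length divided by tcomp.

```python
import math


def flexibility_corrected_step(tcomp, e):
    """tcomp' = 1 / [E(i) * {(1 / tcomp) - 1} + 1], assuming tcomp > 0."""
    return 1.0 / (e * (1.0 / tcomp - 1.0) + 1.0)


# Hypothetical sections satisfying expression (1): sum(E*L) == sum(L) == 1000
lengths = [100.0, 300.0, 600.0]
flex = [0.0, 1.0, 7.0 / 6.0]
tcomp = 2.0  # lever set to double speed

# Time spent in each section is its length divided by its corrected step amount
total_time = sum(
    length / flexibility_corrected_step(tcomp, e)
    for length, e in zip(lengths, flex)
)
matches = math.isclose(total_time, sum(lengths) / tcomp)
```

In this example the section with flexibility 0 is stepped at 1 (original speed), while the overall duration is still half the original. Note that for tcomp < 1 the same formula yields a tcomp' larger than tcomp when E(i) < 1, because the corrected step always moves toward 1 as E(i) decreases.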
[0038]
The pitch correction step amount tcomp" will be described in detail later; it is obtained by further correcting the flexibility correction step amount tcomp' as
tcomp" = W × tcomp'
where W is the degree of correction,
W = (L / tcomp − L0) / (L / tcomp − L0 / P)
Here, “L0” is the length of the section of flexibility 0 starting from the reset address, “L” is the length from the reset address of interest to the next reset address, and P is the pitch information indicating the amount of pitch shift.
[0039]
[Generation of read address DataAdr]
When neither time compression / expansion nor pitch shift is performed, the read address DataAdr is generated by stepping through the addresses of the audio signal waveform in order. If the sampling rate of the audio signal waveform data is 44.1 kHz, the address is stepped at 44.1 kHz. When the playback speed is changed by the lever 61, the read address DataAdr is stepped by the basic step amount tcomp determined by the lever 61. For example, if the basic step amount tcomp is “2”, a read address DataAdr is generated that skips every other sampling point (address Addr) of the original audio signal waveform (that is, the waveform is time-compressed). If the basic step amount tcomp is “½”, the read address DataAdr is also generated at the midpoints between the sampling points of the original audio signal waveform (that is, the waveform is time-expanded).
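The stepping of the read address can be illustrated with the following Python sketch. The function name and its parameters are hypothetical. Fractional addresses are simply returned as floating-point values, and how the waveform data is interpolated at such positions is left open here.

```python
def read_addresses(num_output_samples, tcomp, start=0.0):
    """Generate DataAdr values by advancing the address by tcomp per output sample."""
    adr = start
    addresses = []
    for _ in range(num_output_samples):
        addresses.append(adr)
        adr += tcomp  # step the time position by the basic step amount
    return addresses


compressed = read_addresses(4, 2.0)  # skips every other sampling point
expanded = read_addresses(4, 0.5)    # also visits midpoints between sampling points
```

With tcomp = 2 the addresses advance 0, 2, 4, 6, and with tcomp = ½ they advance 0, 0.5, 1, 1.5, matching the two examples in the paragraph above.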
[0040]
The operation of this embodiment apparatus will be described below.
As described above, for each of a plurality of audio signal waveforms, the head of each pre-echo part in the waveform is preset as a reset address, and these reset addresses are stored in advance in the RAM 4 together with the waveform data of the corresponding audio signal waveform. Information on the flexibility E(i) is also stored in the RAM 4 in association with each audio signal waveform. In this embodiment, “0” is set as the flexibility E(i) for each pre-echo section.
[0041]
When an arbitrary audio signal waveform is selected using the waveform selection button in the operator group 5, the waveform data and reset address information of the audio signal waveform are transferred to and stored in the waveform memory 10 of the waveform synthesizer 1. Also, the flexibility information is referred to by the CPU 2 to create the read address DataAdr.
[0042]
When the start of waveform reproduction is instructed, the CPU 2 generates the read address DataAdr by referring to the setting state of the lever 61, the flexibility information for the waveform reproduction position, the pitch information, and so on, and supplies it as time position information to the band component reproducing units 11-0 to 11-99.
[0043]
The band component reproducing units 11-0 to 11-99 read the waveform data of each band from the waveform memory 10 in accordance with the read address DataAdr and reproduce the respective band components. The processing in the time-frequency conversion processing unit 31 of the band component reproduction unit 11 is the same as that described in the prior art, except that the phase value read from the waveform memory 10 is first converted into an instantaneous frequency by differentiation in the differentiating unit 30 before being sent to the time-frequency conversion processing unit 31.
[0044]
In addition, the reset address RstAdr (n) is read from the waveform memory 10 and input to the phase reset signal generator 34, and the read address DataAdr is compared with the reset address RstAdr (n) in the phase reset signal generator 34. If they match, a phase reset signal is generated and sent to the cosine oscillator 32. When receiving the phase reset signal, the cosine oscillator 32 resets the phase of the generated cosine wave so as to become the phase value of the waveform data input from the waveform memory 10 at that time (that is, the phase of the original sound waveform).
[0045]
As a result, at the time position of the reset address RstAdr(n), that is, at the start position of the pre-echo period, the phase of each band component signal reproduced by the band component reproducing units 11-0 to 11-99 matches its phase in the original audio signal waveform. Therefore, when these signals are added and synthesized, the pre-echo waveforms of the respective band component signals cancel each other and disappear from the restored audio signal waveform. Since the flexibility is set to “0” in the pre-echo period, no time compression / expansion is performed throughout that period, so no phase shift due to time compression / expansion arises in the reproduced signal. The state in which the pre-echo waveforms of the reproduced band component signals cancel each other therefore continues throughout the pre-echo period.
[0046]
When the phase reset signal is input, the cosine oscillator 32 resets the phase held in the summer 40, replacing it with the phase value sent directly from the waveform memory 10 to which the rotation amount ωk·n of the center frequency has been added. The instantaneous phase value obtained by this replacement is supplied to the cosine calculator 41, and a cosine wave having that phase and frequency is generated and output. As described above, the flexibility is set to zero in advance at least from the reset address RstAdr(n) to the head address of the attack section. The time position information (read address) in the section up to the start of the attack therefore reproduces the original sound faithfully, and in this section the pre-echo waveform of each band produced by the original band division is reproduced as it is, so that the pre-echo is canceled at the time of synthesis.
[0047]
The above operation has been described for the case where no pitch conversion (pitch shift) is involved, and the present invention is most effective in that case. Even when pitch conversion is involved, however, the present invention can be applied by performing the following processing.
[0048]
First, the section of flexibility 0 following the reset address is provisionally called a rigid section. For convenience, the length of the rigid section is denoted “L0”, and the length from the reset address of interest to the next reset address is denoted “L”. A further correction is applied to the flexibility correction step amount tcomp' described above. That is, in the section of flexibility “0” following the reset address,
tcomp" = P
is used. Here, P is the pitch information (information indicating the amount of pitch shift) supplied to the band component reproduction unit.
[0049]
Further, in each subsequent section up to the next reset address, the step amount used is the pitch correction step amount tcomp", obtained by further correcting the flexibility correction step amount tcomp':
tcomp" = W × tcomp'
where W is the degree of correction,
W = (L / tcomp − L0) / (L / tcomp − L0 / P)
[0050]
The reason is as follows. In the rigid section, pitch conversion makes the phase advance P times as fast. If the step amount is set to the same value P, the phase advance per address becomes the same as in the original waveform, so the original phase is reproduced even in the pitch-converted waveform. Further, since the time length of the rigid section then becomes 1/P, the step amount of the address in the remaining section up to the next reset address is adjusted by the correction degree W, so that the length from the reset address to the next reset address is corrected to the desired time length. FIG. 7 shows this state; the black portion is the rigid section.
The time position information is calculated from this pitch correction step amount tcomp".
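The role of W can be checked numerically with the Python sketch below. It rests on one reading of the text, stated here as an assumption: without pitch conversion, the span from one reset address to the next takes a total time of L / tcomp, of which the rigid section takes L0. With pitch conversion the rigid section takes only L0 / P, so the remaining section must take L / tcomp − L0 / P; scaling its step amounts by W achieves this. The function name and the numeric values are hypothetical.

```python
import math


def pitch_correction_degree(L, L0, tcomp, P):
    """W = (L / tcomp - L0) / (L / tcomp - L0 / P)."""
    return (L / tcomp - L0) / (L / tcomp - L0 / P)


# Hypothetical values: span of 1000 addresses, rigid section of 100 addresses,
# half-speed playback (tcomp = 0.5), pitch ratio P = 2.
L, L0, tcomp, P = 1000.0, 100.0, 0.5, 2.0
W = pitch_correction_degree(L, L0, tcomp, P)

rigid_time = L0 / P                   # rigid section stepped at tcomp" = P
remaining_time_before = L / tcomp - L0  # remaining section stepped at tcomp'
remaining_time_after = remaining_time_before / W  # stepped at W * tcomp'
total_time = rigid_time + remaining_time_after
preserved = math.isclose(total_time, L / tcomp)
```

Multiplying every step amount in the remaining section by W divides the time spent there by W, which is why the total time from reset address to reset address stays at the desired L / tcomp. With P = 1 the correction degree reduces to W = 1.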
[0051]
Various modifications are possible in the practice of the present invention. For example, in the above-described embodiment, resetting the phase in the cosine oscillator generally makes the phase discontinuous, which may cause noise. In general, the phase reset is performed shortly before the attack, where the amplitude is usually very small, so a phase discontinuity there produces either no noise or noise too small to be a problem. Nevertheless, the apparatus can be improved to further reduce the possibility of even this small noise.
[0052]
That is, in the cosine oscillator, the phase is not reset abruptly at the arrival of the phase reset signal. Instead, starting from the arrival of the phase reset signal, the phase obtained by summing the frequency values and the absolute phase (the phase according to the waveform data in the waveform memory 10) are added while their ratio is gradually changed, a so-called cross-fade, so that the phase is brought continuously close to the absolute phase. A time that is short compared with the length of the pre-echo from the reset address is sufficient for the cross-fade, and the reset address can also be set somewhat earlier in advance to allow for it.
[0053]
FIG. 8 shows a configuration example of the cosine oscillator for performing this cross-fade. In this configuration example, a cross-fade calculator 43 is newly provided. It receives the phase reset signal, the absolute phase, and the output (instantaneous phase value) of the summer 40, and is configured to send a cross-fade end notification to the summer 40. The cross-fade calculator 43 normally outputs the data from the summer 40 as it is and starts the cross-fade when the phase reset signal arrives. During the cross-fade, the output signal of the cross-fade calculator 43 is set so that, starting from the instantaneous phase value of the summer 40, the proportion of the absolute phase value gradually increases with time. At the end of the cross-fade, a cross-fade end notification is sent to the summer 40, the phase held in the summer 40 is rewritten to the absolute phase value, and the output is switched back so that the output signal from the summer 40 is output as the instantaneous phase value.
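The cross-fade itself can be sketched as a weighted sum whose weight moves from the summer's phase to the absolute phase. The Python below is only an illustration under assumptions that the text does not specify: the ramp is linear, the fade length is the length of the input lists, both phase sequences are unwrapped (no 2π jumps), and the function name is hypothetical.

```python
def crossfade_phase(inst_phases, abs_phases):
    """Blend the summer's instantaneous phase into the absolute phase.

    inst_phases: output of the summer 40 during the fade, one value per sample
    abs_phases:  absolute phase from the adder 42 for the same samples
    A linear ramp is assumed; the last sample equals the absolute phase.
    """
    n = len(inst_phases)
    blended = []
    for i, (p_inst, p_abs) in enumerate(zip(inst_phases, abs_phases)):
        w = (i + 1) / n  # proportion of the absolute phase, increasing with time
        blended.append((1.0 - w) * p_inst + w * p_abs)
    return blended


faded = crossfade_phase([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

Because the final blended value coincides with the absolute phase, rewriting the summer's held phase to the absolute phase at the end of the fade introduces no further jump in this sketch.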
[0054]
In the above-described embodiment, the pre-echo is removed by generating the phase reset signal at the start position of the pre-echo period. However, the phase reset signal may also be generated at other times, for example when the audio signal waveform is to be reproduced faithfully to the original sound (that is, when time compression / expansion is not performed), at the start of waveform reproduction, or at any point during reproduction. In this way, at the time of the phase reset, the phase of the component signal of each band of the restored audio signal waveform matches that of the original sound, so the reproduced waveform becomes more faithful to the original sound.
[0055]
Further, in the above-described embodiment, the pre-echo is removed by making the pre-echo section a section of flexibility “0”. However, the attack section, for example, can also be set as a section of flexibility “0”. When the flexibility of the attack section is set to “0” in this way, the attack part is reproduced as the original sound waveform (that is, a waveform that is not time-compressed / expanded) even when the audio signal waveform as a whole is time-compressed / expanded. In general, the attack part best represents the characteristics of a musical sound, so if the attack part is compressed or expanded in time, the result sounds like a tone different from the original. By setting the flexibility of the attack part to “0” as described above, the characteristics of the musical sound expressed by the attack part can be well preserved even when the entire waveform is time-compressed / expanded.
[0056]
[Effects of the Invention]
As described above, according to the present invention, the phase reset signal is generated where the pre-echo occurs and the phase at that time is set to the same phase as the original sound, so that the generation of noise due to the pre-echo can be prevented. This keeps the attack sound crisp and prevents the sense of attack from being impaired.
[Brief description of the drawings]
FIG. 1 is a diagram showing an audio signal waveform processing apparatus as one embodiment according to the present invention.
FIG. 2 is a diagram for explaining a concept of an analysis unit that extracts a phase value and an amplitude value from an audio signal waveform in the embodiment apparatus.
FIG. 3 is a diagram illustrating a configuration example of a band component reproduction unit in the embodiment apparatus.
FIG. 4 is a flowchart showing a processing procedure of transmitting a phase reset signal by a phase reset signal generating unit in a band component reproducing unit of the embodiment apparatus.
FIG. 5 is a diagram illustrating a configuration example of a cosine oscillator in a band component reproduction unit of the embodiment apparatus.
FIG. 6 is a diagram for explaining the concept of flexibility in the embodiment apparatus.
FIG. 7 is a diagram for explaining a waveform shape when accompanied by pitch conversion in the embodiment apparatus.
FIG. 8 is a diagram illustrating another configuration example of the cosine oscillator in the band component reproduction unit in the embodiment apparatus.
FIG. 9 is a diagram for explaining a relationship between an attack waveform and a pre-echo in a band-divided audio signal waveform.
FIG. 10 is a diagram for explaining an overall configuration concept of a conventional audio signal waveform processing apparatus using a phase vocoder method.
FIG. 11 is a diagram for explaining a relationship between components of an audio signal waveform of an original sound and each band obtained by dividing a band.
FIG. 12 is a diagram illustrating waveform data regarding an audio signal waveform of an original sound.
FIG. 13 is a diagram for explaining a concept of analyzing an audio signal waveform and extracting an amplitude value and a phase value in an embodiment apparatus.
FIG. 14 is a diagram illustrating a configuration example of a time-frequency conversion processing unit in a conventional audio signal waveform processing apparatus.
FIG. 15 is a diagram illustrating an example of signal waveform processing in a time-frequency conversion processing unit of a conventional audio signal waveform processing apparatus.
[Explanation of symbols]
1 Waveform synthesis unit
2 CPU (Central Processing Unit)
3 ROM (Read Only Memory)
4 RAM (Random Access Memory)
5 Operator group
6 Keyboard device
61 Lever for speed adjustment
10 Waveform memory
11 Band component playback section
30 Differentiation part
31 Time frequency conversion processor
32 Cosine oscillator
33 Multiplier
34 Phase reset signal generator
40 Summer
41 Cosine computation section
42 Adder
43 Crossfade calculator

Claims

1. An audio signal waveform processing apparatus that holds, as waveform data, the amplitude and phase of each band component obtained by dividing an audio signal into a plurality of bands, and that restores an audio signal waveform by reproducing and synthesizing the band components of the respective bands while time-compressing / expanding them as necessary based on the waveform data,
wherein, for a pre-echo section generated in the audio signal waveform by the band division, a phase reset is performed at the beginning of the pre-echo section so that the phase of each reproduced band component becomes the phase of the corresponding original band component indicated by the waveform data.

2. The audio signal waveform processing apparatus according to claim 1, wherein time compression / decompression of the waveform to be restored is not performed in the pre-echo section.

3. The audio signal waveform processing apparatus according to claim 1, wherein the phase reset replaces, by crossfading, the phase of the waveform reproduced up to that point with the phase value based on the waveform data.

4. An audio signal waveform processing apparatus wherein, when the waveform reproduction involves pitch conversion, the degree of phase change due to the pitch conversion and the stepping degree of the read address for reading the waveform data are made to match.