JP2015079131A

JP2015079131A - Acoustic signal processing device and acoustic signal processing program

Info

Publication number: JP2015079131A
Application number: JP2013216255A
Authority: JP
Inventors: 太白木原; Futoshi Shirokibara
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-10-17
Filing date: 2013-10-17
Publication date: 2015-04-23

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic signal processing device and an acoustic signal processing program that reduce the amount of arithmetic processing. SOLUTION: A delay amount adjustment part 7 adjusts the delay time differences of a plurality of reflected sounds V1, V2, ..., Vk to integer multiples of the time corresponding to the FFT shift size. An acoustic block selection part 9 selects the acoustic block corresponding to a direct sound V0. On the basis of the adjusted delay time differences, it also selects the acoustic blocks corresponding to the reflected sounds V1, V2, ..., Vk. A convolution operation part 10 uses the selected acoustic blocks and selected divided HRTF blocks to convolve the direct sound V0 and the reflected sounds V1, V2, ..., Vk in the frequency domain, then adds the convolution results as complex vectors. A time domain conversion part 11 successively converts the output of the convolution operation part 10 into a time-domain acoustic signal by IFFT.

Description

The present invention relates to an acoustic signal processing device and an acoustic signal processing program that output an acoustic signal for reproducing the sound in an acoustic space.

Various techniques have been developed for reproducing, in a listening room, the acoustic effects of an acoustic space such as a concert hall or a theater (see Patent Documents 1 to 3).

Sound radiated from a sound source in an acoustic space reaches the listener directly. It also reaches the listener after being reflected once or several times by the walls, ceiling, and other surfaces of the space. Each reflected sound arrives later than the direct sound, by a delay time that depends on the length of its sound-ray path. To reproduce the acoustic effect of the space in a listening room, reflected sounds with the same delay times as those in the acoustic space are reproduced. In the reflected sound extraction apparatus described in Patent Document 1, the sound field is reproduced by convolving a music signal with a plurality of reflected sounds stored in advance. In the reverberation imparting device described in Patent Document 2, the direction from the sounding point to the sound receiving point is specified as the orientation of the sounding point. An impulse response reflecting that orientation is then convolved with the acoustic signal to which the sound effect is to be imparted. In the reverberation imparting device described in Patent Document 3, an impulse response is specified from a sound-ray composite vector obtained according to the directivity characteristics of the sounding point and the sound receiving point. This impulse response is then convolved with the acoustic signal.

Likewise, to reproduce reflected sounds in a virtual acoustic space, one can convolve the acoustic signals of a plurality of reflected sounds having different delay times with head-related transfer functions.

Patent Document 1: JP-A-5-46193
Patent Document 2: Japanese Patent No. 4062959
Patent Document 3: Japanese Patent No. 4464064

As described above, reproducing a plurality of reflected sounds in an actual or virtual acoustic space requires convolving a plurality of acoustic signals with impulse responses or head-related transfer functions, each signal having a different delay time.

However, many reflected sounds mean many acoustic signals, so the amount of convolution processing becomes large. The system then needs an arithmetic processing device capable of high-speed operation, so that processing keeps up with the real-time input of the acoustic signal. This increases cost and makes the system difficult to miniaturize. Conversely, when a relatively inexpensive arithmetic processing device is used, the accuracy of sound reproduction must be lowered so that processing does not fall behind the real-time input.

An object of the present invention is to provide an acoustic signal processing device and an acoustic signal processing program capable of reducing the amount of arithmetic processing.

(1) An acoustic signal processing device according to the present invention outputs an acoustic signal representing a mixture of two kinds of sound. The first sound is radiated by a first sound source and arrives at a sound receiving point. At least one second sound is radiated by at least one second sound source and arrives at the sound receiving point later than the first sound. The device comprises the following units:
a calculation unit that calculates the delay time difference between the first sound and the second sound;
a first conversion unit that obtains a frequency-domain acoustic signal from an original acoustic signal representing the first sound radiated by the first sound source, by sequentially applying a time-frequency transform while shifting the signal along the time axis by a constant shift amount;
an adjustment unit that adjusts the delay time difference calculated by the calculation unit to an integer multiple of the time corresponding to the shift amount of the time-frequency transform;
a selection unit that selects, from the frequency-domain acoustic signal obtained by the first conversion unit, a first signal portion corresponding to the first sound, and also selects, on the basis of the delay time difference adjusted by the adjustment unit, a second signal portion corresponding to the second sound;
an operation unit that performs two convolution operations in the frequency domain and adds their results, where the first convolution is between a first acoustic transfer function (from the first sound source to the sound receiving point) and the first signal portion, and the second convolution is between a second acoustic transfer function (from the second sound source to the sound receiving point) and the second signal portion; and
a second conversion unit that converts the result of the addition by the operation unit into a time-domain acoustic signal.

This acoustic signal processing device first calculates the delay time difference between the first sound, corresponding to the first sound source, and the at least one second sound, corresponding to the at least one second sound source. It then adjusts the calculated delay time difference to an integer multiple of the time corresponding to the shift amount of the time-frequency transform.
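As a minimal sketch of the adjustment step, the Python function below rounds a delay time difference to the nearest integer multiple of the shift time. Several details are assumptions for illustration: rounding to the nearest multiple rather than truncating, the function name `quantize_delay`, and the default fs = 48 kHz. The text only requires that the adjusted value be an integer multiple of the shift time.

```python
def quantize_delay(delay_diff_s: float, shift: int, fs: int = 48_000) -> tuple[int, float]:
    """Adjust a delay time difference (seconds) to an integer multiple of the
    time-frequency transform shift (samples).

    Returns (number of shifts, adjusted delay in seconds). Rounding to the
    nearest multiple is an assumed rule; Python's round() sends exact ties
    to the even multiple.
    """
    delay_samples = delay_diff_s * fs          # delay time difference in samples
    n_shifts = round(delay_samples / shift)    # nearest integer multiple of the shift
    return n_shifts, n_shifts * shift / fs     # adjusted delay in seconds
```

For example, with a 128-sample shift at 48 kHz, a 10 ms difference (480 samples, or 3.75 shifts) becomes 4 shifts. That is 512 samples, or about 10.67 ms.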

The original acoustic signal representing the first sound is time-frequency transformed block by block, shifting along the time axis by a constant amount each time. This yields a frequency-domain acoustic signal. From this signal, the first signal portion corresponding to the first sound is selected. The second signal portion corresponding to the second sound is selected on the basis of the adjusted delay time difference. Two convolutions are then performed in the frequency domain: the first acoustic transfer function with the first signal portion, and the second acoustic transfer function with the second signal portion. Their results are added, and the sum is converted into a time-domain acoustic signal.

Here, the delay time difference between the first and second signal portions is an integer multiple of the time corresponding to the shift amount of the time-frequency transform. A first signal portion already obtained by an earlier time-frequency transform can therefore serve as the second signal portion, so no additional transform is needed to obtain it. In addition, the results of the first and second convolution operations are added in the frequency domain. A single frequency-to-time conversion per signal portion (first or second) of the original acoustic signal therefore yields the time-domain acoustic signal. Both effects reduce the number of operations, and with it the arithmetic processing needed to output an acoustic signal representing the sound arriving at the sound receiving point.
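The saving can be illustrated with a minimal NumPy sketch. It assumes the overlap-save method, an FFT of twice the shift size, and impulse responses no longer than the shift plus one sample. Each input block is transformed once, and its spectrum is kept in a short history list that stands in for a frequency-domain buffer. Delayed sources reuse older spectra by index, the products are summed in the frequency domain, and a single inverse FFT produces each output block. The function `render`, its parameters, and the use of arbitrary FIR filters in place of measured HRTFs are all hypothetical.

```python
import numpy as np

def render(x, irs, delays_in_shifts, gains, shift):
    """Mix a direct sound and delayed reflections of x using overlap-save.

    irs: one impulse response per source, each at most shift + 1 samples long.
    delays_in_shifts: delay of each source as an integer number of shifts.
    gains: scalar amplitude per source.
    """
    assert all(len(h) <= shift + 1 for h in irs), "impulse response too long"
    n_fft = 2 * shift                                          # previous block + current block
    H = [g * np.fft.rfft(h, n_fft) for h, g in zip(irs, gains)]  # zero-padded filter spectra
    depth = max(delays_in_shifts) + 1
    history = [np.zeros(n_fft // 2 + 1, complex)] * depth     # newest spectrum first
    prev = np.zeros(shift)
    n_blocks = len(x) // shift
    out = np.zeros(n_blocks * shift)
    for m in range(n_blocks):
        cur = x[m * shift:(m + 1) * shift]
        X = np.fft.rfft(np.concatenate([prev, cur]))          # the only forward FFT for this block
        history = [X] + history[:-1]                           # history[d] = spectrum from d shifts ago
        # Each source reuses the stored spectrum matching its delay; products are summed as complex vectors.
        Y = sum(Hj * history[d] for Hj, d in zip(H, delays_in_shifts))
        out[m * shift:(m + 1) * shift] = np.fft.irfft(Y, n_fft)[shift:]  # one inverse FFT; drop aliased half
        prev = cur
    return out
```

The result matches direct time-domain convolution with delayed, scaled impulse responses. Per output block, the cost is one forward FFT, one inverse FFT, and one complex multiply-add per source. A per-source approach would instead need a transform pair for every source.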

(2) The first conversion unit may sequentially acquire unit blocks of a first number of samples from the original acoustic signal. For each unit block, it applies a fast Fourier transform to an acoustic signal of a second, larger number of samples that contains the unit block. The first conversion unit, the operation unit, and the second conversion unit may carry out the fast Fourier transform, the two convolution operations, and the conversion into the time-domain acoustic signal by the overlap-save method or the overlap-add method. In that case, the shift amount of the fast Fourier transform equals the number of samples in a unit block.
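Claim (2) also allows the overlap-add method. The hedged sketch below assumes an FFT of twice the shift and impulse responses of at most shift + 1 samples. Each unit block is zero-padded to the FFT size, not joined with the previous block. Delayed sources again reuse stored spectra by index, and the full inverse-FFT output of each block is added into the output, so its tail overlaps the next block. The function name `render_ola` is hypothetical.

```python
import numpy as np

def render_ola(x, irs, delays_in_shifts, shift):
    """Overlap-add variant: zero-pad each unit block to 2 * shift, reuse stored
    spectra for delayed sources, one inverse FFT per block, add the tails."""
    assert all(len(h) <= shift + 1 for h in irs), "impulse response too long"
    n_fft = 2 * shift
    H = [np.fft.rfft(h, n_fft) for h in irs]                  # zero-padded filter spectra
    history = [np.zeros(n_fft // 2 + 1, complex)] * (max(delays_in_shifts) + 1)
    n_blocks = len(x) // shift
    out = np.zeros((n_blocks + 1) * shift)                     # one extra block for the final tail
    for m in range(n_blocks):
        X = np.fft.rfft(x[m * shift:(m + 1) * shift], n_fft)   # unit block, zero-padded to n_fft
        history = [X] + history[:-1]                           # history[d] = spectrum from d shifts ago
        Y = sum(Hj * history[d] for Hj, d in zip(H, delays_in_shifts))
        out[m * shift:m * shift + n_fft] += np.fft.irfft(Y, n_fft)  # tail overlaps the next block
    return out[:n_blocks * shift]
```

Because a B-sample block convolved with a filter of at most B + 1 taps is at most 2B samples long, no circular wrap-around occurs. Overlap-add therefore needs no discarded half.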

In this case, a smaller unit block reduces both the error introduced by adjusting the delay time difference and the delay time of the convolution operation. The sound arriving at the sound receiving point can therefore be reproduced with high accuracy.
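The trade-off can be made concrete with a little arithmetic. Assume the delay difference is rounded to the nearest multiple of the shift; the adjustment error is then at most half the shift time. Also, a whole unit block must be collected before it can be transformed, so block processing adds at least one shift time of input-buffering latency, not counting computation time. The helper name `shift_tradeoff` and the 48 kHz default are illustrative.

```python
def shift_tradeoff(shift: int, fs: int = 48_000) -> dict[str, float]:
    """Shift time, worst-case rounding error, and minimum buffering latency, in ms."""
    shift_ms = 1000.0 * shift / fs
    return {
        "shift_ms": shift_ms,
        "max_rounding_error_ms": shift_ms / 2,   # assumes nearest-multiple rounding
        "min_buffering_latency_ms": shift_ms,    # one unit block must be collected first
    }
```

At 48 kHz, a 128-sample shift gives about 2.67 ms per shift and at most about 1.33 ms of rounding error. A 1024-sample shift gives about 21.3 ms and 10.7 ms.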

(3) The first acoustic transfer function may consist of a plurality of first divided transfer functions. These are obtained by dividing the time-domain first acoustic response characteristic, from the first sound source to the sound receiving point, into a plurality of first divided response characteristics and applying a fast Fourier transform to each. The second acoustic transfer function may likewise consist of a plurality of second divided transfer functions, obtained in the same way from the time-domain second acoustic response characteristic from the second sound source to the sound receiving point. The selection unit then selects as many first signal portions as there are first divided transfer functions, and as many second signal portions as there are second divided transfer functions. The operation unit performs both convolutions in the frequency domain: the first between the first divided transfer functions and the selected first signal portions, and the second between the second divided transfer functions and the selected second signal portions.
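Claim (3) corresponds to what is usually called uniformly partitioned convolution. The minimal NumPy sketch below assumes overlap-save with an FFT of twice the shift. Each impulse response is cut into shift-length segments, and each segment is transformed once in advance; these stand in for the divided transfer functions. At run time, segment p of a source delayed by d shifts is multiplied with the stored input spectrum from d + p shifts ago. Each source therefore selects as many signal portions as it has segments. All names are hypothetical.

```python
import numpy as np

def partition_ir(h, shift):
    """Cut an impulse response into shift-length segments and FFT each one,
    zero-padded to 2 * shift (the divided transfer functions)."""
    n_parts = -(-len(h) // shift)                              # ceil(len(h) / shift)
    h = np.pad(np.asarray(h, float), (0, n_parts * shift - len(h)))
    return [np.fft.rfft(h[p * shift:(p + 1) * shift], 2 * shift) for p in range(n_parts)]

def render_partitioned(x, irs, delays_in_shifts, shift):
    """Overlap-save rendering of delayed sources with partitioned impulse responses."""
    n_fft = 2 * shift
    parts = [partition_ir(h, shift) for h in irs]
    depth = max(d + len(hp) for d, hp in zip(delays_in_shifts, parts))
    history = [np.zeros(n_fft // 2 + 1, complex)] * depth      # newest spectrum first
    prev = np.zeros(shift)
    n_blocks = len(x) // shift
    out = np.zeros(n_blocks * shift)
    for m in range(n_blocks):
        cur = x[m * shift:(m + 1) * shift]
        history = [np.fft.rfft(np.concatenate([prev, cur]))] + history[:-1]  # one FFT per block
        Y = np.zeros(n_fft // 2 + 1, complex)
        for hp, d in zip(parts, delays_in_shifts):
            for p, Hp in enumerate(hp):
                Y += Hp * history[d + p]          # segment p pairs with the spectrum d + p shifts old
        out[m * shift:(m + 1) * shift] = np.fft.irfft(Y, n_fft)[shift:]  # one inverse FFT per block
        prev = cur
    return out
```

Impulse responses longer than one shift are handled without enlarging the FFT. The per-block transform size stays at 2 × shift, however long the response.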

In this case, each first and second signal portion becomes smaller, which reduces the number of operations in the fast Fourier transform of the time-domain acoustic signal. This further reduces the arithmetic processing needed to output an acoustic signal representing the sound arriving at the sound receiving point.

Moreover, because the unit block can be made smaller, both the error introduced by adjusting the delay time difference and the delay time of the convolution operation can be reduced. The sound arriving at the sound receiving point can therefore be reproduced with higher accuracy.

(4) The first sound may be a direct sound that arrives at the sound receiving point from the first sound source without being reflected. The second sound may be a reflected sound that arrives from the first sound source after one or more reflections. The second sound source may then be a virtual sound source that virtually radiates the reflected sound.

In this case, the sound arriving at the sound receiving point in an actual or virtual acoustic space can be reproduced.

(5) An acoustic signal processing program according to the present invention is executable by a computer. It outputs an acoustic signal representing a mixture of two kinds of sound. The first sound is radiated by a first sound source and arrives at a sound receiving point. At least one second sound is radiated by at least one second sound source and arrives at the sound receiving point later than the first sound. The program causes the computer to execute the following processes:
calculating the delay time difference between the first sound and the second sound;
obtaining a frequency-domain acoustic signal from an original acoustic signal representing the first sound radiated by the first sound source, by applying a time-frequency transform while shifting the signal along the time axis by a constant shift amount;
adjusting the calculated delay time difference to an integer multiple of the time corresponding to the shift amount of the time-frequency transform;
selecting, from the frequency-domain acoustic signal, a first signal portion corresponding to the first sound, and also selecting, on the basis of the adjusted delay time difference, a second signal portion corresponding to the second sound;
performing two convolution operations in the frequency domain and adding their results, where the first convolution is between a first acoustic transfer function (from the first sound source to the sound receiving point) and the selected first signal portion, and the second convolution is between a second acoustic transfer function (from the second sound source to the sound receiving point) and the selected second signal portion; and
converting the result of the addition into a time-domain acoustic signal.

With this acoustic signal processing program, the delay time difference between the first and second signal portions is an integer multiple of the time corresponding to the shift amount of the time-frequency transform. A first signal portion already obtained by an earlier time-frequency transform can therefore serve as the second signal portion, so no additional transform is needed to obtain it. In addition, the results of the first and second convolution operations are added in the frequency domain. A single frequency-to-time conversion per signal portion (first or second) of the original acoustic signal therefore suffices to output the time-domain acoustic signal. Both effects reduce the number of operations, and with it the arithmetic processing needed to output an acoustic signal representing the sound arriving at the sound receiving point.

According to the present invention, the arithmetic processing needed to output an acoustic signal representing the sound arriving at a sound receiving point can be reduced.

FIG. 1 is a functional block diagram showing the configuration of an acoustic signal processing device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing a virtual acoustic space.
FIG. 3 is a block diagram showing an example of the hardware configuration of the acoustic signal processing device of FIG. 1.
FIG. 4 is an explanatory diagram of the time-domain head-related impulse response and the frequency-domain head-related transfer function.
FIG. 5 is a schematic diagram showing a plurality of sets of divided HRTF blocks stored in the HRTF database.
FIG. 6 is an explanatory diagram of the time-domain original acoustic signal and the frequency-domain acoustic blocks.
FIG. 7 is a diagram showing, for the direct sound and the reflected sounds, the head-related impulse responses, the head-related transfer functions, the delay amounts before and after adjustment, and the numbers of delay blocks.
FIG. 8 is a diagram showing the frequency-domain convolution of divided HRTF blocks and acoustic blocks.
FIG. 9 is a diagram showing how acoustic signals are joined in the time domain.
FIG. 10 is a flowchart showing the acoustic signal processing performed by the acoustic signal processing device of FIG. 1.
FIG. 11 is a flowchart showing details of the convolution processing.
FIG. 12 is a diagram showing the frequency-domain convolution of divided HRTF blocks and acoustic blocks in the convolution processing according to a reference embodiment.
FIG. 13 is a flowchart showing details of the convolution processing according to the reference embodiment.
FIG. 14 is an explanatory diagram of the time-domain original acoustic signal and the frequency-domain acoustic blocks when the divided overlap-add method is used.
FIG. 15 is a diagram showing how acoustic signals are joined in the time domain when the divided overlap-add method is used.

An acoustic signal processing device and an acoustic signal processing program according to an embodiment of the present invention are described in detail below with reference to the drawings.

(1) Functional configuration of the acoustic signal processing device

FIG. 1 is a functional block diagram showing the configuration of an acoustic signal processing device according to an embodiment of the present invention. FIG. 2 is a schematic diagram showing a virtual acoustic space. FIG. 3 is a block diagram showing an example of the hardware configuration of the acoustic signal processing device of FIG. 1.

The acoustic signal processing device 100 of FIG. 1 outputs an acoustic signal representing the sound arriving at a sound receiving point in a virtual acoustic space (hereinafter, virtual space). An example of the virtual space is described below with reference to FIG. 2.

In FIG. 2, a main sound source S0 and a sound receiving point R are placed in a virtual space 300. All three are created virtually by a computer program. The main sound source S0 radiates sound in three dimensions: front and back, left and right, and up and down. The sound from S0 reaches R directly as a direct sound V0. It also reaches R as a plurality of reflected sounds V1, V2, V3, V4, ..., Vk after being reflected once or several times by the walls, ceiling, and other surfaces of the virtual space 300. Here, k is a natural number representing the number of reflected sounds. FIG. 2 draws the directions of the reflected sounds V1, V2, V3, V4, ..., Vk in two dimensions, but they may also be expressed in three dimensions.

The reflected sounds V1, V2, V3, V4, ..., Vk can equivalently be regarded as radiated from virtual sound sources S1, S2, S3, S4, ..., Sk, respectively. Each virtual sound source lies on a straight line from the sound receiving point R, pointing toward the direction from which the corresponding reflected sound arrives (opposite to its propagation direction). The distance from R to each virtual sound source S1, S2, S3, S4, ..., Sk equals the length of the path along which the corresponding reflected sound V1, V2, V3, V4, ..., Vk travels from the main sound source S0 to R.
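This passage does not say how the virtual source positions are computed. One standard method that satisfies this geometric condition for specular reflections is the image-source method. For a first-order reflection, the virtual source is the mirror image of S0 across the reflecting plane, so its distance to R equals the length of the reflected path. The sketch below mirrors a point across a plane; the function name `mirror_source` is hypothetical. Higher-order images would follow by mirroring images again.

```python
import numpy as np

def mirror_source(src, wall_point, wall_normal):
    """First-order image source: reflect `src` across the plane that passes
    through `wall_point` with normal `wall_normal`."""
    n = np.asarray(wall_normal, float)
    n = n / np.linalg.norm(n)                      # unit normal of the reflecting plane
    s = np.asarray(src, float)
    p0 = np.asarray(wall_point, float)
    return s - 2.0 * np.dot(s - p0, n) * n         # subtract twice the signed distance along n
```

For a wall in the plane x = 0, a source at (2, 1, 1.5) maps to (-2, 1, 1.5). The straight-line distance from that image to R equals the source-wall-receiver path length.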

Hereinafter, the time the sound radiated from the main sound source S0 takes to reach the sound receiving point R as the direct sound V0 is called its delay amount. Likewise, the time the sound takes to reach R as each reflected sound V1, V2, ..., Vk is called the delay amount of that reflected sound. The delay amounts of the reflected sounds V1, V2, ..., Vk are larger than that of the direct sound V0. The difference between the delay amount of a reflected sound and that of the direct sound V0 is called the delay time difference.

A frequency-domain head-related transfer function (HRTF) is obtained in advance for each direction from which sound can arrive at the sound receiving point R. This gives a plurality of HRTFs for a plurality of directions. When arrival directions at R are expressed in three dimensions, HRTFs are prepared for a plurality of three-dimensional directions. The HRTF for the arrival direction of the direct sound V0 represents the sound transfer characteristic from the main sound source S0 to R. The HRTFs for the arrival directions of the reflected sounds V1, V2, ..., Vk represent the transfer characteristics to R from the virtual sound sources S1, S2, ..., Sk, respectively. As described later, these HRTFs are used to calculate the acoustic signal representing the sound arriving at R.
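This passage does not specify how an HRTF is chosen for an arbitrary arrival direction. A simple assumption is nearest-neighbor lookup over the measured directions, choosing the largest cosine similarity between unit vectors. The database layout (a list of direction vectors with a parallel list of HRTFs) and the function name `nearest_hrtf` are hypothetical. Interpolating between neighboring directions is a common alternative.

```python
import numpy as np

def nearest_hrtf(direction, db_directions, db_hrtfs):
    """Return the stored HRTF whose measurement direction has the largest
    cosine similarity with the arrival direction (vector from R toward the source)."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)                                   # unit arrival direction
    dirs = np.asarray(db_directions, float)
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)   # unit measurement directions
    return db_hrtfs[int(np.argmax(dirs @ d))]                   # index of the closest direction
```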

In FIG. 1, the acoustic signal processing device 100 includes a room shape instruction unit 1, a main sound source position instruction unit 2, a head-related transfer function database (hereinafter, HRTF database) 3, and a head-related transfer function block selection unit (hereinafter, HRTF block selection unit) 4. It also includes a virtual sound source position calculation unit 5, a delay amount calculation unit 6, a delay amount adjustment unit 7, a delay block number calculation unit 8, and an acoustic block selection unit 9. It further includes a convolution operation unit 10, a time domain conversion unit 11, an acoustic signal output unit 12, an acoustic signal input unit 13, a frequency domain conversion unit 14, and a frequency domain acoustic buffer 15. The entire device operates at a single sampling frequency, denoted fs, which is for example 48 kHz.

The room shape instruction unit 1 outputs room data specifying the shape of the virtual space (hereinafter, room shape). For example, it outputs room data for a room shape the user has drawn on the screen with an input device such as a mouse. It may instead output room data for a room shape the user has selected from a set of prepared shapes. Alternatively, the room shape instruction unit 1 may output room data dynamically under program control. In a video game, for example, the program may select appropriate room data according to the position of a character. In that case, part of the video game program serves as the room shape instruction unit 1.

The main sound source position instruction unit 2 outputs position data indicating the position of the main sound source S0 in the virtual space. For example, it outputs the position of S0 within a virtual space whose room shape the user has drawn on the screen. Alternatively, the main sound source position instruction unit 2 may output position data dynamically under program control. For example, a video game program may output position data indicating the position of a character, in which case part of that program serves as the main sound source position instruction unit 2. The position data of S0 consists, for example, of vector data representing the direction from the sound receiving point R to S0 and the distance from R to S0.

Based on the room data output from the room shape instruction unit 1 and the position data output from the main sound source position instruction unit 2, the virtual sound source position calculation unit 5 calculates the positions of a plurality of virtual sound sources S1, S2, ..., Sk that virtually emit the reflected sounds V1, V2, ..., Vk arriving at the sound receiving point R. The virtual sound source position calculation unit 5 outputs position data indicating the positions of the virtual sound sources S1, S2, ..., Sk. The position data of each virtual sound source consists of, for example, vector data representing the direction from the sound receiving point R to that virtual sound source and the distance between them. The virtual sound source position calculation unit 5 also calculates the amplitude attenuation of each reflected sound V1, V2, ..., Vk relative to the direct sound V0. The attenuation is calculated for each reflected sound based on the length (distance) of its sound path, the number of reflections, the sound absorption coefficient of each reflecting surface, and the like. The attenuation may also be calculated differently for different frequency bands of the sound.
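As an illustration only, the virtual sound source calculation can be sketched with the image-source method for first-order reflections. The rectangular (shoebox) room, the single absorption coefficient shared by all walls, the amplitude reflection coefficient sqrt(1 - alpha), and the 1/r distance law are assumptions not stated in the text, and `first_order_image_sources` is a hypothetical helper.

```python
import math


def first_order_image_sources(room, src, rcv, alpha):
    """Hypothetical helper: first-order virtual sources for a shoebox room.

    room  -- (Lx, Ly, Lz) in metres; walls assumed at 0 and L on each axis
    src   -- position of the main sound source S0
    rcv   -- position of the sound receiving point R
    alpha -- absorption coefficient, assumed identical for all six walls
    Returns one (unit direction R->Si, distance, gain relative to V0) per wall.
    """
    r0 = math.dist(src, rcv)
    refl = math.sqrt(1.0 - alpha)       # assumed amplitude reflection coefficient
    sources = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]        # mirror S0 in the wall plane
            vec = [i - r for i, r in zip(img, rcv)]
            dist = math.hypot(*vec)
            direction = tuple(v / dist for v in vec)
            gain = refl * r0 / dist                   # one bounce plus 1/r spreading
            sources.append((direction, dist, gain))
    return sources
```

Higher-order reflections would be obtained by mirroring the image sources again; the number of reflections then enters the gain as a power of the reflection coefficient.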

The delay amount calculation unit 6 calculates the delay of the direct sound V0 based on the position data output from the main sound source position instruction unit 2, and calculates the delays of the reflected sounds V1, V2, ..., Vk based on the position data output from the virtual sound source position calculation unit 5. Here, the difference between the delay of each reflected sound V1, V2, ..., Vk and the delay of the direct sound V0 is referred to as the delay time difference.
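A minimal sketch of the delay calculation, assuming each delay is simply the path length divided by the speed of sound, taken here as 343 m/s; `path_delays` is a hypothetical name.

```python
SPEED_OF_SOUND = 343.0  # m/s; an assumed value for air at roughly 20 degrees C


def path_delays(direct_distance, reflected_distances, c=SPEED_OF_SOUND):
    """Return (d0, [d1..dk], [d1-d0 .. dk-d0]) in seconds from path lengths in metres."""
    d0 = direct_distance / c                      # delay of the direct sound V0
    d = [r / c for r in reflected_distances]      # delays of V1..Vk
    return d0, d, [di - d0 for di in d]           # delay time differences
```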

The delay amount adjustment unit 7 adjusts the delays of the reflected sounds V1, V2, ..., Vk so that the delay time difference of each reflected sound becomes an integer multiple of a time determined by the sampling frequency fs and the FFT (Fast Fourier Transform) shift size. The FFT shift size is described later. Specifically, the delay of each reflected sound is adjusted so that its delay time difference is an integer multiple of the time obtained by dividing the FFT shift size by the sampling frequency fs. The integer is chosen so as to minimize the error between the adjusted and unadjusted delays of each reflected sound V1, V2, ..., Vk.
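The adjustment amounts to rounding each delay time difference to the nearest multiple of SS/fs. The sketch below assumes delays in seconds and uses nearest-integer rounding, which keeps the error within half of SS/fs; the resulting integers are the numbers of delay blocks M1, ..., Mk used by the delay block number calculation unit 8. `adjust_delays` is a hypothetical name.

```python
def adjust_delays(d0, delays, fs, shift_size):
    """Snap each delay time difference to an integer multiple of SS/fs.

    d0         -- delay of the direct sound V0 (seconds)
    delays     -- unadjusted delays d1..dk of the reflected sounds (seconds)
    fs         -- sampling frequency (Hz); shift_size -- FFT shift size SS (samples)
    Returns (adjusted delays, delay block counts M1..Mk).
    """
    hop = shift_size / fs                             # time corresponding to SS
    counts = [round((d - d0) / hop) for d in delays]  # nearest integer -> minimum error
    adjusted = [d0 + m * hop for m in counts]
    return adjusted, counts
```

For example, with fs = 48000 Hz and SS = 480 samples, SS/fs is 10 ms, so a reflection arriving 29.9 ms after the direct sound is assigned M = 3 and moved to exactly 30 ms after it.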

The delay block number calculation unit 8 calculates the number of delay blocks for each of the reflected sounds V1, V2, ..., Vk. The number of delay blocks is the number of unit blocks (frames) corresponding to the adjusted delay time difference. A unit block is the group of acoustic-signal samples processed at one time (that is, the processing unit of the acoustic signal). In the present embodiment, a unit block consists of N samples, where N is a natural number.

The acoustic signal input unit 13 receives a time-domain acoustic signal. For example, the acoustic signal input unit 13 converts an analog acoustic signal supplied to an acoustic input terminal from an external device or a microphone into a digital acoustic signal at the sampling frequency fs. Alternatively, the acoustic signal input unit 13 reads a digital acoustic signal stored on a storage medium such as an optical disk, a magnetic disk, or a memory card. Hereinafter, the time-domain acoustic signal received by the acoustic signal input unit 13 is referred to as the original acoustic signal. The sampling frequency of the original acoustic signal is fs.

The frequency domain conversion unit 14 successively converts the original acoustic signal received by the acoustic signal input unit 13 into signal portions of a frequency-domain acoustic signal by FFT (Fast Fourier Transform). Hereinafter, each signal portion of the frequency-domain acoustic signal is referred to as an acoustic block. The acoustic blocks produced by the frequency domain conversion unit 14 are successively stored in the frequency domain acoustic buffer 15.

Based on the numbers of delay blocks calculated by the delay block number calculation unit 8, the acoustic block selection unit 9 selects, from the acoustic blocks stored in the frequency domain acoustic buffer 15, the acoustic blocks corresponding to the direct sound V0 and to each of the reflected sounds V1, V2, ..., Vk.
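One way to picture the frequency domain acoustic buffer 15 and the selection step is a fixed-length queue of past acoustic blocks indexed by age. This is a sketch under assumptions: the class and function names are hypothetical, a block is a list of complex frequency bins, and blocks from before the signal started read as zeros. For M divided HRTF blocks and a largest delay block count Mk, the capacity must be at least M + Mk blocks.

```python
from collections import deque


class AcousticBlockBuffer:
    """Hypothetical model of the frequency domain acoustic buffer 15.

    Holds the `capacity` most recent acoustic blocks; index 0 is always X_n.
    """

    def __init__(self, capacity, bins):
        self.blocks = deque(maxlen=capacity)
        self.bins = bins

    def push(self, block):
        self.blocks.appendleft(block)   # when full, the oldest block drops off the right

    def get(self, age):
        """Return X_{n-age}; ages not yet filled (before the signal) read as zeros."""
        if age < len(self.blocks):
            return self.blocks[age]
        return [0j] * self.bins


def select_blocks(buffer, M, delay_blocks):
    """Blocks paired with H_{i,0..M-1}: X_{n-Mi}, X_{n-Mi-1}, ..., X_{n-Mi-M+1}."""
    return [buffer.get(delay_blocks + m) for m in range(M)]
```

With M = 4 and a largest delay block count of 9, as in the example of FIG. 8, a capacity of 13 holds exactly Xn through Xn-12.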

On the other hand, the HRTF database 3 stores in advance a plurality of sets of frequency-domain divided head-related transfer functions (hereinafter referred to as divided HRTF blocks). The divided HRTF blocks are described in detail later. The sets of divided HRTF blocks are prepared in advance for a plurality of directions from which sound arrives at the sound receiving point R. When the arrival direction at the sound receiving point R is expressed as a three-dimensional direction, each set of divided HRTF blocks corresponds to one three-dimensional direction.

Based on the position data output from the main sound source position instruction unit 2 and the virtual sound source position calculation unit 5, the HRTF block selection unit 4 selects, from the sets of divided HRTF blocks stored in the HRTF database 3, the divided HRTF blocks corresponding to the direct sound V0 and to each of the reflected sounds V1, V2, V3, ..., Vk.
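The text does not say how a set is chosen when an arrival direction does not coincide exactly with a stored direction. A plausible sketch, assuming nearest-neighbour selection by angle over a hypothetical list of (unit direction, divided HRTF blocks) pairs:

```python
import math


def nearest_hrtf_set(direction, database):
    """Return the divided-HRTF set measured closest to `direction`.

    direction -- vector from the receiving point R toward a real or virtual source
    database  -- hypothetical list of (unit_direction, divided_hrtf_blocks) pairs
    "Closest" means the largest dot product with the normalised direction, i.e.
    the smallest angle (an assumed rule; interpolation is another option).
    """
    norm = math.hypot(*direction)
    u = [c / norm for c in direction]
    best = max(database, key=lambda entry: sum(a * b for a, b in zip(u, entry[0])))
    return best[1]
```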

Using the acoustic blocks selected by the acoustic block selection unit 9 and the divided HRTF blocks selected by the HRTF block selection unit 4, the convolution operation unit 10 performs frequency-domain convolution for the direct sound V0 and the reflected sounds V1, V2, ..., Vk, and sums the convolution results by complex vector addition. In doing so, the convolution operation unit 10 adjusts the amplitude of each frequency component of the acoustic blocks based on the amplitude attenuation calculated by the virtual sound source position calculation unit 5.

The time domain conversion unit 11 successively converts the results of the convolution operation unit 10 into a time-domain acoustic signal by IFFT (Inverse Fast Fourier Transform).

The acoustic signal output unit 12 outputs the acoustic signal at the sampling frequency fs produced by the time domain conversion unit 11. For example, the acoustic signal output unit 12 converts the digital acoustic signal at the sampling frequency fs into an analog acoustic signal and outputs it through an acoustic output terminal to headphones or a speaker, which then reproduce the sound.

In the present embodiment, the frequency-domain convolution is performed by an overlap-save method using the divided HRTF blocks. Hereinafter, this overlap-save method using divided HRTF blocks is referred to as the divided overlap-save method.

(2) Hardware Configuration of the Acoustic Signal Processing Device
FIG. 3 is a block diagram showing an example of the hardware configuration of the acoustic signal processing device 100.

The acoustic signal processing device 100 of FIG. 3 includes a CPU (Central Processing Unit) 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage device 140, a display device 150, an input device 160, and an output device 170.

The ROM 120 consists of, for example, non-volatile memory and stores computer programs such as a system program and the acoustic signal processing program. The RAM 130 consists of, for example, volatile memory; it is used as a work area for the CPU 110 and temporarily stores various data. The CPU 110 performs the acoustic signal processing described later by executing, on the RAM 130, the acoustic signal processing program stored in the ROM 120. This realizes the functions of the components shown in FIG. 1.

The storage device 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card. The HRTF database 3 and the frequency domain acoustic buffer 15 of FIG. 1 are implemented in the storage device 140. The acoustic signal processing program may also be stored in the storage device 140. Further, when the acoustic signal processing device 100 of FIG. 1 is implemented as part of a video game program, for example, the video game program may be stored in the storage device 140.

The acoustic signal processing program of the present embodiment may be provided stored on a computer-readable recording medium and installed in the ROM 120 or the storage device 140, or it may be distributed via a communication network and installed in the ROM 120 or the storage device 140.

The display device 150 is, for example, a liquid crystal display, an organic EL (electroluminescence) display, or a plasma display. The input device 160 includes a mouse, a keyboard, an acoustic input terminal, and the like. The input device 160 may also be a video game controller.

The display device 150 and the input device 160 are used, for example, by the user to specify the room shape and the position of the main sound source on the screen. The display device 150 and the input device 160 may be integrated as a touch panel.

The output device 170 includes an acoustic output terminal, headphones, and the like, and may also include a speaker. The acoustic signal obtained by the acoustic signal processing is output from the acoustic output terminal of the output device 170.

The acoustic signal processing device 100 may include a DSP (Digital Signal Processor) instead of, or in addition to, the CPU 110. Some or all of the components in FIG. 1 may also be implemented in hardware such as electronic circuits.

(3) Head-Related Transfer Function
FIG. 4 is an explanatory diagram of a time-domain head-related impulse response and the corresponding frequency-domain head-related transfer functions.

The time-domain head-related impulse response (HRIR) h0 corresponding to the direct sound V0 is divided into M parts (hereinafter referred to as divided HRIR blocks), where M is a natural number. In the example of FIG. 4, the head-related impulse response h0 is divided along the time axis into four divided HRIR blocks h0,0, h0,1, h0,2, and h0,3. Each of these divided HRIR blocks consists of N samples. The sampling frequency of the head-related impulse response is equal to the sampling frequency fs of the original acoustic signal.

N zero samples are appended after the divided HRIR block h0,0, and the resulting 2N samples are converted by FFT into the frequency-domain divided HRTF block H0,0. Similarly, the frequency-domain divided HRTF blocks H0,1, H0,2, and H0,3 are obtained from the divided HRIR blocks h0,1, h0,2, and h0,3, respectively.

Alternatively, N zero samples may be prepended to each of the divided HRIR blocks h0,0, h0,1, h0,2, and h0,3.

Similarly, the time-domain head-related impulse response corresponding to the direction of each reflected sound V1, V2, ..., Vk is divided into M divided HRIR blocks, and these M divided HRIR blocks are converted by FFT into M frequency-domain divided HRTF blocks.
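The partitioning of FIG. 4 can be sketched as follows. To keep the example short, a plain DFT stands in for the FFT (same result, only slower); `divide_hrir` is a hypothetical name, and the `zeros_first` flag selects the alternative of prepending the N zeros instead of appending them.

```python
import cmath


def dft(x):
    """Plain O(L^2) DFT, standing in for the FFT to keep the sketch short."""
    L = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / L) for t in range(L))
            for k in range(L)]


def divide_hrir(h, N, zeros_first=False):
    """Split an HRIR into M = ceil(len(h)/N) blocks of N samples, pad each to 2N.

    zeros_first=False appends N zeros (as in FIG. 4); True prepends them.
    Returns the M frequency-domain divided HRTF blocks H_{i,0} ... H_{i,M-1}.
    """
    M = -(-len(h) // N)                      # ceiling division
    h = list(h) + [0.0] * (M * N - len(h))   # pad the tail of the last block
    blocks = []
    for m in range(M):
        part = h[m * N:(m + 1) * N]
        padded = [0.0] * N + part if zeros_first else part + [0.0] * N
        blocks.append(dft(padded))
    return blocks
```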

FIG. 5 is a schematic diagram showing the sets of divided HRTF blocks stored in the HRTF database 3.

As shown in FIG. 5, the HRTF database 3 stores in advance k sets of divided HRTF blocks corresponding to k directions. The divided HRTF blocks H0,0, H0,1, H0,2, H0,3 correspond to the direction of the direct sound V0; H1,0, H1,1, H1,2, H1,3 correspond to the direction of the reflected sound V1; H2,0, H2,1, H2,2, H2,3 correspond to the direction of the reflected sound V2; and Hk,0, Hk,1, Hk,2, Hk,3 correspond to the direction of the reflected sound Vk.

(4) Acoustic Blocks
FIG. 6 is an explanatory diagram of the time-domain original acoustic signal and the frequency-domain acoustic blocks. In FIG. 6, time runs from right to left.

In the original acoustic signal VIN, the unit block vn is the block currently being input. The unit blocks vn-1, vn-2, vn-3, and vn-4 are the unit blocks input one, two, three, and four blocks earlier, respectively. Each of the unit blocks vn, vn-1, vn-2, vn-3, and vn-4 has a size of N samples.

The signal portion xn, consisting of the unit blocks vn and vn-1, is converted by FFT into the frequency-domain acoustic block Xn. Similarly, the signal portion xn-1 consisting of the unit blocks vn-1 and vn-2 is converted into the acoustic block Xn-1, the signal portion xn-2 consisting of vn-2 and vn-3 into the acoustic block Xn-2, and the signal portion xn-3 consisting of vn-3 and vn-4 into the acoustic block Xn-3. The acoustic blocks Xn, Xn-1, Xn-2, and Xn-3 are successively stored in the frequency domain acoustic buffer 15 of FIG. 1.

Here, the size of the signal portion processed by one FFT is called the FFT size. In the example of FIG. 6, the FFT size is 2N samples. The offset along the time axis between the unit blocks processed by one FFT and those processed by the preceding FFT is called the FFT shift size. In the example of FIG. 6, the FFT shift size SS is N samples, equal to the unit block size, so the FFT size is twice the FFT shift size SS. The relationship between the FFT size and the FFT shift size SS is not limited to this example; the FFT size may be some multiple other than two (for example, four times) of the FFT shift size SS.
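The relationship between FFT size and FFT shift size can be seen by generating the time-domain signal portions xn that are passed to the FFT. In this sketch (the function name is hypothetical), each portion holds the most recent `fft_size` samples ending with the current unit block, and successive portions advance by `shift` samples. With fft_size = 2N and shift = N it reproduces xn = [vn-1, vn] of FIG. 6; samples before the start of the signal are assumed to be zero.

```python
def signal_portions(signal, shift, fft_size):
    """Yield the time-domain portions x_n that are fed to the FFT.

    Each portion holds the `fft_size` most recent samples ending with unit block
    v_n; consecutive portions are offset by `shift` (the FFT shift size SS).
    Samples before the start of the signal are treated as zeros.
    """
    history = fft_size - shift
    padded = [0.0] * history + list(signal)
    for n in range(len(signal) // shift):
        end = history + (n + 1) * shift
        yield padded[end - fft_size:end]
```

For example, `signal_portions([1, 2, 3, 4, 5, 6], 2, 4)` yields `[0.0, 0.0, 1, 2]`, `[1, 2, 3, 4]`, and `[3, 4, 5, 6]`: each sample appears in two consecutive portions. With `fft_size = 4 * shift`, each sample would appear in four.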

(5) Adjustment of Delay Amounts
FIG. 7 is a diagram showing, for the direct sound V0 and the reflected sounds V1, V2, ..., Vk, the head-related impulse responses, the divided HRTF blocks, the delays before and after adjustment, and the numbers of delay blocks. In FIG. 7, M1, M2, ..., Mk are integers.

The direct sound V0 corresponds to the time-domain head-related impulse response h0 and to one set of frequency-domain divided HRTF blocks H0,0, H0,1, H0,2, H0,3. The delay of the direct sound V0 is d0 both before and after adjustment, and its number of delay blocks is 0.

The reflected sound V1 corresponds to the time-domain head-related impulse response h1 and to one set of frequency-domain divided HRTF blocks H1,0, H1,1, H1,2, H1,3. The delay of the reflected sound V1 before adjustment is d1. The delay amount adjustment unit 7 of FIG. 1 adjusts the delay of the reflected sound V1 to d0 + M1 × SS/fs; the number of delay blocks is then M1.

Similarly, the reflected sound V2 corresponds to the time-domain head-related impulse response h2 and to one set of frequency-domain divided HRTF blocks H2,0, H2,1, H2,2, H2,3. Its delay is d2 before adjustment and d0 + M2 × SS/fs after adjustment, and its number of delay blocks is M2. Likewise, the reflected sound Vk corresponds to the time-domain head-related impulse response hk and to one set of frequency-domain divided HRTF blocks Hk,0, Hk,1, Hk,2, Hk,3. Its delay is dk before adjustment and d0 + Mk × SS/fs after adjustment, and its number of delay blocks is Mk.

In this example, the delay time differences of the reflected sounds V1, V2, ..., Vk are adjusted to M1 × SS/fs, M2 × SS/fs, ..., Mk × SS/fs, respectively. That is, each delay time difference is adjusted to an integer multiple of the time corresponding to the FFT shift size SS.

(6) Convolution in the Frequency Domain
FIG. 8 is a diagram showing the frequency-domain convolution of the divided HRTF blocks with the acoustic blocks. In FIG. 8, time runs from right to left.

The left end of the time axis is the portion of the original acoustic signal VIN currently being input. At this point, as shown in FIG. 6, the 2N-sample portion of the original acoustic signal VIN is converted into the acoustic block Xn by FFT. The acoustic blocks Xn-1, Xn-2, ..., Xn-12 are already stored in the frequency domain acoustic buffer 15 of FIG. 1.

In the example of FIG. 8, the delay time difference DL1 of the reflected sound V1 is three times the time corresponding to the FFT shift size SS, so the number of delay blocks M1 is 3. The delay time difference DLk of the reflected sound Vk is nine times that time, so the number of delay blocks Mk is 9.

From the sets of divided HRTF blocks stored in the HRTF database 3 of FIG. 5, the divided HRTF blocks H0,0, H0,1, H0,2, H0,3 corresponding to the direct sound V0 are selected. Likewise, the divided HRTF blocks H1,0, H1,1, H1,2, H1,3 corresponding to the reflected sound V1 and the divided HRTF blocks Hk,0, Hk,1, Hk,2, Hk,3 corresponding to the reflected sound Vk are selected.

For the direct sound V0, the divided HRTF blocks H0,0, H0,1, H0,2, H0,3 are convolved in the frequency domain with the acoustic blocks Xn, Xn-1, Xn-2, Xn-3, giving the convolution result Y0. For the reflected sound V1, because the number of delay blocks M1 is 3, the divided HRTF blocks H1,0, H1,1, H1,2, H1,3 are convolved with the acoustic blocks Xn-3, Xn-4, Xn-5, Xn-6, giving the convolution result Y1. For the reflected sound Vk, because the number of delay blocks Mk is 9, the divided HRTF blocks Hk,0, Hk,1, Hk,2, Hk,3 are convolved with the acoustic blocks Xn-9, Xn-10, Xn-11, Xn-12, giving the convolution result Yk. The convolution is described in detail later.

The convolution result Y1 is multiplied by a gain corresponding to the amplitude attenuation calculated for the reflected sound V1 by the virtual sound source position calculation unit 5 of FIG. 1. Similarly, the convolution result Yk is multiplied by a gain corresponding to the attenuation calculated for the reflected sound Vk. This adjusts the amplitudes of the convolution results Y1, ..., Yk. When the amplitude attenuation is 0, the gain is 1. The convolution result Y0 and the amplitude-adjusted convolution results Y1, ..., Yk are summed by complex vector addition, and the sum is converted by IFFT into the time-domain acoustic signal yn.
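The gain weighting and complex vector addition can be sketched bin by bin as below. The text only states that an attenuation of 0 gives a gain of 1; expressing the attenuation in decibels (gain = 10^(-A/20)) is an assumption, and the function names are hypothetical. The summed spectrum would then be passed to the IFFT.

```python
def attenuation_to_gain(attenuation_db):
    """Assumed dB convention: 0 dB attenuation gives gain 1, as the text requires."""
    return 10.0 ** (-attenuation_db / 20.0)


def mix_results(results, attenuations_db):
    """Complex vector sum of Y0 and gain-weighted Y1..Yk, bin by bin.

    results         -- [Y0, Y1, ..., Yk], each a list of complex bins
    attenuations_db -- attenuation of V1..Vk relative to the direct sound
    """
    gains = [1.0] + [attenuation_to_gain(a) for a in attenuations_db]  # Y0 unweighted
    bins = len(results[0])
    return [sum(g * Y[b] for g, Y in zip(gains, results)) for b in range(bins)]
```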

FIG. 9 is a diagram showing how the time-domain acoustic signals are joined. As shown in FIG. 9, the first N samples of the acoustic signal yn obtained in the current processing cycle are discarded, and the last N samples of yn are appended after the last N samples of the acoustic signal yn-1 obtained in the previous cycle. By repeating this operation, the output acoustic signal VOUT is produced successively.

If N zero samples were instead prepended to each divided HRIR block when calculating the frequency-domain divided HRTF blocks shown in FIG. 4, the last N samples of the acoustic signal yn obtained in the current cycle are discarded, and the first N samples of yn are appended after the first N samples of the acoustic signal yn-1 obtained in the previous cycle.
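Why one half is discarded can be checked directly: multiplying two 2N-point spectra corresponds to a 2N-point circular convolution, and only N of its outputs are free of wrap-around. The sketch below computes that circular convolution in the time domain (equivalent to IFFT(FFT(a)·FFT(b)), but without an FFT) and keeps the valid half for either zero-padding convention; the function names are hypothetical.

```python
def circular_convolve(a, b):
    """Length-len(a) circular convolution (what IFFT(FFT(a) * FFT(b)) computes)."""
    L = len(a)
    return [sum(a[(t - j) % L] * b[j] for j in range(L)) for t in range(L)]


def valid_half(x_prev, x_cur, h_block, zeros_first=False):
    """Keep the N samples of the 2N-point circular convolution free of wrap-around.

    zeros_first=False: h_block is followed by N zeros -> keep the last N samples.
    zeros_first=True:  h_block is preceded by N zeros -> keep the first N samples.
    """
    N = len(h_block)
    padded = [0.0] * N + list(h_block) if zeros_first else list(h_block) + [0.0] * N
    y = circular_convolve(list(x_prev) + list(x_cur), padded)
    return y[:N] if zeros_first else y[N:]
```

For the signal 1, 2, 3, 4 and a two-tap block h = [1, 1], the linear convolution is 1, 3, 5, 7, 4, and both conventions return [5, 7], the output samples belonging to the current unit block.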

(7) Overall Operation of the Acoustic Signal Processing Device
FIG. 10 is a flowchart showing the acoustic signal processing performed by the acoustic signal processing device 100 of FIG. 1. This processing is performed by the CPU 110 of FIG. 3 executing the acoustic signal processing program stored in the ROM 120 or the storage device 140.

The room shape instruction unit 1 of FIG. 1 outputs room data specifying the room shape (step S1). The main sound source position instruction unit 2 outputs position data indicating the position of the main sound source S0 within the virtual space having the specified room shape (step S2).

Next, the virtual sound source position calculation unit 5 calculates the positions of the virtual sound sources S1, S2, ..., Sk based on the room data and the position data of the main sound source S0 (step S3), and outputs position data indicating the positions of the virtual sound sources S1, S2, ..., Sk.

Based on the position data of the main sound source S0 and the virtual sound sources S1, S2, ..., Sk, the delay amount calculation unit 6 calculates the delays of the direct sound V0 and the reflected sounds V1, V2, ..., Vk (step S4). The delay amount adjustment unit 7 adjusts the delay time difference of each reflected sound V1, V2, ..., Vk to an integer multiple of the time corresponding to the FFT shift size (= SS/fs) (step S5). The delay block number calculation unit 8 then calculates the number of delay blocks for each reflected sound V1, V2, ..., Vk from its adjusted delay time difference.

Based on the position data output from the main sound source position instruction unit 2 and the virtual sound source position calculation unit 5, the HRTF block selection unit 4 selects, from the sets of divided HRTF blocks stored in the HRTF database 3, the divided HRTF blocks corresponding to the direct sound V0 and to each of the reflected sounds V1, V2, ..., Vk (step S6).

The convolution operation unit 10, the time domain conversion unit 11, the acoustic signal output unit 12, and the frequency domain conversion unit 14 then perform the convolution processing (step S7).

FIG. 11 is a flowchart showing the convolution processing in detail. In FIG. 11, the variable n denotes the current processing cycle; its value starts at 0 and is incremented by 1. M denotes the number of partitions of the head-related transfer function (the number of divided HRTF blocks), and k denotes the number of reflected sounds. M1, ..., Mk denote the numbers of delay blocks for the reflected sounds V1, ..., Vk.

In the initial state, the variable n is 0 (step S11). The frequency domain conversion unit 14 of FIG. 1 converts the signal portion xn of the original acoustic signal VIN (sampling frequency fs) into the acoustic block Xn by FFT (step S12). The signal portion xn consists of the unit block vn currently acquired from the original acoustic signal VIN and the unit block vn-1 acquired in the previous cycle (see FIG. 6). The frequency domain conversion unit 14 then stores the acoustic block Xn in the frequency domain acoustic buffer 15 (step S13). As the variable n is incremented in step S34 (described later), successive acoustic blocks Xn are stored in the frequency domain acoustic buffer 15.

In steps S14 to S19, the convolution result Y0 for the direct sound V0 is calculated. In steps S20 to S25, the convolution result Y1 for the reflected sound V1 is calculated, and in steps S26 to S31, the convolution result Yk for the reflected sound Vk is calculated. Steps S14 to S19, steps S20 to S25, and steps S26 to S31 are executed in parallel.

畳み込み演算部１０は、まず、変数ｍの値を初期値０に設定し（ステップＳ１４，Ｓ２０，Ｓ２６）、畳み込み演算結果Ｙ０，Ｙ１，…Ｙｋを初期値０に設定する（ステップＳ１５，Ｓ２１，Ｓ２７）。次に、畳み込み演算部１０は、音響ブロックＸｎ−ｍと分割ＨＲＴＦブロックＨ０，ｍとの複素ベクトル乗算を行い、Ｙ＝Ｘｎ−ｍ＊Ｈ０，ｍを畳み込み演算結果として算出する（ステップＳ１６）。次に、畳み込み演算部１０は、前回の畳み込み演算結果Ｙ０に今回の畳み込み演算結果Ｙを複素ベクトル加算する（ステップＳ１７）。その後、変数ｍに１を加算し（ステップＳ１８）、変数ｍがＭ−１よりも大きいか否かを判定する（ステップＳ１９）。変数ｍがＭ−１になるまで、ステップＳ１６〜Ｓ１９の処理が繰り返し行われる。それにより、Ｙ０＝Ｘｎ＊Ｈ０，０＋Ｘｎ−１＊Ｈ０，１＋Ｘｎ−２＊Ｈ０，２＋…＋Ｘｎ−Ｍ＋１＊Ｈ０，Ｍ−１が算出される。ここで、「＊」は複素ベクトル乗算を意味し、「＋」は複素ベクトル加算を意味する。図８の例では、Ｍ＝４であるため、Ｙ０＝Ｘｎ＊Ｈ０，０＋Ｘｎ−１＊Ｈ０，１＋Ｘｎ−２＊Ｈ０，２＋Ｘｎ−３＊Ｈ０，３が算出される。 First, the convolution operation unit 10 sets the variable m to the initial value 0 (steps S14, S20, S26). It also sets the convolution operation results Y0, Y1, ..., Yk to the initial value 0 (steps S15, S21, S27). Next, the convolution operation unit 10 performs complex vector multiplication of the acoustic block Xn−m and the divided HRTF block H0,m, and calculates Y = Xn−m * H0,m as a convolution operation result (step S16). The convolution operation unit 10 then adds the current result Y to the accumulated result Y0 by complex vector addition (step S17). Thereafter, 1 is added to the variable m (step S18), and it is determined whether the variable m is larger than M−1 (step S19). The processes of steps S16 to S19 are repeated until m exceeds M−1. As a result, $Y_0 = X_n * H_{0,0} + X_{n-1} * H_{0,1} + X_{n-2} * H_{0,2} + \cdots + X_{n-M+1} * H_{0,M-1}$ is calculated. Here, "*" denotes complex vector multiplication and "+" denotes complex vector addition. In the example of FIG. 8, M = 4, so $Y_0 = X_n * H_{0,0} + X_{n-1} * H_{0,1} + X_{n-2} * H_{0,2} + X_{n-3} * H_{0,3}$ is calculated.

上記の畳み込み演算において、音響ブロックＸｎ−１，Ｘｎ−２，…，Ｘｎ−Ｍ＋１は、以前の処理で既に算出され、周波数領域音響バッファ１５に格納されている。 In the above convolution calculation, the acoustic blocks Xn−1, Xn−2,..., Xn−M + 1 have already been calculated in the previous processing and stored in the frequency domain acoustic buffer 15.

同様にして、ステップＳ２２〜Ｓ２５において、Ｙ１＝Ｘｎ−Ｍ１＊Ｈ１，０＋Ｘｎ−１−Ｍ１＊Ｈ１，１＋Ｘｎ−２−Ｍ１＊Ｈ１，２＋…＋Ｘｎ−Ｍ＋１−Ｍ１＊Ｈ１，Ｍ−１が算出される。ここで、Ｍ１は反射音Ｖ１の遅延ブロック数である。図８の例では、Ｍ＝４であり、Ｍ１＝３であるため、Ｙ１＝Ｘｎ−３＊Ｈ１，０＋Ｘｎ−４＊Ｈ１，１＋Ｘｎ−５＊Ｈ１，２＋Ｘｎ−６＊Ｈ１，３が算出される。 Similarly, in steps S22 to S25, $Y_1 = X_{n-M_1} * H_{1,0} + X_{n-1-M_1} * H_{1,1} + X_{n-2-M_1} * H_{1,2} + \cdots + X_{n-M+1-M_1} * H_{1,M-1}$ is calculated. Here, M1 is the number of delay blocks of the reflected sound V1. In the example of FIG. 8, M = 4 and M1 = 3, so $Y_1 = X_{n-3} * H_{1,0} + X_{n-4} * H_{1,1} + X_{n-5} * H_{1,2} + X_{n-6} * H_{1,3}$ is calculated.

上記の畳み込み演算において、音響ブロックＸｎ−Ｍ１，Ｘｎ−１−Ｍ１，Ｘｎ−２−Ｍ１…，Ｘｎ−Ｍ＋１−Ｍ１は、以前の処理で既に算出され、周波数領域音響バッファ１５に格納されている。 In the above convolution calculation, the acoustic blocks $X_{n-M_1}, X_{n-1-M_1}, X_{n-2-M_1}, \ldots, X_{n-M+1-M_1}$ have already been calculated in previous processing cycles and are stored in the frequency domain acoustic buffer 15.

また、ステップＳ２８〜Ｓ３１において、Ｙｋ＝Ｘｎ−Ｍｋ＊Ｈｋ，０＋Ｘｎ−１−Ｍｋ＊Ｈｋ，１＋Ｘｎ−２−Ｍｋ＊Ｈｋ，２＋…＋Ｘｎ−Ｍ＋１−Ｍｋ＊Ｈｋ，Ｍ−１が算出される。ここで、Ｍｋは反射音Ｖｋの遅延ブロック数である。図８の例では、Ｍ＝４であり、Ｍｋ＝９であるため、Ｙｋ＝Ｘｎ−９＊Ｈｋ，０＋Ｘｎ−１０＊Ｈｋ，１＋Ｘｎ−１１＊Ｈｋ，２＋Ｘｎ−１２＊Ｈｋ，３が算出される。 Likewise, in steps S28 to S31, $Y_k = X_{n-M_k} * H_{k,0} + X_{n-1-M_k} * H_{k,1} + X_{n-2-M_k} * H_{k,2} + \cdots + X_{n-M+1-M_k} * H_{k,M-1}$ is calculated. Here, Mk is the number of delay blocks of the reflected sound Vk. In the example of FIG. 8, M = 4 and Mk = 9, so $Y_k = X_{n-9} * H_{k,0} + X_{n-10} * H_{k,1} + X_{n-11} * H_{k,2} + X_{n-12} * H_{k,3}$ is calculated.

上記の畳み込み演算において、音響ブロックＸｎ−Ｍｋ，Ｘｎ−１−Ｍｋ，Ｘｎ−２−Ｍｋ，…，Ｘｎ−Ｍ＋１−Ｍｋは、以前の処理で既に算出され、周波数領域音響バッファ１５に格納されている。 In the above convolution calculation, the acoustic blocks $X_{n-M_k}, X_{n-1-M_k}, X_{n-2-M_k}, \ldots, X_{n-M+1-M_k}$ have already been calculated in previous processing cycles and are stored in the frequency domain acoustic buffer 15.
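The block selection and accumulation of steps S16 to S31 can be sketched as follows. This is a minimal illustration that assumes the frequency domain acoustic buffer 15 is modelled as a dictionary mapping block index to spectrum. The function names are hypothetical. The direct sound V0 is handled as the case of zero delay blocks, so the same routine yields Y0, Y1, ..., Yk.

```python
def delayed_block_indices(n, M, delay_blocks):
    """Indices n - m - Mi (m = 0, ..., M-1) of the acoustic blocks used for one sound."""
    return [n - m - delay_blocks for m in range(M)]

def convolve_in_frequency(buffer, n, H_blocks, delay_blocks):
    """Y_i = sum over m of X_{n-m-Mi} * H_{i,m} (element-wise complex products and sums).

    buffer: dict mapping block index j to the stored spectrum X_j (list of complex)
    H_blocks: [H_{i,0}, ..., H_{i,M-1}], each a list of complex of the same length
    """
    M, length = len(H_blocks), len(H_blocks[0])
    Y = [0j] * length                                    # initialisation (S15/S21/S27)
    for m, j in enumerate(delayed_block_indices(n, M, delay_blocks)):
        X = buffer.get(j, [0j] * length)                 # blocks before the start are silent
        Y = [y + x * h for y, x, h in zip(Y, X, H_blocks[m])]   # multiply and accumulate
    return Y

# Example of FIG. 8: M = 4, M1 = 3, Mk = 9
print(delayed_block_indices(20, 4, 3))   # [17, 16, 15, 14], i.e. X_{n-3} ... X_{n-6}
print(delayed_block_indices(20, 4, 9))   # [11, 10, 9, 8], i.e. X_{n-9} ... X_{n-12}
```

Because each reflected sound only shifts the starting index, no additional FFT is required to obtain its acoustic blocks.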

時間領域変換部１１は、畳み込み演算結果Ｙ０，Ｙ１，…，Ｙｋを周波数領域で複素ベクトル加算し、複素ベクトル加算の結果をＩＦＦＴによりサンプリング周波数ｆｓの時間領域の音響信号ｙｎに変換する（ステップＳ３２）。音響信号出力部１２は、時間領域の音響信号ｙｎを出力する（ステップＳ３３）。その後、変数ｎの値が１増加され（ステップＳ３４）、ステップＳ１２〜Ｓ３４の処理が行われる。上記のように、音響信号ｙｎの前半部分が破棄され、残りの後半部分が前回の処理で得られた音響信号ｙｎ−１の後半部分につなぎ合わされる。 The time domain conversion part 11 adds the convolution calculation results Y0, Y1, ..., Yk by complex vector addition in the frequency domain. It then converts the sum by IFFT into a time-domain acoustic signal yn at the sampling frequency fs (step S32). The acoustic signal output part 12 outputs the time-domain acoustic signal yn (step S33). Thereafter, the value of the variable n is incremented by 1 (step S34), and the processes of steps S12 to S34 are repeated. As described above, the first half of the acoustic signal yn is discarded, and the remaining second half is joined to the second half of the acoustic signal yn−1 obtained in the previous cycle.
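The following Python sketch checks that reusing buffered blocks and applying a single IFFT reproduce an ordinary delayed convolution. It models the whole cycle of FIG. 11 and is a minimal sketch under stated assumptions:

- Each sound is given as a time-domain head impulse response (h0, h1, ..., hk) of at most M×N taps, together with its number of delay blocks Mi.
- The FFT is a textbook recursive radix-2 FFT, so 2N must be a power of two.
- All helper names are hypothetical.

Spectra are discarded once no sound can reference them any more.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    L = len(x)
    if L == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * f / L) * odd[f] for f in range(L // 2)]
    return ([even[f] + tw[f] for f in range(L // 2)]
            + [even[f] - tw[f] for f in range(L // 2)])

def ifft(X):
    """Inverse FFT computed through the forward FFT of the complex conjugate."""
    L = len(X)
    return [v.conjugate() / L for v in fft([v.conjugate() for v in X])]

def divided_hrtf_blocks(h, N, M):
    """Split h into M partitions of N taps, pad each with N zeros and transform them."""
    h = list(h) + [0.0] * (M * N - len(h))
    return [fft(h[m * N:(m + 1) * N] + [0.0] * N) for m in range(M)]

def render(signal, N, M, sources):
    """Divided overlap-save with one FFT and one IFFT per cycle of N samples.

    sources: list of (h, Mi): impulse response with at most M*N taps and its
    delay in whole blocks (Mi = 0 for the direct sound).
    len(signal) is assumed to be a multiple of N.
    """
    filters = [(divided_hrtf_blocks(h, N, M), Mi) for h, Mi in sources]
    depth = M - 1 + max(Mi for _, Mi in sources)   # age of the oldest block still referenced
    spectra = {}                                   # frequency domain acoustic buffer
    prev, out = [0.0] * N, []
    for n in range(len(signal) // N):
        v = list(signal[n * N:(n + 1) * N])
        spectra[n] = fft(prev + v)                 # steps S12-S13: the only FFT
        prev = v
        acc = [0j] * (2 * N)                       # Y0 + Y1 + ... + Yk
        for H, Mi in filters:
            for m in range(M):
                X = spectra.get(n - m - Mi)        # reuse a stored block; None before start
                if X is not None:
                    acc = [a + x * hm for a, x, hm in zip(acc, X, H[m])]
        y = ifft(acc)                              # step S32: a single IFFT
        out.extend(s.real for s in y[N:])          # keep the second half only
        spectra.pop(n - depth, None)               # no longer referenced by any sound
    return out
```

Every delay is a whole number of blocks, so each term only indexes a spectrum that is already stored. The number of FFTs per cycle therefore stays at one however many reflected sounds there are.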

（８）実施の形態の効果 (8) Effects of the Embodiment
本実施の形態に係る音響信号処理装置１００によれば、反射音Ｖ１，Ｖ２，…，Ｖｋと直接音Ｖ０との遅延時間差がＦＦＴシフトサイズに相当する時間の整数倍に調整されるので、反射音Ｖ１，Ｖ２，…，Ｖｋに対応する音響ブロックとして、既に算出された直接音Ｖ０に対応する音響ブロックを用いることができる。そのため、反射音Ｖ１，Ｖ２，…，Ｖｋに対応する音響ブロックを得るためのＦＦＴが不要である。また、直接音Ｖ０および反射音Ｖ１，Ｖ２，…，Ｖｋについての畳み込み演算結果Ｙ０，Ｙ１，…Ｙｋの複素ベクトル加算が周波数領域で行われるので、１回のＩＦＦＴにより時間領域の音響信号ＶＯＵＴを得ることができる。一方、畳み込み演算結果の加算が時間領域で行われる場合には、１つのＦＦＴに対して（ｋ＋１）回のＩＦＦＴが必要となると考えられる。 In the acoustic signal processing apparatus 100 according to the present embodiment, the delay time difference between each of the reflected sounds V1, V2, ..., Vk and the direct sound V0 is adjusted to an integer multiple of the time corresponding to the FFT shift size. The acoustic blocks already calculated for the direct sound V0 can therefore be reused as the acoustic blocks for the reflected sounds V1, V2, ..., Vk, and no FFT is needed to obtain acoustic blocks for the reflected sounds. Furthermore, the convolution calculation results Y0, Y1, ..., Yk for the direct sound V0 and the reflected sounds V1, V2, ..., Vk are added by complex vector addition in the frequency domain. The time-domain acoustic signal VOUT can therefore be obtained with a single IFFT. If the convolution results were instead added in the time domain, (k+1) IFFTs would presumably be required for each FFT.

これらにより、畳み込み演算処理における演算回数を低減することができる。その結果、音響信号ＶＯＵＴを出力するための演算処理における処理量を低減することが可能となる。 These features reduce the number of operations in the convolution calculation processing. They consequently reduce the processing load of the arithmetic processing that outputs the acoustic signal VOUT.

また、分割ＨＲＴＦブロックを用いた分割オーバラップセーブ法が用いられるので、単位ブロックのサイズを小さくすることができる。それにより、ＦＦＴおよびＩＦＦＴにおける乗算回数を低減することができる。したがって、音響信号ＶＯＵＴを出力するための演算処理における処理量をより低減することが可能となる。 In addition, the divided overlap-save method using divided HRTF blocks is employed, so the unit block size can be kept small. This reduces the number of multiplications in each FFT and IFFT and therefore reduces the processing load for outputting the acoustic signal VOUT even further.

さらに、ＦＦＴシフトサイズが単位ブロックのサイズと等しいため、単位ブロックのサイズを小さくすることにより、遅延時間差の調整による誤差および畳み込み演算における遅延時間を低減することができる。それにより、受音点Ｒに到来する音をより高い精度で再現することができる。 Further, since the FFT shift size is equal to the size of the unit block, it is possible to reduce the error due to the adjustment of the delay time difference and the delay time in the convolution calculation by reducing the size of the unit block. Thereby, the sound arriving at the sound receiving point R can be reproduced with higher accuracy.

以上の結果、音の再現精度を低下させることなく音響信号処理装置１００の低コスト化および小型化が可能となる。 As a result, the cost and size of the acoustic signal processing apparatus 100 can be reduced without reducing the sound reproduction accuracy.

（９）演算回数の比較 (9) Comparison of the Number of Operations
（ａ）本実施の形態および参考形態における演算回数 (a) Number of operations in the present embodiment and the reference embodiment
以下、本実施の形態に係る畳み込み演算処理における演算回数を参考形態に係る畳み込み演算処理における演算回数と比較する。 The number of operations in the convolution calculation processing according to the present embodiment is compared below with the number of operations in the convolution calculation processing according to the reference embodiment.

参考形態における音響信号処理では、反射音Ｖ１，Ｖ２，…，Ｖｋの遅延量の調整が行われない。したがって、反射音Ｖ１，Ｖ２，…，Ｖｋの遅延時間差は、ＦＦＴシフトサイズに相当する時間の整数倍とはならない。 In the acoustic signal processing in the reference form, the delay amount of the reflected sounds V1, V2,..., Vk is not adjusted. Therefore, the delay time difference between the reflected sounds V1, V2,..., Vk is not an integral multiple of the time corresponding to the FFT shift size.

図１２は参考形態に係る畳み込み演算処理における周波数領域での分割ＨＲＴＦブロックと音響ブロックとの畳み込み演算を示す図である。図１２において、時間は右から左に経過する。 FIG. 12 is a diagram showing a convolution operation between the divided HRTF block and the sound block in the frequency domain in the convolution operation processing according to the reference embodiment. In FIG. 12, time elapses from right to left.

図１２の例では、反射音Ｖ１の遅延時間差ｄｌ１および反射音Ｖｋの遅延時間差ｄｌｋはＦＦＴシフトサイズＳＳに相当する時間の整数倍ではない。原音響信号ＶＩＮのＦＦＴにより直接音Ｖ０に対応する音響ブロックＸ０，ｎ，Ｘ０，ｎ−１，Ｘ０，ｎ−２，Ｘ０，ｎ−３を算出するとともに、反射音Ｖ１に対応する音響ブロックＸ１，ｎ，Ｘ１，ｎ−１，Ｘ１，ｎ−２，Ｘ１，ｎ−３および反射音Ｖｋに対応する音響ブロックＸｋ，ｎ，Ｘｋ，ｎ−１，Ｘｋ，ｎ−２，Ｘｋ，ｎ−３をそれぞれ算出する必要がある。 In the example of FIG. 12, the delay time difference dl1 of the reflected sound V1 and the delay time difference dlk of the reflected sound Vk are not integer multiples of the time corresponding to the FFT shift size SS. By FFT of the original acoustic signal VIN, the acoustic blocks $X_{0,n}, X_{0,n-1}, X_{0,n-2}, X_{0,n-3}$ corresponding to the direct sound V0 must be calculated. In addition, the acoustic blocks $X_{1,n}, X_{1,n-1}, X_{1,n-2}, X_{1,n-3}$ corresponding to the reflected sound V1 and the acoustic blocks $X_{k,n}, X_{k,n-1}, X_{k,n-2}, X_{k,n-3}$ corresponding to the reflected sound Vk must each be calculated separately.

直接音Ｖ０については、周波数領域で分割ＨＲＴＦブロックＨ０，０，Ｈ０，１，Ｈ０，２，Ｈ０，３と音響ブロックＸ０，ｎ，Ｘ０，ｎ−１，Ｘ０，ｎ−２，Ｘ０，ｎ−３との畳み込み演算が行われ、周波数領域の音響信号Ｙ０が得られる。反射音Ｖ１については、周波数領域で分割ＨＲＴＦブロックＨ１，０，Ｈ１，１，Ｈ１，２，Ｈ１，３と音響ブロックＸ１，ｎ，Ｘ１，ｎ−１，Ｘ１，ｎ−２，Ｘ１，ｎ−３との畳み込み演算が行われ、周波数領域の音響信号Ｙ１が得られる。反射音Ｖｋについては、周波数領域で分割ＨＲＴＦブロックＨｋ，０，Ｈｋ，１，Ｈｋ，２，Ｈｋ，３と音響ブロックＸｋ，ｎ，Ｘｋ，ｎ−１，Ｘｋ，ｎ−２，Ｘｋ，ｎ−３との畳み込み演算が行われる。 For the direct sound V0, the divided HRTF blocks $H_{0,0}, H_{0,1}, H_{0,2}, H_{0,3}$ are convolved in the frequency domain with the acoustic blocks $X_{0,n}, X_{0,n-1}, X_{0,n-2}, X_{0,n-3}$, yielding the frequency-domain acoustic signal Y0. For the reflected sound V1, the divided HRTF blocks $H_{1,0}, H_{1,1}, H_{1,2}, H_{1,3}$ are convolved in the frequency domain with the acoustic blocks $X_{1,n}, X_{1,n-1}, X_{1,n-2}, X_{1,n-3}$, yielding the frequency-domain acoustic signal Y1. For the reflected sound Vk, the divided HRTF blocks $H_{k,0}, H_{k,1}, H_{k,2}, H_{k,3}$ are likewise convolved in the frequency domain with the acoustic blocks $X_{k,n}, X_{k,n-1}, X_{k,n-2}, X_{k,n-3}$.

図１３は参考形態に係る畳み込み演算処理の詳細を示すフローチャートである。 FIG. 13 is a flowchart showing details of the convolution calculation processing according to the reference embodiment.

初期状態では変数ｎの値は０である（ステップＳ５１）。ステップＳ５２〜Ｓ５９では、直接音Ｖ０についての畳み込み演算結果Ｙ０が算出される。ステップＳ６０〜Ｓ６７では、反射音Ｖ１についての畳み込み演算結果Ｙ１が算出され、ステップＳ６８〜Ｓ７５では、反射音Ｖｋについての畳み込み演算結果Ｙｋが算出される。 In the initial state, the value of the variable n is 0 (step S51). In steps S52 to S59, the convolution calculation result Y0 for the direct sound V0 is calculated. In steps S60 to S67, the convolution calculation result Y1 for the reflected sound V1 is calculated, and in steps S68 to S75, the convolution calculation result Yk for the reflected sound Vk is calculated.

直接音Ｖ０について、原音響信号ＶＩＮの信号部分ｘ０，ｎがＦＦＴにより音響ブロックＸ０，ｎに変換され（ステップＳ５２）、音響ブロックＸ０，ｎが周波数領域音響バッファ１５に格納される（ステップＳ５３）。また、反射音Ｖ１について、原音響信号ＶＩＮの信号部分ｘ１，ｎがＦＦＴにより音響ブロックＸ１，ｎに変換され（ステップＳ６０）、音響ブロックＸ１，ｎが周波数領域音響バッファ１５に格納される（ステップＳ６１）。同様に、反射音Ｖｋについて、原音響信号ＶＩＮの信号部分ｘｋ，ｎがＦＦＴにより音響ブロックＸｋ，ｎに変換され（ステップＳ６８）、音響ブロックＸｋ，ｎが周波数領域音響バッファ１５に格納される（ステップＳ６９）。 For the direct sound V0, the signal portion $x_{0,n}$ of the original acoustic signal VIN is converted into the acoustic block $X_{0,n}$ by FFT (step S52), and $X_{0,n}$ is stored in the frequency domain acoustic buffer 15 (step S53). For the reflected sound V1, the signal portion $x_{1,n}$ of the original acoustic signal VIN is converted into the acoustic block $X_{1,n}$ by FFT (step S60), and $X_{1,n}$ is stored in the frequency domain acoustic buffer 15 (step S61). Similarly, for the reflected sound Vk, the signal portion $x_{k,n}$ of the original acoustic signal VIN is converted into the acoustic block $X_{k,n}$ by FFT (step S68), and $X_{k,n}$ is stored in the frequency domain acoustic buffer 15 (step S69).

ステップＳ５４〜Ｓ５９において、直接音Ｖ０について、Ｙ０＝Ｘ０，ｎ＊Ｈ０，０＋Ｘ０，ｎ−１＊Ｈ０，１＋Ｘ０，ｎ−２＊Ｈ０，２＋…＋Ｘ０，ｎ−ｍ＊Ｈ０，ｍが算出される。ステップＳ６２〜Ｓ６７において、反射音Ｖ１について、Ｙ１＝Ｘ１，ｎ＊Ｈ１，０＋Ｘ１，ｎ−１＊Ｈ１，１＋Ｘ１，ｎ−２＊Ｈ１，２＋…＋Ｘ１，ｎ−ｍ＊Ｈ１，ｍが算出される。ステップＳ６８〜Ｓ７５において、反射音Ｖｋについて、Ｙｋ＝Ｘｋ，ｎ＊Ｈｋ，０＋Ｘｋ，ｎ−１＊Ｈｋ，１＋Ｘｋ，ｎ−２＊Ｈｋ，２＋…＋Ｘｋ，ｎ−ｍ＊Ｈｋ，ｍが算出される。 In steps S54 to S59, $Y_0 = X_{0,n} * H_{0,0} + X_{0,n-1} * H_{0,1} + X_{0,n-2} * H_{0,2} + \cdots + X_{0,n-M+1} * H_{0,M-1}$ is calculated for the direct sound V0. In steps S62 to S67, $Y_1 = X_{1,n} * H_{1,0} + X_{1,n-1} * H_{1,1} + X_{1,n-2} * H_{1,2} + \cdots + X_{1,n-M+1} * H_{1,M-1}$ is calculated for the reflected sound V1. In steps S68 to S75, $Y_k = X_{k,n} * H_{k,0} + X_{k,n-1} * H_{k,1} + X_{k,n-2} * H_{k,2} + \cdots + X_{k,n-M+1} * H_{k,M-1}$ is calculated for the reflected sound Vk.

ステップＳ７６〜Ｓ７８の処理は、図１１のステップＳ３２〜Ｓ３４の処理と同様である。 The processing in steps S76 to S78 is the same as the processing in steps S32 to S34 in FIG. 11.

ここで、図１１の実施の形態に係る畳み込み演算処理における演算回数と図１３の参考形態に係る畳み込み演算処理における演算回数とを比較する。 Here, the number of calculations in the convolution calculation process according to the embodiment of FIG. 11 is compared with the number of calculations in the convolution calculation process according to the reference form of FIG.

単位ブロックのサイズをＮサンプルとし、頭部伝達関数の分割数をＭとし、主音源および仮想音源の数をｋとする。この場合、ＦＦＴの対象となるサンプル数は２Ｎとなる。 The size of the unit block is N samples, the division number of the head-related transfer function is M, and the number of main sound sources and virtual sound sources is k. In this case, the number of samples to be subjected to FFT is 2N.

ＦＦＴでの乗算回数およびＩＦＦＴでの乗算回数をそれぞれＯＡとし、ループを含む複素ベクトル積での乗算回数をＯＢとすると、乗算回数ＯＡ，ＯＢは次式のようになる。 Let OA denote the number of multiplications in one FFT (which is also the number in one IFFT). Let OB denote the number of multiplications in the complex vector products, including the loop over the M divided blocks. OA and OB are then given by the following equations.

$$\mathrm{OA} = 2 \times (2N) \times \log_2 (2N)$$
$$\mathrm{OB} = M \times 4 \times N$$
図１１の実施の形態における演算回数ＰＩは、次式のようになる。 The number of operations PI in the embodiment of FIG. 11 is given by the following equation.

$$\mathrm{PI} = \mathrm{OA} + k \times \mathrm{OB} + \mathrm{OA}$$
図１３の参考形態における演算回数ＰＲは、次式のようになる。 The number of operations PR in the reference embodiment of FIG. 13 is given by the following equation.

$$\mathrm{PR} = k \times (\mathrm{OA} + \mathrm{OB}) + \mathrm{OA}$$
単位ブロックのサイズＮを３２サンプルとし、頭部伝達関数の分割数Ｍを４とし、主音源および仮想音源の数ｋを１００とすると、乗算回数ＯＡ，ＯＢは次のようになる。 Assume a unit block size N of 32 samples, M = 4 divisions of the head-related transfer function, and a number k of main and virtual sound sources of 100. The multiplication counts OA and OB are then as follows.

$$\mathrm{OA} = 2 \times (2 \times 32) \times \log_2 (2 \times 32) = 768$$
$$\mathrm{OB} = 4 \times 4 \times 32 = 512$$
これにより、図１１の実施の形態における演算回数ＰＩは、次式のようになる。 Accordingly, the number of operations PI in the embodiment of FIG. 11 is as follows.

$$\mathrm{PI} = 768 + 100 \times 512 + 768 = 52736$$
一方、図１３の参考形態における演算回数ＰＲは、次式のようになる。 On the other hand, the number of operations PR in the reference embodiment of FIG. 13 is as follows.

$$\mathrm{PR} = 100 \times (768 + 512) + 768 = 128768$$
演算回数ＰＩと演算回数ＰＲとの比は次のように算出される。 The ratio of the number of operations PI to the number of operations PR is calculated as follows.

$$\mathrm{PI} / \mathrm{PR} = 52736 / 128768 \approx 0.41$$
したがって、本実施の形態に係る畳み込み演算処理によれば、参考形態に係る畳み込み演算処理に比べて演算回数が約６０％削減される。仮想音源の数（反射音の数）が増加するほど、演算回数の削減の効果は顕著となる。 Therefore, the convolution calculation processing according to the present embodiment needs about 60% fewer operations than the convolution calculation processing according to the reference embodiment. The reduction becomes more pronounced as the number of virtual sound sources (the number of reflected sounds) increases.
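The operation counts above can be reproduced with a few lines of Python. This sketch only evaluates the formulas for OA, OB, PI and PR given in this section. As stated in the text, additions and buffer access are ignored. The function name is hypothetical.

```python
import math

def operation_counts(N, M, k):
    """Multiplication counts of section (9); additions and buffer access are ignored."""
    OA = 2 * (2 * N) * math.log2(2 * N)   # one FFT (or one IFFT) of 2N samples
    OB = M * 4 * N                        # complex vector products over the M divided blocks
    PI = OA + k * OB + OA                 # FIG. 11: one FFT, k products, one IFFT
    PR = k * (OA + OB) + OA               # FIG. 13: one FFT per sound, one IFFT
    return OA, OB, PI, PR

OA, OB, PI, PR = operation_counts(N=32, M=4, k=100)
print(OA, OB, PI, PR, round(PI / PR, 2))   # 768.0 512 52736.0 128768.0 0.41
```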

なお、上記の演算回数の比較では、加算回数およびバッファに対する読み書きについては考慮していない。 In the comparison of the number of operations described above, the number of additions and reading / writing with respect to the buffer are not considered.

（ｂ）時間領域の畳み込み演算処理における演算回数 (b) Number of operations in time-domain convolution calculation processing
次に、時間領域の畳み込み演算処理を用いた音響信号処理における演算回数を算出する。 Next, the number of operations is calculated for acoustic signal processing that uses time-domain convolution calculation processing.

時間領域の畳み込み演算処理における演算回数ＯＴは、次式のようになる。 The number of operations OT in the time domain convolution operation processing is expressed by the following equation.

$$\mathrm{OT} = k \times M \times N^2$$
単位ブロックのサイズＮを３２サンプルとし、頭部伝達関数の分割数Ｍを４とし、主音源および仮想音源の数ｋを１００とすると、演算回数ＯＴは次のようになる。 With a unit block size N of 32 samples, M = 4 divisions of the head-related transfer function, and a number k of main and virtual sound sources of 100, the number of operations OT is as follows.

$$\mathrm{OT} = 100 \times 4 \times 32^2 = 409600$$

本実施の形態に係る畳み込み演算処理における演算回数ＰＩと時間領域の畳み込み演算処理における演算回数ＯＴとの比は次のように算出される。 The ratio between the number of operations PI in the convolution operation processing according to the present embodiment and the number of operations OT in the time domain convolution operation processing is calculated as follows.

$$\mathrm{PI} / \mathrm{OT} = 52736 / 409600 \approx 0.13$$
したがって、本実施の形態に係る畳み込み演算処理によれば、時間領域の畳み込み演算処理に比べて、演算回数が約８７％削減される。仮想音源の数（反射音の数）が増加するほど、演算回数の削減の効果は顕著となる。 Therefore, the convolution calculation processing according to the present embodiment needs about 87% fewer operations than time-domain convolution calculation processing. The reduction becomes more pronounced as the number of virtual sound sources (the number of reflected sounds) increases.
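The following sketch shows how both ratios change with the number of sources. It evaluates the same formulas for PI, PR and OT for several values of k; the function name ratio_table is hypothetical. Under these formulas the two ratios behave as follows:

- For k = 1, the embodiment and the reference embodiment need the same number of multiplications.
- As k grows, PI/PR approaches OB/(OA + OB) = 0.4.
- As k grows, PI/OT approaches 4/N = 0.125.

```python
import math

def ratio_table(N, M, ks):
    """Return (k, PI/PR, PI/OT) for each k, using the multiplication counts of section (9)."""
    OA = 2 * (2 * N) * math.log2(2 * N)
    OB = M * 4 * N
    rows = []
    for k in ks:
        PI = OA + k * OB + OA
        PR = k * (OA + OB) + OA
        OT = k * M * N ** 2
        rows.append((k, PI / PR, PI / OT))
    return rows

for k, pr, ot in ratio_table(32, 4, [1, 10, 100, 1000]):
    print(f"k={k:5d}  PI/PR={pr:.3f}  PI/OT={ot:.3f}")
```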

（ｃ）遅延量の調整による誤差 (c) Error due to adjustment of the delay amount
ＦＦＴシフトサイズＳＳを３２サンプルとした場合、反射音Ｖ１，Ｖ２，…，Ｖｋと直接音Ｖ０との間の遅延時間差の調整による遅延量の誤差は、最大１６サンプルに相当する時間である。サンプリング周波数を４８ｋＨｚとした場合、遅延量の誤差は次式のように算出される。 When the FFT shift size SS is 32 samples, adjusting the delay time difference between the reflected sounds V1, V2, ..., Vk and the direct sound V0 introduces a delay error of at most the time corresponding to 16 samples. With a sampling frequency of 48 kHz, the delay error is calculated as follows.

$$16 / 48000\,\mathrm{Hz} \approx 0.00033\,\mathrm{s} = 0.33\,\mathrm{ms}$$
この遅延量の誤差に相当する距離の誤差は次式により算出される。 The distance error corresponding to this delay error is calculated by the following equation.

$$0.00033\,\mathrm{s} \times 340\,\mathrm{m/s} \approx 0.11\,\mathrm{m} = 11\,\mathrm{cm}$$
仮想空間のサイズが１１ｃｍ程度変化した場合の反射音の変化が音像定位および音の広がり感に与える影響はほとんどないと考えられる。 Changing the size of the virtual space by about 11 cm changes the reflected sounds, but this change is considered to have almost no influence on sound image localization or the sense of spaciousness.

ＦＦＴシフトサイズＳＳを１６サンプルとした場合には、遅延量の誤差に相当する距離の誤差は約５．６ｃｍとなり、音像定位および音の広がり感に与える影響はさらに小さくなる。 When the FFT shift size SS is 16 samples, the distance error corresponding to the delay amount error is about 5.6 cm, and the influence on the sound image localization and the sound spread is further reduced.
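The delay adjustment and its error bound can be sketched as follows. The embodiment does not state how a delay is mapped to a multiple of the FFT shift size. Rounding to the nearest multiple is assumed here because it matches the stated maximum error of 16 samples for SS = 32. The function names are illustrative. The default sampling frequency of 48 kHz and speed of sound of 340 m/s follow the figures above.

```python
def adjust_delay(delay_samples, shift_size):
    """Round a delay time difference (in samples) to the nearest multiple of the FFT shift size.

    Returns (adjusted delay in samples, number of delay blocks). Assumes delay_samples >= 0.
    With round-to-nearest the adjustment error never exceeds shift_size / 2.
    """
    blocks = int(delay_samples / shift_size + 0.5)
    return blocks * shift_size, blocks

def distance_error_m(error_samples, fs=48000, c=340.0):
    """Path-length error (in metres) corresponding to a delay error in samples."""
    return error_samples / fs * c

adjusted, blocks = adjust_delay(100, 32)   # 100 samples -> 96 samples, 3 delay blocks
worst_32 = distance_error_m(32 / 2)        # worst case for SS = 32
worst_16 = distance_error_m(16 / 2)        # worst case for SS = 16
```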

（１０）他の実施の形態 (10) Other Embodiments
（ａ）上記の実施の形態では、畳み込み演算処理に分割オーバラップセーブ法を用いているが、本発明はこれに限定されない。例えば、畳み込み演算処理に分割ＨＲＴＦブロックを用いたオーバラップアド（Overlap-Add）法を用いてもよい。以下、分割ＨＲＴＦブロックを用いたオーバラップアド法を分割オーバラップアド法と呼ぶ。 (a) In the above embodiment, the divided overlap-save method is used for the convolution calculation processing, but the present invention is not limited to this. For example, an overlap-add method using divided HRTF blocks may be used instead. Hereinafter, the overlap-add method using divided HRTF blocks is referred to as the divided overlap-add method.

図１４は分割オーバラップアド法を用いた場合の時間領域の原音響信号および周波数領域の音響ブロックの説明図である。図１４において、時間は右から左へ経過する。 FIG. 14 is an explanatory diagram of a time-domain original sound signal and a frequency-domain sound block when the divided overlap add method is used. In FIG. 14, time elapses from right to left.

分割オーバラップアド法では、原音響信号ＶＩＮにおいて、現在入力されているＮサンプルの単位ブロックｖｎにＮサンプルの０が付加され、２Ｎサンプルの信号部分ｘｎがＦＦＴにより音響ブロックＸｎに変換される。同様に、単位ブロックｖｎ−１にＮサンプルの０が付加され、２Ｎサンプルの信号部分ｘｎ−１がＦＦＴにより音響ブロックＸｎ−１に変換される。また、単位ブロックｖｎ−２にＮサンプルの０が付加され、２Ｎサンプルの信号部分ｘｎ−２がＦＦＴにより音響ブロックＸｎ−２に変換される。さらに、単位ブロックｖｎ−３にＮサンプルの０が付加され、２Ｎサンプルの信号部分ｘｎ−３がＦＦＴにより音響ブロックＸｎ−３に変換される。この場合にも、ＦＦＴシフトサイズＳＳはＮサンプルである。周波数領域での畳み込み演算は、分割オーバラップセーブ法を用いた場合と同様である。 In the divided overlap-add method, N zero samples are appended to the currently input N-sample unit block vn of the original acoustic signal VIN. The resulting 2N-sample signal portion xn is converted into the acoustic block Xn by FFT. The same is done for the earlier unit blocks:

- N zeros are appended to vn−1, and the 2N-sample signal portion xn−1 is converted into the acoustic block Xn−1 by FFT.
- N zeros are appended to vn−2, and the 2N-sample signal portion xn−2 is converted into the acoustic block Xn−2 by FFT.
- N zeros are appended to vn−3, and the 2N-sample signal portion xn−3 is converted into the acoustic block Xn−3 by FFT.

In this case as well, the FFT shift size SS is N samples. The convolution calculation in the frequency domain is the same as when the divided overlap-save method is used.

図１５は分割オーバラップアド法を用いた場合の時間領域での音響信号のつなぎ合わせを示す図である。図１５に示すように、今回の処理で得られた音響信号ｙｎの前半部分のＮサンプルと前回の処理で得られた音響信号ｙｎ−１の後半部分のＮサンプルとが加算される。この操作が順次行われることにより音響信号ＶＯＵＴが逐次出力される。 FIG. 15 is a diagram showing stitching of acoustic signals in the time domain when the divided overlap add method is used. As shown in FIG. 15, the N samples in the first half of the acoustic signal yn obtained by the current processing and the N samples in the second half of the acoustic signal yn-1 obtained by the previous processing are added. By sequentially performing these operations, the acoustic signal VOUT is sequentially output.
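The stitching of FIG. 15 can be checked with the sketch below, which models the divided overlap-add method for a single sound. To keep the sketch short and dependency-free, the FFT, complex vector multiplication and IFFT of each zero-padded 2N-sample block are replaced by a direct linear convolution. This yields the same 2N samples, because the zero padding prevents circular wrap-around. The function names are hypothetical, and the impulse response is assumed to have at most M×N taps.

```python
def linear_convolve(a, b):
    """Full linear convolution (len(a) + len(b) - 1 samples)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def divided_overlap_add(signal, h, N, M):
    """Divided overlap-add for one sound; h has at most M*N taps, len(signal) % N == 0."""
    h = list(h) + [0.0] * (M * N - len(h))
    parts = [h[m * N:(m + 1) * N] for m in range(M)]            # divided responses
    blocks = [list(signal[i * N:(i + 1) * N]) for i in range(len(signal) // N)]
    out, tail = [], [0.0] * N                                     # tail = second half of y_{n-1}
    for n in range(len(blocks)):
        y = [0.0] * (2 * N)                                       # y_n, 2N samples
        for m in range(M):
            if n - m >= 0:
                # Stands in for IFFT(X_{n-m} * H_m) of the zero-padded 2N-sample blocks.
                for i, val in enumerate(linear_convolve(blocks[n - m], parts[m])):
                    y[i] += val
        out.extend(y[i] + tail[i] for i in range(N))             # first half of y_n + second half of y_{n-1}
        tail = y[N:]
    return out
```

Its output can be compared with a direct convolution of the whole signal. The match confirms that adding the first half of yn to the second half of yn−1 restores a continuous output.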

（ｂ）上記実施の形態では、周波数領域の頭部伝達関数が複数の分割ＨＲＴＦブロックに分割されているが、これに限定されない。本発明は、頭部伝達関数の分割数Ｍが１の場合にも適用される。分割数Ｍが１の場合のオーバラップセーブ法は通常のオーバラップセーブ法であり、分割数Ｍが１の場合のオーバラップアド法は通常のオーバラップアド法である。 (b) In the above embodiment, the frequency-domain head-related transfer function is divided into a plurality of divided HRTF blocks, but the present invention is not limited to this. The present invention also applies when the number of divisions M of the head-related transfer function is 1. With M = 1, the overlap-save method is the ordinary overlap-save method and the overlap-add method is the ordinary overlap-add method.

ここで、時間領域の音響信号の単位ブロックが例えば１２８サンプルからなるものとする。通常のオーバラップセーブ法では、１２８サンプルの時間領域の頭部インパルス応答に１２８サンプルの０を付加し、合計２５６サンプルをＦＦＴにより周波数領域の頭部伝達関数に変換する。また、今回入力された１２８サンプルの音響信号と前回入力された１２８サンプルの音響信号とからなる２５６サンプルの信号部分をＦＦＴにより周波数領域の音響ブロックに変換する。その後、周波数領域の頭部伝達関数と周波数領域の音響ブロックとを複素ベクトル乗算し、乗算結果をＩＦＦＴにより２５６サンプルの時間領域の音響信号に変換する。最後に、時間領域の音響信号の半分を破棄し、残りの１２８サンプルの音響信号を得る。今回得られた１２８サンプルの音響信号を前回得られた１２８サンプルの音響信号につなぎ合わせる。 Suppose, for example, that a unit block of the time-domain acoustic signal consists of 128 samples. The ordinary overlap-save method then proceeds as follows:

1. 128 zeros are appended to the 128-sample time-domain head-related impulse response, and the resulting 256 samples are converted by FFT into the frequency-domain head-related transfer function.
2. The 256-sample signal portion, made up of the 128 samples input in the current cycle and the 128 samples input in the previous cycle, is converted by FFT into a frequency-domain acoustic block.
3. The frequency-domain head-related transfer function and the frequency-domain acoustic block are multiplied as complex vectors, and the product is converted by IFFT into a 256-sample time-domain acoustic signal.
4. Half of this time-domain signal is discarded, and the remaining 128 samples are joined to the 128 samples obtained in the previous cycle.
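The ordinary overlap-save procedure just described can be sketched as follows for a single undivided response (M = 1). The sketch computes directly in the time domain the circular convolution that the FFT, complex multiplication and IFFT would produce. This is slower, but it makes the discarded half easy to see. The function names are hypothetical, and N defaults to the 128 samples of the example (a smaller N can be passed for testing).

```python
def circular_convolve(x, h):
    """Circular convolution of two equal-length sequences, i.e. IFFT(FFT(x) * FFT(h))."""
    L = len(x)
    return [sum(h[t] * x[(i - t) % L] for t in range(L)) for i in range(L)]

def overlap_save(signal, hrir, N=128):
    """Ordinary overlap-save (M = 1) for a head-related impulse response of at most N taps."""
    h = list(hrir) + [0.0] * (2 * N - len(hrir))   # zero-padded to 2N samples (at least N zeros)
    prev, out = [0.0] * N, []
    for n in range(len(signal) // N):
        cur = list(signal[n * N:(n + 1) * N])
        y = circular_convolve(prev + cur, h)       # 2N samples
        out.extend(y[N:])                          # first half is corrupted by wrap-around: discard
        prev = cur
    return out
```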

通常のオーバラップアド法が通常のオーバラップセーブ法と異なるのは次の点である。今回入力された１２８サンプルの時間領域の音響信号に１２８サンプルの０を付加し、０を含む２５６サンプルの信号部分をＦＦＴにより周波数領域の音響ブロックに変換する。ＩＦＦＴにより得られた２５６サンプルの時間領域の音響信号を前回得られた２５６サンプルの時間領域の音響信号と１２８サンプル分重なるように加算する。 The ordinary overlap-add method differs from the ordinary overlap-save method in two respects:

- 128 zeros are appended to the 128-sample time-domain acoustic signal input in the current cycle, and the resulting 256-sample signal portion, including the zeros, is converted by FFT into a frequency-domain acoustic block.
- The 256-sample time-domain acoustic signal obtained by IFFT is added to the 256-sample signal obtained in the previous cycle so that the two overlap by 128 samples.

（ｃ）上記実施の形態では、本発明が仮想空間における音を再現するために用いられるが、本発明はこれに限定されない。本発明は、実際の音響空間における音を再現するための残響付与装置に適用することも可能である。この場合、周波数領域の頭部伝達関数の代わりに、インパルス応答をＦＦＴすることにより得られる周波数領域の音響伝達関数が用いられる。 (C) In the above embodiment, the present invention is used to reproduce sound in a virtual space, but the present invention is not limited to this. The present invention can also be applied to a reverberation imparting device for reproducing sound in an actual acoustic space. In this case, a frequency domain acoustic transfer function obtained by performing FFT on the impulse response is used instead of the frequency domain head related transfer function.

（ｄ）上記実施の形態では、音響信号入力部１３が原音響信号ＶＩＮを入力し、音響信号出力部１２が音響信号ｙｎを出力するが、本発明はこれに限定されない。音響信号入力部１３がＷＡＶファイル等のファイル形式の原音響信号を入力してもよく、音響信号出力部１２がＷＡＶファイル等のファイル形式の音響信号を出力してもよい。また、本発明は、音響シミュレーションを行うための音響シミュレーション装置に適用することも可能である。 (D) In the above embodiment, the acoustic signal input unit 13 inputs the original acoustic signal VIN and the acoustic signal output unit 12 outputs the acoustic signal yn. However, the present invention is not limited to this. The acoustic signal input unit 13 may input an original acoustic signal in a file format such as a WAV file, and the acoustic signal output unit 12 may output an acoustic signal in a file format such as a WAV file. The present invention can also be applied to an acoustic simulation apparatus for performing acoustic simulation.

（ｅ）図１１のステップＳ３３において、音響信号出力部１２は、音響信号ｙｎを図７の遅延量ｄ０分遅延させて出力してもよい。 (e) In step S33 of FIG. 11, the acoustic signal output part 12 may delay the acoustic signal yn by the delay amount d0 of FIG. 7 before outputting it.

（ｆ）上記実施の形態では、原音声信号ＶＩＮの全体の周波数帯域について図１０および図１１の音響信号処理が行われるが、これに限定されない。例えば、原音声信号ＶＩＮの全体の周波数帯域が高域および低域に分割され、高域および低域の各々について上記の音響信号処理が行われてもよい。 (f) In the above embodiment, the acoustic signal processing of FIG. 10 and FIG. 11 is performed over the entire frequency band of the original acoustic signal VIN, but the present invention is not limited to this. For example, the entire frequency band of the original acoustic signal VIN may be divided into a high band and a low band, and the above acoustic signal processing may be performed for each band.

（ｇ）上記実施の形態では、時間領域の原音響信号を周波数領域の音響ブロックに変換するための時間−周波数変換としてＦＦＴを用いているが、本発明はこれに限定されない。時間−周波数変換として、例えばラプラス変換、Ｚ変換またはメリン（Ｍｅｌｌｉｎ）変換等の他の直交変換を用いてもよい。また、上記実施の形態では、周波数領域の畳み込み演算結果の加算結果を時間領域の音響信号に変換するための周波数−時間変換としてＩＦＦＴを用いているが、本発明はこれに限定されない。周波数−時間変換として、例えば逆ラプラス変換、逆Ｚ変換または逆メリン変換等の他の逆直交変換を用いてもよい。 (g) In the above embodiment, FFT is used as the time-frequency transform that converts the time-domain original acoustic signal into frequency-domain acoustic blocks, but the present invention is not limited to this. Other orthogonal transforms, such as the Laplace transform, the Z transform or the Mellin transform, may be used as the time-frequency transform. Likewise, in the above embodiment, IFFT is used as the frequency-time transform that converts the sum of the frequency-domain convolution results into a time-domain acoustic signal, but the present invention is not limited to this. Other inverse orthogonal transforms, such as the inverse Laplace transform, the inverse Z transform or the inverse Mellin transform, may be used as the frequency-time transform.

（ｈ）上記実施の形態では、音響信号処理装置１００の全体が同一のサンプリング周波数ｆｓで動作するが、これに限定されない。音響信号処理装置１００の一部が適宜サンプリング周波数変換処理を行うことによりサンプリング周波数ｆｓとは異なるサンプリング周波数で動作してもよい。 (H) In the above embodiment, the entire acoustic signal processing apparatus 100 operates at the same sampling frequency fs, but is not limited to this. A part of the acoustic signal processing apparatus 100 may operate at a sampling frequency different from the sampling frequency fs by appropriately performing a sampling frequency conversion process.

（ｉ）上記実施の形態では、ＨＲＴＦデータベース３に複数組の分割ＨＲＴＦブロックが記憶されているが、例えば、複数組の分割ＨＲＴＦブロックがインターネット上のサーバ等に記憶され、音響信号処理装置１００がサーバ等から複数組の分割ＨＲＴＦブロックをダウンロードして用いてもよい。この場合、音響信号処理装置１００がＨＲＴＦデータベース３を備えなくてもよい。 (I) In the above embodiment, a plurality of sets of divided HRTF blocks are stored in the HRTF database 3, but for example, a plurality of sets of divided HRTF blocks are stored in a server or the like on the Internet. A plurality of sets of divided HRTF blocks may be downloaded from a server or the like. In this case, the acoustic signal processing apparatus 100 may not include the HRTF database 3.

（ｊ）上記実施の形態では、単一の音響信号処理装置１００について説明しているが、左耳用および右耳用の一対の音響信号処理装置１００が設けられてもよい。この場合、図１に示される複数の構成要素のうち一部の構成要素が左耳用および右耳用の音響信号処理装置１００に共通に用いられてもよい。 (J) Although the single acoustic signal processing apparatus 100 has been described in the above embodiment, a pair of acoustic signal processing apparatuses 100 for the left ear and the right ear may be provided. In this case, some of the components shown in FIG. 1 may be commonly used for the left ear and right ear acoustic signal processing apparatuses 100.

（１１）請求項の各構成要素と実施の形態の各部との対応 (11) Correspondence Between the Constituent Elements of the Claims and the Parts of the Embodiment
以下、請求項の各構成要素と実施の形態の各部との対応の例について説明するが、本発明は下記の例に限定されない。 Examples of the correspondence between the constituent elements of the claims and the parts of the embodiment are described below, but the present invention is not limited to these examples.

上記実施の形態では、主音源Ｓ０が第１の音源の例であり、仮想音源Ｓ１，Ｓ２，…，Ｓｋが第２の音源の例であり、受音点Ｒが受音点の例であり、直接音Ｖ０が第１の音の例であり、反射音Ｖ１，Ｖ２，…，Ｖｋが第２の音の例である。 In the above embodiment, the main sound source S0 is an example of the first sound source, the virtual sound sources S1, S2,..., Sk are examples of the second sound source, and the sound receiving point R is an example of the sound receiving point. The direct sound V0 is an example of the first sound, and the reflected sounds V1, V2,..., Vk are examples of the second sound.

遅延量算出部６が算出部の例であり、ＨＲＴＦデータベース３が記憶部の例であり、遅延量調整部７が調整部の例であり、音響ブロック選択部９が選択部の例であり、周波数領域変換部１４が第１の変換部の例であり、畳み込み演算部１０が演算部の例であり、時間領域変換部１１が第２の変換部の例である。 The delay amount calculation unit 6 is an example of a calculation unit, the HRTF database 3 is an example of a storage unit, the delay amount adjustment unit 7 is an example of an adjustment unit, and the acoustic block selection unit 9 is an example of a selection unit, The frequency domain conversion unit 14 is an example of a first conversion unit, the convolution calculation unit 10 is an example of a calculation unit, and the time domain conversion unit 11 is an example of a second conversion unit.

分割ＨＲＴＦブロックＨ０，０，Ｈ０，１，Ｈ０，２，Ｈ０，３が第１の音響伝達関数または複数の第１の分割伝達関数の例であり、分割ＨＲＴＦブロックＨ１，０，Ｈ１，１，Ｈ１，２，Ｈ１，３、分割ＨＲＴＦブロックＨ２，０，Ｈ２，１，Ｈ２，２，Ｈ２，３および分割ＨＲＴＦブロックＨｋ，０，Ｈｋ，１，Ｈｋ，２，Ｈｋ，３が第２の音響伝達関数または複数の第２の分割伝達関数の例であり、頭部インパルス応答ｈ０が第１の音響応答特性の例であり、頭部インパルス応答ｈ１，ｈ２，…，ｈｋが第２の音響応答特性の例であり、分割ＨＲＩＲブロックｈ０，０，ｈ０，１，ｈ０，２，ｈ０，３が複数の第１の分割応答特性の例である。 The divided HRTF blocks H0, 0, H0, 1, H0, 2, H0, 3 are examples of the first acoustic transfer function or the plurality of first divided transfer functions, and the divided HRTF blocks H1, 0, H1, 1, H1,2, H1,3, divided HRTF blocks H2,0, H2,1, H2,2, H2,3 and divided HRTF blocks Hk, 0, Hk, 1, Hk, 2, Hk, 3 are the second sound. It is an example of a transfer function or a plurality of second divided transfer functions, a head impulse response h0 is an example of a first acoustic response characteristic, and head impulse responses h1, h2,..., Hk are second acoustic responses. This is an example of characteristics, and divided HRIR blocks h0, 0, h0, 1, h0, 2, h0, 3 are examples of a plurality of first divided response characteristics.

原音響信号ＶＩＮが原音響信号の例であり、ＦＦＴシフトサイズＳＳが一定のシフト量の例であり、音響ブロックＸｎ，Ｘｎ−１，…，Ｘｎ−Ｍ＋１が第１の信号部分の例であり、音響ブロックＸｎ−Ｍ１，Ｘｎ−１−Ｍ１，…，Ｘｎ−Ｍ＋１−Ｍ１および音響ブロックＸｎ−Ｍｋ，Ｘｎ−１−Ｍｋ，…，Ｘｎ−Ｍ＋１−Ｍｋが第２の信号部分の例であり、音響信号ＶＯＵＴが時間領域の音響信号の例であり、Ｎサンプルが第１のサンプル数の例であり、２Ｎサンプルが第２のサンプル数の例である。 The original acoustic signal VIN is an example of the original acoustic signal, the FFT shift size SS is an example of a constant shift amount, and the acoustic blocks Xn, Xn−1,..., Xn−M + 1 are examples of the first signal portion. , Acoustic blocks Xn-M1, Xn-1-M1,..., Xn-M + 1-M1 and acoustic blocks Xn-Mk, Xn-1-Mk,. The acoustic signal VOUT is an example of a time domain acoustic signal, N samples are examples of the first number of samples, and 2N samples are examples of the second number of samples.

Various other elements having the configurations or functions recited in the claims may also be used as the respective constituent elements of the claims.

The present invention can be used, for example, to reproduce sound arriving at a sound receiving point in an acoustic space.

DESCRIPTION OF SYMBOLS
1 Room shape indication unit
2 Main sound source position indication unit
3 HRTF database
4 HRTF block selection unit
5 Virtual sound source position calculation unit
6 Delay amount calculation unit
7 Delay amount adjustment unit
8 Delay block number calculation unit
9 Acoustic block selection unit
10 Convolution operation unit
11 Time domain conversion unit
12 Acoustic signal output unit
13 Acoustic signal input unit
14 Frequency domain conversion unit
15 Frequency domain acoustic buffer
100 Acoustic signal processing device
110 CPU
120 ROM
130 RAM
140 Storage device
150 Display device
160 Input device
170 Output device
300 Virtual space

Claims

An acoustic signal processing apparatus that outputs an acoustic signal representing a sound obtained by mixing a first sound, which is radiated by a first sound source and arrives at a sound receiving point, and at least one second sound, which is radiated by at least one second sound source and arrives at the sound receiving point later than the first sound, the apparatus comprising:
a calculation unit that calculates a delay time difference between the first sound and the second sound;
a first conversion unit that obtains a frequency-domain acoustic signal by sequentially applying a time-frequency transform to an original acoustic signal, which represents the first sound radiated by the first sound source, while shifting the original acoustic signal by a constant shift amount along the time axis;
an adjustment unit that adjusts the delay time difference calculated by the calculation unit to an integral multiple of the time corresponding to the shift amount of the time-frequency transform;
a selection unit that selects a first signal portion corresponding to the first sound from the frequency-domain acoustic signal obtained by the first conversion unit, and selects a second signal portion corresponding to the second sound from the frequency-domain acoustic signal obtained by the first conversion unit on the basis of the delay time difference adjusted by the adjustment unit;
an arithmetic unit that performs, in the frequency domain, a first convolution operation of a first acoustic transfer function from the first sound source to the sound receiving point with the first signal portion selected by the selection unit and a second convolution operation of a second acoustic transfer function from the second sound source to the sound receiving point with the second signal portion selected by the selection unit, and adds the results of the first and second convolution operations; and
a second conversion unit that converts the result of the addition by the arithmetic unit into a time-domain acoustic signal.
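The adjustment and selection steps of the apparatus claim above reduce to integer arithmetic on the shift size. A minimal Python sketch, assuming rounding to the nearest multiple (the claim only requires an integral multiple); `adjust_delay` and `select_block_indices` are hypothetical helper names, and the returned indices follow the Xn-m-Mk pattern of the acoustic blocks:

```python
def adjust_delay(delay_s, shift_size, fs):
    """Quantize a delay time difference to a whole number of FFT shifts.

    Returns (M_k, adjusted delay in seconds). Rounding to the nearest multiple is
    an assumption; the claim only requires an integral multiple of the shift time.
    """
    shift_time = shift_size / fs                    # time corresponding to one shift SS
    m_k = max(0, round(delay_s / shift_time))       # a reflection never precedes the direct sound
    return m_k, m_k * shift_time


def select_block_indices(n, m_k, num_divisions):
    """Buffer indices n - m - M_k (m = 0 .. num_divisions - 1) for one sound path."""
    return [n - m - m_k for m in range(num_divisions)]
```

For example, with fs = 48 kHz and SS = 512 samples, a 25 ms delay time difference becomes Mk = 2 shifts (about 21.3 ms), and at shift n = 10 with four divisions the selected blocks are X8, X7, X6, X5.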

The acoustic signal processing apparatus according to claim 1, wherein the first conversion unit sequentially obtains unit blocks of a first number of samples from the original acoustic signal and performs a fast Fourier transform on an acoustic signal of a second number of samples, larger than the first number of samples, that includes the unit block;
the first conversion unit, the arithmetic unit, and the second conversion unit perform the fast Fourier transform, the first and second convolution operations, and the conversion into the time-domain acoustic signal by an overlap-save method or an overlap-add method; and
the shift amount of the fast Fourier transform is equal to the number of samples of the unit block.
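The claim above describes the FFT stage in overlap-save or overlap-add form, with a frame of 2N samples and a shift of N samples. A minimal NumPy sketch of overlap-save for a single, undivided filter, assuming the FFT size is exactly 2N and the filter has at most N + 1 taps; `overlap_save` is a hypothetical helper, not the patent's implementation:

```python
import numpy as np

def overlap_save(x, h, n):
    """Convolve x with an FIR response h by overlap-save (FFT size 2N, shift N)."""
    if len(h) > n + 1:
        raise ValueError("h must have at most n + 1 taps for an FFT size of 2n")
    fft_size = 2 * n
    H = np.fft.rfft(h, fft_size)                    # zero-padded transfer function
    total = len(x) + len(h) - 1                     # full linear-convolution length
    n_blocks = -(-total // n)                       # ceil(total / n)
    # N leading zeros play the role of the "previous" half of the first frame.
    buf = np.concatenate([np.zeros(n), x, np.zeros(n_blocks * n - len(x))])
    y = np.empty(n_blocks * n)
    for b in range(n_blocks):
        frame = buf[b * n:b * n + fft_size]         # previous N samples + current unit block
        out = np.fft.irfft(np.fft.rfft(frame) * H, fft_size)
        y[b * n:(b + 1) * n] = out[n:]              # drop the first N, which may wrap around
    return y[:total]
```

Each call of the loop body consumes exactly one new unit block of N samples, which is why the shift amount equals the unit-block length.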

The acoustic signal processing apparatus according to claim 2, wherein the first acoustic transfer function includes a plurality of first divided transfer functions obtained by fast Fourier transforming a plurality of first divided response characteristics into which a first time-domain acoustic response characteristic from the first sound source to the sound receiving point is divided;
the second acoustic transfer function includes a plurality of second divided transfer functions obtained by fast Fourier transforming a plurality of second divided response characteristics into which a second time-domain acoustic response characteristic from the second sound source to the sound receiving point is divided;
the selection unit selects a number of first signal portions corresponding to the number of divisions of the plurality of first divided transfer functions, and a number of second signal portions corresponding to the number of divisions of the plurality of second divided transfer functions; and
the arithmetic unit performs, in the frequency domain, the first convolution operation of the plurality of first divided transfer functions with the plurality of first signal portions selected by the selection unit, and the second convolution operation of the plurality of second divided transfer functions with the plurality of second signal portions selected by the selection unit.
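The claim above, read with claim 1, amounts to a uniformly partitioned overlap-save convolution in which every path reads older spectra from one shared frequency-domain buffer. The sketch below assumes the division length equals the shift size N, that path 0 is the direct sound with M0 = 0, and that each reflection's adjusted delay Mk is already known in whole shifts; `partitioned_mix` is a hypothetical name. Each new 2N-sample frame is transformed once, the products of divided transfer functions and selected spectra are summed as complex vectors, and one inverse FFT per shift yields N output samples:

```python
import numpy as np

def partitioned_mix(x, responses, block_delays, n):
    """Mix a direct sound and delayed reflections by uniformly partitioned overlap-save.

    responses[k]    : time-domain response of path k (k = 0 is the direct sound)
    block_delays[k] : adjusted delay M_k in whole shifts of n samples (M_0 = 0)
    n               : shift size N; the FFT size is 2N (assumed)
    """
    fft_size = 2 * n
    # Divide each response into N-sample parts and FFT each part ("divided HRTF blocks").
    parts = []
    for h in responses:
        m = -(-len(h) // n)                                  # number of divisions
        h = np.concatenate([h, np.zeros(m * n - len(h))])
        parts.append([np.fft.rfft(h[i * n:(i + 1) * n], fft_size) for i in range(m)])
    depth = max(d + len(p) for d, p in zip(block_delays, parts))   # spectra to keep
    total = len(x) + max(d * n + len(h) for d, h in zip(block_delays, responses)) - 1
    n_blocks = -(-total // n)
    buf = np.concatenate([np.zeros(n), x, np.zeros(n_blocks * n - len(x))])
    # Frequency-domain buffer: fdl[j] holds the spectrum of the frame j shifts old.
    fdl = [np.zeros(n + 1, dtype=complex) for _ in range(depth)]
    y = np.empty(n_blocks * n)
    for b in range(n_blocks):
        # Forward FFT only once per shift: the newest 2N-sample frame becomes fdl[0].
        fdl = [np.fft.rfft(buf[b * n:b * n + fft_size])] + fdl[:-1]
        acc = np.zeros(n + 1, dtype=complex)
        for d, blocks in zip(block_delays, parts):
            for i, H in enumerate(blocks):
                acc += H * fdl[i + d]        # selects X_{n-i-M_k}; complex vector addition
        # One inverse FFT per shift; keep the last N (alias-free) samples.
        y[b * n:(b + 1) * n] = np.fft.irfft(acc, fft_size)[n:]
    return y[:total]
```

In this sketch, adding a reflection costs only extra complex multiply-adds per shift rather than extra forward FFTs, because the quantized delay lets every path reuse the buffered spectra.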

The acoustic signal processing apparatus according to claim 1, wherein the first sound is a direct sound that arrives at the sound receiving point from the first sound source without being reflected, the second sound is a reflected sound that arrives at the sound receiving point from the first sound source after being reflected, and the second sound source is a virtual sound source that virtually radiates the reflected sound.
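The claim above treats each reflection as radiated by a virtual sound source. One common way to place such a source, assumed here rather than taken from the text, is the image-source method: mirror the real source across the reflecting plane, then take the path-length difference divided by the speed of sound as the delay time difference. A sketch for a single first-order reflection off an axis-aligned wall; `mirror`, `reflection_delay`, and the 343 m/s value are assumptions:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s; assumed value for air at roughly 20 °C

def mirror(point, axis, wall):
    """Reflect a 3-D point across the plane where coordinate `axis` equals `wall`."""
    p = list(point)
    p[axis] = 2.0 * wall - p[axis]
    return tuple(p)

def reflection_delay(source, listener, axis, wall, c=SPEED_OF_SOUND):
    """Delay time difference (s) of a first-order reflection relative to the direct sound."""
    image = mirror(source, axis, wall)          # virtual sound source of the reflection
    direct = math.dist(source, listener)        # direct-sound path length
    reflected = math.dist(image, listener)      # reflected path length via the image
    return (reflected - direct) / c
```

With the source at (2, 3, 1.5) m, the listener at (6, 3, 1.5) m, and a wall at x = 0, the image lies at (-2, 3, 1.5) m and the reflection arrives 4 m / 343 m/s, about 11.7 ms, after the direct sound.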

An acoustic signal processing program executable by a computer to output an acoustic signal representing a sound obtained by mixing a first sound, which is radiated by a first sound source and arrives at a sound receiving point, and at least one second sound, which is radiated by at least one second sound source and arrives at the sound receiving point later than the first sound, the program causing the computer to execute:
a process of calculating a delay time difference between the first sound and the second sound;
a process of obtaining a frequency-domain acoustic signal by sequentially applying a time-frequency transform to an original acoustic signal, which represents the first sound radiated by the first sound source, while shifting the original acoustic signal by a constant shift amount along the time axis;
a process of adjusting the calculated delay time difference to an integral multiple of the time corresponding to the shift amount of the time-frequency transform;
a process of selecting a first signal portion corresponding to the first sound from the frequency-domain acoustic signal, and selecting a second signal portion corresponding to the second sound from the frequency-domain acoustic signal on the basis of the adjusted delay time difference;
a process of performing, in the frequency domain, a first convolution operation of a first acoustic transfer function from the first sound source to the sound receiving point with the selected first signal portion and a second convolution operation of a second acoustic transfer function from the second sound source to the sound receiving point with the selected second signal portion, and adding the results of the first and second convolution operations; and
a process of converting the result of the addition into a time-domain acoustic signal.