JP5815435B2

JP5815435B2 - Sound source position determination apparatus, sound source position determination method, program

Info

Publication number: JP5815435B2
Application number: JP2012035131A
Authority: JP
Inventors: 賢一野口; 末廣島内; 仲大室; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-02-21
Filing date: 2012-02-21
Publication date: 2015-11-17
Anticipated expiration: 2032-02-21
Also published as: JP2013170936A

Description

The present invention relates to a sound source position determination apparatus, a sound source position determination method, and a program that determine whether a sound source is near to or far from a microphone.

One method for measuring the distance between a sound source and a microphone is disclosed in Patent Document 1. This method uses a microphone array composed of multiple microphones to obtain the direct-to-indirect ratio, that is, the ratio of the direct sound to the indirect sound (reverberation) contained in the input signal. Because this ratio decreases monotonically as the distance between the microphone and the sound source increases, obtaining it allows the distance between the microphone and the sound source to be measured.

Patent Document 1: JP 2011-53062 A

However, because Patent Document 1 measures the distance between the sound source and the microphone with a microphone array composed of multiple microphones, the apparatus cost becomes large: the cost of the multiple microphone devices, the cost of the A/D converters that convert their outputs to digital signals, and the computational cost of processing multi-channel signals. An object of the present invention is therefore to provide a sound source position determination apparatus, a sound source position determination method, and a program that can determine whether a sound source is near to or far from a microphone using only a single-channel microphone input signal.

The sound source position determination apparatus of the present invention includes a frame division unit, a feature amount calculation unit, and a first near/far determination unit. The frame division unit divides the input signal into frames. The feature amount calculation unit calculates a feature amount based on the arrival time difference between the direct sound and the indirect sound of the sound source signal contained in the input signal. The first near/far determination unit compares the calculated feature amount with a predetermined threshold to determine whether the sound source is near to or far from the microphone.

According to the sound source position determination apparatus of the present invention, whether a sound source is near to or far from the microphone can be determined using only a single-channel microphone input signal.

FIG. 1 is a block diagram showing the configuration of the sound source position determination apparatus of Embodiment 1.
FIG. 2 is a flowchart showing the operation of the sound source position determination apparatus of Embodiment 1.
FIG. 3 is a diagram showing the time waveform and the power spectrum time variation of an impact sound generated near the microphone.
FIG. 4 is a diagram showing the time waveform and the power spectrum time variation of an impact sound generated far from the microphone.
FIG. 5 is a block diagram showing the configuration of the sound source position determination apparatus of Embodiment 2.
FIG. 6 is a flowchart showing the operation of the sound source position determination apparatus of Embodiment 2.
FIG. 7 is a diagram illustrating an arrangement of a sound source, a microphone, and a wall surface.
FIG. 8 is a diagram showing the relationship between the arrival time difference of the direct and indirect sounds and the distance between the sound source and the microphone.
FIG. 9 is a block diagram showing the configuration of the sound source position determination apparatus of Embodiment 3.
FIG. 10 is a block diagram showing the configuration of the power ratio feature amount calculation unit of Embodiment 3.
FIG. 11 is a flowchart showing the operation of the sound source position determination apparatus of Embodiment 3.
FIG. 12 is a block diagram showing the configuration of the sound source position determination apparatus of Embodiment 4.
FIG. 13 is a flowchart showing the operation of the sound source position determination apparatus of Embodiment 4.
FIG. 14 is a block diagram showing the configuration of the sound source position determination apparatus of Embodiment 5.
FIG. 15 is a flowchart showing the operation of the sound source position determination apparatus of Embodiment 5.
FIG. 16 is a diagram showing an example of the feature amount database of Embodiment 5.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference number, and duplicate description is omitted.
<Key points of the present invention>
When a sound is generated near the microphone, there is a difference in arrival time between the direct sound and the indirect sound, so immediately after the sound occurs most of the microphone input signal is direct sound and there is little indirect sound (reverberation). When a sound is generated far from the microphone, on the other hand, the difference in arrival time between the direct sound and the indirect sound is small, so the microphone input signal is a mixture of direct and indirect sound even immediately after the sound occurs. The present invention exploits this difference: by analyzing the signal immediately after a sound occurs, it determines from the characteristics of the direct and indirect sound whether the sound was generated near to or far from the microphone.

The sound source position determination apparatus of Embodiment 1 is described in detail with reference to FIGS. 1, 2, 3, and 4. FIG. 1 is a block diagram showing the configuration of the sound source position determination apparatus 1 of this embodiment. FIG. 2 is a flowchart showing the operation of the sound source position determination apparatus 1 of this embodiment. FIG. 3 shows the time waveform and the power spectrum time variation of an impact sound generated near the microphone. FIG. 4 shows the time waveform and the power spectrum time variation of an impact sound generated far from the microphone.

The sound source position determination apparatus 1 of this embodiment includes a microphone 10, a frame division unit 20, a feature amount calculation unit 30, a near/far determination unit 40, and a threshold storage unit 50. The feature amount calculation unit 30 includes frequency domain conversion means 31, power spectrum calculation means 32, power spectrum storage means 33, and power spectrum change calculation means 34.

This embodiment describes an example that targets impact sounds, such as the sound of an object being struck or of objects colliding, and determines whether the sound was generated near to or far from the microphone.

The microphone 10 picks up sound (S10). The microphone input signal can be, for example, a digital signal with a sampling frequency of 16 kHz and 16-bit quantization. The microphone input signal x(n), where n denotes discrete time, is input to the frame division unit 20. The frame division unit 20 divides the microphone input signal into frames to produce the frame-divided signal x_t(n) (S20).

Here, t denotes the frame number. The frame length can be, for example, 256 samples (16 ms), and the frame shift width, for example, 10 samples (0.625 ms). Because the later stages make the determination by comparing the feature amount obtained in one frame with that obtained in the next frame, the frame length and the frame shift width contribute greatly to determination accuracy. For impact sounds, accuracy is higher when the frame is at least long enough for the impact sound to fit within one frame. A long frame, however, also increases processing delay, which is a problem. In addition, a frame that is longer than necessary admits sounds other than the target sound, which lowers accuracy. For example, assuming that an impact sound lasts 10 ms, the frame length can be set to 16 ms. This method focuses on whether the target sound changes over very short time spans, so a short frame shift width that allows fine-grained analysis is desirable. A shorter frame shift width, however, increases the amount of processing. About 10 samples, for example, may be chosen as a frame shift width that preserves discrimination accuracy without being too short.

The signal x_t(n) divided by the frame division unit 20 is sent to the frequency domain conversion means 31. The frequency domain conversion means 31 converts the microphone input signal into the frequency domain signal X_t(k) (S31), where k denotes frequency. One way to convert to the frequency domain is to multiply the frame-divided signal x_t(n) by a Hanning window w and apply an FFT (fast Fourier transform). The frequency domain signal X_t(k) is sent to the power spectrum calculation means 32, which calculates the power spectrum signal from the frequency domain signal (S32). Specifically, the power spectrum signal P_t(k) is calculated by the following equation. Calculating the power spectrum has the advantage that phase changes of the signal between frames can be ignored.
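Steps S20 to S32 can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, a naive DFT stands in for the FFT so that only the standard library is needed, and P_t(k) = |X_t(k)|^2 is assumed because the equation itself is not reproduced in this text.

```python
import cmath
import math


def split_frames(x, frame_len=256, shift=10):
    """S20: split x(n) into overlapping frames x_t(n).

    256 samples and a 10-sample shift are the example values from the
    text (16 ms and 0.625 ms at 16 kHz).
    """
    return [x[s:s + frame_len]
            for s in range(0, len(x) - frame_len + 1, shift)]


def hann(n_len):
    """Hann (Hanning) window w of length n_len."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def power_spectrum(frame, window):
    """S31-S32: return P_t(k) for k = 0..N, where N is the Nyquist bin."""
    n_len = len(frame)
    xw = [a * w for a, w in zip(frame, window)]
    spec = []
    for k in range(n_len // 2 + 1):
        # Naive DFT; an FFT yields the same values with less computation.
        acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_len)
                  for n, v in enumerate(xw))
        # Assumption: the power spectrum is the squared magnitude.
        spec.append(abs(acc) ** 2)
    return spec
```

Taking the squared magnitude discards the phase of X_t(k), which is the property the text relies on when comparing successive frames.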

The power spectrum signal P_t(k) is sent to the power spectrum storage means 33. The power spectrum storage means 33 stores the power spectrum signal P_t(k) for each predetermined time interval and outputs the power spectrum signal of the time interval a predetermined time earlier (S33). Here, one frame of the power spectrum signal is stored, and when the current processing frame number is t, the power spectrum signal P_{t-1}(k) of the previous frame is output. Next, the power spectrum change calculation means 34 calculates the power spectrum time variation from the current power spectrum signal and the power spectrum signal of the time interval a predetermined time earlier (S34). Here, it takes the power spectrum signal P_t(k) of the current processing frame and the power spectrum signal P_{t-1}(k) of the previous frame as input and outputs the power spectrum time variation S_t. The temporal change of the power spectrum is obtained by the following equation, where N is the maximum value of k and corresponds to the Nyquist frequency.
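The equation for S_t is not reproduced in this text, so the following Python sketch only illustrates the data flow of steps S33 and S34. The form of S_t used here, a mean absolute log-spectral difference over k = 0..N, is an assumption chosen because it is independent of signal level; the actual formula may differ. The class and function names are hypothetical.

```python
import math


class PowerSpectrumStore:
    """S33: keeps one frame of history and hands back P_{t-1}(k)."""

    def __init__(self):
        self._prev = None

    def swap(self, p_curr):
        """Store the current spectrum and return the previous one (or None)."""
        prev, self._prev = self._prev, list(p_curr)
        return prev


def spectral_change(p_curr, p_prev, eps=1e-12):
    """S34 (assumed form): mean |log P_t(k) - log P_{t-1}(k)| over all bins.

    eps avoids log(0) for empty bins; it is an illustrative constant.
    """
    if len(p_curr) != len(p_prev):
        raise ValueError("spectra must have the same number of bins")
    total = sum(abs(math.log(c + eps) - math.log(p + eps))
                for c, p in zip(p_curr, p_prev))
    return total / len(p_curr)
```

With a 10-sample frame shift, successive frames overlap heavily, so whichever form is used, S_t measures how much the spectrum changes over 0.625 ms.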

The near/far determination unit 40 takes the power spectrum time variation S_t as input and compares it with a predetermined threshold to determine whether the sound source is near to or far from the microphone (S40). As shown in FIG. 3, for an impact sound generated near the microphone, the power spectrum time variation S_t rises sharply at the moment the sound occurs and then drops sharply. It rises sharply again at the moment the sound ends. The point of interest is where the power spectrum time variation drops sharply immediately after the sound occurs. As shown in FIG. 4, for an impact sound generated far from the microphone, the power spectrum time variation does not drop sharply.

When only stationary noise is present, the power spectrum time variation takes a nearly constant value. Taking this value S_c as a reference, it is multiplied by a threshold coefficient T_c to determine the threshold T for near/far determination. When the power spectrum time variation falls below the threshold T, the sound is judged to have been generated near the microphone; otherwise it is judged to have been generated far from the microphone, and the result is output. Here, the threshold T is set to 0.7 times the average power spectrum time variation observed when only stationary noise is present. This makes it possible to determine whether a sound was generated within 1 m of the microphone.
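The threshold rule T = T_c × S_c and the comparison in step S40 can be sketched in Python as follows. The function names are hypothetical; the coefficient 0.7 is the example value given in the text.

```python
def near_far_threshold(noise_only_changes, coeff=0.7):
    """Return T = T_c * S_c.

    S_c is taken as the mean of S_t over frames that contain only
    stationary noise; coeff is the threshold coefficient T_c.
    """
    s_c = sum(noise_only_changes) / len(noise_only_changes)
    return coeff * s_c


def judge(s_t, threshold):
    """S40: 'near' when S_t falls below the threshold, otherwise 'far'."""
    return "near" if s_t < threshold else "far"
```

For instance, if S_t averages 1.0 during noise-only frames, T is 0.7, so a frame with S_t = 0.5 is judged near and a frame with S_t = 0.9 is judged far.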

The threshold may instead be a fixed value determined in advance, which reduces computational cost. Alternatively, the point where the power spectrum time variation S_t rises sharply may be detected and the signal immediately after it judged against the threshold, which improves determination accuracy.
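The onset-gated variant described above might look like the following Python sketch. The onset criterion (`onset_factor` times the noise-only level) and the number of frames inspected after the onset (`lookahead`) are hypothetical parameters introduced for illustration; the text does not specify them.

```python
def judge_after_onset(changes, noise_level, onset_factor=3.0,
                      coeff=0.7, lookahead=8):
    """Judge near/far only in the frames right after a sharp rise in S_t.

    changes      -- sequence of S_t values, one per frame
    noise_level  -- S_c, the typical S_t under stationary noise only
    Returns 'near', 'far', or None when no onset is found.
    """
    threshold = coeff * noise_level
    for t, s_t in enumerate(changes):
        if s_t > onset_factor * noise_level:      # sharp rise: sound onset
            after = changes[t + 1:t + 1 + lookahead]
            # Near sources show a sharp drop below T right after the onset.
            return "near" if any(s < threshold for s in after) else "far"
    return None
```

Restricting the comparison to the frames after a detected onset avoids judging frames that contain no target sound at all.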

As described above, the sound source position determination apparatus 1 of this embodiment can determine whether a sound was generated near to or far from the microphone using only a single-channel microphone input signal. This reduces apparatus cost compared with methods that use a microphone array. Most devices with microphones, such as mobile phones and personal computers used as general-purpose terminals, have a single microphone. Applying the present invention to such devices allows processing with the built-in single microphone, so no microphone array needs to be connected as an external device, and the cost of introduction can be greatly reduced.

Next, the sound source position determination apparatus of Embodiment 2 is described in detail with reference to FIGS. 5, 6, 7, and 8. FIG. 5 is a block diagram showing the configuration of the sound source position determination apparatus 2 of this embodiment. FIG. 6 is a flowchart showing the operation of the sound source position determination apparatus 2 of this embodiment. FIG. 7 illustrates an arrangement of a sound source, a microphone, and a wall surface. FIG. 8 shows the relationship between the arrival time difference of the direct and indirect sounds and the distance between the sound source and the microphone.

The sound source position determination apparatus 2 of this embodiment includes a microphone 10, a frame division unit 20, a frequency feature amount calculation unit 230, a near/far determination unit 40, and a threshold storage unit 50. The components other than the frequency feature amount calculation unit 230 operate in the same way as the components with the same numbers in the sound source position determination apparatus 1 of Embodiment 1, so their description is omitted.
The sound source position determination apparatus 2 of this embodiment uses the arrival time difference between the direct sound and the indirect sound to capture a signal in which the direct sound component is dominant, and thereby determines whether the sound source is near or far. The arrival time difference between the direct sound and the indirect sound depends on the characteristics of the room in which the microphone and the sound source are located. Consider the arrangement shown in FIG. 7. Let the straight-line distance between the sound source and the microphone be a [m], the normal distance to the nearest wall be b [m], and the speed of sound be c [m/s]. The arrival time difference T_s [s] between the direct sound and the first indirect sound is given by the following equation.

FIG. 8 shows the relationship between the arrival time difference T_s [s] of the direct and indirect sounds and the distance a [m] between the sound source and the microphone when b = 1 [m] and c = 340 [m/s]. When the distance between the sound source and the microphone is 0.5 m, the arrival time difference between the direct sound and the indirect sound is 5 ms. During these 5 ms the microphone input signal consists of direct sound only, and after 5 ms it is a mixture of direct and indirect sound. This arrival time difference is small and requires fine analysis in the time direction. Frame division therefore uses a small shift width to allow fine analysis in the time direction.
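The equation for T_s and FIG. 7 are not reproduced in this text. The Python sketch below therefore assumes one plausible geometry: the source and the microphone both lie at normal distance b from the wall and are separated by a along it, so that by the image-source construction the reflected path has length sqrt(a^2 + 4b^2). Under that assumption T_s = (sqrt(a^2 + 4b^2) - a) / c. The function name is hypothetical.

```python
import math


def arrival_time_difference(a, b=1.0, c=340.0):
    """Return T_s [s] between the direct sound and the first wall reflection.

    Assumed geometry (not confirmed by the text): source and microphone
    are both b metres from the wall and a metres apart along it.
    """
    direct = a
    reflected = math.sqrt(a * a + 4.0 * b * b)  # path via the mirror image
    return (reflected - direct) / c
```

With b = 1 m and c = 340 m/s, this gives about 4.6 ms at a = 0.5 m, close to the roughly 5 ms quoted in the text, and the value decreases as a grows. At a 0.625 ms frame shift, about seven frame shifts fit inside that interval.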

The frequency feature amount calculation unit 230 calculates, as the feature amount, the ratio of the high-band power to the full-band power of the input signal (S230). The sound source position determination apparatus 2 of this embodiment is characterized in that it determines whether the sound source is near or far according to whether the frame at or near the start of the microphone input signal, immediately after the signal from the sound source arrives, contains direct sound only or a mixture of direct and indirect sound. This embodiment exploits the difference between the characteristics of direct and indirect sound, focusing here on the difference in frequency characteristics. In a mixture of direct and indirect sound, the high band is attenuated by the frequency characteristics of the impulse response. The frequency feature amount calculation unit 230 may therefore obtain the frequency characteristics by, for example, a Fourier transform and calculate, as the feature amount, the ratio of the power in the high band at or above 2 kHz to the power over the entire frequency range. The speech spectrum analysis method described in Reference Non-Patent Document 1 may also be used to obtain the frequency characteristics.
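The high-band power ratio of step S230 can be sketched in Python as follows. The function name and argument layout are hypothetical; the 2 kHz cutoff comes from the text, and the 16 kHz sampling frequency is the example value from Embodiment 1.

```python
def high_band_ratio(power_spec, fs=16000, cutoff_hz=2000.0):
    """Return the ratio of power at or above cutoff_hz to the total power.

    power_spec holds P(k) for k = 0..N, where bin N is the Nyquist
    frequency fs / 2.
    """
    n_bins = len(power_spec)
    bin_hz = (fs / 2.0) / (n_bins - 1)   # frequency spacing between bins
    total = sum(power_spec)
    if total == 0.0:
        return 0.0                        # silent frame: define ratio as 0
    high = sum(p for k, p in enumerate(power_spec) if k * bin_hz >= cutoff_hz)
    return high / total
```

For a flat 129-bin spectrum at 16 kHz, the bins are 62.5 Hz apart, bins 32 to 128 count as high band, and the ratio is 97/129.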

Next, as in Embodiment 1, the near/far determination unit 40 determines near or far by comparing the feature amount calculated by the frequency feature amount calculation unit 230 with a set threshold (S40). The threshold is obtained experimentally in advance.
(Reference Non-Patent Document 1) Sadaoki Furui, "Digital Speech Processing" (ディジタル音声処理), Tokai University Press, 1985, p. 39

As described above, the sound source position determination apparatus 2 of this embodiment uses the ratio of the high-band power to the full-band power of the input signal as the feature amount and compares it with a threshold, thereby obtaining the same effect as Embodiment 1.

Next, the sound source position determination apparatus of Embodiment 3 is described in detail with reference to FIGS. 9, 10, and 11. FIG. 9 is a block diagram showing the configuration of the sound source position determination apparatus 3 of this embodiment. FIG. 10 is a block diagram showing the configuration of the power ratio feature amount calculation unit 330 of this embodiment. FIG. 11 is a flowchart showing the operation of the sound source position determination apparatus 3 of this embodiment.

The sound source position determination apparatus 3 of this embodiment includes a microphone 10, a frame division unit 20, a power ratio feature amount calculation unit 330, a near/far determination unit 40, and a threshold storage unit 50. The power ratio feature amount calculation unit 330 includes input signal cutout means 331, discrete Fourier transform means 332, power calculation means 333, fundamental frequency estimation means 334, periodic component power calculation means 335, aperiodic component power calculation means 336, and division means 337. The components other than the power ratio feature amount calculation unit 330 operate in the same way as the components with the same numbers in the sound source position determination apparatus 1 of Embodiment 1, so their description is omitted.

This embodiment exploits the following observation about the difference between direct and indirect sound: when only direct sound is present, the harmonic structure and sparsity of the sound are clearly visible, whereas in a mixture of direct and indirect sound the overlapping signals obscure the harmonic structure and sparsity. This embodiment uses the proportion of harmonic components contained in the input signal as the feature amount. As a value representing this proportion, it uses the power ratio between the periodic component power and the aperiodic component power described in Reference Patent Document 1. The power ratio feature amount calculation unit 330 converts the input signal into the power ratio between the periodic component power and the aperiodic component power (S330).

In detail, after the frame division in step S20, the input signal cutout means 331 cuts out a section of the input signal (SS331). The discrete Fourier transform means 332 applies a discrete Fourier transform to the cut-out input signal to obtain its frequency spectrum (SS332). The power calculation means 333 calculates the power of the cut-out input signal (SS333). The fundamental frequency estimation means 334 estimates the fundamental frequency of the cut-out input signal (SS334). The periodic component power calculation means 335 obtains the periodic component power from the frequency spectrum, the power, and the estimated fundamental frequency of the cut-out input signal (SS335). The aperiodic component power calculation means 336 subtracts the periodic component power from the power of the cut-out input signal to obtain the aperiodic component power (SS336). The division means 337 performs a division between the periodic component power and the aperiodic component power to obtain the power ratio between them (SS337).
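The flow SS332 to SS337 can be sketched in Python as below. The text defers the exact definitions to Reference Patent Document 1, so every concrete choice here is an assumption made for illustration: the fundamental frequency is estimated from the autocorrelation peak, the periodic component power is taken as the spectral power at the bins nearest the harmonics of that frequency, and the ratio is formed as periodic over aperiodic. All names and default values are hypothetical.

```python
import cmath
import math


def estimate_f0(x, fs, f_min=80.0, f_max=400.0):
    """SS334 (assumed method): autocorrelation peak within a pitch range."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(x) - 1)

    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    return fs / max(range(lag_min, lag_max + 1), key=autocorr)


def periodic_aperiodic_ratio(x, fs):
    """Return periodic power / aperiodic power for the cut-out section x."""
    n_len = len(x)
    half = n_len // 2
    # SS332: one-sided power spectrum via a naive DFT (stdlib only).
    spec = [abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_len)
                    for n, v in enumerate(x))) ** 2
            for k in range(half + 1)]
    total = sum(spec)                                   # SS333
    f0 = estimate_f0(x, fs)                             # SS334
    bin_hz = fs / n_len
    harmonic_bins = set()
    h = 1
    while h * f0 < fs / 2:                              # harmonics below Nyquist
        harmonic_bins.add(round(h * f0 / bin_hz))
        h += 1
    periodic = sum(spec[k] for k in harmonic_bins)      # SS335 (assumed form)
    aperiodic = max(total - periodic, 1e-12)            # SS336, floored
    return periodic / aperiodic                         # SS337
```

A clean harmonic signal puts nearly all of its power on the harmonic bins, so the ratio is very large; added noise or overlapping reflections spread power to the other bins and lower it.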

Next, as in Embodiment 2, the near/far determination unit 40 determines near or far by comparing the feature amount calculated by the power ratio feature amount calculation unit 330 with a set threshold (S40). The threshold is obtained experimentally in advance.

As described above, the sound source position determination apparatus 3 of this embodiment uses the proportion of harmonic components contained in the input signal as the feature amount and compares it with a threshold, thereby obtaining the same effect as Embodiments 1 and 2.

Next, the sound source position determination apparatus of Embodiment 4 is described in detail with reference to FIGS. 12 and 13. FIG. 12 is a block diagram showing the configuration of the sound source position determination apparatus 4 of this embodiment. FIG. 13 is a flowchart showing the operation of the sound source position determination apparatus 4 of this embodiment.

The sound source position determination device 4 of this embodiment includes a microphone 10, a frame division unit 20, a frequency feature amount calculation unit 230, a feature amount storage unit 433, a feature amount change calculation unit 434, a perspective (near/far) determination unit 440, and a threshold storage unit 50. The components other than the feature amount storage unit 433, the feature amount change calculation unit 434, and the perspective determination unit 440 operate in the same way as the identically numbered components of the sound source position determination device 1 of Embodiment 1 and the sound source position determination device 2 of Embodiment 2, so their description is omitted.

This embodiment focuses on how the feature amount changes over time: the feature amount is expressed as a function of time, and the slope of that function is calculated as the feature amount change. The calculated feature amount is stored in the feature amount storage unit 433 (S433). The feature amount change calculation unit 434 compares the past feature amount value stored in the feature amount storage unit 433 with the current feature amount and calculates the feature amount change (S434). The feature amounts given in Embodiments 2 and 3, namely the proportion of high-band power and the power ratio of periodic-component power to aperiodic-component power, both decrease after a sound is produced. In this embodiment, the slope of that decrease is calculated as the feature amount change. The perspective determination unit 440 makes the near/far determination by comparing the feature amount change calculated by the feature amount change calculation unit 434 with a predetermined threshold (S440). For the slope of decrease described above, a large slope is taken to mean that the sound was produced near the microphone. The threshold is determined experimentally in advance.
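The flow S433, S434, S440 can be sketched as follows. This is only an illustrative sketch under assumptions that are not in the patent text: the class name `FeatureChangeTracker`, the function `is_near`, the parameters `lag_frames` and `frame_period_s`, and the choice of a two-point difference as the slope are all hypothetical. The feature value passed in stands for whichever per-frame feature amount (for example the high-band power proportion) the preceding units produce.

```python
from collections import deque


class FeatureChangeTracker:
    """Keeps recent feature values and reports their slope over time."""

    def __init__(self, lag_frames, frame_period_s):
        if lag_frames < 1 or frame_period_s <= 0:
            raise ValueError("lag_frames must be >= 1 and frame_period_s > 0")
        # Plays the role of the feature amount storage unit 433 (hypothetical layout).
        self._history = deque(maxlen=lag_frames + 1)
        self._span_s = lag_frames * frame_period_s

    def update(self, feature):
        """Store the current feature (S433) and return its slope per second (S434).

        Returns None until enough past values have been stored.
        """
        self._history.append(feature)
        if len(self._history) < self._history.maxlen:
            return None
        # Two-point slope: (current value - value lag_frames ago) / elapsed time.
        return (self._history[-1] - self._history[0]) / self._span_s


def is_near(slope, threshold):
    """S440: a steep decrease (large negative slope) is judged as 'near'."""
    return -slope > threshold


tracker = FeatureChangeTracker(lag_frames=2, frame_period_s=0.01)
for value in (0.8, 0.6, 0.4):  # made-up feature values, for illustration only
    slope = tracker.update(value)
print(is_near(slope, threshold=10.0))
```

The threshold here is an arbitrary placeholder; the text states only that it is determined experimentally in advance.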

[Modification 1]
Next, still referring to FIGS. 12 and 13, a sound source position determination device according to a modification of Embodiment 4 is described in detail. The sound source position determination device 4' of this modification replaces the frequency feature amount calculation unit 230 of Embodiment 4 with the power ratio feature amount calculation unit 330 of Embodiment 3. As noted above, the power ratio of periodic-component power to aperiodic-component power, which is the feature amount given in Embodiment 3, decreases after a sound is produced. Taking the slope of this decrease as the feature amount change, the same effect can therefore be achieved when the frequency feature amount calculation unit 230 is replaced with the power ratio feature amount calculation unit 330, as in this modification.

Thus, the sound source position determination device 4 (4') of this embodiment (this modification) focuses on the change over time of a feature amount obtained from the input signal and compares that feature amount change with a threshold, thereby obtaining the same effect as Embodiments 1, 2, and 3.

Next, the sound source position determination device of Embodiment 5 is described in detail with reference to FIGS. 14, 15, and 16. FIG. 14 is a block diagram showing the configuration of the sound source position determination device 5 of this embodiment. FIG. 15 is a flowchart showing the operation of the sound source position determination device 5 of this embodiment. FIG. 16 shows an example of the feature amount database 550 of this embodiment.

The sound source position determination device 5 of this embodiment includes a microphone 10, a frame division unit 20, a feature amount calculation unit 30, a distance determination unit 540, and a feature amount database 550. The feature amount calculation unit 30 includes frequency domain conversion means 31, power spectrum calculation means 32, power spectrum storage means 33, and power spectrum change calculation means 34. The components other than the distance determination unit 540 and the feature amount database 550 operate in the same way as the identically numbered components of the sound source position determination device 1 of Embodiment 1, so their description is omitted.
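As a rough illustration of the chain formed by means 31 to 34, the sketch below computes a power spectrum per frame and a change measure between two frames. The exact definition of the power spectrum time variation belongs to Embodiment 1 and is not restated in this section, so the summed absolute difference used in `spectrum_change` is an assumption, as are both function names. A naive DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def power_spectrum(frame):
    """Means 31 and 32: naive DFT of one frame, then squared magnitude per bin."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectrum_change(current, previous):
    """Means 34 (assumed form): summed absolute difference between two power spectra."""
    return sum(abs(c - p) for c, p in zip(current, previous))


# Means 33 is represented here simply by holding on to the earlier spectrum.
previous = power_spectrum([0.0, 0.0, 0.0, 0.0])
current = power_spectrum([1.0, 0.0, 0.0, 0.0])
change = spectrum_change(current, previous)
```

In practice an FFT routine would replace the naive DFT; the structure of the calculation stays the same.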

This embodiment extends the configuration of Embodiment 1 so that it determines not only whether the sound source is near or far but also the distance between the microphone and the sound source. The distance determination unit 540 takes the power spectrum time variation S_t as input, calculates the power spectrum time variation S_c for the case where only stationary noise is present, and collates S_t against the feature amount database 550, which stores in advance the correspondence between various values of the power spectrum time variation S_t and the microphone-to-source distance. FIG. 16 shows an example of the feature amount database 550. Through this collation, the distance determination unit 540 outputs, as the estimated distance between the microphone and the sound source, the distance value associated with the database entry whose power spectrum time variation is closest to the measured S_t (S540).
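The collation in S540 amounts to a nearest-neighbour lookup. The sketch below assumes, hypothetically, that each database entry is a pair of a stored feature vector and a distance, and that "closest" means smallest squared Euclidean distance; the text does not specify either. The entries in `example_db` are placeholders and are not taken from FIG. 16.

```python
def estimate_distance(measured, database):
    """Return the distance stored with the database entry closest to `measured`.

    `database` is a list of (feature_vector, distance) pairs (assumed layout).
    """
    if not database:
        raise ValueError("feature database is empty")

    def squared_error(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, best_distance = min(
        database, key=lambda entry: squared_error(measured, entry[0])
    )
    return best_distance


# Placeholder entries: (stored feature vector, distance in metres).
example_db = [
    ([9.0, 7.0], 0.5),
    ([4.0, 3.0], 1.0),
    ([1.0, 0.5], 2.0),
]
estimated = estimate_distance([4.2, 2.9], example_db)
```

Because the lookup returns only stored distance values, the resolution of the estimate is limited by how densely the database was populated.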

In this embodiment, the power spectrum time variation S_t is used as the feature amount: the relationship between this feature amount and distance is stored in advance as a database, and the distance between the sound source and the microphone is estimated by referring to that database. The feature amount is not, however, limited to the power spectrum time variation S_t. For example, the proportion of high-band power to the full-band power of the input signal may be used as the feature amount, as in Embodiment 2. The power ratio of periodic-component power to aperiodic-component power may be used as the feature amount, as in Embodiment 3. The feature amount change may also be used, as in Embodiment 4, with the feature amount change and the microphone-to-source distance associated with each other in the database.

Embodiments 1 to 5 have been described using microphone input as an example, but the input signal of the present invention is not limited to this; a previously recorded audio file may be used as input in place of the microphone input. A signal obtained by applying a high-pass, low-pass, or band-pass filter to the input signal may also be used.
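A minimal sketch of this alternative input path is given below, assuming a 16-bit mono WAV file and a simple first-order high-pass filter. The file format, the function names, the filter design, and the coefficient `alpha` are all assumptions chosen for illustration; the text does not prescribe any of them.

```python
import struct
import wave


def read_mono_wav(path):
    """Read a 16-bit mono WAV file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch handles 16-bit mono files only")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = [s / 32768.0 for (s,) in struct.iter_unpack("<h", raw)]
    return rate, samples


def high_pass(samples, alpha=0.95):
    """First-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping any short tail."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
```

The filtered samples can then be passed through `split_frames` in place of frames taken from a live microphone, so the later stages do not need to know where the signal came from.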

Thus, with the sound source position determination device 5 of this embodiment, the distance between the microphone and the sound source can be measured by storing the relationship between the input signal and the microphone-to-source distance in a database in advance.

The various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. It goes without saying that other changes may be made as appropriate without departing from the spirit of the present invention.

When the configuration described above is implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer implements the processing functions on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are implemented only through execution instructions and retrieval of results.

The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that define the computer's processing). In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

Claims

A sound source position determination apparatus comprising:
a feature amount calculation unit comprising frequency domain conversion means for converting an input signal into a frequency domain signal, power spectrum calculation means for calculating a power spectrum signal from the frequency domain signal, power spectrum storage means for storing the power spectrum signal for each predetermined time interval and outputting the power spectrum signal of the time interval a predetermined time earlier, and power spectrum change calculation means for calculating a power spectrum time variation from the current power spectrum signal and the power spectrum signal of the time interval the predetermined time earlier; and
a perspective (near/far) determination unit that determines whether the sound source is near or far from the microphone by comparing the calculated power spectrum time variation with a predetermined threshold.

The sound source position determination device according to claim 1, further comprising:
a feature amount database that stores in advance the relationship between the feature amount and the distance between the sound source and the microphone; and,
in place of the perspective determination unit, a distance determination unit that determines the distance between the sound source and the microphone by comparing the feature amount of the input signal with the feature amount database.

A sound source position determination method comprising:
a feature amount calculation step comprising a frequency domain conversion sub-step of converting an input signal into a frequency domain signal, a power spectrum calculation sub-step of calculating a power spectrum signal from the frequency domain signal, a power spectrum storage sub-step of storing the power spectrum signal for each predetermined time interval and outputting the power spectrum signal of the time interval a predetermined time earlier, and a power spectrum change calculation sub-step of calculating a power spectrum time variation from the current power spectrum signal and the power spectrum signal of the time interval the predetermined time earlier; and
a perspective (near/far) determination step of determining whether the sound source is near or far from the microphone by comparing the calculated power spectrum time variation with a predetermined threshold.

A program for causing a computer to function as the sound source position determination device according to claim 1.