JP2013088141A

JP2013088141A - Sound source direction estimation method, sound source direction estimation device and creation device for image for sound source estimation

Info

Publication number: JP2013088141A
Application number: JP2011226020A
Authority: JP
Inventors: Masanao Owaki; 雅直大脇; Takefumi Zaima; 健史財満
Original assignee: Kumagai Gumi Co Ltd
Current assignee: Kumagai Gumi Co Ltd
Priority date: 2011-10-13
Filing date: 2011-10-13
Publication date: 2013-05-13
Anticipated expiration: 2031-10-13
Also published as: JP5826582B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and a device with which the sound source direction of a direct sound can be estimated easily and accurately even in a highly reflective sound field, and with which impact sounds can also be extracted accurately. SOLUTION: A sound source direction is estimated by calculating the sound arrival time difference D₁₃ between microphones M1 and M3 and the sound arrival time difference D₂₄ between microphones M2 and M4 from sound pressure signals sampled with first and second microphone pairs (M1, M3) and (M2, M4) disposed on two straight lines crossing each other. A very-short-time fast Fourier transform, whose analysis interval is 0.1 to 10 msec. long, is performed N times; the cross spectra p and their amplitude values w are calculated for each transform between M1 and M3 and between M2 and M4; and the arrival time differences D₁₃ and D₂₄ are calculated from weighted-average cross spectra P obtained by averaging the cross spectra p weighted by the amplitude values w.

Description

The present invention relates to a method and an apparatus for estimating a sound source direction from sound information collected by a plurality of microphones, and to an apparatus that creates an image for estimating a sound source using sound information collected by microphones and video information captured by photographing means.

Conventionally, a so-called acoustic technique has been devised for estimating the sound source direction, that is, the direction from which sound arrives: a microphone array is formed by arranging a large number of microphones at equal intervals, and the sound source direction is estimated from the phase difference between the sound pressure signal collected by a reference microphone and the sound pressure signal collected by each of the other microphones (see, for example, Non-Patent Document 1).
On the other hand, a method has been proposed that does not rely on the phase differences among the output signals of the microphones of an array. Instead, a plurality of microphone pairs are arranged on straight lines that intersect each other, and the direction of the sound source is estimated from the ratio of the arrival time difference, which corresponds to the phase difference, between the two microphones of one pair to the arrival time difference between the two microphones of another pair (see, for example, Patent Documents 1 to 3).

Specifically, as shown in FIG. 6, four microphones M1 to M4 are arranged so as to form two microphone pairs (M1, M3) and (M2, M4), each placed at a predetermined spacing on one of two mutually orthogonal straight lines. The horizontal angle θ between the measurement point and the sound source position is estimated from the ratio of the arrival time difference D₁₃ of the sound input to the microphones M1 and M3 of the pair (M1, M3) to the arrival time difference D₂₄ of the sound input to the microphones M2 and M4 of the pair (M2, M4). In addition, a fifth microphone M5 is placed at a position not on the plane formed by the microphones M1 to M4, forming four further microphone pairs (M5, M1), (M5, M2), (M5, M3), and (M5, M4), and the elevation angle φ between the measurement point and the sound source position is estimated from the arrival time differences D₁₃, D₂₄, and D_5j (j = 1 to 4) of the sound collected by the microphones of each pair.
The arrival time difference D_ij is obtained by applying a fast Fourier transform to the sound pressure waveform data produced by A/D conversion of the signals input to the two microphones of the pair (M_i, M_j), computing the cross spectrum of the transformed sound pressure waveform data, and then using the phase angle information at the frequency f of interest.
The sound source direction measured from the measurement point can be expressed by the horizontal angle θ and the elevation angle φ.
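As an illustration of the calculation just described, the following Python sketch estimates an arrival time difference from the phase of the cross spectrum at one frequency and derives a horizontal angle from the ratio of two such differences. It is a minimal sketch under stated assumptions, not the procedure of the cited documents: the function names, the sign convention (a positive value means the second microphone of the pair receives the sound later), and the choice to measure θ from the M2-M4 axis are all assumptions made for illustration.

```python
import numpy as np

def delay_from_cross_spectrum(x_i, x_j, fs, f_target):
    """Arrival time difference of channel j relative to channel i, in seconds.

    Uses the phase angle of the cross spectrum at the FFT bin nearest
    f_target. Only unambiguous while |2*pi*f*D| < pi (no phase wrapping).
    """
    X_i = np.fft.rfft(x_i)
    X_j = np.fft.rfft(x_j)
    cross = X_i * np.conj(X_j)               # cross spectrum of the pair
    freqs = np.fft.rfftfreq(len(x_i), d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - f_target)))
    # If x_j lags x_i by D, the phase of the cross spectrum is +2*pi*f*D.
    return np.angle(cross[k]) / (2.0 * np.pi * freqs[k])

def horizontal_angle_deg(d13, d24):
    """Horizontal angle from the ratio D13 / D24.

    Assumed convention: theta is measured from the M2-M4 axis toward the
    M1-M3 axis, so tan(theta) = D13 / D24.
    """
    return float(np.degrees(np.arctan2(d13, d24)))
```

For a 1 kHz tone sampled at 48 kHz and delayed by 0.1 ms on the second channel, `delay_from_cross_spectrum` recovers the 0.1 ms delay when the record length holds a whole number of cycles.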

As a result, the sound source direction can be estimated accurately with fewer microphones than when a microphone array is used.
In addition, if video capture means such as a CCD camera is provided to capture an image in the estimated sound source direction, and the image data and the sound source direction data are combined so that a sound source estimation image, in which the estimated sound source direction (θ, φ) and the sound pressure level are drawn as graphics on the image, is shown on a display screen, the sound source can be grasped visually.
Another method is also used: video is captured continuously by the video capture means at the same time as the sound is collected, and the sound pressure waveform data (the sound information) and the image data (the video information) are stored on the hard disk of a computer. After collection, the sound pressure waveform data is read from the hard disk to estimate the sound source direction, the image data corresponding to the sound pressure waveform data used in that estimation is read from the hard disk, and the image data and the sound source direction data are combined to display a sound source estimation image.

Patent Document 1: JP 2002-181913 A
Patent Document 2: JP 2006-324895 A
Patent Document 3: JP 2008-224259 A

Non-Patent Document 1: Toshiro Oga, Yoshio Yamazaki, Yutaka Kaneda; Acoustic Systems and Digital Processing, Corona Publishing, 1995

With the conventional methods, the direction of the sound source and the level of the arriving sound can be measured for each frequency, so information about the sound source can be obtained reliably. In a highly reflective sound field, however, additional arithmetic processing is needed to distinguish direct sound from reflected sound.
Moreover, because the analysis interval for the sound source direction is as long as 0.1 to 1.0 sec., it is difficult to capture impact sounds with a short period accurately.

The present invention has been made in view of these conventional problems. Its object is to provide a method and an apparatus that can estimate the sound source direction of a direct sound easily and accurately even in a highly reflective sound field, and that can also extract impact sounds accurately.

As a result of intensive study, the present inventors found that the sound source direction of a direct sound can be estimated accurately as follows: when obtaining the cross spectrum, a very-short-time fast Fourier transform, in which the analysis interval (the width of the window function applied to the input signal) is shortened and the frequency resolution is thereby lowered, is performed many times to obtain a cross spectrum each time, and a centroid-like phase difference (arrival time difference) is calculated from the weighted-average cross spectrum obtained by weighted averaging of these cross spectra. This finding led to the present invention.
That is, the invention according to claim 1 of the present application is a method for estimating the direction of a sound source from sound pressure signals of sound collected by a plurality of microphones, comprising: a step of collecting sound pressure signals of arriving sound using first and second microphone pairs, each arranged at a predetermined spacing on one of two straight lines that intersect each other; a step of A/D converting the sound pressure signals collected by the microphones M1 and M3 of the first microphone pair and by the microphones M2 and M4 of the second microphone pair to obtain sound pressure waveform data of the sound collected by each of the four microphones M1 to M4; a step of applying a fast Fourier transform to each set of sound pressure waveform data; a step of obtaining the cross spectrum of the transformed sound pressure waveform data of the microphones M1 and M3 and the cross spectrum of the transformed sound pressure waveform data of the microphones M2 and M4, and calculating the sound arrival time difference D₁₃ between M1 and M3 and the sound arrival time difference D₂₄ between M2 and M4; and a step of estimating the sound source direction of the arriving sound from the calculated arrival time difference D₁₃ of the first microphone pair and the calculated arrival time difference D₂₄ of the second microphone pair. In the fast Fourier transform step, a very-short-time fast Fourier transform whose analysis interval is 0.1 msec. to 10 msec. long is performed many times, either consecutively or with the analysis intervals partially overlapping. The step of calculating the arrival time differences comprises: a step of obtaining the amplitude value of the cross spectrum obtained for each very-short-time fast Fourier transform operation; a step of obtaining a weighted-average cross spectrum by averaging the cross spectra obtained for each operation, weighted by the amplitude values; and a step of calculating the sound arrival time differences D₁₃ and D₂₄ between the microphones from the weighted-average cross spectrum.
Because the very-short-time analysis is performed many times and the arrival time difference is calculated from the average of the resulting cross spectra, weighted by amplitude value, reflected sound and noise components are reduced. Consequently, even in a highly reflective field, direct sounds such as impact sounds can be captured reliably, and the sound source direction can also be estimated accurately for continuous sounds.
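The amplitude-weighted averaging described above can be sketched in Python as follows. This is one possible reading, not the patent's own implementation: it assumes each window's cross spectrum p is weighted by its own magnitude w = |p|, so that P = Σ(w·p) / Σ(w) per frequency bin. The Hann window, the 50% overlap default, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def weighted_average_cross_spectrum(x_a, x_b, fs, window_sec=0.002, overlap=0.5):
    """Amplitude-weighted average of short-window cross spectra.

    Returns (freqs, P), where P[k] = sum(w * p) / sum(w) over all windows,
    p being the cross spectrum of one window and w = |p| its amplitude.
    """
    n = int(round(window_sec * fs))                  # samples per window
    hop = max(1, int(round(n * (1.0 - overlap))))    # step between windows
    win = np.hanning(n)                              # assumed window shape
    num = np.zeros(n // 2 + 1, dtype=complex)
    den = np.zeros(n // 2 + 1)
    for start in range(0, len(x_a) - n + 1, hop):
        A = np.fft.rfft(win * x_a[start:start + n])
        B = np.fft.rfft(win * x_b[start:start + n])
        p = A * np.conj(B)                           # cross spectrum, one window
        w = np.abs(p)                                # its amplitude value
        num += w * p
        den += w
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    P = num / np.maximum(den, np.finfo(float).tiny)  # guard against 0 / 0
    return freqs, P

def delay_from_spectrum(freqs, P, f_target):
    """Arrival time difference from the phase of P at the bin nearest f_target."""
    k = int(np.argmin(np.abs(freqs - f_target)))
    return np.angle(P[k]) / (2.0 * np.pi * freqs[k])
```

Under this weighting, windows in which the pair carries strong, coherent energy dominate the average, while low-amplitude windows contribute little to the phase of P.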

The invention according to claim 2 is the sound source direction estimation method according to claim 1, wherein, in addition to the four microphones M1 to M4, a fifth microphone M5 not on the plane formed by the two microphone pairs is provided to collect the sound pressure signal of the arriving sound; in the step of calculating the arrival time differences, the arrival time differences D₁₃ and D₂₄ between the microphones M1 and M3 and between the microphones M2 and M4 of the two microphone pairs are calculated together with the arrival time differences D₅₁ to D₅₄ between the microphones of the four microphone pairs formed by the fifth microphone M5 and each of the four microphones M1 to M4; and in the step of estimating the sound source direction, the sound source direction of the arriving sound is estimated using the calculated arrival time differences D₁₃, D₂₄, and D₅₁ to D₅₄.
As a result, the elevation angle φ can be estimated in addition to the horizontal angle θ of the sound source direction as seen from the measurement point, which improves the accuracy of the sound source direction estimate.
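The text does not give the formula that maps the six arrival time differences to (θ, φ). As a stand-in, the Python sketch below uses a generic far-field least-squares fit, which is a swapped-in technique and not the patent's formula. The microphone coordinates, spacing, apex height, speed of sound, sign convention (D_ij is positive when microphone i is nearer the source), and the convention that θ is measured from the M2-M4 axis are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value

PAIRS = [(1, 3), (2, 4), (5, 1), (5, 2), (5, 3), (5, 4)]

def mic_positions(spacing=0.2, height=0.15):
    """Hypothetical coordinates in metres: M1-M3 on x, M2-M4 on y, M5 above."""
    h = spacing / 2.0
    return {
        1: np.array([h, 0.0, 0.0]),
        3: np.array([-h, 0.0, 0.0]),
        2: np.array([0.0, h, 0.0]),
        4: np.array([0.0, -h, 0.0]),
        5: np.array([0.0, 0.0, height]),
    }

def direction_vector(theta_deg, phi_deg):
    """Unit vector toward the source; theta measured from the +y (M2) axis."""
    t, p = np.radians(theta_deg), np.radians(phi_deg)
    return np.array([np.cos(p) * np.sin(t), np.cos(p) * np.cos(t), np.sin(p)])

def simulate_delays(u, pos, c=SPEED_OF_SOUND):
    """Plane-wave model: D_ij = (r_i - r_j) . u / c for each pair (i, j)."""
    return {(i, j): float(np.dot(pos[i] - pos[j], u)) / c for i, j in PAIRS}

def estimate_direction(delays, pos, c=SPEED_OF_SOUND):
    """Least-squares fit of the direction vector to the six delays."""
    A = np.array([(pos[i] - pos[j]) / c for i, j in PAIRS])
    b = np.array([delays[(i, j)] for i, j in PAIRS])
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    u = u / np.linalg.norm(u)
    theta = float(np.degrees(np.arctan2(u[0], u[1])))
    phi = float(np.degrees(np.arcsin(np.clip(u[2], -1.0, 1.0))))
    return theta, phi
```

Because M5 lies off the plane of M1 to M4, the six pair vectors span all three dimensions, which is what makes the elevation angle recoverable in this model.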

The invention according to claim 3 is a sound source direction estimation device that comprises sound collection means having first and second microphone pairs, each arranged at a predetermined spacing on one of two straight lines that intersect each other, and a fifth microphone not on the plane formed by the two microphone pairs, and that estimates the direction of a sound source from the sound pressure signals of the sound collected by the sound collection means. The device comprises: an A/D converter that converts the sound pressure signal collected by each microphone into a digital signal; a fast Fourier transformer that applies a fast Fourier transform to the sound pressure waveform data, that is, the sound pressure signals converted into digital signals; cross spectrum calculation means that calculates, from the transformed sound pressure waveform data, the cross spectrum of the sound pressure waveform data of the sound collected by the two microphones of the first microphone pair, the cross spectrum of the sound pressure waveform data of the sound collected by the two microphones of the second microphone pair, and the cross spectra between the sound pressure waveform data of the fifth microphone and that of each of the four microphones of the first and second microphone pairs; arrival time difference calculation means that calculates, from the cross spectra, the sound arrival time differences D₁₃ and D₂₄ between the microphones of the first and second microphone pairs and the sound arrival time differences D₅₁ to D₅₄ between the fifth microphone and the four microphones of the two microphone pairs; and sound source direction estimation means that estimates the sound source direction using the calculated arrival time differences D₁₃, D₂₄, and D₅₁ to D₅₄. The fast Fourier transformer performs a very-short-time fast Fourier transform whose analysis interval is 0.1 msec. to 10 msec. long many times, either consecutively or with the analysis intervals partially overlapping. The cross spectrum calculation means obtains the amplitude value of the cross spectrum obtained for each very-short-time fast Fourier transform operation and obtains a weighted-average cross spectrum by averaging those cross spectra weighted by the amplitude values. The arrival time difference calculation means calculates the sound arrival time differences D₁₃, D₂₄, and D₅₁ to D₅₄ between the microphones from the weighted-average cross spectrum.
With this configuration, the weighted-average cross spectrum of the short-time fast Fourier transformed sound pressure waveform data can be obtained reliably, so a sound source direction estimation device that can accurately estimate the sound source direction of a direct sound is obtained.

The invention according to claim 4 is a sound source estimation image creation device that comprises sound collection means having first and second microphone pairs, each arranged at a predetermined spacing on one of two straight lines that intersect each other, and a fifth microphone not on the plane formed by the two microphone pairs, together with photographing means that captures video in the direction of the sound source, and that creates a sound source estimation image, that is, an image on which a graphic indicating the direction of the sound source is drawn, from the sound pressure signals of the sound propagated from the sound source and collected by the sound collection means and from the video signal in the direction of the sound source captured by the photographing means. The device comprises: an A/D converter that converts the sound pressure signals collected by the microphones and the video signal captured by the photographing means into digital signals; a fast Fourier transformer that applies a fast Fourier transform to each set of sound pressure waveform data, that is, the sound pressure signals converted into digital signals; cross spectrum calculation means that calculates, from the transformed sound pressure waveform data, the cross spectrum of the sound pressure waveform data of the sound collected by the two microphones of the first microphone pair, the cross spectrum of the sound pressure waveform data of the sound collected by the two microphones of the second microphone pair, and the cross spectra between the sound pressure waveform data of the fifth microphone and that of each of the four microphones of the first and second microphone pairs; arrival time difference calculation means that calculates, from the cross spectra, the sound arrival time differences D₁₃ and D₂₄ between the microphones of the first and second microphone pairs and the sound arrival time differences D₅₁ to D₅₄ between the fifth microphone and the four microphones of the two microphone pairs; sound source direction estimation means that estimates the sound source direction using the calculated arrival time differences D₁₃, D₂₄, and D₅₁ to D₅₄; and sound source estimation image creation means that combines the estimated sound source direction data with the image data, that is, the video signal converted into a digital signal, to create a sound source estimation image on which a graphic indicating the estimated sound source direction is drawn. The fast Fourier transformer performs a very-short-time fast Fourier transform whose analysis interval is 0.1 msec. to 10 msec. long many times, either consecutively or with the analysis intervals partially overlapping. The cross spectrum calculation means obtains the amplitude value of the cross spectrum obtained for each very-short-time fast Fourier transform operation and obtains a weighted-average cross spectrum by averaging those cross spectra weighted by the amplitude values. The arrival time difference calculation means calculates the sound arrival time differences D₁₃, D₂₄, and D₅₁ to D₅₄ between the microphones from the weighted-average cross spectrum.
With this configuration, the sound source direction of a direct sound can be estimated accurately, and a sound source estimation image for estimating the sound source can be created easily.
The invention according to claim 5 is the sound source estimation image creation device according to claim 4, further comprising display means having a display screen that displays the created sound source estimation image.
As a result, the sound source estimation image can be shown on the display screen of the display means, so the operator can easily identify the sound source visually.

This summary of the invention does not enumerate all the necessary features of the present invention; sub-combinations of these features may also constitute inventions.

FIG. 1 is a functional block diagram showing the configuration of a sound source estimation image display system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a method of displaying a sound source estimation image using the sound source estimation image display system according to the embodiment.
FIG. 3 is a diagram for explaining the very-short-time fast Fourier transform.
FIG. 4 is a diagram showing an example of a sound source estimation image according to the present invention.
FIG. 5 is a diagram showing an example of a conventional sound source estimation image.
FIG. 6 is a diagram showing the microphone arrangement in a conventional sound source search method using microphone pairs.

The present invention is described in detail below through embodiments. The following embodiments do not limit the invention as claimed, and not all combinations of features described in the embodiments are necessarily essential to the solution provided by the invention.

FIG. 1 is a functional block diagram showing the configuration of the sound source estimation image display system.
The sound source estimation image display system includes a sound/video collection unit 10, a data processing device 20, an arithmetic device 30, a display device 40, and a storage device 50.
The data processing device 20 includes an amplifier 21, an A/D converter 22, and video input/output means 23.
The arithmetic device 30 includes a buffer 31, sound pressure waveform data extraction means 32, cross spectrum calculation means 33, arrival time difference calculation means 34, sound source direction estimation means 35, image data extraction means 36, and sound source estimation image creation means 37. The arithmetic device 30 is implemented, for example, as software on a personal computer.
The display device 40 includes a display screen 40M that displays the sound source position estimation image described later, which is an image for estimating the sound source position.
The storage device 50 is a memory constituted by, for example, the hard disk of a personal computer.

The sound/video collection unit 10 includes sound collection means 11, a CCD camera (hereinafter, the camera) 12 serving as video capture means, a microphone fixing part 13, a camera support base 14, support columns 15, and a base 16.
The sound collection means 11 includes a plurality of microphones M1 to M5.
The microphones M1 to M5 are arranged as shown in FIG. 6. The four microphones M1 to M4 are arranged so as to form two microphone pairs (M1, M3) and (M2, M4), each placed at a predetermined spacing on one of two mutually orthogonal straight lines, and the fifth microphone M5 is placed at a position not on the plane formed by the microphones M1 to M4, specifically at the apex of a square pyramid whose base is the square formed by the microphones M1 to M4. This forms four further microphone pairs (M5, M1) to (M5, M4).
In this example, the shooting direction of the camera 12 is set to a direction that passes through the intersection of the two orthogonal lines and makes an angle of approximately 45° with each of them. The orientation of the sound/video collection unit 10 is therefore the direction of the white arrow D in FIG. 1. The camera 12 captures video corresponding to the orientation of the sound/video collection unit 10.

The microphones M1 to M5 are mounted on the microphone fixing part 13, the camera 12 is mounted on the camera support base 14, and the microphone fixing part 13 and the camera support base 14 are connected by three support columns 15. In other words, the sound collection means 11 and the camera 12 are integrated. The microphones M1 to M5 are located above the camera 12.
The base 16 includes a rotating column 16a and a support stand 16b provided with a rotation mechanism (not shown) that rotates the rotating column 16a, and the camera support base 14 is mounted on the rotating column 16a. Rotating the rotating column 16a therefore rotates the sound collection means 11 and the camera 12 together. The rotation mechanism may be omitted, in which case the operator changes the orientation of the sound/video collection unit 10 by rotating the base 16.
Each of the microphones M1 to M5 measures the sound pressure level, that is, the magnitude of the sound pressure signal of the sound arriving from a sound source (not shown).

The amplifier 21 includes a low-pass filter; it removes high-frequency noise components from the sound pressure signals of the sound collected by the microphones M1 to M5, amplifies each sound pressure signal, and outputs it to the A/D converter 22.
The A/D converter 22 A/D converts the sound pressure signals to produce sound pressure waveform data and sends the data to the sound pressure waveform data storage area 31a of the buffer 31. The storage area 31a is divided into small areas 311 to 315, which store the sound pressure waveform data of the microphones M1 to M5, respectively.
The video input/output means 23 receives the video signal captured continuously by the camera 12 and sends image data for the shooting direction to the image data storage area 31b of the buffer 31 at every preset screen switching time T_p (for example, T_p = 1/30 second).
The image data output at every predetermined time T_p is the image data that makes up one screen displayed on the display screen 40M of the display device 40, that is, one frame of the moving image.

The sound pressure waveform data extraction means 32 sequentially extracts, from the sound pressure waveform data storage area 31a of the buffer 31, sound pressure waveform data whose length equals the preset analysis section length T_F of the fast Fourier transform (hereinafter, FFT), and sequentially outputs it to the fast Fourier transformers 331 to 335 of the cross spectrum calculation means 33. Specifically, the sound pressure waveform data of the microphones M1 to M5 extracted from the small areas 311 to 315 is output to the fast Fourier transformer 33k.
The sound pressure waveform data may instead be output directly from the A/D converter 22 to the fast Fourier transformer 33k. It may also be saved from the A/D converter 22 to the storage device 50 and then output from the storage device 50 to the fast Fourier transformer 33k. In view of processing speed, however, it is preferable to output the data to the fast Fourier transformer 33k either directly from the A/D converter 22 or via the buffer 31.

The cross spectrum calculation means 33 includes a fast Fourier transformer 33k, a cross spectrum calculator 33m, and a weighted average cross spectrum generator 33M.
The fast Fourier transformer 33k comprises five fast Fourier transformers 331 to 335. Each applies, to the sound pressure waveform data of the microphone Mk (k = 1 to 5), an extremely short-time fast Fourier transform whose analysis section length T_F is extremely short (for example, 2 msec), N times within a preset measurement time T_c, and sequentially outputs the results to the cross spectrum calculator 33m.
The extremely short-time fast Fourier transform is performed successively using a window function whose length equals the analysis section length. In this example, because the analysis section is short, it is preferable to let temporally adjacent analysis sections partially overlap.
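The division of the measurement time T_c into short, partially overlapping analysis sections can be sketched as follows. This is only an illustration: the sampling rate `fs`, the measurement time of 1.0 sec, the 50% overlap, and the helper name `frame_starts` are assumptions that do not appear in the description; only T_F = 2 msec is taken from the text.

```python
def frame_starts(n_samples, frame_len, hop):
    """Return the start index of every analysis section that fits in n_samples.

    frame_len is the section length in samples; hop is the spacing between
    the starts of consecutive sections (hop < frame_len means overlap).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return list(range(0, n_samples - frame_len + 1, hop))


fs = 48_000                    # assumed sampling rate [Hz]
t_f = 0.002                    # analysis section length T_F = 2 msec (from the text)
t_c = 1.0                      # assumed measurement time T_c [s]

frame_len = round(fs * t_f)    # samples per analysis section
hop = frame_len // 2           # assumed 50% overlap between adjacent sections
starts = frame_starts(round(fs * t_c), frame_len, hop)

print(frame_len, hop, len(starts))
```

With these assumed values each section holds 96 samples, and the number of sections N comfortably exceeds the N ≥ 100 mentioned for this example.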

The cross spectrum calculator 33m comprises six cross spectrum calculators 33x, 33y, and 33a to 33d. For each extremely short-time FFT operation (n = 1 to N), it sequentially obtains, from the outputs of the fast Fourier transformers 331 to 335, the cross spectrum p_n(f) and its amplitude w_n(f) for each of six preset microphone pairs.
Specifically, the cross spectrum calculator 33x obtains, for each extremely short-time FFT operation, the cross spectrum p_n13(f) of X_n1(f) and X_n3(f) and its amplitude w_n13(f), where X_n1(f) and X_n3(f) are the sound pressure waveform data of the microphones M1 and M3 of the microphone pair (M1, M3) output from the fast Fourier transformers 331 and 333.
The cross spectrum calculator 33y obtains the cross spectrum p_n24(f) of X_n2(f) and X_n4(f) and its amplitude w_n24(f), where X_n2(f) and X_n4(f) are the sound pressure waveform data of the microphones M2 and M4 of the microphone pair (M2, M4) output from the fast Fourier transformers 332 and 334.
The cross spectrum calculators 33a to 33d respectively obtain the cross spectra p_n5j(f) of X_n5(f) and X_nj(f) and their amplitudes w_n5j(f) (j = 1 to 4), where X_n5(f) is the sound pressure waveform data of the microphone M5 output from the fast Fourier transformer 335 and X_nj(f) is the sound pressure waveform data of the microphones M1 to M4 output from the fast Fourier transformers 331 to 334.
The cross spectrum p_n(f) is calculated for each frequency f.
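As a rough illustration of one such operation, the sketch below computes the cross spectrum and its amplitude for a single analysis section of one microphone pair. The description does not state which spectrum is conjugated, so the convention p = X_i · conj(X_j) is an assumption, as are the function names `dft` and `cross_spectrum`. A plain DFT is used instead of an FFT only to keep the sketch free of external libraries; the result is the same.

```python
import cmath
import math


def dft(frame):
    """One-sided discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def cross_spectrum(frame_i, frame_j):
    """Return (p, w): cross spectrum per bin and its amplitude per bin.

    Assumed convention: p[k] = X_i[k] * conj(X_j[k]), w[k] = |p[k]|.
    """
    xi, xj = dft(frame_i), dft(frame_j)
    p = [a * b.conjugate() for a, b in zip(xi, xj)]
    w = [abs(v) for v in p]
    return p, w


# Hypothetical test signal: a tone in bin 2 that reaches microphone j one sample later.
n, k0, delay = 16, 2, 1
frame_i = [math.cos(2 * math.pi * k0 * t / n) for t in range(n)]
frame_j = [math.cos(2 * math.pi * k0 * (t - delay) / n) for t in range(n)]
p, w = cross_spectrum(frame_i, frame_j)
print(w[k0], cmath.phase(p[k0]))
```

Under this convention the phase of p at the tone's bin equals 2π·k0·delay/n, which is what a later step converts into an arrival time difference.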

The weighted average cross spectrum generator 33M comprises six weighted average cross spectrum generators 33X, 33Y, and 33A to 33D, and obtains the weighted average of the N cross spectra p_n(f) obtained by each of the cross spectrum calculators 33x, 33y, and 33a to 33d.
The weighted average cross spectrum generator 33X temporarily stores, in a memory (not shown), the N cross spectra p_n13(f) (n = 1 to N) and their amplitudes w_n13(f) sequentially output from the cross spectrum calculator 33x. It then takes the weighted average of the cross spectra p_n13(f) using the amplitudes w_n13(f) as weights to obtain the weighted average cross spectrum P_13(f) between the sound pressure signal collected by the microphone M1 and the sound pressure signal collected by the microphone M3.
The weighted average cross spectrum generator 33Y uses the cross spectra p_n24(f) obtained by the cross spectrum calculator 33y and their amplitudes w_n24(f) to obtain the weighted average cross spectrum P_24(f) between the sound pressure signal collected by the microphone M2 and the sound pressure signal collected by the microphone M4.
The weighted average cross spectrum generators 33A to 33D take the weighted average of the cross spectra p_n5j(f) obtained by the cross spectrum calculators 33a to 33d, using the amplitudes w_n5j(f) as weights, to obtain the weighted average cross spectra P_5j(f) between the sound pressure signal collected by the microphone M5 and the sound pressure signal collected by the microphone Mj (j = 1 to 4).
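The amplitude-weighted averaging described above might look like the following minimal sketch, which operates per frequency bin on N per-section cross spectra. The function name, the list-of-lists data layout, and the handling of a zero denominator are assumptions made for illustration.

```python
def weighted_average_cross_spectrum(p_frames, w_frames):
    """Amplitude-weighted average of N cross spectra, bin by bin.

    p_frames[n][k] is the complex cross spectrum of section n at bin k and
    w_frames[n][k] is its amplitude. Returns P[k] = sum_n(w*p) / sum_n(w).
    """
    n_bins = len(p_frames[0])
    result = []
    for k in range(n_bins):
        denominator = sum(w[k] for w in w_frames)
        numerator = sum(w[k] * p[k] for p, w in zip(p_frames, w_frames))
        # Assumed behavior: a bin with no energy in any section yields zero.
        result.append(numerator / denominator if denominator > 0 else 0j)
    return result


# Two hypothetical sections with a single bin each.
p_frames = [[2 + 0j], [0 + 1j]]
w_frames = [[abs(v) for v in frame] for frame in p_frames]
P = weighted_average_cross_spectrum(p_frames, w_frames)
print(P[0])
```

Because each cross spectrum is multiplied by its own amplitude, sections with strong signal dominate the average and weak sections contribute little.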

The arrival time difference calculation means 34 calculates, from the weighted average cross spectrum P_ij(f) obtained by the weighted average cross spectrum generator 33M, the sound arrival time difference D_ij between the microphones M_i and M_j constituting each microphone pair (M_i, M_j), using the following equation (1).

[Equation (1)]

D_13 is the arrival time difference of the sound input to the microphones M1 and M3 constituting the microphone pair (M1, M3), D_24 is the arrival time difference of the sound input to the microphones M2 and M4 constituting the microphone pair (M2, M4), and D_5j (j = 1 to 4) is the arrival time difference between the sound pressure signal input to the fifth microphone M5 and the sound pressure signal input to each of the microphones M1 to M4.
The arrival time difference D_ij is calculated for each frequency f.
The sound source direction estimating means 35 estimates the sound source direction by calculating, from the arrival time differences D_13, D_24, and D_5j (j = 1 to 4) thus obtained, the horizontal angle θ and the elevation angle φ, which give the direction of the incoming sound as seen from the measurement point, using the following equations (2) and (3).

[Equations (2) and (3)]
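Equation (1) is not reproduced in this text, so the following is not the patent's formula but a commonly used relation, given here as an assumption: a pure delay D between two signals appears in their cross spectrum as a phase of 2πfD at frequency f, so D can be recovered from the phase. The function name and the sign convention (positive D when the cross spectrum is formed as X_i · conj(X_j) and the sound reaches microphone j later) are likewise assumptions.

```python
import cmath
import math


def arrival_time_difference(cross_spectrum_value, f_hz):
    """Estimate a delay [s] from the phase of a cross spectrum value at f_hz.

    Assumes the phase equals 2*pi*f*D and has not wrapped, i.e. |D| < 1/(2*f).
    """
    if f_hz <= 0:
        raise ValueError("f_hz must be positive")
    return cmath.phase(cross_spectrum_value) / (2 * math.pi * f_hz)


# Hypothetical example: a 0.1 msec delay observed at 1 kHz.
f = 1000.0
true_delay = 1.0e-4
P_f = 3.0 * cmath.exp(1j * 2 * math.pi * f * true_delay)
print(arrival_time_difference(P_f, f))
```

The amplitude of the cross spectrum value does not affect the result; only its phase does, which is why an average dominated by direct-sound sections yields the direct-sound delay.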

The image data extracting means 36 extracts, from the image data storage area 31b of the buffer 31, the image data shot at the time closest to the time of the (N/2)-th extremely short-time FFT operation, that is, the time corresponding to the center of the measurement time T_c described above, and outputs it to the sound source estimation image creating means 37.
The sound source estimation image creating means 37 combines the horizontal angle θ and elevation angle φ data estimated by the sound source direction estimating means 35 with the image data extracted by the image data extracting means 36, creates a sound source direction estimation image in which a figure indicating the direction and loudness of the sound source is drawn in the image, and outputs it to the display device 40.
The storage device 50 stores the horizontal angle θ and elevation angle φ data and the image data used for the sound source direction estimation image together with the measurement time. The measurement time is the shooting time of the image data used for the sound source direction estimation image.
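Selecting the image closest to the center of the measurement time can be sketched as below. The timestamps, the start time of the measurement, and the function name are hypothetical; the 1/30 second frame interval follows the example value of T_p given in the description.

```python
def nearest_frame_index(frame_times, t_start, t_c):
    """Index of the image whose timestamp is closest to the center of [t_start, t_start + t_c]."""
    if not frame_times:
        raise ValueError("frame_times must not be empty")
    center = t_start + t_c / 2
    return min(range(len(frame_times)), key=lambda i: abs(frame_times[i] - center))


# Hypothetical buffer: 60 images captured every T_p = 1/30 second.
frame_times = [i / 30 for i in range(60)]
index = nearest_frame_index(frame_times, t_start=0.5, t_c=1.0)
print(index, frame_times[index])
```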

Next, a method of estimating the sound source direction and a method of displaying the sound source estimation image using the sound source estimation image display system of this example will be described with reference to the flowchart of FIG. 2.
First, the sound/video sampling unit 10, the data processing device 20, the arithmetic device 30, and the display device 40 are connected, and then the sound/video sampling unit 10 is set at the measurement point (step S10).
The operator points the camera 12 at the planned measurement location, checks on the display screen 40M that the camera 12 is shooting that location, and then collects sound with the microphones M1 to M5 while simultaneously capturing video of the planned measurement location with the camera 12 (step S11).
Next, the sound pressure signals of the sounds collected by the microphones M1 to M5 are amplified and A/D converted, and the resulting digital signals (hereinafter, sound pressure waveform data) are stored in the sound file storage area 31a of the buffer 31. At the same time, the video signal of the camera 12 is A/D converted, and the resulting digital signal (hereinafter, image data) is stored in the moving image file storage area 31b of the buffer 31 (step S12).

Next, sound pressure waveform data of the preset length T_F is sequentially extracted from the sound pressure waveform data storage area 31a of the buffer 31 and subjected to the extremely short-time fast Fourier transform (step S13). Then, from the transformed sound pressure waveform data, the sound pressure waveform data of the microphone Mi and that of the microphone Mj constituting a preset microphone pair (Mi, Mj) are taken, the cross spectrum p_nij is obtained, and the magnitude of the amplitude of the cross spectrum (amplitude value) w_nij is calculated (step S14). Here, p_nij is the cross spectrum of the sound pressure waveform data of the microphone Mi and that of the microphone Mj obtained in the n-th (n = 1 to N) extremely short-time fast Fourier transform.
The cross spectrum p_nij and its amplitude value w_nij are calculated for each frequency band determined by the analysis section length T_F and the sampling period. In this example, the cross spectrum p_ij(f) was obtained separately for three frequency bands: 10 to 500 Hz, 500 to 1000 Hz, and 1000 to 7500 Hz.
As described above, the extremely short-time fast Fourier transform is a fast Fourier transform whose analysis section length T_F is extremely short (for example, 2 msec). In this example, it is performed many times within the preset measurement time T_c.
Specifically, whereas the analysis section T_0 of a conventional FFT is about 1.0 sec long as shown in FIG. 3(a), in this example, as shown in FIG. 3(b), the analysis section length T_F is made extremely short and the extremely short-time fast Fourier transform is performed N times (N ≥ 100) in succession over the length of the analysis section T_0. The analysis section length T_F is preferably in the range of 0.1 msec to 10 msec, and more preferably 1 msec to 2 msec.
The extremely short-time fast Fourier transform may be performed successively using a window function whose length equals the analysis section length. However, because the analysis section is short, it is preferable to let temporally adjacent analysis sections partially overlap, as shown in FIG. 3(b).
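One consequence of such a short analysis section is coarse frequency resolution, since the spacing between adjacent bins of a discrete Fourier transform is the reciprocal of the section length. The sketch below simply evaluates that relation for section lengths mentioned in the text; the function name is an assumption.

```python
def bin_spacing_hz(t_f_msec):
    """Spacing between adjacent FFT bins [Hz] for an analysis section of t_f_msec milliseconds."""
    if t_f_msec <= 0:
        raise ValueError("t_f_msec must be positive")
    return 1000.0 / t_f_msec


# Conventional section (about 1.0 sec) versus the short sections of this example.
for t_f in (1000.0, 10.0, 2.0, 1.0):
    print(f"T_F = {t_f:g} msec -> bin spacing = {bin_spacing_hz(t_f):g} Hz")
```

At T_F = 2 msec the bins are 500 Hz apart, so results can only be attributed to broad frequency bands rather than to individual frequencies.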

In step S15, it is determined whether the calculation of the cross spectra has finished.
If it has not finished, the process returns to step S13, and the operation of extracting the next sound pressure waveform data to be analyzed from the sound pressure waveform data storage area 31a, performing the extremely short-time fast Fourier transform, and calculating the cross spectrum is repeated. If it has finished, the process proceeds to step S16, where the weighted average cross spectrum P(f) is obtained from the N cross spectra p_n(f) and their amplitudes w_n (n = 1 to N) obtained by the N operations.
The weighted average cross spectrum P_13(f) can be expressed by the following equation.
P_13(f) = { Σ w_n13(f) · p_n13(f) } / { Σ w_n13(f) }, where Σ denotes the sum over n = 1 to N.
Next, the sound arrival time difference D_ij between the microphones M_i and M_j is calculated from the weighted average cross spectrum P_ij(f) (step S17), and the horizontal angle θ and the elevation angle φ are calculated from these arrival time differences D_ij using equations (2) and (3) described above, to estimate the sound source direction of the incoming sound (step S18).
Because the weighted average cross spectrum P_ij(f) is obtained by weighted-averaging the cross spectra p_n(f) by their amplitudes w_n, the reflected-sound component, whose amplitude is smaller than that of the direct sound and varies widely, becomes considerably smaller than the reflected-sound component obtained from the conventional cross spectrum P_ij(f). Therefore, by calculating the sound arrival time difference D_ij between the microphones M_i and M_j using equation (1) described above, only the arrival time difference D_ij of the direct sound can be extracted.
In addition, with the conventional FFT, when an impact sound occurs, its sound source cannot be identified accurately because the impact sound is not periodic and its duration is short. In this example, the cross spectra p_n(f) of the sound pressure waveform data subjected to the extremely short-time fast Fourier transform are weighted-averaged by their amplitudes w_n, so the impact sound can be captured accurately even when its duration is short.

After the sound source direction has been estimated, the image data of the sound source direction and the estimated horizontal angle θ and elevation angle φ data are combined to create a sound source direction estimation image in which a figure indicating the direction of the sound source and the loudness of the sound is drawn in the image, for example a circle whose radius indicates the loudness of the arriving sound and whose pattern indicates the frequency. This image is displayed on the display screen 40M of the display means 40 (step S18).
FIG. 4 shows a sound source direction estimation image in a vehicle interior as an example of a sound source direction estimation image, and FIG. 5 shows a sound source direction estimation image created using a conventional sound source estimation method. The horizontal axis is the horizontal angle θ and the vertical axis is the elevation angle φ.
In FIG. 4, circles hatched with lines descending to the left are sound sources in the 10 to 500 Hz band, circles hatched with lines descending to the right are sound sources in the 500 to 1000 Hz band, and circles with a mesh pattern are sound sources in the 1000 to 1500 Hz band.
In FIG. 5, on the other hand, the sound source direction was obtained by fast Fourier transform using the method shown in FIG. 3(a). For comparison, all bands from 31.5 to 500 Hz are drawn as circles hatched with lines descending to the left, all bands from 500 to 1000 Hz as circles hatched with lines descending to the right, and all bands from 1000 to 7500 Hz as circles with a mesh pattern.
As is clear from a comparison of FIG. 4 and FIG. 5, with the conventional method the reflected sound is large and, in addition, both the direct sound and the reflected sound scatter with frequency. With the method of the present embodiment, although the information about the frequency band is less precise, there is no reflected sound and the scatter in the sound source position is small. It was thus confirmed that the method of the present embodiment allows the sound source direction of the direct sound to be estimated easily and accurately even in a field with strong reflections.

In the embodiment described above, the sound source direction of the incoming sound was estimated from the sound pressure signals collected by the first and second microphone pairs, each arranged at a predetermined interval on one of two straight lines that intersect each other. The present invention is not limited to this and can also be applied to conventional sound source direction estimation using a microphone array.
In the above example, the weighted average cross spectrum was obtained by weighted-averaging the N cross spectra by their amplitude values, but the weighted average may instead use the square of the amplitude values.
In the above example, the horizontal angle θ and the elevation angle φ between the measurement point and the sound source position were estimated using the five microphones M1 to M5. When the horizontal angle θ alone is sufficient to locate the sound source, the microphone M5 may be omitted and only the two microphone pairs (M1, M3) and (M2, M4), each arranged at a predetermined interval on one of two intersecting straight lines, need be used.
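The difference between weighting by the amplitude and weighting by its square can be illustrated with a small numerical sketch. The two cross spectrum values below are made-up numbers standing in for one strong section (direct sound) and one weak section (reflection) at a single frequency; the function name and the `power` parameter are assumptions.

```python
import cmath


def weighted_average(values, power=1):
    """Average complex cross spectrum values weighted by |value| ** power."""
    weights = [abs(v) ** power for v in values]
    total = sum(weights)
    if total == 0:
        return 0j
    return sum(w * v for w, v in zip(weights, values)) / total


direct = 1.0 * cmath.exp(1j * 0.1)       # hypothetical strong section, phase 0.1 rad
reflection = 0.2 * cmath.exp(1j * 2.0)   # hypothetical weak section, phase 2.0 rad

phase_amplitude = cmath.phase(weighted_average([direct, reflection], power=1))
phase_squared = cmath.phase(weighted_average([direct, reflection], power=2))
print(phase_amplitude, phase_squared)
```

In this made-up case both averages stay close to the phase of the strong section, and the squared-amplitude weighting pulls the result slightly closer still, at the cost of relying even more heavily on the loudest sections.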

The present invention has been described above using an embodiment, but the technical scope of the present invention is not limited to the scope described in that embodiment. It will be apparent to those skilled in the art that various modifications or improvements can be made to the embodiment. It is clear from the claims that forms incorporating such modifications or improvements can also fall within the technical scope of the present invention.

As described above, according to the present invention, even when reflected sound is present, only the sound source direction of the direct sound can be estimated easily and accurately, and impact sounds can also be extracted accurately. A sound source direction estimating device with a simple configuration and high estimation accuracy can therefore be provided.

10 sound/video sampling unit, 11 sound collecting means, 12 CCD camera,
13 microphone fixing part, 14 camera support base, 15 support column, 16 base,
20 data processing device, 21 amplifier, 22 A/D converter, 23 video input/output means,
30 arithmetic device, 31 buffer, 31a sound data storage area,
31b image data storage area, 32 sound pressure waveform data extraction means,
33 cross spectrum calculation means, 33k fast Fourier transformer,
33m cross spectrum calculator, 33M weighted average cross spectrum generator,
34 arrival time difference calculation means, 35 sound source direction estimating means, 36 image data extracting means,
37 sound source estimation image creating means,
40 display device, 40M display screen, 50 storage device,
M1 to M5 microphones.

Claims

1. A method for estimating the direction of a sound source from sound pressure signals of sounds collected by a plurality of microphones, the method comprising:
collecting sound pressure signals of an incoming sound using first and second microphone pairs, each arranged at a predetermined interval on one of two straight lines that intersect each other;
A/D converting the sound pressure signals collected by the microphones M1 and M3 constituting the first microphone pair and the sound pressure signals collected by the microphones M2 and M4 constituting the second microphone pair, to obtain sound pressure waveform data of the sounds collected by the four microphones M1 to M4;
fast Fourier transforming each sound pressure waveform data;
calculating a sound arrival time difference D_13 between the microphones M1 and M3 by obtaining the cross spectrum of the fast Fourier transformed sound pressure waveform data of the microphones M1 and M3, and calculating a sound arrival time difference D_24 between the microphones M2 and M4 by obtaining the cross spectrum of the fast Fourier transformed sound pressure waveform data of the microphones M2 and M4; and
estimating the sound source direction of the incoming sound from the calculated arrival time difference D_13 in the first microphone pair and the calculated arrival time difference D_24 in the second microphone pair,
wherein, in the fast Fourier transforming step, an extremely short-time fast Fourier transform whose analysis section length is 0.1 msec to 10 msec is performed a plurality of times, either successively or with the analysis sections partially overlapped, and
the step of calculating the arrival time difference comprises:
obtaining an amplitude value of the cross spectrum obtained for each operation of the extremely short-time fast Fourier transform;
obtaining a weighted average cross spectrum by weighted-averaging the cross spectra obtained for each operation of the extremely short-time fast Fourier transform using the amplitude values; and
calculating the sound arrival time differences D_13 and D_24 between the microphones from the weighted average cross spectrum.

2. The sound source direction estimation method according to claim 1, wherein
a fifth microphone M5 that is not on the plane formed by the two microphone pairs is provided in addition to the four microphones M1 to M4 to collect the sound pressure signal of the incoming sound,
in the step of calculating the arrival time difference, the arrival time differences D_13 and D_24 between the microphones M1 and M3 and between the microphones M2 and M4 constituting the two microphone pairs, and the arrival time differences D_51 to D_54 between the microphones constituting the four microphone pairs formed by the fifth microphone M5 and each of the four microphones M1 to M4, are calculated, and
in the step of estimating the sound source direction, the sound source direction of the incoming sound is estimated using the calculated arrival time differences D_13, D_24, and D_51 to D_54.

3. A sound source direction estimating device that comprises sound collecting means including first and second microphone pairs, each arranged at a predetermined interval on one of two straight lines that intersect each other, and a fifth microphone that is not on the plane formed by the two microphone pairs, and that estimates the direction of a sound source from the sound pressure signals of the sound collected by the sound collecting means, the device comprising:
an A/D converter that converts the sound pressure signal collected by each microphone into a digital signal;
a fast Fourier transformer that fast Fourier transforms sound pressure waveform data, which is the sound pressure signal converted into the digital signal;
cross spectrum calculation means that calculates, from the fast Fourier transformed sound pressure waveform data, the cross spectrum of the sound pressure waveform data of the sounds collected by the two microphones constituting the first microphone pair, the cross spectrum of the sound pressure waveform data of the sounds collected by the two microphones constituting the second microphone pair, and the cross spectra of the sound pressure waveform data of the sound collected by the fifth microphone with that of each of the four microphones constituting the first and second microphone pairs;
arrival time difference calculation means that calculates, from the cross spectra, the sound arrival time differences D_13 and D_24 between the microphones constituting the first and second microphone pairs and the sound arrival time differences D_51 to D_54 between the fifth microphone and each of the four microphones constituting the two microphone pairs; and
sound source direction estimating means that estimates the sound source direction using the calculated arrival time differences D_13, D_24, and D_51 to D_54,
wherein the fast Fourier transformer performs an extremely short-time fast Fourier transform whose analysis section length is 0.1 msec to 10 msec a plurality of times, either successively or with the analysis sections partially overlapped,
the cross spectrum calculation means obtains the amplitude value of the cross spectrum obtained for each operation of the extremely short-time fast Fourier transform, and obtains a weighted average cross spectrum by weighted-averaging the cross spectra obtained for each operation using the amplitude values, and
the arrival time difference calculation means calculates the sound arrival time differences D_13, D_24, and D_51 to D_54 between the microphones from the weighted average cross spectra.

4. A sound source estimation image creating apparatus that comprises sound collecting means including first and second microphone pairs, each arranged at a predetermined interval on one of two intersecting straight lines, and a fifth microphone that is not on the plane formed by the two microphone pairs, and photographing means that shoots video in the direction of a sound source, and that creates, from the sound pressure signals of the sound propagated from the sound source and collected by the sound collecting means and the video signal in the direction of the sound source shot by the photographing means, a sound source estimation image, which is an image in which a figure showing the direction of the sound source is drawn, the apparatus comprising:
an A/D converter that converts the sound pressure signal collected by each microphone and the video signal shot by the photographing means into digital signals;
a fast Fourier transformer that fast Fourier transforms sound pressure waveform data, which is the sound pressure signal converted into the digital signal;
cross spectrum calculation means that calculates, from the fast Fourier transformed sound pressure waveform data, the cross spectrum of the sound pressure waveform data of the sounds collected by the two microphones constituting the first microphone pair, the cross spectrum of the sound pressure waveform data of the sounds collected by the two microphones constituting the second microphone pair, and the cross spectra of the sound pressure waveform data of the sound collected by the fifth microphone with that of each of the four microphones constituting the first and second microphone pairs;
arrival time difference calculation means that calculates, from the cross spectra, the sound arrival time differences D_13 and D_24 between the microphones constituting the first and second microphone pairs and the sound arrival time differences D_51 to D_54 between the fifth microphone and each of the four microphones constituting the two microphone pairs;
sound source direction estimating means that estimates the sound source direction using the calculated arrival time differences D_13, D_24, and D_51 to D_54; and
sound source estimation image creating means that combines the estimated sound source direction data with image data, which is the video signal converted into the digital signal, to create a sound source estimation image, which is an image in which a figure showing the estimated sound source direction is drawn,
wherein the fast Fourier transformer performs an extremely short-time fast Fourier transform whose analysis section length is 0.1 msec to 10 msec a plurality of times, either successively or with the analysis sections partially overlapped,
the cross spectrum calculation means obtains the amplitude value of the cross spectrum obtained for each operation of the extremely short-time fast Fourier transform, and obtains a weighted average cross spectrum by weighted-averaging the cross spectra obtained for each operation using the amplitude values, and
the arrival time difference calculation means calculates the sound arrival time differences D_13, D_24, and D_51 to D_54 between the microphones from the weighted average cross spectra.

5. The sound source estimation image creating apparatus according to claim 4, further comprising display means having a display screen for displaying the created sound source estimation image.