JP4652191B2 - Multiple sound source separation method

Info

Publication number: JP4652191B2
Application number: JP2005279512A
Authority: JP
Inventors: 潤二中島; 祐治淺井; 雅直大脇; 健史財満; 恭弘山下
Original assignee: Chubu Electric Power Co Inc; Kumagai Gumi Co Ltd
Current assignee: Chubu Electric Power Co Inc; Kumagai Gumi Co Ltd
Priority date: 2005-09-27
Filing date: 2005-09-27
Publication date: 2011-03-16
Anticipated expiration: 2025-09-27
Also published as: JP2007096418A

Abstract

PROBLEM TO BE SOLVED: To provide a method that accurately separates the directions of a plurality of sound sources, or the direction of a direct sound and the direction of a reflected sound, even when plural sound sources exist or the effect of reflected sound is large.

SOLUTION: The sound vectors A5 and B5 of the sounds from the respective sources, as picked up by a reference microphone M5, are estimated on the condition that the sound vector representing the observed sound picked up by each microphone is the sum of the sound vectors of the sounds from the sources. Each such sound vector has a magnitude representing the amplitude of the sound from that source relative to the amplitude of the observed sound picked up by the reference microphone, and a phase angle equal to the phase difference relative to the observed sound. The directions of sources A and B are estimated using A5 and B5, new sound vectors from the estimated source A are obtained, the sound vectors A5 and B5 that minimize the differences between the new sound vectors and the estimated sound vectors are identified, and the directions estimated using the identified A5 and B5 are taken as the directions of sources A and B.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a method for separating and identifying the positions of sound sources when incoming sound originates from a plurality of sources, and in particular to a method for separating and identifying the positions of a plurality of sound sources whose frequencies are identical or close to one another.

Conventionally, so-called acoustic methods have been devised that estimate the direction of arrival of sound from the phase differences between signals obtained with a plurality of microphones. Consider the case shown in FIG. 11(a), in which two microphones M1 and M2 are placed a predetermined distance d apart and receive a sound wave arriving as a plane wave from the direction θ_s. If the wave arriving from θ_s is received first by the first microphone M1 and then by the second microphone M2, the received signal x2(t) of M2 is delayed relative to the received signal x1(t) of M1 by the time τ_s = ξ/c (c: speed of sound) that the wave needs to travel the distance ξ = d·sin θ_s. The direction of arrival θ_s can therefore be obtained by determining the time delay τ_s between M1 and M2 (or, equivalently, the phase difference δ₂₁ between M1 and M2).
With only two microphones, however, it is difficult to measure θ_s accurately. In practice, as shown in FIG. 11(b), a microphone array is built in which a large number of microphones M1 to Mm are arranged at equal intervals, and θ_s is obtained from the phase difference δ_m1 of each microphone Mi (i = 2 to m) relative to the reference microphone M1. Specifically, delay units D1 to Dm are provided after the microphones M1 to Mm, respectively, together with an adder Σ that sums the outputs of the delay units, forming a delay-and-sum array. When the output signal of each microphone is delayed by the time difference derived from the geometric arrangement of M1 to Mm, all components of the wave from the assumed direction θ_s are brought into synchronism, whereas components from other directions cancel and become small; the direction of arrival θ_s can thus be obtained by summing the delayed signals. This technique of emphasizing and extracting the sound component arriving from a particular direction by delay-and-sum array processing is generally called the beamformer method (or beam focusing method) (see, for example, Non-Patent Document 1).
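The two ideas above can be illustrated with a minimal sketch, not taken from the patent. It assumes a narrowband signal represented by one complex amplitude per microphone, a line array, and a speed of sound of 340 m/s; the function names `angle_from_delay` and `delay_and_sum_power` are hypothetical. The first function inverts τ_s = d·sin θ_s / c for two microphones, and the second scans a delay-and-sum array over candidate directions.

```python
import cmath
import math

C = 340.0  # assumed speed of sound in m/s (not specified in the text)

def angle_from_delay(tau, d, c=C):
    """Invert tau = d*sin(theta)/c for the arrival angle theta (radians)."""
    return math.asin(max(-1.0, min(1.0, c * tau / d)))

def delay_and_sum_power(phasors, positions, theta, f, c=C):
    """Output power of a narrowband delay-and-sum array steered to theta.

    phasors[i] is the complex amplitude at frequency f observed at the
    microphone located at positions[i] (metres along the array axis).
    """
    k = 2.0 * math.pi * f / c
    # Undo the delay each microphone would see for a plane wave from theta.
    total = sum(p * cmath.exp(1j * k * x * math.sin(theta))
                for p, x in zip(phasors, positions))
    return abs(total) ** 2

# Simulated plane wave from 25 degrees on an 8-microphone line array.
f, d, true_theta = 500.0, 0.2, math.radians(25.0)
positions = [i * d for i in range(8)]
k = 2.0 * math.pi * f / C
phasors = [cmath.exp(-1j * k * x * math.sin(true_theta)) for x in positions]

# Scan candidate directions in 1-degree steps and keep the strongest one.
scan = [math.radians(a) for a in range(-90, 91)]
best_theta = max(scan, key=lambda t: delay_and_sum_power(phasors, positions, t, f))

# Two-microphone estimate from the known inter-microphone delay.
two_mic_theta = angle_from_delay(d * math.sin(true_theta) / C, d)
```

The spacing of 0.2 m is below half a wavelength at 500 Hz (0.34 m under the assumed speed of sound), so the scan has a single main lobe in this example.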

Separately, a method has been proposed that does not rely on the phase differences among the output signals of a plurality of microphones placed at the measurement point, but instead forms, from the plurality of microphones, a plurality of microphone pairs arranged on straight lines that intersect one another, and estimates the direction of the sound source from the phase difference (time delay D_ab) between the two microphones Ma and Mb of one pair and the phase difference (time delay D_cd) between the two microphones Mc and Md of another pair (see, for example, Non-Patent Documents 2 and 3 and Patent Document 1).
That is, as shown in FIG. 12, four microphones M1 to M4 are arranged so as to form two microphone pairs (M1, M3) and (M2, M4), each placed at a predetermined spacing on one of two mutually orthogonal straight lines, and a fifth microphone M5 is placed at a position off the plane defined by M1 to M4, forming four further microphone pairs (M5, M1) to (M5, M4). In this configuration, the horizontal angle θ and the elevation angle φ that give the direction of incidence of the sound can be expressed by the following equations (1) and (2).
Here, the time delay D_ij is the time difference between the sound pressure signal arriving at microphone M_i and the sound pressure signal arriving at its paired microphone M_j. It is calculated by the following equation (3): the cross spectrum P_ij(f) of the signals input to the two paired microphones M_i and M_j is obtained, and the phase angle information Ψ (rad) at the frequency f of interest is used.
Compared with estimating the direction of the sound source using the microphone array described above, this approach not only estimates the direction accurately with fewer microphones but also estimates it accurately outdoors.
Equations (1) and (2) hold for plane waves at or below the frequency at which the microphone spacing equals half a wavelength. When the target sound source lies substantially in the same plane as the measurement point and the elevation angle φ is not required, the horizontal angle θ, which is the direction of the sound source, can be estimated with only the two microphone pairs (M1, M3) and (M2, M4).
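Equations (1) to (3) are not reproduced in this text, so the following is only a sketch of the relations the passage describes, under stated assumptions. It assumes the cross spectrum is taken as P_ij(f) = X_i(f)·conj(X_j(f)), that the delay follows as D_ij = Ψ/(2πf), and that for two orthogonal pairs of equal spacing the horizontal angle is the arctangent of the ratio of the two pair delays. The sign and axis conventions, and the names `dft_bin`, `time_delay` and `horizontal_angle`, are assumptions and may differ from the patent's own equations.

```python
import cmath
import math

def dft_bin(x, f, fs):
    """Single-frequency DFT coefficient of the sample list x at frequency f."""
    return sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))

def time_delay(x_i, x_j, f, fs):
    """Delay of x_j relative to x_i (seconds) from the cross-spectrum phase.

    Assumed convention: P_ij(f) = X_i(f) * conj(X_j(f)), D_ij = Psi / (2*pi*f).
    """
    p_ij = dft_bin(x_i, f, fs) * dft_bin(x_j, f, fs).conjugate()
    return cmath.phase(p_ij) / (2.0 * math.pi * f)

def horizontal_angle(d_13, d_24):
    """Horizontal angle from the delays of two orthogonal, equally spaced pairs.

    Assumed axis convention: theta is measured from the (M2, M4) axis.
    """
    return math.atan2(d_13, d_24)

fs, f, n = 8000.0, 500.0, 800          # 50 whole periods of a 500 Hz tone
delay = 0.0002                          # true delay of 0.2 ms
x1 = [math.sin(2 * math.pi * f * t / fs) for t in range(n)]
x2 = [math.sin(2 * math.pi * f * (t / fs - delay)) for t in range(n)]
estimated = time_delay(x1, x2, f, fs)
```

Because the phase Ψ is only known modulo 2π, the delay is unambiguous only while 2πf·D_ij stays within ±π, which is consistent with the half-wavelength condition stated above.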
Non-Patent Document 1: Juro Ohga, Yoshio Yamasaki, Yutaka Kaneda; Acoustic Systems and Digital Processing, Corona Publishing, 1995
Non-Patent Document 2: Noboru Kamiakito, Hidekazu Nogami, Yasuhiro Yamashita, Takeshi Zaima, Masanao Owaki, Takeshi Sugiyama, Hiroyuki Wada; Development of a sound source exploration system incorporating sound information and images, Journal of Architecture and Planning (Transactions of the Architectural Institute of Japan), No. 553, pp. 17-22, March 2002
Non-Patent Document 3: Masanao Owaki, Takeshi Zaima, Hiroyuki Wada, Yasuhiro Yamashita; Development of a sound source exploration system incorporating sound information into images, Electric Power Civil Engineering, No. 308, pp. 100-104, November 2003
Patent Document 1: JP 2003-111183 A

In the conventional methods described above, when the plural sound sources differ in frequency, it suffices to extract the sound signal of a specific frequency from the received signal and estimate the direction of each source. When the frequencies are identical, however, the sounds interfere with one another and the direction of the target sound may not be estimated correctly. Likewise, even with a single sound source, when the sound reflected from a road surface, wall, or the like is strong, the direction of the source cannot be measured accurately, just as when plural sources of the same frequency are present.
For example, in the beamformer method, weak directivity known as side lobes appears in certain directions other than the direction of arrival (main lobe), so the accuracy of the estimated direction of arrival degrades when many sources are present. To minimize the contribution of output from other directions, that is, to minimize noise without changing the magnitude of the target sound, a method has also been proposed in which each microphone is weighted and the direction of arrival is found as the θ that minimizes the variance of the estimated power in the arrival direction (the minimum variance method) (see, for example, Masafumi Tanaka, Yutaka Kaneda, Junji Kojima; Performance evaluation of sound source direction estimation methods under room reverberation, Journal of the Acoustical Society of Japan, Vol. 50, No. 7, pp. 540-548, 1994). Even in this case, however, sufficient measurement accuracy could not be obtained for sounds that are physically highly correlated, such as the direct sound and reflected sound of a real source.
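The interference problem can be made concrete with a small numerical sketch, which is an illustration and not part of the patent. It assumes two microphones 0.2 m apart, a 500 Hz tone, a speed of sound of 340 m/s, and plane waves; the function name `apparent_angle` is hypothetical. Two coherent sources of the same frequency add to a single phasor at each microphone, so a phase-difference measurement reports one direction that belongs to neither source.

```python
import cmath
import math

C = 340.0  # assumed speed of sound in m/s

def apparent_angle(sources, d, f, c=C):
    """Angle a two-microphone phase measurement reports for coherent sources.

    sources: list of (amplitude, phase at microphone 1 in rad, arrival angle in rad).
    """
    k = 2.0 * math.pi * f / c
    # Sum the source phasors at each microphone (microphone 2 lags by k*d*sin(theta)).
    o1 = sum(cmath.rect(a, p) for a, p, _ in sources)
    o2 = sum(cmath.rect(a, p - k * d * math.sin(th)) for a, p, th in sources)
    delta = cmath.phase(o1 * o2.conjugate())   # phase lag of microphone 2
    return math.asin(delta / (k * d))

theta_a, theta_b = math.radians(30.0), math.radians(-10.0)
single = apparent_angle([(1.0, 0.0, theta_a)], d=0.2, f=500.0)
mixed = apparent_angle([(1.0, 0.0, theta_a), (1.0, 0.0, theta_b)], d=0.2, f=500.0)
print(math.degrees(single), math.degrees(mixed))
```

With a single source the true angle is recovered. With two equal-amplitude, in-phase sources the reported angle in this configuration satisfies sin θ = (sin θ_A + sin θ_B)/2, which lies between the two true directions.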

The present invention has been made in view of these conventional problems, and its object is to provide a method capable of accurately separating the directions of a plurality of sound sources, or the direction of a direct sound and the direction of a reflected sound, even when there are plural sound sources or the influence of reflected sound is large.

As a result of extensive study, the present inventors found the following. The vector whose magnitude is the amplitude of the observed sound input to the reference microphone can be expressed as the sum of complex vectors (hereinafter, sound vectors), each of which has a magnitude representing the amplitude of the sound input to the reference microphone from one sound source and a phase angle representing the phase difference relative to the observed sound. For every other microphone as well, the sound vector of each source can be obtained from the actually measured phase difference of the observed sound at that microphone relative to the reference microphone. From such a set of sound vectors, the phase differences between microphones are obtained for each source, and the directions of the plural sources are estimated from these phase differences. The phase differences of the sound arriving at each microphone from one of the estimated sources are then computed, and a new set of sound vectors is obtained from them. If the set of sound vectors for which this new set is closest to the initially assumed set is taken as the set of sound vectors from the sources, the directions of the plural sources can be separated and identified. The present invention is based on this finding.

That is, the invention according to claim 1 of the present application is a method in which an observed sound, which is the composite of sounds arriving from a plurality of sound sources, in particular sources whose frequencies are identical or close, is picked up by a plurality of microphones, and the directions of the sources are separated and identified using the data on the phase differences of the observed sound between the microphones, the method comprising:
a first step of detecting the phase differences, between the microphones, of the observed sound observed by the plurality of microphones;
a second step of setting the sound vector of each sound source such that, when the complex vector corresponding to the observed sound input to the reference microphone is expressed as a reference sound vector whose magnitude represents the magnitude of the observed sound and whose phase angle is 0°, the sound vector corresponding to the sound input to the reference microphone from each source is a complex vector whose magnitude represents the magnitude of the sound arriving from that source and whose phase angle represents the phase difference relative to the observed sound, and the sum of the sound vectors of the sources equals the reference sound vector;
a third step of calculating, for each microphone other than the reference microphone, the sound vector of each source input to that microphone such that each has the same magnitude as the corresponding sound vector set above and their sum equals the sound vector of the observed sound input to that microphone, thereby obtaining a set of sound vectors from each source for all microphones;
a fourth step of identifying, from the set of sound vectors, the sound vectors of one sound source direction and obtaining the phase differences between the microphones for the identified sound vectors;
a fifth step of estimating the direction of the identified source from the phase differences obtained in the fourth step;
a sixth step of calculating the phase differences, between the microphones, of sound arriving from the estimated specific source;
a seventh step of calculating, for each microphone, the sound vector of the sound from the estimated specific source from the phase differences obtained in the sixth step;
an eighth step of obtaining, for each microphone, the difference vector between the sound vector calculated in the third step and the sound vector calculated in the seventh step; and
a ninth step of changing the magnitude and phase angle of the sound vector from each source in the second step, repeating the second through eighth steps for the changed sound vectors to obtain the difference vectors for each of the sound vectors set in the second step, identifying the most probable set of sound vectors of the sources on the basis of the magnitudes of the difference vectors, for example by identifying the set that minimizes the sum of the absolute values of the difference vectors, and estimating the direction of each source from the phase-angle differences between each sound vector of the identified set and the reference sound vector.
The plurality of sound sources includes the case of one sound source and a virtual sound source (secondary source) that generates sound reflected from that source.
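The second through eighth steps can be sketched for a deliberately simplified case. The sketch assumes a three-microphone line array with azimuth only (not the five-microphone pyramid of the embodiment), two sources, plane waves, and a speed of sound of 340 m/s; the names `split` and `residual` are hypothetical. `split` solves the triangle formed by an observed phasor and two sound vectors of given magnitudes (steps 2 and 3), and `residual` measures how far the resulting sound vectors of source A are from those of a plane wave arriving from the direction they imply (steps 4 to 8).

```python
import cmath
import math

C = 340.0  # assumed speed of sound in m/s

def split(obs, a, b):
    """Steps 2-3: all pairs (A, B) with |A| = a, |B| = b and A + B = obs."""
    r = abs(obs)
    if r == 0.0 or a == 0.0 or not (abs(a - b) <= r <= a + b):
        return []                      # no triangle with these side lengths
    cos_alpha = (r * r + a * a - b * b) / (2.0 * a * r)   # law of cosines
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))
    pairs = []
    for sign in (1.0, -1.0):           # the two mirror-image triangles
        vec_a = cmath.rect(a, cmath.phase(obs) + sign * alpha)
        pairs.append((vec_a, obs - vec_a))
    return pairs

def residual(obs, xs, f, a_ref):
    """Steps 3-8 for source A on a line array; xs[0] is the reference microphone."""
    k = 2.0 * math.pi * f / C
    a, b = abs(a_ref), abs(obs[0] - a_ref)
    best = math.inf
    for a_1, _ in split(obs[1], a, b):
        # Steps 4-5: phase difference of A between microphones -> sin(theta_A).
        s = -cmath.phase(a_1 / a_ref) / (k * (xs[1] - xs[0]))
        if abs(s) > 1.0:
            continue
        total = 0.0
        for x, o in zip(xs[2:], obs[2:]):
            # Steps 6-7: sound vector a plane wave from that direction would give.
            predicted = a_ref * cmath.exp(-1j * k * (x - xs[0]) * s)
            candidates = split(o, a, b)
            if not candidates:
                total = math.inf
                break
            # Step 8: magnitude of the difference vector (closest mirror branch).
            total += min(abs(v - predicted) for v, _ in candidates)
        best = min(best, total)
    return best

# Simulated observation: (amplitude, phase at xs[0], arrival angle) per source.
f, d = 500.0, 0.2
xs = [0.0, d, 2.0 * d]
k = 2.0 * math.pi * f / C
src_a = (1.0, 0.4, math.radians(30.0))
src_b = (0.6, 1.2, math.radians(-20.0))

def at(src, x):
    amp, ph, th = src
    return cmath.rect(amp, ph - k * x * math.sin(th))

raw = [at(src_a, x) + at(src_b, x) for x in xs]
rot = cmath.exp(-1j * cmath.phase(raw[0]))   # make the reference phase 0
obs = [o * rot for o in raw]
true_a_ref = at(src_a, 0.0) * rot
r_true = residual(obs, xs, f, true_a_ref)

# Step 9 (coarse): try candidate sound vectors and keep the smallest residual.
grid = [cmath.rect(0.1 * m, math.radians(5 * p))
        for m in range(1, 16) for p in range(-36, 36)]
best_candidate = min(grid, key=lambda v: residual(obs, xs, f, v))
```

The residual vanishes (to rounding) at the true sound vector of source A, and equally at that of source B, since the roles of the two sources are interchangeable. No claim is made that the coarse grid search above finds a unique answer; with so few microphones other candidates may also give small residuals, and the patent's own search procedure and array geometry differ from this sketch.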

The invention according to claim 2 is the method for separating a plurality of sound sources according to claim 1, wherein the detection of the phase differences of the observed sound between the microphones in the first step and the estimation of the source direction in the fifth step are performed using a sound source position estimation device comprising sound source position estimating means that estimates the direction of a sound source from the phase differences between the microphones of two microphone pairs, each placed at a predetermined spacing on one of two straight lines that intersect each other.
The invention according to claim 3 is the method for separating a plurality of sound sources according to claim 1, wherein the detection of the phase differences of the observed sound between the microphones in the first step and the estimation of the source direction in the fifth step are performed using a microphone group consisting of two microphone pairs, each placed at a predetermined spacing on one of two intersecting straight lines, and a fifth microphone not on the plane defined by the two pairs, together with a sound source position estimation device comprising sound source position estimating means that estimates the direction of a sound source from the phase differences between the microphones of the two pairs and the phase differences between the microphones of the four pairs formed by the fifth microphone and each of the four microphones of the two pairs.
The invention according to claim 4 is the method for separating a plurality of sound sources according to any one of claims 1 to 3, wherein an image of the vicinity of the estimated source direction is captured, and the position of the sound source is identified from the estimated source direction and the captured image.

According to the present invention, when the positions of a plurality of sound sources, in particular sources whose frequencies are identical or close, are separated and identified, a sound vector is assumed for each source whose magnitude represents the amplitude of the sound from that source relative to the amplitude of the observed sound input to the reference microphone and whose angle represents the phase difference relative to the observed sound. After the direction of each source is estimated using the assumed set of sound vectors, the phase differences of the sound reaching each microphone from a specific one of the estimated sources are obtained, and new sound vectors are obtained from these phase differences. The sound vector that minimizes the difference vector between the new sound vector and the sound vector set for the specific source is then found, and the source directions that give the set of sound vectors containing it are taken as the estimated directions of the sources. Consequently, even when there are plural sound sources or the influence of reflected sound is large, the directions of the plural sources, or the arrival direction of the direct sound and that of the reflected sound, can be accurately separated and identified.
In this case, if the detection of the phase differences of the observed sound between the microphones in the first step and the estimation of the source direction in the fifth step are performed using a sound source position estimation device comprising sound source position estimating means that estimates the direction of a sound source from the phase differences between the microphones of two microphone pairs, each placed at a predetermined spacing on one of two intersecting straight lines, or using a microphone group consisting of two such microphone pairs and a fifth microphone not on the plane defined by the two pairs, together with a sound source position estimation device comprising sound source position estimating means that estimates the direction from the phase differences between the microphones of the two pairs and between the microphones of the four pairs formed by the fifth microphone and each of the four microphones of the two pairs, the directions of the sound sources can be separated efficiently and accurately with a small number of microphones.

The best mode of the present invention is described below with reference to the drawings.
FIG. 1 shows an outline of a sound source exploration system according to the best mode of the present invention. M1 to M5 are measurement microphones for measuring the sound pressure level of noise from a noise source (not shown); 11 is a CCD camera (hereinafter, camera) for capturing images of the vicinity of the sound source position; 12 is an amplifier equipped with a low-pass filter that extracts and amplifies the components at or below a predetermined frequency from the acoustic information picked up by the microphones M1 to M5; 13 is an A/D converter that converts the amplified acoustic information (analog signal) into a digital signal; and 14 is a video input/output unit that converts the video information signal (analog signal) of the camera 11 into a digital signal. 20 is a microphone frame for arranging the microphones M1 to M5 at predetermined positions, and 30 is a measurement base consisting of a tripod support member 31 and a turntable 32 mounted on top of the support member 31. The turntable 32 allows the microphone frame 20 to be rotated, so that the microphones M1 to M5 can be rotated in a horizontal plane.
40 is a sound source position estimation device comprising a keyboard 41 as input means, a storage/calculation unit 42 that stores measurement parameters such as the number of microphones and the sampling frequency and performs the calculations for sound source position estimation, and a display 43 as image display means. The storage/calculation unit 42 comprises data storage means 42a having a parameter file 42m that stores the measurement parameters; sound source position estimating means 42b that estimates the direction of the noise source using the A/D-converted acoustic information from the microphones M1 to M5; and image synthesizing means 42c that generates an image in which an image indicating the estimated source position is added to the video from the camera 11 and sends it to the display 43. It further comprises sound source separating means 42d for separating the directions of the plural sources, or the direction of the direct sound and that of the reflected sound, when there may be plural noise sources or when the reflected noise is so strong that sound appears to arrive from two sources. Thus, when there are plural noise sources or the influence of reflected sound is strong, the directions of the noise sources are separated and estimated, and an image in which images indicating the estimated plural source positions are added to the video from the camera 11 can be generated and shown on the display 43.

次に、本発明による複数音源の分離方法について、図２のフローチャートに基づき説明する。なお、説明を簡単にするため、分離する音源を同一周波数の音を発生する音源Ａと音源Ｂとの２つの音源とし、基準となるマイクロフォンを、図１２に示した四角錐状に配置された５個のマイクロフォンＭ１〜Ｍ５のうちの、上記四角錐の頂点に配置されたマイクロフォンＭ５とした場合について説明する。
最初に、マイクロフォンＭ１〜Ｍ５の出力から、上記各マイクロフォンＭ１〜Ｍ５に入力される観測音の音源が１個であると仮定したときの上記観測音の音源方向を推定する（ステップＳ１０）。
このとき実際の測定においては、音源の位置がマイクロフォンの位置から十分（例えば、マイクロフォン間隔の１０倍以上）離れているので、各マイクロフォンＭ１〜Ｍ５に到達する音を平面波とみなすことが可能である。そこで、本例では、音源位置を求める際に、音源の位置がマイクロフォンＭ１〜Ｍ５の位置から十分離れており、音は平面波として各マイクロフォンＭ１〜Ｍ５に入射すると仮定して音源位置を推定する。
平面波近似においては、マイクロフォンＭ_ｉとマイクロフォンＭ_j間の時間遅れＤ_ijと音源の位置の水平角θ及び仰角φとは、上述した式（１），（２）で表わせるので、各マイクロフォンＭ１〜Ｍ５の出力信号を周波数分析して、対象となる周波数ｆにおける各マイクロフォンＭ_ｉ，Ｍ_ｊ間の時間遅れＤ_ijを算出することにより、上記水平角θ及び仰角φを求めることができる。以下、上記式（１），（２）を再掲する。
また、上記時間遅れＤ_ijは、２つのマイクロフォン対（Ｍ_ｉ，Ｍ_ｊ）に入力される信号のクロススペクトルＰ_ij（ｆ）を求め、更に、対象とする上記周波数ｆの位相角情報Ψ（ｒａｄ）を用いて、以下の式（３）を用いて算出される。
なお、上記推定された音源方向近傍の映像をカメラ１１により採取すれば、上記推定された音源方向と上記カメラ１１で採取した映像情報とから、音源の位置についても推定することができるとともに、音源位置推定装置４０のディスプレイ４３上に表示された音源方向近傍の映像に、上記推定された音源位置を表示することができる。
また、上記音源の位置は、各周波数毎に算出することができる。
但し、この音源位置は、上述したように、観測音の音源が１個であると仮定した場合の音源位置で、本例では、以下に示すように、音源分離手段４２ｄを用いて上記音源位置を音源Ａの位置と音源Ｂの位置とに分離する。
なお、上記観測音の音源方向あるいは音源位置と後に推定する音源Ａ，Ｂの方向あるいは位置とを比較する必要のない場合には、上記時間遅れＤ_ijを算出するだけでよく、上記水平角θ及び仰角φの算出や上記観測音の音源位置の推定については省略してもよい。 Next, a method of separating a plurality of sound sources according to the present invention will be described based on the flowchart of FIG. In order to simplify the explanation, the sound source to be separated is two sound sources, a sound source A and a sound source B that generate sound of the same frequency, and the reference microphones are arranged in a quadrangular pyramid shape shown in FIG. The case where the microphone M5 is arranged at the apex of the quadrangular pyramid among the five microphones M1 to M5 will be described.
First, the direction of the observed sound's source is estimated from the outputs of microphones M1 to M5 under the assumption that the observed sound input to each microphone comes from a single sound source (step S10).
In actual measurements the sound source is sufficiently far from the microphones (for example, at least 10 times the microphone spacing), so the sound reaching each of the microphones M1 to M5 can be regarded as a plane wave. In this example, therefore, the sound source position is estimated on the assumption that the source is sufficiently far from microphones M1 to M5 and that the sound is incident on each microphone as a plane wave.
In the plane wave approximation, the time delay D_ij between microphone M_i and microphone M_j is related to the horizontal angle θ and elevation angle φ of the sound source position by equations (1) and (2) above. The horizontal angle θ and elevation angle φ can therefore be obtained by frequency-analyzing the output signals of microphones M1 to M5 and calculating the time delay D_ij between microphones M_i and M_j at the target frequency f. Equations (1) and (2) are restated below.
The time delay D_ij is calculated with the following equation (3): the cross spectrum P_ij(f) of the signals input to the two microphones of a pair (M_i, M_j) is obtained, and the phase angle information Ψ (rad) at the target frequency f is then used.
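Equation (3) itself is not reproduced in this text. The relation it describes can be sketched with the standard identity that a phase angle Ψ at frequency f corresponds to a time delay Ψ/(2πf). The sketch below is illustrative only: the function names, the single-bin DFT used as a stand-in for the frequency analysis, and the sign convention (a positive result means the second signal lags the first) are all assumptions, not details taken from the patent.

```python
import cmath
import math


def dft_bin(signal, freq_hz, sample_rate_hz):
    """Single-frequency DFT coefficient of a real signal at freq_hz."""
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate_hz)
        for n, x in enumerate(signal)
    )


def time_delay(sig_i, sig_j, freq_hz, sample_rate_hz):
    """Delay of sig_j relative to sig_i in seconds, from the cross-spectrum phase.

    Assumed convention: cross spectrum = X_i * conj(X_j), so a positive
    phase means sig_j arrives later than sig_i.
    """
    x_i = dft_bin(sig_i, freq_hz, sample_rate_hz)
    x_j = dft_bin(sig_j, freq_hz, sample_rate_hz)
    psi = cmath.phase(x_i * x_j.conjugate())  # phase angle in rad, within (-pi, pi]
    return psi / (2.0 * math.pi * freq_hz)
```

Because the phase is only known modulo 2π, the recovered delay is unambiguous only while |2πf·D_ij| stays below π, that is, while the microphone spacing is less than half a wavelength.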
If an image of the vicinity of the estimated sound source direction is captured by the camera 11, the position of the sound source can also be estimated from the estimated direction and the image information captured by the camera 11, and the estimated position can be displayed on the image of the vicinity of the sound source direction shown on the display 43 of the sound source position estimation device 40.
The position of the sound source can be calculated for each frequency.
However, as noted above, this sound source position is the position obtained under the assumption that the observed sound has a single source. In this example, the sound source separation means 42d is used, as described below, to separate this position into the position of sound source A and the position of sound source B.
If the direction or position of the observed sound's source does not need to be compared with the directions or positions of sound sources A and B estimated later, it suffices to calculate the time delays D_ij; the calculation of the horizontal angle θ and elevation angle φ and the estimation of the observed sound's source position may be omitted.

Next, as shown in FIG. 3(a), the observed sound input to microphone M5, the reference microphone, is represented by a complex vector S5 with amplitude 1 and phase angle 0° (hereinafter, the observed-sound reference vector). The amplitude P_A and phase angle δ_A of the complex vector A5 of the sound input from sound source A to microphone M5 (hereinafter, sound vector A5), and the amplitude P_B and phase angle δ_B of the complex vector B5 of the sound input from sound source B to microphone M5 (hereinafter, sound vector B5), are then set (step S11).
Because the observed sound input to microphone M5 is the composite of the sound input from sound source A and the sound input from sound source B, the sound vector S5 is the vector sum of sound vector A5 and sound vector B5. Therefore, once the amplitude P_A and phase angle δ_A of sound vector A5 are set, the amplitude P_B and phase angle δ_B of sound vector B5 are uniquely determined from P_A and δ_A (step S12).
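The dependence of B5 on A5 in step S12 can be sketched with complex arithmetic: with S5 fixed at amplitude 1 and phase 0°, B5 is simply S5 minus A5. The function names below are hypothetical and the sketch is only an illustration of the vector sum, not the patent's implementation.

```python
import cmath
import math


def sound_vector(amplitude, phase_deg):
    """Complex sound vector from an amplitude and a phase angle in degrees."""
    return cmath.rect(amplitude, math.radians(phase_deg))


def derive_b5(p_a, delta_a_deg):
    """Return (P_B, delta_B in degrees) such that A5 + B5 = S5."""
    s5 = sound_vector(1.0, 0.0)          # observed-sound reference vector
    a5 = sound_vector(p_a, delta_a_deg)  # assumed sound vector of source A
    b5 = s5 - a5                         # the vector sum fixes B5
    return abs(b5), math.degrees(cmath.phase(b5))
```

For instance, setting P_A = 1 and δ_A = 60° gives a B5 of amplitude 1 at a phase angle of -60°, so the three vectors form an equilateral triangle.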
In this example, as shown in FIG. 3(b), the amplitude P_A of the sound vector A5 set in step S11 is varied over the range 0.4 to 4.0 in increments of ΔP_A = 0.1, and the phase angle δ_A is varied over the range 0° to 360° in increments of Δδ_A = 1°. Step S11 through step S18, described below, are repeated so that the directions of sound source A and sound source B are estimated for every pair of amplitude P_A and phase angle δ_A. The sound vector A5 whose amplitude P_A and phase angle δ_A make the composite of the estimated sounds from sound source A and sound source B closest to the observed sound is then found, and the directions of sound source A and sound source B are specified from it.

Next, with reference to FIG. 4, the method of obtaining the sound vectors A1 to A4 and B1 to B4 of the sounds input from sound source A and sound source B to microphones M1 to M4, starting from the sound vectors S1 to S4 of the observed sound input to microphones M1 to M4, is described.
As shown in FIG. 4(a), the amplitude of the observed sound vector S1 of microphone M1 is 1, equal to the amplitude of the observed-sound reference vector S5, and the angle difference Δ₅₁ between the observed-sound reference vector S5 and the observed sound vector S1 corresponds to the time delay D₅₁ actually detected in step S10. The observed sound vector S1 is also the sum of the sound vector A1 of the sound input from sound source A to microphone M1 and the sound vector B1 of the sound input from sound source B to microphone M1, and the amplitudes P_A1 and P_B1 of sound vectors A1 and B1 are equal to the amplitudes P_A and P_B of sound vectors A5 and B5, respectively. Consequently, as shown in FIG. 4(a), there are two possible pairs of sound vectors A1 and B1 whose vector sum is the observed sound vector S1: [A11, B11] and [A12, B12].
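The two candidate pairs arise because a triangle with fixed side lengths |S_i|, P_A and P_B can be drawn on either side of S_i. A minimal sketch of that construction follows, using the law of cosines; the function name and the choice to return an empty list when no triangle exists are assumptions made for illustration.

```python
import cmath
import math


def candidate_pairs(s_i, p_a, p_b):
    """Return the (A_i, B_i) pairs with |A_i| = p_a, |B_i| = p_b and A_i + B_i = s_i.

    Assumes s_i and p_a are nonzero.
    """
    s_mag = abs(s_i)
    # Law of cosines: angle between A_i and S_i in the triangle (A_i, B_i, S_i).
    cos_alpha = (s_mag ** 2 + p_a ** 2 - p_b ** 2) / (2.0 * s_mag * p_a)
    if abs(cos_alpha) > 1.0:
        return []  # the three lengths cannot close a triangle
    alpha = math.acos(cos_alpha)
    pairs = []
    for sign in (1.0, -1.0):  # A_i rotated to either side of S_i
        a_i = cmath.rect(p_a, cmath.phase(s_i) + sign * alpha)
        pairs.append((a_i, s_i - a_i))
    return pairs
```

When P_B is derived from A5 as S5 minus A5, the triangle inequality is always satisfied for a unit-amplitude S_i, so two candidates are normally returned (they coincide only in the degenerate collinear case).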
Similarly, as shown in FIGS. 4(b) to 4(d), the sound vectors A2 to A4 and B2 to B4 of the sounds from sound source A and sound source B input to microphones M2 to M4 each take one of two pairs of sound vectors, [Ai1, Bi1] or [Ai2, Bi2] (i = 2 to 4). Therefore, once one sound vector A5 is set, the number of combinations of the sound vector pairs [Aik, Bik] for the microphones Mi, each of which receives observed sound with a time delay D_5i relative to the sound vectors A5 and B5 of the reference microphone M5, is 16, because i = 1 to 4 and k = 1, 2 (step S13).
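The count of 16 is 2 choices at each of 4 microphones. One way to enumerate the combinations, shown here as a sketch with a hypothetical function name, is a Cartesian product over the per-microphone candidate lists.

```python
from itertools import product


def enumerate_combinations(pairs_per_mic):
    """All ways of picking one candidate (A_i, B_i) pair per microphone.

    pairs_per_mic: one list of candidate pairs for each of M1..M4.
    """
    return list(product(*pairs_per_mic))
```

Each returned tuple holds one chosen pair per microphone, in microphone order.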

After the combinations of sound vectors have been obtained, the direction of sound source A and the direction of sound source B are estimated for each of the 16 combinations, as described below, to determine which combination is closest to the actual direction of sound source A. The following calculation is carried out for the 16 possible sets of sound vectors [Aik, Bik] for one assumed sound vector A5; to simplify the notation, Aik and Bik are written simply as Ai and Bi, and their phase angles as δ_Ai and δ_Bi.
As shown in FIGS. 5(a) to 5(d), the angle difference δ_A5i between the angle δ_Ai of the sound vector Ai (i = 1 to 4) of each microphone M1 to M4 and the phase angle δ_A of the sound vector A5 of the reference microphone M5 is the difference between the phase of the sound input from sound source A to microphone Mi and the phase of the sound input to the reference microphone M5. Therefore, the direction of sound source A (horizontal angle θA and elevation angle φA) can be estimated by substituting into equations (1) and (2) the angle differences δ_A5i as the time delays D₅₁ to D₅₄, and the angle differences δ_A13 and δ_A24 calculated from the δ_A5i as the time delays D₁₃ and D₂₄. The direction of sound source B (horizontal angle θB and elevation angle φB) can be estimated in the same way by obtaining δ_B5i, δ_B13 and δ_B24 from the angles δ_Bi of the sound vectors Bi (step S14).
After the directions of sound sources A and B have been estimated, a plane wave is assumed to arrive from the one estimated sound source A, and the time delay D_5i(A) of each microphone Mi relative to the reference microphone M5 is calculated for the plane wave input to microphones M1 to M5 (step S15). The time delay D_5i(A) corresponds to the phase difference between a new sound vector a5, which is the sound vector of the sound input to microphone M5 from the direction of sound source A estimated using the sound vector A5 assumed in step S12, and the new sound vector ai (i = 1 to 4) of each microphone M1 to M4. The amplitude of each sound vector ai is equal to the amplitude P_A of sound vector A5.
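Step S15 can be illustrated with generic plane-wave geometry. The sketch below does not use the patent's equations (1) and (2), which are not reproduced in this text. It assumes Cartesian microphone coordinates in meters, a unit vector pointing from the array toward the source with θ measured in the horizontal plane and φ upward from it, a speed of sound of 343 m/s, and the convention that a later arrival appears as a phase lag. All names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def unit_vector(theta_deg, phi_deg):
    """Unit vector toward the source for horizontal angle theta and elevation phi."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    return (
        math.cos(phi) * math.cos(theta),
        math.cos(phi) * math.sin(theta),
        math.sin(phi),
    )


def delay_relative_to_reference(mic_pos, ref_pos, direction):
    """Arrival time at mic_pos minus arrival time at ref_pos, in seconds."""
    # A microphone lying further along `direction` is closer to the source
    # and therefore receives the plane wave earlier.
    path = sum((r - m) * u for r, m, u in zip(ref_pos, mic_pos, direction))
    return path / SPEED_OF_SOUND


def predicted_vector(a5, delay_s, freq_hz):
    """New sound vector a_i: amplitude of a5, phase lagged by 2*pi*f*delay."""
    return a5 * cmath.exp(-2j * math.pi * freq_hz * delay_s)
```

Whatever sign convention is chosen here has to match the one used when the measured time delays were converted to angle differences, otherwise the comparison in the next step is meaningless.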
Here, taking the new sound vector a5 to be equal to the sound vector A5 assumed in step S12, the new sound vectors a1 to a4 are compared with the sound vectors A1 to A4 obtained in step S13. If the new sound vectors a1 to a4 nearly coincide with the sound vectors A1 to A4 obtained in step S13, the assumed sound vector A5 can be regarded as the sound vector from the actual sound source A.
Therefore, as shown in FIGS. 6(a) to 6(d), the difference vector Δ_Ai between the sound vector ai of the sound from the estimated sound source A and the vector Ai (i = 1 to 4) obtained in step S13 on the basis of the sound vector A5 is found, and the sum of the magnitudes of the difference vectors Δ_Ai is then calculated and stored (step S16). The smaller the difference between the new sound vectors a1 to a4 and the sound vectors A1 to A4 obtained in step S13, the closer the set sound vector A5 is to the sound vector from the actual sound source A. Using the sum of the magnitudes of the difference vectors Δ_Ai as the criterion for whether the set sound vector A5 is close to the sound vector of the actual sound source A therefore allows the direction of the actual sound source A to be estimated accurately.
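The stored quantity in step S16 is the sum over the four microphones of |ai - Ai|. A minimal sketch with complex numbers standing in for the sound vectors (the function name is hypothetical):

```python
def mismatch(predicted, assumed):
    """Sum of the magnitudes of the difference vectors between a_i and A_i."""
    return sum(abs(a_new - a_old) for a_new, a_old in zip(predicted, assumed))
```

A value of zero means the plane wave from the estimated direction reproduces the assumed vectors exactly.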

Next, the process returns to step S14, and the operation of calculating and storing the sum of the magnitudes of the difference vectors Δ_Ai is carried out for all 16 combinations obtained when one sound vector A5 is set (step S17).
When the calculation of the sum of the magnitudes of the difference vectors Δ_Ai for one sound vector A5 is complete, the process returns to step S11, the amplitude P_A or the phase angle δ_A of the sound vector A5 is changed, and the operations from step S12 to step S17 are repeated. The sum of the magnitudes of the difference vectors Δ_Ai is thus calculated and stored for every case in which the amplitude P_A of the sound vector A5 is varied over the range 0.4 to 4.0 in increments of ΔP_A = 0.1 and the phase angle δ_A is varied over the range 0° to 360° in increments of Δδ_A = 1° (step S18).
The sums of the magnitudes of the difference vectors Δ_Ai are then compared, the sound vectors A5 and B5 for which this sum takes its minimum value are identified, and the directions of sound source A and sound source B obtained from the identified sound vectors A5 and B5 are taken as the directions of sound source A and sound source B obtained by separating the observed sound (step S19).
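Steps S11 to S19 amount to an exhaustive search for the minimum of the stored sums over the amplitude grid, the phase grid and the 16 combinations. The skeleton below is a sketch under assumptions: `score` is a hypothetical callable that performs steps S12 to S16 for one candidate and returns the sum of difference magnitudes, and each combination is encoded as a tuple of four 0/1 choices.

```python
from itertools import product


def search_best(score):
    """Return (value, P_A, delta_A in degrees, combo) minimizing score."""
    amplitudes = [k / 10 for k in range(4, 41)]  # 0.4, 0.5, ..., 4.0
    phases_deg = range(0, 361)                   # 0, 1, ..., 360
    combos = list(product((0, 1), repeat=4))     # candidate pair used at M1..M4
    best = None
    for p_a in amplitudes:
        for delta_a in phases_deg:
            for combo in combos:
                value = score(p_a, delta_a, combo)
                if best is None or value < best[0]:
                    best = (value, p_a, delta_a, combo)
    return best
```

Integer loop indices are used for the amplitude grid so that repeated addition of 0.1 does not accumulate rounding error. A `score` implementation could return `float("inf")` for candidates where no valid vector pair exists, which keeps them out of the minimum.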
That is, the direction of sound source A (horizontal angle θA and elevation angle φA) and the direction of sound source B (horizontal angle θB and elevation angle φB) obtained in steps S14 and S15 when the sound vectors A5 and B5 are the identified sound vectors are the directions of sound source A and sound source B obtained by separating the observed sound. An image containing the estimated directions of sound source A and sound source B is then captured by the camera 11, and the directions of sound sources A and B are each displayed on the image shown on the display 43 of the sound source position estimation device 40, whereby the position of sound source A and the position of sound source B are determined (step S20).

Thus, according to this best mode, the sound vector representing the observed sound at each microphone is first taken to be the sum of the sound vectors of the sounds from sound source A and sound source B, where the magnitude of each sound vector represents the amplitude of the sound from that source relative to the amplitude of the observed sound input to the reference microphone, and its phase angle represents the phase difference from the observed sound. The sound vectors A5 and B5 of the sounds from the sources input to the reference microphone M5 are assumed, and from the sound vectors A5 and B5 and the phase differences of the observed sound input to microphones M1 to M4, the sound vectors Ai and Bi are obtained for each microphone to calculate the sets of sound vectors (Ai, Bi). The directions of sound sources A and B are estimated from the calculated sets of sound vectors.
Then, assuming that sound arrives from the direction of the estimated sound source A, the time delay D_5i(A) of each microphone Mi relative to the reference microphone M5 is calculated for the arriving sound, a new set of sound vectors ai is obtained from these time delays D_5i(A), and the sum of the magnitudes of the difference vectors Δ_Ai between the new set of sound vectors ai and the previously set sound vectors Ai is found. Because this sum is a criterion for whether the sound vector A5 is close to the sound vector of the actual sound source A, the set of sound vectors Ai that minimizes the sum is found, and the directions of sound sources A and B that give this set are taken as their estimated directions, thereby separating the directions of sound sources A and B. In this way, even when there are a plurality of sound sources or the influence of reflected sound is large, the directions of the plurality of sound sources, or the direction of the direct sound and the direction of the reflected sound, can be accurately separated and specified.

In the best mode described above, the directions of sound source A and sound source B are estimated by varying the magnitude and phase angle of the sound vector A5. However, as shown in FIG. 7, the directions of sound source A and sound source B may instead be estimated by varying the phase angle of the sound vector A5 and the phase angle of the sound vector B5. In this case, the magnitudes of the sound vectors A5 and B5 are uniquely determined because the sum of the sound vector A5 and the sound vector B5 is the observed-sound reference vector S5.
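That the magnitudes follow from the two phase angles can be checked with a short derivation. This is a worked illustration, not text from the patent; it assumes S5 has amplitude 1 and phase angle 0, as defined in step S11, and writes the sound vectors in polar form.

```latex
% Constraint: A5 + B5 = S5 = 1
P_A e^{i\delta_A} + P_B e^{i\delta_B} = 1
% Real and imaginary parts:
P_A\cos\delta_A + P_B\cos\delta_B = 1, \qquad
P_A\sin\delta_A + P_B\sin\delta_B = 0
% Solving the 2x2 linear system:
P_A = \frac{\sin\delta_B}{\sin(\delta_B - \delta_A)}, \qquad
P_B = -\frac{\sin\delta_A}{\sin(\delta_B - \delta_A)}
```

The solution exists only when sin(δ_B − δ_A) ≠ 0, and a phase pair is usable only if both resulting magnitudes are positive. For example, δ_A = 60° and δ_B = −60° give P_A = P_B = 1.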
The above example describes the case of two sound sources, but the present invention is not limited to this. Even when there are three or more sound sources, the direction of arrival of the sound from each of the sound sources A, B, C, ... can be estimated by assuming sound vectors A5, B5, C5, ... with respect to the observed-sound reference vector S5, as shown in FIG. 8. For example, when there are three sound sources, the sets of sound vectors (Ai, Bi, Ci) can be obtained by setting four parameters: either the magnitudes and phase angles of the sound vectors A5 and B5, or the magnitude and phase angle of the sound vector A5 together with the phase angles of the sound vectors B5 and C5. Thereafter, as in the case of two sound sources, the direction of sound source A is estimated from the sound vector A5, a new set of sound vectors (ai, bi, ci) is obtained and compared with the set (Ai, Bi, Ci), and the most probable set (Ai, Bi, Ci), that is, the set closest to the sound vectors of the actual sound sources A, B and C, is found in order to estimate the directions of sound sources A, B and C.
As the criterion for specifying the directions of sound source A and sound source B, the product of the difference vectors Δ_Ai may be used in place of the sum of the absolute values of the difference vectors Δ_Ai described above. Alternatively, the square root of the sum of the squares of the difference vectors Δ_Ai may be used.
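The three criteria can be compared side by side. The sketch below takes the magnitudes |Δ_Ai| as input and interprets "the product of the difference vectors" as the product of those magnitudes, which is an assumption; the function names are hypothetical.

```python
import math


def criterion_sum(diff_magnitudes):
    """Sum of the absolute values of the difference vectors."""
    return sum(diff_magnitudes)


def criterion_product(diff_magnitudes):
    """Product of the difference-vector magnitudes (assumed interpretation)."""
    return math.prod(diff_magnitudes)


def criterion_root_sum_squares(diff_magnitudes):
    """Square root of the sum of the squared magnitudes."""
    return math.sqrt(sum(d * d for d in diff_magnitudes))
```

Note that the product becomes zero as soon as any single microphone matches exactly, so it rewards one perfect match more strongly than the other two criteria do.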
Even when there is only one sound source, if the influence of reflected sound is strong, taking sound source A to be the source that produces the direct sound and sound source B to be a virtual sound source (secondary sound source) that produces the reflection of the sound from sound source A makes it possible to identify where the reflection occurs and to grasp the influence of the reflected sound accurately.

Using the sound source search system of the present invention, 250 Hz sine waves output from left and right loudspeakers placed a predetermined distance apart in an anechoic chamber were recorded, and their sound source positions were separated and identified. Before the separation processing, as shown in FIG. 9(a), the estimated position of the sound source was approximately midway between the left and right loudspeakers, whereas after the separation processing, as shown in FIG. 9(b), the sound source position was separated into two locations, the left and right loudspeakers.
In addition, a floor (reflector) was placed between the left loudspeaker and the measurement point, and a 1000 Hz sine wave output from the left loudspeaker alone was recorded. Before the separation processing, as shown in FIG. 10(a), the estimated position of the sound source was to the lower right of the left loudspeaker, whereas after the separation processing, as shown in FIG. 10(b), the sound source position was separated into two locations, approximately the center of the left loudspeaker and the floor surface, confirming that the reflection position can also be estimated.

As described above, according to the present invention, even when there are a plurality of sound sources or the influence of reflected sound is large, the directions of the plurality of sound sources, or the direction of arrival of the sound and the direction of arrival of the reflected sound, can be accurately separated and specified. Consequently, when soundproofing measures such as sound insulation walls are implemented, their sound insulation effect can be simulated accurately and effective noise countermeasures can be taken.

A diagram showing an outline of the sound source search system according to the best mode of the present invention.
A flowchart showing the sound source separation method.
A diagram showing the method of setting the complex vectors of the sound reaching the reference microphone from each sound source.
A diagram showing the method of obtaining the sets of sound vectors for each microphone.
A diagram showing the method of obtaining, from the sound vectors of each microphone, the difference between the phase of the sound input to each microphone and the phase of the sound input to the reference microphone.
A diagram showing the method of obtaining the sound vectors of the sound arriving at each microphone from the sound source estimated from the assumed set of sound vectors.
A diagram showing another method of setting the sound vectors according to the present invention.
A diagram showing the method of setting the sound vectors when there are three sound sources.
A diagram showing an example of the separation processing of two sound sources according to the present invention.
A diagram showing an example of the separation processing of reflected sound according to the present invention.
A diagram showing a conventional method of estimating the direction of arrival of sound using a microphone array.
A diagram showing the arrangement of microphones in a sound source search method using microphone pairs.

Explanation of symbols

M1 to M5: microphones, 11: camera, 12: amplifier,
13: A/D converter, 14: video input/output unit, 20: microphone frame,
30: base, 31: support member, 32: turntable, 40: sound source position estimation device,
41: keyboard, 42: storage/calculation unit, 42a: data storage means,
42b: sound source position estimation means, 42c: image composition means, 42d: sound source separation means,
42m: parameter file, 43: display.

Claims

A method for separating a plurality of sound sources, in which observed sound that is a composite of sounds arriving from a plurality of sound sources is picked up by a plurality of microphones and the directions of the sound sources are separated and specified using phase difference data of the observed sound between the microphones, the method comprising:
a first step of detecting the phase differences between the microphones of the observed sound observed by the plurality of microphones;
a second step of, when the complex vector corresponding to the observed sound input to a reference microphone is represented by a reference sound vector whose magnitude indicates the magnitude of the observed sound and whose phase angle is 0°, taking the sound vector corresponding to the sound input from each sound source to the reference microphone to be a complex vector whose magnitude represents the magnitude of the sound arriving from that sound source and whose phase angle represents the phase difference from the observed sound, and setting the sound vectors of the sound sources so that their sum equals the reference sound vector;
a third step of calculating, for each microphone other than the reference microphone, the sound vector of each sound source input to that microphone so that it is equal in magnitude to the corresponding set sound vector and so that the sum of these sound vectors equals the sound vector of the observed sound input to that microphone, thereby obtaining sets of sound vectors from the sound sources for all microphones;
a fourth step of identifying, from the sets of sound vectors, the sound vectors of one specific sound source and obtaining the phase differences between the microphones for the identified sound vectors;
a fifth step of estimating the direction of the specific sound source from the phase differences obtained in the fourth step;
a sixth step of calculating the phase differences between the microphones of sound arriving from the estimated specific sound source;
a seventh step of calculating, for each microphone, the sound vector of the estimated specific sound source from the phase differences obtained in the sixth step;
an eighth step of obtaining, for each microphone, the difference vector between the sound vector calculated in the third step and the sound vector calculated in the seventh step; and
a ninth step of changing the magnitude and phase angle of the sound vector from each sound source in the second step, repeating the second to eighth steps for the changed sound vectors to obtain the difference vectors for each of the sound vectors set in the second step, identifying the most probable set of sound vectors of the sound sources on the basis of the magnitudes of the obtained difference vectors, and estimating the direction of each sound source from the difference in phase angle between each sound vector of the identified set and the reference sound vector.

The method for separating a plurality of sound sources according to claim 1, wherein the detection of the phase differences between the microphones of the observed sound in the first step and the estimation of the sound source direction in the fifth step are performed using a sound source position estimation device comprising two microphone pairs arranged at predetermined intervals on two mutually intersecting straight lines, and sound source position estimation means for estimating the direction of a sound source using the phase differences between the microphones constituting the two microphone pairs.

The method for separating a plurality of sound sources according to claim 1, wherein the detection of the phase differences between the microphones of the observed sound in the first step and the estimation of the sound source direction in the fifth step are performed using a sound source position estimation device comprising: a microphone group consisting of two microphone pairs arranged at predetermined intervals on two mutually intersecting straight lines and a fifth microphone that is not on the plane formed by the two microphone pairs; and sound source position estimation means for estimating the direction of a sound source using the phase differences between the microphones constituting the two microphone pairs and the phase differences between the microphones constituting the four microphone pairs formed by the fifth microphone and each of the four microphones constituting the two microphone pairs.

The method for separating a plurality of sound sources according to claim 1, wherein an image of the vicinity of the estimated sound source direction is captured, and the position of the sound source is specified from the estimated sound source direction and the captured image.