JP2015055843A

JP2015055843A - Number-of-signal-source estimation device, number-of-signal-source estimation method, and program

Info

Publication number: JP2015055843A
Application number: JP2013190661A
Authority: JP
Inventors: Nobutaka Ito; Tomohiro Nakatani; Akiko Araki
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-09-13
Filing date: 2013-09-13
Publication date: 2015-03-23
Anticipated expiration: 2033-09-13
Also published as: JP6082679B2

Abstract

PROBLEM TO BE SOLVED: To accurately estimate the number of signal sources even when the influence of reflection or reverberation is large. SOLUTION: A frequency domain conversion unit 10 converts an observation signal into the time-frequency domain, and the signal in each frequency bin is separated into a plurality of separated signals. An activity calculation unit 12 determines activities from the separated signals. A clustering unit 14 clusters the activities into a plurality of activity clusters. A signal source number determination unit 16 determines an estimated value of the number of signal sources on the basis of the number of activities included in each activity cluster.

Description

The present invention relates to a technique for estimating the number of signal sources using observation signals.

Suppose that, in a reverberant environment, signals from N signal sources (N is an integer of 1 or more) are mixed and observed by M sensors. The time-frequency transform x_j(τ, f) of the signal observed by the j-th sensor (1 ≦ j ≦ M) is then modeled by Equation (1).

Here, τ denotes the time frame number, f denotes the frequency bin number, and s_k(τ, f) denotes the time-frequency transform of the k-th signal source. h_jk(f) denotes the transfer function from the k-th signal source to the j-th sensor; it models the delay and attenuation of the signal due to the distance between the signal source and the sensor, as well as the effects of reflection and reverberation from walls and the like.
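Given the definitions above, Equation (1) presumably has the usual form of a frequency-wise mixture of the source signals. The following is a reconstruction from those definitions, not a verbatim copy of the published equation:

```latex
x_j(\tau, f) = \sum_{k=1}^{N} h_{jk}(f)\, s_k(\tau, f), \qquad j = 1, \dots, M
```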

The signal source number estimation problem addressed by the present invention is the problem of estimating the number N of signal sources using the observation signals x_j(τ, f) (j = 1, …, M) from the M sensors. In particular, the clustering-based framework for signal source number estimation uses a feature quantity f_ζ (ζ = 1, …, L) that takes a value specific to each signal source. Because the histogram of the feature quantity has a peak corresponding to each signal source, one probability model is fitted to each peak by clustering and model selection, and the number of signal sources is estimated by counting the probability models.

As a conventional signal source number estimation technique, a sound source number estimation technique based on clustering of sound arrival directions is known (for example, Patent Document 1). FIG. 1 shows an example of the clustering-based sound source number estimation apparatus described in Patent Document 1. In terms of the general clustering-based framework described above, this corresponds to using the arrival direction as the feature quantity and, as the model selection mechanism, placing a Dirichlet distribution on the mixture weights as their prior distribution.

The sound source number estimation apparatus 400 shown in FIG. 1 comprises J (J ≧ 2) sound collection means 20_j (for example, microphones) (j = 1, …, J), a frequency domain conversion unit 30, a power estimation unit 32, an arrival direction estimation unit 34, a direction information distribution estimation device 200, and a sound source number estimation unit 36. The observation signals observed by the J sound collection means 20_j are converted into the frequency domain by the frequency domain conversion unit 30. The arrival direction estimation unit 34 calculates the arrival direction using the observation signals converted into the frequency domain. The power estimation unit 32 calculates the power, which is used as a weight applied to the likelihood function during clustering. This weight reduces the influence on the clustering of data that has low power and is therefore unreliable. The direction information distribution estimation device 200 uses the power and the arrival direction to cluster the arrival directions into clusters corresponding to the individual sound sources. The sound source number estimation unit 36 calculates the estimated number of sound sources by counting the clusters.

JP 2010-145836 A

The estimate of the sound arrival direction calculated from the observation signals is easily affected by reflection and reverberation from walls and the like. When these effects are small, as in an anechoic room, and the sound propagation can be approximated as plane-wave propagation, the arrival direction can be estimated accurately; when reflection and reverberation from walls are large, accurate estimation is difficult. Consequently, the sound source number estimation apparatus of Patent Document 1 has difficulty estimating the number of sound sources accurately in such cases.

An object of the present invention is to accurately estimate the number of signal sources even when the influence of reflection or reverberation from walls and the like is large.

To solve the above problem, the signal source number estimation device of the present invention includes a clustering unit and a signal source number determination unit. The clustering unit clusters, into a plurality of activity clusters, the activities obtained from each of the separated signals produced when the observation signal is separated into a plurality of separated signals in each frequency bin. The signal source number determination unit obtains, as the estimated number of signal sources, the number of activity clusters that are determined to correspond to a signal source.

The signal source number estimation technique of the present invention estimates the number of signal sources within the clustering-based framework, using the activity as the feature quantity. The activity is a feature quantity that exhibits a pattern specific to each signal regardless of whether reflection or reverberation from walls and the like is present. Therefore, according to the signal source number estimation technique of the present invention, the number of signal sources can be estimated accurately even when the influence of reflection or reverberation from walls and the like is large.

FIG. 1 is a diagram illustrating the functional configuration of a conventional sound source number estimation apparatus.
FIG. 2 is a diagram for explaining the principle of the invention.
FIG. 3 is a diagram illustrating the functional configuration of the signal source number estimation device.
FIG. 4 is a diagram illustrating the processing flow of the signal source number estimation method.
FIG. 5 is a diagram illustrating the functional configuration of the clustering unit.
FIG. 6 is a diagram illustrating the functional configuration of the signal source number determination unit.
FIG. 7 is a diagram for explaining the experimental environment.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and duplicate description is omitted.

[Principle of the invention]
Prior to the description of the embodiments, the principle of the present invention is described.

In the present invention, within the conventional framework of signal source number estimation by clustering, the activity is used as the feature quantity, and the number of signal sources is estimated by clustering the activities. The activity of a signal is time-series information corresponding to the temporal change in the amplitude of a separated signal; examples include the time series of the amplitude of the separated signal in each time interval and the time series of the contribution of the separated signal in each time interval. The activity of a signal has the property of exhibiting a pattern specific to each signal, regardless of whether reflection or reverberation from walls and the like is present. Clustering the activities on the basis of this property makes it possible to estimate the number of signal sources accurately even when the influence of reflection or reverberation is large.

To obtain the activities, a conventionally known sound source separation method is applied. Conventional sound source separation methods, however, presuppose that the number of sound sources is known, whereas in the present invention the number of signal sources is unknown. Therefore, the separated signals are obtained with a conventional sound source separation method while assuming a number of sources larger than the number of signal sources expected to be present.

The separated signals obtained in this way include signals, such as noise, that do not correspond to any signal source. It is therefore necessary to avoid overestimating the number of signal sources by treating such separated signals as if they corresponded to signal sources. Possible countermeasures include removing the separated signals that do not correspond to a signal source, or reducing their influence on the estimation of the number of signal sources.

The present invention therefore eliminates the influence of separated signals that do not correspond to a signal source by exploiting the property that their activities often take smaller values than the activities of separated signals that do correspond to a signal source. That is, when the activities of the separated signals are clustered, either small activities are not used for the clustering, or clusters consisting of small activities are excluded.

Separated signals originating from the same signal source have mutually similar activities and are therefore expected to be assigned to the same cluster. The activities of separated signals that do not correspond to a signal source have small values, so they are either not used for the clustering or excluded from the clustering result. Consequently, the number of clusters finally obtained by clustering the activities is expected to match the number of signal sources with high accuracy. In the embodiments described later, a cluster obtained by clustering activities is called an activity cluster.

FIG. 2 conceptually shows the principle of the invention described above. "Speech activity sequences" on the left side of the figure are the activities obtained from the separated signals produced by separating the observation signal in each frequency bin. "Speech activity clusters" on the right side of the figure are the activity clusters obtained by clustering the activities. As described in detail later, the activity time series are clustered, and an activity cluster whose representative activity value ρ_i is large (labeled "Active" in the figure) is determined to correspond to a signal source. An activity cluster whose representative activity value ρ_i is small (labeled "Garbage" in the figure) is determined not to correspond to a signal source. The number of activity clusters determined to correspond to a signal source is then taken as the number of signal sources.
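The final counting step of this principle can be sketched as follows. This is only an illustration: the function name `count_active_clusters`, the fixed `threshold`, and the sample values are assumptions introduced here, and the actual decision rule for "large" versus "small" ρ_i is the one described later in the embodiments.

```python
def count_active_clusters(rho, threshold):
    """Count activity clusters whose representative value exceeds a threshold.

    rho       -- list of representative activity values, one per activity cluster
    threshold -- hypothetical cut-off separating "Active" from "Garbage" clusters
    """
    return sum(1 for r in rho if r > threshold)


# Hypothetical example: two clusters with large values, two garbage clusters.
estimated_sources = count_active_clusters([0.40, 0.35, 0.02, 0.01], threshold=0.1)
```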

[Points of the invention]
The first point of the present invention is that, within the conventional framework of signal source number estimation by clustering, the activity is used as the feature quantity and the activities are clustered.

The second point is that, because the number of signal sources is not known in advance, the activities are obtained from the observation signals by performing signal source separation while assuming a larger number of signal sources than are actually present.

The third point is the introduction of means for removing activities that do not correspond to a signal, or for reducing their influence. Assuming more signal sources than are actually present produces activities that do not correspond to any signal; this means prevents the number of signal sources from being overestimated as a result of mistaking such activities for activities that do correspond to a signal.

[First embodiment]
An example of the functional configuration of the signal source number estimation device 100 according to the first embodiment is described with reference to FIG. 3. The signal source number estimation device 100 includes a frequency domain conversion unit 10, an activity calculation unit 12, a clustering unit 14, and a signal source number determination unit 16. The signal source number estimation device 100 is a special device configured, for example, by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (random access memory, RAM), and the like. The signal source number estimation device 100 executes each process under the control of, for example, the central processing unit. Data input to the signal source number estimation device 100 and data obtained in each process are stored in, for example, the main storage device, and the stored data is read out as necessary and used in other processes.

An example of the processing flow of the signal source number estimation method executed by the signal source number estimation device 100 is described with reference to FIG. 4, in the order in which the steps are actually performed.

In step S1, the time-domain mixed signal ~x(t) observed by the M microphones 20_1, …, 20_M is input to the frequency domain conversion unit 10. The time-domain mixed signal ~x(t) is defined by Equation (2).

Here, ~x_m(t) denotes the time-domain mixed signal observed by the m-th microphone 20_m, t denotes the time index, and the superscript T denotes the transpose of a vector.

The frequency domain conversion unit 10 applies a time-frequency transform, such as the short-time Fourier transform, to the input time-domain mixed signal ~x(t), and generates and outputs the time-frequency-domain mixed signal x(τ, f). The output mixed signal x(τ, f) is input to the activity calculation unit 12. The time-frequency-domain mixed signal x(τ, f) is defined by Equation (3).

Here, x_m(τ, f) denotes the time-frequency transform of the time-domain mixed signal ~x_m(t) observed by the m-th microphone 20_m, τ denotes the time frame number, and f denotes the frequency bin number.
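As a rough illustration of step S1 for a single microphone channel, a short-time Fourier transform can be sketched with the standard library alone. The frame length, hop size, and Hann window below are assumptions chosen for the example; the source does not specify them, and a practical implementation would use an FFT library rather than a direct DFT.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Return a list of frames; each frame is a list of complex bins x(tau, f).

    Assumed settings (not from the source): periodic Hann window,
    frame_len samples per frame, hop samples between frame starts.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the windowed segment, one value per frequency bin f.
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                for n in range(frame_len))
            for f in range(n_bins)
        ])
    return frames


# Hypothetical test tone whose energy falls in frequency bin 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
X = stft(tone)
```

Applying the same function to each of the M channels and stacking the results per (τ, f) gives the vector x(τ, f).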

In step S2, the activity calculation unit 12 calculates the activities y(ζ) from the input time-frequency-domain mixed signal x(τ, f) and outputs them. The output activities y(ζ) are input to the clustering unit 14.

In the first embodiment, the activity of a separated signal is the time series of the contribution (posterior probability) of the separated signal in each time interval, that is, the posterior probability η_kτf that the k-th cluster (1 ≦ k ≦ K) contributes to the time-frequency-domain mixed signal x(τ, f). The posterior probability η_kτf can be obtained, for example, by the method disclosed in JP 2009-053349 A (Reference 1).

The total number K of clusters is set to be equal to or greater than the upper limit of the expected number of signal sources. For each cluster k in each frequency bin f, a time series of posterior probabilities [η_k1f … η_kTf]^T can be defined. Here, the T appearing in the subscript is the total number of frames, and the superscript T denotes the transpose of a vector. These time series are assigned serial numbers defined by ζ = k + (f − 1)K, and the activity y(ζ) is defined by Equation (4).

The total number of activities is L := K × F, where F is the total number of frequency bins.
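The serial numbering ζ = k + (f − 1)K and its inverse can be checked with a small sketch. The function names are introduced here for illustration only; indices are 1-based as in the text.

```python
def activity_index(k, f, K):
    """Serial number zeta for cluster k (1..K) in frequency bin f (1..F)."""
    return k + (f - 1) * K


def cluster_and_bin(zeta, K):
    """Inverse mapping: recover (k, f) from the serial number zeta."""
    f = (zeta - 1) // K + 1
    k = (zeta - 1) % K + 1
    return k, f


# With K = 3 clusters and F = 4 bins there are L = K * F = 12 activities.
K, F = 3, 4
all_indices = [activity_index(k, f, K) for f in range(1, F + 1) for k in range(1, K + 1)]
```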

The posterior probability may be obtained for every frame, as in the example above, or once every several frames.

In steps S3 to S9, the clustering unit 14 performs clustering by fitting the input activities y(ζ) to a probability model representing the distribution of the activities and calculating the model parameters of the probability model using a predetermined evaluation function for evaluating the probability model.

Because the rises and falls of the signal amplitude exhibit a pattern specific to each signal source, the activities of separated signals corresponding to the same signal source are similar to one another. The present invention uses this property to cluster the activities. By gathering the activities of separated signals originating from the same signal source into one cluster, the number of signal sources can be estimated on the basis of the number of clusters of activities (hereinafter also called activity clusters).

The probability model representing the distribution of the activities is modeled, for example, as follows. The distribution of the activities for the i-th cluster is modeled, for example, by a Watson distribution as shown in Equation (5).

Here, Γ is the gamma function. z(ζ) is the activity y(ζ) normalized, as preprocessing, so that its norm is 1, as shown in Equation (6).

Here, ||·|| denotes the Euclidean norm. w_i represents the center of the distribution of the normalized activities z(ζ) for cluster i; it is called the mean direction (mean orientation) and satisfies the constraint ||w_i|| = 1. κ_i represents the degree of concentration of the distribution of the normalized activities z(ζ) for cluster i and is called the density parameter (concentration parameter). M(a, b, x) is the Kummer function. For details on the Kummer function, see, for example, S. Sra and D. Karp, "The multivariate Watson distribution: Maximum-likelihood estimation and other aspects", Journal of Multivariate Analysis, vol. 114, pp. 256-269, Feb. 2013 (Reference 2).
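A minimal sketch of the Watson log-density, assuming the standard form given in Reference 2, p(z) = Γ(d/2) / (2 π^(d/2) M(1/2, d/2, κ)) · exp(κ (wᵀz)²) for a unit vector z of dimension d (here d would be the number of frames T). The Kummer function is evaluated by its power series, which is only adequate for moderate κ; the function names are introduced here for illustration.

```python
import math


def kummer_m(a, b, x, max_terms=500, tol=1e-15):
    """Kummer function M(a, b, x) via its power series (moderate x only)."""
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * x / (n + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def watson_log_pdf(z, w, kappa):
    """Log-density of a Watson distribution at unit vector z.

    w is the unit-norm mean direction and kappa the concentration.
    """
    d = len(z)
    dot = sum(zi * wi for zi, wi in zip(z, w))
    log_norm = (math.lgamma(d / 2) - math.log(2.0)
                - (d / 2) * math.log(math.pi)
                - math.log(kummer_m(0.5, d / 2, kappa)))
    return log_norm + kappa * dot * dot
```

Because the density depends on z only through (wᵀz)², z and −z receive the same value, and for κ > 0 vectors aligned with w are favored.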

From the above, the likelihood function of the normalized activity z(ζ) is given by the mixture model expressed by Equation (7).

Here, α_i is the mixture weight and satisfies the constraint Σ_{i=1}^{K} α_i = 1. Θ is the parameter set shown in Equation (8).

Here, a notation such as {α_i}_i denotes the set {α_i | ∀i}. Note that i is the index of an activity cluster and should be distinguished from k, which is the index of a cluster of time-frequency components.

In the first embodiment, for simplicity, the number of activity clusters is assumed to be K, the same as the number of clusters of time-frequency components, but the two need not be equal. The number of activity clusters can be set freely as long as it is equal to or greater than the upper limit of the number of signal sources that may be present.

As already mentioned, the activity exhibits a pattern specific to each signal source, so the distribution of the activities has multiple peaks, one for each signal source. To estimate the number of signal sources from the number of activity clusters, it is important to fit exactly one probability distribution model to each peak of the activity distribution. The mixture weights are therefore given the Dirichlet prior distribution of Equation (9).

Here, Γ is the gamma function and φ is a hyperparameter. Setting φ in the range 0 < φ < 1 makes it likely that only a small number of mixture weights take large values. A uniform prior distribution is assumed for the parameters other than the mixture weights. The prior distribution of the parameter set Θ is therefore expressed by Equation (10).

Here, the symbol ∝ denotes proportionality.
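The sparsity-inducing effect of choosing 0 < φ < 1 can be illustrated numerically. The sketch below assumes that Equation (9) is the symmetric Dirichlet density Γ(Kφ)/Γ(φ)^K · Π_i α_i^(φ−1); the function name and the example weights are introduced here for illustration.

```python
import math


def dirichlet_log_prior(alphas, phi):
    """Log-density of a symmetric Dirichlet prior with parameter phi.

    alphas must be strictly positive and sum to 1.
    """
    K = len(alphas)
    return (math.lgamma(K * phi) - K * math.lgamma(phi)
            + (phi - 1.0) * sum(math.log(a) for a in alphas))


# With phi = 0.5 a sparse weight vector scores higher than a uniform one.
sparse = dirichlet_log_prior([0.98, 0.01, 0.01], phi=0.5)
uniform = dirichlet_log_prior([1 / 3, 1 / 3, 1 / 3], phi=0.5)
```

With φ = 1 the prior is flat over the simplex, so the preference for sparse weights disappears.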

The clustering unit 14 fits the normalized activities z(ζ) to the probability model described above, obtains the posterior probabilities and the optimal parameter set Θ using a predetermined evaluation function for evaluating the probability model, and clusters the activities on the basis of the result.

The processing of the clustering unit 14 is described more specifically below.

As shown in FIG. 5, the clustering unit 14 comprises, for example, a normalization unit 140, a posterior probability calculation unit 142, a parameter updating unit 144, a parameter holding unit 146, and a cluster determination unit 148. The parameter updating unit 144 further comprises a mixture weight updating unit 1440, a correlation matrix updating unit 1442, a mean direction updating unit 1444, a density parameter updating unit 1446, and a weight updating unit 1448.

In step S3 shown in FIG. 4, the clustering unit 14 stores in the parameter holding unit 146 the initial values of the parameter set Θ, which consists of the mixture weights α_i, the mean directions w_i, and the density parameters κ_i. The initial values may be set, for example, as shown in Equation (11).

The mean directions w_i may be chosen at random from {z(ζ)}_ζ.

In step S4, as preprocessing prior to the clustering, the normalization unit 140 normalizes the input activity y(ζ) so that its Euclidean norm is 1, as shown in Equation (12), and outputs the result as the normalized activity z(ζ).

In step S5, the posterior probability calculation unit 142 calculates the posterior probability γ_i(ζ) for the normalized activity z(ζ) by Equation (13), using the normalized activity z(ζ) output by the normalization unit 140 and the parameter set Θ′ := {{α′_i}_i, {w′_i}_i, {κ′_i}_i} stored in the parameter holding unit 146, and outputs it. Here, the posterior probability γ_i(ζ) := p(i | z(ζ), Θ′) is the posterior probability that the normalized activity z(ζ) belongs to the i-th cluster.
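For a mixture model, the posterior probability of step S5 follows from Bayes' rule: γ_i(ζ) is proportional to α′_i times the likelihood of z(ζ) under cluster i. The sketch below assumes the per-cluster log-likelihoods have already been computed (for instance from the Watson model) and works in the log domain for numerical stability; the function name and inputs are illustrative.

```python
import math


def responsibilities(log_alpha, log_lik):
    """Posterior cluster probabilities for one normalized activity.

    log_alpha -- log mixture weights, one per cluster
    log_lik   -- log p(z | i, w_i, kappa_i), one per cluster
    """
    scores = [a + l for a, l in zip(log_alpha, log_lik)]
    m = max(scores)
    # log-sum-exp: subtract the maximum before exponentiating.
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [math.exp(s - log_norm) for s in scores]


# Hypothetical values: equal weights, very small likelihoods.
gamma = responsibilities([math.log(0.5), math.log(0.5)], [-1000.0, -1001.0])
```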

In step S6, the mixture weight updating unit 1440 of the parameter updating unit 144 updates the mixture weights α_i as shown in Equation (14), using the posterior probabilities γ_i(ζ) output by the posterior probability calculation unit 142 and the weight λ′ stored in the parameter holding unit 146.

In step S6, the correlation matrix updating unit 1442 of the parameter updating unit 144 calculates the correlation matrix R_i for cluster i, using the posterior probabilities γ_i(ζ) output by the posterior probability calculation unit 142 and the normalized activities z(ζ) output by the normalization unit 140. The correlation matrix R_i is defined by Equation (15).

In step S6, the mean direction updating unit 1444 of the parameter updating unit 144 updates the mean direction w_i to the eigenvector of the correlation matrix R_i that corresponds to its largest eigenvalue and has a Euclidean norm of 1.
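The mean direction update can be sketched with power iteration, which converges to the principal eigenvector of a symmetric positive semidefinite matrix. The form of R_i used here, a posterior-weighted average of z zᵀ, is an assumption standing in for Equation (15); any positive scaling of R_i leaves the eigenvector unchanged. Power iteration is likewise a stand-in for whatever eigen-solver an implementation would use.

```python
import math


def mean_direction(zs, gammas, n_iter=200):
    """Return (w, R): unit-norm principal eigenvector and weighted correlation matrix.

    zs     -- list of normalized activity vectors z(zeta)
    gammas -- posterior probabilities gamma_i(zeta) for one cluster i
    """
    dim = len(zs[0])
    total = sum(gammas)
    # Assumed form of R_i: weighted average of outer products z z^T.
    R = [[sum(g * z[a] * z[b] for z, g in zip(zs, gammas)) / total
          for b in range(dim)] for a in range(dim)]
    w = [1.0 / math.sqrt(dim)] * dim  # start vector; must not be orthogonal to the answer
    for _ in range(n_iter):
        v = [sum(R[a][b] * w[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    return w, R


# Hypothetical data: most of the weight lies along the first axis.
w, R = mean_direction([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 0.1])
```

The sign of the returned vector is arbitrary, which is harmless because the Watson density depends only on (wᵀz)².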

In step S6, the density parameter updating unit 1446 of the parameter updating unit 144 updates the density parameter κ_i using the mean direction w_i and the correlation matrix R_i, as shown in Equation (16).

Here, r_i := w_i^T R_i w_i.

In step S6, the weight updating unit 1448 of the parameter updating unit 144 updates the weight λ using the mixture weights α_i, the mean directions w_i, and the density parameters κ_i, as shown in Equation (17).

The basis for the update equations used in the parameter update unit 144 is explained below. The parameters are updated according to an EM algorithm derived for this model. The index i of the Watson distribution is treated as the hidden variable of the EM algorithm. First, the evaluation function G(Θ) for MAP (maximum a posteriori) estimation is given by Equations (18) and (19).

Here, as already defined, p(z(ζ) | i, w_i, κ_i) is given by Equation (20), and const. denotes a constant that does not depend on the parameter set Θ.

The Q function, which is the evaluation function used in the EM algorithm, is given by Equation (21).

The second term of Equation (21) derives from the Dirichlet prior and has the effect of letting only a small number of mixture weights α_i take large values. Furthermore, so that a suitable value of φ is determined adaptively even when conditions such as the arrangement of the microphones and signal sources change, Equation (21) is modified following S. Medasani and R. Krishnapuram, "Categorization of image databases for efficient retrieval using robust mixture decomposition", Computer Vision and Image Understanding, vol. 83, pp. 216-235, 2001 (Reference 3): as shown in Equation (22), the hyperparameter φ is replaced with an adaptive hyperparameter λ.

The hyperparameter λ is updated by Equation (23) at every iteration of the EM algorithm.

The update equation for each parameter is derived by setting the partial derivative of the modified Q function ~Q(Θ) with respect to that parameter to zero. Where a constraint exists, the update equation is derived by the method of Lagrange multipliers.

First, the update equation for the mixture weights is obtained, taking into account the constraint Σ_{i=1}^K α_i = 1, by solving the simultaneous equations of Equation (25) for Equation (24). Here, μ is the Lagrange multiplier.

The update equation for the mean direction is obtained, taking into account the constraint ||w_i|| = 1, by solving the simultaneous equations of Equation (27) for Equation (26).
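Equations (26) and (27) are not reproduced in this text, so the following is only a reconstruction of the argument under stated assumptions: that the part of ~Q(Θ) depending on w_i is κ_i Σ_ζ γ_i(ζ) (w_i^T z(ζ))^2, as the Watson density suggests, and that R_i = (1/c_i) Σ_ζ γ_i(ζ) z(ζ) z(ζ)^T with c_i = Σ_ζ γ_i(ζ). Under those assumptions the Lagrangian and its stationarity condition are:

```latex
\begin{aligned}
\mathcal{L}(w_i,\mu)
  &= \kappa_i \sum_{\zeta} \gamma_i(\zeta)\,\bigl(w_i^{T} z(\zeta)\bigr)^{2}
     - \mu\,\bigl(w_i^{T} w_i - 1\bigr)
   = \kappa_i c_i\, w_i^{T} R_i w_i - \mu\,\bigl(w_i^{T} w_i - 1\bigr), \\
\frac{\partial \mathcal{L}}{\partial w_i}
  &= 2\kappa_i c_i\, R_i w_i - 2\mu\, w_i = 0
  \;\Longrightarrow\;
  R_i w_i = \frac{\mu}{\kappa_i c_i}\, w_i .
\end{aligned}
```

Every stationary point is therefore a unit-norm eigenvector of R_i, and at such a point w_i^T R_i w_i equals the corresponding eigenvalue. For κ_i > 0 the objective is largest at the eigenvector of the largest eigenvalue, which agrees with the mean direction update in step S6.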

For the density parameter, the Q function ~Q(Θ) is first partially differentiated with respect to the density parameter κ_i, which gives Equation (28).

Here, M'(1/2, T/2, κ_i) is the partial derivative of M(1/2, T/2, κ_i) with respect to κ_i, and R_i is, as already defined, given by Equation (29). Also, r_i := w_i^T R_i w_i.

By the approximation method described in S. Sra and D. Karp, "The multivariate Watson distribution: Maximum-likelihood estimation and other aspects", Journal of Multivariate Analysis, vol. 114, pp. 256-269, Feb. 2013 (Reference 4), an approximate solution for κ_i is obtained as shown in Equation (30).

In step S7, the clustering unit 14 determines whether the number of parameter updates has reached the preset maximum max_iter (for example, 200) or whether the convergence condition shown in Equation (31) is satisfied.

Here, thr_conv is the threshold for the convergence test and is set to, for example, 10^-10.

In step S8, the parameter holding unit 146 stores the parameter set Θ and the weight λ obtained by the update process in the parameter update unit 144. At the next round of processing in the posterior probability calculation unit 142 and the parameter update unit 144, the parameter holding unit 146 provides the stored parameter set Θ and weight λ as the parameter set Θ' and the weight λ'.

The clustering unit 14 repeats the processing in the posterior probability calculation unit 142 and the parameter update unit 144, together with the reading and writing of updated data in the parameter holding unit 146, until step S7 finds that the maximum number of parameter updates has been reached or that the convergence condition shown in Equation (31) is satisfied. The posterior probabilities γ_i(ζ) obtained when the iteration ends are input to the cluster determination unit 148.
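The control flow of steps S5 to S8 can be sketched as a generic loop. Equation (31) is not reproduced in this text, so the stopping test below (relative change of an objective value compared against thr_conv) is an assumption, as are the function name `run_em` and its callback arguments. The defaults mirror the example values max_iter = 200 and thr_conv = 10^-10.

```python
def run_em(e_step, m_step, objective, params, max_iter=200, thr_conv=1e-10):
    """Alternate E and M steps until max_iter is reached or the change is small.

    Assumes max_iter >= 1. Returns the final parameters and the iteration count.
    """
    prev = objective(params)
    for n_iter in range(1, max_iter + 1):
        gammas = e_step(params)          # step S5: posterior probabilities
        params = m_step(gammas, params)  # step S6: parameter updates (kept for the next pass, S8)
        cur = objective(params)
        # Step S7 (assumed form): stop when the objective barely changes.
        if abs(cur - prev) <= thr_conv * max(abs(prev), 1.0):
            break
        prev = cur
    return params, n_iter

# Toy usage: a scalar "parameter" that moves halfway towards 3.0 on every pass.
final, n_used = run_em(
    e_step=lambda p: p,
    m_step=lambda g, p: 0.5 * (g + 3.0),
    objective=lambda p: -(p - 3.0) ** 2,
    params=0.0,
)
```

The toy callbacks only exercise the loop; in the method described above they would be replaced by the posterior calculation and the parameter updates.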

In step S9, the cluster determination unit 148 determines the activity clusters C_i (1≦i≦K) by Equation (32), using the activities y(ζ) before normalization and the posterior probabilities γ_i(ζ) obtained when the iteration ends.
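Equation (32) is not reproduced in this text. A minimal sketch of step S9, assuming that each activity is assigned to the cluster with the largest posterior probability, could look like this (the function name `assign_clusters` is hypothetical):

```python
def assign_clusters(ys, gammas, K):
    """Place each unnormalized activity y(zeta) in its maximum-posterior cluster."""
    clusters = [[] for _ in range(K)]
    for y, g in zip(ys, gammas):
        best = max(range(K), key=lambda i: g[i])
        clusters[best].append(y)
    return clusters

# Toy example: three activities, two clusters.
ys = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
gammas = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]
clusters = assign_clusters(ys, gammas, K=2)
```

Note that the posteriors are computed from the normalized activities z(ζ), while the clusters collect the activities y(ζ) before normalization, as the text specifies.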

In the example of the clustering unit 14 described above, a mixture model with a Dirichlet prior on the mixture weights, whose hyperparameter is adaptive, was used as a clustering method equipped with a model selection mechanism for fitting one probability distribution model to each peak of the activity distribution. However, the clustering method with a model selection mechanism is not limited to this, and various methods may be used. For example, a Dirichlet prior with a fixed hyperparameter may be used, as may a nonparametric Bayesian model (for example, a Dirichlet process mixture model), an information criterion (for example, the Akaike information criterion or the Bayesian information criterion), cross validation, or hypothesis testing.

The example of the clustering unit 14 described above performs clustering using the activities in all frequency bins, but clustering may instead use only the activities in some of the frequency bins. The frequency bins used are preferably the 200 frequency bins contained in the 4 kHz to 5.56 kHz band, but the choice is not limited to this and any frequency bins may be selected.

The signal source number determination unit 16 determines the number of signal sources N using the activity clusters obtained by the clustering unit 14, and outputs it. Specifically, it excludes the activity clusters made up of small activities (that is, the activity clusters assumed not to correspond to a signal source) and takes the number of remaining activity clusters as the estimate of the number of signal sources N.

The processing of the signal source number determination unit 16 is described more specifically below.

As shown in FIG. 6, the signal source number determination unit 16 comprises, for example, an activity sum calculation unit 160, an activity sum clustering unit 162, and an element number counting unit 164.

In step S10 shown in FIG. 4, the activity sum calculation unit 160 of the signal source number determination unit 16 calculates the activity sums ρ_i (1≦i≦K) from the input activity clusters C_i (1≦i≦K) by Equation (33), and outputs them.

Here, sum(·) denotes the sum of the elements of a vector.

In step S11, the activity sum clustering unit 162 of the signal source number determination unit 16 classifies the activity clusters C_i into two clusters based on the input activity sums ρ_i. The clustering is performed, for example, by k-means clustering as described in C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006 (Reference 5), with the number of clusters set to 2, and of the two resulting clusters the one with the larger centroid value is taken as the cluster D_active corresponding to signal sources. The reason is that an activity cluster corresponding to a signal source is expected to have a larger activity sum than one that does not. The activity sums that correspond to signal sources and those that do not are therefore expected to form separate clusters, with the cluster having the larger centroid value corresponding to the signal sources.

In step S12, the element number counting unit 164 of the signal source number determination unit 16 counts the number of elements of the input D_active and outputs it as the number of signal sources N.
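Steps S10 to S12 can be sketched together as follows. Equation (33) is not reproduced in this text, so the activity sum is assumed to be the sum of sum(y) over all activities y in a cluster. The two-cluster k-means is written out for the one-dimensional case, initialized at the smallest and largest activity sums; the function names and the handling of the degenerate case where all sums are equal are assumptions made for illustration.

```python
def activity_sums(clusters):
    """rho_i: total of all vector elements over the activities in cluster i (assumed Eq. (33))."""
    return [sum(sum(y) for y in c) for c in clusters]

def two_means_active(values, iters=100):
    """1-D k-means with two clusters; returns indices in the larger-centroid cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case (assumption): nothing to split, keep everything.
        return list(range(len(values)))
    for _ in range(iters):
        high = [v for v in values if abs(v - hi) < abs(v - lo)]
        low = [v for v in values if abs(v - hi) >= abs(v - lo)]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return [i for i, v in enumerate(values) if abs(v - hi) < abs(v - lo)]

def estimate_num_sources(clusters):
    """N = number of activity clusters in D_active."""
    return len(two_means_active(activity_sums(clusters)))

# Toy example: two clusters with large activities, one small, one empty.
clusters = [[[0.5, 0.5], [0.4, 0.6]], [[0.01, 0.02]], [[0.9, 0.8]], []]
n_sources = estimate_num_sources(clusters)
```

In the toy example the activity sums are roughly 2.0, 0.03, 1.7, and 0, so the first and third clusters form D_active and the estimate is 2.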

The example of the signal source number determination unit 16 described above clusters the activity clusters into two clusters based on the distances between their activity sums. The activity sum, however, is only one example of a feature value corresponding to the magnitude of the activities, and other feature values may be used. For example, the feature value may be calculated from a representative value of the activities in an activity cluster, such as the magnitude (norm) of any one of the activities in the cluster or the magnitude (norm) of the mean of the activities in the cluster.

Alternatively, instead of classifying the activity clusters into two clusters based on the distances between them and then obtaining the cluster corresponding to signal sources, the configuration may determine whether the magnitude of the representative value of each activity cluster (or of the representative value multiplied by a weight corresponding to the size of that activity cluster) is equal to or greater than a predetermined threshold, and output the number of activity clusters that meet the threshold as the number of signal sources N.
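The threshold-based alternative reduces to a simple count. The sketch below assumes scalar representative values and optional per-cluster weights; the function name `count_above_threshold` and the example numbers are illustrative only.

```python
def count_above_threshold(representatives, threshold, weights=None):
    """Count clusters whose (optionally weighted) representative magnitude meets the threshold."""
    if weights is None:
        weights = [1.0] * len(representatives)
    return sum(1 for r, w in zip(representatives, weights) if abs(r * w) >= threshold)

# Toy example: two of the three clusters reach the threshold.
n_sources = count_above_threshold([10.2, 0.3, 9.5], threshold=1.0)
```

Unlike the two-cluster k-means, this variant needs a threshold chosen in advance, which the text leaves as a predetermined value.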

[Modification]
In the first embodiment, the clustering unit 14 clusters the activities, and the signal source number determination unit 16 then eliminates the activity clusters that do not correspond to signal sources (that is, the activity clusters made up of small activities). This order may be reversed. That is, the activities of the separated signals for each frequency bin are first clustered into two clusters based on their magnitude (norm). The cluster made up of small-norm activities is excluded as not corresponding to signal sources, and the cluster made up of large-norm activities is retained as corresponding to signal sources. Only the activities belonging to the retained cluster are then clustered into activity clusters by the processing of the clustering unit 14. In this case the activities that do not correspond to signal sources have already been excluded, so the signal source number determination unit 16 need only output, as the number of signal sources N, the number of activity clusters whose mixture weight α_i is equal to or greater than a predetermined threshold.

The first embodiment excludes the activities that do not correspond to signal sources by using the property that such activities are smaller than the activities that do correspond to signal sources. Other methods that use this property may of course be used to exclude them. For example, the norm ||y(ζ)|| of the activity of each separated signal in each frequency bin is first calculated. Activities y(ζ) whose norm is below a predetermined threshold are excluded as not corresponding to signal sources, and activities y(ζ) whose norm is equal to or greater than the threshold are retained as corresponding to signal sources. Only the retained activities are then clustered into activity clusters by the processing of the clustering unit 14. In this case too, the activities that do not correspond to signal sources have already been excluded, so the signal source number determination unit 16 need only output, as the number of signal sources N, the number of activity clusters whose mixture weight α_i is equal to or greater than a predetermined threshold.
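The norm-based pre-filtering can be sketched as below, using the Euclidean norm. The function name `split_by_norm` and the threshold value in the example are assumptions; the text only states that the threshold is predetermined.

```python
import math

def split_by_norm(ys, threshold):
    """Separate activities into those kept (norm >= threshold) and those excluded."""
    kept, excluded = [], []
    for y in ys:
        norm = math.sqrt(sum(v * v for v in y))
        (kept if norm >= threshold else excluded).append(y)
    return kept, excluded

# Toy example: the second activity is too small to correspond to a signal source.
kept, excluded = split_by_norm([[3.0, 4.0], [0.1, 0.1], [1.0, 0.0]], threshold=0.5)
```

Only the activities in `kept` would then be passed on to the clustering of the clustering unit 14.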

Alternatively, instead of explicitly excluding the activities that do not correspond to signal sources, the clustering in the clustering unit 14 may be made to ignore them. That is, the evaluation function ~Q(Θ) in the clustering unit 14 is weighted according to the magnitude of the activities, as shown in Equation (34).

Here, ν(ζ) is a weight based on the magnitude of the activity and is defined by Equation (35).

Here, the exponent a is a predetermined positive number. The parameter update equations based on Equation (34) can be derived in the same way as in the first embodiment. When a weight based on the magnitude of the activities is introduced in this way, the clustering in the clustering unit 14 is strongly influenced by the activities with large ν(ζ) (that is, those corresponding to signal sources) and only weakly influenced by the activities with small ν(ζ) (that is, those not corresponding to signal sources). Clustering can therefore be performed with the latter almost ignored, without excluding them explicitly. In this case, because the activities that do not correspond to signal sources are almost ignored in the clustering, the signal source number determination unit 16 need only output, as the number of signal sources N, the number of activity clusters whose mixture weight α_i is equal to or greater than a predetermined threshold.

[Second Embodiment]
In the signal source number estimation device 100 according to the first embodiment, the activity calculation unit 12 used time series of posterior probabilities as the activities. However, the activities used in the present invention are not limited to time series of posterior probabilities, and the time-frequency mask in each frequency bin may be used instead. The time-frequency mask in each frequency bin can be calculated from the power of the separated signals. The case of a Wiener mask is described below as one example, but any time-frequency mask may be used. For example, the MMSE-STSA mask disclosed in Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, pp. 1109-1121, Dec. 1984 (Reference 6) may be used, as may the MMSE-logSTSA mask disclosed in Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985 (Reference 7).

The processing of the activity calculation unit when a Wiener mask is used as the activity is described below.

The activity calculation unit 13 according to the second embodiment applies an arbitrary source separation technique to the input time-frequency-domain mixed signal x(τ,f) to obtain K separated signals ξ_kτf ∈ C. Here, k is the index of the separated signal, τ is the frame number, f is the frequency bin number, and C denotes the set of complex numbers. The source separation technique may be one based on clustering, but any other source separation technique may be used. For example, a sound source separation technique based on independent component analysis can be applied. The number of separated signals K is set to be at least the upper limit of the expected number of signal sources.

Next, the activity calculation unit 13 calculates the Wiener mask η_kτf from the separated signals ξ_kτf by Equation (36).

As a result, for each frequency bin f and each separated signal index k, an activity [η_k1f … η_kTf]^T can be defined. Here, the T in the subscript is the total number of frames, and the superscript T denotes vector transposition. These activities are assigned the serial number ζ = k + (f-1)K, and the activity y(ζ) is defined by Equation (37).
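Equations (36) and (37) are not reproduced in this text. The sketch below assumes the common power-ratio form of the Wiener mask, η_kτf = |ξ_kτf|^2 / Σ_k' |ξ_k'τf|^2, and stacks the masks into activities using the serial number ζ = k + (f-1)K with 1-based k and f. The nested-list layout `xi[k][tau][f]`, the small constant `eps`, and the function names are assumptions made for illustration.

```python
def wiener_masks(xi, eps=1e-12):
    """eta[k][tau][f]: power of separated signal k divided by the total power (assumed Eq. (36))."""
    K, T, F = len(xi), len(xi[0]), len(xi[0][0])
    eta = [[[0.0] * F for _ in range(T)] for _ in range(K)]
    for t in range(T):
        for f in range(F):
            # eps guards against division by zero in silent time-frequency points.
            total = sum(abs(xi[k][t][f]) ** 2 for k in range(K)) + eps
            for k in range(K):
                eta[k][t][f] = abs(xi[k][t][f]) ** 2 / total
    return eta

def stack_activities(eta):
    """ys[zeta - 1] = [eta_k1f, ..., eta_kTf] with zeta = k + (f - 1) * K."""
    K, T, F = len(eta), len(eta[0]), len(eta[0][0])
    ys = [None] * (K * F)
    for f in range(1, F + 1):
        for k in range(1, K + 1):
            zeta = k + (f - 1) * K
            ys[zeta - 1] = [eta[k - 1][t][f - 1] for t in range(T)]
    return ys

# Toy example: K = 2 separated signals, T = 2 frames, F = 2 frequency bins.
xi = [
    [[3 + 4j, 0j], [1 + 0j, 2 + 0j]],
    [[0j, 1 + 0j], [1 + 0j, 0j]],
]
ys = stack_activities(wiener_masks(xi))
```

Each resulting activity is a length-T vector of mask values, so the K·F activities can be fed to the same clustering as in the first embodiment.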

[Experimental Results]
An experiment was conducted to evaluate how accurately the signal source number estimation technique of the present invention estimates the number of signal sources. The observation signals were generated by convolving measured room impulse responses with the sound source signals.

The arrangement of the sound sources and microphones in the experimental environment is described with reference to FIG. 7. Three microphones were placed at the vertices of an equilateral triangle near the center of a rectangular room, and four sound sources were placed on a circle surrounding the microphones. The room was 4.45 meters wide, 3.55 meters deep, and 2.5 meters high. The equilateral triangle formed by the three microphones had sides of 4 centimeters. The circle on which the four sound sources lay had a radius of 1.2 meters. Taking the downward direction on the page in FIG. 7 as 0°, the four sound sources were located at 70°, 150°, 245°, and 315° counterclockwise. The three microphones and the four sound sources were placed in the same horizontal plane, at a height of 1.2 meters above the floor.

Table 1 shows the other experimental conditions.

Preliminary experiments showed that the estimation accuracy for the number of signal sources was particularly good when the frequency bins used for activity clustering were limited to the 4 kHz to 5.56 kHz band (containing 200 frequency bins), so the experiment was limited to this band.

Clustering of the time-frequency components was performed by the method described in Japanese Patent Application Laid-Open No. 2009-053349 (Reference 8).

In the experiment, for each combination of the number of signal sources, the number of microphones, the reverberation time, and the number of clusters (K), eight trials were run with different combinations of sound source signals, and the proportion of trials in which the number of signal sources was estimated correctly was calculated as the accuracy rate.

Table 2 shows the experimental results for two signal sources and three microphones. The sound sources used were sound sources 1 and 2 shown in FIG. 7, and the microphones used were microphones 1, 2, and 3 shown in FIG. 7.

Table 3 shows the experimental results for two signal sources and two microphones. The sound sources used were sound sources 1 and 2 shown in FIG. 7, and the microphones used were microphones 1 and 2 shown in FIG. 7.

Table 4 shows the experimental results for three signal sources and three microphones. The sound sources used were sound sources 1, 2, and 3 shown in FIG. 7, and the microphones used were microphones 1, 2, and 3 shown in FIG. 7.

Table 5 shows the experimental results for three signal sources and two microphones. The sound sources used were sound sources 1, 2, and 3 shown in FIG. 7, and the microphones used were microphones 1 and 2 shown in FIG. 7.

Table 6 shows the experimental results for four signal sources and three microphones. The sound sources used were sound sources 1, 2, 3, and 4 shown in FIG. 7, and the microphones used were microphones 1, 2, and 3 shown in FIG. 7.

As shown in Tables 2 to 4, when the number of signal sources did not exceed the number of microphones, an accuracy rate of 100% was obtained regardless of the number of clusters (K) and the reverberation time. Furthermore, as shown in Tables 5 and 6, a high accuracy rate was also obtained under the more difficult condition in which the number of signal sources exceeded the number of microphones. With three sound sources and two microphones (Table 5), an accuracy rate of 100% was obtained for K=12 regardless of the reverberation time. Even in the most difficult case of four sound sources and three microphones (Table 6), an accuracy rate of 75% or higher was achieved for K=8, 10, and 12 regardless of the reverberation time.

These experimental results confirm that the signal source number estimation technique of the present invention is sufficiently effective.

[Program and Recording Medium]
The present invention is not limited to the embodiments described above, and it goes without saying that modifications can be made as appropriate without departing from the spirit of the invention. The various processes described in the above embodiments need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as required.

When the various processing functions of each device described in the above embodiments are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on a computer implements the various processing functions of each device on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute the process according to the program, or it may execute the process according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and the retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct instruction to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

100 signal source number estimation device
10 frequency domain conversion unit
12, 13 activity calculation unit
14 clustering unit
140 normalization unit
142 posterior probability calculation unit
144 parameter update unit
146 parameter holding unit
148 cluster determination unit
16 signal source number determination unit
160 activity sum calculation unit
162 activity sum clustering unit
164 element number counting unit
400 sound source number estimation device
20 sound collection means
30 frequency domain conversion unit
32 power estimation unit
34 direction-of-arrival estimation unit
36 sound source number estimation unit
200 direction information distribution estimation device

Claims

A signal source number estimation device comprising:
a clustering unit that, when an observation signal has been separated into a plurality of separated signals for each frequency bin, clusters the activities of the separated signals, each obtained from the corresponding separated signal, into a plurality of activity clusters; and
a signal source number determination unit that obtains, as an estimate of the number of signal sources, the number of activity clusters determined to correspond to a signal source among the activity clusters.

The signal source number estimation device according to claim 1,
wherein the signal source number determination unit clusters the activity clusters into two clusters based on a feature quantity corresponding to the magnitude of the activity of each activity cluster, and uses, as the estimate of the number of signal sources, the total number of activity clusters contained in whichever of the two clusters has the larger feature quantity.
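
The two-cluster rule in this claim can be illustrated with a minimal sketch. It assumes that each activity cluster has already been reduced to one scalar feature (for example, a total activity) and uses a plain one-dimensional 2-means split. The claim does not name a clustering algorithm, so the 2-means choice, the function name `count_sources_two_means`, and the handling of the all-equal case are illustrative assumptions.

```python
def count_sources_two_means(features, iters=100):
    """Split per-cluster feature values into two groups with 1-D 2-means and
    return the size of the group with the larger centroid.

    `features` holds one scalar per activity cluster (hypothetical input).
    """
    if not features:
        return 0
    lo, hi = min(features), max(features)
    if lo == hi:
        # All clusters look alike; counting all of them is an assumption.
        return len(features)
    large = []
    for _ in range(iters):
        # Assign each value to the nearer centroid (ties go to the larger one).
        large = [v for v in features if abs(v - hi) <= abs(v - lo)]
        small = [v for v in features if abs(v - hi) > abs(v - lo)]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return len(large)
```

With features such as `[9.5, 10.2, 8.9, 0.3, 0.1]`, the three large values form one group and the two small values the other, so the sketch reports three sources.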

The signal source number estimation device according to claim 1,
wherein the signal source number determination unit uses, as the estimate of the number of signal sources, the total number of activity clusters whose feature quantity corresponding to the magnitude of the activity of each activity cluster is equal to or greater than a predetermined threshold.
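
The threshold rule in this claim can be sketched as follows. The sketch assumes that the feature quantity of an activity cluster is the sum of all activity values assigned to it. That choice is suggested by the activity sum calculation unit 160 in the reference signs but is an assumption here, as are the function names and the data layout (one list of activity values per separated signal, plus a cluster label for each).

```python
def cluster_activity_sums(activities, labels, n_clusters):
    """Sum all activity values assigned to each activity cluster.

    activities[i] is the activity sequence of separated signal i and
    labels[i] is the index of the cluster it was assigned to (hypothetical layout).
    """
    sums = [0.0] * n_clusters
    for activity, label in zip(activities, labels):
        sums[label] += sum(activity)
    return sums


def count_sources_threshold(features, threshold):
    """Count activity clusters whose feature is at or above the threshold."""
    return sum(1 for value in features if value >= threshold)
```

The threshold itself is a tuning parameter; the claim only requires that it be predetermined.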

The signal source number estimation device according to any one of claims 1 to 3,
wherein the activity is the contribution of the separated signal in each time interval.

The signal source number estimation device according to any one of claims 1 to 3,
wherein the activity is a time-frequency mask obtained using the power of the separated signal.
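
One way to obtain a time-frequency mask from the power of the separated signals is a power-ratio mask, sketched below for a single frequency bin. The claim only states that the mask is obtained using the power, so the ratio form, the uniform fallback for silent frames, and the name `power_ratio_masks` are assumptions for illustration and may differ from the form used in the description.

```python
def power_ratio_masks(separated):
    """Compute power-ratio masks for one frequency bin.

    separated[i][t] is the complex value of separated signal i at time frame t.
    Returns masks[i][t] in [0, 1]; the masks sum to 1 over i for every frame.
    """
    n_signals = len(separated)
    n_frames = len(separated[0])
    masks = [[0.0] * n_frames for _ in range(n_signals)]
    for t in range(n_frames):
        powers = [abs(separated[i][t]) ** 2 for i in range(n_signals)]
        total = sum(powers)
        for i in range(n_signals):
            # Silent frames get a uniform mask (an assumption of this sketch).
            masks[i][t] = powers[i] / total if total > 0 else 1.0 / n_signals
    return masks
```

A mask value near 1 at frame t indicates that the corresponding separated signal dominates that frame, which is what makes the mask usable as an activity.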

The signal source number estimation device according to any one of claims 1 to 5,
wherein the clustering unit clusters the activities into a plurality of activity clusters by fitting the activities to a probability model representing the distribution of the activities and calculating model parameters of the probability model using an evaluation function that evaluates the probability model.
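
The model-based clustering in this claim can be sketched with an EM loop whose stages loosely mirror the normalization, posterior probability calculation, parameter update, and cluster determination units in the reference signs. The probability model below is a spherical Gaussian mixture with a fixed variance, and the evaluation function is the log-likelihood up to an additive constant. Both are stand-ins chosen for brevity, not necessarily the model or evaluation function of the embodiment, and all names and default values are hypothetical.

```python
import math


def normalize(v):
    """Scale an activity vector to unit Euclidean norm (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def em_cluster(activities, n_clusters, iters=50, var=0.05):
    """Cluster activity vectors by EM on a fixed-variance spherical Gaussian mixture.

    Returns (labels, log_lik); log_lik omits the Gaussian normalizing constant.
    """
    data = [normalize(a) for a in activities]
    # Deterministic farthest-point initialization of the cluster means.
    means = [data[0]]
    while len(means) < n_clusters:
        means.append(max(data, key=lambda x: min(sq_dist(x, m) for m in means)))
    weights = [1.0 / n_clusters] * n_clusters
    post, log_lik = [], float("-inf")
    for _ in range(iters):
        # E-step: posterior probability of each cluster for each activity.
        post, log_lik = [], 0.0
        for x in data:
            logp = [math.log(max(w, 1e-12)) - sq_dist(x, m) / (2 * var)
                    for w, m in zip(weights, means)]
            top = max(logp)
            norm = top + math.log(sum(math.exp(l - top) for l in logp))
            log_lik += norm
            post.append([math.exp(l - norm) for l in logp])
        # M-step: update mixture weights and means from the posteriors.
        for k in range(n_clusters):
            nk = sum(p[k] for p in post)
            weights[k] = nk / len(data)
            if nk > 0:
                means[k] = [sum(p[k] * x[d] for p, x in zip(post, data)) / nk
                            for d in range(len(data[0]))]
    # Cluster determination: assign each activity to its most probable cluster.
    labels = [max(range(n_clusters), key=lambda k: p[k]) for p in post]
    return labels, log_lik
```

Activities that share a temporal pattern, such as separated signals from different frequency bins that belong to the same source, end up with the same label, so the number of well-populated clusters can then be counted.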

A signal source number estimation method comprising:
a clustering step in which a clustering unit, when an observation signal has been separated into a plurality of separated signals for each frequency bin, clusters the activities of the separated signals, each obtained from the corresponding separated signal, into a plurality of activity clusters; and
a signal source number determination step in which a signal source number determination unit obtains, as an estimate of the number of signal sources, the number of activity clusters determined to correspond to a signal source among the activity clusters.

A program for causing a computer to function as the signal source number estimation device according to claim 1.