JP7046636B2

JP7046636B2 - Signal analysis device, method, and program

Info

Publication number: JP7046636B2
Application number: JP2018026316A
Authority: JP
Inventors: Hirokazu Kameoka; Hideaki Kagami; Masahiro Yukawa
Original assignee: Nippon Telegraph and Telephone Corp; Keio University
Current assignee: Nippon Telegraph and Telephone Corp; Keio University
Priority date: 2018-02-16
Filing date: 2018-02-16
Publication date: 2022-04-04
Anticipated expiration: 2038-02-16
Also published as: JP2019144320A

Description

The present invention relates to a signal analysis device, method, and program, and more particularly to a signal analysis device, method, and program for separating an observed signal in which a plurality of constituent sounds are mixed.

Blind source separation (BSS) is a technique for separating individual source signals from the inputs of a microphone array when the transfer functions between the sources and the microphones are unknown. Frequency-domain BSS must solve two problems together. The first is source separation in each frequency bin. The second is permutation alignment, i.e., determining which source each separated signal obtained in each frequency bin belongs to. In exchange, the mixing process can be represented as an instantaneous mixture without convolution, which allows relatively efficient algorithms. Another major advantage is that assumptions about the sources in the time-frequency domain, and assumptions about the frequency response of the microphone array, can be exploited effectively. For example, an extension of independent component analysis (ICA) called independent vector analysis (IVA) has been proposed. IVA solves per-frequency separation and permutation alignment simultaneously by exploiting the tendency of frequency components from the same source to vary in magnitude synchronously over time.

As a different approach, multichannel extensions of non-negative matrix factorization (NMF) have attracted attention in recent years (Non-Patent Documents 1 to 3). NMF was originally applied to monaural source separation. It regards the power (or amplitude) spectrogram of the observed signal as a non-negative matrix and approximates it by the product of two non-negative matrices. This amounts to assuming that the power spectrum of the mixture in each time frame is approximated by a linear combination of a limited number of basis spectra, scaled by time-varying amplitudes. Multichannel NMF (MNMF) extends this approach to the multichannel case so that spatial information can serve as an additional cue for separation. MNMF can also be interpreted as an extension of frequency-domain BSS that uses spectral templates as cues for per-frequency separation and permutation alignment.

Conventional MNMF (Non-Patent Document 1) targets separation in the underdetermined case (fewer microphones than sources). For the determined case (at least as many microphones as sources), an effective method called determined MNMF (DMNMF) has been proposed (Non-Patent Documents 2 and 3). Non-Patent Document 3 examines the relationship between DMNMF and IVA. It shows that the fast algorithm introduced for IVA can be applied to separation matrix estimation in DMNMF. As a result, the algorithm of Non-Patent Document 3 is reported to be more than 30 times faster than the conventional underdetermined MNMF of Non-Patent Document 1.

[Non-Patent Document 1] A. Ozerov and C. Févotte, "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 550-563, 2010.

[Non-Patent Document 2] H. Kameoka, T. Yoshioka, M. Hamamura, J. Le Roux, and K. Kashino, "Statistical model of speech signals based on composite autoregressive system with application to blind source separation," in LVA/ICA, Springer, 2010, pp. 245-253.

[Non-Patent Document 3] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1626-1641, 2016.

IVA and DMNMF assume an instantaneous mixing model in the time-frequency domain. One drawback of this model is that the assumption does not hold under strong reverberation.

The present invention has been made in view of the above circumstances. Its object is to provide a signal analysis device, method, and program that can accurately separate each constituent sound from a mixture of constituent sounds, even under strong reverberation.

To achieve the above object, a signal analysis device according to the present invention includes a parameter estimation unit. The unit takes as input an observed signal in which constituent sounds are mixed. It estimates the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter, so as to decrease an objective function. The objective function is expressed using:
- the basis spectra of the constituent sounds;
- activation parameters representing the volume of each constituent sound and each basis at each time;
- a separation matrix for separating a time-frequency-domain mixture of the constituent sounds into the individual constituent sounds;
- the signals obtained by dereverberating the observed signal with a dereverberation filter and separating the result into the constituent sounds.

In a signal analysis method according to the present invention, a parameter estimation unit takes as input an observed signal in which constituent sounds are mixed. It estimates the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter, so as to decrease an objective function. The objective function is expressed using the basis spectra, the activation parameters representing the volume of each constituent sound and each basis at each time, the separation matrix for separating a time-frequency-domain mixture into the constituent sounds, and the signals obtained by dereverberating the observed signal with the dereverberation filter and separating the result into the constituent sounds.

A program according to the present invention causes a computer to function as each unit of the signal analysis device described above.

As described above, the signal analysis device, method, and program of the present invention estimate four quantities: the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter. They do so by decreasing an objective function expressed using these basis spectra and activation parameters, the separation matrix, and the signals obtained by dereverberating the observed signal and separating it into the constituent sounds. As a result, each constituent sound can be separated accurately from a mixture of constituent sounds, even under strong reverberation.

FIG. 1 is a block diagram showing the functional configuration of the signal analysis device according to the embodiment of the present invention. FIG. 2 is a flowchart showing the parameter estimation processing routine in the signal analysis device according to the embodiment. FIG. 3 is a diagram showing experimental results.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Outline of the embodiment of the present invention>
It has been shown that source separation of strongly reverberant mixtures can be solved effectively with a frequency-domain convolutive mixture model (Non-Patent Documents 2 and 5). The method of Non-Patent Document 5 estimates the parameters of this model efficiently. It iteratively updates the separation matrix, the dereverberation filter, and the spectral parameters of each source.

The embodiment of the present invention introduces the frequency-domain convolutive mixture model into the DMNMF framework. By combining the algorithms of Non-Patent Document 6, Non-Patent Documents 3 and 4, and Non-Patent Document 5, it achieves source separation that is robust under strong reverberation. The optimization process consists of three steps:
(i) estimation of the NMF parameters with the auxiliary function method, using the algorithms of Non-Patent Documents 6 and 7;
(ii) update of the separation matrix, using the algorithms of Non-Patent Documents 3 and 4;
(iii) update of the dereverberation filter, using the algorithm of Non-Patent Document 5.

[Non-Patent Document 4] N. Ono, "Stable and fast update rules for independent vector analysis based on auxiliary function technique," in 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE, 2011, pp. 189-192.

[Non-Patent Document 5] T. Yoshioka, T. Nakatani, M. Miyoshi, and H. G. Okuno, "Blind separation and dereverberation of speech mixtures by joint optimization," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 69-84, 2011.

[Non-Patent Document 6] H. Kameoka, M. Goto, and S. Sagayama, "Selective equalizer of periodic and aperiodic components in mixed sound by spectral control envelope," IPSJ SIG Technical Report, 2006-MUS-66-13, pp. 77-84, Aug. 2006.

[Non-Patent Document 7] M. Nakano, H. Kameoka, J. Le Roux, Y. Kitano, N. Ono, and S. Sagayama, "Convergence-guaranteed multiplicative algorithms for non-negative matrix factorization with beta-divergence," in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2010, pp. 283-288.

<Problem formulation>
Let $M$ be both the number of microphones and the number of sources. The STFTs of the observed and estimated signals are written as
$$\mathbf{x}(f,n)=\bigl[x_1(f,n),\ldots,x_M(f,n)\bigr]^{\mathsf T}\in\mathbb{C}^{M},$$
$$\mathbf{s}(f,n)=\bigl[s_1(f,n),\ldots,s_M(f,n)\bigr]^{\mathsf T}\in\mathbb{C}^{M},$$
respectively. Here $f$ and $n$ are the frequency-bin and time-frame indices, and $i$ and $j$ are the microphone and source indices. $(\cdot)^{\mathsf T}$ denotes the transpose of a matrix or vector.

Many BSS methods for the determined case assume the instantaneous separation system in the time-frequency domain
$$\mathbf{s}(f,n)=\mathbf{W}^{\mathsf H}(f)\,\mathbf{x}(f,n),$$
where $\mathbf{W}^{\mathsf H}(f)$ is called the separation matrix and $(\cdot)^{\mathsf H}$ denotes the conjugate transpose. This assumption does not hold under strong reverberation, that is, when the impulse responses are longer than the STFT frame length.

The embodiment of the present invention instead uses a separation system with a multichannel finite impulse response in the time-frequency domain:
$$\mathbf{s}(f,n)=\sum_{n'=0}^{N'}\mathbf{W}^{\mathsf H}(f,n')\,\mathbf{x}(f,n-n'),\tag{4}$$
where $\mathbf{W}^{\mathsf H}(f,n')$, $0\le n'\le N'$, are $M\times M$ coefficient matrices. Assuming that $\mathbf{W}^{\mathsf H}(f,0)$ is invertible, Eq. (4) can be rewritten as
$$\mathbf{y}(f,n)=\mathbf{x}(f,n)-\sum_{n'=1}^{N'}\mathbf{G}(f,n')\,\mathbf{x}(f,n-n'),\tag{5}$$
$$\mathbf{s}(f,n)=\mathbf{W}^{\mathsf H}(f)\,\mathbf{y}(f,n),\tag{6}$$
where
$$\mathbf{W}^{\mathsf H}(f)=\mathbf{W}^{\mathsf H}(f,0),\qquad \mathbf{G}(f,n')=-\bigl(\mathbf{W}^{\mathsf H}(f,0)\bigr)^{-1}\mathbf{W}^{\mathsf H}(f,n').$$
Substituting Eq. (5) into Eq. (6) recovers Eq. (4), because $\mathbf{W}^{\mathsf H}(f,0)\,\mathbf{G}(f,n')=-\mathbf{W}^{\mathsf H}(f,n')$. Eq. (5) is thus a process that dereverberates the mixture $\mathbf{x}(f,n)$. Eq. (6) is a process that separates the dereverberated signal $\mathbf{y}(f,n)$.
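The equivalence between the FIR separation system (4) and the two-stage form of Eqs. (5) and (6) can be checked numerically. Below is a minimal NumPy sketch for a single frequency bin. It assumes random complex coefficients, so that $\mathbf{W}^{\mathsf H}(f,0)$ is invertible with probability one, and treats frames before the first one as zero. The helper names `crandn` and `delayed` are illustrative and do not come from the original.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, Np = 2, 50, 3          # microphones (= sources), frames, filter order N'

def crandn(*shape):
    """Complex standard normal samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def delayed(x, d):
    """Return the frames x(f, n - d), with zeros where n - d < 0."""
    out = np.zeros_like(x)
    out[:, d:] = x[:, :x.shape[1] - d]
    return out

x = crandn(M, N)             # x[:, n] = x(f, n) for one frequency bin f
WH = crandn(Np + 1, M, M)    # WH[d] = W^H(f, d), d = 0, ..., N'

# Eq. (4): FIR separation in the time-frequency domain
s_fir = sum(WH[d] @ delayed(x, d) for d in range(Np + 1))

# Eqs. (5) and (6): dereverberate with G(f, d) = -(W^H(f, 0))^{-1} W^H(f, d), then separate
WH0_inv = np.linalg.inv(WH[0])
y = x - sum((-WH0_inv @ WH[d]) @ delayed(x, d) for d in range(1, Np + 1))
s_two_stage = WH[0] @ y

print(np.allclose(s_fir, s_two_stage))
```

The comparison uses `np.allclose`, since the two routes differ only by floating-point rounding.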

Let the random variable $s_j(f,n)$ follow
$$s_j(f,n)\sim\mathcal{N}_{\mathbb C}\bigl(s_j(f,n)\,\big|\,0,\,v_j(f,n)\bigr).$$
Assume that $s_j(f,n)$ and $s_{j'}(f',n')$ are statistically independent whenever $(f,n,j)\neq(f',n',j')$. The complex normal distribution is
$$\mathcal{N}_{\mathbb C}(s\,|\,0,v)=\frac{1}{\pi v}\exp\!\left(-\frac{|s|^2}{v}\right).$$
The power spectral density $v_j(f,n)$ is further modeled as
$$v_j(f,n)=\sum_{k=1}^{K}h_{j,k}(f)\,u_{j,k}(n),\tag{9}$$
where:
- $h_{j,k}(f)\ge0$ is the $(f,k)$ element of the basis matrix of the $j$-th source;
- $u_{j,k}(n)\ge0$ is the $(k,n)$ element of the activation matrix of the $j$-th source;
- $K$ is the number of bases per source.

Multichannel source separation using the power spectrogram model (9), or a similar model, is called MNMF. Taking the negative log-likelihood with respect to $y_i(f,n)$ yields, up to a constant, the objective function
$$\mathcal{C}=\sum_{f,n}\sum_{j}\left[\log v_j(f,n)+\frac{|s_j(f,n)|^2}{v_j(f,n)}\right]-2N\sum_{f}\log\bigl|\det\mathbf{W}^{\mathsf H}(f)\bigr|,\tag{10}$$
where
$$s_j(f,n)=\mathbf{w}_j^{\mathsf H}(f)\,\mathbf{y}(f,n).$$
Here $\mathbf{w}_j(f)$ is the $j$-th column of $\mathbf{W}(f)$, $\mathbf{y}(f,n)$ is given by Eq. (5), and $N$ is the number of time frames.
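A direct evaluation of the objective function (10) is useful for monitoring the iterations derived below. The sketch uses an array layout chosen here for illustration:
- `x` has shape (F, N, M);
- `W` has shape (F, M, M), with columns $\mathbf{w}_j(f)$;
- `H` has shape (M, F, K) and `U` has shape (M, K, N);
- `G` has shape (F, N', M, M), with `G[f, d-1]` holding $\mathbf{G}(f,d)$.

It drops the additive constant and treats frames before the first one as zero. The function names `dereverberate` and `objective` are hypothetical.

```python
import numpy as np

def dereverberate(x, G):
    """Eq. (5): y(f,n) = x(f,n) - sum_{n'=1}^{N'} G(f,n') x(f,n-n').

    x: (F, N, M) STFT of the mixture; G: (F, N', M, M) with G[f, d-1] = G(f, d).
    Frames before n = 0 are treated as zero (assumes N' < N)."""
    y = x.copy()
    for d in range(1, G.shape[1] + 1):
        y[:, d:, :] -= np.einsum('fab,fnb->fna', G[:, d - 1], x[:, :-d, :])
    return y

def objective(x, W, G, H, U):
    """Eq. (10) without the additive constant F*N*M*log(pi)."""
    y = dereverberate(x, G)
    s = np.einsum('fij,fni->fnj', W.conj(), y)   # s_j(f,n) = w_j^H(f) y(f,n)
    v = np.einsum('jfk,jkn->fnj', H, U)          # v_j(f,n), Eq. (9)
    logabsdet = np.linalg.slogdet(W)[1]          # log|det W(f)| = log|det W^H(f)|
    N = x.shape[1]
    return np.sum(np.log(v) + np.abs(s) ** 2 / v) - 2 * N * np.sum(logabsdet)

rng = np.random.default_rng(0)
F, N, M, K, Np = 4, 20, 2, 3, 2
x = rng.standard_normal((F, N, M)) + 1j * rng.standard_normal((F, N, M))
W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))   # W(f) = I
G = np.zeros((F, Np, M, M), dtype=complex)          # no dereverberation
H = rng.uniform(0.5, 1.5, (M, F, K))
U = rng.uniform(0.5, 1.5, (M, K, N))
cost = objective(x, W, G, H, U)
```

With $\mathbf{W}(f)=\mathbf{I}$ and $\mathbf{G}=\mathbf{0}$, the value reduces to $\sum[\log v_j+|x_j|^2/v_j]$, which makes a convenient sanity check.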

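<Derivation of the parameter estimation algorithm>
Update rules that decrease the objective function (10) are obtained by minimizing it with respect to each set of variables in turn:
$$\mathbf{H},\mathbf{U}\leftarrow\operatorname*{argmin}_{\mathbf{H},\mathbf{U}}\mathcal{C},\qquad \mathbf{W}\leftarrow\operatorname*{argmin}_{\mathbf{W}}\mathcal{C},\qquad \mathbf{G}\leftarrow\operatorname*{argmin}_{\mathbf{G}}\mathcal{C},$$
where $\mathbf{H}=\{h_{j,k}(f)\}$, $\mathbf{U}=\{u_{j,k}(n)\}$, $\mathbf{W}=\{\mathbf{W}(f)\}$, and $\mathbf{G}=\{\mathbf{G}(f,n')\}$. The following subsections derive the update rule for each variable.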

<Update of $\mathbf{H}$ and $\mathbf{U}$>

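The update rules for $\mathbf{H}$ and $\mathbf{U}$ are derived using the auxiliary function method. Extracting only the terms of Eq. (10) that involve $\mathbf{H}$ and $\mathbf{U}$ gives
$$\mathcal{C}_1=\sum_{f,n}\sum_{j}\left[\frac{|s_j(f,n)|^2}{\sum_{k}h_{j,k}(f)u_{j,k}(n)}+\log\sum_{k}h_{j,k}(f)u_{j,k}(n)\right].$$
To minimize this function, we use the following auxiliary (upper-bound) function of $\mathcal{C}_1$:
$$\mathcal{C}_1^{+}=\sum_{f,n}\sum_{j}\left[\sum_{k}\frac{\lambda_{j,k}(f,n)^2\,|s_j(f,n)|^2}{h_{j,k}(f)u_{j,k}(n)}+\log\beta_j(f,n)+\frac{\sum_{k}h_{j,k}(f)u_{j,k}(n)-\beta_j(f,n)}{\beta_j(f,n)}\right],\tag{15}$$
The auxiliary variables are $\lambda_{j,k}(f,n)\ge0$, with $\sum_k\lambda_{j,k}(f,n)=1$, and $\beta_j(f,n)>0$. That $\mathcal{C}_1^{+}$ is an auxiliary function can be confirmed from the inequalities
$$\frac{1}{\sum_{k}h_{j,k}(f)u_{j,k}(n)}\le\sum_{k}\frac{\lambda_{j,k}(f,n)^2}{h_{j,k}(f)u_{j,k}(n)},\tag{16}$$
$$\log\sum_{k}h_{j,k}(f)u_{j,k}(n)\le\log\beta_j(f,n)+\frac{\sum_{k}h_{j,k}(f)u_{j,k}(n)-\beta_j(f,n)}{\beta_j(f,n)}.\tag{17}$$
Eq. (16) follows from Jensen's inequality for the convex function $1/z$. Eq. (17) follows from the tangent-line bound of the concave function $\log z$. Equality in Eqs. (16) and (17) holds when
$$\lambda_{j,k}(f,n)=\frac{h_{j,k}(f)u_{j,k}(n)}{\sum_{k'}h_{j,k'}(f)u_{j,k'}(n)},\tag{18}$$
$$\beta_j(f,n)=\sum_{k}h_{j,k}(f)u_{j,k}(n),\tag{19}$$
respectively. The objective function $\mathcal{C}_1$ is minimized indirectly by repeating the following two updates.

1. Minimize $\mathcal{C}_1^{+}$ with respect to $\lambda_{j,k}(f,n)$ and $\beta_j(f,n)$ using Eqs. (18) and (19).
2. Minimize $\mathcal{C}_1^{+}$ with respect to $\mathbf{H}$ and $\mathbf{U}$.

The second update sets the partial derivative with respect to each element of $\mathbf{H}$ and $\mathbf{U}$ to zero. Substituting Eqs. (18) and (19) then gives
$$h_{j,k}(f)\leftarrow h_{j,k}(f)\sqrt{\frac{\sum_{n}|s_j(f,n)|^2\,u_{j,k}(n)\,v_j(f,n)^{-2}}{\sum_{n}u_{j,k}(n)\,v_j(f,n)^{-1}}},\tag{20}$$
$$u_{j,k}(n)\leftarrow u_{j,k}(n)\sqrt{\frac{\sum_{f}|s_j(f,n)|^2\,h_{j,k}(f)\,v_j(f,n)^{-2}}{\sum_{f}h_{j,k}(f)\,v_j(f,n)^{-1}}},\tag{21}$$
where $v_j(f,n)=\sum_{k}h_{j,k}(f)u_{j,k}(n)$ is evaluated at the current parameter values.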
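The multiplicative updates (20) and (21) can be sketched in NumPy as follows. This is a minimal illustration under three assumptions:
- the powers $|s_j(f,n)|^2$ are held fixed, as they are while $\mathbf{W}$ and $\mathbf{G}$ are fixed;
- $v_j(f,n)$ is recomputed between the two half-steps;
- the array layout and the names `c1` and `update_hu` are choices made here.

Each half-step minimizes the auxiliary function (15) at its tight point, so $\mathcal{C}_1$ should not increase. The usage lines record the values to make this visible.

```python
import numpy as np

def c1(P, H, U):
    """C_1 = sum_{f,n,j} [|s_j|^2 / v_j + log v_j]; P[f, n, j] = |s_j(f,n)|^2."""
    v = np.einsum('jfk,jkn->fnj', H, U)
    return np.sum(P / v + np.log(v))

def update_hu(P, H, U):
    """One pass of Eqs. (20) and (21). H: (J, F, K), U: (J, K, N)."""
    Pj = P.transpose(2, 0, 1)                        # (J, F, N)
    v = np.einsum('jfk,jkn->jfn', H, U)
    num = np.einsum('jfn,jkn->jfk', Pj / v**2, U)    # sum_n |s|^2 u v^-2
    den = np.einsum('jfn,jkn->jfk', 1.0 / v, U)      # sum_n u v^-1
    H = H * np.sqrt(num / den)                       # Eq. (20)
    v = np.einsum('jfk,jkn->jfn', H, U)              # refresh v with the new H
    num = np.einsum('jfn,jfk->jkn', Pj / v**2, H)    # sum_f |s|^2 h v^-2
    den = np.einsum('jfn,jfk->jkn', 1.0 / v, H)      # sum_f h v^-1
    U = U * np.sqrt(num / den)                       # Eq. (21)
    return H, U

rng = np.random.default_rng(1)
J, F, N, K = 2, 8, 30, 3
P = rng.exponential(1.0, (F, N, J))                  # stand-in for |s_j(f,n)|^2
H = rng.uniform(0.1, 1.0, (J, F, K))
U = rng.uniform(0.1, 1.0, (J, K, N))
costs = [c1(P, H, U)]
for _ in range(20):
    H, U = update_hu(P, H, U)
    costs.append(c1(P, H, U))
```

The square root is the exponent associated with the convergence-guaranteed updates for the Itakura-Saito case ($\beta=0$) in Non-Patent Document 7. It also keeps every element strictly positive.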

<Update of $\mathbf{W}$>

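Extracting only the terms of Eq. (10) that involve $\mathbf{W}$ gives
$$\mathcal{C}_2=N\sum_{f}\left[\sum_{j}\mathbf{w}_j^{\mathsf H}(f)\mathbf{V}_j(f)\mathbf{w}_j(f)-2\log\bigl|\det\mathbf{W}^{\mathsf H}(f)\bigr|\right],$$
where $\mathbf{w}_j(f)$ is the $j$-th column vector of $\mathbf{W}(f)$ and
$$\mathbf{V}_j(f)=\frac{1}{N}\sum_{n}\frac{\mathbf{y}(f,n)\,\mathbf{y}^{\mathsf H}(f,n)}{v_j(f,n)}.$$
When $\mathbf{G}$ is fixed, Eq. (10) is an instantaneous separation problem for the dereverberated mixture $\mathbf{y}(f,n)$ (see Eqs. (5) and (6)). The separation matrix can therefore be updated with techniques from conventional determined BSS, such as the natural gradient method, FastICA (FICA), or iterative projection (IP). Here the update is derived using IP.

IP is a block coordinate descent algorithm that updates $\mathbf{W}(f)$ one column vector at a time.

Take the partial derivative of $\mathcal{C}_2$ with respect to $\mathbf{w}_j^{*}(f)$, the complex conjugate of $\mathbf{w}_j(f)$, and set it to zero:
$$\mathbf{V}_j(f)\mathbf{w}_j(f)-\frac{\partial}{\partial\mathbf{w}_j^{*}(f)}\log\bigl|\det\mathbf{W}^{\mathsf H}(f)\bigr|^2=\mathbf{0}.\tag{23}$$
The derivative of the determinant is
$$\frac{\partial}{\partial\mathbf{w}_j^{*}(f)}\log\bigl|\det\mathbf{W}^{\mathsf H}(f)\bigr|^2=\bigl(\mathbf{W}^{\mathsf H}(f)\bigr)^{-1}\mathbf{e}_j.$$
With it, Eq. (23) becomes $\mathbf{W}^{\mathsf H}(f)\mathbf{V}_j(f)\mathbf{w}_j(f)=\mathbf{e}_j$. Written component by component, this reads
$$\mathbf{w}_k^{\mathsf H}(f)\mathbf{V}_j(f)\mathbf{w}_j(f)=0\quad(k\neq j),\tag{24}$$
$$\mathbf{w}_j^{\mathsf H}(f)\mathbf{V}_j(f)\mathbf{w}_j(f)=1.\tag{25}$$
The solution of Eqs. (24) and (25) is obtained by performing
$$\mathbf{w}_j(f)\leftarrow\bigl(\mathbf{W}^{\mathsf H}(f)\mathbf{V}_j(f)\bigr)^{-1}\mathbf{e}_j,\tag{26}$$
$$\mathbf{w}_j(f)\leftarrow\frac{\mathbf{w}_j(f)}{\sqrt{\mathbf{w}_j^{\mathsf H}(f)\mathbf{V}_j(f)\mathbf{w}_j(f)}},\tag{27}$$
for all $f$ and $j$. Here $\mathbf{e}_j$ is the $j$-th column vector of the $M\times M$ identity matrix $\mathbf{I}$.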

<Update of $\mathbf{G}$>

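Extracting only the terms of Eq. (10) that involve $\mathbf{G}$ gives
$$\mathcal{C}_3=\sum_{f,n}\sum_{j}\frac{\bigl|\mathbf{w}_j^{\mathsf H}(f)\bigl(\mathbf{x}(f,n)-\bar{\mathbf{G}}(f)\,\bar{\mathbf{x}}(f,n)\bigr)\bigr|^2}{v_j(f,n)},\tag{28}$$
where
$$\bar{\mathbf{G}}(f)=\bigl[\mathbf{G}(f,1),\ldots,\mathbf{G}(f,N')\bigr]\in\mathbb{C}^{M\times MN'},\qquad \bar{\mathbf{x}}(f,n)=\bigl[\mathbf{x}^{\mathsf T}(f,n-1),\ldots,\mathbf{x}^{\mathsf T}(f,n-N')\bigr]^{\mathsf T}\in\mathbb{C}^{MN'},$$
and $\mathbf{G}(f,0)$ is taken to be the zero matrix.

Eq. (28) shows that, for each $f$, the elements of $\bar{\mathbf{G}}(f)$ are coupled with one another. To update $\bar{\mathbf{G}}(f)$ independently for each $f$, it is vectorized as follows:
$$\mathbf{g}(f)=\operatorname{vec}\bigl(\bar{\mathbf{G}}(f)\bigr)=\bigl[\bar{\mathbf{g}}_1^{\mathsf T}(f),\ldots,\bar{\mathbf{g}}_{MN'}^{\mathsf T}(f)\bigr]^{\mathsf T}\in\mathbb{C}^{M^2N'},$$
where $\bar{\mathbf{g}}_m(f)$ is the $m$-th column vector of $\bar{\mathbf{G}}(f)$. Using $\mathbf{g}(f)$, the term $\bar{\mathbf{G}}(f)\,\bar{\mathbf{x}}(f,n)$ in Eq. (28) can be rewritten as
$$\bar{\mathbf{G}}(f)\,\bar{\mathbf{x}}(f,n)=\bigl(\bar{\mathbf{x}}^{\mathsf T}(f,n)\otimes\mathbf{I}\bigr)\,\mathbf{g}(f),\tag{31}$$
where $\otimes$ is the Kronecker product. Substituting Eq. (31) into Eq. (28), the objective function becomes
$$\mathcal{C}_3=\sum_{f,n}\sum_{j}\frac{\bigl|\mathbf{w}_j^{\mathsf H}(f)\bigl(\mathbf{x}(f,n)-\mathbf{X}(f,n)\,\mathbf{g}(f)\bigr)\bigr|^2}{v_j(f,n)},\qquad\mathbf{X}(f,n)=\bar{\mathbf{x}}^{\mathsf T}(f,n)\otimes\mathbf{I}.\tag{33}$$
It therefore suffices to find the update that minimizes Eq. (33) with respect to $\mathbf{g}(f)$. Eq. (33) is quadratic in $\mathbf{g}(f)$, so the update is obtained by setting its partial derivative to zero:
$$\mathbf{g}(f)\leftarrow\left(\sum_{n}\sum_{j}\frac{\mathbf{X}^{\mathsf H}(f,n)\,\mathbf{w}_j(f)\,\mathbf{w}_j^{\mathsf H}(f)\,\mathbf{X}(f,n)}{v_j(f,n)}\right)^{-1}\sum_{n}\sum_{j}\frac{\mathbf{X}^{\mathsf H}(f,n)\,\mathbf{w}_j(f)\,\mathbf{w}_j^{\mathsf H}(f)\,\mathbf{x}(f,n)}{v_j(f,n)}.\tag{34}$$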

<Overall update procedure>
Summarizing the above, the update procedure of the proposed method is as follows.

Step 1) Set initial values of the basis spectra $h_{j,k}(f)$, the activations $u_{j,k}(n)$, the separation matrices $\mathbf{W}(f)$, and the dereverberation filters $\mathbf{G}(f,n')$.

Step 2) Update $h_{j,k}(f)$ and $u_{j,k}(n)$ for each frequency $f$, each time $n$, and each constituent sound $j$, according to Eqs. (20) and (21).

Step 3) Update the columns $\mathbf{w}_j(f)$ of the separation matrix for each frequency $f$ and each constituent sound $j$, according to Eqs. (26) and (27).

Step 4) Update the dereverberation filter $\mathbf{G}(f,n')$, through its vectorized form $\mathbf{g}(f)$, for each frequency $f$ according to Eq. (34).

Repeat Steps 2) to 4) until convergence.
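Steps 1) to 4) can be combined into a compact NumPy sketch. It illustrates the procedure under stated assumptions and is not the original implementation. The assumptions are:
- random positive initial values for the NMF parameters, with $\mathbf{W}(f)=\mathbf{I}$ and $\mathbf{G}=\mathbf{0}$ as the other initial values;
- a fixed number of iterations in place of a convergence test;
- zero-padded past frames;
- hypothetical names and array layout.

Every step is either a majorization-minimization step or an exact block minimization of the same objective (10). The recorded objective values should therefore be non-increasing.

```python
import numpy as np

def dereverberate(x, G):
    # Eq. (5); x: (F, N, M), G: (F, N', M, M) with G[f, d-1] = G(f, d)
    y = x.copy()
    for d in range(1, G.shape[1] + 1):
        y[:, d:] -= np.einsum('fab,fnb->fna', G[:, d - 1], x[:, :-d])
    return y

def objective(x, W, G, H, U):
    # Eq. (10) up to an additive constant
    s = np.einsum('fij,fni->fnj', W.conj(), dereverberate(x, G))
    v = np.einsum('jfk,jkn->fnj', H, U)
    return np.sum(np.log(v) + np.abs(s) ** 2 / v) - 2 * x.shape[1] * np.sum(np.linalg.slogdet(W)[1])

def separate(x, K=2, Np=2, n_iter=20, seed=0):
    F, N, M = x.shape
    rng = np.random.default_rng(seed)
    # Step 1: initial values (choices made for this sketch)
    H = rng.uniform(0.1, 1.0, (M, F, K))
    U = rng.uniform(0.1, 1.0, (M, K, N))
    W = np.tile(np.eye(M, dtype=complex), (F, 1, 1))
    G = np.zeros((F, Np, M, M), dtype=complex)
    # xbar(f,n) = [x(f,n-1); ...; x(f,n-N')] and X(f,n) = xbar^T kron I: (F, N, M, M*M*N')
    xb = np.zeros((F, N, M * Np), dtype=complex)
    for d in range(1, Np + 1):
        xb[:, d:, (d - 1) * M:d * M] = x[:, :-d]
    X = np.einsum('fna,bc->fnbac', xb, np.eye(M)).reshape(F, N, M, -1)
    costs = [objective(x, W, G, H, U)]
    for _ in range(n_iter):
        y = dereverberate(x, G)
        # Step 2: Eqs. (20) and (21), with P = |s_j(f,n)|^2 held fixed
        P = np.abs(np.einsum('fij,fni->jfn', W.conj(), y)) ** 2
        v = np.einsum('jfk,jkn->jfn', H, U)
        H *= np.sqrt(np.einsum('jfn,jkn->jfk', P / v**2, U) / np.einsum('jfn,jkn->jfk', 1 / v, U))
        v = np.einsum('jfk,jkn->jfn', H, U)
        U *= np.sqrt(np.einsum('jfn,jfk->jkn', P / v**2, H) / np.einsum('jfn,jfk->jkn', 1 / v, H))
        v = np.einsum('jfk,jkn->fnj', H, U)
        # Step 3: Eqs. (26) and (27)
        for f in range(F):
            for j in range(M):
                V = (y[f].T / v[f, :, j]) @ y[f].conj() / N
                w = np.linalg.solve(W[f].conj().T @ V, np.eye(M)[:, j])
                W[f][:, j] = w / np.sqrt(np.real(w.conj() @ V @ w))
        # Step 4: Eq. (34)
        for f in range(F):
            A = np.einsum('ij,nj,kj->nik', W[f], 1 / v[f], W[f].conj())   # W diag(1/v) W^H
            R = np.einsum('nba,nbc,ncd->ad', X[f].conj(), A, X[f])
            r = np.einsum('nba,nbc,nc->a', X[f].conj(), A, x[f])
            g = np.linalg.solve(R, r)
            G[f] = g.reshape(M * Np, M).T.reshape(M, Np, M).transpose(1, 0, 2)
        costs.append(objective(x, W, G, H, U))
    return W, G, H, U, costs

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 80, 2)) + 1j * rng.standard_normal((6, 80, 2))
W, G, H, U, costs = separate(x, n_iter=15)
```

Separated signals would then follow from Eqs. (5) and (6) as `np.einsum('fij,fni->fnj', W.conj(), dereverberate(x, G))`.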

<Configuration of the signal analysis device according to the embodiment of the present invention>
Next, the configuration of the signal analysis device according to the embodiment of the present invention is described. As shown in FIG. 1, the signal analysis device 100 can be implemented by a computer. The computer includes a CPU, a RAM, and a ROM that stores various data and a program for executing the parameter estimation processing routine described later. Functionally, the signal analysis device 100 includes an input unit 10, a calculation unit 20, and an output unit 90, as shown in FIG. 1.

The input unit 10 receives time-series data of a mixed signal in which a plurality of constituent sounds are mixed (hereinafter, the observed signal).

The calculation unit 20 includes a time-frequency expansion unit 24 and a parameter estimation unit 36.

The time-frequency expansion unit 24 calculates, from the time-series data of the observed signal, an amplitude or power spectrogram representing the spectrum at each time. In this embodiment, a time-frequency expansion such as the short-time Fourier transform or the wavelet transform is performed.
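The time-frequency expansion can be illustrated with a NumPy-only STFT of the multichannel observed signal. The frame length, hop size, and Hann window are assumptions made here, not values from the source. The function name `stft_multichannel` is hypothetical. The complex STFT gives the values $x_i(f,n)$ used in the problem formulation, and the power spectrogram is their squared magnitude.

```python
import numpy as np

def stft_multichannel(signals, frame_len=512, hop=128):
    """Complex STFT of an (M, T) real array; returns x with shape (F, N, M), F = frame_len // 2 + 1."""
    n_frames = 1 + (signals.shape[1] - frame_len) // hop
    starts = hop * np.arange(n_frames)
    idx = starts[:, None] + np.arange(frame_len)[None, :]    # (N, frame_len) sample indices
    frames = signals[:, idx] * np.hanning(frame_len)          # (M, N, frame_len), windowed
    return np.fft.rfft(frames, axis=-1).transpose(2, 1, 0)    # (F, N, M)

t = np.arange(4096)
signals = np.stack([np.cos(2 * np.pi * 32 * t / 512), np.zeros(t.size)])  # mic 1: tone at bin 32
x = stft_multichannel(signals)
power = np.abs(x) ** 2        # power spectrogram |x_i(f, n)|^2
```

The test tone is placed exactly on frequency bin 32, so the power in each frame should peak there.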

The parameter estimation unit 36 works from the amplitude or power spectrogram representing the spectrum of the observed signal at each time. It estimates the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter. These are chosen to decrease the objective function, which is expressed using the basis spectra, the activation parameters, the separation matrix for separating a time-frequency-domain mixture into the constituent sounds, and the signals obtained by dereverberating the observed signal with the dereverberation filter and separating the result into the constituent sounds.

Specifically, the parameter estimation unit 36 includes an initial value setting unit 40, a parameter update unit 42, a separation matrix update unit 44, a dereverberation filter update unit 46, and a convergence determination unit 48.

The initial value setting unit 40 sets initial values for the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter.

The parameter update unit 42 uses two inputs: the amplitude or power spectrogram of the observed signal, and the basis spectra, activation parameters, separation matrix, and dereverberation filter from the previous update (or their initial values). From these, it updates the basis spectrum of each constituent sound and each basis and the activation parameter of each constituent sound and each basis at each time, so as to decrease the auxiliary function of Eq. (15).

Specifically, the basis spectra and activation parameters are updated element by element according to Eqs. (20) and (21), so as to decrease the auxiliary function of Eq. (15).

The separation matrix update unit 44 uses the amplitude or power spectrogram of the observed signal, the updated basis spectra and activation parameters, and the separation matrix and dereverberation filter from the previous update (or their initial values). From these, it updates the separation matrix according to Eqs. (26) and (27), so as to decrease the objective function of Eq. (10).

The dereverberation filter update unit 46 uses the amplitude or power spectrogram of the observed signal and the updated basis spectra, activation parameters, separation matrix, and dereverberation filter. From these, it updates the dereverberation filter according to Eq. (34), so as to decrease the objective function of Eq. (10).

The convergence determination unit 48 determines whether a convergence condition is satisfied. Until it is, the unit has the parameter update unit 42, the separation matrix update unit 44, and the dereverberation filter update unit 46 repeat their update processing.

One possible convergence condition is that the number of iterations has reached an upper limit. Another is that the difference between the current value of the objective function of Eq. (10) and its previous value is at most a predetermined threshold.
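The two convergence conditions above can be expressed as a small check. The helper `has_converged` and its default values for the iteration limit and the threshold are hypothetical. The sketch assumes the objective values of Eq. (10) are collected in a list, starting with the value at the initial parameters.

```python
def has_converged(costs, max_iter=100, threshold=1e-6):
    """Hypothetical helper: costs[0] is the initial value of Eq. (10), costs[t] the value after iteration t."""
    n_done = len(costs) - 1
    if n_done >= max_iter:                         # the iteration count reached its upper limit
        return True
    if n_done >= 1 and abs(costs[-2] - costs[-1]) <= threshold:
        return True                                # change from the previous value is small enough
    return False
```

An absolute threshold follows the wording of the text. A threshold relative to the magnitude of the objective would be a common variant when its scale is not known in advance.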

The output unit 90 outputs the results obtained by the parameter estimation unit 36: the basis spectrum of each constituent sound and each basis, and the activation parameter of each constituent sound and each basis at each time.

<Operation of the signal analysis device according to the embodiment of the present invention>
Next, the operation of the signal analysis device 100 according to the embodiment of the present invention is described.

When the input unit 10 receives the time-series data of an observed signal in which constituent sounds are mixed, the signal analysis device 100 executes the parameter estimation processing routine shown in FIG. 2.

First, in step S120, an amplitude or power spectrogram representing the spectrum at each time is calculated from the time-series data of the observed signal.

In step S122, initial values are set for the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the dereverberation filter.

In step S124, the parameter update unit 42 updates the basis spectra and activation parameters element by element according to Eqs. (20) and (21), so as to decrease the auxiliary function of Eq. (15). It uses the amplitude or power spectrogram calculated in step S120, together with the basis spectra, activation parameters, separation matrix, and dereverberation filter from the previous update (or their initial values).

In step S126, the separation matrix update unit 44 updates the separation matrix according to Eqs. (26) and (27), so as to decrease the objective function of Eq. (10). It uses the amplitude or power spectrogram calculated in step S120, together with the basis spectra, activation parameters, separation matrix, and dereverberation filter from the previous update (or their initial values).

In step S128, the reverberation removal filter update unit 46 updates the reverberation removal filter according to equation (34) so as to reduce the objective function shown in equation (10). The update is based on the amplitude spectrogram or power spectrogram of the observation signal calculated in step S120 and on the basis spectra, activation parameters, separation matrix, and reverberation removal filter as last updated or as initialized.
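The form of the reverberation removal filter and equation (34) are not given in this section. The sketch below assumes a multichannel linear-prediction filter, z(n) = x(n) - B x̄(n), where x̄(n) stacks the past frames x(n-D), ..., x(n-D-L+1) with a prediction delay D ≥ 1; the names `B`, `D`, `L`, `stack_past`, `update_dereverb`, and `dereverb_cost` are hypothetical. With W and λ_j(f, n) held fixed, the assumed objective becomes a weighted least-squares problem in B (the log-determinant term does not involve B), and the code solves its normal equations for one frequency bin.

```python
import numpy as np

def stack_past(X, D, L):
    """Stack delayed frames xbar(n) = [x(n-D); ...; x(n-D-L+1)], zero-padded; shape (M*L, N)."""
    M, N = X.shape
    Xbar = np.zeros((M * L, N), dtype=complex)
    for l in range(L):
        d = D + l
        if d < N:
            Xbar[l * M:(l + 1) * M, d:] = X[:, :N - d]
    return Xbar

def update_dereverb(X, W, R, D=1, L=3):
    """Minimize sum_n (x_n - B xbar_n)^H Q_n (x_n - B xbar_n), Q_n = W^H diag(1/lambda(n)) W."""
    M, N = X.shape
    Xbar = stack_past(X, D, L)
    A = np.zeros((M * M * L, M * M * L), dtype=complex)
    b = np.zeros((M, M * L), dtype=complex)
    for n in range(N):
        Q = W.conj().T @ np.diag(1.0 / R[:, n]) @ W
        xb = Xbar[:, n:n + 1]
        # Normal equations sum_n Q_n B P_n = sum_n Q_n x_n xbar_n^H, vectorized with
        # the column-major identity vec(Q B P) = (P^T kron Q) vec(B)
        A += np.kron((xb @ xb.conj().T).T, Q)
        b += Q @ X[:, n:n + 1] @ xb.conj().T
    vecB = np.linalg.solve(A, b.reshape(-1, order="F"))
    return vecB.reshape(M, M * L, order="F")

def dereverb_cost(X, B, W, R, D=1, L=3):
    """Data term of the assumed objective after dereverberation and separation."""
    Z = X - B @ stack_past(X, D, L)
    return float(np.sum(np.abs(W @ Z) ** 2 / R))
```

Because the problem is a convex quadratic in B, the solved filter should not be beaten by the zero filter or by any small perturbation of it.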

Next, in step S130, it is determined whether a convergence condition is satisfied. If it is, the process proceeds to step S132; if not, the process returns to step S124 and steps S124 to S128 are repeated.
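Step S130 does not specify the convergence condition. A common, hypothetical choice is to stop when the relative decrease of the objective falls below a tolerance or when a maximum number of iterations is reached. The loop below is a minimal sketch that takes the update steps as callables; since the order of the parameter updates is arbitrary, the callables in `steps` may be listed in any order. The name `run_until_converged` and the defaults `tol` and `max_iter` are assumptions.

```python
def run_until_converged(steps, objective, tol=1e-6, max_iter=200):
    """Repeat the update steps until a (hypothetical) convergence test passes.

    steps     : callables that each perform one update (e.g. S124, S126, S128)
    objective : callable returning the current objective value
    Returns (number of iterations run, last objective value).
    """
    prev = objective()
    for it in range(1, max_iter + 1):
        for step in steps:
            step()
        cur = objective()
        # S130: stop when the relative change of the objective is below tol
        if abs(prev - cur) <= tol * max(abs(prev), 1.0):
            return it, cur
        prev = cur
    return max_iter, prev
```

As a toy usage, a single step that halves a scalar x with objective x² stops after a few iterations, well before `max_iter`.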

In step S132, the basis spectrum of each constituent sound and each basis and the activation parameter of each constituent sound and each basis at each time, as finally updated in step S124, are output from the output unit 90, and the parameter estimation processing routine ends.

<Experimental example>
To confirm the effectiveness of the method of this embodiment, an experiment was conducted using the voices of male and female speakers from the ATR speech database. With two sound sources and four microphones, mixed signals under a highly reverberant condition (0.6 s) were generated by convolving the speech with impulse responses. The conventional DMNMF was used as the baseline for comparison. FIG. 3 shows the results, which confirm that the proposed method achieves higher separation performance than the other methods.
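The experiment builds reverberant mixtures by convolving source signals with impulse responses. The actual speech data and impulse responses are not given here, so the following sketch only illustrates the mixing model x_m(t) = Σ_j (a_{m,j} ∗ s_j)(t), using synthetic exponentially decaying noise impulse responses as a hypothetical stand-in. It does not reproduce the experiment or its results; `convolutive_mixture` and `synthetic_rir` are hypothetical names.

```python
import numpy as np

def convolutive_mixture(sources, rirs):
    """x_m(t) = sum_j (a_{m,j} * s_j)(t), truncated to the source length.

    sources : (J, T) dry source signals
    rirs    : (M, J, Lh) impulse responses from source j to microphone m
    """
    J, T = sources.shape
    M = rirs.shape[0]
    x = np.zeros((M, T))
    for m in range(M):
        for j in range(J):
            x[m] += np.convolve(sources[j], rirs[m, j])[:T]
    return x

def synthetic_rir(length, rt60, fs, rng):
    """Hypothetical noise impulse response whose energy decays by 60 dB after rt60 seconds."""
    t = np.arange(length) / fs
    # amplitude factor 1e-3 at t = rt60 corresponds to -60 dB in energy
    return rng.standard_normal(length) * 10.0 ** (-3.0 * t / rt60)
```

For example, two sources and four microphones with `rt60=0.6` mirror the setup described above, though any real evaluation would use measured or simulated room impulse responses.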

As described above, the signal analysis device according to the embodiment of the present invention estimates the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the reverberation removal filter so as to reduce an objective function. This objective function is expressed using the basis spectra of the constituent sounds, the activation parameters representing the volume of each constituent sound and each basis at each time, a separation matrix for separating a mixed sound, in which the constituent sounds are mixed in the time-frequency domain, into the constituent sounds, and the signals obtained by separating, into the constituent sounds, the observation signal dereverberated with the reverberation removal filter. As a result, each constituent sound can be accurately separated from a mixed signal of the constituent sounds, even under high reverberation.
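The objective function itself appears only as an image in the claims. A form consistent with the symbols defined there (a separation matrix per frequency, basis spectra h_{j,k}, activations u_{j,k}, separated components s_j(f, n), and N time frames) is the negative log-likelihood used in determined multichannel NMF and ILRMA: L = Σ_{f,n,j} ( |s_j(f, n)|² / λ_j(f, n) + log λ_j(f, n) ) - 2N Σ_f log|det W(f)|, with λ_j(f, n) = Σ_k h_{j,k}(f) u_{j,k}(n). The sketch below evaluates that form; it is an assumption, not a transcription of equation (10), and the symbol W(f) and the name `objective` are hypothetical.

```python
import numpy as np

def objective(W, S, H, U, eps=1e-12):
    """Assumed DMNMF/ILRMA-style negative log-likelihood (up to constants).

    W : (F, J, J) separation matrices;  S : (J, F, N) separated signals s_j(f, n)
    H : (J, F, K) basis spectra h_{j,k}(f);  U : (J, K, N) activations u_{j,k}(n)
    """
    N = S.shape[2]
    lam = np.einsum("jfk,jkn->jfn", H, U) + eps    # lambda_j(f, n) = sum_k h u
    fit = np.sum(np.abs(S) ** 2 / lam + np.log(lam))
    _, logabsdet = np.linalg.slogdet(W)             # log|det W(f)| for every f
    return float(fit - 2.0 * N * np.sum(logabsdet))
```

In the method described above, S would be computed as W(f) applied to the dereverberated observation, so that all four groups of parameters enter the same function.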

The present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the invention.

For example, the order in which the parameters are updated is arbitrary and is not limited to the order in the above embodiment.

Further, although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium, or provided via a network.

10 Input unit
20 Calculation unit
24 Time-frequency expansion unit
36 Parameter estimation unit
40 Initial value setting unit
42 Parameter update unit
44 Separation matrix update unit
46 Reverberation removal filter update unit
48 Convergence judgment unit
90 Output unit
100 Signal analysis device

Claims

A signal analysis device comprising:
a parameter estimation unit that receives, as input, an observation signal in which constituent sounds are mixed, and estimates the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the reverberation removal filter so as to reduce an objective function expressed using the basis spectrum of each constituent sound, the activation parameter representing the volume of each constituent sound and each basis at each time, a separation matrix for separating a mixed sound, in which the constituent sounds are mixed in the time-frequency domain, into the constituent sounds, and a signal obtained by separating, into the constituent sounds, the observation signal dereverberated with a reverberation removal filter,
wherein the parameter estimation unit includes:
a parameter update unit that updates the basis spectrum of each constituent sound and each basis and the activation parameter of each constituent sound and each basis at each time so as to reduce an auxiliary function that is an upper bound function of the objective function;
a separation matrix update unit that updates the separation matrix so as to reduce the objective function;
a reverberation removal filter update unit that updates the reverberation removal filter so as to reduce the objective function; and
a convergence judgment unit that causes the update by the parameter update unit, the update by the separation matrix update unit, and the update by the reverberation removal filter update unit to be repeated until a predetermined convergence condition is satisfied,
and wherein the objective function is represented by the following equation.

where

and N represents the total number of time frames,

represents the separation matrix at frequency f, (·)^H denotes the Hermitian (complex conjugate) transpose, h_{j,k} represents the basis spectrum of basis k of constituent sound j, u_{j,k} represents the activation parameter of basis k of constituent sound j, and s_j(f, n) represents the frequency-f component, at time frame n, of the signal obtained by separating the dereverberated observation signal into constituent sound j.

A signal analysis method comprising:
estimating, by a parameter estimation unit that receives, as input, an observation signal in which constituent sounds are mixed, the basis spectrum of each constituent sound and each basis, the activation parameter of each constituent sound and each basis at each time, the separation matrix, and the reverberation removal filter so as to reduce an objective function expressed using the basis spectrum of each constituent sound, the activation parameter representing the volume of each constituent sound and each basis at each time, a separation matrix for separating a mixed sound, in which the constituent sounds are mixed in the time-frequency domain, into the constituent sounds, and a signal obtained by separating, into the constituent sounds, the observation signal dereverberated with a reverberation removal filter,
wherein the estimating by the parameter estimation unit includes:
updating, by a parameter update unit, the basis spectrum of each constituent sound and each basis and the activation parameter of each constituent sound and each basis at each time so as to reduce an auxiliary function that is an upper bound function of the objective function;
updating, by a separation matrix update unit, the separation matrix so as to reduce the objective function;
updating, by a reverberation removal filter update unit, the reverberation removal filter so as to reduce the objective function; and
repeating, by a convergence judgment unit, the update by the parameter update unit, the update by the separation matrix update unit, and the update by the reverberation removal filter update unit until a predetermined convergence condition is satisfied,
and wherein the objective function is represented by the following equation.

where

and N represents the total number of time frames.

A program for causing a computer to function as each unit of the signal analysis device according to claim 1.