JP2011081293A

JP2011081293A - Signal separation device and signal separation method

Info

Publication number: JP2011081293A
Application number: JP2009234978A
Authority: JP
Inventors: Tomoya Takatani; 智哉高谷; Jani Even; ジャニエバン
Original assignee: Nara Institute of Science and Technology NUC; Toyota Motor Corp
Current assignee: Nara Institute of Science and Technology NUC; Toyota Motor Corp
Priority date: 2009-10-09
Filing date: 2009-10-09
Publication date: 2011-04-21
Also published as: WO2011042808A8; WO2011042808A1

Abstract

PROBLEM TO BE SOLVED: To provide a signal separation system that accurately recognizes a user's voice with a small calculation load, even when the system contains an internal noise source.

SOLUTION: The signal separation system includes an external microphone intended to collect the user's voice, and an internal sensor that detects only internal noise from the noise source inside the system. An independent component analysis section optimizes a separation filter matrix to separate the signals into a separated signal that outputs the internal noise and a group of separated signals that do not contain it. A permutation resolution section performs permutation resolution on the group of separated signals that do not contain the internal noise. In the permutation resolution section, the scale parameter of the Laplace distribution obtained when each separated signal is fitted with a Laplace distribution is computed, and the separated signal with the largest parameter is taken to be the user's voice.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a signal separation device and a signal separation method for extracting a specific signal when a plurality of signals are mixed in a space, and more particularly to a technique for resolving the permutation problem.

Independent component analysis (ICA) is a known technique that uses statistical independence to separate and restore original signals when a plurality of original signals are linearly mixed by unknown coefficients (Patent Document 1).

Let x(t) denote the observation signal obtained by observing a plurality of original signals s(t) with a plurality of microphones.

In ICA, the observation signal x(t) is converted by a short-time discrete Fourier transform into a time-frequency domain signal X(f, t), and S(f, t) is estimated by frequency-domain independent component analysis.
Here, S(f, t) and X(f, t) denote the short-time Fourier transforms of the original signal s(t) and the observation signal x(t), respectively.
To estimate S(f, t) in the time-frequency domain, first consider the following equation:
Y(f, t) = W(f) X(f, t)
In this equation, Y(f, t) is a column vector whose k-th element is the k-th output Y_k(f, t).
W(f) is an n × n matrix (the separation matrix) whose elements are w_ij(f).
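The per-bin separation step above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `separate_bin` and the toy matrix `A` are assumptions, and the separation matrix is obtained here by inverting a known mixing matrix, whereas ICA estimates W(f) without knowing the mixing.

```python
def separate_bin(W, X_frames):
    """Apply Y(f, t) = W(f) X(f, t) for one fixed frequency bin f.

    W        : n x n matrix as a list of rows of complex numbers
    X_frames : list over frames t of length-n vectors X(f, t)
    """
    return [
        [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
        for x in X_frames
    ]


# Toy data for one bin: two sources mixed by a known 2 x 2 matrix A.
A = [[1 + 0j, 0.5 + 0j], [0.25 + 0j, 1 + 0j]]
S_frames = [[1 + 1j, 2 + 0j], [0 - 1j, 1 + 1j]]
X_frames = separate_bin(A, S_frames)  # the same product also performs the mixing

# Illustration only: take W as the exact inverse of A.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
W = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
Y_frames = separate_bin(W, X_frames)
```

With this choice of W, `Y_frames` reproduces `S_frames` up to rounding error.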

Next, W(f) is determined such that, with the frequency bin f fixed and t varied, Y₁(f, t) to Y_n(f, t) are statistically independent (in practice, such that their independence is maximized).
Once statistically independent Y₁(f, t) to Y_n(f, t) have been obtained for all f, the time-domain separated signal y(t) is obtained by applying an inverse Fourier transform to them.

However, independent component analysis in the time-frequency domain performs the separation processing independently for each frequency bin and does not take the relationship between frequency bins into account.
As a result, even if the separation itself succeeds, the assignment of sources to outputs may be inconsistent across frequency bins.
For example, a signal derived from S1 may appear in Y1 at f = 1, whereas a signal derived from S2 appears in Y1 at f = 2. This phenomenon is called the permutation problem.
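The reason ICA cannot fix the output order on its own is that reordering the rows of a separation matrix only reorders the outputs, which leaves their mutual independence unchanged. The following sketch shows this for a single bin; the helper names `apply_matrix` and `permute_rows` and the numeric values are illustrative assumptions.

```python
def apply_matrix(W, x):
    """Return the matrix-vector product W x for one frame."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def permute_rows(W, order):
    """Return W with its rows rearranged according to `order`."""
    return [W[i] for i in order]


W = [[2 + 0j, -1 + 0j], [-0.5 + 0j, 1 + 0j]]
x = [1 + 2j, 3 - 1j]

y = apply_matrix(W, x)
y_swapped = apply_matrix(permute_rows(W, [1, 0]), x)

# The swapped matrix yields the same two outputs in the opposite order,
# so both matrices are equally valid solutions for this bin.
same_outputs_swapped = y_swapped == [y[1], y[0]]
```

Because each bin is solved separately, one bin may end up with `W` and another with its row-swapped version, which is exactly the inconsistency described above.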

Patent Document 1 discloses a technique that resolves the permutation problem by estimating the direction of arrival of each signal and labeling the signals based on this direction information.
In practice, however, not all sound sources are point sources, so the direction of arrival of a signal cannot always be estimated correctly.
For diffuse noise, for example, the direction of the noise cannot be identified, and labeling errors occur.

Patent Document 2 and Non-Patent Document 1 disclose a technique that obtains the joint probability density distribution of the separated signals and sorts the separated signals into speech and noise based on the shape of this distribution. In this technique, for example, a signal whose joint probability density distribution is non-Gaussian is judged to be the specific speech signal, and a signal whose distribution is Gaussian is judged to be a noise signal.
With this technique, even noise such as diffuse noise can be labeled accurately, and the output to which each signal is assigned can be determined with high accuracy.

Patent Document 1: JP 2004-145172 A
Patent Document 2: WO 2009/113192

Non-Patent Document 1: Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano, "An Improved permutation solver for blind signal separation based front-ends in robot audition," IEEE/RSJ International Conference on Intelligent Robotics and Systems (IROS2008), Nice, France, pp. 2172-2177, September 2008.

An environment in which a signal separation device is actually used can be assumed to be as follows.
FIG. 5 is a diagram showing a robot 10 having a speech recognition function.
The robot 10 includes a microphone array 12 composed of a plurality of microphones 11, and a signal separation device 20 that processes the observation signals from the microphone array 12.
In this configuration, ambient noise S2 enters the microphone array 12 together with the user voice S1.
In addition, the robot itself is a noise source.
That is, because the robot 10 includes a power source 30 such as a motor, the noise S3 from the power source 30 also enters the microphones 11.

The observation signal x(t) therefore contains the noise S3 from the power source 30.
Independent component analysis is thus applied to a signal containing the user voice S₁(f, t), the ambient noise S₂(f, t), and the power noise S₃(f, t) to obtain statistically independent signals Y₁(f, t) to Y_n(f, t).
The separated signals Y₁(f, t) to Y_n(f, t) are then labeled.
However, if a signal whose joint probability density distribution is non-Gaussian is simply judged to be the user's voice, as described above, labeling errors may occur.
This is because the noise S3 of the power source 30 also exhibits a non-Gaussian joint probability density with high kurtosis.

Thus, when the conventional techniques disclosed in Patent Document 2 and Non-Patent Document 1 are applied to an actual environment, the separated signals may be labeled incorrectly.
Furthermore, computing a joint probability density distribution is very expensive, and the calculation load becomes excessive if the shape of the joint probability density distribution must be obtained for the power noise in addition to the user voice and the ambient noise.

The signal separation system of the present invention is a signal separation system that uses independent component analysis to separate a time-domain observation signal, in which a plurality of signals are mixed, into individual signals, and that extracts a specific user voice from the separated signals, the system comprising:
an external microphone directed toward the outside;
an internal sensor that detects only internal noise from an internal noise source present in the system;
a discrete Fourier transform unit that performs a discrete Fourier transform on the signals from the external microphone and the internal sensor;
an independent component analysis unit that extracts mutually independent separated signals by independent component analysis; and
a permutation resolution unit that performs permutation resolution on the result of the independent component analysis,
wherein the independent component analysis unit uses the detection signal from the internal sensor so that a specific internal noise separation signal contains only the noise from the internal noise source, and extracts separated signals that do not contain the internal noise by adjusting them to be independent of this internal noise separation signal, and
the permutation resolution unit performs permutation resolution on the separated signals that do not contain the internal noise.

In the present invention, the permutation resolution unit preferably includes:
a spikedness calculation unit that calculates the spikedness, which is the degree of peakedness of the probability density distribution of each separated signal; and
a clustering unit that labels each separated signal as user voice or ambient noise based on the spikedness,
wherein the spikedness calculation unit obtains, as the spikedness, the scale parameter of the Laplace distribution obtained when the separated signal is fitted with a Laplace distribution.

In the present invention, the clustering unit preferably takes the separated signal with the largest spikedness to be the user voice.

FIG. 1 is a diagram showing a robot equipped with the signal separation device according to the first embodiment.
FIG. 2 is a block diagram of the signal separation device.
FIG. 3 is a block diagram of the permutation resolution unit 340.
FIG. 4 is a diagram outlining the flow from the observation signals x₁(t) and x₂(t) to the spikedness (scale parameter α_i(f)).
FIG. 5 is a diagram showing a robot 10 having a speech recognition function.

Embodiments of the present invention are illustrated in the drawings and described with reference to the reference numerals assigned to the elements in the drawings.
(First embodiment)
A first embodiment according to the present invention will be described.
FIG. 1 is a diagram showing a robot equipped with the signal separation device according to the first embodiment.
The robot 100 is provided with an external microphone 110, an internal sensor 120, and a signal separation device 200.

The external microphone 110 is a sound-collecting microphone installed on the body surface of the robot 100.
For the purposes of this description, a first external microphone 111 and a second external microphone 112 are assumed to be provided.
The voice S1 from the user and the noise S2 from the surroundings enter the external microphone 110.
In addition, the noise S3 from the power source 30 also enters the external microphone 110.

The internal sensor 120 is a sensor that selectively detects the noise S3 from the power source 30.
The internal sensor 120 detects the noise from the power source 30 but does not detect sound signals from the outside (S1, S2). The internal sensor 120 is preferably placed close to the external microphone, for example behind the external microphone 110.
Examples of a sensor that selectively detects the noise S3 from the power source 30 in this way include an acceleration sensor and a highly directional microphone.

The numbers of external microphones 110 and internal sensors 120 are not limited and may be increased or decreased as necessary.
For example, when there are a plurality of external microphones 110, an internal sensor may be provided for each external microphone.

Here, the user voice is denoted by S₁(f, t), the ambient noise by S₂(f, t), and the power noise by S₃(f, t).
The observation signal from the first external microphone 111 is denoted by X₁(f, t), the observation signal from the second external microphone 112 by X₂(f, t), and the observation signal from the internal sensor 120 by R₁(f, t).
Using an unknown coefficient matrix A(f), the relationship between the original signals and the observation signals is then as follows.

Because the user voice S₁(f, t), the ambient noise S₂(f, t), and the power noise S₃(f, t) all enter the first external microphone 111 and the second external microphone 112, the elements of the coefficient matrix A corresponding to X₁(f, t) and X₂(f, t), namely A₁₁(f), A₁₂(f), A₁₃(f), A₂₁(f), A₂₂(f), and A₂₃(f), are nonzero.
In contrast, the user voice S₁(f, t) and the ambient noise S₂(f, t) do not enter the internal sensor 120, so in the row of the coefficient matrix A corresponding to R₁(f, t), namely (0, 0, A₃₃(f)), every element is 0 except the coefficient A₃₃(f) corresponding to the power noise.
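The mixing relationship is not reproduced as an equation in this text. Based on the element-by-element description above, it can be written out as follows; this is a reconstruction from the surrounding definitions rather than a copy of the original formula.

```latex
\begin{bmatrix} X_1(f,t) \\ X_2(f,t) \\ R_1(f,t) \end{bmatrix}
=
\begin{bmatrix}
A_{11}(f) & A_{12}(f) & A_{13}(f) \\
A_{21}(f) & A_{22}(f) & A_{23}(f) \\
0 & 0 & A_{33}(f)
\end{bmatrix}
\begin{bmatrix} S_1(f,t) \\ S_2(f,t) \\ S_3(f,t) \end{bmatrix}
```

The last row states that R₁(f, t) = A₃₃(f) S₃(f, t), that is, the internal sensor observes only a scaled copy of the power noise.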

FIG. 2 is a block diagram of the signal separation device.
The signal separation device 200 includes an analog/digital (A/D) conversion unit 210, a noise suppression processing unit 300, and a speech recognition unit 220.

The A/D conversion unit 210 converts each of the signals input from the external microphone 110 and the internal sensor 120 into a digital signal and outputs it to the noise suppression processing unit 300.

The noise suppression processing unit 300 performs processing to suppress the noise contained in the input digital signals.
The noise suppression processing unit 300 includes a short-time discrete Fourier transform unit 310, an independent component analysis unit 320, a gain correction unit 330, a permutation resolution unit 340, and an inverse discrete Fourier transform unit 350.

The short-time discrete Fourier transform unit 310 performs a short-time discrete Fourier transform on each item of digital data from the A/D conversion unit 210.

The independent component analysis unit 320 performs independent component analysis (ICA) on the observation signals in the time-frequency domain representation obtained by the short-time discrete Fourier transform unit 310, and calculates a separation matrix for each frequency bin.
The specific processing of independent component analysis is disclosed in detail in, for example, Patent Document 1.

Here, the short-time discrete Fourier transforms of the observation signals x₁(t), x₂(t), and r₁(t) are denoted by X₁(f, t), X₂(f, t), and R₁(f, t), respectively.
Statistically independent separated signals Y₁(f, t), Y₂(f, t), and Q₁(f, t) are then assumed to be extracted using the separation matrix W(f).

In this embodiment, a separated signal Q₁(f, t) (the internal noise separation signal) is generated by multiplying R₁(f, t), which contains only the power noise S₃(f, t), by a coefficient W₃₃(f).
ICA adaptively learns the separation filter matrix W(f) so that Q₁(f, t) and the separated signals Y₁(f, t) and Y₂(f, t) become mutually independent, so separated signals Y₁(f, t) and Y₂(f, t) that do not contain the power noise are extracted (semi-blind signal separation).
That is, Y₁(f, t) and Y₂(f, t) are components other than the power noise: each is either the user voice or the ambient noise.
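The constraint used here, [Y₁, Y₂, Q₁]ᵀ = W(f) [X₁, X₂, R₁]ᵀ with W₃₁(f) = W₃₂(f) = 0 so that Q₁ = W₃₃ R₁, is consistent with the mixing model: the exact inverse of a mixing matrix whose last row is (0, 0, A₃₃) has the same zero pattern. The sketch below checks this numerically for one bin. The matrix values and the helper `inverse_3x3` are illustrative assumptions, and the patent learns W(f) adaptively by ICA rather than by inverting a known A(f).

```python
def inverse_3x3(M):
    """Invert a 3 x 3 matrix using the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Hypothetical mixing matrix for one bin; the internal sensor row is (0, 0, A33).
A = [
    [1.0 + 0.2j, 0.5 - 0.1j, 0.3 + 0.0j],
    [0.4 + 0.0j, 1.0 - 0.3j, 0.2 + 0.1j],
    [0.0 + 0.0j, 0.0 + 0.0j, 0.8 + 0.0j],
]
S = [1 + 1j, 2 - 1j, -1 + 0.5j]   # user voice, ambient noise, power noise
X1, X2, R1 = matvec(A, S)          # two microphone signals and the sensor signal

W = inverse_3x3(A)
Y1, Y2, Q1 = matvec(W, [X1, X2, R1])
```

Here `W[2][0]` and `W[2][1]` are zero, `Q1` equals `W[2][2] * R1`, and `Y1` and `Y2` recover the first two sources without any contribution from the power noise.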

The gain correction unit 330 performs gain correction processing on the separation matrix at each frequency calculated by the independent component analysis unit 320.

The permutation resolution unit 340 performs processing to resolve the permutation problem.
FIG. 3 is a block diagram of the permutation resolution unit 340.
In this embodiment, of the signals Y₁(f, t), Y₂(f, t), and Q₁(f, t) separated by the independent component analysis unit 320, Y₁(f, t) and Y₂(f, t) are already known to be components other than the power noise, that is, each is either the user voice or the ambient noise.
The permutation therefore only needs to be resolved between Y₁(f, t) and Y₂(f, t).
The separated signals Y₁(f, t) and Y₂(f, t) are input to the permutation resolution unit 340, while the separated signal Q₁(f, t) is sent directly to the inverse discrete Fourier transform unit 350 in the next stage.

Permutation resolution in this embodiment exploits the fact that the probability density distribution of the user voice has a sharper, more peaked shape (is spikier) than that of the ambient noise.
To estimate the spikedness (degree of peakedness) of the probability density distribution, the scale parameter α_i(f) of a Laplace distribution is used.
The scale parameter α_i(f) of the Laplace distribution is estimated using the expected value of the absolute value of the separated signal Y(f, t).
These points are described in order below.

The permutation resolution unit 340 includes a spikedness calculation unit 341 and a clustering determination unit 342.

The spikedness calculation unit 341 obtains the spikedness (the degree of peakedness) of the probability density distributions of the separated signals Y₁ and Y₂.
As the spikedness, it uses the scale parameter α_i(f) of the Laplace distribution obtained when the separated signal Y_i(f, t) is fitted with a Laplace distribution.
Using maximum likelihood estimation, the scale parameter α_i(f) can be calculated by the following equation.

Here, because Y(f, t) is a complex spectrum, |Y(f, t)| denotes the absolute value (modulus) of a complex number.
ε_t{|Y(f, t)|} denotes the average of |Y(f, t)| over a predetermined number of frames.
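The frame average ε_t{|Y(f, t)|} is straightforward to compute, as the sketch below shows. How that average maps to α_i(f) is an assumption here, because the source equation is not reproduced in this text. For a real-valued zero-mean Laplace density written as p(y) = (α/2) exp(−α|y|), the maximum likelihood estimate of α is the reciprocal of the sample mean of |y|; under the other common form p(y) = (1/(2b)) exp(−|y|/b), the estimate of b is the sample mean itself. The function names and the `reciprocal` flag are hypothetical and exist only to make that choice explicit.

```python
def mean_abs(Y_frames):
    """Average of |Y(f, t)| over the given frames of one frequency bin."""
    return sum(abs(y) for y in Y_frames) / len(Y_frames)


def spikedness(Y_frames, reciprocal=True):
    """Laplace parameter estimated from the mean absolute value.

    reciprocal=True  : rate-style parameter, 1 / mean|Y| (assumed convention)
    reciprocal=False : scale-style parameter, mean|Y|
    """
    m = mean_abs(Y_frames)
    return 1.0 / m if reciprocal else m


# Two frames of one bin; both have modulus 5, so the mean absolute value is 5.
frames = [3 + 4j, 0 + 5j]
alpha_rate = spikedness(frames)                      # 1 / 5
alpha_scale = spikedness(frames, reciprocal=False)   # 5
```

Either way, only one pass over the frames of each bin is needed, which is much cheaper than estimating a joint probability density distribution.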

FIG. 4 is a diagram outlining the flow from the observation signals x₁(t), x₂(t), and r₁(t) to the spikedness (scale parameter α_i(f)).
The sound signal collected by the first external microphone 111 is x₁(t), the sound signal collected by the second external microphone 112 is x₂(t), and the signal detected by the internal sensor 120 is r₁(t).
Applying a discrete Fourier transform to these signals over windows (frames) of a predetermined time width gives X₁(f, t), X₂(f, t), and R₁(f, t).
Applying independent component analysis to X₁(f, t), X₂(f, t), and R₁(f, t) gives Y₁(f, t), Y₂(f, t), and Q₁(f, t).
The spikedness (scale parameter α_i(f_k)) at frequency bin f = f_k is then expressed, for example, using the time span from t₀ to t₂, as follows.

The clustering determination unit 342 labels Y₁(f_k, t) and Y₂(f_k, t) using the spikedness (scale parameter α_i(f_k)) obtained as described above and, where necessary, swaps Y₁(f_k, t) and Y₂(f_k, t).
That is, it judges one of Y₁(f_k, t) and Y₂(f_k, t) to be the user voice and the other to be the ambient noise, so that the assignment of user voice and ambient noise is consistent across all frequency bins.
Specifically, the signal with the largest spikedness (scale parameter α_i(f_k)) is judged to be the user voice.

For example, if the user voice is assigned to index 1 and the ambient noise to index 2, the processing is as follows.
(Case 1)
Case 1 is the case where α₁(f_k) ≥ α₂(f_k).
In this case, Y₁(f_k, t) can be judged to be the user voice and Y₂(f_k, t) the ambient noise.
No swap is necessary.

(Case 2)
Case 2 is the case where α₁(f_k) < α₂(f_k).
In this case, Y₂(f_k, t) can be judged to be the user voice and Y₁(f_k, t) the ambient noise.
The two signals are therefore swapped in this frequency bin f_k.

This clustering is performed for all frequency bins.
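The two cases can be sketched as a loop over frequency bins. This is a minimal illustration under stated assumptions: the function name `resolve_permutation`, the list-per-bin data layout, and the toy values are not from the source, and the spikedness values are taken as given.

```python
def resolve_permutation(Y1, Y2, alpha1, alpha2):
    """Align two separated signals across frequency bins.

    Y1, Y2         : lists indexed by bin k, each holding the frames Y_i(f_k, t)
    alpha1, alpha2 : lists indexed by bin k holding the spikedness values
    Returns (voice, noise): index 1 is user voice, index 2 is ambient noise.
    """
    voice, noise = [], []
    for y1, y2, a1, a2 in zip(Y1, Y2, alpha1, alpha2):
        if a1 >= a2:
            # Case 1: Y1 is already the user voice, so no swap is needed.
            voice.append(y1)
            noise.append(y2)
        else:
            # Case 2: Y2 is the user voice, so the two outputs are swapped.
            voice.append(y2)
            noise.append(y1)
    return voice, noise


# Toy input with two bins; the outputs are swapped in the second bin.
Y1 = [[1 + 0j, 2 + 0j], [0.1j, 0.2j]]
Y2 = [[0.3j, 0.4j], [3 + 0j, 4 + 0j]]
alpha1 = [2.0, 0.5]
alpha2 = [1.0, 1.5]
voice, noise = resolve_permutation(Y1, Y2, alpha1, alpha2)
```

Ties fall under Case 1, matching the ≥ in the text. Q₁(f, t) does not appear in this step because it bypasses the permutation resolution unit.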

Finally, the inverse discrete Fourier transform unit 350 performs an inverse discrete Fourier transform, converting the frequency-domain data Y₁(f, t), Y₂(f, t), and Q₁(f, t) into time-domain data and outputting the result.

This configuration provides the following effects.
(1) An internal sensor 120 is provided that selectively detects only the noise from the internal noise source (power source) 30.
In the independent component analysis, the optimization makes Q₁(f, t), which estimates the internal noise, and the other separated signals Y₁(f, t) and Y₂(f, t) mutually independent.
Because Q₁(f, t) is generated solely from the signal R₁(f, t) from the internal sensor 120, the internal noise is always output to Q₁(f, t).
If the separated signals Y₁(f, t) and Y₂(f, t) contained internal noise, they would be correlated with Q₁(f, t), so that component is removed by the ICA optimization.
The internal noise is therefore output only to Q₁(f, t).
As a result, one of the separated signals other than Q₁(f, t), that is, Y₁(f, t) or Y₂(f, t), is the user voice.
In other words, the permutation problem only needs to be resolved for the separated signals Y₁(f, t) and Y₂(f, t), excluding Q₁(f, t).
The calculation load of permutation resolution can therefore be reduced.

(2) The noise from the internal noise source (power source) 30 closely resembles the user voice, for example in that its probability density distribution has high kurtosis, so the permutation problem between the internal noise and the user voice can be difficult to resolve.
In this embodiment, a sensor that detects only the internal noise is used, and the elements W₃₁(f) and W₃₂(f) of the separation filter matrix W(f) are modeled as 0. The internal noise is thereby concentrated in the separated signal Q₁(f, t) and kept out of the remaining separated signals Y₁(f, t) and Y₂(f, t).
The accuracy with which the user voice is separated and extracted can therefore be improved.

(3) In this embodiment, labeling uses the spikedness (degree of peakedness) of the probability density distributions of the separated signals Y₁(f, t) and Y₂(f, t), and the spikedness is taken to be the scale parameter α_i(f) of the Laplace distribution obtained when the separated signal Y_i(f, t) is fitted with a Laplace distribution.
This technique greatly reduces the amount of calculation.

The present invention is not limited to the embodiment described above and may be modified as appropriate without departing from its spirit.

10: robot, 11: microphone, 12: microphone array, 20: signal separation device, 30: power source, 100: robot, 110: external microphone, 111: external microphone, 112: external microphone, 120: internal sensor, 200: signal separation device, 210: A/D conversion unit, 220: speech recognition unit, 300: noise suppression processing unit, 310: discrete Fourier transform unit, 320: independent component analysis unit, 330: gain correction unit, 340: permutation resolution unit, 341: spikedness calculation unit, 342: clustering determination unit, 350: inverse discrete Fourier transform unit.

Claims

1. A signal separation system that uses independent component analysis to separate a time-domain observation signal, in which a plurality of signals are mixed, into individual signals, and that extracts a specific user voice from the separated signals, the system comprising:
an external microphone directed toward the outside;
an internal sensor that detects only internal noise from an internal noise source present in the system;
a discrete Fourier transform unit that performs a discrete Fourier transform on the signals from the external microphone and the internal sensor;
an independent component analysis unit that extracts mutually independent separated signals by independent component analysis; and
a permutation resolution unit that performs permutation resolution on the result of the independent component analysis,
wherein the independent component analysis unit uses the detection signal from the internal sensor so that a specific internal noise separation signal contains only the noise from the internal noise source, and extracts separated signals that do not contain the internal noise by adjusting them to be independent of this internal noise separation signal, and
the permutation resolution unit performs permutation resolution on the separated signals that do not contain the internal noise.

2. The signal separation system according to claim 1, wherein the permutation resolution unit comprises:
a spikedness calculation unit that calculates the spikedness, which is the degree of peakedness of the probability density distribution of each separated signal; and
a clustering unit that labels each separated signal as user voice or ambient noise based on the spikedness,
and wherein the spikedness calculation unit obtains, as the spikedness, the scale parameter of the Laplace distribution obtained when the separated signal is fitted with a Laplace distribution.

3. The signal separation system according to claim 2, wherein the spikedness calculation unit uses the expected value of the absolute value of the separated signal as the maximum likelihood estimate of the scale parameter.

4. The signal separation system according to claim 2 or claim 3, wherein, when the separated signal is denoted by Y(f, t), the spikedness calculation unit obtains the scale parameter α_i(f) by the following equation.

Here, ε_t{|Y(f, t)|} is the average of |Y(f, t)| over a predetermined number of frames.

5. The signal separation system according to any one of claims 2 to 4, wherein the clustering unit takes the separated signal with the largest spikedness to be the user voice.