JP5105336B2

JP5105336B2 - Sound source separation apparatus, program and method

Info

Publication number: JP5105336B2
Application number: JP2009282026A
Authority: JP
Inventors: 誠森戸; 隆矢頭; 圭山田; 哲則小林; 健三赤桐; 哲司小川
Original assignee: Waseda University; Oki Electric Industry Co Ltd
Current assignee: Waseda University; Oki Electric Industry Co Ltd
Priority date: 2009-12-11
Filing date: 2009-12-11
Publication date: 2012-12-26
Anticipated expiration: 2029-12-11
Also published as: JP2011123370A

Description

本発明は、音源分離装置、プログラム及び方法に関し、例えば、電話装置や音声認識装置等の音声捕捉における雑音除去に適用し得る。 The present invention relates to a sound source separation device, a program, and a method, and can be applied to noise removal in speech capture such as a telephone device and a speech recognition device.

電話装置や音声認識装置では、マイクロフォンによりユーザ音声を捕捉するが、周囲雑音によって、音声認識の精度は極度に劣化したり、録音した音声が雑音のために聞き取りにくい場合がある。 In a telephone device or a voice recognition device, a user's voice is captured by a microphone, but the accuracy of voice recognition may be extremely deteriorated due to ambient noise, or the recorded voice may be difficult to hear due to noise.

このため、従来は、マイクロフォンアレーにより指向特性を制御する等して、所望の目的音だけを選択的に捕捉する試みがなされているが、このような指向特性の制御だけでは、所望の音声を背景雑音から分離して取り出すことは困難であった。 For this reason, attempts have been made to selectively capture only the desired target sound, for example by controlling directivity with a microphone array. However, with such directivity control alone, it has been difficult to separate and extract the desired speech from background noise.

従来のマイクロフォンアレーの技術としては、例えば、遅延和アレー（ＤＳＡ：ＤｅｌａｙｅｄＳｕｍＡｒｒａｙ）や、ＢＦ（Ｂｅａｍ−Ｆｏｒｍｉｎｇ）と呼ばれる指向特性制御に関する技術、あるいはＤＣＭＰ（ＤｉｒｅｃｔｉｏｎａｌｌｙＣｏｎｓｔｒａｉｎｅｄＭｉｎｉｍｉｚａｔｉｏｎｏｆＰｏｗｅｒ）アダプティブアレーによる指向特性制御に関する技術等がある。 Conventional microphone array techniques include, for example, directivity control techniques known as the delay-and-sum array (DSA: Delayed Sum Array) and BF (Beam-Forming), and directivity control by a DCMP (Directionally Constrained Minimization of Power) adaptive array.

一方、遠隔発話による音声を分離する技術として、複数の固定マイクロフォンの出力信号を狭帯域スペクトル分析し、周波数帯域毎に最も大きな振幅を与えたマイクロフォンにその周波数帯域の音を割り当てる技術（ＳＡＦＩＡと称されている）としては、特許文献１の記載技術がある。特許文献１に記載されている帯域選択（ＢＳ：ＢａｎｄＳｅｌｅｃｔｉｏｎ）による音声の分離技術では、所望の音声を得るために、所望の音声を発する音源に最も近いマイクロフォンを選び、そのマイクロフォンに割り当てられた周波数帯域の音を使って音声を合成する。 Meanwhile, as a technique for separating speech uttered at a distance, Patent Document 1 describes a technique (called SAFIA) that performs narrowband spectral analysis on the output signals of a plurality of fixed microphones and assigns the sound of each frequency band to the microphone that gave the largest amplitude in that band. In the band selection (BS: Band Selection) speech separation technique described in Patent Document 1, to obtain a desired voice, the microphone closest to the sound source emitting that voice is selected, and speech is synthesized using the sound of the frequency bands assigned to that microphone.

また、更なる技術として、帯域選択の方法に改良を加えた技術が特許文献２に記載されている。 As a further technique, Patent Document 2 discloses a technique obtained by improving the band selection method.

特許文献２の記載技術では、目的音到来方向と直角または略直角をなす方向に並べて配置された２個のマイクロフォンに入力された信号を用いて、妨害音を抑圧して捕捉対象である目的音を強調した目的音優勢信号と、目的音を抑圧して妨害音を強調した目的音劣勢信号を作成し、その２種類の信号を目的音と妨害音の分離に利用している。 In the technique described in Patent Document 2, signals input to two microphones arranged side by side in a direction perpendicular or substantially perpendicular to the arrival direction of the target sound are used to create a target sound dominant signal, in which the interfering sound is suppressed and the target sound to be captured is emphasized, and a target sound inferior signal, in which the target sound is suppressed and the interfering sound is emphasized. These two signals are used to separate the target sound from the interfering sound.

特許文献２では、目的音優勢信号及び目的音劣勢信号の生成について「空間フィルタ」と呼ばれるフィルタを用いて実現している。 In Patent Document 2, the target sound dominant signal and the target sound inferior signal are generated using a filter called a "spatial filter".

図３は、空間フィルタの特性について示した説明図である。 FIG. 3 is an explanatory diagram showing the characteristics of the spatial filter.

以下では、２つのマイクロフォンＭ１、Ｍ２を結ぶ線に対する垂直平面を０度の方向と呼び、時計回りの方向を正の角度、反時計回りの方向を負の角度として方向を表すものとする。すなわち、上述の方向は−１８０度〜１８０度（−１８０度と１８０度は同じ方向）の範囲で表される。 Hereinafter, a vertical plane with respect to a line connecting the two microphones M1 and M2 is referred to as a 0 degree direction, and the clockwise direction is a positive angle and the counterclockwise direction is a negative angle. That is, the above-described direction is expressed in a range of −180 degrees to 180 degrees (−180 degrees and 180 degrees are the same direction).

図３では、間隔ｄで配置された２つのマイクロフォンＭ１、Ｍ２に対して角度θの方向から入力される音源があった場合について説明している。この場合、角度θの方向から入力される音源から２つのマイクロフォンＭ１、Ｍ２への距離で、ｄ×ｓｉｎθの距離差が生じ、結果として音の到達時間については、マイクロフォンＭ１、Ｍ２の間で、以下の（１）式で表される時間差τが生じる。

FIG. 3 illustrates the case where sound from a source arrives from the direction of angle θ at two microphones M1 and M2 spaced a distance d apart. In this case, the distances from the source to the two microphones M1 and M2 differ by d × sin θ, and as a result the arrival times of the sound at microphones M1 and M2 differ by the time difference τ given by equation (1) below.
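
Equation (1) is not reproduced in this text. From the path difference d × sin θ stated above, it can presumably be reconstructed as follows, where c (the speed of sound) is a symbol introduced here and not taken from the source:

```latex
\tau = \frac{d \sin\theta}{c} \qquad (1)
```

As an illustration with assumed values d = 3 cm, θ = 30 degrees, and c ≈ 340 m/s, this gives τ ≈ 0.03 × 0.5 / 340 ≈ 44 microseconds.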

そこで、マイクロフォンＭ２の出力からマイクロフォンＭ１の出力を時間差τ遅延させた出力を減じると、お互いが相殺されθ方向の音は抑圧される。以下では、空間フィルタにおいて音を抑圧する方向の角度（上述の例ではθ）を「抑圧角度」と呼ぶ。 Therefore, when the output obtained by delaying the output of the microphone M1 by the time difference τ is subtracted from the output of the microphone M2, they cancel each other and the sound in the θ direction is suppressed. Hereinafter, an angle (θ in the above example) in a direction in which sound is suppressed in the spatial filter is referred to as a “suppression angle”.
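
The delay-and-subtract operation above can be sketched for a single tone as follows. This is a minimal illustration, not the patent's implementation: the speed of sound, the sign convention (a source at a positive angle reaches M1 first), and the function names are all assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value (not given in the source)


def arrival_delay(angle_deg, mic_spacing):
    """Inter-microphone time difference d*sin(theta)/c for a source at angle_deg."""
    return mic_spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def delay_subtract_gain(source_deg, suppression_deg, mic_spacing, freq_hz):
    """Gain magnitude of 'M2 output minus delayed M1 output' for a unit tone."""
    omega = 2.0 * math.pi * freq_hz
    tau_source = arrival_delay(source_deg, mic_spacing)
    tau_null = arrival_delay(suppression_deg, mic_spacing)
    m1 = 1.0 + 0j                                        # tone as seen at M1
    m2 = cmath.exp(-1j * omega * tau_source)             # same tone, tau_source later at M2
    m1_delayed = m1 * cmath.exp(-1j * omega * tau_null)  # M1 output delayed by tau
    return abs(m2 - m1_delayed)
```

Calling `delay_subtract_gain(30, 30, 0.03, 2000.0)` evaluates the case where the source sits exactly at the suppression angle, so the two terms cancel; moving the source angle away from 30 degrees makes the gain grow.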

図４は、空間フィルタにおける指向特性について示した説明図である。 FIG. 4 is an explanatory diagram showing directivity characteristics in the spatial filter.

図４において、曲線Ｌは、空間フィルタの抑圧角度をθに設定した場合の指向特性を表しており、マイクロフォンＭ１、Ｍ２を結ぶ線の中点からの距離が長くなっている方向ほど、利得が大きく（抑圧の強度が弱い）、距離が短いほど利得が小さい（抑圧の強度が強い）ことを示している。 In FIG. 4, curve L represents the directivity when the suppression angle of the spatial filter is set to θ. The farther the curve lies from the midpoint of the line connecting microphones M1 and M2 in a given direction, the larger the gain in that direction (weaker suppression); the shorter the distance, the smaller the gain (stronger suppression).

図４では、空間フィルタの抑圧角度をθの方向に設定しているため、その方向の抑圧の強度が最も大きくなるように設定されている様子を示している。 FIG. 4 shows that, because the suppression angle of the spatial filter is set to the θ direction, suppression is strongest in that direction.

Patent Document 1: 特開平１０−３１３４９７号公報 (Japanese Patent Laid-Open No. 10-313497)
Patent Document 2: 特開２００６−１９７５５２号公報 (JP 2006-197552 A)

しかしながら、特許文献１の記載技術では、２つの音が重なった状況において、よく両者を分離することができるが、音源が３つ以上となると、理論的には分離可能とされているものの、分離性能は極端に劣化する。従って、複数の雑音源が存在する状況下で、これらの複数の雑音から目的音を精度よく分離することは困難である。 However, although the technique described in Patent Document 1 can separate two overlapping sounds well, when there are three or more sound sources its separation performance degrades drastically, even though separation is possible in theory. It is therefore difficult to separate the target sound accurately from multiple noises when a plurality of noise sources are present.

また、特許文献２の記載技術では、空間フィルタを用いて、目的音を分離する処理を行っているが、分離する処理をしている途中で目的音の到来方向がずれた場合に、空間フィルタの特性が、分離後の目的音の品質に影響を及ぼす恐れがある。以下、特許文献２に記載の方法において、空間フィルタの特性が、分離後の目的音に及ぼす恐れがある影響について説明する。 In the technique described in Patent Document 2, a spatial filter is used to separate the target sound. However, if the arrival direction of the target sound shifts during the separation processing, the characteristics of the spatial filter may affect the quality of the separated target sound. The following describes how, in the method of Patent Document 2, the spatial filter characteristics may affect the separated target sound.

図５は、空間フィルタにおける抑圧角度に近い方向の利得の変化特性について示した説明図である。 FIG. 5 is an explanatory diagram showing the change characteristic of the gain in the direction close to the suppression angle in the spatial filter.

図５では、空間フィルタの抑圧角度をθとし、目的音が０度の方向（正面）から到来した場合の利得をＧ１、０度から反時計回りに僅かにずれた方向から目的音が到来した場合の利得をＧ２として説明している。 In FIG. 5, the suppression angle of the spatial filter is θ, G1 denotes the gain when the target sound arrives from the 0 degree direction (front), and G2 denotes the gain when the target sound arrives from a direction shifted slightly counterclockwise from 0 degrees.

空間フィルタにおいて、抑圧角度の近くで、角度の変化に応じた利得の変化率が大きい場合には、図５に示すように、利得がＧ１となる方向と、Ｇ２となる方向との角度のずれが僅かであっても、Ｇ１とＧ２の差は大きくなってしまう恐れがある。 In a spatial filter, if the rate of change of gain with angle is large near the suppression angle, then, as shown in FIG. 5, the difference between G1 and G2 may be large even if the directions giving gains G1 and G2 differ only slightly in angle.

上述の特許文献２に記載されている目的音劣勢信号生成手段では、目的音が到来すると想定される方向に、空間フィルタの抑圧角度を向けて、目的音成分を抑圧するとともに、妨害音成分を抽出しているが、上述のように、目的音と妨害音とを分離する処理をしている途中で目的音の到来方向がずれると、僅かなずれであっても、出力音に大きなゆれを生じる結果となる恐れがある。 The target sound inferior signal generating means described in Patent Document 2 above points the suppression angle of the spatial filter in the direction from which the target sound is expected to arrive, thereby suppressing the target sound component and extracting the interfering sound component. However, as described above, if the arrival direction of the target sound shifts during the processing that separates the target sound from the interfering sound, even a slight shift may cause large fluctuations in the output sound.

そのため、目的音と、目的音の到来方向以外の任意の方向から到来する妨害音とを分離する処理において、目的音の到来方向がずれた場合でも、分離処理後の音の品質を保つことができる音源分離装置、プログラム及び方法が望まれている。 Therefore, a sound source separation device, program, and method are desired that, in processing that separates a target sound from interfering sound arriving from any direction other than the arrival direction of the target sound, can maintain the quality of the separated sound even when the arrival direction of the target sound shifts.

第１の本発明の音源分離装置は、（１）間隔を置いて配置された複数個のマイクロフォンのうち、２個のマイクロフォンの受音信号のスペクトルについて、目的音が到来すると想定される想定到来方向を含む所定の範囲内で、それぞれ異なる方向に、成分抑圧の指向性を向けて処理する複数の目的音抑圧部を用いて、上記受音信号のスペクトルから、上記目的音の成分を抑圧した目的音抑圧スペクトルを生成する目的音抑圧スペクトル生成手段と、（２）上記受音信号のスペクトルについて、上記所定の範囲以外の任意の方向から到来する妨害音を抑圧した目的音優勢スペクトルを生成する目的音優勢スペクトル生成手段と、（３）目的音抑圧スペクトルと、目的音優勢スペクトルとを用いて、上記受音信号について、上記妨害音の成分と上記目的音の成分とを分離する分離手段とを有し、（４）上記目的音抑圧スペクトル生成手段は、目的音抑圧スペクトルの各成分について、上記目的音抑圧部の処理結果のうち最も絶対値の小さい値を適用することを特徴とする。 The sound source separation device of the first aspect of the present invention comprises: (1) target sound suppression spectrum generating means for generating, from the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that each direct the directivity of component suppression in a different direction within a predetermined range including the assumed arrival direction from which the target sound is expected to arrive; (2) target sound dominant spectrum generating means for generating, from the spectra of the received signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed; and (3) separating means for separating the interfering sound component and the target sound component of the received signals using the target sound suppression spectrum and the target sound dominant spectrum, wherein (4) the target sound suppression spectrum generating means applies, for each component of the target sound suppression spectrum, the value with the smallest absolute value among the processing results of the target sound suppression units.

第２の本発明の音源分離プログラムは、音源分離装置に搭載されたコンピュータを、（１）間隔を置いて配置された複数個のマイクロフォンのうち、２個のマイクロフォンの受音信号のスペクトルについて、目的音が到来すると想定される想定到来方向を含む所定の範囲内で、それぞれ異なる方向に、成分抑圧の指向性を向けて処理する複数の目的音抑圧部を用いて、上記受音信号のスペクトルから、上記目的音の成分を抑圧した目的音抑圧スペクトルを生成する目的音抑圧スペクトル生成手段と、（２）上記受音信号のスペクトルについて、上記所定の範囲以外の任意の方向から到来する妨害音を抑圧した目的音優勢スペクトルを生成する目的音優勢スペクトル生成手段と、（３）目的音抑圧スペクトルと、目的音優勢スペクトルとを用いて、上記受音信号について、上記妨害音の成分と上記目的音の成分とを分離する分離手段として機能させ、（４）上記目的音抑圧スペクトル生成手段は、目的音抑圧スペクトルの各成分について、上記目的音抑圧部の処理結果のうち最も絶対値の小さい値を適用することを特徴とする。 The sound source separation program of the second aspect of the present invention causes a computer mounted in a sound source separation device to function as: (1) target sound suppression spectrum generating means for generating, from the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that each direct the directivity of component suppression in a different direction within a predetermined range including the assumed arrival direction from which the target sound is expected to arrive; (2) target sound dominant spectrum generating means for generating, from the spectra of the received signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed; and (3) separating means for separating the interfering sound component and the target sound component of the received signals using the target sound suppression spectrum and the target sound dominant spectrum, wherein (4) the target sound suppression spectrum generating means applies, for each component of the target sound suppression spectrum, the value with the smallest absolute value among the processing results of the target sound suppression units.

第３の本発明は、音源分離装置により行われる音源分離方法において、（１）目的音抑圧スペクトル生成手段、目的音優勢スペクトル生成手段、分離手段を有し、（２）上記目的音抑圧スペクトル生成手段は、間隔を置いて配置された複数個のマイクロフォンのうち、２個のマイクロフォンの受音信号のスペクトルについて、目的音が到来すると想定される想定到来方向を含む所定の範囲内で、それぞれ異なる方向に、成分抑圧の指向性を向けて処理する複数の目的音抑圧部を用いて、上記受音信号のスペクトルから、上記目的音の成分を抑圧した目的音抑圧スペクトルを生成し、（３）上記目的音優勢スペクトル生成手段は、上記受音信号のスペクトルについて、上記所定の範囲以外の任意の方向から到来する妨害音を抑圧した目的音優勢スペクトルを生成し、（４）上記分離手段は、目的音抑圧スペクトルと、目的音優勢スペクトルとを用いて、上記受音信号について、上記妨害音の成分と上記目的音の成分とを分離し、（５）上記目的音抑圧スペクトル生成手段は、目的音抑圧スペクトルの各成分について、上記目的音抑圧部の処理結果のうち最も絶対値の小さい値を適用することを特徴とする。
The third aspect of the present invention is a sound source separation method performed by a sound source separation device that (1) has target sound suppression spectrum generating means, target sound dominant spectrum generating means, and separating means, wherein (2) the target sound suppression spectrum generating means generates, from the spectra of the received signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that each direct the directivity of component suppression in a different direction within a predetermined range including the assumed arrival direction from which the target sound is expected to arrive; (3) the target sound dominant spectrum generating means generates, from the spectra of the received signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed; (4) the separating means separates the interfering sound component and the target sound component of the received signals using the target sound suppression spectrum and the target sound dominant spectrum; and (5) the target sound suppression spectrum generating means applies, for each component of the target sound suppression spectrum, the value with the smallest absolute value among the processing results of the target sound suppression units.

本発明によれば、目的音と、目的音の到来方向以外の任意の方向から到来する妨害音とを分離する処理において、目的音の到来方向がずれた場合でも、分離処理後の音の品質を保つことができる。 According to the present invention, in processing that separates a target sound from interfering sound arriving from any direction other than the arrival direction of the target sound, the quality of the separated sound can be maintained even when the arrival direction of the target sound shifts.

FIG. 1: 第１の実施形態に係る音源分離装置の機能的構成について示したブロック図である。 A block diagram showing the functional configuration of the sound source separation device according to the first embodiment.
FIG. 2: 第２の実施形態に係る音源分離装置の機能的構成について示したブロック図である。 A block diagram showing the functional configuration of the sound source separation device according to the second embodiment.
FIG. 3: 従来の空間フィルタの特性について示した説明図である。 An explanatory diagram showing the characteristics of a conventional spatial filter.
FIG. 4: 従来の空間フィルタにおける指向特性について示した説明図である。 An explanatory diagram showing the directivity of a conventional spatial filter.
FIG. 5: 従来の空間フィルタにおける抑圧角度に近い方向の利得の変化特性について示した説明図である。 An explanatory diagram showing how the gain of a conventional spatial filter changes in directions close to the suppression angle.

（Ａ）第１の実施形態 (A) First Embodiment
以下、本発明による音源分離装置、プログラム及び方法の第１の実施形態を、図面を参照しながら詳述する。 Hereinafter, a first embodiment of the sound source separation device, program, and method according to the present invention will be described in detail with reference to the drawings.

（Ａ−１）第１の実施形態の構成及び動作 (A-1) Configuration and Operation of First Embodiment
図１は、第１の実施形態の音源分離装置１０の機能的構成について示したブロック図である。 FIG. 1 is a block diagram showing the functional configuration of the sound source separation device 10 of the first embodiment.

音源分離装置１０は、目的音と、目的音の到来方向以外の任意の方向から到来する妨害音とを分離するものである。音源分離装置１０の用途は限定されるものではないが、例えば、音声認識装置や、携帯電話などの電話装置に搭載して、音声捕捉に用いるようにしても良い。具体的には、例えば、音源分離装置１０を電話会議装置に搭載して、遠隔発話を行う複数の話者による混合音声から任意の話者の音声を目的音として分離したり、遠隔発話を行う話者の音声とその他の音との混合音から話者の音声を目的音として分離したりすることに用いるようにしても良い。また、例えば、音声対話を行うロボット、カーナビゲーションシステム等の車載機器についての音声操作、会議の議事録作成等の音声認識において、目的音となるユーザの音声の分離に用いるようにしても良い。 The sound source separation device 10 separates the target sound and the disturbing sound coming from an arbitrary direction other than the arrival direction of the target sound. The use of the sound source separation device 10 is not limited. For example, the sound source separation device 10 may be mounted on a voice recognition device or a telephone device such as a mobile phone and used for voice capture. Specifically, for example, the sound source separation device 10 is installed in a teleconference device, and a voice of an arbitrary speaker is separated as a target sound from a mixed voice of a plurality of speakers performing remote speech, or remote speech is performed. It may be used to separate the speaker's voice as the target sound from the mixed sound of the speaker's voice and other sounds. Further, for example, it may be used for separation of a user's voice, which is a target sound, in voice recognition for in-vehicle devices such as a robot that performs a voice dialogue, a car navigation system, and a meeting minutes.

音源分離装置１０は、大きくは、入力手段２０、分析手段３０、分離手段４０、除去手段５０、生成手段６０を有する。 The sound source separation device 10 mainly includes an input unit 20, an analysis unit 30, a separation unit 40, a removal unit 50, and a generation unit 60.

音源分離装置１０は、マイクロフォン等のハードウェア以外の構成要素に関しては、プロセッサ（ＣＰＵ等）を有する装置に、実施形態の音源分離プログラムをインストールすることにより実現するようにしても良いし、一部又は全部の構成要素について、専用のハードウェア（例えば、半導体チップ）を用いて実現するようにしても良い。 The sound source separation device 10 may be realized by installing the sound source separation program of the embodiment in a device having a processor (CPU or the like) regarding components other than hardware such as a microphone, or a part thereof. Alternatively, all the components may be realized using dedicated hardware (for example, a semiconductor chip).

入力手段２０は、間隔を置いて配置された２個のマイクロフォン２１、２２と、これらの２個のマイクロフォン２１、２２の受音信号をアナログ/ディジタル信号変換器（図示せず）を用いてディジタル信号に変換し、そのディジタル信号を分析手段３０に与える。 The input means 20 has two microphones 21 and 22 arranged at an interval, converts the received signals of these two microphones 21 and 22 into digital signals using an analog/digital converter (not shown), and supplies the digital signals to the analysis means 30.

以下では、上述の図３〜図５と同様に、２つのマイクロフォン２１、２２を結ぶ線に対する垂直平面を０度の方向と呼び、時計回りの方向を正の角度、反時計回りの方向を負の角度として方向を表すものとする。すなわち、上述の方向は−１８０度〜１８０度（−１８０度と１８０度とは同じ方向）の範囲で表される。 In the following, as in FIGS. 3 to 5 described above, the vertical plane with respect to the line connecting the two microphones 21 and 22 is referred to as 0 degree direction, the clockwise direction is a positive angle, and the counterclockwise direction is negative. The direction is expressed as an angle. That is, the above-described direction is expressed in a range of −180 degrees to 180 degrees (−180 degrees and 180 degrees are the same direction).

また、以下では、例として、音源分離装置１０は、目的音が概ね０度の方向から到来することを想定した構成として説明する。 Further, hereinafter, as an example, the sound source separation device 10 will be described as a configuration assuming that the target sound comes from a direction of approximately 0 degrees.

以下の説明においては、マイク２１から出力されるディジタル音声信号をｘ１（ｎ）とする。また、同様にマイク２２から出力されるディジタル音声信号をｘ２（ｎ）とする。但し、ｎは、ｎ番目のデータ（サンプル）を表すものとする。 In the following description, the digital audio signal output from the microphone 21 is assumed to be x1 (n). Similarly, let the digital audio signal output from the microphone 22 be x2 (n). However, n represents the nth data (sample).

ディジタル音声信号ｘ１（ｎ）、ｘ２（ｎ）は、例えば、マイクロフォンなどの音声入力装置から入力されたアナログ音声信号を、アナログ／ディジタル変換し、標本化周期Ｔ毎に標本化することにより得られるものである。標本化周期Ｔは、例えば、３１．２５マイクロ秒〜１２５マイクロ秒程度とすることが望ましい。 The digital audio signals x1 (n) and x2 (n) are obtained, for example, by analog / digital conversion of an analog audio signal input from an audio input device such as a microphone and sampling at every sampling period T. Is. It is desirable that the sampling period T is, for example, about 31.25 microseconds to 125 microseconds.

同一時間区間における、Ｎ個の連続するｘ１（ｎ）、ｘ２（ｎ）を１つの分析単位（フレーム）として、後述する分析手段３０、分離手段４０、除去手段５０、生成手段６０の処理は行われるものとする。 The processing of the analysis means 30, separation means 40, removal means 50, and generation means 60 described below is performed with N consecutive samples of x1(n) and x2(n) from the same time interval as one analysis unit (frame).

以下の説明において、音源分離装置１０では、例としてＮ＝１０２４とする。そして、音源分離装置１０では、処理対象分析単位に対する当該音源分離の一連の処理が終了すると、ｘ１（ｎ）、ｘ２（ｎ）のうち後半の３Ｎ／４個のデータを前半にシフトし、新たに入力された連続するＮ／４個のデータを後半に接続するものとする。これにより、音源分離装置１０では、新たなＮ個の連続するｘ１（ｎ）、ｘ２（ｎ）を生成し、１つの分析単位として新たな処理を行うものとする。音源分離装置１０では、このような処理対象分析単位の処理を繰り返すようになされているものとする。 In the following description, the sound source separation device 10 uses N = 1024 as an example. When the series of sound source separation processes for the current analysis unit finishes, the device shifts the latter 3N/4 samples of x1(n) and x2(n) to the front and appends N/4 newly input consecutive samples at the end. This yields a new set of N consecutive samples of x1(n) and x2(n), which is processed as the next analysis unit. The sound source separation device 10 repeats this processing for each analysis unit.
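
The frame update described above (keep the latter 3N/4 samples, append N/4 new ones) can be sketched as a generator. The function name and the use of plain Python lists are illustrative assumptions; the same routine would be applied to x1(n) and x2(n) separately.

```python
def sliding_frames(samples, frame_len=1024):
    """Yield successive analysis units of frame_len samples, advancing by frame_len // 4."""
    hop = frame_len // 4
    frame = list(samples[:frame_len])
    if len(frame) < frame_len:
        return  # not enough data for even one analysis unit
    yield list(frame)
    pos = frame_len
    while pos + hop <= len(samples):
        # Shift the latter 3N/4 samples to the front, then append N/4 new samples.
        frame = frame[hop:] + list(samples[pos:pos + hop])
        yield list(frame)
        pos += hop
```

With N = 1024 each new analysis unit shares 768 samples with the previous one, so consecutive units overlap by 75 percent.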

なお、分析手段３０に入力されるディジタル音声信号はマイクロフォンが捕捉してアナログ／ディジタル変換されたものに限定されない。例えば、記録媒体などから読み出されたものであっても良いし、他の装置から通信によって与えられたものであっても良い。すなわち、音源分離装置１０において、ｘ１（ｎ）、ｘ２（ｎ）が保持できれば、入力手段２０を省略した構成としても良い。 Note that the digital audio signal input to the analyzing means 30 is not limited to the one obtained by the microphone and analog / digital converted. For example, it may be read from a recording medium or the like, or may be given by communication from another device. That is, in the sound source separation device 10, the input unit 20 may be omitted as long as x1 (n) and x2 (n) can be held.

分析手段３０は、雑音の混在したディジタル音声信号ｘ１（ｎ）、ｘ２（ｎ）が、入力手段２０から与えられると、ｘ１（ｎ）を周波数分析部３１で、ｘ２（ｎ）を周波数分析部３２で、それぞれＦＦＴ（高速フーリエ変換）処理等を行い、その結果を分離手段４０に与える。分析手段３０では、ＦＦＴ処理にあたっては、Ｎ個の連続するｘ１（ｎ）、ｘ２（ｎ）に対し、窓関数をかけるものとする。なお、窓関数ｗ（ｎ）としては、各種の窓関数を適用可能であるが、例えば、以下の（２）式に示すようなハニング窓を適用するようにしても良い。

When the noisy digital audio signals x1(n) and x2(n) are supplied from the input means 20, the analysis means 30 applies FFT (fast Fourier transform) processing or the like to x1(n) in the frequency analysis unit 31 and to x2(n) in the frequency analysis unit 32, and supplies the results to the separation means 40. For the FFT processing, the analysis means 30 applies a window function to the N consecutive samples of x1(n) and x2(n). Various window functions can be used as the window function w(n); for example, a Hanning window as shown in equation (2) below may be applied.
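
Equation (2) is not reproduced in this text. The sketch below assumes the common periodic form of the Hanning window, w(n) = 0.5 − 0.5 cos(2πn/N), and writes out a textbook radix-2 FFT so that it has no dependencies; a practical implementation would more likely call a library FFT. All function names are hypothetical.

```python
import cmath
import math


def hann_window(n_points):
    """Periodic Hann window w(n) = 0.5 - 0.5*cos(2*pi*n/N) (assumed form of equation (2))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_points) for n in range(n_points)]


def fft(values):
    """Textbook recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def analyze_frame(frame):
    """Window one analysis unit and return its complex spectrum D(m), m = 0..N-1."""
    window = hann_window(len(frame))
    return fft([w * x for w, x in zip(window, frame)])
```

Applying `analyze_frame` to one analysis unit of x1(n) and one of x2(n) would give spectra playing the role of D1(m) and D2(m).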

分析手段３０による上述の窓処理は、後述する生成手段６０における分析単位の接続処理を考慮してなされる処理である。ただし、上述の窓関数を適用することは好ましいが、必須ではない。 The window processing described above by the analysis unit 30 is a process performed in consideration of the analysis unit connection processing in the generation unit 60 described later. However, although it is preferable to apply the above window function, it is not essential.

以下では、周波数分析部３１、３２の出力を、それぞれＤ１（ｍ）、Ｄ２（ｍ）と表すものとする。なお、Ｄ１（ｍ）、Ｄ２（ｍ）は複素数である。 Below, the output of the frequency analysis parts 31 and 32 shall be represented as D1 (m) and D2 (m), respectively. D1 (m) and D2 (m) are complex numbers.

なお、分析手段３０における分析方法は、ＦＦＴに限定されず、ＤＦＴ（離散フーリエ変換）などの他の周波数分析方法を適用するようにしても良い。 The analysis method in the analysis means 30 is not limited to FFT, and other frequency analysis methods such as DFT (Discrete Fourier Transform) may be applied.

また、音源分離装置１０が搭載される装置によっては、他の目的の処理装置における分析に関する構成を、この音源分離装置１０の構成として流用するようにしても良い。例えば、当該音源分離装置１０が搭載される装置がＩＰ電話装置の場合には、このような流用が可能である。ＩＰ電話装置の場合、ＩＰパケットのペイロードにはＦＦＴ出力を符号化したものを挿入するが、そのＦＦＴ出力を、上述した分析手段３０の出力として流用することができる。 Further, depending on the device on which the sound source separation device 10 is mounted, the configuration relating to the analysis in the other purpose processing device may be used as the configuration of the sound source separation device 10. For example, when the device on which the sound source separation device 10 is mounted is an IP telephone device, such diversion is possible. In the case of an IP telephone device, an encoded FFT output is inserted into the payload of an IP packet, and the FFT output can be used as the output of the analyzing means 30 described above.

また、後述する分離手段４０の処理では、スペクトルＤ（ｍ）の性質Ｄ（ｍ）＝Ｄ＊（Ｎ−ｍ）（ただし、１≦ｍ≦Ｎ／２−１、Ｄ＊（Ｎ−ｍ）はＤ（Ｎ−ｍ）の共役複素数を表す）から、０≦ｍ≦Ｎ／２の範囲で行えば良い。 In the processing of the separation means 40 described later, because the spectrum D(m) has the property D(m) = D*(N−m) (where 1 ≤ m ≤ N/2−1, and D*(N−m) denotes the complex conjugate of D(N−m)), processing need only be performed over the range 0 ≤ m ≤ N/2.

分離手段４０は、妨害音抑圧部４１及び目的音抑圧部４２を有している。 The separating means 40 includes a disturbance sound suppressing unit 41 and a target sound suppressing unit 42.

妨害音抑圧部４１は、Ｄ１（ｍ）、Ｄ２（ｍ）を利用して、妨害音の成分を抑圧し、目的音の成分が強調されたスペクトルを生成する。そして、目的音抑圧部４２は、Ｄ１（ｍ）、Ｄ２（ｍ）を利用して、目的音の成分を抑圧し、妨害音の成分が強調されたスペクトルを生成する。 The interfering sound suppressing unit 41 uses D1 (m) and D2 (m) to suppress the interfering sound component and generate a spectrum in which the target sound component is emphasized. Then, the target sound suppression unit 42 uses D1 (m) and D2 (m) to suppress the target sound component and generate a spectrum in which the disturbing sound component is emphasized.

次に、妨害音抑圧部４１の構成について説明する。 Next, the configuration of the interference sound suppression unit 41 will be described.

妨害音抑圧部４１は、２つの空間フィルタ４１１、４１２及び最小選択部４１３を有している。 The interference sound suppression unit 41 includes two spatial filters 411 and 412 and a minimum selection unit 413.

空間フィルタ４１１、４１２の抑圧角度は、それぞれ、９０度、−９０度に設定されているものとする。 It is assumed that the suppression angles of the spatial filters 411 and 412 are set to 90 degrees and −90 degrees, respectively.

上述の通り、音源分離装置１０では、目的音は、概ね０度の方向から到来することが想定されているため、妨害音抑圧部４１では、目的音が到来する方向とは異なる方向に、空間フィルタの抑圧角度を向けているが、目的音が到来すると想定される方向に応じて、空間フィルタの数や抑圧角度の組み合わせを変更するようにしても良い。 As described above, since the target sound is assumed to arrive from a direction of approximately 0 degrees in the sound source separation device 10, the interference sound suppression unit 41 has a space in a direction different from the direction in which the target sound arrives. Although the suppression angle of the filter is directed, the number of spatial filters and the combination of suppression angles may be changed according to the direction in which the target sound is expected to arrive.

空間フィルタ４１１の具体的な処理としては、以下の（３）式を用いて、Ｅ１（ｍ）を求める。また、空間フィルタ４１２は、以下の（４）式を用いて、Ｅ２（ｍ）を求める。以下の（３）式、（４）式において、ｆはサンプリング周波数であり、例えば、１６００Ｈｚを適用するようにしても良い。 As a specific process of the spatial filter 411, E1 (m) is obtained using the following equation (3). The spatial filter 412 calculates E2 (m) using the following equation (4). In the following formulas (3) and (4), f is a sampling frequency, and for example, 1600 Hz may be applied.

そして、最小選択部４１３は、以下の（５）式に示すように、空間フィルタ４１１の出力Ｅ１（ｍ）と空間フィルタ４１２の出力Ｅ２（ｍ）の絶対値の最小値Ｍ（ｍ）を、算出してＭ（ｍ）を求める。この出力Ｍ（ｍ）は、目的音の成分を抽出したものとして、最小選択部４１３から除去手段５０に与えられる。

Then, as shown in equation (5) below, the minimum selection unit 413 computes M(m), the smaller of the absolute values of the output E1(m) of the spatial filter 411 and the output E2(m) of the spatial filter 412. This output M(m) is supplied from the minimum selection unit 413 to the removal means 50 as the extracted target sound component.
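
Equations (3) to (5) are not reproduced in this text. The sketch below is one plausible reading, not the patent's formulas: each spatial filter is assumed to be a frequency-domain delay-and-subtract, D2(m) − D1(m)·exp(−j2πf_m τ), with f_m the centre frequency of bin m, and M(m) is taken as the per-bin minimum magnitude as the text describes. The speed of sound, the sign convention, the parameter names, and the helper names are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def spatial_filter(d1, d2, suppression_deg, mic_spacing, sample_rate):
    """Delay-and-subtract in the frequency domain with its null at suppression_deg.

    d1 and d2 hold bins 0..N/2 of the two microphone spectra.
    """
    n_fft = 2 * (len(d1) - 1)
    tau = mic_spacing * math.sin(math.radians(suppression_deg)) / SPEED_OF_SOUND
    out = []
    for m, (a, b) in enumerate(zip(d1, d2)):
        freq = m * sample_rate / n_fft  # centre frequency of bin m in Hz
        out.append(b - a * cmath.exp(-2j * math.pi * freq * tau))
    return out


def min_select(*filtered):
    """Per-bin minimum of the absolute values of several filter outputs."""
    return [min(abs(v) for v in bins) for bins in zip(*filtered)]


def interference_suppress(d1, d2, mic_spacing, sample_rate):
    """M(m): minimum magnitude over spatial filters with nulls at +90 and -90 degrees."""
    e1 = spatial_filter(d1, d2, 90.0, mic_spacing, sample_rate)
    e2 = spatial_filter(d1, d2, -90.0, mic_spacing, sample_rate)
    return min_select(e1, e2)
```

Under these assumptions, a source at +90 degrees is cancelled by the first filter and one at −90 degrees by the second, so the minimum is small for sound from either side, while sound from the front passes through both filters.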

次に、目的音抑圧部４２の構成について説明する。 Next, the configuration of the target sound suppression unit 42 will be described.

目的音抑圧部４２は、３つの空間フィルタ４２１、４２２、４２３及び最小選択部４２４を有している。 The target sound suppression unit 42 includes three spatial filters 421, 422, and 423 and a minimum selection unit 424.

空間フィルタ４２１、４２２、４２３の抑圧角度は、それぞれ、０度、５度、−５度の方向に設定されているものとする。 It is assumed that the suppression angles of the spatial filters 421, 422, and 423 are set in directions of 0 degrees, 5 degrees, and -5 degrees, respectively.

上述の通り、音源分離装置１０では、目的音は、概ね０度の方向から到来することが想定されているため、目的音抑圧部４２では、空間フィルタ４２１の抑圧角度を０度に設定し、０度の方向から、わずか（±５度程度）にずらした方向に、空間フィルタ４２２及び空間フィルタ４２３の抑圧角度を設定している。音源分離装置１０では、上述の例のように、目的音が到来すると想定される方向を中心として、左右対称の対になるように空間フィルタの抑圧角度を設定することが望ましい。 As described above, in the sound source separation device 10, since the target sound is assumed to come from a direction of approximately 0 degrees, the target sound suppression unit 42 sets the suppression angle of the spatial filter 421 to 0 degrees, The suppression angles of the spatial filter 422 and the spatial filter 423 are set in a direction slightly shifted (about ± 5 degrees) from the 0 degree direction. In the sound source separation device 10, it is desirable to set the suppression angle of the spatial filter so as to form a symmetric pair around the direction in which the target sound is expected to arrive, as in the above example.

目的音抑圧部４２では、３つの空間フィルタを用いているが、目的音が到来すると想定される方向（音源分離装置１０では０度）を含む所定の範囲内（音源分離装置１０では−５度〜＋５度の範囲内）で、複数の空間フィルタにより、それぞれ異なる抑圧角度が向けられていれば、空間フィルタの数やその抑圧角度の組み合わせは限定されないものである。 The target sound suppression unit 42 uses three spatial filters, but within a predetermined range including the direction in which the target sound is expected to arrive (0 degree in the sound source separation apparatus 10) (-5 degrees in the sound source separation apparatus 10). As long as different suppression angles are directed by a plurality of spatial filters, the number of spatial filters and combinations of the suppression angles are not limited.

空間フィルタ４２１具体的な処理としては、以下の（６）式を用いて、Ｆ０（ｍ）を求める。 As specific processing of the spatial filter 421, F0 (m) is obtained using the following equation (6).

空間フィルタ４２２は、以下の（７）式を用いて、Ｆ１（ｍ）を求める。なお、（７）式において、τ₅は抑圧角度＝＋５度に相当する遅延である。 The spatial filter 422 calculates F1(m) using equation (7) below. In equation (7), τ₅ is the delay corresponding to a suppression angle of +5 degrees.

空間フィルタ４２３は、以下の（８）式を用いて、Ｆ２（ｍ）を求める。なお、（８）式において、τ₋₅は抑圧角度＝−５度に相当する遅延である。 The spatial filter 423 calculates F2(m) using equation (8) below. In equation (8), τ₋₅ is the delay corresponding to a suppression angle of −5 degrees.

そして、最小選択部４２４は、以下の（９）式に示すように、Ｆ０（ｍ）、Ｆ１（ｍ）、Ｆ２（ｍ）の絶対値の最小値Ｎ（ｍ）を算出する。この出力Ｎ（ｍ）は、妨害音の成分を抽出したものとして、最小選択部４２４から除去手段５０に与えられる。

Then, as shown in equation (9) below, the minimum selection unit 424 computes N(m), the minimum of the absolute values of F0(m), F1(m), and F2(m). This output N(m) is supplied from the minimum selection unit 424 to the removal means 50 as the extracted interfering sound component.
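
Equations (6) to (9) are not reproduced in this text. As a rough single-bin illustration of why taking the minimum over filters at 0, +5, and −5 degrees may help when the target drifts, the sketch below assumes each filter is a delay-and-subtract of the form D2 − D1·exp(−j2πfτ). The microphone spacing, frequency, speed of sound, and function names are all illustrative assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def delay_for(angle_deg, mic_spacing):
    """Delay d*sin(theta)/c corresponding to a given angle."""
    return mic_spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def residual(source_deg, suppression_degs, mic_spacing, freq_hz):
    """Smallest filter output magnitude over the given suppression angles, for one bin."""
    d1 = 1.0 + 0j
    d2 = cmath.exp(-2j * math.pi * freq_hz * delay_for(source_deg, mic_spacing))
    outputs = [
        d2 - d1 * cmath.exp(-2j * math.pi * freq_hz * delay_for(angle, mic_spacing))
        for angle in suppression_degs
    ]
    return min(abs(v) for v in outputs)


if __name__ == "__main__":
    for drift in (0.0, 2.0, 4.0):
        single = residual(drift, [0.0], 0.03, 1000.0)
        triple = residual(drift, [0.0, 5.0, -5.0], 0.03, 1000.0)
        print(f"target at {drift:+.0f} deg: single null {single:.4f}, min of three {triple:.4f}")
```

Because the 0 degree filter is always among the candidates, the minimum over three filters can never exceed the single-filter output; when the target drifts toward +5 or −5 degrees, the neighbouring filter takes over and less of the target leaks into N(m).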

Next, the configuration of the removing means 50 will be described.

The removing means 50 uses M(m) and N(m) supplied from the separating means 40 to obtain an interfering sound removal spectrum H(m) for removing the interfering sound in D1(m), and supplies it to the generating means 60.

An example of the interfering sound removal spectrum H(m) obtained by the removing means 50 is described below.

The removing means 50 obtains S(m) from the output M(m) of the minimum selection unit 413 and the output N(m) of the minimum selection unit 424 using the following equation (10). For S(m) obtained in the range 0 ≤ m ≤ N/2, the removing means 50 then uses the following equation (11) to obtain the interfering sound removal spectrum H(m), which is its output. In equations (10) and (11), D1 may be replaced with D2.

H(m) = S(m) D1(m) … (11)
The removing means 50 also uses the property H(m) = H*(N−m) (where N/2+1 ≤ m ≤ N−1) to obtain the interfering sound removal spectrum H(m) over the range 0 ≤ m ≤ N−1, and supplies it to the generating means 60.
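The extension of H(m) from the half range to the full range can be sketched as below. This is a minimal illustration assuming an even FFT length N and a NumPy array holding H(m) for 0 ≤ m ≤ N/2; the function name `complete_spectrum` is hypothetical.

```python
import numpy as np

def complete_spectrum(H_half, n_fft):
    """Extend H(m), 0 <= m <= N/2, to 0 <= m <= N-1 using H(m) = conj(H(N - m))."""
    H = np.empty(n_fft, dtype=complex)
    H[: n_fft // 2 + 1] = H_half
    m = np.arange(n_fft // 2 + 1, n_fft)  # bins N/2+1 .. N-1
    H[m] = np.conj(H[n_fft - m])          # mirrored bins N/2-1 .. 1
    return H

# Check against the spectrum of a real signal (illustrative data only).
x = np.random.default_rng(1).standard_normal(8)
X_full = np.fft.fft(x)
H_full = complete_spectrum(X_full[:5], 8)
print(np.allclose(H_full, X_full))
```

Because the completed spectrum is conjugate-symmetric, its N-point inverse FFT is real-valued. In NumPy the same result can be obtained directly from the half spectrum with `np.fft.irfft(H_half, n_fft)`.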

The generating means 60 applies an N-point inverse FFT to the interfering sound removal spectrum H(m) to obtain a sound source separation signal h(n). Then, as shown in the following equation (12), the generating means 60 adds the current sound source separation signal h(n) and the latter 3N/4 samples of the sound source separation signal h'(n) of the immediately preceding analysis unit to obtain the output y(n).

y(n) = h(n) + h'(n + N/4) … (12)
The sound source separation device 10 has been described with an example in which the above processing is performed while shifting by N/4 samples so that data (samples) overlap between successive analysis units. Because this overlap serves only to connect the waveforms smoothly, it is not essential, and the data may instead be processed N samples at a time. When processing while shifting by N/4 samples, it is desirable that the time required for the above series of processing from the analyzing means 30 to the generating means 60 be at most NT/4 per analysis unit.
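Equation (12) can be written out as a short sketch. It assumes real-valued frames of length N held in NumPy arrays, with N divisible by 4. The text does not say what happens for the last N/4 samples, where h'(n + N/4) falls outside the previous frame, so leaving h(n) unchanged there is an assumption. The function name `connect_frames` is hypothetical.

```python
import numpy as np

def connect_frames(h, h_prev):
    """y(n) = h(n) + h'(n + N/4) for 0 <= n < 3N/4; y(n) = h(n) otherwise (assumed)."""
    n = h.size
    shift = n // 4
    y = np.array(h, dtype=float)      # copy of the current frame
    y[: n - shift] += h_prev[shift:]  # add the latter 3N/4 samples of h'
    return y

h_prev = np.arange(8, dtype=float)  # previous frame h'(n), illustrative values
h = np.ones(8)                      # current frame h(n)
print(connect_frames(h, h_prev))
```

With these inputs the first six samples are 1 plus the values 2 through 7 of the previous frame, and the last two samples stay at 1.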

(A-2) Effects of the First Embodiment
According to the first embodiment, the following effects can be achieved.

In the sound source separation device 10, the three spatial filters of the target sound suppression unit 42 are given directivities of 0 degrees, 5 degrees, and −5 degrees, respectively, and the minimum selection unit 424 applies to N(m) the output with the smallest absolute value among the three spatial filter outputs. That is, when the target sound arrives from near the 0-degree direction, the output of the spatial filter 421 has the smallest absolute value for components from near 0 degrees, so that output is reflected in N(m). When the target sound instead arrives from near the 5-degree direction, the output of the spatial filter 422 is reflected in N(m) for components from near 5 degrees. By providing a group of spatial filters that are selected and applied according to the arrival direction of the target sound, the target sound suppression unit 42 thus prevents target sound components from leaking into N(m) even when the arrival direction shifts slightly, and so prevents degradation of the sound quality output by the sound source separation device 10.

Accordingly, by configuring the target sound suppression unit 42 with a group of spatial filters that are selected and applied according to the arrival direction of the target sound, the quality of the separated target sound can be improved, making it easier to hear, even when the arrival direction of the target sound deviates.

(B) Second Embodiment
A second embodiment of the sound source separation device, program, and method according to the present invention is described in detail below with reference to the drawings.

(B-1) Configuration and Operation of the Second Embodiment
FIG. 2 is a block diagram showing the overall configuration of the sound source separation device 10A of the second embodiment.

The sound source separation device 10 of the first embodiment has one each of the input means 20, the analyzing means 30, and the separating means 40, whereas the sound source separation device 10A of the second embodiment differs in having a plurality of sets of the input means 20, the analyzing means 30, and the separating means 40. The sound source separation device 10A also differs from the first embodiment in that the removing means 50 is replaced with a removing means 50A.

As shown in FIG. 2, the sound source separation device 10A has two sets of the input means 20, the analyzing means 30, and the separating means 40. That is, it has two input means 20 (20-1, 20-2), two analyzing means 30 (30-1, 30-2), and two separating means 40 (40-1, 40-2). The input means 20-1 has two microphones 21-1 and 22-1, and the input means 20-2 likewise has two microphones 21-2 and 22-2.

The processing of the input means 20-1 and 20-2, the analyzing means 30-1 and 30-2, and the separating means 40-1 and 40-2 is the same as that of the input means 20, the analyzing means 30, and the separating means 40 of the first embodiment, so a detailed description is omitted.

In the following, the output of the interfering sound suppression unit in the separating means 40-1 is denoted MA(m), and the output of its target sound suppression unit is denoted NA(m). Likewise, the output of the interfering sound suppression unit in the separating means 40-2 is denoted MB(m), and the output of its target sound suppression unit is denoted NB(m). The signal from the microphone 21-1 after processing by the analyzing means 30-1 is denoted D1(m).

Next, the configuration of the removing means 50A will be described.

The removing means 50A uses MA(m) and NA(m) supplied from the separating means 40-1 and MB(m) and NB(m) supplied from the separating means 40-2 to obtain an interfering sound removal spectrum H(m) for removing the interfering sound in D1(m), and supplies it to the generating means 60.

An example of the interfering sound removal spectrum H(m) obtained by the removing means 50A is described below.

The removing means 50A applies MA(m) and NA(m) supplied from the separating means 40-1 and MB(m) and NB(m) supplied from the separating means 40-2 to the following equation (13) to obtain S(m). For S(m) obtained in the range 0 ≤ m ≤ N/2, the removing means 50A then uses the following equation (14) to obtain the interfering sound removal spectrum H(m), which is its output. In equations (13) and (14), D1 may be replaced with a spectrum based on the signal from another microphone.

H(m) = S(m) D1(m) … (14)
The removing means 50A also uses the property H(m) = H*(N−m) (where N/2+1 ≤ m ≤ N−1) to obtain the interfering sound removal spectrum H(m) over the range 0 ≤ m ≤ N−1, and supplies it to the generating means 60.

The processing of the generating means 60 is the same as in the first embodiment, so its description is omitted.

(B-2) Effects of the Second Embodiment
The sound source separation device 10A of the second embodiment can achieve the same effects as the first embodiment even when the input means use more than two microphones.

(C) Other Embodiments
The present invention is not limited to the embodiments described above, and modified embodiments such as those exemplified below are also possible.

(C-1) In the first embodiment, depending on the application of the sound source separation device 10, the generating means 60 may be omitted, or a generation unit of another device may be used in its place. For example, when the sound source separation device is used in a speech recognition device, the generating means 60 can be omitted by using the separated spectrum H(m) as the feature quantity for recognition. Similarly, when the sound source separation device is used in an IP telephone, the IP telephone already has means corresponding to a generation unit, so that generation unit may be used instead.

(C-2) The second embodiment has been described with an example using four microphones 21-1, 22-1, 21-2, and 22-2, but the device may instead be configured with three microphones by sharing one microphone between the input means 20-1 and the input means 20-2. In this case, the processing of the signal received by the shared microphone can be performed once for both input means, which reduces the amount of computation. When the number of microphones is increased further, microphones may likewise be shared between input means.
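The saving from a shared microphone can be illustrated with a small sketch in which each distinct microphone signal is transformed only once and the resulting spectrum is reused by every input means that refers to it. The microphone labels, the dictionary layout, and the function name `analyze_once` are hypothetical and not taken from the specification.

```python
import numpy as np

def analyze_once(signals, pairs):
    """FFT each distinct microphone once; return spectrum pairs and the FFT count."""
    spectra = {}
    for pair in pairs:
        for mic in pair:
            if mic not in spectra:  # a shared microphone is analyzed only once
                spectra[mic] = np.fft.fft(signals[mic])
    return [(spectra[a], spectra[b]) for a, b in pairs], len(spectra)

rng = np.random.default_rng(0)
signals = {name: rng.standard_normal(512) for name in ("mic_a", "mic_b", "mic_c")}
# Two input means that share "mic_b": three FFTs instead of four.
pairs = [("mic_a", "mic_b"), ("mic_b", "mic_c")]
spectrum_pairs, fft_count = analyze_once(signals, pairs)
print(fft_count)
```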

Description of symbols: 10 … sound source separation device, 20 … input means, 21, 22 … microphones, 30 … analyzing means, 31, 32 … frequency analysis units, 40 … separating means, 41 … interfering sound suppression unit, 411, 412 … spatial filters, 413 … minimum selection unit, 42 … target sound suppression unit, 421, 422, 423 … spatial filters, 424 … minimum selection unit, 50 … removing means, 60 … generating means.

Claims

A sound source separation device comprising:
target sound suppression spectrum generating means for generating, from the spectra of the received sound signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that direct their component-suppressing directivities in mutually different directions within a predetermined range including an assumed arrival direction from which the target sound is expected to arrive;
target sound dominant spectrum generating means for generating, from the spectra of the received sound signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed; and
separating means for separating the interfering sound component and the target sound component of the received sound signals using the target sound suppression spectrum and the target sound dominant spectrum,
wherein the target sound suppression spectrum generating means applies, to each component of the target sound suppression spectrum, the value having the smallest absolute value among the processing results of the target sound suppression units.

The sound source separation device according to claim 1, comprising a plurality of spectrum generation processing units each having the target sound suppression spectrum generating means and the target sound dominant spectrum generating means,
wherein the separating means separates the interfering sound component and the target sound component of the received sound signals using the target sound suppression spectrum and the target sound dominant spectrum generated by each of the spectrum generation processing units.

A sound source separation program causing a computer installed in a sound source separation device to function as:
target sound suppression spectrum generating means for generating, from the spectra of the received sound signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that direct their component-suppressing directivities in mutually different directions within a predetermined range including an assumed arrival direction from which the target sound is expected to arrive;
target sound dominant spectrum generating means for generating, from the spectra of the received sound signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed; and
separating means for separating the interfering sound component and the target sound component of the received sound signals using the target sound suppression spectrum and the target sound dominant spectrum,
wherein the target sound suppression spectrum generating means applies, to each component of the target sound suppression spectrum, the value having the smallest absolute value among the processing results of the target sound suppression units.

A sound source separation method performed by a sound source separation device
having target sound suppression spectrum generating means, target sound dominant spectrum generating means, and separating means, wherein:
the target sound suppression spectrum generating means generates, from the spectra of the received sound signals of two microphones among a plurality of microphones arranged at intervals, a target sound suppression spectrum in which the target sound component is suppressed, using a plurality of target sound suppression units that direct their component-suppressing directivities in mutually different directions within a predetermined range including an assumed arrival direction from which the target sound is expected to arrive;
the target sound dominant spectrum generating means generates, from the spectra of the received sound signals, a target sound dominant spectrum in which interfering sound arriving from any direction outside the predetermined range is suppressed, and the separating means separates the interfering sound component and the target sound component of the received sound signals using the target sound suppression spectrum and the target sound dominant spectrum; and
the target sound suppression spectrum generating means applies, to each component of the target sound suppression spectrum, the value having the smallest absolute value among the processing results of the target sound suppression units.