JP2015070315A - Sound source separation device, sound source separation method, and sound source separation program

Info

Publication number: JP2015070315A
Application number: JP2013200347A
Authority: JP
Inventors: Kazuhiro Katagiri (片桐一浩)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2013-09-26
Filing date: 2013-09-26
Publication date: 2015-04-13
Anticipated expiration: 2033-09-26
Also published as: JP6197534B2

Abstract

PROBLEM TO BE SOLVED: To separate a target sound arriving from a sound source in a specific direction, regardless of the position of the sound sources generating non-target sounds.
SOLUTION: A sound source separation device 100 separates a target sound arriving from a sound source in a target direction from non-target sounds arriving from sound sources in any other direction. The sound source separation device 100 includes: unidirectional forming units 31 and 32, which use acoustic signals $X_1$ and $X_2$ output by microphones M1 and M2 to generate two unidirectional acoustic signals $A_R$ and $A_L$ whose blind spots point in opposite directions; non-target signal extraction units 41 and 42, which perform spectral subtraction using the two generated unidirectional acoustic signals and extract signals $N_{UDL}$ and $N_{UDR}$ of the non-target sounds arriving from the blind spots; and a target signal extraction unit 51, which performs spectral subtraction using the extracted non-target sound signals $N_{UDL}$ and $N_{UDR}$ and extracts a signal $Y$ of the target sound.

Description

The present invention relates to a sound source separation device, a sound source separation method, and a sound source separation program, and is particularly applicable to separating and extracting only the sound source in a specific direction in an environment where a plurality of sound sources exist.

Conventionally, a beamformer (hereinafter, "BF") using a microphone array has been used as a technique for separating and extracting only the sound signal (hereinafter, the target sound) from a specific direction (hereinafter, the target direction) in an environment where a plurality of sound sources exist. A BF forms directivity by exploiting the time difference between the signals reaching each microphone (see Non-Patent Document 1).
BFs are broadly divided into two types: addition type and subtraction type. The subtraction-type BF in particular has the advantage that it can form directivity with fewer microphones than the addition-type BF. FIG. 6 is a block diagram showing the configuration of a subtraction-type BF with two microphones.

The subtraction-type BF first uses a delay unit to calculate the difference between the times at which the sound signal from a given sound source arrives at each microphone. It then aligns the phase of the sound signal from that source by delaying the signal arriving at one of the microphones by the calculated time difference. Hereinafter, the direction in which a sound source exists is called the "sound source direction".

The time difference is calculated by the following equation (1).
$$\tau = \frac{d \sin\theta}{c} \tag{1}$$
Here, $d$ is the distance between the microphones (m), $c$ is the speed of sound (m/s), and $\tau$ is the delay time (s). $\theta$ is the angle (°) between the perpendicular to the straight line connecting the microphones and the sound source direction.
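As a quick numeric illustration of equation (1), the sketch below computes the delay for a given microphone spacing and source angle. The function name `arrival_delay`, the 3 cm spacing, and the default sound speed of 343 m/s are illustrative assumptions, not values taken from the publication.

```python
import math

def arrival_delay(d, theta_deg, c=343.0):
    """Delay tau = d * sin(theta) / c in seconds (equation (1)).

    d: microphone spacing in metres.
    theta_deg: angle in degrees between the perpendicular to the microphone
        axis and the sound source direction.
    c: speed of sound in m/s (343 m/s is an assumed room-temperature value).
    """
    return d * math.sin(math.radians(theta_deg)) / c

# Assumed example spacing of 3 cm.
tau_front = arrival_delay(0.03, 0.0)   # source straight ahead: no delay
tau_side = arrival_delay(0.03, 90.0)   # source on the microphone axis: d / c
```

With these assumed values the largest possible delay is roughly 87 microseconds, which gives a sense of how small the time differences exploited by a two-microphone BF are.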

When the sound source lies on the first microphone M1 side of the midpoint between the first microphone M1 and the second microphone M2, the delay is applied to the input $x_1(t)$ of the first microphone M1. The delayed input of the first microphone M1 is then $x_1(t-\tau)$. The subtraction-type BF next performs subtraction with a subtractor according to the following equation (2).
$$a(t) = x_2(t) - x_1(t-\tau) \tag{2}$$

The subtraction can be performed in the frequency domain in the same way. It is known that the Fourier transform of a signal delayed by $\tau$ on the time axis equals the Fourier transform of the original signal multiplied by $\exp(-j\omega\tau)$. Equation (2) then becomes the following equation (3).
$$A(\omega) = X_2(\omega) - \exp(-j\omega\tau)\,X_1(\omega) \tag{3}$$
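The following minimal sketch evaluates equation (3) for a single frequency bin, to show that a wave arriving with exactly the delay $\tau$ is cancelled while a frontal wave is not. The function name `subtractive_bf`, the 1 kHz test frequency, and the 3 cm spacing are assumptions made for illustration only.

```python
import cmath
import math

def subtractive_bf(X1, X2, omega, tau):
    """Equation (3) for one frequency bin: A = X2 - exp(-j*omega*tau) * X1."""
    return X2 - cmath.exp(-1j * omega * tau) * X1

omega = 2 * math.pi * 1000.0   # assumed test frequency of 1 kHz, in rad/s
tau = 0.03 / 343.0             # assumed delay: 3 cm spacing, source on the axis

# A wave that reaches M1 first and M2 tau seconds later: X2 = exp(-j*omega*tau) * X1.
X1 = 1.0 + 0.0j
X2 = cmath.exp(-1j * omega * tau) * X1
null_output = subtractive_bf(X1, X2, omega, tau)

# A frontal wave reaches both microphones at once (X1 == X2) and is not cancelled.
front_output = subtractive_bf(1.0 + 0.0j, 1.0 + 0.0j, omega, tau)
```

The frontal output has magnitude $2\,|\sin(\omega\tau/2)|$, so the response to an uncancelled wave depends on frequency.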

When $\theta = \pm\pi/2$, the resulting directivity is a cardioid unidirectional pattern, as shown in FIG. 7(a). When $\theta = 0$ or $\pi$, it is a figure-eight bidirectional pattern, as shown in FIG. 7(b). Here, a filter that forms unidirectionality from the input signals is called a "unidirectional filter", and a filter that forms bidirectionality is called a "bidirectional filter".
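To make the two patterns concrete, the sketch below evaluates the magnitude of equation (3) for a unit plane wave arriving from several directions. It assumes a sign convention in which a positive angle means the wave reaches M1 first; the name `bf_response` and the default spacing, frequency, and sound speed are likewise illustrative assumptions.

```python
import cmath
import math

def bf_response(source_deg, null_deg, d=0.03, freq=1000.0, c=343.0):
    """Magnitude of equation (3) for a unit plane wave from source_deg
    when the delay tau is chosen for the angle null_deg."""
    omega = 2 * math.pi * freq
    tau_source = d * math.sin(math.radians(source_deg)) / c
    tau_null = d * math.sin(math.radians(null_deg)) / c
    X1 = 1.0 + 0.0j                            # the wave as seen at M1
    X2 = cmath.exp(-1j * omega * tau_source)   # the same wave, tau_source later at M2
    return abs(X2 - cmath.exp(-1j * omega * tau_null) * X1)

# theta = +pi/2: single null at +90 degrees, largest response at -90 degrees.
cardioid = [bf_response(a, 90.0) for a in (-90.0, 0.0, 90.0)]
# theta = 0: nulls at 0 and 180 degrees, equal lobes at +/-90 degrees.
figure8 = [bf_response(a, 0.0) for a in (-90.0, 0.0, 90.0, 180.0)]
```

Sweeping `source_deg` over a full circle and plotting the result would trace the kind of pattern the text attributes to FIG. 7(a) and FIG. 7(b).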

Building on these techniques, methods for forming stronger directivity in the target direction have been developed. For example, Patent Document 1 proposes forming directivity in the region where two unidirectional patterns, formed with their blind spots directed to the left and right of the target direction, overlap (the hatched portion in FIG. 7(c)). This method builds an adaptive filter from the difference between the outputs of the left and right unidirectional filters and extracts the components common to both directivities.

Patent Document 2 proposes forming strong directivity in the target direction by using two kinds of directivity, unidirectional and bidirectional. This method first forms unidirectional patterns whose blind spots are directed to the left and right of the target direction, and a bidirectional pattern whose blind spot is directed at the target direction. It then selects whichever of the two unidirectional filter outputs has the larger power and applies spectral subtraction (hereinafter, "SS") of the bidirectional filter output from the selected unidirectional filter output. This suppresses sounds from directions other than the target direction (hereinafter, non-target sounds) and extracts the target sound.
With these existing techniques, strong directivity can thus be formed in the target direction.

Patent Document 1: Japanese Patent Laid-Open No. 11-205900
Patent Document 2: Japanese Patent Laid-Open No. 2008-131183 (JP 2008-131183 A)

Non-Patent Document 1: Futoshi Asano, "音のアレイ信号処理" (Sound Array Signal Processing), Corona Publishing, published February 25, 2011, pp. 70-79

However, as shown in FIG. 3, when the microphones M1 and M2 are placed midway between two loudspeakers (sound sources) and the same sound (a non-target sound) is played from the left and right loudspeakers L and R simultaneously, these existing techniques cannot separate and extract only the voice of the talker located in front (the target sound).

In this situation, the two microphones M1 and M2 pick up exactly the same sound, so the left and right unidirectional outputs contain, in equal measure, not only the target sound component but also the sound played by the loudspeakers L and R. Moreover, because the loudspeaker sound, like the target sound, has the same phase at both microphones, the bidirectional output contains neither the target sound nor the loudspeaker sound.

That is, with the technique of Patent Document 1, the loudspeaker sound is contained in both unidirectional outputs, so their difference is zero, no adaptation (learning) takes place, and the loudspeaker sound is output unchanged together with the target sound.
With the technique of Patent Document 2, the loudspeaker sound is contained in whichever of the left and right unidirectional filter outputs is selected. In addition, because there is no time difference between the loudspeaker sound picked up by the left and right microphones, the bidirectional pattern cannot extract the loudspeaker sound. The subsequent spectral subtraction (SS) therefore cannot suppress it.

The present invention has been made in view of the above problem. Its object is to provide a sound source separation device, a sound source separation method, and a sound source separation program that can separate a target sound arriving from a sound source in a specific direction regardless of the position of the sound sources generating non-target sounds.

To solve the above problem, a sound source separation device according to the present invention is a sound source separation device (100) that, among sounds reaching a plurality of microphones located in a plane whose normal direction is the target direction, separates a target sound arriving from a sound source in the target direction from non-target sounds arriving from sound sources in other directions. The device comprises: unidirectional forming units (31, 32) that use acoustic signals ($X_1$, $X_2$) output by any two of the plurality of microphones to generate two acoustic signals ($A_R$, $A_L$) having unidirectionality whose blind spot is directed along the line connecting the two microphones, such that the two blind spots point in opposite directions; non-target signal extraction units (41, 42) that perform spectral subtraction, using the two unidirectional acoustic signals generated by the unidirectional forming units, on an acoustic signal ($X_1$, $X_2$) output by any one of the plurality of microphones or on a signal ($X_{DS}$) obtained by averaging the acoustic signals output by two or more microphones, thereby extracting signals ($N_{UDL}$, $N_{UDR}$) of the non-target sounds arriving from the respective blind spots; and a target signal extraction unit (51) that performs spectral subtraction, using the non-target sound signals ($N_{UDL}$, $N_{UDR}$) extracted by the non-target signal extraction units, on an acoustic signal ($X_1$, $X_2$) output by any one of the plurality of microphones or on a signal ($X_{DS}$) obtained by averaging the acoustic signals output by two or more microphones, thereby extracting a signal ($Y$) of the target sound. The reference signs in parentheses are examples.

According to the present invention, a target sound arriving from a sound source in a specific direction can be separated regardless of the position of the sound sources generating non-target sounds.

FIG. 1 is a block diagram showing the configuration of the sound source separation device according to the embodiment.
FIG. 2 illustrates how the sound source separation device according to the embodiment forms sharp directivity in the target direction: FIG. 2(a) illustrates the processing of the unidirectional forming units, FIG. 2(b) the processing of the non-target signal extraction units, and FIG. 2(c) the processing of the target signal extraction unit.
FIG. 3 shows an example of a usage situation of the sound source separation device according to the embodiment.
FIG. 4 illustrates the results of a performance evaluation experiment on the sound source separation device according to the embodiment and a conventional sound source separation device based on bidirectionality.
FIG. 5 illustrates the configuration of a sound source separation device according to a modification: FIG. 5(a) shows the arrangement of the microphones of the device according to the modification, and FIG. 5(b) shows example combinations of microphones for forming unidirectionality.
FIG. 6 is a block diagram showing the configuration of a conventional subtraction-type BF with two microphones.
FIG. 7 shows the directional characteristics formed by a conventional subtraction-type BF with two microphones.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings as appropriate.
Each figure is only a schematic, drawn to the extent needed for the invention to be fully understood. The present invention is therefore not limited to the illustrated examples. In the drawings referred to, the dimensions of the members constituting the invention may be exaggerated for clarity. Common or similar components are given the same reference signs across the figures, and duplicate descriptions of them are omitted.

[Embodiment]
<< Configuration of the Sound Source Separation Device According to the Embodiment >>
The configuration of the sound source separation device according to the embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the sound source separation device 100 according to the embodiment.
The sound source separation device 100 separates, among the sounds arriving from a plurality of surrounding sound sources, the target sound arriving from a sound source in a specific direction (the target direction) from the non-target sounds arriving from sound sources in other directions.

The sound source separation device 100 is mounted, for example, on a multi-function mobile phone (smartphone) with a videophone function, where it can improve call quality when the videophone function is used for conversation in a noisy environment. The device is also effective in video conference systems, intercoms, and the like. Mounted on a car navigation system with a voice search function, it prevents misrecognition of the search query when information is searched for by voice in a noisy environment.

As shown in FIG. 1, the sound source separation device 100 according to the embodiment includes a first microphone M1, a second microphone M2, signal input units 11 and 12, a signal addition unit 21, unidirectional forming units 31 and 32, non-target signal extraction units 41 and 42, and a target signal extraction unit 51.
The sound source separation device 100 may be built as dedicated hardware from a combination of discrete components, a semiconductor chip, or the like, or may be realized by program execution on a CPU (Central Processing Unit). In other words, the way the sound source separation device 100 is implemented is not particularly limited. Each component is described in detail below.

(Microphones)
The first microphone M1 and the second microphone M2 convert sound (sound waves) into acoustic signals. Here, the forward perpendicular to the line connecting the two microphones M1 and M2 is taken as 0°, with clockwise angles positive and counterclockwise angles negative. The front (0°) is assumed to be the target direction. That is, the first microphone M1 and the second microphone M2 are arranged horizontally, and the perpendicular to the axis connecting them is parallel to the target direction. In other words, the target direction is the normal to the vertical plane containing the axis connecting the first microphone M1 and the second microphone M2.

The microphones M1 and M2 are omnidirectional. They therefore pick up non-target sounds arriving from other directions in addition to the target sound arriving from the target direction. The microphone M1 converts the target and non-target sounds it picks up into an acoustic signal $x_1$ and outputs $x_1$ to the signal input unit 11. Likewise, the microphone M2 converts the target and non-target sounds it picks up into an acoustic signal $x_2$ and outputs $x_2$ to the signal input unit 12.

(Signal input units)
The signal input units 11 and 12 convert the acoustic signals $x_1$ and $x_2$ picked up and output by the microphones M1 and M2 from analog to digital. They perform the analog-to-digital conversion by, for example, sampling at twice the maximum audio frequency.
The signal input units 11 and 12 also perform frequency analysis of the converted digital signals, for example by using the fast Fourier transform to convert them from the time domain to the frequency domain. In the following, the amplitude component of an acoustic signal expressed in the frequency domain is called the "amplitude spectrum" and is denoted by an uppercase letter. The signal input units 11 and 12 output the amplitude spectra $X_1$ and $X_2$ of the acoustic signals to the signal addition unit 21 and the unidirectional forming units 31 and 32.
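The frequency analysis step can be sketched as follows. This is only an illustration: the naive `dft` function stands in for a fast Fourier transform so that the example needs no external library, and the 8 kHz sampling rate, 64-sample frame, and 1 kHz test tone are assumed values, not parameters given in the publication.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); a stand-in for an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def amplitude_spectrum(frame):
    """Magnitude of each frequency bin of one time frame."""
    return [abs(v) for v in dft(frame)]

fs = 8000   # assumed sampling rate in Hz (twice an assumed 4 kHz maximum frequency)
n = 64      # assumed frame length in samples
frame = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(n)]   # 1 kHz tone

amp = amplitude_spectrum(frame)
peak_bin = max(range(n // 2), key=lambda k: amp[k])   # strongest bin below fs / 2
peak_freq = peak_bin * fs / n                          # bin index converted to Hz
```

Note that the delay multiplication in equation (3) acts on the complex spectrum, so an implementation would presumably keep the complex output of `dft` for the unidirectional forming units and take magnitudes afterwards.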

(Signal addition unit)
The signal addition unit 21 adds the amplitude spectra $X_1$ and $X_2$ of the acoustic signals input from the signal input units 11 and 12 and then halves the sum, which emphasizes the target sound component. The signal addition unit 21 outputs the resulting emphasized amplitude spectrum $X_{DS}$ to the non-target signal extraction units 41 and 42 and to the target signal extraction unit 51.
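The averaging performed by the signal addition unit can be written per frequency bin as in the sketch below. The helper name `average_spectra` and the sample values are hypothetical; the spectra are represented as plain lists of per-bin amplitudes.

```python
def average_spectra(X1, X2):
    """Per-bin mean of two amplitude spectra: X_DS = (X1 + X2) / 2."""
    if len(X1) != len(X2):
        raise ValueError("spectra must have the same number of bins")
    return [(a + b) / 2.0 for a, b in zip(X1, X2)]

# Made-up amplitude spectra with three frequency bins each.
X_DS = average_spectra([1.0, 4.0, 0.5], [3.0, 2.0, 0.5])
```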

(Unidirectional forming units)
The unidirectional forming units 31 and 32 use the amplitude spectra $X_1$ and $X_2$ of the acoustic signals from the signal input units 11 and 12 to form amplitude spectra $A_R$ and $A_L$ having unidirectionality whose blind spots are directed at ±90° with respect to the target direction. $A_R$ and $A_L$ are calculated according to equations (1) and (3) described above, with $\theta = +\pi/2$ and $\theta = -\pi/2$ respectively. The unidirectional forming units 31 and 32 also apply a per-frequency gain correction to the amplitude spectra $A_R$ and $A_L$.
$$\tau = \frac{d \sin\theta}{c} \tag{1}$$
$$A(\omega) = X_2(\omega) - \exp(-j\omega\tau)\,X_1(\omega) \tag{3}$$

Specifically, the unidirectional forming unit 31 sets $\theta = +\pi/2$, multiplies the amplitude spectrum $X_1$ of the acoustic signal output from the signal input unit 11 by $e^{-j\omega\tau}$, and then performs the subtraction. This forms a unidirectional amplitude spectrum $A_R$ whose blind spot is directed at +90° with respect to the target direction, that is, an amplitude spectrum $A_R$ whose unidirectionality is formed toward −90° (see FIG. 2(a)). The amplitude spectrum $A_R$ contains the target sound arriving from the target direction and the non-target sound arriving from the −90° direction, the side on which the unidirectionality is formed.

Likewise, the unidirectional forming unit 32 sets $\theta = -\pi/2$, multiplies the amplitude spectrum $X_2$ of the acoustic signal output from the signal input unit 12 by $e^{-j\omega\tau}$, and then performs the subtraction. This forms a unidirectional amplitude spectrum $A_L$ whose blind spot is directed at −90° with respect to the target direction, that is, an amplitude spectrum $A_L$ whose unidirectionality is formed toward +90° (see FIG. 2(a)). The amplitude spectrum $A_L$ contains the target sound arriving from the target direction and the non-target sound arriving from the +90° direction, the side on which the unidirectionality is formed.
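A minimal sketch of the two unidirectional forming units for one frequency bin is given below. Several points are assumptions rather than statements from the publication: a wave from +90° is taken to reach M1 first, the delay magnitude $d/c$ is applied to the microphone nearer the blind spot, and the gain correction (whose form the text does not specify) is taken to be normalization by the response to a frontal wave. The name `form_unidirectional` and the numeric values are hypothetical.

```python
import cmath
import math

def form_unidirectional(X1, X2, omega, d=0.03, c=343.0):
    """Return (A_R, A_L) amplitude values for one frequency bin.

    X1, X2: complex spectra of M1 and M2; d and c are assumed example values.
    """
    delay = cmath.exp(-1j * omega * d / c)   # |theta| = pi/2, so the delay is d / c
    gain = abs(1.0 - delay)                  # response of either beam to a frontal wave
    if gain < 1e-12:
        raise ValueError("gain correction is undefined at this frequency")
    A_R = abs(X2 - delay * X1) / gain        # unit 31: blind spot toward +90 degrees
    A_L = abs(X1 - delay * X2) / gain        # unit 32: blind spot toward -90 degrees
    return A_R, A_L

omega = 2 * math.pi * 1000.0                   # assumed 1 kHz bin
side = cmath.exp(-1j * omega * 0.03 / 343.0)   # phase lag of d / c at the far microphone

front = form_unidirectional(1.0 + 0j, 1.0 + 0j, omega)   # frontal target sound
plus90 = form_unidirectional(1.0 + 0j, side, omega)      # wave reaching M1 first
```

Under these assumptions the frontal wave passes through both beams with unit gain, while the wave from +90° vanishes from `A_R` and remains in `A_L`, matching the description of which non-target sound each spectrum contains.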

(Non-target signal extraction units)
The non-target signal extraction units 41 and 42 spectrally subtract (SS) the outputs $A_R$ and $A_L$ of the unidirectional forming units 31 and 32, respectively, from the output $X_{DS}$ of the signal addition unit 21, and extract the non-target sounds $N_{UDL}$ and $N_{UDR}$ located at ±90° (the blind spots) with respect to the target direction. The non-target sounds $N_{UDL}$ and $N_{UDR}$ are extracted according to equations (4) and (5).
$$N_{UDL} = X_{DS} - A_R \tag{4}$$
$$N_{UDR} = X_{DS} - A_L \tag{5}$$

That is, the non-target signal extraction unit 41 spectrally subtracts (SS) the amplitude spectrum $A_R$ from the amplitude spectrum $X_{DS}$ to extract the non-target sound $N_{UDL}$ located in the +90° direction with respect to the target direction (see FIG. 2(b)). It then outputs the extracted non-target sound $N_{UDL}$ to the target signal extraction unit 51.
Similarly, the non-target signal extraction unit 42 spectrally subtracts (SS) the amplitude spectrum $A_L$ from the amplitude spectrum $X_{DS}$ to extract the non-target sound $N_{UDR}$ located in the −90° direction with respect to the target direction (see FIG. 2(b)). It then outputs the extracted non-target sound $N_{UDR}$ to the target signal extraction unit 51.
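Equations (4) and (5) can be sketched per frequency bin as below. Clipping negative differences to a floor of zero is an assumption borrowed from common spectral subtraction practice; the publication does not say how negative values are handled. The function name `spectral_subtract` and the sample spectra are hypothetical.

```python
def spectral_subtract(X, A, floor=0.0):
    """Per-bin amplitude subtraction X - A, clipped at floor.

    The clipping is an assumption: equations (4) and (5) do not state how
    negative differences are treated.
    """
    return [max(x - a, floor) for x, a in zip(X, A)]

# Made-up amplitude spectra for three frequency bins.
X_DS = [1.0, 0.8, 0.5]
A_R = [0.4, 0.8, 0.9]
A_L = [1.0, 0.3, 0.5]

N_UDL = spectral_subtract(X_DS, A_R)   # equation (4)
N_UDR = spectral_subtract(X_DS, A_L)   # equation (5)
```

In the third bin of `N_UDL` the raw difference is negative, so the assumed floor replaces it with zero instead of producing a negative amplitude.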

Although the non-target signal extraction units 41 and 42 use the output $X_{DS}$ of the signal addition unit 21 here, they may instead use the output $X_1$ or $X_2$ of the signal input unit 11 or 12 as is. The same applies to the target signal extraction unit 51 described below.

(Target signal extraction unit)
The target signal extraction unit 51 spectrally subtracts (SS), from the output $X_{DS}$ of the signal addition unit 21, the non-target sounds $N_{UDL}$ and $N_{UDR}$ located at ±90° with respect to the target direction that were extracted by the non-target signal extraction units 41 and 42, and extracts the amplitude spectrum $Y$ of the target sound. The amplitude spectrum $Y$ of the target sound is extracted according to equation (6), where $\beta_L$ and $\beta_R$ are coefficients for adjusting the strength of the spectral subtraction (SS).
$$Y = X_{DS} - \beta_L N_{UDL} - \beta_R N_{UDR} \tag{6}$$

The target signal extraction unit 51 thereby forms an amplitude spectrum $Y$ of the target sound with sharp directivity toward the target direction (see FIG. 2(c)), and outputs it.
This concludes the description of the configuration of the sound source separation device 100 according to the embodiment.
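The final subtraction of equation (6) can be sketched as follows. The function name `extract_target`, the default coefficients of 1.0, the zero floor, and the sample spectra are all assumptions made for illustration; the publication gives no concrete values for the coefficients.

```python
def extract_target(X_DS, N_UDL, N_UDR, beta_L=1.0, beta_R=1.0, floor=0.0):
    """Equation (6) per bin: Y = X_DS - beta_L * N_UDL - beta_R * N_UDR.

    Results below floor are clipped; the clipping is an assumption.
    """
    return [max(x - beta_L * nl - beta_R * nr, floor)
            for x, nl, nr in zip(X_DS, N_UDL, N_UDR)]

# Made-up amplitude spectra for three frequency bins.
X_DS = [1.0, 0.8, 0.5]
N_UDL = [0.6, 0.0, 0.0]
N_UDR = [0.0, 0.5, 0.0]

Y = extract_target(X_DS, N_UDL, N_UDR)
Y_strong = extract_target(X_DS, N_UDL, N_UDR, beta_L=2.0, beta_R=2.0)
```

Raising the coefficients removes more of the estimated non-target sound at the cost of driving more bins to the floor, which is the trade-off the subtraction-strength coefficients control.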

<< Effect of the Sound Source Separation Device According to the Embodiment >>
Next, the effect of the sound source separation device 100 according to the embodiment is described with reference to FIG. 3. FIG. 3 shows an example of a usage situation of the sound source separation device according to the embodiment.
In FIG. 3, the first microphone M1 is placed on the left as seen from the target direction, and the second microphone M2 on the right. For convenience, the distances among the first microphone M1, the talker, and the left loudspeaker L are assumed to equal the corresponding distances among the second microphone M2, the talker, and the right loudspeaker R.

Let $\tau$ be the difference between the times at which the sound played by the loudspeakers L and R (the non-target sound) reaches the two microphones M1 and M2. At a given time $t$, the acoustic signal $x_1(t)$ picked up by the first microphone M1 contains the talker's voice $s(t)$, which is the target sound, together with two non-target sounds: the playback sound $p_L(t)$ of the left loudspeaker L and the playback sound $p_R(t-\tau)$ of the right loudspeaker R. The acoustic signal $x_2(t)$ picked up by the second microphone M2 contains the talker's voice $s(t)$, the playback sound $p_L(t-\tau)$ of the left loudspeaker L, and the playback sound $p_R(t)$ of the right loudspeaker R.

When the left and right loudspeakers L and R play the same sound at the same volume, the components contained in $x_1(t)$ and $x_2(t)$ are equal, and the amplitude spectra $X_1(t)$ and $X_2(t)$ computed by FFT over the time frame centered on time $t$ are both given by equation (7). Here, $S$ is the amplitude spectrum of the target sound and $P$ is the amplitude spectrum of the non-target sound.
$$X(t) = S(t) + P(t) + P(t-\tau) \tag{7}$$

Further, when the difference between the times at which the sounds reproduced from the speakers L and R reach the two microphones M1 and M2 (the delay time τ described above) is sufficiently small, the approximation P(t) ≈ P(t−τ) holds. Equation (7) can therefore be written as equation (8).

X(t) = S(t) + 2P(t)   (8)

Next, from the amplitude spectra X_1(t) and X_2(t) of the acoustic signals picked up by the microphones M1 and M2, the unidirectional forming units 31 and 32 form an amplitude spectrum A_R(t) having unidirectionality with its blind spot directed toward +90° and an amplitude spectrum A_L(t) having unidirectionality with its blind spot directed toward −90°. When X_1(t) = X_2(t), the amplitude spectra A_L(t) and A_R(t) output by the unidirectional forming units 31 and 32 are also equal (A_L(t) = A_R(t)). This common output A(t) is given by equation (9). Because the blind spot of each unidirectional pattern suppresses the non-target sound P(t) from either the left or the right, the P(t) component contained in A(t) is half of that contained in X(t). A(t) is assumed to have been gain-corrected for each frequency.

A(t) = S(t) + P(t)   (9)

Subsequently, the non-target signal extraction units 41 and 42 perform spectral subtraction (SS) of A(t) from X(t) according to equation (10). This suppresses S(t) and extracts the non-target sound P(t) located in the ±90° directions (the blind spots) relative to the target direction. Substituting equations (8) and (9) gives N(t) = (S(t) + 2P(t)) − (S(t) + P(t)) = P(t), so the output N(t) of the non-target signal extraction units 41 and 42 contains only P(t).

N(t) = X(t) − A(t)   (10)

Finally, the target signal extraction unit 51 performs spectral subtraction (SS) of N(t) from X(t) according to equation (11), where β is a coefficient (parameter) that adjusts the strength of the spectral subtraction. When equation (8) holds, setting β = 2 suppresses the P(t) contained in X(t), so that the final output is Y(t) = S(t) and the target sound is extracted.

Y(t) = X(t) − βN(t)   (11)
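The chain of equations (8) to (11) can be checked numerically with a minimal sketch such as the following. The spectra are made-up values for a single frame, the function name spectral_subtract is hypothetical, and clamping negative results at a floor is an assumption commonly made in spectral subtraction rather than something stated in this description.

```python
import numpy as np

def spectral_subtract(minuend, subtrahend, beta=1.0, floor=0.0):
    """Subtract beta * subtrahend from minuend bin by bin.

    Clamping at `floor` is an assumption of this sketch; the description
    only gives the plain differences in equations (10) and (11).
    """
    return np.maximum(minuend - beta * subtrahend, floor)

# Toy amplitude spectra for one frame (illustrative values only).
S = np.array([1.0, 0.5, 0.2, 0.0])  # target sound
P = np.array([0.3, 0.4, 0.1, 0.6])  # non-target sound, identical from L and R

X = S + 2.0 * P                         # equation (8): microphone spectrum
A = S + P                               # equation (9): unidirectional output
N = spectral_subtract(X, A)             # equation (10): leaves only P
Y = spectral_subtract(X, N, beta=2.0)   # equation (11) with beta = 2: leaves S

print(np.allclose(N, P), np.allclose(Y, S))
```

With β = 1 instead of 2, the same sketch leaves one copy of P in the output, which illustrates why the coefficient has to match the number of equal non-target contributions in X(t).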

Next, the results of a performance evaluation experiment on the sound source separation device 100 according to the embodiment are described. FIG. 4 shows the result of a computer simulation of the amount of suppression of the reproduced sound, assuming that different sounds or the same sound are reproduced simultaneously from the left and right speakers L and R in the arrangement shown in FIG. 3. As experimental conditions, both the target sound and the non-target sound were speech, and the NRR (Noise Reduction Rate) was used as the index of suppression. The method of Patent Document 2 was used as the conventional method for comparison with the present invention.

When different speech was reproduced from the left and right speakers L and R, the NRR was 36 dB for the conventional method and 32 dB for the present invention. Both suppress the non-target sound sufficiently. When the same speech was reproduced from the left and right speakers L and R, however, the NRR of the conventional method fell to 5 dB, showing that its suppression degrades. This is because it can suppress the non-target sound from only one of the left and right speakers L and R. By contrast, the NRR of the present invention remained high at 23 dB, showing that it suppresses the non-target sounds from both the left and right speakers L and R.

As described above, the sound source separation device 100 according to the embodiment forms unidirectional patterns to the left and right of the target direction and extracts the non-target sounds located in the blind-spot direction of each pattern. By spectral subtraction (SS) of the extracted non-target sounds from the input signal, it forms a sharp directivity in the target direction. Therefore, even when placed midway between two sound sources that emit the same sound (non-target sound) at the same level, the sound source separation device 100 can suppress the non-target sound and can separate and extract only the target sound arriving from the sound source in the target direction.

[Modification]
An embodiment of the present invention has been described above, but the present invention is not limited to it and can be practiced within a scope that does not depart from the gist of the claims. Modifications of the embodiment are described below.

In the embodiment, the sound source separation device 100 includes the two microphones M1 and M2. However, the sound source separation device 100 may instead receive acoustic signals picked up by external microphones over a wired or wireless line and use the received acoustic signals to form a sharp directivity in the target direction.

In the embodiment, the sound source separation device 100 forms a sharp directivity in the target direction using the acoustic signals picked up by the two microphones M1 and M2. However, the sound source separation device 100 need only use acoustic signals picked up by a plurality of microphones arranged horizontally with respect to the target direction, and the number of microphones is not particularly limited. With reference to FIG. 5, a case is described in which a sharp directivity is formed in the target direction using the acoustic signals picked up by four microphones M1, M2, M3, and M4.

FIG. 5(a) is a diagram showing the arrangement of the microphones of the sound source separation device according to the modification. FIG. 5(b) is a diagram showing an example of the microphone combinations used to form the unidirectional patterns.

As shown in FIG. 5(a), when the target sound arrives from the front (the z-axis direction), the four omnidirectional microphones M1, M2, M3, and M4 are arranged on the xy plane.

The sound source separation device 200 according to the modification (not shown) groups the four microphones M1, M2, M3, and M4 two at a time into four pairs, as shown in FIG. 5(b): pair B1 of the fourth microphone M4 and the first microphone M1, pair B2 of the first microphone M1 and the second microphone M2, pair B3 of the second microphone M2 and the third microphone M3, and pair B4 of the third microphone M3 and the fourth microphone M4. The sound source separation device 200 then uses these four pairs B1 to B4 to form unidirectional patterns in four directions (up, down, left, and right), for example according to equations (12) to (15) below.

A_1(ω) = X_1(ω) − exp(−jωτ) X_4(ω)   (12)
A_2(ω) = X_2(ω) − exp(−jωτ) X_1(ω)   (13)
A_3(ω) = X_3(ω) − exp(−jωτ) X_2(ω)   (14)
A_4(ω) = X_4(ω) − exp(−jωτ) X_3(ω)   (15)
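Equations (12) to (15) apply the same delay-and-subtract operation to each microphone pair, which can be sketched as follows. The function name form_unidirectional, the sampling rate, the FFT length, the value of τ, and the random test spectra are all hypothetical choices for illustration and are not taken from this description.

```python
import numpy as np

def form_unidirectional(X, omega, tau):
    """Apply equations (12)-(15) to all four pairs at once.

    X     : complex array, shape (4, n_bins); rows are X_1 .. X_4.
    omega : angular frequency of each bin in rad/s, shape (n_bins,).
    tau   : delay in seconds (hypothetical value chosen by the caller).
    Returns a complex array whose rows are A_1 .. A_4.
    """
    delay = np.exp(-1j * omega * tau)
    # Row k of X_prev holds the partner of microphone k: X_4, X_1, X_2, X_3.
    X_prev = np.roll(X, 1, axis=0)
    return X - delay * X_prev

# Hypothetical parameters, not taken from the patent.
fs, n_fft, tau = 16000, 512, 1.0e-4
omega = 2.0 * np.pi * np.fft.rfftfreq(n_fft, d=1.0 / fs)
n_bins = omega.size

rng = np.random.default_rng(0)
G = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
X = rng.standard_normal((4, n_bins)) + 1j * rng.standard_normal((4, n_bins))

# Simulate a wave that reaches M4 first and M1 exactly tau later.
X[3] = G
X[0] = G * np.exp(-1j * omega * tau)

A = form_unidirectional(X, omega, tau)
print(np.abs(A[0]).max())  # A_1 has its blind spot on this wave
```

In this sketch the wave that arrives at M1 exactly τ after M4 cancels in A_1, which is the blind spot of pair B1. The outputs of the other pairs depend on the array geometry, which the sketch leaves unspecified.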

Subsequently, the sound source separation device 200 according to the modification (not shown) uses the unidirectional patterns in the four directions (up, down, left, and right) to extract the non-target sounds located in the blind-spot direction of each pattern. By spectral subtraction (SS) of the extracted non-target sounds from the input signal, it forms a sharp directivity in the target direction. Therefore, even when placed at the center of four sound sources arranged above, below, to the left, and to the right that emit the same sound (non-target sound) at the same level, the sound source separation device 200 can suppress the non-target sound and can separate and extract only the target sound arriving from the sound source in the target direction.

In the embodiment, the sharp directivity in the target direction is computed using the amplitude spectra of the acoustic signals, but the computation may instead be performed using power spectra.

DESCRIPTION OF SYMBOLS
11, 12 Signal input unit
21 Signal addition unit
31, 32 Unidirectional forming unit
41, 42 Non-target signal extraction unit
51 Target signal extraction unit
100 Sound source separation device
M1, M2, M3, M4 Microphone

Claims

A sound source separation device that separates, from the sound reaching a plurality of microphones located in a plane whose normal direction is a target direction, a target sound arriving from a sound source located in the target direction from non-target sounds arriving from sound sources located in other directions, the sound source separation device comprising:
a unidirectional forming unit that, using the acoustic signals output by any two of the plurality of microphones, generates two acoustic signals each having unidirectionality with a blind spot directed along the line connecting the two microphones, such that the two blind spots point in opposite directions;
a non-target signal extraction unit that performs spectral subtraction on the acoustic signal output by any one of the plurality of microphones, or on the average of the acoustic signals output by two or more of the microphones, using the two unidirectional acoustic signals generated by the unidirectional forming unit, and extracts the signals of the non-target sounds arriving from the respective blind spots; and
a target signal extraction unit that performs spectral subtraction on the acoustic signal output by any one of the plurality of microphones, or on the average of the acoustic signals output by two or more of the microphones, using the non-target sound signals extracted by the non-target signal extraction unit, and extracts the signal of the target sound.

A sound source separation method executed by a sound source separation device that separates, from the sound reaching a plurality of microphones located in a plane whose normal direction is a target direction, a target sound arriving from a sound source located in the target direction from non-target sounds arriving from sound sources located in other directions, the method comprising:
generating, using the acoustic signals output by any two of the plurality of microphones, two acoustic signals each having unidirectionality with a blind spot directed along the line connecting the two microphones, such that the two blind spots point in opposite directions;
performing spectral subtraction on the acoustic signal output by any one of the plurality of microphones, or on the average of the acoustic signals output by two or more of the microphones, using the two generated unidirectional acoustic signals, and extracting the signals of the non-target sounds arriving from the respective blind spots; and
performing spectral subtraction on the acoustic signal output by any one of the plurality of microphones, or on the average of the acoustic signals output by two or more of the microphones, using the extracted non-target sound signals, and extracting the signal of the target sound.

A sound source separation program for causing a computer to execute the sound source separation method according to claim 2.