JP6473066B2

JP6473066B2 - Noise suppression device, method and program thereof

Info

Publication number: JP6473066B2
Application number: JP2015210079A
Authority: JP
Inventors: 翔一郎齊藤; 小林　和則; 和則小林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-10-26
Filing date: 2015-10-26
Publication date: 2019-02-20
Anticipated expiration: 2035-10-26
Also published as: JP2017083583A

Description

The present invention relates to a noise suppression technique that suppresses a noise component contained in one collected sound signal by using other collected sound signals. In particular, it relates to a noise suppression technique that suppresses a noise component contained in one of a plurality of collected sound signals obtained from a plurality of close-talking microphones.

When speech is picked up by a microphone, it is unavoidable that noise from the surrounding environment is picked up together with the speech. For this reason, techniques for removing or suppressing the noise component when a sound containing both a target sound component and a noise component is picked up by a microphone have long been studied.

For example, when noise suppression is performed using a microphone, the spectral subtraction method has conventionally been in general use because it requires little computation (see Non-Patent Document 1). The spectral subtraction method estimates, from a signal collected by a close-talking microphone or the like, the noise power in a noise interval (that is, a time interval that does not contain the speech to be collected (the target sound); also called a non-speech interval). It then suppresses noise by using the estimated noise power to subtract, on the frequency spectrum, the noise component superimposed on the collected sound signal in a speech interval (a time interval that contains the target sound).
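As a rough illustration of the spectral subtraction idea described above, the following sketch averages the power spectra of noise-only frames and subtracts that estimate from each frequency bin of a speech-interval frame. The function names, the spectral floor factor, and the sample values are assumptions made for illustration; they are not taken from Non-Patent Document 1.

```python
import math

def estimate_noise_power(noise_frames):
    """Average the power spectra of frames known to contain only noise."""
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / count for k in range(bins)]

def spectral_subtraction_gains(frame_power, noise_power, floor=0.01):
    """Return a per-bin amplitude gain for one speech-interval frame.

    frame_power: power spectrum of the noisy frame.
    noise_power: noise power per bin, estimated in a non-speech interval.
    floor: hypothetical spectral floor that keeps the result non-negative.
    """
    gains = []
    for y, n in zip(frame_power, noise_power):
        cleaned = max(y - n, floor * y)          # power left after subtraction
        gains.append(math.sqrt(cleaned / y) if y > 0 else 0.0)
    return gains

# Illustrative values only: two noise-only frames, then one noisy speech frame.
noise = estimate_noise_power([[1.0, 2.0, 1.0], [1.0, 2.0, 3.0]])
gains = spectral_subtraction_gains([5.0, 2.0, 10.0], noise)
```

Multiplying each bin of the noisy spectrum by its gain attenuates bins dominated by noise (the middle bin here) while leaving bins with strong speech energy largely intact.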

Besides the spectral subtraction method for a monaural microphone, another noise suppression process is in use in which a smartphone is equipped with two microphones, microphone array processing is applied to the signals collected by the two microphones, and the component of the signal collected by a sub microphone placed on the back is removed from the signal of a main microphone placed so as to be near the mouth during a call (see Non-Patent Documents 2 and 3). This process rests on the assumptions that the two microphones have roughly the same characteristics, that the sub microphone picks up only noise, and that the main microphone picks up both the target sound and the noise.

The source of the noise superimposed on the target sound is located farther away than the source of the target sound, so the noise is more strongly affected by the transfer characteristic between the noise source and the microphone. In addition, this characteristic is generally unknown and therefore has to be estimated. Accordingly, the transfer process is treated as an unknown system and identified with an adaptive filter, and the target sound is extracted by subtracting, from the collected sound signal of the main microphone, the filter output obtained by applying the adaptive filter to the collected sound signal of the sub microphone.
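The system identification and subtraction described above can be sketched with a normalized LMS (NLMS) adaptive filter. This is a minimal sketch under assumptions: NLMS is chosen here only as a common adaptive algorithm, the tap count, step size `mu`, and the two-tap noise path are hypothetical, and the sub microphone is assumed to contain noise only.

```python
import random

def nlms_noise_canceller(main, sub, taps=4, mu=0.5, eps=1e-8):
    """Subtract the adaptively filtered sub signal from the main signal.

    main: samples from the main microphone (target sound plus noise).
    sub:  samples from the sub microphone (assumed here to hold noise only).
    Returns the error signal, which serves as the noise-suppressed output.
    """
    h = [0.0] * taps                      # adaptive filter coefficients
    x = [0.0] * taps                      # most recent sub samples, newest first
    out = []
    for d, s in zip(main, sub):
        x = [s] + x[:-1]
        y = sum(hi * xi for hi, xi in zip(h, x))        # filter output
        e = d - y                                        # residual after subtraction
        norm = sum(xi * xi for xi in x) + eps
        h = [hi + mu * e * xi / norm for hi, xi in zip(h, x)]   # NLMS update
        out.append(e)
    return out

# Illustrative case with no target sound: the noise reaches the main microphone
# through a hypothetical two-tap path that the filter has to identify.
rng = random.Random(0)
sub = [rng.uniform(-1.0, 1.0) for _ in range(500)]
main = [0.5 * sub[n] + 0.25 * (sub[n - 1] if n > 0 else 0.0) for n in range(500)]
out = nlms_noise_canceller(main, sub)
```

Once the filter has converged, the residual in `out` is far smaller than the noise in `main`. If the target sound also leaks into `sub`, the same update would start cancelling the target sound as well, which is the problem the following paragraphs discuss.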

Here, the spacing between the two microphones should be neither too large nor too small. If the spacing is too large, the two microphones pick up noise components with different characteristics, and a simple spectral subtraction method ends up subtracting the wrong noise component. If the spacing is too small, the correlation between the noise components at the two microphones increases, but the sub microphone also picks up the target sound component, which should not be removed, together with the noise component, and the assumption that the sub microphone picks up only noise breaks down. In other words, the spectral subtraction method using two microphones is applied on the basis of an ideal with conflicting acoustic requirements: both microphones must pick up the noise while preserving its correlation, yet only the main microphone may pick up the target sound. In practice, because the two microphones have matched characteristics, it is difficult to keep the sub microphone from picking up target sound that leaks around to it.

Non-Patent Document 1: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Transactions on Acoustics, Speech and Signal Processing, 1979, Volume 27, Issue 2, pp. 113-120.
Non-Patent Document 2: Jian Zhang et al., "A FAST TWO-MICROPHONE NOISE REDUCTION ALGORITHM BASED ON POWER LEVEL RATIO FOR MOBILE PHONE", 8th International Symposium on Chinese Spoken Language Processing (ISCSLP), Kowloon, 2012, pp. 206-209.
Non-Patent Document 3: Isao Nakanishi, "Forest of Knowledge" (知識の森), Group 1 (Signals and Systems), Part 9 (Digital Signal Processing), Chapter 3: Adaptive Signal Processing, [online], IEICE, [retrieved October 23, 2014], Internet <http://www.ieice-hbkb.org/files/01/01gun_09hen_03m.pdf>

When a plurality of closely spaced microphones are used, simply applying spectral subtraction that removes the noise component from the main microphone's collected sound signal using the collected sound signal of one sub microphone also removes the target sound that the sub microphone has picked up. As a result, the target sound suffers degradation, known as musical noise and the like, and the speech is distorted.

To suppress this degradation and speech distortion, system identification by the adaptive filter is performed only in noise intervals, that is, non-speech intervals. Using an adaptive filter estimated in noise intervals makes it possible to cancel the noise while preserving the target sound. In addition, filtering that exploits the close-talking positional relationship makes the noise cancellation more effective.

In the present invention, the process of advancing adaptive filter learning only in non-speech intervals is realized by modifying the update equation of the adaptive filter. An object of the present invention is to provide a noise suppression device, a method, and a program that suppress noise while limiting degradation and speech distortion, without requiring an additional device such as a VAD (voice activity detection) unit.

To solve the above problem, according to one aspect of the present invention, a noise suppression device suppresses a noise component contained in a first main audio signal by using a first reference signal and a second reference signal. The noise suppression device includes: a first adaptive filter unit that filters the first reference signal with a first adaptive filter to obtain a first filtered signal; a second adaptive filter unit that filters the second reference signal with a second adaptive filter to obtain a second filtered signal; and a subtraction unit that obtains, as an error signal, the value obtained by subtracting the first filtered signal and the second filtered signal from the first main audio signal. The first adaptive filter unit sequentially updates the first adaptive filter using the error signal and the first reference signal so that the error signal is minimized. When the absolute value of an error ratio β₁, which is the ratio of the error signal to the first reference signal, is less than or equal to a predetermined threshold a₁, the first adaptive filter is updated by a 1-1 update amount based on a monotonically increasing function of the error ratio β₁; when the absolute value of the error ratio β₁ is greater than the threshold a₁, the first adaptive filter is updated by a 2-1 update amount based on a monotonically increasing function of β₁ whose rate of increase is smaller than that of the 1-1 update amount. The second adaptive filter unit sequentially updates the second adaptive filter using the error signal and the second reference signal so that the error signal is minimized. When the absolute value of an error ratio β₂, which is the ratio of the error signal to the second reference signal, is less than or equal to a predetermined threshold a₂, the second adaptive filter is updated by a 1-2 update amount based on a monotonically increasing function of the error ratio β₂; when the absolute value of the error ratio β₂ is greater than the threshold a₂, the second adaptive filter is updated by a 2-2 update amount based on a monotonically increasing function of β₂ whose rate of increase is smaller than that of the 1-2 update amount.
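The two-regime update described above can be sketched as follows. This is only an illustration under assumptions that do not come from this passage: the error ratio is computed as the error divided by the norm of the recent reference samples, the limit function is piecewise linear with a hypothetical slope of 0.1 above the threshold, and the step size `mu` is arbitrary.

```python
import math

def limit(beta, a, slope=0.1):
    """Hypothetical limit function: slope 1 up to the threshold, gentler above it."""
    if abs(beta) <= a:
        return beta                                   # first update amount
    sign = 1.0 if beta > 0 else -1.0
    return sign * (a + slope * (abs(beta) - a))       # second update amount

def update_filter(h, x, e, a, mu=0.1, eps=1e-8):
    """One update step for a single adaptive filter (assumed form).

    h: current coefficients, x: recent reference samples (newest first),
    e: current error sample, a: threshold on the error ratio.
    """
    norm = math.sqrt(sum(xi * xi for xi in x)) + eps
    beta = e / norm                 # error ratio (assumed definition)
    step = mu * limit(beta, a) / norm
    return [hi + step * xi for hi, xi in zip(h, x)]

small = limit(0.5, 1.0)     # below the threshold: passed through unchanged
large = limit(3.0, 1.0)     # above the threshold: grows only slowly
h_new = update_filter([0.0], [2.0], 1.0, a=1.0)
```

With this choice, a small error ratio (typical when only noise is present) yields an ordinary normalized update, while a large error ratio (typical when the target sound dominates the error) yields a much smaller update than an unlimited rule would, so learning effectively slows down during speech without an explicit voice activity detector.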

To solve the above problem, according to another aspect of the present invention, a noise suppression method suppresses a noise component contained in a first main audio signal by using a first reference signal and a second reference signal. The noise suppression method includes: a first adaptive filter step in which a first adaptive filter unit filters the first reference signal with a first adaptive filter to obtain a first filtered signal; a second adaptive filter step in which a second adaptive filter unit filters the second reference signal with a second adaptive filter to obtain a second filtered signal; and a subtraction step in which a subtraction unit obtains, as an error signal, the value obtained by subtracting the first filtered signal and the second filtered signal from the first main audio signal. In the first adaptive filter step, the first adaptive filter is sequentially updated using the error signal and the first reference signal so that the error signal is minimized. When the absolute value of an error ratio β₁, which is the ratio of the error signal to the first reference signal, is less than or equal to a predetermined threshold a₁, the first adaptive filter is updated by a 1-1 update amount based on a monotonically increasing function of the error ratio β₁; when the absolute value of the error ratio β₁ is greater than the threshold a₁, the first adaptive filter is updated by a 2-1 update amount based on a monotonically increasing function of β₁ whose rate of increase is smaller than that of the 1-1 update amount. In the second adaptive filter step, the second adaptive filter is sequentially updated using the error signal and the second reference signal so that the error signal is minimized. When the absolute value of an error ratio β₂, which is the ratio of the error signal to the second reference signal, is less than or equal to a predetermined threshold a₂, the second adaptive filter is updated by a 1-2 update amount based on a monotonically increasing function of the error ratio β₂; when the absolute value of the error ratio β₂ is greater than the threshold a₂, the second adaptive filter is updated by a 2-2 update amount based on a monotonically increasing function of β₂ whose rate of increase is smaller than that of the 1-2 update amount.

According to the present invention, noise can be suppressed while degradation and speech distortion are limited, without requiring an additional device such as a VAD (voice activity detection) unit.

FIG. 1: Functional block diagram of the noise suppression device according to the first embodiment.
FIG. 2: Diagram showing an example of the processing flow of the noise suppression device according to the first embodiment.
FIG. 3: Diagram for explaining the positional relationship between the main microphones and the sub microphone.
FIG. 4: Diagram showing an example of the limit functions f(β₁) and f(β₂).
FIG. 5: Diagram showing an example of the limit functions f(β₁) and f(β₂).
FIG. 6: Functional block diagram of the noise suppression device according to the second embodiment.
FIG. 7: Diagram showing an example of the processing flow of the noise suppression device according to the second embodiment.
FIG. 8: Diagram for explaining the positional relationship between the main microphone and the sub microphone.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate descriptions are omitted. Unless otherwise noted, processing performed on individual elements of a vector or matrix is applied to all elements of that vector or matrix.

The noise suppression device is equipped with two main microphones and one sub microphone. For example, the noise suppression device includes a close-talking microphone (headset microphone). By its nature, the positions where microphones can be placed are limited to the area around the face, so it is difficult to separate the microphones by more than a certain distance. In this embodiment it is therefore assumed that noise is picked up at the same (comparable) sound pressure by the main microphones and the sub microphone. Also, in this embodiment, the main microphones and the sub microphone are arranged so that the sound pressure of the target sound contained in the signal collected by a main microphone is higher than the sound pressure of the target sound contained in the signal collected by the sub microphone. For example, the main microphones are placed near the mouth, and the sub microphone is placed at the position farthest from the mouth, where the target sound is least likely to enter. More specifically, for example, the first main microphone is placed at the very end of the tip of the headset's microphone arm (near the mouth), the second main microphone is placed on the same tip section but slightly toward its base (a little farther from the mouth), and the sub microphone is placed near the back of the head or the ear, such as at the base of the headset's headband. Based on this premise, the point of the present invention is to filter the audio signals (collected sound signals) and to use a limit function that places different limits on the update amount of the adaptive filter in speech intervals and non-speech intervals. This adjusts how the adaptive filter learns, stabilizes the adaptive filter, limits degradation of the target sound and speech distortion, and suppresses noise.

In this embodiment, three microphones attached to a close-talking headset are used, and noise suppression and speech enhancement are performed with adaptive filters.

With the arrangement described above, the collected sound signals of the microphones are related as follows. The target sound is picked up at a higher sound pressure by the main microphones than by the sub microphone. Noise is picked up at a comparable sound pressure by both the main microphones and the sub microphone. Using this property, the adaptive filters suppress the noise and enhance the target sound.

<Noise Suppression Device 100 according to the First Embodiment>
FIG. 1 is a functional block diagram of the noise suppression device 100 according to the first embodiment, and FIG. 2 shows its processing flow. The functional block diagram of FIG. 1 is intended to make the processing circuits explicit; in practice, the circuit configuration is built into the noise suppression device 100.

The noise suppression device 100 includes main microphones 101 and 102, a sub microphone 103, a proximity sound enhancement unit 104, a proximity sound suppression unit 105, a first adaptive filter unit 110, a second adaptive filter unit 111, a subtraction unit 120, a filter design unit 130, and a spectrum filter unit 140.

<Main Microphones 101 and 102 and Sub Microphone 103>
The main microphones 101 and 102 each pick up the target sound and noise (S101, S102) and output a first collected sound signal s₁(n) and a second collected sound signal s₂(n), respectively. The sub microphone 103 picks up the target sound and noise (S103) and outputs a second reference signal x₂(n). Here, n is an index representing time. FIG. 3 shows the positional relationship between the main microphones 101 and 102 and the sub microphone 103. The two main microphones 101 and 102 are placed near the speaker's mouth, and the sub microphone 103 is placed at a position far from the mouth. The three microphones are held in place by a wire (arm) or the like, and each is connected, via a cable, wireless communication, or the like, to the externally located main body of the noise suppression device 100 (the noise suppression device 100 excluding the main microphones 101 and 102 and the sub microphone 103). The main body of the noise suppression device 100 may, however, be miniaturized and integrated with the microphone-holding arm or the like.

For example, the main microphones 101 and 102 and the sub microphone 103 are all omnidirectional microphones, and the frequency characteristics of the microphone sensitivity of the main microphones 101 and 102 are matched. The frequency characteristic of the microphone sensitivity of the sub microphone 103 is also matched to that of the other two. The present invention, however, does not limit the characteristics of the main microphones or the sub microphone to these.

The main microphones 101 and 102 are placed so that they are close to the user's mouth when the noise suppression device 100 is used as a telephone handset or a voice input device. The sub microphone 103 is placed away from the main microphones 101 and 102, at a position where it records ambient noise that is highly correlated with the ambient noise picked up by the main microphones 101 and 102. This does not rule out the user's voice being picked up at the sub microphone 103, whether it travels through the air, through vibration of the user's bones and muscles or of the housing, or by reflection in the surrounding acoustic environment.

<Proximity Sound Enhancement Unit 104>
The proximity sound enhancement unit 104 receives the first collected sound signal $s_1(n)$ and the second collected sound signal $s_2(n)$, filters them with a first proximity sound enhancement filter $\bar{f}_1(n)$ and a second proximity sound enhancement filter $\bar{f}_2(n)$, respectively, obtains the sum of the filtered first collected sound signal $\bar{f}_1^H(n)\bar{s}_1(n)$ and the filtered second collected sound signal $\bar{f}_2^H(n)\bar{s}_2(n)$ as a first main audio signal $d(n)$ (S104), and outputs it. The processing of the proximity sound enhancement unit 104 is expressed by the following equation.
$$d(n) = \bar{f}_1^H(n)\,\bar{s}_1(n) + \bar{f}_2^H(n)\,\bar{s}_2(n)$$
Here, $\bar{f}_1(n) = [f_{1\_0}(n), f_{1\_1}(n), \ldots, f_{1\_M-1}(n)]^T$, $\bar{f}_2(n) = [f_{2\_0}(n), f_{2\_1}(n), \ldots, f_{2\_M-1}(n)]^T$, $\bar{s}_1(n) = [s_1(n), s_1(n-1), \ldots, s_1(n-M+1)]^T$, and $\bar{s}_2(n) = [s_2(n), s_2(n-1), \ldots, s_2(n-M+1)]^T$, where $^T$ denotes the transpose and $^H$ the complex conjugate transpose. Because the first proximity sound enhancement filter $\bar{f}_1(n)$ and the second proximity sound enhancement filter $\bar{f}_2(n)$ perform a convolution, each has a length equal to the tap size $M$ and operates on the first collected sound signal vector $\bar{s}_1(n)$ and the second collected sound signal vector $\bar{s}_2(n)$, respectively.

For example, let $\bar{f}_1(n)$ be given by $f_{1\_0}(n) = 1$ and $f_{1\_m}(n) = 0$ ($m = 1, 2, \ldots, M-1$), and let $\bar{f}_2(n)$ be given by $f_{2\_0}(n) = -1$ and $f_{2\_m}(n) = 0$ ($m = 1, 2, \ldots, M-1$). In this case, the first main audio signal $d(n)$ is the difference between $s_1(n)$ and $s_2(n)$, that is, $d(n) = s_1(n) - s_2(n)$.

When the source of the target sound is much closer to the main microphone 101 than to the main microphone 102, and the difference between the sound pressures picked up by the main microphones 101 and 102 is large (the case of a nearby speaker's voice), the first main audio signal $d(n)$ is close to the first collected sound signal $s_1(n)$. When the sound source is far from the main microphone 101 (the case of ambient noise), the sound pressures of the first collected sound signal $s_1(n)$ and the second collected sound signal $s_2(n)$ are almost the same, and the first main audio signal $d(n)$ approaches 0. These coefficients (the first proximity sound enhancement filter $\bar{f}_1(n)$ and the second proximity sound enhancement filter $\bar{f}_2(n)$) therefore constitute one example of a filter that enhances nearby sound. Speech at the mouth can also be enhanced by forming a directional beam toward the mouth, but near the mouth a slight shift in microphone position has a large effect on the directivity angle. Setting the filters as described above has the advantage that nearby sound can be enhanced as long as the single condition that the main microphone 101 is closer to the mouth than the main microphone 102 is satisfied, regardless of small shifts in the position or orientation of the close-talking microphone. This is an example with simplified processing; as another example, filters that correct the frequency characteristics (amplitude and phase) may be used for the first proximity sound enhancement filter $\bar{f}_1(n)$ and the second proximity sound enhancement filter $\bar{f}_2(n)$. The first and second proximity sound enhancement filters may also be independent of time $n$, in which case they may be written as $\bar{f}_1$ and $\bar{f}_2$.
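The behavior of the simple difference filter can be checked with a short sketch. It assumes real-valued signals, so the conjugate transpose reduces to an ordinary inner product; the function name, the sample values, and the factor 0.25 for the voice level at the main microphone 102 are hypothetical.

```python
def enhance_near_sound(s1, s2, f1, f2):
    """Return d(n) = f1^T s1_vec(n) + f2^T s2_vec(n) for every n (real signals)."""
    taps = len(f1)
    d = []
    for n in range(len(s1)):
        acc = 0.0
        for m in range(taps):
            if n - m >= 0:            # samples before the start are treated as 0
                acc += f1[m] * s1[n - m] + f2[m] * s2[n - m]
        d.append(acc)
    return d

M = 3
f1 = [1.0] + [0.0] * (M - 1)     # f_{1_0} = 1, all other taps 0
f2 = [-1.0] + [0.0] * (M - 1)    # f_{2_0} = -1, all other taps 0

voice_101 = [1.0, -2.0, 3.0, -1.0]
voice_102 = [0.25 * v for v in voice_101]   # assumed: much weaker at microphone 102
noise = [0.5, 0.5, -0.5, 1.0]               # assumed: identical at both microphones

d_voice = enhance_near_sound(voice_101, voice_102, f1, f2)
d_noise = enhance_near_sound(noise, noise, f1, f2)
```

In this idealized case the nearby voice passes through at three quarters of its level at the main microphone 101, while noise that arrives identically at both microphones cancels completely. Real ambient noise is only approximately equal at the two microphones, so the cancellation would be partial.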

<Proximity Sound Suppression Unit 105>
The proximity sound suppression unit 105 receives the first collected sound signal $s_1(n)$ and the second collected sound signal $s_2(n)$, filters them with a first proximity sound suppression filter $\bar{g}_1(n)$ and a second proximity sound suppression filter $\bar{g}_2(n)$, respectively, obtains the sum of the filtered first collected sound signal $\bar{g}_1^H(n)\bar{s}_1(n)$ and the filtered second collected sound signal $\bar{g}_2^H(n)\bar{s}_2(n)$ as a first reference signal $x_1(n)$, and outputs it. The processing of the proximity sound suppression unit 105 is expressed by the following equation.
$$x_1(n) = \bar{g}_1^H(n)\,\bar{s}_1(n) + \bar{g}_2^H(n)\,\bar{s}_2(n)$$
Here, $\bar{g}_1(n) = [g_{1\_0}(n), g_{1\_1}(n), \ldots, g_{1\_M-1}(n)]^T$ and $\bar{g}_2(n) = [g_{2\_0}(n), g_{2\_1}(n), \ldots, g_{2\_M-1}(n)]^T$. Because the first proximity sound suppression filter $\bar{g}_1(n)$ and the second proximity sound suppression filter $\bar{g}_2(n)$ perform a convolution, each has a length equal to the tap size $M$ and operates on the first collected sound signal vector $\bar{s}_1(n)$ and the second collected sound signal vector $\bar{s}_2(n)$, respectively. For example, suppose that, because of distance attenuation, the nearby speaker's voice arriving at the main microphone 102 is observed with a delay of $a$ samples and a gain of $b$ relative to the nearby speaker's voice arriving at the main microphone 101, where $a$ is an integer of 1 or more and $b$ is a real number with $0 \le b \le 1$. Then let $\bar{g}_1(n)$ be given by $g_{1\_a}(n) = b$ and $g_{1\_m}(n) = 0$ ($m = 0, 1, \ldots, M-1$, $m \ne a$), and let $\bar{g}_2(n)$ be given by $g_{2\_0}(n) = -1$ and $g_{2\_m}(n) = 0$ ($m = 1, \ldots, M-1$). In this case, the first reference signal $x_1(n)$ is the difference between $b \times s_1(n-a)$ and $s_2(n)$, that is, $x_1(n) = b \times s_1(n-a) - s_2(n)$.

Because the close speaker's voice arriving at the main microphone 102 is observed with a delay of a samples and a gain difference of b relative to the voice arriving at the main microphone 101, when the close speaker speaks, the speaker's voice contained in the first reference signal x₁(n) = b × s₁(n-a) - s₂(n) is reduced and x₁(n) approaches 0. Ambient noise, by contrast, is hardly reduced by this processing. The effect is that only the close speaker's voice is suppressed. These coefficients (the first proximity sound suppression filter g^-₁(n) and the second proximity sound suppression filter g^-₂(n)) are therefore one example of a filter that suppresses proximity sound. Alternatively, frequency characteristic correction filters may be used for g^-₁(n) and g^-₂(n). The two filters need not depend on the time n and may be fixed filters g^-₁ and g^-₂.
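The delay-and-gain example above can be sketched as follows. This is a minimal illustration, assuming NumPy; the function name `proximity_suppress` and the synthetic signals are hypothetical, and samples before n = 0 are treated as zero.

```python
import numpy as np

def proximity_suppress(s1, s2, a, b):
    """Return x1(n) = b * s1(n - a) - s2(n), taking s1(n) = 0 for n < 0."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    delayed = np.zeros_like(s1)
    delayed[a:] = s1[:len(s1) - a]   # s1 delayed by a samples
    return b * delayed - s2

# Synthetic close-speaker voice (assumption: white noise stands in for speech).
rng = np.random.default_rng(0)
voice_at_101 = rng.standard_normal(1000)
a, b = 3, 0.7
voice_at_102 = np.zeros_like(voice_at_101)
voice_at_102[a:] = b * voice_at_101[:-a]   # delayed, attenuated copy at microphone 102

x1 = proximity_suppress(voice_at_101, voice_at_102, a, b)
```

With this idealized delay and gain the close speaker's voice cancels in `x1`, whereas noise that reaches the two microphones without that relationship would remain.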

<First Adaptive Filter Unit 110>
The first adaptive filter unit 110 receives the first reference signal x₁(n) and the error signal e(n), filters the first reference signal x₁(n) with the first adaptive filter h^-₁(n) (S110), and obtains and outputs the filtered signal h^-₁^H(n)x^-₁(n). Here, h^-₁(n) = [h_{1_0}(n), h_{1_1}(n), …, h_{1_M-1}(n)]^T and x^-₁(n) = [x₁(n), x₁(n-1), …, x₁(n-M+1)]^T. Because the first adaptive filter h^-₁(n) performs a convolution, it has a length equal to the tap size M and operates on the first reference signal vector x^-₁(n).

The first adaptive filter unit 110 also uses the error signal e(n) and the first reference signal x₁(n) to update the first adaptive filter h^-₁(n) sequentially so that the error signal e(n) is minimized. When the absolute value of the ratio β₁ of the error signal e(n) to the first reference signal x₁(n) (hereinafter also called the "error ratio") is at or below a predetermined threshold (1 in this embodiment), the first adaptive filter h^-₁(n) is updated by a 1-1 update amount based on a monotonically increasing function of the error ratio β₁. When the absolute value of β₁ exceeds the threshold, the filter is updated by a 2-1 update amount based on a monotonically increasing function of β₁ whose increase is smaller than that of the 1-1 update amount.

<Second Adaptive Filter Unit 111>
The second adaptive filter unit 111 receives the second reference signal x₂(n) and the error signal e(n), filters the second reference signal x₂(n) with the second adaptive filter h^-₂(n) (S111), and obtains and outputs the filtered signal h^-₂^H(n)x^-₂(n). Here, h^-₂(n) = [h_{2_0}(n), h_{2_1}(n), …, h_{2_M-1}(n)]^T and x^-₂(n) = [x₂(n), x₂(n-1), …, x₂(n-M+1)]^T. Because the second adaptive filter h^-₂(n) performs a convolution, it has a length equal to the tap size M and operates on the second reference signal vector x^-₂(n).

The second adaptive filter unit 111 also uses the error signal e(n) and the second reference signal x₂(n) to update the second adaptive filter h^-₂(n) sequentially so that the error signal e(n) is minimized. When the absolute value of the ratio β₂ of the error signal e(n) to the second reference signal x₂(n) (hereinafter also called the "error ratio") is at or below a predetermined threshold (1 in this embodiment), the second adaptive filter h^-₂(n) is updated by a 1-2 update amount based on a monotonically increasing function of the error ratio β₂. When the absolute value of β₂ exceeds the threshold, the filter is updated by a 2-2 update amount based on a monotonically increasing function of β₂ whose increase is smaller than that of the 1-2 update amount.

<Adaptive Filter Design Method>
The design of the adaptive filters is described below. The first main audio signal d(n) is obtained by emphasizing the target sound in the signals s₁(n) and s₂(n) collected by the two main microphones 101 and 102, which are arranged to collect the target sound; noise is suppressed in it but still present. The first reference signal x₁(n) is obtained by emphasizing sounds other than the target sound (here, noise) in the same signals s₁(n) and s₂(n); the target sound is suppressed in it but still present. The second reference signal x₂(n) is obtained from the signal collected by the single sub microphone 103, which is arranged to collect ambient noise correlated with the ambient noise in the first main audio signal (in this embodiment, it is the signal collected by the sub microphone 103 itself). The first reference signal x₁(n) contains noise with almost the same characteristics (delay, frequency characteristics) as the noise in the first main audio signal d(n). The second reference signal x₂(n), by contrast, contains noise that differs somewhat from the noise in d(n) (in arrival time and frequency response), and because the sub microphone 103 is physically farther from the target sound source, it contains less of the target sound than x₁(n) does. The noise is removed using the first reference signal x₁(n) and the second reference signal x₂(n) together with the first adaptive filter h^-₁(n) and the second adaptive filter h^-₂(n). In this embodiment, the adaptive filters are updated by the normalized LMS (NLMS: normalized least mean square) method (see cited document 3).

The first adaptive filter h^-₁(n) and the second adaptive filter h^-₂(n) are designed so that the error signal e(n), obtained by subtracting the filtered signals h^-₁^H(n)x^-₁(n) and h^-₂^H(n)x^-₂(n) from the first main audio signal d(n), is minimized.
e(n) = d(n) - h^-₁^H(n)x^-₁(n) - h^-₂^H(n)x^-₂(n) (1)
The first adaptive filter h^-₁(n) and the second adaptive filter h^-₂(n) are updated sequentially. In ordinary NLMS, the following update formulas are used.
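Assuming the textbook NLMS recursion for real-valued signals, formulas (2-1) and (2-2) would take the following form. This is a sketch under that assumption, not a transcription of the original expressions; bold symbols stand for the vectors written h^- and x^- in the text.

```latex
\begin{align}
\mathbf{h}_1(n+1) &= \mathbf{h}_1(n) + \mu \, \frac{e(n)}{\lVert \mathbf{x}_1(n) \rVert^{2}} \, \mathbf{x}_1(n) \tag{2-1} \\
\mathbf{h}_2(n+1) &= \mathbf{h}_2(n) + \mu \, \frac{e(n)}{\lVert \mathbf{x}_2(n) \rVert^{2}} \, \mathbf{x}_2(n) \tag{2-2}
\end{align}
```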

Here, ||x₁(n)|| and ||x₂(n)|| are the norms of the first reference signal x₁(n) and the second reference signal x₂(n), respectively, and the adaptation constant μ is a step size parameter that determines the update amount of the update formula. The adaptation constant μ is held at a constant value during system operation regardless of the error signal e(n), and is a real number in the range 0 < μ < 2. This update formula is decomposed as follows.
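A decomposition consistent with the description of β₁ and β₂ as ratios of the error signal to the reference signal norms would be the following. It is a sketch that assumes the textbook NLMS recursion normalized by the squared norm; the equation numbering of the original is not assumed.

```latex
\begin{align}
\mathbf{h}_i(n+1) &= \mathbf{h}_i(n) + \mu \, \beta_i \, \frac{\mathbf{x}_i(n)}{\lVert \mathbf{x}_i(n) \rVert},
\qquad
\beta_i = \frac{e(n)}{\lVert \mathbf{x}_i(n) \rVert},
\qquad i = 1, 2
\end{align}
```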

Here, β₁ and β₂ represent the ratio of the error signal e(n) to the norm of the first reference signal x^-₁(n) and the ratio of the error signal e(n) to the norm of the second reference signal x^-₂(n), respectively.

Once learning of the first adaptive filter h^-₁(n) and the second adaptive filter h^-₂(n) has converged to some extent, the positional relationship of the three microphones attached to the headset means that, in non-speech sections, the noise component in the sum of the filtered signals h^-₁^H(n)x^-₁(n) and h^-₂^H(n)x^-₂(n) can be brought to about the same sound pressure as the noise component in the first main audio signal d(n). Filtering with h^-₁(n) and h^-₂(n) therefore makes the error signal e(n), obtained by subtracting the filtered signals h^-₁^H(n)x^-₁(n) and h^-₂^H(n)x^-₂(n) from d(n), small. The ratio of e(n) to the norm of each of the first reference signal x^-₁(n) and the second reference signal x^-₂(n) also becomes small, so that -1 < β₁ < 1 and -1 < β₂ < 1.

The main microphones 101 and 102, on the other hand, are placed near the speaker's mouth, so in speech sections the sound pressure of the speech they collect is high. The error signal e(n) then contains a large target sound component uttered by the speaker, and its absolute value exceeds the norms ||x^-₁(n)|| and ||x^-₂(n)|| of the first and second reference signals, giving β₁ < -1 or 1 < β₁, and β₂ < -1 or 1 < β₂. In this embodiment, applying the nonlinear limiting functions f(β₁) and f(β₂) to β₁ and β₂, respectively, reduces the filter update amount in speech sections.

The limiting functions f(β₁) and f(β₂) are nonlinear functions that take small values for |β₁| ≥ 1 (β₁ ≤ -1 or β₁ ≥ 1) and |β₂| ≥ 1 (β₂ ≤ -1 or β₂ ≥ 1), respectively. For example, they are expressed by the following equations.

例えば、L=5とすると、図４で示す関数となる。また、例えば、制限関数f(β₁)、f(β₂)は以下の式で表されるシグモイド関数を用いてもよい。 For example, when L = 5, the function shown in FIG. 4 is obtained. For example, the limiting functions f (β ₁ ) and f (β ₂ ) may be sigmoid functions represented by the following expressions.

For example, L = 5 gives the function shown in FIG. 5. In both FIG. 4 and FIG. 5, when the absolute values of the error ratios β₁ and β₂ are 1 or less, the 1-1 and 1-2 update amounts, based on monotonically increasing functions of β₁ and β₂ respectively, are used. When the absolute values exceed 1, the 2-1 and 2-2 update amounts are used, based on monotonically increasing functions whose increase with β₁ and β₂ is smaller than that of the 1-1 and 1-2 update amounts. Hence the update amounts used when the absolute values of β₁ and β₂ are at or below 1 (the threshold), namely the 1-1 and 1-2 update amounts, are larger than those used when the absolute values exceed 1 (the threshold), namely the 2-1 and 2-2 update amounts. The first adaptive filter h^-₁(n) is then updated by equation (5-1) based on the 1-1 or 2-1 update amount, and the second adaptive filter h^-₂(n) is updated by equation (5-2) based on the 1-2 or 2-2 update amount.
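As a rough illustration of a limiting function with the properties just described, the sketch below uses the identity for |β| ≤ 1 and a slope of 1/L beyond the threshold. This particular form, the name `limit`, and the `threshold` parameter are assumptions chosen for illustration; it is not the function of FIG. 4 or FIG. 5.

```python
import numpy as np

def limit(beta, L=5.0, threshold=1.0):
    """Hypothetical limiting function (not the patent's own expression).

    Identity for |beta| <= threshold; beyond it the output keeps increasing,
    but with slope 1/L, so it stays nonzero and below |beta|.
    """
    beta = np.asarray(beta, dtype=float)
    mag = np.abs(beta)
    limited = threshold + (mag - threshold) / L
    return np.where(mag <= threshold, beta, np.sign(beta) * limited)
```

For instance, with L = 5 an error ratio of 0.5 passes through unchanged, while an error ratio of 6 is reduced to 2, so a large ratio still produces an update but a much weaker one than unlimited NLMS would.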

The constraint on the limiting function is that, with β₁ = 1 and β₁ = -1 as boundaries, the absolute value of the value for β₁ is reduced, and likewise for β₂ with β₂ = 1 and β₂ = -1 as boundaries. In noise sections, the main microphones 101 and 102 and the sub microphone 103 all observe roughly the same sound pressure, so β₁ and β₂ take the value 1 if the adaptive filters do not suppress the main microphone signal at all, and |β₁| < 1 and |β₂| < 1 if they do suppress it. In speech sections, the error signal e(n) is observed to be larger than the first reference signal x₁(n) and the second reference signal x₂(n), so β₁ > 1 and β₂ > 1. Learning to reduce the error signal e(n) in speech sections would suppress the target speech as well. To avoid this, the limiting function is designed to take small values for β₁ > 1 and β₂ > 1 so that the filter update amount is reduced.

Put another way, in speech sections there is a large difference between the sound pressure observed in the first main audio signal d(n) and that observed in the first reference signal x₁(n) and the second reference signal x₂(n), and the target sound component remains in the error signal e(n), so the ratio of e(n) to the norm of each reference signal also becomes large; hence |β₁| > 1 and |β₂| > 1. Conversely, sections where |β₁| > 1 and |β₂| > 1 are likely to be speech sections, and applying the limit in those sections holds down the step size of the adaptive filters in speech sections. In other words, in sections where |β₁| > 1 and |β₂| > 1, the values of f(β₁) and f(β₂) can be made smaller than β₁ and β₂, respectively. In addition, because this method does not set the step size to 0 for |β₁| > 1 and |β₂| > 1 (speech sections), it prevents the filter update from stopping when |β₁| > 1 and |β₂| > 1 arise for other reasons, for instance when a noise source is near the sub microphone 103, or when a noise source moves substantially and the transfer functions from it to the sub microphone 103 and the main microphones 101 and 102 change. Using the limiting functions f(β₁) and f(β₂) restrains the adaptive filters from adapting in the direction that cancels the speech signal, that is, the target sound present in speech sections. Relaxing filter learning in speech sections, whose character differs from that of noise, also improves filter stability.
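The two-reference update with a limited error ratio can be sketched as follows. This is a minimal sketch, assuming NumPy, real-valued signals, the textbook NLMS normalization, and the same hypothetical slope-1/L limiter as an illustration; the names `limit` and `cancel`, the small constant `eps`, and the default values of `M` and `mu` are assumptions, not values from the source.

```python
import numpy as np

def limit(beta, L=5.0):
    """Hypothetical limiter: identity for |beta| <= 1, slope 1/L beyond."""
    mag = abs(beta)
    if mag <= 1.0:
        return beta
    return np.sign(beta) * (1.0 + (mag - 1.0) / L)

def cancel(d, x1, x2, M=8, mu=0.2, eps=1e-8):
    """Two-reference NLMS canceller with limited error ratios; returns e(n)."""
    h1 = np.zeros(M)
    h2 = np.zeros(M)
    buf1 = np.zeros(M)   # [x1(n), x1(n-1), ..., x1(n-M+1)]
    buf2 = np.zeros(M)   # [x2(n), x2(n-1), ..., x2(n-M+1)]
    e = np.zeros(len(d))
    for n in range(len(d)):
        buf1 = np.roll(buf1, 1)
        buf1[0] = x1[n]
        buf2 = np.roll(buf2, 1)
        buf2[0] = x2[n]
        # Error signal, equation (1).
        e[n] = d[n] - h1 @ buf1 - h2 @ buf2
        norm1 = np.linalg.norm(buf1) + eps   # eps avoids division by zero
        norm2 = np.linalg.norm(buf2) + eps
        # Error ratios are passed through the limiter before the update.
        h1 += mu * limit(e[n] / norm1) * buf1 / norm1
        h2 += mu * limit(e[n] / norm2) * buf2 / norm2
    return e
```

When the error ratios stay within ±1 the recursion reduces to ordinary NLMS; larger ratios, which the text associates with speech sections, still update the filters but with a reduced step.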

<Subtraction Unit 120>
The subtraction unit 120 receives the first main audio signal d(n) and the filtered signals h^-₁^H(n)x^-₁(n) and h^-₂^H(n)x^-₂(n), subtracts the filtered signals from the first main audio signal d(n), and obtains (S120) and outputs the difference d(n) - h^-₁^H(n)x^-₁(n) - h^-₂^H(n)x^-₂(n) as the error signal e(n).

<Filter Design Unit 130>
The filter design unit 130 receives the first reference signal x₁(n), the second reference signal x₂(n), and the error signal e(n), designs (S130) a filter G that suppresses the noise component left uncancelled by the subtraction unit 120, and outputs it.

There are various filter design methods; for example, the filter is designed using the method described in Reference 1, which uses a noise removal technique based on PSD (power spectral density) estimation.
(Reference 1) Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Noriyoshi Kamado, "Implementation of a microphone array to improve speech recognition under noisy conditions", Proc. of the Acoustical Society of Japan, 2014, pp. 717-718

For example, the filter design unit 130 converts the first reference signal x₁(n), the second reference signal x₂(n), and the error signal e(n) into frequency domain signals: the frequency domain first reference signal X₁(ω,τ), the frequency domain second reference signal X₂(ω,τ), and the frequency domain error signal E(ω,τ). Based on the ratios of the error signal to the first and second reference signals, the frequency domain error signal E(ω,τ) is taken as the target sound spectrum Es(ω,τ) when |E(ω,τ)|/|X₁(ω,τ)| > 1 and |E(ω,τ)|/|X₂(ω,τ)| > 1, and as the noise spectrum En(ω,τ) when |E(ω,τ)|/|X₁(ω,τ)| ≤ 1 or |E(ω,τ)|/|X₂(ω,τ)| ≤ 1. The post filter G(ω) is then designed by the Wiener method according to the following equation.

Here, Xs(ω) = E[|Es(ω,τ)|²] and Xn(ω) = E[|En(ω,τ)|²], where ω is the frequency index, τ is the frame index, and E[] denotes the average over frames τ. The spectra may be computed, for example, by converting the time domain signals into frequency domain signals with the fast Fourier transform (FFT).
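The classification and averaging described above can be sketched as follows, assuming NumPy and the standard Wiener gain G(ω) = Xs(ω) / (Xs(ω) + Xn(ω)). The function name `design_post_filter`, the small constant `eps`, and the choice to average over all frames with non-matching bins counted as zero are assumptions made for this illustration.

```python
import numpy as np

def design_post_filter(E, X1, X2, eps=1e-12):
    """Sketch of a Wiener-type post filter.

    E, X1, X2: complex spectra of shape (frames, bins).
    Returns G(omega) with shape (bins,).
    """
    abs_e = np.abs(E)
    # Target sound: the error exceeds both references, i.e. both ratios are > 1.
    is_target = (abs_e > np.abs(X1)) & (abs_e > np.abs(X2))
    power = abs_e ** 2
    # Frame averages; bins outside each class contribute zero (assumption).
    Xs = np.where(is_target, power, 0.0).mean(axis=0)
    Xn = np.where(is_target, 0.0, power).mean(axis=0)
    return Xs / (Xs + Xn + eps)   # eps guards against 0/0 in silent bins
```

A bin that is classified as target sound in every frame gets a gain close to 1, and a bin that is always classified as noise gets a gain of 0.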

<Spectral Filter Unit 140>
The spectral filter unit 140 receives the error signal e(n) and the filter G and filters the error signal e(n) using the filter G (S140). To suppress the residual noise component in the error signal e(n), the frequency domain error signal is multiplied by the post filter G(ω).
Y(ω,τ) = G(ω)E(ω,τ) (9)
Finally, the output signal y(n) is obtained by applying the inverse fast Fourier transform (IFFT) to Y(ω,τ).
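Equation (9) followed by the IFFT can be sketched frame by frame as below. This is a simplified sketch assuming NumPy, real-valued signals, and non-overlapping rectangular frames; the function name `apply_post_filter` and the `frame_len` parameter are hypothetical, and a practical system would normally add windowing and overlap-add.

```python
import numpy as np

def apply_post_filter(e, G, frame_len):
    """Apply Y = G * E per frame, then inverse FFT (no overlap, no window).

    e: real time-domain error signal.
    G: real gains of length frame_len // 2 + 1 (one per rfft bin).
    Samples beyond the last full frame are dropped.
    """
    e = np.asarray(e, dtype=float)
    n_frames = len(e) // frame_len
    y = np.zeros(n_frames * frame_len)
    for t in range(n_frames):
        start = t * frame_len
        E = np.fft.rfft(e[start:start + frame_len])
        y[start:start + frame_len] = np.fft.irfft(G * E, n=frame_len)
    return y
```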

<Effect>
With this configuration, noise can be suppressed while limiting degradation and distortion of the speech, without requiring an additional device such as a VAD (voice activity detection) unit. In this embodiment, when noise is suppressed using the main microphones and the sub microphone, the limiting function changes the update speed of the adaptive filters between speech sections and noise sections. This suppresses filter learning in the wrong direction during speech sections and yields a filter that cancels only the noise collected at equal sound pressure by the two microphones. Relaxing filter learning in speech sections also prevents suppression of the speech and stabilizes the filter. Furthermore, the proximity sound enhancement processing based on distance attenuation makes the apparatus robust to variation in how the close-talking microphone is worn, and the adaptive filters also suppress the increase in noise caused by variation in frequency characteristics when different people wear it.

<Modification>
In this embodiment, the filter update amount is limited for β₁ < -1, β₁ > 1 or β₂ < -1, β₂ > 1, but it may instead be limited for β₁ < -a₁, β₁ > a₁ or β₂ < -a₂, β₂ > a₂, where a₁ > 0 and a₂ > 0. For example, equations (6-1) and (6-2) may be replaced with the following equations.

なお、上述の式(6'-1),(6'-2)において、式(6-1),(6-2)のLに代えて、それぞれL₁、L₂を用いている。 In the above formulas (6′-1) and (6′-2), L ₁ and L ₂ are used in place of L in formulas (6-1) and (6-2), respectively.

また、本実施形態では式(2-1),(2-2)において、同じ適応定数μを用いているが、異なる適応定数μ₁、μ₂(0<μ₁<2,0<μ₂<2)を用いてもよい。 In this embodiment, the same adaptation constant μ is used in the expressions (2-1) and (2-2), but different adaptation constants μ ₁ and μ ₂ (0 <μ ₁ <2,0 <μ ₂ <2) may be used.

In this embodiment, the first proximity sound enhancement filter f^-₁(n), the second proximity sound enhancement filter f^-₂(n), the first proximity sound suppression filter g^-₁(n), the second proximity sound suppression filter g^-₂(n), the first adaptive filter h^-₁(n), and the second adaptive filter h^-₂(n) all have the same filter length M, but they may have different filter lengths within a range suitable for obtaining the first reference signal, the second reference signal, and each filtered signal.

本実施形態では、第一適応フィルタ部１１０、第二適応フィルタ部１１１、減算部１２０の処理を時間領域で行っているが、周波数領域で処理を行ってもよい。例えば、図示しない周波数領域変換部を設け、第一主音声信号d(n)、第一参照信号x₁(n)及び第二参照信号x₂(n)をそれぞれ周波数領域の信号である周波数領域第一主音声信号D(ω,τ)、周波数領域第一参照信号X₁(ω,τ)及び周波数領域第二参照信号X₂(ω,τ)に変換する。 In the present embodiment, the processes of the first adaptive filter unit 110, the second adaptive filter unit 111, and the subtracting unit 120 are performed in the time domain, but may be performed in the frequency domain. For example, a frequency domain conversion unit (not shown) is provided, and the first main audio signal d (n), the first reference signal x ₁ (n), and the second reference signal x ₂ (n) are frequency domain signals, respectively. The first main audio signal D (ω, τ), the frequency domain first reference signal X ₁ (ω, τ), and the frequency domain second reference signal X ₂ (ω, τ) are converted.

第一適応フィルタ部１１０は、周波数領域第一参照信号X₁(ω,τ)と周波数領域誤差信号E(ω,τ)とを受け取り、周波数領域第一参照信号X₁(ω,τ)に適応フィルタH₁(ω,τ)を用いてフィルタリングを行い（Ｓ１１０）、フィルタリング後信号H₁(ω,τ)X₁(ω,τ)を求め、出力する。 The first adaptive filter unit 110 receives the frequency domain first reference signal X ₁ (ω, τ) and the frequency domain error signal E (ω, τ), and generates the frequency domain first reference signal X ₁ (ω, τ). Filtering is performed using the adaptive filter H ₁ (ω, τ) (S110), and a filtered signal H ₁ (ω, τ) X ₁ (ω, τ) is obtained and output.

また、第二適応フィルタ部１１１は、周波数領域第二参照信号X₂(ω,τ)と周波数領域誤差信号E(ω,τ)とを受け取り、周波数領域第二参照信号X₂(ω,τ)に適応フィルタH₂(ω,τ)を用いてフィルタリングを行い（Ｓ１１１）、フィルタリング後信号H₂(ω,τ)X₂(ω,τ)を求め、出力する。 The second adaptive filter unit 111 receives the frequency domain second reference signal X ₂ (ω, τ) and the frequency domain error signal E (ω, τ), and receives the frequency domain second reference signal X ₂ (ω, τ). ) Is filtered using the adaptive filter H ₂ (ω, τ) (S111), and a filtered signal H ₂ (ω, τ) X ₂ (ω, τ) is obtained and output.

また、第一適応フィルタ部１１０、第二適応フィルタ部１１１では、次式により、フィルタを更新する。 The first adaptive filter unit 110 and the second adaptive filter unit 111 update the filter according to the following equation.

周波数領域誤差信号E(ω,τ)に目的音成分が多く含まれるとき、周波数領域誤差信号E(ω,τ)の絶対値は周波数領域第一参照信号X₁(ω,τ)、周波数領域第二参照信号X₂(ω,τ)それぞれのノルム||X₁(ω,τ)||、||X₂(ω,τ)||よりも大きな値となり、β₁<-1, 1<β₁、β₂<-1, 1<β₂となる。この変形例の場合でも、第一適応フィルタ部１１０、第二適応フィルタ部１１１では第一実施形態と同様に、誤差割合β₁、β₂に対してそれぞれ非線形な制限関数f(β₁)、f(β₂)、を用いることで、音声区間でのフィルタの更新量を小さくすることができる。また、この説明では非線形な制限関数f(β₁)、f(β₂)はそれぞれ全周波数帯域で同じ制限関数f(β₁)、f(β₂)を用いることとしたが、周波数領域ωごとにそれぞれ別の制限関数f(β₁，ω)、f(β₂，ω)を用いるように構成しても良い。 When the frequency domain error signal E (ω, τ) contains many target sound components, the absolute value of the frequency domain error signal E (ω, τ) is the frequency domain first reference signal X ₁ (ω, τ), the frequency domain The norm of each of the second reference signals X ₂ (ω, τ) || X ₁ (ω, τ) || and || X ₂ (ω, τ) || is larger than β ₁ <-1, 1 <β ₁ , β ₂ <-1, 1 <β ₂ . Even in the case of this modification, the first adaptive filter unit 110 and the second adaptive filter unit 111 are similar to the first embodiment in that the limit functions f (β ₁ ), which are nonlinear with respect to the error ratios β ₁ and β ₂ , respectively. By using f (β ₂ ), it is possible to reduce the amount of filter update in the speech interval. In this description, the same limiting functions f (β ₁ ) and f (β ₂ ) are used for the non-linear limiting functions f (β ₁ ) and f (β ₂ ) in all frequency bands. Each may be configured to use different limiting functions f (β ₁ , ω) and f (β ₂ , ω).

減算部１２０では、周波数領域第一主音声信号D(ω,τ)とフィルタリング後信号H₁(ω,τ)X₁(ω,τ)、H₂(ω,τ)X₂(ω,τ)とを受け取り、周波数領域第一主音声信号D(ω,τ)からフィルタリング後信号H₁(ω,τ)X₁(ω,τ)とフィルタリング後信号H₂(ω,τ)X₂(ω,τ)とを減算して得られる値D(ω,τ)-H₁(ω,τ)X₁(ω,τ)-H₂(ω,τ)X₂(ω,τ)を周波数領域誤差信号E(ω,τ)として求め（Ｓ１２０）、出力する。後段（フィルタ設計部１３０及びスペクトルフィルタ部１４０）において、周波数領域で処理を行うのであれば、そのまま周波数領域第一参照信号X₁(ω,τ)、周波数領域第二参照信号X₂(ω,τ)と周波数領域誤差信号E(ω,τ)を用いればよいし、時間領域信号を用いるのであれば、周波数領域信号に変換して後段に出力すればよい。 In the subtracting unit 120, the frequency domain first main audio signal D (ω, τ) and the filtered signal H ₁ (ω, τ) X ₁ (ω, τ), H ₂ (ω, τ) X ₂ (ω, τ) ) And filtered signal H ₁ (ω, τ) X ₁ (ω, τ) and filtered signal H ₂ (ω, τ) X ₂ ( The value D (ω, τ) -H ₁ (ω, τ) X ₁ (ω, τ) -H ₂ (ω, τ) X ₂ (ω, τ) obtained by subtracting An area error signal E (ω, τ) is obtained (S120) and output. If the subsequent stage (filter design unit 130 and spectral filter unit 140) performs processing in the frequency domain, the frequency domain first reference signal X ₁ (ω, τ) and the frequency domain second reference signal X ₂ (ω, τ) and the frequency domain error signal E (ω, τ) may be used, and if a time domain signal is used, it may be converted into a frequency domain signal and output to the subsequent stage.

また、本実施形態のポイントは、音声区間と非音声区間とで適応フィルタの更新量に対して異なる制限をかける制限関数を用いることである。よって、雑音抑圧装置１００は、必ずしもメインマイクロホン１０１、１０２、サブマイクロホン１０３、フィルタ設計部１３０、スペクトルフィルタ部１４０を含まなくともよい。 Also, the point of the present embodiment is to use a restriction function that places different restrictions on the update amount of the adaptive filter in the voice section and the non-voice section. Therefore, the noise suppression apparatus 100 does not necessarily include the main microphones 101 and 102, the sub microphone 103, the filter design unit 130, and the spectrum filter unit 140.

<Noise Suppression Apparatus 200 according to Second Embodiment>
The description focuses on the differences from the first embodiment.

図６は第二実施形態に係る雑音抑圧装置２００の機能ブロック図を、図７はその処理フローを示す。 FIG. 6 is a functional block diagram of the noise suppression apparatus 200 according to the second embodiment, and FIG. 7 shows a processing flow thereof.

The noise suppression apparatus 200 includes P main microphones 10-p, Q sub microphones 10-P+q, a proximity sound enhancement unit 204, a proximity sound suppression unit 205, a noise enhancement unit 201, a first adaptive filter unit 110, a second adaptive filter unit 111, a subtraction unit 120, a filter design unit 130, and a spectral filter unit 140, where p = 1, 2, …, P, q = 1, 2, …, Q, and P and Q are each an integer of 2 or more.

In this embodiment, the proximity sound enhancement unit 204 and the proximity sound suppression unit 205 process the signals of the P main microphones 10-p, and the signal in which the noise enhancement unit 201 has emphasized the noise using the Q sub microphones 10-P+q is used as the second reference signal x₂(n) (see FIGS. 6 and 7). The noise enhancement unit 201 may, for example, use the delay-and-sum beamformer described in Reference 2 to emphasize directions away from the target speech.
(Reference 2) "Knowledge Forest, Group 2 (Image, Sound, Language), Part 6 (Acoustic Signal Processing), Chapter 2 Sound Source Separation, 2-1 Beamforming", [online], 2012, The Institute of Electronics, Information and Communication Engineers, [retrieved October 15, 2015], Internet <URL: http://www.ieice-hbkb.org/files/02/02gun_06hen_02.pdf>

In this embodiment, the second reference signal x₂(n) is a signal in which the signals collected by the Q sub microphones 10-P+q are emphasized with respect to sounds other than the target sound. In the first embodiment, by contrast, the second reference signal is the signal collected by a single sub microphone itself. In either case, the second reference signal x₂(n) can be said to be a signal obtained based on signals collected by sub microphones. When the second reference signal x₂(n) is a signal in which the signals collected by the Q sub microphones 10-P+q are emphasized with respect to sounds other than the target sound, the proportion of the target sound contained in the second reference signal x₂(n) is even smaller than when the signal collected by the single sub microphone 103 is used as it is.

<Main microphones 10-p and sub microphones 10-P+q>
Each of the P main microphones 10-p collects the target sound and noise (S10-p) and outputs the p-th collected sound signal s_p(n). Each of the Q sub microphones 10-P+q collects the target sound and noise (S10-P+q) and outputs the (P+q)-th collected sound signal s_{P+q}(n). FIG. 8 shows the positional relationship between the P main microphones 10-p and the Q sub microphones 10-P+q. In FIG. 8, P=3 and Q=3. The P main microphones 10-p are placed at the speaker's mouth, and the Q sub microphones 10-P+q are placed at positions far from the mouth. The P+Q microphones are fixed via a wire (arm), a headband, or the like, and are each connected, via a wiring cable, wireless communication, or the like, to the externally placed main body of the noise suppression device 200 (the configuration obtained by removing the P main microphones 10-p and the Q sub microphones 10-P+q from the noise suppression device 200).

<Noise enhancement unit 201>
The noise enhancement unit 201 receives the Q (P+q)-th collected sound signals s_{P+q}(n), obtains a second reference signal x₂(n) in which sounds other than the target sound are emphasized (S201), and outputs it. For example, using the technique of Reference 2, a beam is formed in a direction different from the direction of the user's mouth to obtain the second reference signal x₂(n).
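A delay-and-sum beamformer of the kind mentioned for the noise enhancement unit 201 can be sketched as follows. This is a simplified illustration under stated assumptions: delays are non-negative whole samples, the function name `delay_and_sum` is hypothetical, and the steering delays are supplied by the caller, since the geometry needed to compute delays for a direction away from the user's mouth is not given in the text.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Average the channels after delaying each by a whole number of samples.

    signals : array of shape (channels, samples)
    delays  : one non-negative integer delay per channel (assumed known)
    Sound arriving from the steered direction adds coherently; sound from
    other directions is attenuated by the averaging.
    """
    signals = np.asarray(signals, dtype=float)
    num_ch, num_samples = signals.shape
    if len(delays) != num_ch:
        raise ValueError("need exactly one delay per channel")
    out = np.zeros(num_samples)
    for channel, d in zip(signals, delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        # Shift the channel right by d samples, zero-filling the start.
        out[d:] += channel[:num_samples - d]
    return out / num_ch

# Toy example: the same pulse reaches microphone 0 two samples earlier
# than microphone 1, so delaying channel 0 by 2 aligns the two channels.
s = np.array([[1.0, 2.0, 3.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]])
x2 = delay_and_sum(s, [2, 0])
```

In practice the delays would be derived from the microphone spacing, the sampling rate, and the chosen look direction, and fractional delays would usually be needed; the integer-delay version is used here only to keep the sketch short.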

<Proximity sound enhancement unit 204 and proximity sound suppression unit 205>
The proximity sound enhancement unit 204 receives the P p-th collected sound signals s_p(n), obtains a first main audio signal d(n) in which the target sound is emphasized (S204), and outputs it. For example, the difference between the first collected sound signal s₁(n) and the p-th collected sound signal s_p(n) is calculated for p=2,3,…,P, and the P−1 difference signals are averaged to obtain the first main audio signal d(n).
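The averaging of difference signals described for the proximity sound enhancement unit 204 can be written compactly. The sketch below assumes the P collected sound signals are stacked as rows of a NumPy array with s₁(n) in row 0; the function name `enhance_proximity_sound` is hypothetical.

```python
import numpy as np

def enhance_proximity_sound(s):
    """Return d(n): the mean over p = 2..P of s_1(n) - s_p(n).

    s : array of shape (P, samples), row 0 holding s_1(n)
    """
    s = np.asarray(s, dtype=float)
    if s.ndim != 2 or s.shape[0] < 2:
        raise ValueError("expected at least two microphone signals as rows")
    diffs = s[0] - s[1:]        # P-1 difference signals, shape (P-1, samples)
    return diffs.mean(axis=0)   # average of the difference signals

# Toy example with P = 3 microphones and three samples.
s = np.array([[4.0, 4.0, 4.0],
              [1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0]])
d = enhance_proximity_sound(s)
```

Components that reach all main microphones with nearly the same amplitude and phase largely cancel in each difference, which is why this operation tends to favor a source close to the first microphone.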

The proximity sound suppression unit 205 receives the P p-th collected sound signals s_p(n), obtains a first reference signal x₁(n) in which sounds other than the target sound are emphasized (S205), and outputs it. For example, using the technique of Reference 2, a beam is formed in a direction different from the direction of the user's mouth to obtain the first reference signal x₁(n).

When P=2, the proximity sound enhancement unit 204 and the proximity sound suppression unit 205 may obtain the first main audio signal d(n) and the first reference signal x₁(n) by the same method as in the first embodiment.

With this configuration, the first main audio signal d(n), the first reference signal x₁(n), and the second reference signal x₂(n) exhibit the same properties as in the first embodiment. That is, the first main audio signal d(n) is a signal in which the signals s_p(n) collected by the P main microphones 10-p, which are arranged to collect the target sound, are emphasized with respect to the target sound; noise is suppressed in it but is still present. The first reference signal x₁(n) is a signal in which the signals s_p(n) collected by the P main microphones 10-p are emphasized with respect to sounds other than the target sound (here, noise); the target sound is suppressed in it but is still present. The second reference signal x₂(n) is a signal obtained based on the signals collected by the Q sub microphones 10-P+q, which are arranged to collect ambient noise correlated with the ambient noise contained in the first main audio signal, and is emphasized with respect to sounds other than the target sound. The first reference signal x₁(n) contains noise with almost the same characteristics (delay and frequency characteristics) as the noise contained in the first main audio signal d(n). In the second reference signal x₂(n), on the other hand, the observed noise differs somewhat (in arrival time difference and frequency response) from the noise contained in the first main audio signal d(n), and because the sub microphones are physically farther from the sound source, less of the target sound is observed than in the first reference signal x₁(n).

The subsequent processing is the same as in the first embodiment.
<Effect>
With the configuration described above, the same effects as in the first embodiment can be obtained. The modification of the first embodiment may also be combined with this embodiment.

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and modifications may be realized by a computer. In that case, the processing content of the functions that each device should have is described by a program. By executing this program on a computer, the various processing functions of each device are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage unit. When executing a process, the computer reads the program stored in its own storage unit and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute the process according to that program. Alternatively, each time a program is transferred from the server computer to the computer, the computer may execute the process according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and the acquisition of results. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).

Although each device has been described as being configured by executing a predetermined program on a computer, at least part of the processing content may be realized in hardware.

Claims

A noise suppression device that suppresses a noise component included in a first main audio signal using a first reference signal and a second reference signal, comprising:
a first adaptive filter unit that filters the first reference signal using a first adaptive filter to obtain a first filtered signal;
a second adaptive filter unit that filters the second reference signal using a second adaptive filter to obtain a second filtered signal; and
a subtraction unit that obtains, as an error signal, a value obtained by subtracting the first filtered signal and the second filtered signal from the first main audio signal,
wherein the first adaptive filter unit sequentially updates the first adaptive filter, using the error signal and the first reference signal, so that the error signal is minimized; when the absolute value of an error ratio β₁, which is the ratio of the error signal to the first reference signal, is equal to or less than a predetermined threshold a₁, updates the first adaptive filter by a 1-1 update amount based on a monotonically increasing function of the error ratio β₁; and when the absolute value of the error ratio β₁ is larger than the predetermined threshold a₁, updates the first adaptive filter by a 2-1 update amount based on a monotonically increasing function of the error ratio β₁ whose amount of increase is smaller than that of the 1-1 update amount, and
the second adaptive filter unit sequentially updates the second adaptive filter, using the error signal and the second reference signal, so that the error signal is minimized; when the absolute value of an error ratio β₂, which is the ratio of the error signal to the second reference signal, is equal to or less than a predetermined threshold a₂, updates the second adaptive filter by a 1-2 update amount based on a monotonically increasing function of the error ratio β₂; and when the absolute value of the error ratio β₂ is larger than the predetermined threshold a₂, updates the second adaptive filter by a 2-2 update amount based on a monotonically increasing function of the error ratio β₂ whose amount of increase is smaller than that of the 1-2 update amount.

The noise suppression device of claim 1, wherein
the first main audio signal is a signal in which signals collected by two or more main microphones arranged to collect the target sound are emphasized with respect to the target sound,
the first reference signal is a signal in which the signals collected by the two or more main microphones are emphasized with respect to sounds other than the target sound, and
the second reference signal is a signal obtained based on a signal collected by one or more sub microphones arranged to collect ambient noise correlated with the ambient noise included in the first main audio signal.

The noise suppression device of claim 1, wherein
the first main audio signal is a signal in which signals collected by two or more main microphones arranged at the mouth of a speaker, the signals containing the target sound that is the speaker's utterance and ambient noise, are emphasized with respect to the target sound,
the first reference signal is a signal in which the signals collected by the two or more main microphones, the signals containing the target sound that is the speaker's utterance and ambient noise, are emphasized with respect to sounds other than the target sound,
the second reference signal is a signal obtained based on a signal collected by one or more sub microphones, and
the sub microphone is arranged so that the sound pressure of the target sound included in the signal collected by the sub microphone is smaller than the sound pressure of the target sound included in the signal collected by the main microphone, and the sound pressure of the noise included in the signal collected by the sub microphone is approximately the same as the sound pressure of the noise included in the signal collected by the main microphone.

The noise suppression device according to any one of claims 1 to 3, wherein,
with f(β₁) denoting the 1-1 update amount or the 2-1 update amount, f(β₂) denoting the 1-2 update amount or the 2-2 update amount, n denoting an index representing time, M₁ denoting the filter length of the first adaptive filter, h⁻₁(n) = [h₁(n), h₁(n−1), …, h₁(n−M₁+1)] denoting its filter coefficients at time n, M₂ denoting the filter length of the second adaptive filter, h⁻₂(n) = [h₂(n), h₂(n−1), …, h₂(n−M₂+1)] denoting its filter coefficients at time n, μ₁ denoting the adaptation constant of the first adaptive filter, μ₂ denoting the adaptation constant of the second adaptive filter, x₁(n) denoting the first reference signal at time n, x⁻₁(n) = [x₁(n), x₁(n−1), …, x₁(n−M+1)], ||x⁻₁(n)|| denoting the norm of the first reference signal x⁻₁(n), x₂(n) denoting the second reference signal at time n, x⁻₂(n) = [x₂(n), x₂(n−1), …, x₂(n−M+1)], ||x⁻₂(n)|| denoting the norm of the second reference signal x⁻₂(n), e(n) denoting the error signal at time n, and L₁ and L₂ each being a real number greater than 1, the update formulas of the first adaptive filter and the second adaptive filter are

[equation not reproduced]
and
[equation not reproduced]
or
[equation not reproduced]
and
[equation not reproduced]
or
[equation not reproduced].

A noise suppression method for suppressing a noise component included in a first main audio signal using a first reference signal and a second reference signal, comprising:
a first adaptive filter step in which a first adaptive filter unit filters the first reference signal using a first adaptive filter to obtain a first filtered signal;
a second adaptive filter step in which a second adaptive filter unit filters the second reference signal using a second adaptive filter to obtain a second filtered signal; and
a subtraction step of obtaining, as an error signal, a value obtained by subtracting the first filtered signal and the second filtered signal from the first main audio signal,
wherein, in the first adaptive filter step, the first adaptive filter is sequentially updated, using the error signal and the first reference signal, so that the error signal is minimized; when the absolute value of an error ratio β₁, which is the ratio of the error signal to the first reference signal, is equal to or less than a predetermined threshold a₁, the first adaptive filter is updated by a 1-1 update amount based on a monotonically increasing function of the error ratio β₁; and when the absolute value of the error ratio β₁ is larger than the predetermined threshold a₁, the first adaptive filter is updated by a 2-1 update amount based on a monotonically increasing function of the error ratio β₁ whose amount of increase is smaller than that of the 1-1 update amount, and
in the second adaptive filter step, the second adaptive filter is sequentially updated, using the error signal and the second reference signal, so that the error signal is minimized; when the absolute value of an error ratio β₂, which is the ratio of the error signal to the second reference signal, is equal to or less than a predetermined threshold a₂, the second adaptive filter is updated by a 1-2 update amount based on a monotonically increasing function of the error ratio β₂; and when the absolute value of the error ratio β₂ is larger than the predetermined threshold a₂, the second adaptive filter is updated by a 2-2 update amount based on a monotonically increasing function of the error ratio β₂ whose amount of increase is smaller than that of the 1-2 update amount.

The noise suppression method according to claim 5, wherein
the first main audio signal is a signal in which signals collected by two or more main microphones arranged to collect the target sound are emphasized with respect to the target sound,
the first reference signal is a signal in which the signals collected by the two or more main microphones are emphasized with respect to sounds other than the target sound, and
the second reference signal is a signal obtained based on a signal collected by one or more sub microphones arranged to collect ambient noise correlated with the ambient noise included in the first main audio signal.

The noise suppression method according to claim 5, wherein
the first main audio signal is a signal in which signals collected by two or more main microphones arranged at the mouth of a speaker, the signals containing the target sound that is the speaker's utterance and ambient noise, are emphasized with respect to the target sound,
the first reference signal is a signal in which the signals collected by the two or more main microphones, the signals containing the target sound that is the speaker's utterance and ambient noise, are emphasized with respect to sounds other than the target sound,
the second reference signal is a signal obtained based on a signal collected by one or more sub microphones, and
the sub microphone is arranged so that the sound pressure of the target sound included in the signal collected by the sub microphone is smaller than the sound pressure of the target sound included in the signal collected by the main microphone, and the sound pressure of the noise included in the signal collected by the sub microphone is approximately the same as the sound pressure of the noise included in the signal collected by the main microphone.

The noise suppression method according to any one of claims 5 to 7, wherein,
with f(β₁) denoting the 1-1 update amount or the 2-1 update amount, f(β₂) denoting the 1-2 update amount or the 2-2 update amount, n denoting an index representing time, M₁ denoting the filter length of the first adaptive filter, h⁻₁(n) = [h₁(n), h₁(n−1), …, h₁(n−M₁+1)] denoting its filter coefficients at time n, M₂ denoting the filter length of the second adaptive filter, h⁻₂(n) = [h₂(n), h₂(n−1), …, h₂(n−M₂+1)] denoting its filter coefficients at time n, μ₁ denoting the adaptation constant of the first adaptive filter, μ₂ denoting the adaptation constant of the second adaptive filter, x₁(n) denoting the first reference signal at time n, x⁻₁(n) = [x₁(n), x₁(n−1), …, x₁(n−M+1)], ||x⁻₁(n)|| denoting the norm of the first reference signal x⁻₁(n), x₂(n) denoting the second reference signal at time n, x⁻₂(n) = [x₂(n), x₂(n−1), …, x₂(n−M+1)], ||x⁻₂(n)|| denoting the norm of the second reference signal x⁻₂(n), e(n) denoting the error signal at time n, and L₁ and L₂ each being a real number greater than 1, the update formulas of the first adaptive filter and the second adaptive filter are

[equation not reproduced]
and
[equation not reproduced]
or
[equation not reproduced]
and
[equation not reproduced]
or
[equation not reproduced].

A program for causing a computer to function as the noise suppression device according to any one of claims 1 to 4.