JP2005529379A

JP2005529379A - Method and apparatus for removing noise from electronic signals

Info

Publication number: JP2005529379A
Application number: JP2004562239A
Authority: JP
Inventors: Burnett, Gregory C.
Original assignee: AliphCom
Priority date: 2001-11-21
Filing date: 2002-11-21
Publication date: 2005-09-29
Also published as: KR20040077661A; EP1480589A1; CN1589127A; KR100936093B1; AU2002359445A1; WO2004056298A1

Abstract

Methods and systems are provided for removing acoustic noise from human speech. The acoustic noise is removed regardless of its type, amplitude, or orientation. The system includes a processor coupled between a microphone and a voice activity detection ("VAD") element. The processor executes a denoising algorithm that generates transfer functions. The processor receives acoustic data from the microphone and data from the VAD, and generates different transfer functions when the VAD indicates voicing activity and when it does not. The transfer functions are used to generate a denoised data stream.

Description

The present invention relates to the field of mathematical methods and electronic systems for removing or suppressing unwanted acoustic noise from acoustic transmissions or recordings.

Related Applications
This patent application is a continuation-in-part of U.S. Patent Application Ser. No. 09/905,361, filed July 12, 2001, the contents of which are incorporated herein by reference. This application also claims priority to U.S. Provisional Patent Application No. 60/332,202, filed November 21, 2001.

In a typical acoustic application, speech from a human user is recorded or stored and transmitted to a receiver at a different location. The user's environment may contain one or more noise sources that contaminate the signal of interest (the user's speech) with unwanted acoustic noise. This makes it difficult or impossible for the receiver, whether human or machine, to understand the user's speech. The problem is now especially acute given the proliferation of portable communication devices such as cellular telephones and personal digital assistants. Methods exist for suppressing these noise additions, but they have significant drawbacks. For example, existing methods are slow because of the computation time they require. Existing methods may also require bulky hardware, distort the signal of interest unacceptably, or perform so poorly that they are not useful. Many of these existing methods are described in textbooks such as Vaseghi's "Advanced Digital Signal Processing and Noise Reduction," ISBN 0-471-62692-9.

The following description provides specific details for a thorough understanding of, and an enabling description for, embodiments of the invention. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.

Unless described otherwise below, the construction and operation of the various blocks shown in the figures are of conventional design. As a result, such blocks need not be described in further detail here, because they will be understood by those skilled in the relevant art. Such further detail is omitted for brevity and so as not to obscure the detailed description of the invention. Any modifications necessary to the blocks in the figures (or in other embodiments) can be readily made by one skilled in the relevant art based on the detailed description provided herein.

FIG. 1 is a block diagram of a denoising system of one embodiment that uses knowledge of when speech is occurring, derived from physiological information on voicing activity. The system includes a microphone 10 and a sensor 20 that provide signals to at least one processor 30. The processor includes a denoising subsystem or algorithm 40.

FIG. 2 is a block diagram of the noise removal algorithm of one embodiment, showing the system components used. A single noise source and a direct path to each microphone are assumed. FIG. 2 includes a graphical description of the process of one embodiment, with a single signal source 100 and a single noise source 101. The algorithm uses two microphones, a "signal" microphone 1 ("MIC1") and a "noise" microphone 2 ("MIC2"), but is not so limited. MIC1 is assumed to capture mostly signal with some noise, while MIC2 captures mostly noise with some signal. The data from signal source 100 to MIC1 is denoted s(n), where s(n) is a discrete sample of the analog signal from signal source 100. The data from signal source 100 to MIC2 is denoted s₂(n). The data from noise source 101 to MIC2 is denoted n(n), and the data from noise source 101 to MIC1 is denoted n₂(n). Similarly, the data from MIC1 to noise removal element 105 is denoted m₁(n), and the data from MIC2 to noise removal element 105 is denoted m₂(n).

The noise removal element also receives a signal from a voice activity detection ("VAD") element 104. The VAD 104 detects and uses physiological information to determine when a speaker is speaking. In various embodiments, the VAD includes a radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and/or an airflow detector.

The transfer functions from signal source 100 to MIC1 and from noise source 101 to MIC2 are assumed to be unity. The transfer function from signal source 100 to MIC2 is denoted H₂(z), and the transfer function from noise source 101 to MIC1 is denoted H₁(z). Assuming unity transfer functions does not limit the generality of the algorithm, because the actual relations between the signal, the noise, and the microphones are simply ratios, and the ratios are redefined in this way for simplicity.

In conventional noise removal systems, the information from MIC2 is used to attempt to remove noise from MIC1. However, an unstated assumption is that the VAD element 104 is never perfect, so the denoising must be performed cautiously to avoid removing too much of the signal along with the noise. If instead the VAD 104 is assumed to be perfect, equal to zero when the user is not producing speech and equal to one when the user is, a substantial improvement in noise removal can be made.

Referring to FIG. 2 and analyzing a single noise source 101 with a direct path to each microphone, the total acoustic information entering MIC1 is denoted m₁(n), and the total acoustic information entering MIC2 is similarly denoted m₂(n). In the z (digital frequency) domain, these are represented as M₁(z) and M₂(z). Then

$$M_1(z) = S(z) + N_2(z), \qquad M_2(z) = N(z) + S_2(z)$$

and

$$N_2(z) = N(z)H_1(z), \qquad S_2(z) = S(z)H_2(z),$$

so that

$$M_1(z) = S(z) + N(z)H_1(z), \qquad M_2(z) = N(z) + S(z)H_2(z). \tag{1}$$

This is the general case for all two-microphone systems. In a practical system there is always some leakage of noise into MIC1 and some leakage of signal into MIC2. Equation 1 has four unknowns (S, N, H₁, H₂) and only two knowns (M₁, M₂), and therefore cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1. The analysis starts with the case in which no signal is being generated, that is, the signal from the VAD element 104 equals zero and no speech is being produced. In this case s(n) = S(z) = 0, and Equation 1 reduces to

$$M_{1n}(z) = N(z)H_1(z), \qquad M_{2n}(z) = N(z),$$

where the subscript n on the variable M indicates that only noise is being received. This leads to

$$M_{1n}(z) = M_{2n}(z)H_1(z), \qquad H_1(z) = \frac{M_{1n}(z)}{M_{2n}(z)}. \tag{2}$$

H₁(z) can be calculated using any available system identification algorithm and the microphone outputs when the system is certain that only noise is being received. The calculation can be done adaptively, so that the system can react to changes in the noise.
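
Equation 2 can be illustrated at a single frequency bin. The sketch below is an illustration under stated assumptions, not the patent's implementation: it treats each noise-only frame as one complex spectral value per bin and forms the least-squares ratio ΣM₁ₙM₂ₙ*/Σ|M₂ₙ|², which reduces to M₁ₙ/M₂ₙ when the relation is exact. The function name `estimate_ratio` and the simulated values are hypothetical.

```python
import cmath
import random


def estimate_ratio(num_frames, den_frames):
    """Least-squares H for num ≈ H * den at one frequency bin.

    With noise-only frames, num = M1n and den = M2n, so this estimates
    H1 = M1n / M2n (Equation 2) averaged over several frames.
    """
    cross = sum(a * b.conjugate() for a, b in zip(num_frames, den_frames))
    power = sum(abs(b) ** 2 for b in den_frames)
    if power == 0:
        raise ValueError("reference frames carry no energy")
    return cross / power


# Simulated noise-only frames (VAD = 0, so S = 0) for one bin.
random.seed(0)
true_h1 = 0.4 * cmath.exp(-0.7j)  # hypothetical noise-to-MIC1 response
noise = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
m1n = [true_h1 * n for n in noise]  # MIC1 hears only the filtered noise
m2n = noise                         # MIC2 hears the noise directly
h1_hat = estimate_ratio(m1n, m2n)
```

Averaging over frames rather than dividing a single pair of values makes the estimate less sensitive to bins where M₂ₙ happens to be small.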

A solution is now available for one of the unknowns in Equation 1. Another unknown, H₂(z), can be determined by using the instances in which the VAD equals one and speech is being produced. When this occurs, but the recent (perhaps less than one second) history of the microphones indicates low noise levels, it can be assumed that n(n) = N(z) ≈ 0. Equation 1 then reduces to

$$M_{1s}(z) = S(z), \qquad M_{2s}(z) = S(z)H_2(z),$$

which in turn leads to

$$M_{2s}(z) = M_{1s}(z)H_2(z), \qquad H_2(z) = \frac{M_{2s}(z)}{M_{1s}(z)}.$$

This is the inverse of the H₁(z) calculation. Note, however, that different inputs are used: now only the signal is present, whereas before only the noise was present. While H₂(z) is being calculated, the value calculated for H₁(z) is held constant, and vice versa. In other words, it is assumed that whichever of H₁(z) and H₂(z) is not being calculated does not change substantially while the other is.

After H₁(z) and H₂(z) are calculated, they are used to remove the noise from the signal. Rewriting Equation 1 as

$$S(z) = M_1(z) - N(z)H_1(z), \qquad N(z) = M_2(z) - S(z)H_2(z)$$

and substituting for N(z) as shown gives

$$S(z) = M_1(z) - \left[M_2(z) - S(z)H_2(z)\right]H_1(z),$$

so that S(z) can be solved for as

$$S(z) = \frac{M_1(z) - M_2(z)H_1(z)}{1 - H_2(z)H_1(z)}. \tag{3}$$

If the transfer functions H₁(z) and H₂(z) can be described with sufficient accuracy, the noise can be completely removed and the original signal recovered. This holds regardless of the amplitude or spectral characteristics of the noise. The only assumptions made are a perfect VAD, sufficiently accurate H₁(z) and H₂(z), and that whichever of H₁(z) and H₂(z) is not being calculated does not change substantially while the other is. In practice, these assumptions have proven reasonable.
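
Equation 3 can be checked numerically at one frequency bin: build M₁ and M₂ from Equation 1 with known values, then recover S. The values below are hypothetical, and the guard against 1 − H₂H₁ ≈ 0 is an added safeguard that the description does not specify.

```python
def denoise_bin(m1, m2, h1, h2):
    """Equation 3 at one frequency bin: S = (M1 - M2*H1) / (1 - H2*H1)."""
    denom = 1 - h2 * h1
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("1 - H2*H1 is (near) zero at this bin")
    return (m1 - m2 * h1) / denom


# Hypothetical single-bin values; M1 and M2 are built from Equation 1.
s, n = 1.0 + 0.5j, -2.0 + 1.0j
h1, h2 = 0.3 - 0.2j, 0.1 + 0.05j
m1 = s + n * h1   # MIC1: signal plus noise through H1
m2 = n + s * h2   # MIC2: noise plus signal through H2
s_hat = denoise_bin(m1, m2, h1, h2)
```

Applying the same operation to every bin of every frame, then inverting the transform, would yield the denoised stream.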

The noise removal algorithm described here is readily generalized to include any number of noise sources. FIG. 3 is a block diagram of the front end of the noise removal algorithm of one embodiment, generalized to n distinct noise sources. These distinct noise sources may be reflections or echoes of one another, but are not so limited. Several noise sources are shown, each with a transfer function, or path, to each microphone. The path previously named H₂ has been relabeled H₀, so that the path from noise source 2 to MIC1 can be labeled more conveniently. The outputs of the microphones, when transformed to the z domain, are

$$\begin{aligned} M_1(z) &= S(z) + N_1(z)H_1(z) + N_2(z)H_2(z) + \cdots + N_n(z)H_n(z) \\ M_2(z) &= S(z)H_0(z) + N_1(z)G_1(z) + N_2(z)G_2(z) + \cdots + N_n(z)G_n(z) \end{aligned} \tag{4}$$

where Hᵢ(z) and Gᵢ(z) denote the paths from noise source i to MIC1 and MIC2, respectively. When there is no signal (VAD = 0), then, suppressing z for clarity,

$$\begin{aligned} M_{1n} &= N_1H_1 + N_2H_2 + \cdots + N_nH_n \\ M_{2n} &= N_1G_1 + N_2G_2 + \cdots + N_nG_n \end{aligned} \tag{5}$$

A new transfer function, analogous to H₁(z) above, can now be defined:

$$\tilde{H}_1 = \frac{M_{1n}}{M_{2n}} = \frac{N_1H_1 + N_2H_2 + \cdots + N_nH_n}{N_1G_1 + N_2G_2 + \cdots + N_nG_n} \tag{6}$$

Thus H̃₁ depends only on the noise sources and their respective transfer functions, and can be calculated any time no signal is being transmitted. Once again, the subscript n on the microphone inputs denotes only that noise is being detected, while a subscript s denotes that only signal is being received by the microphones.

Examining Equation 4 while assuming that no noise is present gives

$$M_{1s} = S, \qquad M_{2s} = SH_0.$$

Thus H₀ can be solved for as before, using any available transfer function calculation algorithm. Mathematically,

$$H_0 = \frac{M_{2s}}{M_{1s}}.$$

Rewriting Equation 4 using H̃₁ as defined in Equation 6 gives

$$\tilde{H}_1 = \frac{M_1 - S}{M_2 - SH_0} \tag{7}$$

Solving for S gives

$$S = \frac{M_1 - M_2\tilde{H}_1}{1 - H_0\tilde{H}_1} \tag{8}$$

This is the same as Equation 3, with H₀ taking the place of H₂ and H̃₁ taking the place of H₁. Thus the noise removal algorithm remains mathematically valid for any number of noise sources, including multiple echoes of noise sources. Again, if H₀ and H̃₁ can be estimated with sufficient accuracy, and the earlier assumption of only one path from the signal to each microphone holds, the noise can be removed completely.
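
Equations 6 and 8 can be checked together at one frequency bin with several simulated noise sources. In a real system H̃₁ would be estimated from noise-only frames as M₁ₙ/M₂ₙ; here it is computed directly from the simulated sources purely to check the algebra. All names and values are hypothetical, and the paths to MIC2 are offset from zero only to keep the denominator of Equation 6 well conditioned.

```python
import random


def h1_tilde(noise_sources, paths_to_mic1, paths_to_mic2):
    """Equation 6: total noise at MIC1 divided by total noise at MIC2 (one bin)."""
    num = sum(n * h for n, h in zip(noise_sources, paths_to_mic1))
    den = sum(n * g for n, g in zip(noise_sources, paths_to_mic2))
    return num / den


def denoise_multi(m1, m2, h1t, h0):
    """Equation 8: S = (M1 - M2*H1~) / (1 - H0*H1~)."""
    return (m1 - m2 * h1t) / (1 - h0 * h1t)


def rand_c(scale=1.0):
    """Random complex value with Gaussian real and imaginary parts."""
    return complex(random.gauss(0, scale), random.gauss(0, scale))


random.seed(1)
noises = [rand_c() for _ in range(4)]     # four hypothetical noise sources
H = [rand_c(0.3) for _ in range(4)]       # noise paths to MIC1
G = [1 + rand_c(0.3) for _ in range(4)]   # noise paths to MIC2
s, h0 = rand_c(), 0.1 + 0.05j             # signal and its path to MIC2
m1 = s + sum(n * h for n, h in zip(noises, H))        # Equation 4, MIC1
m2 = s * h0 + sum(n * g for n, g in zip(noises, G))   # Equation 4, MIC2
h1t = h1_tilde(noises, H, G)
s_hat = denoise_multi(m1, m2, h1t, h0)
```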

The most general case involves multiple noise sources and multiple signal sources. FIG. 4 is a block diagram of the front end of the noise removal algorithm of one embodiment in the most general case, in which there are n distinct noise sources and signal reflections. Here, reflections of the signal enter both microphones. This is the most general case, because reflections of the noise sources into the microphones can be modeled accurately as simple additional noise sources. For clarity, the direct path from the signal to MIC2 is renamed from H₀(z) to H₀₀(z), and the reflected paths to MIC1 and MIC2 are denoted H₀₁(z) and H₀₂(z), respectively.

The inputs to the microphones now become

$$\begin{aligned} M_1 &= S + SH_{01} + N_1H_1 + N_2H_2 + \cdots + N_nH_n \\ M_2 &= SH_{00} + SH_{02} + N_1G_1 + N_2G_2 + \cdots + N_nG_n \end{aligned} \tag{9}$$

When VAD = 0, the inputs become (again suppressing z)

$$\begin{aligned} M_{1n} &= N_1H_1 + N_2H_2 + \cdots + N_nH_n \\ M_{2n} &= N_1G_1 + N_2G_2 + \cdots + N_nG_n \end{aligned}$$

which is the same as Equation 5. Thus the calculation of H̃₁ in Equation 6 is unchanged, as expected. In the situation where there is no noise, Equation 9 reduces to

$$M_{1s} = S + SH_{01}, \qquad M_{2s} = SH_{00} + SH_{02}.$$

This leads to the definition of H̃₂:

$$\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}} \tag{10}$$

Rewriting Equation 9 using the definition of H̃₁ again (as in Equation 7) gives

$$\tilde{H}_1 = \frac{M_1 - S(1 + H_{01})}{M_2 - S(H_{00} + H_{02})} \tag{11}$$

Some algebraic manipulation yields

$$S(1 + H_{01}) - \tilde{H}_1 S(H_{00} + H_{02}) = M_1 - M_2\tilde{H}_1,$$

$$S(1 + H_{01})\left[1 - \tilde{H}_1\frac{H_{00} + H_{02}}{1 + H_{01}}\right] = M_1 - M_2\tilde{H}_1,$$

$$S(1 + H_{01})\left[1 - \tilde{H}_1\tilde{H}_2\right] = M_1 - M_2\tilde{H}_1,$$

and finally

$$S(1 + H_{01}) = \frac{M_1 - M_2\tilde{H}_1}{1 - \tilde{H}_1\tilde{H}_2} \tag{12}$$

Equation 12 is the same as Equation 8, except that H̃₂ takes the place of H₀ and a factor of (1 + H₀₁) appears on the left-hand side. This extra factor means that S cannot be solved for directly in this situation; instead, a solution is generated for the signal plus all of its echoes. This is not a serious problem, because there are many conventional methods for echo suppression, and even if the echoes are not suppressed, they are unlikely to affect the comprehensibility of the speech to any meaningful extent. The more complex calculation of H̃₂ is needed to account for the signal echoes in MIC2, which act as noise sources.
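
A quick numerical check of Equation 12 at one frequency bin shows that the output is the signal plus its MIC1 echo, S(1 + H₀₁), rather than S itself. The path values and noise totals are hypothetical; H̃₁ and H̃₂ are computed from their definitions (Equations 6 and 10) instead of being estimated from data.

```python
def denoise_general(m1, m2, h1t, h2t):
    """Equation 12: returns S*(1 + H01), the signal plus its MIC1 echo."""
    return (m1 - m2 * h1t) / (1 - h2t * h1t)


# Hypothetical single-bin values.
s = 0.8 - 0.3j
h00, h01, h02 = 0.05 + 0.02j, 0.2 - 0.1j, 0.03j  # direct and reflected signal paths
noise_m1, noise_m2 = 0.6 + 0.4j, 1.5 - 0.2j      # total noise reaching each mic
m1 = s * (1 + h01) + noise_m1                    # Equation 9, MIC1
m2 = s * (h00 + h02) + noise_m2                  # Equation 9, MIC2
h1t = noise_m1 / noise_m2                        # Equation 6 (noise only)
h2t = (h00 + h02) / (1 + h01)                    # Equation 10 (signal only)
result = denoise_general(m1, m2, h1t, h2t)
```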

FIG. 5 is a flow diagram of a denoising method of one embodiment. In operation, acoustic signals are received (502). Physiological information associated with human voicing activity is also received (504). A first transfer function representative of the acoustic signal is calculated upon determining that voicing information is absent from the acoustic signal for at least one specified period of time (506). A second transfer function representative of the acoustic signal is calculated upon determining that voicing information is present in the acoustic signal for at least one specified period of time (508). Noise is removed from the acoustic signal using at least one combination of the first transfer function and the second transfer function, producing a denoised acoustic data stream (510).
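
The flow of FIG. 5 can be sketched for a single frequency bin. This is a hypothetical illustration, not the patented implementation: the "specified period" is modeled as `min_frames` consecutive frames with the same VAD state, the transfer functions are tracked with exponentially weighted least-squares ratios (the `forget` factor is an assumption), and a caller-supplied `noise_is_low` flag stands in for the check that recent microphone history shows low noise before H₂ is updated. The transfer function not being updated is held constant, and Equation 3 produces the output.

```python
from dataclasses import dataclass


@dataclass
class BinDenoiser:
    """One-frequency-bin sketch of the FIG. 5 flow (parameters are assumptions)."""
    min_frames: int = 3    # hypothetical "specified period", in frames
    forget: float = 0.95   # hypothetical forgetting factor for running sums
    h1: complex = 0j
    h2: complex = 0j
    _run: int = 0
    _last_vad: int = -1
    _c1: complex = 0j
    _p1: float = 0.0
    _c2: complex = 0j
    _p2: float = 0.0

    def process(self, m1: complex, m2: complex, vad: int,
                noise_is_low: bool = True) -> complex:
        # Count how long the current VAD state has persisted.
        self._run = self._run + 1 if vad == self._last_vad else 1
        self._last_vad = vad
        if self._run >= self.min_frames:
            if vad == 0:  # step 506, noise only: H1 = M1n / M2n
                self._c1 = self.forget * self._c1 + m1 * m2.conjugate()
                self._p1 = self.forget * self._p1 + abs(m2) ** 2
                if self._p1 > 0:
                    self.h1 = self._c1 / self._p1
            elif noise_is_low:  # step 508, speech with little noise: H2 = M2s / M1s
                self._c2 = self.forget * self._c2 + m2 * m1.conjugate()
                self._p2 = self.forget * self._p2 + abs(m1) ** 2
                if self._p2 > 0:
                    self.h2 = self._c2 / self._p2
        # Step 510: Equation 3 with the current estimates.
        return (m1 - m2 * self.h1) / (1 - self.h2 * self.h1)


# Usage with hypothetical per-bin values.
h1_true, h2_true = 0.3 - 0.2j, 0.1 + 0.05j
den = BinDenoiser()
for k in range(20):                    # noise-only frames
    n = complex(1 + k, 2 - k)
    den.process(n * h1_true, n, vad=0)
for k in range(20):                    # clean speech frames
    s = complex(2 - k, 1 + 0.5 * k)
    den.process(s, s * h2_true, vad=1)
s, n = 1 + 1j, 0.5 - 2j                # noisy speech frame: H2 is not updated
out = den.process(s + n * h1_true, n + s * h2_true, vad=1, noise_is_low=False)
```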

The noise removal, or denoising, algorithm has been described here starting from the simplest case of a single noise source with a direct path to the microphones and building up to multiple noise sources with reflections and echoes. The algorithm has been shown here to be viable under any environmental conditions. The type and amount of noise are inconsequential, provided that good estimates of H̃₁ and H̃₂ are obtained and that one does not change substantially while the other is being calculated. If the user's environment contains echoes, they can be compensated for if they come from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

In operation, the algorithm of one embodiment has shown excellent results in dealing with a variety of noise types, amplitudes, and orientations. However, approximations and adjustments must always be made when moving from mathematical concepts to engineering applications. In Equation 3, if H₂(z) is assumed to be small, so that H₂(z)H₁(z) ≈ 0, Equation 3 reduces to

$$S(z) \approx M_1(z) - M_2(z)H_1(z).$$

This means that only H₁(z) needs to be calculated, which speeds up the process and significantly reduces the number of computations required. With proper selection of the microphones, this approximation is easily realized.
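
Substituting Equation 1 into the approximation S ≈ M₁ − M₂H₁ shows what is lost: M₁ − M₂H₁ = S(1 − H₁H₂), so the approximation scales the recovered signal by (1 − H₁H₂) and its relative error is exactly |H₁H₂|. The sketch below checks this at one frequency bin with hypothetical values, for increasing signal leakage H₂ into MIC2.

```python
def approx_denoise(m1, m2, h1):
    """Approximation to Equation 3 with H2*H1 ≈ 0: S ≈ M1 - M2*H1."""
    return m1 - m2 * h1


s, n = 1.0 + 0.0j, 3.0 - 1.0j   # hypothetical signal and noise at one bin
h1 = 0.5 + 0.1j                 # hypothetical noise-to-MIC1 path
for h2 in (0.0, 0.02, 0.2):     # increasing signal leakage into MIC2
    m1, m2 = s + n * h1, n + s * h2          # Equation 1
    rel_err = abs(approx_denoise(m1, m2, h1) - s) / abs(s)
    print(f"|H2| = {abs(h2):.2f}: relative error {rel_err:.4f}, "
          f"|H1*H2| = {abs(h1 * h2):.4f}")
```

This is why keeping speech out of MIC2 through microphone choice and placement lets the cheaper single-transfer-function form be used.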

Another approximation in one embodiment involves the filter used. The actual H₁(z) will undoubtedly have both poles and zeros, but for stability and simplicity an all-zero finite impulse response (FIR) filter is used. With enough taps (around 60), the approximation to the actual H₁(z) is very good.
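
In the time domain, modeling H₁ as an FIR filter turns the approximation S ≈ M₁ − M₂H₁ into a convolution followed by a subtraction. The sketch below assumes the taps are already known (in the described system they would come from an adaptive estimate); the 60-tap response and the test signals are invented for illustration, and MIC2 is assumed to contain no speech (H₂ = 0).

```python
import math


def fir_filter(taps, x):
    """Causal all-zero (FIR) filter: y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if k > n:
                break
            acc += h * x[n - k]
        y.append(acc)
    return y


def subtract_noise(m1, m2, h1_taps):
    """Time-domain S ≈ M1 - H1*M2 with H1 modeled as an FIR filter."""
    return [a - b for a, b in zip(m1, fir_filter(h1_taps, m2))]


# Hypothetical 60-tap noise path and synthetic signals at 8 kHz.
h1_taps = [0.6 * (0.9 ** k) * math.cos(0.3 * k) for k in range(60)]
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
noise = [math.sin(0.37 * n * n) for n in range(800)]  # deterministic stand-in for noise
mic1 = [sp + v for sp, v in zip(speech, fir_filter(h1_taps, noise))]
mic2 = noise                                           # H2 = 0: no speech in MIC2
cleaned = subtract_noise(mic1, mic2, h1_taps)
```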

Regarding subband selection, the wider the range of frequencies over which a transfer function must be calculated, the more difficult it is to calculate accurately. Therefore, the acoustic data is divided into 16 subbands, with the lowest frequency at 50 Hz and the highest at 3700 Hz. The denoising algorithm is then applied to each subband in turn, and the 16 denoised data streams are recombined to produce the denoised acoustic data. This works very well, but any combination of subbands (for example, 4, 6, 8, or 32, equally spaced, perceptually spaced, and so on) can be used and has been found to work as well.
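
The description gives the band count and range but not the band edges. A minimal sketch, assuming either equal bandwidths or equal frequency ratios (the latter as one simple, perceptually motivated alternative); with equal spacing, 16 bands from 50 to 3700 Hz are each (3700 − 50)/16 = 228.125 Hz wide. The function name and the `spacing` option are hypothetical.

```python
def subband_edges(f_low=50.0, f_high=3700.0, bands=16, spacing="linear"):
    """Return bands + 1 band-edge frequencies (Hz) between f_low and f_high."""
    if bands < 1 or not 0 < f_low < f_high:
        raise ValueError("need bands >= 1 and 0 < f_low < f_high")
    if spacing == "linear":   # equal bandwidth per subband
        step = (f_high - f_low) / bands
        return [f_low + i * step for i in range(bands + 1)]
    if spacing == "log":      # equal frequency ratio per subband
        ratio = (f_high / f_low) ** (1.0 / bands)
        return [f_low * ratio ** i for i in range(bands + 1)]
    raise ValueError(f"unknown spacing: {spacing!r}")


edges = subband_edges()
bands = list(zip(edges[:-1], edges[1:]))  # 16 (low, high) pairs in Hz
```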

In one embodiment, the amplitude of the noise was constrained so that the microphones used did not saturate (that is, operate outside their linear response region). It is important that the microphones operate linearly to ensure the best performance. Even with this restriction, signals with a very low signal-to-noise ratio (SNR) can be denoised (down to −10 dB or lower).

The calculation of H₁(z) is performed every 10 milliseconds using the least mean squares (LMS) method, a common adaptive transfer function algorithm. An explanation can be found in "Adaptive Signal Processing" (1985) by Widrow and Stearns, published by Prentice-Hall, ISBN 0-13-004029-0.
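
The description names LMS but gives no step size, tap count, or update rule beyond "every 10 milliseconds". The sketch below is a plain sample-by-sample LMS (Widrow-Hoff) adaptive FIR filter with an assumed step size `mu`; the factor of 2 in Widrow's formulation is folded into `mu`. It adapts only on samples flagged in an optional `adapt` list (for example, where VAD = 0), and its error output is the approximation S ≈ M₁ − H₁M₂. A frame-based variant that refreshes H₁ once per 10 ms (80 samples at 8 kHz) would match the text more closely.

```python
import random


def lms_denoise(ref, primary, num_taps, mu, adapt=None):
    """Plain LMS adaptive FIR filter with update w += mu * e * x.

    ref: MIC2 samples (the noise reference); primary: MIC1 samples.
    adapt: optional per-sample flags; weights change only where True.
    Returns (weights, output) with output[n] = primary[n] - w . x[n].
    """
    w = [0.0] * num_taps
    x = [0.0] * num_taps  # tap-delay line, newest sample first
    out = []
    for n, (r, p) in enumerate(zip(ref, primary)):
        x = [r] + x[:-1]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = p - y
        out.append(e)
        if adapt is None or adapt[n]:
            w = [wk + mu * e * xk for wk, xk in zip(w, x)]
    return w, out


# Usage: identify a hypothetical 4-tap noise path from noise-only data.
random.seed(3)
true_h1 = [0.5, -0.3, 0.2, 0.1]
noise = [random.gauss(0, 1) for _ in range(5000)]
mic1 = [sum(h * noise[n - k] for k, h in enumerate(true_h1) if n - k >= 0)
        for n in range(len(noise))]
weights, residual = lms_denoise(noise, mic1, num_taps=8, mu=0.01)
```

Freezing adaptation while speech is present is what keeps the filter from learning, and then cancelling, the speech itself.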

The VAD of one embodiment is derived from a radio frequency sensor and the two microphones, yielding very high accuracy (>99%) for both voiced and unvoiced speech. The VAD of one embodiment uses a radio frequency (RF) interferometer to detect tissue motion associated with human speech production, but is not so limited. It is therefore completely immune to acoustic noise and can function in any acoustic noise environment. A simple energy measurement of the RF signal can be used to determine whether voiced speech is occurring. Unvoiced speech can be detected using conventional acoustic-based methods, by proximity to voiced sections determined using the RF sensor or a similar voicing sensor, or by a combination of these. Since there is much less energy in unvoiced speech, its detection accuracy is not as critical as that for voiced speech.
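
The energy test and the proximity rule can be sketched as follows. Everything here is hypothetical: the frame length, the threshold, and the fixed `hangover_frames` window that stands in for "proximity to voiced sections" are assumptions, and a practical detector would also use the acoustic methods mentioned above.

```python
def rf_energy_vad(rf, frame_len, threshold, hangover_frames=2):
    """Hypothetical frame-energy VAD on an RF (tissue-motion) sensor signal.

    voiced[i]: mean-square RF energy of frame i exceeds threshold.
    speech[i]: frame i lies within hangover_frames of a voiced frame, a crude
    stand-in for flagging unvoiced speech by proximity to voicing.
    """
    n_frames = len(rf) // frame_len
    energy = [
        sum(v * v for v in rf[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]
    voiced = [e > threshold for e in energy]
    speech = [
        any(voiced[max(0, i - hangover_frames):i + hangover_frames + 1])
        for i in range(n_frames)
    ]
    return voiced, speech


# 10 ms frames at 8 kHz are 80 samples; the RF trace and threshold are made up.
rf = [0.0] * 400 + [1.0] * 160 + [0.0] * 400
voiced, speech = rf_energy_vad(rf, frame_len=80, threshold=0.5)
```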

With voiced and unvoiced speech detected reliably, the algorithm of one embodiment can be implemented. Once again, it is useful to repeat that the noise removal algorithm does not depend on how the VAD is obtained, only that it is accurate, especially for voiced speech. If speech is not detected and training occurs on the speech, the subsequent denoised acoustic data can be distorted.

Data was collected in four channels: one for MIC1, one for MIC2, and two for the radio frequency sensor that detected the tissue motion associated with voiced speech. The data were sampled simultaneously at 40 kHz, then digitally filtered and decimated to 8 kHz. The high sampling rate was used to reduce any aliasing that might result from the analog-to-digital process. A four-channel National Instruments A/D board was used with LabVIEW to capture and store the data. The data was then read into a C program and denoised 10 milliseconds at a time.
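
The capture chain above (sampling at 40 kHz, digital filtering, decimation to 8 kHz) can be sketched as a lowpass FIR followed by keeping every fifth sample. The 101-tap Hamming-windowed sinc and the 3.6 kHz cutoff (just below the new 4 kHz Nyquist frequency) are assumptions; the description does not say which filter was used.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc (Hamming) lowpass; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalize to unity gain at DC


def decimate(x, factor, taps):
    """Lowpass-filter x, then keep every factor-th sample (only kept outputs are computed)."""
    out = []
    for n in range(0, len(x), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            if k > n:
                break
            acc += h * x[n - k]
        out.append(acc)
    return out


fs_in, factor = 40_000, 5
taps = lowpass_fir(101, cutoff=3_600 / fs_in)
tone_1k = [math.cos(2 * math.pi * 1_000 * n / fs_in) for n in range(4_000)]
y = decimate(tone_1k, factor, taps)  # 800 samples at 8 kHz
```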

FIG. 6 shows the results of the noise suppression algorithm of one embodiment for an American English-speaking female speaker in the presence of airport terminal noise that includes many other human speakers and public announcements. The speaker utters the numbers 406-5562 in the midst of moderate airport terminal noise. The dirty acoustic data was denoised 10 milliseconds at a time, and before denoising, each 10 milliseconds of data was prefiltered from 50 to 3700 Hz. A noise reduction of approximately 17 dB is evident. No post-filtering was applied to this sample; thus, all of the noise reduction realized is due to the algorithm of one embodiment. The algorithm clearly adapts to the noise instantly and is capable of removing the very difficult noise of other human speakers. Many different types of noise have been tested with similar results, including street noise, helicopters, music, and sine waves, among many others. In addition, the orientation of the noise can be varied substantially without significantly changing the noise suppression performance. Finally, the distortion of the cleaned speech is very low, ensuring good performance for speech recognition engines and human receivers alike.

The noise removal algorithm of one embodiment has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential if good estimates of H̃₁ and H̃₂ have been obtained. If the user's environment contains echoes, they can be compensated for if they come from a noise source. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

FIG. 7 is a block diagram of a physical configuration for denoising, under the embodiments of FIGS. 2, 3, and 4, that uses a unidirectional microphone M2 for the noise and an omnidirectional microphone M1 for the speech. As described above, the path from the speech to the noise microphone (MIC2) is approximated as zero, and that approximation is realized through careful placement of the omnidirectional and unidirectional microphones. This works quite well (20 to 40 dB of noise suppression) when the noise is oriented opposite the signal location (noise source N1). However, when the noise source is oriented on the same side as the speaker (noise source N2), the performance can drop to only 10 to 20 dB of noise suppression. This drop in suppression ability can be attributed to the steps taken to ensure that H₂ is close to zero. These steps include using a unidirectional microphone for the noise microphone (MIC2), so that very little signal is present in the noise data. Because the unidirectional microphone cancels acoustic information coming from a particular direction, it also cancels noise coming from the same direction as the speech. This may limit the ability of the adaptive algorithm to characterize and then remove noise at a location such as N2. The same effect is seen when a unidirectional microphone is used for the speech microphone M1.

However, if the unidirectional microphone M2 is replaced with an omnidirectional microphone, a significant amount of signal is captured by M2. This violates the earlier assumption that H₂ is zero, and as a result a significant amount of signal is removed during voicing, resulting in both denoising and "de-signaling." This is not acceptable if signal distortion is to be kept to a minimum. To reduce the distortion, the value of H₂ is therefore calculated. However, H₂ cannot be calculated in the presence of noise, or the noise will be mislabeled as speech and not removed.

Experience with acoustic-only microphone arrays suggests that a small two-microphone array may solve this problem. FIG. 8 shows a noise-cancelling microphone configuration, under an embodiment, that includes two omnidirectional microphones. The same effect can be obtained with two unidirectional microphones oriented in the same direction (toward the signal source). Yet another embodiment uses one unidirectional microphone and one omnidirectional microphone. The idea is to capture similar information from acoustic sources lying in the direction of the signal source. The relative positions of the signal source and the two microphones are fixed and known. By spacing the microphones a distance d apart, where d corresponds to n discrete time samples, and placing the speaker on the axis of the array, H₂ can be fixed to the form Cz⁻ⁿ, where C is the difference in amplitude between the signal data at M₁ and M₂. The discussion that follows assumes n = 1, although any nonzero integer can be used; positive integers are recommended for causality. Because the amplitude of a spherical pressure source varies as 1/r, this form specifies not only the direction of the source but also its distance. The required C can therefore be estimated from this 1/r dependence.

FIG. 9 is a plot of the required C versus distance, under the embodiment of FIG. 8. The asymptote is at C = 1.0; C reaches 0.9 at about 38 cm (slightly more than one foot) and 0.94 at about 60 cm. At the distances typical of handsets and earpieces (4 to 12 cm), C lies between about 0.5 and 0.75. This differs by about 19 to 44% from the value for a noise source at about 60 cm, and most noise sources are located farther away than that. A system using this configuration can therefore discriminate very effectively between noise and signal, even when they arrive from similar directions.
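The 1/r estimate of C can be sketched as follows. This is an illustration under stated assumptions, not the patent's own formula: a point source on the array axis, M2 lying a distance d farther from the source than M1, a speed of sound of 343 m/s, and an 8 kHz sampling rate so that n = 1 sample corresponds to d ≈ 4.3 cm. The names `mic_spacing` and `required_c` are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (roughly room temperature)


def mic_spacing(fs_hz, n_samples=1, c=SPEED_OF_SOUND):
    """Microphone spacing d (m) that corresponds to n discrete time samples."""
    return n_samples * c / fs_hz


def required_c(r1, d):
    """Amplitude ratio (M2 / M1) for an on-axis point source r1 metres from M1.

    With 1/r spherical spreading and M2 a further d metres away,
    the ratio is (1 / (r1 + d)) / (1 / r1) = r1 / (r1 + d).
    """
    return r1 / (r1 + d)


d = mic_spacing(8000.0)  # assumed 8 kHz sampling, n = 1
for r in (0.04, 0.12, 0.38, 0.60, 1.0):
    print(f"r = {r:.2f} m -> C = {required_c(r, d):.2f}")
```

Under these assumptions the sketch gives C ≈ 0.48 at 4 cm, 0.74 at 12 cm, 0.90 at 38 cm, 0.93 at 60 cm, and 0.96 at 1 m, close to the values quoted for FIG. 9; the assumed spacing and sampling rate are what make the numbers line up.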

To determine the effect of a poor estimate of C on noise cancellation, assume C = nC₀, where C is the estimate and C₀ is the actual value of C. Using the signal definition from above,

$$S(z) = \frac{M_1(z) - H_1(z)M_2(z)}{1 - H_1(z)H_2(z)}$$

Since H₂(z) was assumed to be very small, the signal can be approximated as

$$S(z) \approx M_1(z) - H_1(z)M_2(z)$$

This holds when there is no speech, because H₂ = 0 by definition. When speech is present, however, H₂ is nonzero, and setting it to Cz⁻¹ gives

$$S(z) = \frac{M_1(z) - H_1(z)M_2(z)}{1 - Cz^{-1}H_1(z)}$$

which can be rewritten as

$$S(z) = \frac{M_1(z) - H_1(z)M_2(z)}{1 - C_0 z^{-1}H_1(z) + (1-n)\,C_0 z^{-1}H_1(z)}$$

The last term in the denominator determines the error caused by the poor estimate of C. Denoting this term by E,

$$E = (1-n)\,C_0\,z^{-1}H_1(z)$$

Because z⁻¹H₁(z) is a filter, its magnitude is always positive. The change in the computed signal magnitude due to E therefore depends entirely on (1 − n).

There are two possible errors: underestimating C (n < 1) and overestimating C (n > 1). In the first case, C is estimated to be smaller than it actually is; because C increases with distance, the signal source is actually farther away than estimated. In this case (1 − n), and hence E, is positive, so the denominator is too large and the amplitude of the denoised signal is too small. This indicates de-signaling. In the second case, the signal source is closer than estimated, E is negative, and S is larger than it should be, so noise cancellation is insufficient. Because very little signal distortion is desired, the estimate should err toward overestimating C.
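A quick numerical check of the sign argument, as a sketch only. It assumes the true H₂ is C₀z⁻¹, that the algorithm divides by 1 − nC₀z⁻¹H₁ instead, and that H₁ at the frequency of interest is a single complex number `h1`; the values C₀ = 0.5 and H₁ = 0.8 at DC are hypothetical. At DC with a real, positive H₁ the reasoning above holds exactly; at other frequencies z⁻¹H₁ is complex and the sign argument is only a heuristic.

```python
import cmath


def output_gain(n, c0, h1, w=0.0):
    """|computed S / true S| at angular frequency w (rad/sample).

    Assumes the true H2 is c0 * z^-1, the algorithm uses C = n * c0,
    and H1 evaluated at this frequency equals the complex number h1.
    """
    z_inv = cmath.exp(-1j * w)
    true_den = 1 - c0 * z_inv * h1      # factor actually present in M1 - H1*M2
    used_den = 1 - n * c0 * z_inv * h1  # factor the algorithm divides by
    return abs(true_den / used_den)


for n in (0.8, 1.0, 1.2):
    print(n, round(output_gain(n, c0=0.5, h1=0.8), 3))
```

This prints gains of 0.882, 1.0, and 1.154: underestimating C (n = 0.8) attenuates the signal (de-signaling), while overestimating it (n = 1.2) leaves the output too large.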

This result also shows that noise located in the same solid angle (direction from M1) as the signal can still be largely removed, to a degree that depends on the change in C between the signal location and the noise location. For example, with a handset in which M₁ is about 4 cm from the mouth, the required C is about 0.5, while for noise at about 1 m, C is about 0.96. For the noise, therefore, the estimate C = 0.5 is an underestimate, and the noise is removed. The amount removed depends directly on (1 − n). The algorithm thus uses both the direction of and the distance to the signal to separate the signal from the noise.

One problem with this technique is its stability: deconvolving by (1 − H₁H₂) raises stability concerns, because the inverse of 1 − H₁H₂ must be calculated at the start of each voiced segment. Because H₂ is assumed constant, the inverse need only be calculated for the first voiced window rather than for every voiced window, which reduces the computation time (instructions per cycle) needed to implement the algorithm. Under this approximation, however, false positives become computationally more expensive, because the inverse of 1 − H₁H₂ must be recalculated each time one occurs.

Fortunately, the choice of H₂ eliminates the need for deconvolution. From the discussion above, the signal can be written as

$$S(z) = \frac{M_1(z) - H_1(z)M_2(z)}{1 - H_1(z)H_2(z)}$$

which can be rewritten as

$$S(z)\left[1 - H_1(z)H_2(z)\right] = M_1(z) - H_1(z)M_2(z)$$

or

$$S(z) = M_1(z) - H_1(z)\left[M_2(z) - H_2(z)S(z)\right]$$

Since H₂(z) has the form Cz⁻¹, the sequence in the time domain is

$$s(n) = m_1(n) - h_1(n) * \left[m_2(n) - C\,s(n-1)\right]$$

where * denotes convolution.

This means that the current signal sample requires only the current MIC1 data, the current MIC2 data, and previously computed signal samples. No deconvolution is required; as before, only a simple subtraction followed by a convolution is needed. The increase in computation is minimal, so this improvement is easy to implement.
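The time-domain recursion can be sketched in Python with NumPy as follows. This assumes H₁ is a known FIR filter with taps `h1` and that H₂(z) = Cz⁻¹ holds exactly; `denoise_recursive` is a hypothetical helper, and in the actual system H₁ would be estimated adaptively during noise-only windows rather than supplied.

```python
import numpy as np


def denoise_recursive(m1, m2, h1, c):
    """Compute s(n) = m1(n) - (h1 * [m2 - c*s(n-1)])(n) sample by sample.

    Only past outputs are fed back, so no deconvolution is needed.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    s = np.zeros_like(m1)
    u = np.zeros_like(m1)  # u(n) = m2(n) - c * s(n-1)
    for k in range(len(m1)):
        prev = s[k - 1] if k > 0 else 0.0
        u[k] = m2[k] - c * prev
        # Causal FIR convolution (h1 * u)(k) over u(k), u(k-1), ..., u(0)
        taps = min(len(h1), k + 1)
        conv = np.dot(h1[:taps], u[k::-1][:taps])
        s[k] = m1[k] - conv
    return s
```

For example, with synthetic data built as m₁ = s + h₁ * noise and m₂ = noise + C·s(n−1), the recursion recovers s exactly (up to rounding). The explicit loop mirrors the sample-by-sample description; a production version would likely vectorize or use a DSP library.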

The effect of differing microphone responses on this embodiment can be shown by examining the configuration described with reference to FIGS. 2, 3, and 4, now including the transfer functions A(z) and B(z), which represent the frequency responses of MIC1 and MIC2 together with their filtering and amplification responses. FIG. 10 is a block diagram of the front end of the noise removal algorithm, under an embodiment, in which the two microphones MIC1 and MIC2 have different response characteristics.

FIG. 10 includes a graphic description of the process of an embodiment, with a single signal source 1000 and a single noise source 1001. The algorithm uses two microphones, a "signal" microphone 1 ("MIC1") and a "noise" microphone 2 ("MIC2"), but is not so limited. MIC1 is assumed to capture mostly signal with some noise, while MIC2 captures mostly noise with some signal. The data from the signal source 1000 to MIC1 is denoted by s(n), where s(n) is a discrete sample of the analog signal from the signal source 1000. The data from the signal source 1000 to MIC2 is denoted by s₂(n). The data from the noise source 1001 to MIC2 is denoted by n(n), and the data from the noise source 1001 to MIC1 is denoted by n₂(n).

The transfer function A(z) represents the frequency response of MIC1 together with its filtering and amplification responses, and the transfer function B(z) does the same for MIC2. The output of A(z) is denoted by m₁(n), and the output of B(z) by m₂(n). The signals m₁(n) and m₂(n) are received by the noise removal element 1005, which processes them and outputs "denoised speech."

Hereafter, the term "frequency response of MICX" includes the combined effects of the microphone and of any amplification or filtering that occurs during data recording for that microphone. Solving for the signal and noise (omitting "z" for clarity) gives

$$S = \frac{M_1}{A} - H_1 N, \qquad N = \frac{M_2}{B} - H_2 S$$

Substituting the latter into the former gives

$$S = \frac{M_1/A - H_1 M_2/B}{1 - H_1 H_2}$$

This seems to indicate that the difference in frequency response between MIC1 and MIC2 has an effect. However, it is important to consider what is actually being measured. Previously (before the microphone frequency responses were taken into account), H₁ was measured as

$$H_1 = \left.\frac{M_1}{M_2}\right|_n$$

where the subscript n indicates that the calculation is performed only during windows containing only noise. Examining the equations above, however, shows that in the absence of signal the microphones measure

$$M_1 = A H_1 N, \qquad M_2 = B N$$

so H₁ would have to be calculated as

$$H_1 = \frac{B}{A}\left.\frac{M_1}{M_2}\right|_n$$

However, B(z) and A(z) are not taken into account when H₁(z) is calculated, so what is actually measured is simply the ratio of the signals at the two microphones:

$$\tilde{H}_1 = \left.\frac{M_1}{M_2}\right|_n = \frac{A}{B}\,H_1$$

Here H̃₁ denotes the measured response and H₁ the actual response. H₂ is calculated in the same way, from windows containing speech (subscript s), giving

$$\tilde{H}_2 = \left.\frac{M_2}{M_1}\right|_s = \frac{B}{A}\,H_2$$

Substituting H̃₁ and H̃₂ back into the expression for S above gives

$$S = \frac{M_1/A - \tilde{H}_1 M_2/A}{1 - \tilde{H}_1 \tilde{H}_2}$$

or

$$S A = \frac{M_1 - \tilde{H}_1 M_2}{1 - \tilde{H}_1 \tilde{H}_2}$$

This is identical to the earlier result without microphone frequency responses, except that S(z)A(z) replaces S(z) and the measured responses H̃₁(z) and H̃₂(z) replace the actual H₁(z) and H₂(z). In theory, therefore, the algorithm is independent of the microphones and their associated filter and amplifier responses.
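The cancellation of A and B can be checked numerically, one frequency bin at a time. This sketch treats every transfer function as a random complex number per bin (hypothetical values chosen only for illustration) and assumes clean noise-only and speech-only windows for measuring H̃₁ and H̃₂; `solve_scaled_signal` is a hypothetical helper.

```python
import numpy as np


def solve_scaled_signal(m1, m2, h1_meas, h2_meas):
    """(M1 - H1~ M2) / (1 - H1~ H2~), evaluated per frequency bin."""
    return (m1 - h1_meas * m2) / (1 - h1_meas * h2_meas)


rng = np.random.default_rng(1)
bins = 8


def crand(scale=1.0):
    return scale * (rng.standard_normal(bins) + 1j * rng.standard_normal(bins))


A, B = 1 + crand(0.3), 1 + crand(0.3)  # mismatched microphone responses
H1, H2 = crand(0.3), crand(0.3)        # true acoustic transfer functions
S, N = crand(), crand()                # speech and noise spectra

# Measured ratios: noise-only window for H1~, speech-only window for H2~.
H1_meas = (A * H1 * N) / (B * N)
H2_meas = (B * H2 * S) / (A * S)

# A window containing both speech and noise.
M1 = A * (S + H1 * N)
M2 = B * (N + H2 * S)
est = solve_scaled_signal(M1, M2, H1_meas, H2_meas)
print(np.allclose(est, A * S))  # True: the output is S*A, with B cancelled
```

The check confirms the algebra above: B drops out entirely, and A survives only as a fixed coloration of the recovered speech.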

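In practice, however, H₂ is assumed to be Cz⁻¹ with C a constant, whereas what is actually measured is

$$\tilde{H}_2 = \frac{B}{A}\,H_2 = \frac{B}{A}\,C z^{-1}$$

so the following result is obtained:

$$\frac{M_1 - \tilde{H}_1 M_2}{1 - \tilde{H}_1\,C z^{-1}} = S A\,\frac{1 - H_1 C z^{-1}}{1 - \frac{A}{B}\,H_1 C z^{-1}}$$
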
This result depends on the unknown B(z) and A(z), which can cause problems when the microphones' frequency responses differ substantially, as is common with the inexpensive microphones that are frequently used. The data from MIC2 must therefore be compensated so that it has the proper relationship to the data from MIC1. This can be done by recording, at both MIC1 and MIC2, a broadband signal from a source located at the distance and orientation expected for the actual signal (the actual signal source may also be used). A discrete Fourier transform (DFT) is then computed for each microphone signal, and the magnitude of the transform in each frequency bin is calculated. The magnitude of the MIC2 DFT in each frequency bin is then set equal to C times the magnitude of the MIC1 DFT. If M₁[n] is the magnitude of the MIC1 DFT in the nth frequency bin, the factor by which M₂[n] is multiplied is

$$F[n] = C\,\frac{M_1[n]}{M_2[n]}$$

The inverse transform is then applied to the new MIC2 DFT magnitudes, using the original MIC2 DFT phases. In this way MIC2 is resynchronized so that the relationship

$$M_2(z) = C z^{-1} M_1(z)$$

holds when only speech is present. This transformation can also be performed in the time domain, using a filter that emulates the characteristics of F as closely as possible (for example, the Matlab function FFT2.M can be used with the calculated values of F[n] to construct a suitable FIR filter).

FIG. 11A is a plot of the difference (in percent) between the frequency responses of two microphones spaced 4 cm apart, before compensation. FIG. 11B is the corresponding plot after DFT compensation, and FIG. 11C is the corresponding plot after time-domain filter compensation. These plots demonstrate the effectiveness of the compensation methods described above: even with two very inexpensive omnidirectional or unidirectional microphones, both methods restore the correct relationship between the microphones.

The transformation should remain relatively constant as long as the relative amplification and filtering processes do not change, so the compensation may need to be performed only once, at the manufacturing stage. If necessary, however, the algorithm can be set to operate under the assumption H₂ = 0 until the system is used in a location with little noise and a strong signal. The compensation factors F[n] can then be calculated and used from that point on. Because noise cancellation is not needed when there is little noise, this calculation does not add undue distortion to the noise cancellation algorithm. The compensation factors can also be updated whenever the noise environment is favorable enough to give maximum accuracy.
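The runtime alternative, running with H₂ = 0 until a quiet, strong-signal window arrives and then computing F[n], could be organized along these lines. This is a hypothetical sketch: the class name, the per-window SNR estimate passed in by the caller, and the 20 dB gate are all assumptions, not values from the source.

```python
import numpy as np


class Mic2Compensator:
    """Hypothetical runtime calibration of MIC2.

    Until F has been computed, the caller should run with H2 = 0.
    Whenever a window's SNR estimate passes the gate, F is (re)computed.
    """

    def __init__(self, c, frame_len, snr_gate_db=20.0):
        self.c = c
        self.frame_len = frame_len
        self.snr_gate_db = snr_gate_db  # assumed threshold, not from the source
        self.f = None                   # compensation factors F[k], unknown at start

    @property
    def h2_is_zero(self):
        # Operate under the H2 = 0 assumption until F is available.
        return self.f is None

    def maybe_calibrate(self, frame1, frame2, snr_db):
        """Recompute F from one window if it is quiet enough; report readiness."""
        if snr_db >= self.snr_gate_db:
            x1 = np.abs(np.fft.rfft(frame1, self.frame_len))
            x2 = np.abs(np.fft.rfft(frame2, self.frame_len))
            self.f = self.c * x1 / np.maximum(x2, 1e-12)
        return self.f is not None
```

Recomputing F on every window that passes the gate corresponds to updating the factors whenever the noise environment is favorable; computing it once and freezing it corresponds to the factory-calibration case.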

Each of the blocks and steps shown in the figures presented herein may comprise a sequence of operations that need not be described here. Those skilled in the art can create routines, algorithms, source code, microcode, or program logic arrays, or otherwise implement the invention, based on the figures and detailed description provided herein. The routines described herein can include any of the following, or one or more combinations of the following: a routine stored in non-volatile memory (not shown) that forms part of an associated processor or processors; a routine implemented using conventional program logic arrays or circuit elements; a routine stored in removable media such as disks; a routine downloaded from a server or stored locally at a client; and a routine hardwired or preprogrammed in chips such as electrically erasable programmable read-only memory ("EEPROM") semiconductor chips, application-specific integrated circuits (ASICs), or digital signal processing (DSP) integrated circuits.

Unless the context clearly requires otherwise, throughout the description and the claims the words "comprise," "comprising," and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "herein," "hereunder," and words of similar import, when used in this application, refer to this application as a whole and not to any particular portion of it.

The above description of illustrated embodiments of the invention is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, those skilled in the art will recognize that various equivalent modifications are possible within the scope of the invention. The teachings of the invention provided herein can be applied not only to the data collection symbology reader described above but also to other machine vision systems. Further, the elements and acts of the various embodiments described above can be combined to provide further embodiments.

All references and U.S. patent applications cited herein are incorporated herein by reference. Aspects of the invention can be modified, if necessary, to employ the systems, functions, and concepts of these various references to provide yet further embodiments of the invention.

FIG. 1 is a block diagram of a noise removal system, under an embodiment.
FIG. 2 is a block diagram of a noise removal algorithm, under an embodiment, assuming a single noise source and a direct path to the microphones.
FIG. 3 is a block diagram of the front end of a noise removal algorithm, under an embodiment, generalized to n distinct noise sources (these noise sources may be reflections or echoes of one another).
FIG. 4 is a block diagram of the front end of a noise removal algorithm, under an embodiment, in the general case where there are n distinct noise sources and signal reflections.
FIG. 5 is a flow diagram of a noise removal method, under an embodiment.
FIG. 6 shows results of the noise suppression algorithm of an embodiment for an American English-speaking female speaker in airport terminal noise that includes many other speakers and public announcements.
FIG. 7 is a block diagram of a physical configuration for noise removal using unidirectional and omnidirectional microphones, under the embodiments of FIGS. 2, 3, and 4.
FIG. 8 shows a noise-cancelling microphone configuration that includes two omnidirectional microphones, under an embodiment.
FIG. 9 is a plot of the required C versus distance, under the embodiment of FIG. 8.
FIG. 10 is a block diagram of the front end of a noise removal algorithm, under an embodiment, in which the two microphones have different response characteristics.
FIG. 11A is a plot of the difference (in percent) between the frequency responses of two microphones spaced 4 cm apart, before compensation.
FIG. 11B is a plot of the difference (in percent) between the frequency responses of two microphones spaced 4 cm apart, after DFT compensation, under an embodiment.
FIG. 11C is a plot of the difference (in percent) between the frequency responses of two microphones spaced 4 cm apart, after time-domain filter compensation, under an alternative embodiment.

Claims

A method for removing noise from an electronic signal, comprising:
receiving a plurality of acoustic signals at a first receiving device;
receiving a plurality of acoustic signals at a second receiving device, the plurality of acoustic signals including at least one noise signal generated by at least one noise source and at least one voiced signal generated by at least one signal source, wherein the at least one signal source comprises a speaker, and the relative positions of the signal source, the first receiving device, and the second receiving device are constant and known;
receiving physiological information associated with human voicing activity of the speaker, including the presence or absence of voicing activity;
generating at least one first transfer function representing the plurality of acoustic noise signals upon determining that voicing activity is absent from the plurality of acoustic signals for at least a specified period of time;
generating at least one second transfer function representing the plurality of acoustic signals upon determining that voicing information is present in the plurality of acoustic signals for at least the specified period of time; and
removing noise from the plurality of acoustic signals using at least one combination of the at least one first transfer function and the at least one second transfer function to produce at least one denoised data stream.

The method of claim 1, wherein the first receiving device and the second receiving device each comprise a microphone selected from the group consisting of unidirectional microphones and omnidirectional microphones.

The method of claim 1, wherein the plurality of acoustic signals are received as discrete time samples, the first receiving device and the second receiving device are located a distance d apart, and d corresponds to n discrete time samples.

The method of claim 1, wherein the at least one second transfer function is fixed as a function of the difference between the amplitude of signal data at the first receiving device and the amplitude of signal data at the second receiving device.

The method of claim 1, wherein removing noise from the plurality of acoustic signals comprises using a direction and distance from the at least one first receiving device to the at least one signal source.

The method of claim 1, wherein the at least one first receiving device and the at least one second receiving device have different frequency responses, further comprising compensating signal data from the at least one second receiving device so that it has the proper relationship to signal data from the at least one first receiving device.

The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises recording, at the at least one first receiving device and the at least one second receiving device, a broadband signal from a source located at the distance and orientation expected for the signal from the at least one signal source.

The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises frequency-domain compensation.

The method of claim 8, wherein the frequency-domain compensation comprises:
calculating a frequency transform of signal data from each of the at least one first receiving device and the at least one second receiving device;
calculating the magnitude of the frequency transform in each frequency bin; and
setting, for each frequency bin, the magnitude of the frequency transform of signal data from the at least one second receiving device to a value related to the magnitude of the frequency transform of signal data from the at least one first receiving device.

The method of claim 6, wherein compensating the signal data from the at least one second receiving device comprises time-domain compensation.

The method of claim 6, further comprising:
initially setting the at least one second transfer function to zero; and
calculating a compensation factor when the at least one noise signal is relatively small with respect to the at least one voiced signal.

The method of claim 1, wherein the plurality of acoustic signals includes at least one reflection of the at least one noise signal and at least one reflection of the at least one voiced signal.

The method of claim 1, wherein receiving physiological information comprises receiving physiological data associated with human voicing using at least one detector selected from the group consisting of an acoustic microphone, a radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and an airflow detector.

The method of claim 1, wherein generating the at least one first transfer function and the at least one second transfer function comprises using at least one technique selected from the group consisting of adaptive techniques and recursive techniques.

A system for removing noise from acoustic signals, comprising:
at least one receiver comprising:
at least one signal receiver configured to receive at least one acoustic signal from a signal source; and
at least one noise receiver configured to receive at least one noise signal from a noise source, wherein the relative positions of the signal source, the at least one signal receiver, and the at least one noise receiver are constant and known;
at least one sensor that receives physiological information associated with human voicing activity; and
at least one processor coupled between the at least one receiver and the at least one sensor, wherein the at least one processor generates a plurality of transfer functions, generates at least one first transfer function representing the plurality of acoustic noise signals in response to determining that voicing information is absent from the plurality of acoustic signals for at least a specified period of time, generates at least one second transfer function representing the plurality of acoustic signals in response to determining that voicing information is present in the plurality of acoustic signals for at least the specified period of time, and removes noise from the plurality of acoustic signals using at least one combination of the at least one first transfer function and the at least one second transfer function.

The system of claim 15, wherein the at least one sensor includes at least one radio frequency ("RF") interferometer that detects tissue motion associated with human speech.

The system of claim 15, wherein the at least one sensor includes at least one sensor selected from the group consisting of an acoustic microphone, a radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and an airflow detector.

The system of claim 15, wherein the at least one processor is configured to:
divide acoustic data of the at least one acoustic signal into a plurality of subbands;
remove noise from each of the plurality of subbands using at least one combination of the at least one first transfer function and the at least one second transfer function to generate a plurality of denoised acoustic data streams; and
combine the plurality of denoised acoustic data streams to generate the at least one denoised acoustic data stream.

The system of claim 15, wherein the at least one signal receiver and the at least one noise receiver are each a microphone selected from the group consisting of unidirectional microphones and omnidirectional microphones.

A signal processing system coupled between at least one user and at least one electronic device,
At least one first receiving device configured to receive at least one acoustic signal from a signal source;
At least one second receiving device configured to receive at least one noise signal from a noise source, the signal source, the at least one first receiving device, and the at least one second receiving device. A second receiving device, the relative position of which is constant and known;
At least one noise cancellation subsystem for removing noise from the acoustic signal, comprising:
At least one processor coupled between the at least one first receiver and the at least one second receiver;
At least one sensor coupled to the at least one processor, wherein the at least one sensor is configured to receive psychological information associated with human vocal activity, the at least one processor Generates a plurality of transfer functions and generates at least one first transfer function representing the plurality of acoustic noise signals in response to determining that there is no utterance information in the plurality of acoustic signals for at least a specified period of time. And at least one second transfer function representing the plurality of acoustic signals is generated in response to determining that utterance information is present in the plurality of acoustic signals in at least the specified one period, The plurality of acoustic signals using at least one combination of a first transfer function and the at least one second transfer function Removing Luo noise, and noise cancellation subsystem,
A signal processing system.
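The claim leaves open how the two transfer functions are combined. The following is a minimal per-frequency-bin sketch, assuming a two-microphone mixing model that the claim does not state: M1 = S + H1·N at the signal microphone and M2 = N + H2·S at the noise microphone. Under that assumption H1 can be estimated by least squares from frames the sensor marks as unvoiced, H2 from voiced frames in which noise is negligible, and solving the two equations for S gives S = (M1 - H1·M2) / (1 - H1·H2). All function names are hypothetical.

```python
def ls_ratio(num, den):
    """Least-squares H minimizing sum |num - H*den|^2 over frames (one frequency bin)."""
    cross = sum(a * b.conjugate() for a, b in zip(num, den))
    power = sum(abs(b) ** 2 for b in den)
    return cross / power if power > 0 else 0j


def estimate_transfer_functions(m1, m2, vad):
    """Estimate H1 from unvoiced frames and H2 from voiced frames.

    m1, m2: complex values of one frequency bin, one per frame (assumed model:
            M1 = S + H1*N, M2 = N + H2*S).
    vad:    booleans, True where the sensor reports vocal activity.
    """
    noise = [(a, b) for a, b, v in zip(m1, m2, vad) if not v]
    voiced = [(a, b) for a, b, v in zip(m1, m2, vad) if v]
    # H1: noise path, M1 ~= H1 * M2 when no speech is present.
    h1 = ls_ratio([a for a, _ in noise], [b for _, b in noise])
    # H2: speech leakage, M2 ~= H2 * M1 when speech dominates (noise assumed small).
    h2 = ls_ratio([b for _, b in voiced], [a for a, _ in voiced])
    return h1, h2


def remove_noise(m1, m2, h1, h2):
    """Combine H1 and H2 per frame: S = (M1 - H1*M2) / (1 - H1*H2)."""
    return [(a - h1 * b) / (1 - h1 * h2) for a, b in zip(m1, m2)]
```

Estimating H2 from voiced frames is only accurate when noise is small during those frames, and the division becomes ill-conditioned where H1·H2 approaches 1; a practical system would need to guard against both.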

21. The signal processing system of claim 20, wherein the first receiving device and the second receiving device are each a microphone selected from the group consisting of a unidirectional microphone and an omnidirectional microphone.

21. The signal processing system of claim 20, wherein the at least one acoustic signal is received in the form of discrete time samples, the first receiving device and the second receiving device are located a distance "d" apart, and d corresponds to n discrete time samples.
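As an illustration of how d relates to n, assuming sound arrives along the line joining the two devices and travels at about 343 m/s (neither value is given in the claim), the travel time across the spacing is d/c seconds, which is n = d·fs/c samples at sampling rate fs. The helper names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C


def spacing_in_samples(d_m, fs_hz, c=SPEED_OF_SOUND_M_S):
    """Travel time across spacing d (metres), expressed in samples: n = d * fs / c."""
    return d_m * fs_hz / c


def spacing_for_samples(n, fs_hz, c=SPEED_OF_SOUND_M_S):
    """Inverse relation: the spacing d (metres) that corresponds to n samples."""
    return n * c / fs_hz
```

For example, at fs = 8 kHz one sample of delay corresponds to a spacing of 343/8000 m, about 4.29 cm.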

21. The signal processing system of claim 20, wherein the at least one second transfer function is fixed as a function of the difference between the amplitude of the signal data at the first receiving device and the amplitude of the signal data at the second receiving device.

The signal processing system of claim 20, wherein removing noise from the at least one acoustic signal comprises using the direction and distance from the at least one first receiving device to the at least one signal source.
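One way direction and distance could enter the calculation is sketched below under assumptions the claim does not specify: the signal source is treated as a point source at distance r1 and angle θ from the first device (θ measured from the axis pointing toward the second device), sound spreads spherically so amplitude falls as 1/r, and the second device sits a distance d from the first. The law of cosines gives the source-to-second-device distance r2, and a fixed H2 at frequency f follows as a gain r1/r2 combined with a delay (r2 - r1)/c. The function name is hypothetical.

```python
import cmath
import math


def fixed_h2(r1_m, theta_rad, d_m, freq_hz, c=343.0):
    """Fixed H2 = M2/M1 for a point source, assuming 1/r spreading and speed of sound c.

    r1_m:      source distance from the first device (metres)
    theta_rad: source angle from the axis running from the first to the second device
    d_m:       spacing between the two devices (metres)
    """
    # Law of cosines: distance from the source to the second device.
    r2 = math.sqrt(r1_m ** 2 + d_m ** 2 - 2 * r1_m * d_m * math.cos(theta_rad))
    gain = r1_m / r2                  # amplitude ratio under 1/r spreading
    delay_s = (r2 - r1_m) / c         # extra travel time to the second device
    return gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
```

For instance, with r1 = 5 cm, θ = π (source on the far side from the second device) and d = 2 cm, r2 = 7 cm and |H2| = 5/7 at every frequency.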

21. The signal processing system according to claim 20, wherein the at least one first receiving device and the at least one second receiving device have different frequency responses, and the signal data from the at least one second receiving device is compensated so as to have the proper relationship to the signal data from the at least one first receiving device.

26. The signal processing system of claim 25, wherein compensating the signal data from the at least one second receiving device comprises recording, with the at least one first receiving device and the at least one second receiving device, a broadband signal from a sound source located at the distance and orientation expected for the at least one signal source.

26. The signal processing system of claim 25, wherein compensating signal data from the at least one second receiving device includes frequency domain compensation.

28. The signal processing system of claim 27, wherein the frequency compensation comprises:
Calculating a frequency transform of the signal data from each of the at least one first receiving device and the at least one second receiving device;
Calculating the magnitude of each frequency transform in each frequency bin; and
For each frequency bin, setting the magnitude of the frequency transform of the signal data from the at least one second receiving device to a value related to the magnitude of the frequency transform of the signal data from the at least one first receiving device.
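The frequency-domain compensation steps can be sketched as follows, assuming both devices record the same broadband calibration sound. A plain O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT. Only the magnitude in each bin is adjusted, so the phase of the second device's data is left unchanged. `dft`, `compensation_gains`, and `compensate` are hypothetical names.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (illustrative; use an FFT in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def compensation_gains(cal1, cal2, eps=1e-12):
    """Per-bin gain mapping |X2| onto |X1| for the same calibration recording."""
    x1, x2 = dft(cal1), dft(cal2)
    # Bins where the second device recorded essentially nothing are left at unit gain.
    return [abs(a) / abs(b) if abs(b) > eps else 1.0 for a, b in zip(x1, x2)]


def compensate(spectrum2, gains):
    """Scale each bin of the second device's spectrum; the phase is unchanged."""
    return [g * x for g, x in zip(gains, spectrum2)]
```

The gains are computed once from the calibration recording and then applied to every subsequent frame from the second device.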

26. The signal processing system of claim 25, wherein compensating signal data from the at least one second receiving device includes time domain compensation.
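Time-domain compensation is not detailed in the claim. A minimal sketch, assuming the mismatch between the devices can be approximated by a single integer delay and a broadband gain, estimates the delay from the peak of the cross-correlation and the gain from the ratio of RMS levels. All names here are hypothetical.

```python
import math


def best_lag(x, y, max_lag):
    """Integer lag L in [-max_lag, max_lag] maximizing sum x[t] * y[t + L]."""
    def corr(lag):
        return sum(x[t] * y[t + lag] for t in range(len(x)) if 0 <= t + lag < len(y))
    return max(range(-max_lag, max_lag + 1), key=corr)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def time_domain_compensate(ref, other, max_lag=4):
    """Shift and scale `other` so it lines up with `ref` (delay-plus-gain assumption)."""
    lag = best_lag(ref, other, max_lag)
    # Advance `other` by the estimated lag, zero-padding past its ends.
    shifted = [other[t + lag] if 0 <= t + lag < len(other) else 0.0
               for t in range(len(ref))]
    level = rms(shifted)
    gain = rms(ref) / level if level > 0 else 1.0
    return lag, gain, [gain * v for v in shifted]
```

A frequency-dependent mismatch would need a filter rather than a single gain; this sketch only removes a pure delay and level difference.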

26. The signal processing system of claim 25, wherein the compensating further comprises:
Initially setting the at least one second transfer function to 0; and
Calculating a compensation factor when the at least one noise signal is relatively small with respect to the at least one acoustic signal.
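A sketch of the compensation-factor step for one frequency bin, assuming a per-frame signal-to-noise estimate is available and that a hypothetical 20 dB threshold marks frames where noise is "relatively small". H2 starts at 0 and stays 0 until at least one frame qualifies; qualifying frames contribute to a least-squares estimate of M2 ≈ H2·M1. The function name and threshold are assumptions, not taken from the claim.

```python
def update_h2(frames, snr_db, threshold_db=20.0):
    """Compensation factor H2 for one frequency bin.

    frames: (m1, m2) complex values per frame from the first and second devices.
    snr_db: assumed per-frame SNR estimates in dB.
    Returns 0j (the initial value) if no frame has noise small enough to use.
    """
    cross, power = 0j, 0.0
    for (m1, m2), snr in zip(frames, snr_db):
        if snr >= threshold_db:  # noise relatively small: M2 ~= H2 * M1
            cross += m2 * m1.conjugate()
            power += abs(m1) ** 2
    return cross / power if power > 0 else 0j
```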

21. The signal processing system of claim 20, wherein the at least one acoustic signal includes at least one reflection of the at least one noise signal and at least one reflection of the at least one acoustic signal.

21. The signal processing system of claim 20, wherein receiving the physiological information comprises receiving physiological data associated with human speech using at least one detector selected from the group consisting of an acoustic microphone, a radio frequency device, an electroglottograph, an ultrasound device, an acoustic throat microphone, and an airflow detector.

21. The signal processing system of claim 20, wherein generating the at least one first transfer function and the at least one second transfer function uses at least one technique selected from the group consisting of adaptive techniques and recursive techniques.
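As an example of an adaptive technique, the following normalized LMS (NLMS) filter, a standard adaptive algorithm substituted here for illustration rather than taken from the claim, learns an FIR approximation of the first transfer function (the noise path from the second device to the first) and adapts only in samples the sensor marks as unvoiced. Its output is M1 minus the filtered M2, i.e. only the noise-subtraction part; it does not use H2. The tap count and step size are arbitrary assumptions, and the function name is hypothetical.

```python
def nlms_noise_canceller(m1, m2, vad, taps=4, mu=0.5, eps=1e-8):
    """Adaptive FIR estimate of H1 (path M2 -> M1), frozen while VAD reports speech.

    Returns (weights, output), where output[t] = m1[t] - (w * m2)[t].
    """
    w = [0.0] * taps
    out = []
    for t in range(len(m1)):
        # Most recent `taps` samples of the noise reference, zero before t = 0.
        x = [m2[t - k] if t - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = m1[t] - y
        out.append(e)
        if not vad[t]:
            # NLMS update, normalized by input power; only during unvoiced samples.
            norm = eps + sum(xk * xk for xk in x)
            w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
    return w, out
```

Freezing adaptation during voiced samples keeps the filter from learning to cancel the speech itself.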