JP2013178570A

JP2013178570A - Method and apparatus for removing noise from electronic signal

Info

Publication number: JP2013178570A
Application number: JP2013107341A
Authority: JP
Inventors: Gregory C Burnett; Eric F Breitfeller
Original assignee: AliphCom LLC
Current assignee: AliphCom LLC
Priority date: 2000-07-19
Filing date: 2013-05-21
Publication date: 2013-09-09
Also published as: WO2002007151A3; JP2004509362A; JP2011203755A; EP1301923A2; US20020039425A1; AU2001276955A1; KR20030076560A; WO2002007151A2; CN1443349A; CA2416926A1

Abstract

PROBLEM TO BE SOLVED: To provide a method and system for removing acoustic noise from human speech, in which the noise is removed and the signal restored regardless of the type, magnitude, or orientation of the noise. SOLUTION: A system includes a microphone and a sensor connected to a processor. The microphone receives acoustic signals. A voice activity detector (VAD) supplies a binary 1 when speech (both voiced and unvoiced) is occurring and a binary 0 in the absence of speech. The processor includes a denoising algorithm that generates transfer functions. One transfer function is generated in response to a determination that voicing information is not present in the acoustic signal received during a specified time period. Another is generated in response to a determination that voicing information is present in the acoustic signal during a specified time period. At least one denoised acoustic data stream is generated using the transfer functions.

Description

The present invention relates to mathematical methods and electronic systems for removing or suppressing undesirable acoustic noise from acoustic transmissions or recordings.

In typical acoustic applications, speech from a human user is recorded or stored and transmitted to recipients at various locations.

The user's environment may contain one or more noise sources that contaminate the signal of interest (the user's speech) with unwanted acoustic noise. This makes it difficult or impossible for the recipient, whether a person or a machine, to understand the user's speech. The problem is especially pressing now because portable communication devices such as cellular telephones and personal digital assistants have become widespread. Existing methods for suppressing these noise additions have at least one drawback: they need too much computation time or bulky hardware, they distort the signal of interest too much, or they lack useful performance. Many of these methods are described in the textbook "Advanced Digital Signal Processing and Noise Reduction" by Vaseghi (ISBN 0-471-62692-9). There is therefore a need for noise removal and reduction methods that address these shortcomings of typical systems and offer new techniques for cleaning the acoustic signal of interest without distortion.

A method and system are provided for removing acoustic noise from human speech. The noise is removed and the signal restored regardless of noise type, magnitude, or orientation. The system includes a microphone and a sensor coupled to a processor. The microphone receives an acoustic signal that contains both noise and speech from a human signal source. The sensor produces a binary voice activity detection (VAD) signal. This signal is binary "1" when speech (both voiced and unvoiced) is occurring and binary "0" when no speech is occurring. The VAD signal can be obtained in a variety of ways, for example from acoustic gain, accelerometers, or radio frequency (RF) sensors.

The processor system and method include a denoising algorithm. It calculates the transfer function between the noise source and the microphone, and the transfer function between the human user and the microphone. These transfer functions are used to remove noise from the received acoustic signal, producing at least one denoised acoustic data stream.

FIG. 1 is a block diagram of a denoising system according to one embodiment.
FIG. 2 is a block diagram of the noise removal algorithm of one embodiment, assuming a single noise source and a direct path to the microphones.
FIG. 3 is a block diagram of the front end of the noise removal algorithm of one embodiment, generalized to n distinct noise sources (which may be reflections or echoes of one another).
FIG. 4 is a block diagram of the front end of the noise removal algorithm of one embodiment in the most general case, where there are n distinct noise sources and signal reflections.
FIG. 5 is a flow diagram of a denoising method according to one embodiment.
FIG. 6 shows the results of the noise suppression algorithm of one embodiment for an American English-speaking female in the presence of airport terminal noise, including many other speakers and public announcements.

FIG. 1 is a block diagram of a denoising system of one embodiment. The system uses knowledge of when speech is occurring, derived from physiological information on voicing activity. It includes microphones 10 and sensors 20 that supply signals to at least one processor 30. The processor includes a denoising subsystem or algorithm.

FIG. 2 is a block diagram of the noise removal system/algorithm of one embodiment, assuming a single noise source and a direct path to the microphones. The diagram gives a graphical description of the process for one embodiment, with a single signal source (100) and a single noise source (101).

The algorithm uses two microphones, although it is not limited to two: a "signal" microphone (MIC1, 102) and a "noise" microphone (MIC2, 103). MIC1 is assumed to capture mostly signal with some noise, and MIC2 mostly noise with some signal. This configuration is common in conventional advanced acoustic systems.

The data are denoted as follows:
- from the signal to MIC1: $s(n)$
- from the signal to MIC2: $s_2(n)$
- from the noise to MIC2: $n(n)$
- from the noise to MIC1: $n_2(n)$

Similarly, the data from MIC1 is denoted $m_1(n)$ and the data from MIC2 $m_2(n)$, where $s(n)$ denotes a discrete sample of the analog signal from the source.

The transfer functions from the signal to MIC1 and from the noise to MIC2 are assumed to be unity. The transfer function from the signal to MIC2 is denoted $H_2(z)$, and the one from the noise to MIC1 is denoted $H_1(z)$. The unity assumption does not limit the generality of the algorithm. The actual relation between the signal, the noise, and the microphones is simply a ratio, and the ratios are redefined this way for simplicity.

In conventional noise removal systems, the information from MIC2 is used in an attempt to remove noise from MIC1. These systems make an unstated assumption: voice activity detection (VAD) is never perfect. The denoising must therefore be done cautiously, so that too much of the signal is not removed along with the noise. Suppose instead that the VAD is perfect, equal to zero when the user produces no speech and equal to one when speech is produced. Then a substantial improvement in noise removal can be made.

Consider the case of a single noise source and direct paths to the microphones (FIG. 2). The acoustic information entering MIC1 is denoted $m_1(n)$, and the information entering MIC2 is denoted $m_2(n)$. In the $z$ (digital frequency) domain these are represented as $M_1(z)$ and $M_2(z)$. Then

$$M_1(z) = S(z) + N_2(z), \qquad M_2(z) = N(z) + S_2(z)$$

where

$$N_2(z) = N(z)\,H_1(z), \qquad S_2(z) = S(z)\,H_2(z)$$

so that

$$M_1(z) = S(z) + N(z)\,H_1(z), \qquad M_2(z) = N(z) + S(z)\,H_2(z) \qquad (1)$$

This is the general case for all two-microphone systems. In a practical system, some noise always leaks into MIC1 and some signal always leaks into MIC2. Equation 1 has four unknowns and only two known relations, so it cannot be solved explicitly.

However, there is another way to solve for some of the unknowns in Equation 1. Start with the case where no signal is being generated, that is, where the VAD signal equals zero and no speech is being produced. In this case $s(n) = S(z) = 0$, and Equation 1 reduces to

$$M_{1n}(z) = N(z)\,H_1(z), \qquad M_{2n}(z) = N(z)$$

where the subscript $n$ on the $M$ variables indicates that only noise is being received. This leads to

$$H_1(z) = \frac{M_{1n}(z)}{M_{2n}(z)}$$
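
The following Python is an illustration, not the embodiment's code. It sketches one way $H_1(z)$ might be estimated from noise-only data, working per frequency bin on frames of complex spectra. It replaces the single ratio $M_{1n}/M_{2n}$ with its least-squares form averaged over frames, $\sum M_{1n}\overline{M_{2n}} / \sum |M_{2n}|^2$. The function name `estimate_ratio`, the frame layout, and the synthetic data are assumptions.

```python
import random


def estimate_ratio(num_frames, den_frames, eps=1e-12):
    """Per-bin least-squares estimate of num/den over several frames.

    num_frames, den_frames: lists of frames; each frame is a list of complex
    spectral values, one per frequency bin. For H1 in the noise-only case,
    num_frames holds M1n and den_frames holds M2n.
    """
    n_bins = len(den_frames[0])
    estimate = []
    for k in range(n_bins):
        # Cross-spectrum between numerator and denominator at bin k
        cross = sum(y[k] * x[k].conjugate() for y, x in zip(num_frames, den_frames))
        # Denominator power at bin k (eps guards against division by zero)
        power = sum(abs(x[k]) ** 2 for x in den_frames)
        estimate.append(cross / (power + eps))
    return estimate


# Synthetic noise-only frames following the VAD = 0 model: M2n = N, M1n = N * H1
random.seed(0)
H1_true = [0.5 + 0.2j, -0.3 + 0.1j, 0.8 - 0.4j, 0.05 + 0.0j]
m1n_frames, m2n_frames = [], []
for _ in range(20):
    noise = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in H1_true]
    m2n_frames.append(noise)
    m1n_frames.append([n * h for n, h in zip(noise, H1_true)])

H1_est = estimate_ratio(m1n_frames, m2n_frames)
```

Averaging across frames is one common way to reduce the variance of a raw bin-by-bin ratio. The text itself leaves the choice of identification algorithm open.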

$H_1(z)$ can be calculated from the microphone outputs using any available system identification algorithm, provided it is certain that the system is receiving only noise. The calculation can be done adaptively, so that the system can react to changes in the noise.

A solution is now available for one of the unknowns in Equation 1. Another unknown, $H_2(z)$, can be determined from the instances where the VAD equals one and speech is being produced. When this occurs, the recent history of the microphones (perhaps less than 1 second) may indicate low levels of noise. In that case it can be assumed that $n(n) = N(z) \approx 0$, and Equation 1 reduces to

$$M_{1s}(z) = S(z), \qquad M_{2s}(z) = S(z)\,H_2(z)$$

which in turn leads to

$$H_2(z) = \frac{M_{2s}(z)}{M_{1s}(z)}$$

This is the inverse of the $H_1(z)$ calculation. Note, however, that different inputs are used: now only the signal is present, whereas before only the noise was. While $H_2(z)$ is being calculated, the values calculated for $H_1(z)$ are held constant, and vice versa. It is therefore assumed that neither $H_1(z)$ nor $H_2(z)$ changes substantially while the other is being calculated.
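
The text does not say how "low recent noise" is judged. The controller below is a hypothetical sketch of one way to do it, with two assumptions:
- Recent noise is the mean MIC2 frame energy over the last 100 noise-only 10 ms frames.
- `noise_threshold` is a free parameter.

The class decides each frame which transfer function may adapt while the other is held fixed.

```python
from collections import deque


class TransferFunctionGate:
    """Hypothetical controller deciding which transfer function may adapt.

    VAD == 0: update H1 (noise only) and hold H2.
    VAD == 1 and recent noise energy is low: update H2 and hold H1.
    Otherwise: hold both.
    Recent noise energy is the mean MIC2 frame energy over the last
    `history_frames` noise-only frames (100 frames of 10 ms, about 1 s).
    """

    def __init__(self, noise_threshold, history_frames=100):
        self.noise_threshold = noise_threshold
        self.history = deque(maxlen=history_frames)

    def decide(self, vad, mic2_energy):
        if vad == 0:
            # Noise-only frame: record its energy and let H1 adapt
            self.history.append(mic2_energy)
            return "update_H1"
        if self.history and sum(self.history) / len(self.history) < self.noise_threshold:
            # Speech frame with a quiet recent history: N(z) ~ 0, so H2 may adapt
            return "update_H2"
        # Speech under recently noisy conditions, or no noise history yet
        return "hold"


gate = TransferFunctionGate(noise_threshold=0.01)
frames = [(0, 0.001), (0, 0.002), (1, 0.5), (0, 1.0), (1, 0.5)]
decisions = [gate.decide(vad, energy) for vad, energy in frames]
```

Here the last speech frame is held because the loud noise-only frame before it raised the recent noise average.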

After $H_1(z)$ and $H_2(z)$ have been calculated, they are used to remove the noise from the signal. Equation 1 can be rewritten as

$$S(z) = M_1(z) - N(z)\,H_1(z), \qquad N(z) = M_2(z) - S(z)\,H_2(z)$$

Substituting for $N(z)$ as shown gives an equation in $S(z)$ alone:

$$S(z) = M_1(z) - \left[M_2(z) - S(z)\,H_2(z)\right] H_1(z)$$

Solving for $S(z)$:

$$S(z) = \frac{M_1(z) - M_2(z)\,H_1(z)}{1 - H_2(z)\,H_1(z)} \qquad (3)$$
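
Equation 3 can be checked numerically. The sketch below builds $M_1$ and $M_2$ from made-up values of $S$, $N$, $H_1$, and $H_2$ using Equation 1, then applies Equation 3 bin by bin and recovers $S$. The helper name `recover_signal` and the error threshold are illustrative assumptions.

```python
def recover_signal(M1, M2, H1, H2):
    """Apply Eq. 3 bin by bin: S = (M1 - M2*H1) / (1 - H2*H1)."""
    out = []
    for m1, m2, h1, h2 in zip(M1, M2, H1, H2):
        denom = 1 - h2 * h1
        if abs(denom) < 1e-9:
            raise ValueError("1 - H2*H1 is (nearly) zero; Eq. 3 is ill-conditioned here")
        out.append((m1 - m2 * h1) / denom)
    return out


# Synthetic check: build M1 and M2 from known S, N, H1, H2 (Eq. 1), then invert.
S = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j]
N = [3 - 1j, 2 + 2j, -4 + 0.5j]
H1 = [0.4 + 0.1j, 0.6 - 0.2j, 0.3 + 0.3j]
H2 = [0.1 - 0.05j, 0.2 + 0.1j, 0.15 + 0j]
M1 = [s + n * h1 for s, n, h1 in zip(S, N, H1)]
M2 = [n + s * h2 for s, n, h2 in zip(S, N, H2)]
S_hat = recover_signal(M1, M2, H1, H2)
```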

If the transfer functions $H_1(z)$ and $H_2(z)$ can be described with sufficient accuracy, the noise can be removed completely and the original signal recovered. This holds regardless of the amplitude or spectral characteristics of the noise. Only three assumptions are made:
- the VAD is perfect;
- $H_1(z)$ and $H_2(z)$ are sufficiently accurate;
- neither $H_1(z)$ nor $H_2(z)$ changes substantially while the other is being calculated.

In practice these assumptions have proven reasonable.

The noise removal algorithm described here is easily generalized to any number of noise sources. FIG. 3 is a block diagram of the front end of the noise removal algorithm of one embodiment, generalized to $n$ distinct noise sources. These noise sources may be reflections or echoes of one another, but are not limited to them. As shown, there are several noise sources, each with a transfer function, or path, to each microphone.

The path previously named $H_2$ is relabeled $H_0$, which makes it easier to label the path from noise source 2 to MIC1. Here $H_i(z)$ denotes the path from noise source $i$ to MIC1 and $G_i(z)$ the path from noise source $i$ to MIC2. The outputs of the microphones, transformed to the $z$ domain, are

$$\begin{aligned} M_1(z) &= S(z) + N_1(z)\,H_1(z) + N_2(z)\,H_2(z) + \cdots + N_n(z)\,H_n(z) \\ M_2(z) &= S(z)\,H_0(z) + N_1(z)\,G_1(z) + N_2(z)\,G_2(z) + \cdots + N_n(z)\,G_n(z) \end{aligned} \qquad (4)$$

When there is no signal (VAD = 0), suppressing $z$ for clarity,

$$\begin{aligned} M_{1n} &= N_1 H_1 + N_2 H_2 + \cdots + N_n H_n \\ M_{2n} &= N_1 G_1 + N_2 G_2 + \cdots + N_n G_n \end{aligned} \qquad (5)$$

A new transfer function can now be defined, analogous to $H_1(z)$ above:

$$\tilde{H}_1 = \frac{M_{1n}}{M_{2n}} = \frac{N_1 H_1 + N_2 H_2 + \cdots + N_n H_n}{N_1 G_1 + N_2 G_2 + \cdots + N_n G_n} \qquad (6)$$

Thus $\tilde{H}_1$ depends only on the noise sources and their respective transfer functions. It can therefore be calculated at any time when no signal is being transmitted. Once again, the subscript $n$ on the microphone inputs denotes that only noise is being detected, and the subscript $s$ denotes that the microphones are receiving only signal.

Examining Equation 4 under the assumption that no noise is being produced gives

$$M_{1s} = S, \qquad M_{2s} = S\,H_0$$

Thus $H_0$ can be solved for as before, using any available transfer function calculation algorithm. Mathematically,

$$H_0 = \frac{M_{2s}}{M_{1s}}$$

Rewriting Equation 4 using $\tilde{H}_1$ as defined in Equation 6 gives

$$\tilde{H}_1 = \frac{M_1 - S}{M_2 - S\,H_0} \qquad (7)$$

Solving for $S$ yields

$$S = \frac{M_1 - M_2\,\tilde{H}_1}{1 - H_0\,\tilde{H}_1} \qquad (8)$$

This is the same as Equation 3, with $H_0$ taking the place of $H_2$ and $\tilde{H}_1$ taking the place of $H_1$. The noise removal algorithm therefore remains mathematically valid for any number of noise sources, including multiple echoes of the noise sources. Again, complete noise removal requires two conditions. $H_0$ and $\tilde{H}_1$ must be estimated to a sufficiently high accuracy, and the signal must have only one path to each microphone.
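
The following sketch checks Equations 4 through 8 at a single frequency bin with three made-up noise sources. The helper `path_sum` and all numeric values are illustrative. The check relies on one assumption: the noise mix in the frame being cleaned is the same realization used to compute $\tilde{H}_1$. Only then is the single ratio $\tilde{H}_1$ an exact description of the combined noise paths.

```python
def path_sum(sources, paths):
    """Sum of N_i * P_i over all noise sources at one frequency bin."""
    return sum(n * p for n, p in zip(sources, paths))


# One frequency bin, three noise sources (all values are made-up test data)
N = [1.0 + 0.5j, -0.7 + 0.2j, 0.3 - 0.9j]   # noise source spectra
H = [0.5 + 0.1j, 0.2 - 0.3j, 0.4 + 0.4j]    # noise i -> MIC1
G = [1.0 + 0.0j, 0.8 + 0.1j, 0.9 - 0.2j]    # noise i -> MIC2
H0 = 0.15 + 0.05j                           # signal -> MIC2
S = 0.6 - 0.4j                              # clean signal

# VAD = 0: Eq. 5, then Eq. 6
H1_tilde = path_sum(N, H) / path_sum(N, G)

# No noise: M1s = S and M2s = S*H0, so H0 = M2s / M1s
H0_est = (S * H0) / S

# Frame with speech and the same noise (Eq. 4), cleaned with Eq. 8
M1 = S + path_sum(N, H)
M2 = S * H0 + path_sum(N, G)
S_hat = (M1 - M2 * H1_tilde) / (1 - H0_est * H1_tilde)
```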

The most general case involves multiple noise sources and multiple signal sources. FIG. 4 is a block diagram of the front end of the noise removal algorithm of one embodiment in this case, with $n$ distinct noise sources and signal reflections. Here, reflections of the signal enter both microphones. Reflections of the noise sources need no separate treatment, because they can be modeled accurately as additional noise sources; this case is therefore the most general. For clarity, the direct path from the signal to MIC2 has been renamed from $H_0(z)$ to $H_{00}(z)$. The reflected paths to MIC1 and MIC2 are denoted $H_{01}(z)$ and $H_{02}(z)$, respectively.

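The inputs to the microphones now become

$$\begin{aligned} M_1(z) &= S(z) + S(z)\,H_{01}(z) + N_1(z)\,H_1(z) + \cdots + N_n(z)\,H_n(z) \\ M_2(z) &= S(z)\,H_{00}(z) + S(z)\,H_{02}(z) + N_1(z)\,G_1(z) + \cdots + N_n(z)\,G_n(z) \end{aligned} \qquad (9)$$

When VAD = 0, suppressing $z$ again, the inputs become

$$\begin{aligned} M_{1n} &= N_1 H_1 + N_2 H_2 + \cdots + N_n H_n \\ M_{2n} &= N_1 G_1 + N_2 G_2 + \cdots + N_n G_n \end{aligned}$$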

This is the same as Equation 5, so the calculation of $\tilde{H}_1$ in Equation 6 is unchanged, as expected. In the situation where there is no noise, Equation 9 reduces to

$$M_{1s} = S + S\,H_{01}, \qquad M_{2s} = S\,H_{00} + S\,H_{02}$$

This leads to the definition of $\tilde{H}_2$:

$$\tilde{H}_2 = \frac{M_{2s}}{M_{1s}} = \frac{H_{00} + H_{02}}{1 + H_{01}} \qquad (10)$$

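Rewriting Equation 9 using the definition of $\tilde{H}_1$ (as in Equation 7) gives

$$\tilde{H}_1 = \frac{M_1 - S\,(1 + H_{01})}{M_2 - S\,(H_{00} + H_{02})} \qquad (11)$$

Some algebraic manipulation yields

$$S\,(1 + H_{01}) - S\,(H_{00} + H_{02})\,\tilde{H}_1 = M_1 - M_2\,\tilde{H}_1$$

$$S\,(1 + H_{01})\left[1 - \frac{H_{00} + H_{02}}{1 + H_{01}}\,\tilde{H}_1\right] = M_1 - M_2\,\tilde{H}_1$$

and finally

$$S\,(1 + H_{01}) = \frac{M_1 - M_2\,\tilde{H}_1}{1 - \tilde{H}_2\,\tilde{H}_1} \qquad (12)$$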

Equation 12 is the same as Equation 8, with two differences: $\tilde{H}_2$ replaces $H_0$, and a factor of $(1 + H_{01})$ appears on the left-hand side. This extra factor means that $S$ cannot be solved for directly in this situation. However, a solution can be obtained for the signal plus the sum of all of its echoes. This is not such a bad situation, for two reasons. Many conventional methods exist for echo suppression. And even if the echoes are not suppressed, they are unlikely to affect the intelligibility of the speech to any meaningful extent. The more complex calculation of $\tilde{H}_2$ is needed to account for the signal echoes in MIC2.
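
A numeric check at one frequency bin makes the effect of the extra factor concrete. In the sketch below, $A$ and $B$ stand for the summed noise terms $\sum N_i H_i$ and $\sum N_i G_i$. All values are made up, and the noise is assumed to be the same realization used for $\tilde{H}_1$. Equation 12 returns $S(1 + H_{01})$, the signal plus its MIC1 echo, rather than $S$ itself.

```python
# One frequency bin; all values are made-up test data.
S = 0.8 + 0.3j                         # clean signal
H01 = 0.2 - 0.1j                       # signal reflection -> MIC1
H00, H02 = 0.1 + 0.05j, 0.05 - 0.02j   # direct and reflected signal -> MIC2
A = 0.4 - 0.6j                         # sum of N_i H_i (noise at MIC1)
B = 0.9 + 0.2j                         # sum of N_i G_i (noise at MIC2)

# Eq. 9: microphone inputs with signal echoes
M1 = S + S * H01 + A
M2 = S * H00 + S * H02 + B

H1_tilde = A / B                                 # Eq. 6, from a noise-only frame
H2_tilde = (S * H00 + S * H02) / (S + S * H01)   # Eq. 10, from a noise-free frame

# Eq. 12: yields the signal plus its MIC1 echo, not S alone
S_plus_echo = (M1 - M2 * H1_tilde) / (1 - H2_tilde * H1_tilde)
```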

FIG. 5 is a flow diagram of a denoising method of one embodiment. The method proceeds as follows:
- Acoustic signals are received (502).
- Physiological information associated with human voicing activity is received (504).
- When voicing information is determined to be absent from the acoustic signal for at least one specified period of time, a first transfer function representative of the acoustic signal is calculated (506).
- When voicing information is determined to be present in the acoustic signal for at least one specified period of time, a second transfer function representative of the acoustic signal is calculated (508).
- Noise is removed from the acoustic signal using at least one combination of the first and second transfer functions, producing a denoised acoustic data stream (510).
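
The steps of FIG. 5 can be strung together in a short sketch. It works on per-bin complex spectra with a given VAD flag per frame. It fits $H_1$ from VAD = 0 frames (506) and $H_2$ from VAD = 1 frames (508), then applies Equation 3 to a new frame (510). Three assumptions apply:
- The VAD = 1 frames used for fitting carry negligible noise, as required for the $H_2$ estimate.
- Least-squares averaging over frames is used for the identification.
- Function names and synthetic data are hypothetical.

```python
import random


def ls_ratio(num_frames, den_frames):
    """Per-bin least-squares estimate of num/den across a list of spectral frames."""
    n_bins = len(den_frames[0])
    return [
        sum(y[k] * x[k].conjugate() for y, x in zip(num_frames, den_frames))
        / sum(abs(x[k]) ** 2 for x in den_frames)
        for k in range(n_bins)
    ]


def fit_transfer_functions(m1_frames, m2_frames, vad):
    """Steps 506 and 508: H1 from VAD == 0 frames, H2 from VAD == 1 frames.

    Assumes the VAD == 1 frames carry negligible noise (the low-recent-noise
    condition described for H2).
    """
    noise = [i for i, v in enumerate(vad) if v == 0]
    speech = [i for i, v in enumerate(vad) if v == 1]
    H1 = ls_ratio([m1_frames[i] for i in noise], [m2_frames[i] for i in noise])
    H2 = ls_ratio([m2_frames[i] for i in speech], [m1_frames[i] for i in speech])
    return H1, H2


def denoise_frame(M1, M2, H1, H2):
    """Step 510: apply Eq. 3 bin by bin."""
    return [(a - b * h1) / (1 - h2 * h1) for a, b, h1, h2 in zip(M1, M2, H1, H2)]


random.seed(1)
H1_true = [0.5 - 0.1j, 0.2 + 0.3j]
H2_true = [0.1 + 0.05j, -0.05 + 0.1j]


def random_spectrum():
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in H1_true]


m1_frames, m2_frames, vad = [], [], []
for _ in range(10):  # noise-only frames: M1 = N*H1, M2 = N
    n = random_spectrum()
    m1_frames.append([v * h for v, h in zip(n, H1_true)])
    m2_frames.append(n)
    vad.append(0)
for _ in range(10):  # speech frames with negligible noise: M1 = S, M2 = S*H2
    s = random_spectrum()
    m1_frames.append(s)
    m2_frames.append([v * h for v, h in zip(s, H2_true)])
    vad.append(1)

H1, H2 = fit_transfer_functions(m1_frames, m2_frames, vad)

# A new frame containing both speech and noise (Eq. 1)
s_test = [1 + 1j, -2 + 0.5j]
n_test = [0.7 - 0.3j, 1.5 + 2j]
M1 = [s + n * h for s, n, h in zip(s_test, n_test, H1_true)]
M2 = [n + s * h for s, n, h in zip(s_test, n_test, H2_true)]
s_hat = denoise_frame(M1, M2, H1, H2)
```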

A noise removal (denoising) algorithm has been described here. It covers cases from the simplest, a single noise source with a direct path, to multiple noise sources with reflections and echoes. The algorithm has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential provided that two conditions hold: good estimates of $\tilde{H}_1$ and $\tilde{H}_2$ are made, and neither changes substantially while the other is being calculated. If the user environment contains echoes that originate from noise sources, they can be compensated for. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

In operation, the algorithm of one embodiment has shown excellent results with a variety of noise types, amplitudes, and orientations. However, moving from mathematical concepts to engineering applications always requires approximations and adjustments. One assumption is made in Equation 3: $H_2(z)$ is assumed to be small, so that $H_2(z)\,H_1(z) \approx 0$. Equation 3 then reduces to

$$S(z) \approx M_1(z) - M_2(z)\,H_1(z)$$

This means that only $H_1(z)$ has to be calculated, which speeds up the process and considerably reduces the number of computations required. With proper selection of the microphones, this approximation is easily realized.
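
In the time domain, the simplified form means filtering the MIC2 signal with an FIR model of $H_1$ and subtracting the result from MIC1. The sketch below does this on synthetic data where no speech leaks into MIC2, which is the $H_2 \approx 0$ assumption. The taps, the 200 Hz test tone, and the helper names are assumptions for illustration only.

```python
import math
import random


def fir_filter(taps, x):
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * x[n - k]
        y.append(acc)
    return y


def simplified_denoise(m1, m2, h1_taps):
    """Time-domain form of S ~= M1 - M2*H1: subtract FIR-filtered MIC2 from MIC1."""
    noise_in_mic1 = fir_filter(h1_taps, m2)
    return [a - b for a, b in zip(m1, noise_in_mic1)]


# Synthetic data under H2 ~= 0 (no speech leaks into MIC2), sampled at 8 kHz
random.seed(2)
h1_taps = [0.6, 0.25, -0.1]
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
noise = [random.gauss(0, 1) for _ in range(400)]
m1 = [s + v for s, v in zip(speech, fir_filter(h1_taps, noise))]
m2 = noise
s_hat = simplified_denoise(m1, m2, h1_taps)
```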

Another approximation concerns the filter used in one embodiment. The actual $H_1(z)$ will undoubtedly have both poles and zeros. For stability and simplicity, however, an all-zero finite impulse response (FIR) filter is used. With enough taps (around 60), the approximation to the actual $H_1(z)$ is very good.

Regarding subband selection, the wider the frequency range over which a transfer function must be calculated, the harder it is to calculate accurately. The acoustic data was therefore divided into 16 subbands, with the lowest frequency at 50 Hz and the highest at 3700 Hz. The denoising algorithm was applied to each subband in turn, and the 16 denoised data streams were recombined to yield the denoised acoustic data. This works very well. Other subband configurations were found to work as well, for example 4, 6, 8, or 32 subbands, spaced either equally or perceptually.
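
The band-edge arithmetic can be sketched as follows. The text does not say how the 16 bands were spaced, so both options here are assumptions. "linear" gives equal widths (3650 Hz / 16 = 228.125 Hz each). "log" gives equal widths on a log-frequency axis, used only as a simple stand-in for perceptual spacing.

```python
def subband_edges(f_low, f_high, n_bands, spacing="linear"):
    """Return (lower, upper) edge pairs in Hz for n_bands contiguous subbands.

    spacing="linear": equal widths in Hz.
    spacing="log": equal widths on a log-frequency axis (a simple stand-in for
    perceptual spacing; the source does not specify a scale).
    """
    if spacing == "linear":
        edges = [f_low + (f_high - f_low) * i / n_bands for i in range(n_bands + 1)]
    elif spacing == "log":
        ratio = f_high / f_low
        edges = [f_low * ratio ** (i / n_bands) for i in range(n_bands + 1)]
    else:
        raise ValueError("spacing must be 'linear' or 'log'")
    return list(zip(edges[:-1], edges[1:]))


linear_bands = subband_edges(50.0, 3700.0, 16)
log_bands = subband_edges(50.0, 3700.0, 16, spacing="log")
```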

In one embodiment, the noise amplitude was constrained so that the microphones did not saturate, that is, did not operate outside their linear response region. Linear microphone operation is important for ensuring the best performance. Even with this restriction, tests at very low signal-to-noise ratios (SNR), down to about -10 dB, could be performed.

$H_1(z)$ was calculated every 10 milliseconds using the least-mean-squares (LMS) method, a common adaptive transfer function estimation technique. An explanation can be found in "Adaptive Signal Processing" by Widrow and Stearns (Prentice-Hall, 1985, ISBN 0-13-004029-0).
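
The following is a minimal sketch of LMS identification of an FIR model of $H_1$ from a noise-only segment. It uses the Widrow-Hoff update $w \leftarrow w + 2\mu e[n]\,\mathbf{x}[n]$ and 60 taps, as the text suggests. Several choices are assumptions for this synthetic white-noise example:
- the step size $\mu = 0.005$;
- the 4000-sample segment;
- the 4-tap "true" path.

A real system would instead update the weights on each 10 ms block while the VAD is 0.

```python
import random


def lms_identify(x, d, n_taps, mu):
    """Widrow-Hoff LMS: adapt FIR weights w so that sum_k w[k] x[n-k] tracks d[n].

    x: reference input (e.g. MIC2 samples while VAD == 0)
    d: desired signal (e.g. MIC1 samples over the same period)
    Update rule: w <- w + 2 * mu * e[n] * x_vec[n]
    """
    w = [0.0] * n_taps
    buf = [0.0] * n_taps  # buf[k] holds x[n - k]
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wk * bk for wk, bk in zip(w, buf))
        e = dn - y
        w = [wk + 2 * mu * e * bk for wk, bk in zip(w, buf)]
    return w


# Synthetic noise-only segment: MIC1 noise is MIC2 noise through a short FIR path
random.seed(3)
h1_true = [0.5, -0.3, 0.2, 0.1]
x = [random.gauss(0, 1) for _ in range(4000)]
d = [sum(h * x[n - k] for k, h in enumerate(h1_true) if n - k >= 0) for n in range(len(x))]
w = lms_identify(x, d, n_taps=60, mu=0.005)
```

Unnormalized LMS is shown to match the cited method. A normalized variant (NLMS) is a common alternative when input power varies widely.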

The VAD for one embodiment was derived from a radio frequency sensor and the two microphones, and achieved very high accuracy (>99%) for both voiced and unvoiced speech. The VAD of one embodiment uses a radio frequency (RF) interferometer to detect tissue motion associated with human speech production, but is not limited to this sensor. It is therefore completely free of acoustic noise and can function in any acoustic noise environment. A simple energy measurement can determine whether voiced speech is occurring. Unvoiced speech can be detected in three ways: with conventional frequency-based methods, by proximity to voiced sections, or by a combination of these. Unvoiced speech carries much less energy, so the accuracy of its detection is less critical than that of voiced speech.
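
The energy rule and the proximity idea can be sketched together. The function below is hypothetical and makes three assumptions:
- The threshold is a free parameter.
- Frames are 10 ms of RF-sensor samples.
- "Proximity" is modeled as a fixed hangover of 2 frames on either side of a voiced frame.

The text does not specify any of these details.

```python
def frame_energy(frame):
    """Mean-square energy of one frame of RF-sensor samples."""
    return sum(v * v for v in frame) / len(frame)


def rf_vad(frames, threshold, hangover=2):
    """Hypothetical VAD from RF-sensor frames.

    voiced[i] = 1 when the frame energy exceeds `threshold`.
    vad[i] additionally marks frames within `hangover` frames of a voiced
    frame, a crude stand-in for flagging nearby unvoiced speech by proximity.
    """
    voiced = [1 if frame_energy(f) > threshold else 0 for f in frames]
    vad = list(voiced)
    for i, v in enumerate(voiced):
        if v:
            for j in range(max(0, i - hangover), min(len(voiced), i + hangover + 1)):
                vad[j] = 1
    return voiced, vad


quiet = [0.01] * 80   # 10 ms at 8 kHz, little tissue motion
active = [0.5] * 80   # strong tissue-motion signal
voiced, vad = rf_vad([quiet, quiet, active, quiet, quiet, quiet, quiet], threshold=0.01)
```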

Once voiced and unvoiced speech are detected reliably, the algorithm of one embodiment can be implemented. It bears repeating that the noise removal algorithm does not depend on how the VAD is obtained, only on its accuracy, especially for voiced speech. If speech goes undetected and training occurs on that speech, the subsequent denoised acoustic data may be distorted.

Data was collected in four channels: one for MIC1, one for MIC2, and two for the radio frequency sensors that detected the tissue motion associated with voiced speech. The data was sampled simultaneously at 40 kHz, then digitally filtered and decimated down to 8 kHz. The high sampling rate was used to reduce any aliasing that might arise in the analog-to-digital process. A four-channel National Instruments A/D board was used with LabVIEW to capture and store the data. The data was then read into a C program and denoised 10 milliseconds at a time.
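
The rate arithmetic here is 40 kHz / 5 = 8 kHz, and 10 ms at 8 kHz is 80 samples. The sketch below pairs that arithmetic with a simple decimator: a Hamming-windowed-sinc low-pass FIR followed by keeping every fifth sample. The filter design, the 101-tap length, and the 90% cutoff margin are assumptions; the text says only that the data was digitally filtered and decimated.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Windowed-sinc (Hamming) low-pass FIR, normalized to unity gain at DC.

    cutoff is a fraction of the input sample rate (0 < cutoff < 0.5).
    """
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        t = n - m / 2
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def decimate(x, factor, num_taps=101):
    """Low-pass filter x, then keep every `factor`-th sample.

    Only the retained outputs are computed. The cutoff is 90% of the new
    Nyquist frequency (an arbitrary safety margin for this sketch).
    """
    taps = lowpass_taps(num_taps, cutoff=0.9 * 0.5 / factor)
    out = []
    for n in range(0, len(x), factor):
        out.append(sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0))
    return out


def split_frames(x, frame_len):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


fs_in, factor = 40_000, 5
fs_out = fs_in // factor                                   # 8000 Hz
tone = [math.sin(2 * math.pi * 200 * n / fs_in) for n in range(fs_in)]  # 1 s at 40 kHz
y = decimate(tone, factor)
frames = split_frames(y, frame_len=fs_out // 100)          # 10 ms -> 80 samples at 8 kHz
```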

FIG. 6 shows the results of the noise suppression algorithm of one embodiment for an American English-speaking female. The recording was made in airport terminal noise that includes many other speakers and public announcements. The speaker utters the numbers 406-5562 amid moderate airport terminal noise. The noisy acoustic data was prefiltered to 50 to 3700 Hz and then denoised 10 milliseconds at a time. A noise reduction of approximately 17 dB is evident. No post-filtering was applied to this sample, so all of the noise reduction is due to the algorithm of one embodiment. The algorithm adapts to the noise instantly and can therefore remove the very difficult noise of other human speakers.

Many other noise types have been tested with similar results, including street noise, helicopters, music, and sine waves. The orientation of the noise can also be varied substantially without significantly changing the noise suppression performance. Finally, the distortion of the cleaned speech is very low, ensuring good performance for speech recognition engines as well as human listeners.
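
The decibel figures quoted in this section convert to power ratios as follows. A 17 dB reduction is a noise-power drop of about 50 times. An SNR of -10 dB means the signal power is one tenth of the noise power. The text does not describe how the 17 dB was measured. The `noise_reduction_db` helper is a hypothetical measurement that compares noise-only segments (such as pauses between digits) before and after denoising.

```python
import math


def mean_power(x):
    return sum(v * v for v in x) / len(x)


def noise_reduction_db(noise_before, noise_after):
    """Noise reduction in dB: 10*log10(P_before / P_after).

    Hypothetical measurement: both arguments are noise-only segments taken
    from the same stretch of audio before and after denoising.
    """
    return 10 * math.log10(mean_power(noise_before) / mean_power(noise_after))


def db_to_power_ratio(db):
    return 10 ** (db / 10)


before = [1.0, -1.0, 1.0, -1.0]
after = [0.1, -0.1, 0.1, -0.1]
reduction = noise_reduction_db(before, after)   # power drops by 100x
ratio_17db = db_to_power_ratio(17)              # about 50x in power
snr_minus_10 = db_to_power_ratio(-10)           # signal power is 1/10 of noise power
```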

The noise removal algorithm of one embodiment has been shown to be viable under any environmental conditions. The type and amount of noise are inconsequential if good estimates of $\tilde{H}_1$ and $\tilde{H}_2$ are made. If the user environment contains echoes that originate from noise sources, they can be compensated for. If signal echoes are also present, they will affect the cleaned signal, but the effect should be negligible in most environments.

Various embodiments have been described with reference to the drawings, but the detailed description and drawings are not intended to be limiting. Various combinations of the described elements have not been shown, but they fall within the scope of the invention as set forth in the appended claims.

Claims

A noise removal method for removing noise from acoustic signals, comprising:
Receiving a plurality of acoustic signals;
Receiving physiological information associated with human voicing activity;
Generating at least one first transfer function representative of the plurality of acoustic signals upon determining that voicing information is absent from the plurality of acoustic signals for at least one specified period of time;
Generating at least one second transfer function representative of the plurality of acoustic signals upon determining that voicing information is present in the plurality of acoustic signals for at least one specified period of time; and
Removing noise from the plurality of acoustic signals using at least one combination of the at least one first transfer function and the at least one second transfer function to produce at least one denoised acoustic data stream.

The noise removal method of claim 1, wherein the plurality of acoustic signals includes at least one reflection of at least one associated noise source signal and at least one reflection of at least one acoustic source signal.

The noise removal method of claim 1, wherein receiving the physiological information includes receiving physiological data associated with human voicing using at least one detector selected from the group consisting of radio frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors.

4. The method of claim 1, wherein receiving the plurality of acoustic signals comprises receiving the plurality of acoustic signals using a plurality of independently positioned microphones.

5. The method of claim 1, wherein removing noise further comprises generating at least one third transfer function using the at least one first transfer function and the at least one second transfer function.

6. The method of claim 1, wherein generating the at least one first transfer function comprises recalculating the at least one first transfer function during at least one prespecified interval.

7. The method of claim 1, wherein generating the at least one second transfer function comprises recalculating the at least one second transfer function during at least one prespecified interval.

8. The method of claim 1, wherein generating the at least one first transfer function and the at least one second transfer function comprises using at least one technique selected from the group consisting of adaptive techniques and recursive techniques.
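Claims 6 to 8 allow the transfer functions to be recalculated during specified intervals using adaptive or recursive techniques. One common recursive choice is exponentially weighted averaging of the cross-spectrum and auto-spectrum, used here purely as an illustration. The claims do not say that this particular recursion is the one intended. The class name `RecursiveTF` and the forgetting factor `lam` are hypothetical.

```python
class RecursiveTF:
    """Exponentially weighted recursive estimate of one transfer function per bin.

    Keeps running cross- and auto-spectra:
        P_xy <- lam * P_xy + (1 - lam) * x * conj(y)
        P_yy <- lam * P_yy + (1 - lam) * |y|^2
    and reports H = P_xy / P_yy, so that x ≈ H * y.
    """

    def __init__(self, n_bins: int, lam: float = 0.9):
        if not 0.0 <= lam < 1.0:
            raise ValueError("lam must be in [0, 1)")
        self.lam = lam
        self.p_xy = [0j] * n_bins
        self.p_yy = [0.0] * n_bins

    def update(self, x, y):
        """Fold one frame into the estimate; call only on frames selected by the VAD."""
        for k, (xk, yk) in enumerate(zip(x, y)):
            self.p_xy[k] = self.lam * self.p_xy[k] + (1 - self.lam) * xk * yk.conjugate()
            self.p_yy[k] = self.lam * self.p_yy[k] + (1 - self.lam) * abs(yk) ** 2

    def estimate(self):
        """Current per-bin transfer function (0 where no reference power was seen)."""
        return [pxy / pyy if pyy > 0 else 0j for pxy, pyy in zip(self.p_xy, self.p_yy)]


# Usage: track a first transfer function from noise-only frames (VAD = 0),
# then follow a change in the acoustic path.
h_old, h_new = [0.5 + 0.1j, 0.2j], [0.4 + 0j, 0.3 - 0.1j]
noise = [1 + 0.5j, -0.7j]
tf = RecursiveTF(n_bins=2, lam=0.9)
for _ in range(50):  # the path is h_old
    tf.update([h * x for h, x in zip(h_old, noise)], noise)
after_first = tf.estimate()
for _ in range(200):  # the path changes to h_new
    tf.update([h * x for h, x in zip(h_new, noise)], noise)
after_change = tf.estimate()
```

Calling `update` only on frames that the VAD marks as silent (for H₁) or voiced (for H₂) reproduces the gating of claim 1. A `lam` closer to 1 smooths the estimate more but tracks changes in the acoustic path more slowly.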

9. A noise removal method for removing noise from electronic signals, comprising:
detecting an absence of voicing information during at least one period;
receiving at least one noise source signal during the at least one period;
generating at least one transfer function representative of the at least one noise source signal;
receiving at least one composite signal comprising an acoustic signal and a noise signal; and
removing the noise signal from the at least one composite signal using the at least one transfer function to generate at least one denoised acoustic data stream.
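Claim 9 covers a single transfer function that is learned from the noise source signal while voicing is absent and then applied to the composite signal. A minimal time-domain sketch of that idea is an adaptive filter that adapts only when the voicing flag is 0. This sketch uses normalized LMS (NLMS), one instance of the adaptive techniques named in claim 17. NLMS is a standard algorithm substituted here for illustration and is not stated to be the patent's estimator. The function name, tap count, step size, and the simulated acoustic path are hypothetical.

```python
import math
import random


def nlms_gated_canceller(primary, reference, vad, taps=4, mu=0.5, eps=1e-8):
    """Adaptive noise canceller whose FIR filter adapts only while vad == 0.

    primary:   composite signal (speech plus filtered noise), one float per sample
    reference: noise source signal, one float per sample
    vad:       one flag per sample (1 = voicing present, 0 = absent)
    Returns (output, weights), where output[t] = primary[t] - w . x[t].
    """
    w = [0.0] * taps
    out = []
    for t in range(len(primary)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[t - i] if t - i >= 0 else 0.0 for i in range(taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))  # current estimate of the noise in primary
        e = primary[t] - y                         # denoised output sample
        out.append(e)
        if vad[t] == 0:
            # NLMS step: normalizing by reference power makes mu independent of signal scale.
            norm = eps + sum(xi * xi for xi in x)
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
    return out, w


# Usage: 3000 noise-only samples for adaptation, then 1000 samples containing speech.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
h = [0.6, -0.3, 0.1]  # hypothetical acoustic path from the noise source to the primary mic
filtered = [sum(h[i] * noise[t - i] for i in range(len(h)) if t - i >= 0) for t in range(4000)]
speech = [0.0] * 3000 + [math.sin(0.05 * t) for t in range(1000)]
vad = [0] * 3000 + [1] * 1000
primary = [s + f for s, f in zip(speech, filtered)]
out, weights = nlms_gated_canceller(primary, noise, vad)
```

Freezing adaptation while voicing is present keeps speech from leaking into the filter estimate. Without the gate, the filter would partly cancel the speech itself.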

10. The method of claim 9, wherein the at least one noise source signal comprises at least one reflection of at least one associated noise source signal.

11. The method of claim 9, wherein the at least one composite signal includes at least one reflection of at least one associated composite signal.

12. The method of claim 9, wherein the detecting comprises collecting physiological data associated with human speech using at least one detector selected from the group consisting of radio-frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors.

13. The method of claim 9, wherein the receiving comprises receiving the at least one noise source signal using at least one microphone.

14. The method of claim 13, wherein the at least one microphone comprises a plurality of independently positioned microphones.

15. The method of claim 9, wherein removing the noise signal from the at least one composite signal using the at least one transfer function comprises generating and using at least one other transfer function.

16. The method of claim 9, wherein generating the at least one transfer function comprises recalculating the at least one transfer function during at least one prespecified interval.

17. The method of claim 9, wherein generating the at least one transfer function comprises calculating the at least one transfer function using at least one technique selected from the group consisting of adaptive techniques and recursive techniques.

18. A noise removal method for removing noise from electronic signals, comprising:
determining at least one silent period during which voicing information is absent;
receiving at least one noise signal input during the at least one silent period and generating at least one silent transfer function representative of the at least one noise signal;
determining at least one voicing period during which voicing information is present;
receiving at least one acoustic signal input from at least one signal sensing device during the at least one voicing period and generating at least one voicing transfer function representative of the at least one acoustic signal;
receiving at least one composite signal comprising an acoustic signal and a noise signal; and
removing the noise signal from the at least one composite signal using at least one combination of the at least one silent transfer function and the at least one voicing transfer function to generate at least one denoised acoustic data stream.
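Claim 18 depends on separating silent periods from voicing periods. The claims leave the detector open, and claims 3, 12, and 21 list candidate sensors. As a hypothetical illustration only, the sketch below thresholds the short-term energy of a sampled sensor signal to produce the binary 1/0 voicing flag described in the abstract. It then groups consecutive flags into periods. The frame length, threshold, and function names are assumptions and are not values from the document.

```python
def frame_vad(sensor, frame_len, threshold):
    """Return one flag per full frame: 1 if the mean-square sensor energy exceeds threshold."""
    flags = []
    for start in range(0, len(sensor) - frame_len + 1, frame_len):
        frame = sensor[start:start + frame_len]
        energy = sum(v * v for v in frame) / frame_len
        flags.append(1 if energy > threshold else 0)
    return flags


def group_periods(flags):
    """Group consecutive equal flags into (flag, first_frame, last_frame) runs.

    flag 0 marks a silent period and flag 1 marks a voicing period.
    """
    runs = []
    for i, f in enumerate(flags):
        if runs and runs[-1][0] == f:
            runs[-1] = (f, runs[-1][1], i)  # extend the current run
        else:
            runs.append((f, i, i))          # start a new run
    return runs


# Usage: a quiet stretch, a burst of sensor activity, then quiet again.
sensor = [0.01] * 8 + [1.0, -1.0] * 4 + [0.0] * 4
flags = frame_vad(sensor, frame_len=4, threshold=0.1)
periods = group_periods(flags)
```

Here `flags` is `[0, 0, 1, 1, 0]` and `periods` is `[(0, 0, 1), (1, 2, 3), (0, 4, 4)]`. Frames in silent runs would feed the silent transfer function, and frames in voicing runs would feed the voicing transfer function.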

19. A noise removal system for removing noise from acoustic signals, comprising:
at least one receiver that receives at least one acoustic signal;
at least one sensor that receives physiological information associated with human vocal activity; and
at least one processor coupled between the at least one receiver and the at least one sensor, wherein the at least one processor generates a plurality of transfer functions, at least one first transfer function representative of the at least one acoustic signal being generated in response to a determination that voicing information is absent from the at least one acoustic signal for at least one specified period, and at least one second transfer function representative of the at least one acoustic signal being generated in response to a determination that voicing information is present in the at least one acoustic signal for at least one specified period, and wherein the at least one processor removes noise from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to generate at least one denoised acoustic data stream.

20. The system of claim 19, wherein the at least one sensor comprises at least one radio-frequency (RF) interferometer that detects tissue motion associated with human speech production.

21. The system of claim 19, wherein the at least one sensor comprises at least one sensor selected from the group consisting of radio-frequency devices, electroglottographs, ultrasound devices, acoustic throat microphones, and airflow detectors.

22. The system of claim 19, further comprising:
dividing acoustic data of the at least one acoustic signal into a plurality of subbands;
removing noise from each of the plurality of subbands using the at least one combination of the at least one first transfer function and the at least one second transfer function to generate a plurality of denoised acoustic data streams; and
combining the plurality of denoised acoustic data streams to generate the at least one denoised acoustic data stream.
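Claim 22 splits the acoustic data into subbands, denoises each subband, and recombines the results. The sketch below assumes contiguous bands of DFT bins and the per-bin combination (M₁ − H₁M₂)/(1 − H₁H₂) from an assumed two-microphone model. Each band's denoised spectrum is inverted to its own time-domain stream, and the streams are summed. A naive O(N²) DFT keeps the sketch dependency-free. `band_edges` and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def subband_denoise(m1, m2, h1, h2, band_edges):
    """Denoise each subband separately, then combine the per-band streams.

    m1, m2:     time-domain frames from the two microphones (equal length N)
    h1, h2:     per-bin transfer-function estimates (length N)
    band_edges: bin boundaries, e.g. [0, 3, 6, 8] -> bands [0,3), [3,6), [6,8)
    Returns (combined_stream, per_band_streams).
    """
    n = len(m1)
    M1, M2 = dft(m1), dft(m2)
    band_streams = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        S = [0j] * n  # spectrum that is non-zero only inside this band
        for k in range(lo, hi):
            S[k] = (M1[k] - h1[k] * M2[k]) / (1 - h1[k] * h2[k])
        band_streams.append(idft(S))
    # The inverse DFT is linear, so summing the band streams rebuilds the full-band output.
    combined = [sum(samples) for samples in zip(*band_streams)]
    return combined, band_streams


# Usage with frequency-flat transfer functions (plain gains) so the target is easy to check.
s = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.25, 0.75]
noise = [0.3, -0.2, 0.8, 0.1, -0.6, 0.4, -0.1, 0.2]
m1 = [a + 0.5 * b for a, b in zip(s, noise)]
m2 = [b + 0.1 * a for a, b in zip(s, noise)]
combined, band_streams = subband_denoise(m1, m2, [0.5] * 8, [0.1] * 8, [0, 3, 6, 8])
```

The bands must cover every bin for the combined stream to equal the full-band result. In practice, each band could use its own transfer-function estimates or update rate, which is the main reason to split the signal.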

23. The system of claim 19, wherein the at least one receiver comprises a plurality of independently positioned microphones.

24. A noise removal system for removing noise from acoustic signals, comprising at least one processor coupled between at least one microphone and at least one voicing sensor, wherein the at least one voicing sensor collects physiological data associated with human speech and is used to detect an absence of voicing information during at least one period, the at least one microphone receives at least one noise source signal during the at least one period, the at least one processor generates at least one transfer function representative of the at least one noise source signal, the at least one microphone receives at least one composite signal comprising an acoustic signal and a noise signal, and the at least one processor removes the noise signal from the at least one composite signal using the at least one transfer function to generate at least one denoised acoustic data stream.

25. A signal processing system coupled between at least one user and at least one electronic device, the signal processing system comprising at least one denoising subsystem for removing noise from acoustic signals, the denoising subsystem comprising at least one processor coupled between at least one receiver and at least one sensor, wherein the at least one receiver is coupled to receive at least one acoustic signal, the at least one sensor is coupled to receive physiological information associated with human vocal activity, and the at least one processor generates a plurality of transfer functions, at least one first transfer function representative of the at least one acoustic signal being generated in response to a determination that voicing information is absent from the at least one acoustic signal for at least one specified period, and at least one second transfer function representative of the at least one acoustic signal being generated in response to a determination that voicing information is present in the at least one acoustic signal for at least one specified period, and wherein the at least one processor removes noise from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to generate at least one denoised acoustic data stream.

26. The system of claim 25, wherein the at least one electronic device comprises at least one device selected from the group consisting of cellular telephones, personal digital assistants, portable communication devices, computers, video cameras, digital cameras, and telematics systems.

27. A computer-readable medium containing executable instructions which, when executed in a processing system, remove noise from received acoustic signals by:
receiving at least one acoustic signal;
receiving physiological information associated with human vocal activity;
generating at least one first transfer function representative of the at least one acoustic signal in response to determining that voicing information is absent from the at least one acoustic signal for at least one specified period;
generating at least one second transfer function representative of the at least one acoustic signal in response to determining that voicing information is present in the at least one acoustic signal for at least one specified period; and
removing noise from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to generate at least one denoised acoustic data stream.

28. An electromagnetic medium containing executable instructions which, when executed in a processing system, remove noise from received acoustic signals by:
receiving at least one acoustic signal;
receiving physiological information associated with human vocal activity;
generating at least one first transfer function representative of the at least one acoustic signal in response to determining that voicing information is absent from the at least one acoustic signal for at least one specified period;
generating at least one second transfer function representative of the at least one acoustic signal in response to determining that voicing information is present in the at least one acoustic signal for at least one specified period; and
removing noise from the at least one acoustic signal using at least one combination of the at least one first transfer function and the at least one second transfer function to generate at least one denoised acoustic data stream.