JP4866958B2

JP4866958B2 - Noise reduction in electronic devices with farfield microphones on the console

Info

Publication number: JP4866958B2
Application number: JP2009509909A
Authority: JP
Inventors: マオシャドン
Original assignee: Sony Interactive Entertainment Inc; Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2006-05-04
Filing date: 2007-03-30
Publication date: 2012-02-01
Anticipated expiration: 2027-03-30
Also published as: EP2014132A4; EP2014132A2; WO2007130766A3; JP4476355B2; JP2009535997A; JP2009535996A; WO2007130766A2; JP4833343B2; EP2012725A4; EP2012725A2; WO2007130765A3; JP2010171985A; WO2007130765A2

Description

[Priority claim]
This application claims the benefit of each of Patent Documents 1 to 10 listed below. Each of them is commonly assigned with, and co-pending with, this application, and the entire disclosure of each is incorporated herein by reference.
Patent Document 1: US Patent Application No. 11/381,727, Shadon Mao, "NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE", filed May 4, 2006 (Attorney Docket Number SCEA05073US00)
Patent Document 2: US Patent Application No. 11/381,729, Shadon Mao, "ULTRA SMALL MICROPHONE ARRAY", filed May 4, 2006 (Attorney Docket Number SCEA05062US00)
Patent Document 3: US Patent Application No. 11/381,728, Shadon Mao, "ECHO AND NOISE CANCELATION", filed May 4, 2006 (Attorney Docket Number SCEA05064US00)
Patent Document 4: US Patent Application No. 11/381,725, Shadon Mao, "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION", filed May 4, 2006 (Attorney Docket Number SCEA05072US00)
Patent Document 5: US Patent Application No. 11/381,724, Shadon Mao, "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION AND CHARACTERIZATION", filed May 4, 2006 (Attorney Docket Number SCEA05079US00)
Patent Document 6: US Patent Application No. 11/381,721, Shadon Mao, "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", filed May 4, 2006 (Attorney Docket Number SCEA04005 JUMBOUS)
Patent Document 7: PCT Application PCT/US06/17483, Shadon Mao, "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", filed May 4, 2006 (Attorney Docket Number SCEA04005 JUMBOPCT)
Patent Document 8: US Patent Application No. 11/418,988, Shadon Mao, "METHODS AND APPARATUSES FOR ADJUSTING A LISTENING AREA FOR CAPTURING SOUNDS", filed May 4, 2006 (Attorney Docket Number SCEA-00300)
Patent Document 9: US Patent Application No. 11/418,989, Shadon Mao, "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON VISUAL IMAGE", filed May 4, 2006 (Attorney Docket Number SCEA-00400)
Patent Document 10: US Patent Application No. 11/429,047, Shadon Mao, "METHODS AND APPARATUSES FOR CAPTURING AN AUDIO SIGNAL BASED ON A LOCATION OF THE SIGNAL", filed May 4, 2006 (Attorney Docket Number SCEA-00500)

Many consumer electronic devices use a console that includes various user controls and input devices. In many applications, such as video game consoles, cable television set-top boxes, and digital video recorders, it is desirable for the console to incorporate a microphone. To reduce cost, the microphone is typically a conventional omnidirectional microphone with no preferred receiving direction. Unfortunately, such consoles also contain noise sources such as cooling fans, hard disk drives, CD-ROM drives, and digital video disk (DVD) drives. Because the microphone is located on the console, noise from these sources can interfere significantly with desired sound input, such as a user's voice commands. To address this problem, these devices implement techniques that filter out the noise from these sources.

Conventional techniques have been effective at filtering out noise distributed over a broad band. For example, noise from a fan follows a Gaussian distribution and is therefore spread over a broad band of frequencies. Such noise can be simulated with a Gaussian distribution and canceled from the input signal of the console microphone. By contrast, noise from a disk drive, such as a hard disk or DVD drive, is characterized by a narrowband frequency distribution such as a gamma distribution or a narrowband Laplace distribution. Unfortunately, the deterministic methods used for Gaussian noise are not suitable for removing noise that follows a gamma distribution.

[Summary of Invention]
Embodiments of the present invention are directed to reducing noise in a device that has a console with one or more microphones, where a source of narrowband distributed noise is also located on the console. A microphone signal containing a desired sound distributed over a broad band and noise distributed over a narrow band is divided into a plurality of frequency bins. For each frequency bin, it is determined whether the portion of the signal within that bin belongs to a narrowband distribution characteristic of the narrowband noise source located on the console. To reduce the narrowband noise, the frequency bins containing portions of the signal that belong to the narrowband distribution are filtered.

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic view of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a noise reduction method in a device of the type shown in FIG. 1.
FIGS. 3A-3B are graphs illustrating narrowband noise reduction according to an embodiment of the present invention, showing a microphone signal as a function of frequency.
FIGS. 4A-4B are graphs illustrating narrowband noise reduction according to another embodiment of the present invention, showing the microphone signals of different microphones as a function of frequency.

[Description of Specific Embodiment]
Although the following detailed description contains many specific details for the purpose of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations of these details are possible within the scope of the present invention. Accordingly, the embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.

As shown in FIG. 1, an electronic device 100 according to an embodiment of the present invention includes a console 102 having one or more microphones 104A, 104B. As used herein, the term console generally refers to a stand-alone unit containing electronic components that perform computation and/or signal processing functions. The console may receive input from one or more external input devices, such as a joystick 106, and may provide output to one or more external output devices, such as a monitor 108. The console 102 may include a CPU 110 and a memory 112. The console may optionally include a fan 114 for cooling the components of the console. The console 102 may be, for example, the console of a video game system such as the Sony PlayStation (registered trademark), a cable television set-top box, or a TiVo digital video recorder provided by TiVo Inc. of Alviso, California.

The processor unit 110 and the memory 112 may be connected to each other via a system bus 116. The microphones 104A and 104B may be connected to the processor and/or the memory through an input/output (I/O) element 118. As used herein, the term input/output (I/O) generally refers to any program, operation, or device that transfers data to or from the console 102 and to or from a peripheral device. Every data transfer may be regarded as an output from one device and an input into another.

The device 100 may include one or more additional peripheral units, which may be internal or external to the console 102. Peripheral devices include input-only devices such as keyboards and mice, output-only devices such as printers, and devices that act as both input and output devices, such as a writable CD-ROM. The term peripheral device covers external devices such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive, or scanner; internal devices such as a CD-ROM drive, CD-R drive, hard disk drive, DVD drive, or internal modem (for example, the disk drive 120); and other peripherals such as a flash memory reader/writer or hard drive.

The console includes at least one source of narrowband distributed noise, such as the disk drive 120. Narrowband noise from the disk drive 120 is filtered out of the digital signals generated from the microphone inputs x_A(t), x_B(t). This prevents a desired sound, such as speech from a remote source 101, from being drowned out by the sound of the disk drive 120. The narrowband noise may be characterized by a gamma distribution. The desired sound from the source 101 is preferably characterized by a broadband probability density function, such as a Gaussian probability density function.

The memory 112 may contain coded instructions executable by the processor 110 and/or data 115 that facilitate removal of the narrowband disk drive noise. In particular, the data 115 may include a distribution function generated from long-duration training data recorded from the sound of the disk drive. The distribution function may be stored in the form of a lookup table.

The coded instructions 113 may implement a method 200 for reducing narrowband distributed noise in a device of the type shown in FIG. 1. According to the method 200, a signal from one or more of the console microphones 104A, 104B is divided into frequency bins, as indicated at step 202. Dividing the signal into a plurality of frequency bins may include capturing a time-windowed portion of the signal (for example, the microphone signal x_A(t)), converting the time-windowed portion into a frequency-domain signal x(f) (for example, using a fast Fourier transform), and dividing the frequency-domain signal into frequency bins. For example, approximately 32 milliseconds of microphone data may be stored in a buffer for classification into frequency bins. At step 204, for each frequency bin, it is determined whether the portion of the signal within that bin belongs to a narrowband distribution characteristic of the narrowband disk drive noise. As indicated at step 206, the frequency bins containing portions of the signal that belong to the narrowband distribution are filtered out of the input signal.
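The bin-splitting part of method 200 can be sketched as follows. This is only an illustration under stated assumptions: the Hann window, the 64-sample frame length, and the function names `hann_window`, `dft`, and `frame_to_bins` are hypothetical choices not given in the description, and a naive DFT stands in for the fast Fourier transform (it produces the same values, only more slowly).

```python
import cmath
import math


def hann_window(n):
    """Return an n-point Hann window (an assumed choice of time window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(n^2) discrete Fourier transform; an FFT yields the same values."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_to_bins(samples, start, frame_len):
    """Cut one time window out of `samples` and return its frequency bins.

    Only bins 0..frame_len//2 are returned because the input is real-valued.
    """
    frame = samples[start:start + frame_len]
    if len(frame) != frame_len:
        raise ValueError("not enough samples for a full frame")
    windowed = [s * w for s, w in zip(frame, hann_window(frame_len))]
    return dft(windowed)[: frame_len // 2 + 1]


# Example: a pure tone that completes 8 cycles within one 64-sample frame.
FRAME_LEN = 64
tone = [math.sin(2.0 * math.pi * 8 * t / FRAME_LEN) for t in range(FRAME_LEN)]
bins = frame_to_bins(tone, 0, FRAME_LEN)
peak_bin = max(range(len(bins)), key=lambda k: abs(bins[k]))
```

In this example the energy of the tone concentrates in the bin whose index matches its cycle count, which is the property the later per-bin classification relies on.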

The filtering of the input signal may be understood with reference to FIGS. 3A-3B. Specifically, as shown in FIG. 3A, the frequency-domain signal x(f) may be regarded as a combination of a broadband signal 302 and a narrowband signal 304. As shown in FIG. 3B, when these signals are divided into frequency bins 306, each bin contains a value corresponding to a portion of the broadband signal 302 and a portion of the narrowband signal 304. The portion of the signal x(f) in a given frequency bin that is due to the narrowband signal 304 (shown as dashed bars in FIG. 3B) can be estimated from the training data. This portion may be subtracted from the value in the frequency bin 306 to filter the narrowband noise out of that bin.
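The per-bin subtraction can be illustrated with a short sketch. The function name `subtract_narrowband`, the use of power values, and the clamping of negative results to a floor are assumptions made for illustration; the description only states that the estimated narrowband portion may be subtracted from the value in each bin.

```python
def subtract_narrowband(bin_power, predicted_noise_power, floor=0.0):
    """Subtract the estimated narrowband power from each bin.

    Results are clamped at `floor` so that an over-estimate of the noise
    cannot produce a negative power value (an assumed safeguard).
    """
    if len(bin_power) != len(predicted_noise_power):
        raise ValueError("both lists need one value per frequency bin")
    return [max(p - n, floor) for p, n in zip(bin_power, predicted_noise_power)]


# Example: only the third and fourth bins carry estimated narrowband noise.
bin_power = [4.0, 5.0, 12.0, 4.5]
predicted = [0.0, 0.0, 8.0, 1.0]
cleaned = subtract_narrowband(bin_power, predicted)
```

Bins without estimated narrowband content pass through unchanged, so the broadband signal is left intact wherever the noise model predicts nothing.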

The narrowband signal 304 may be estimated as follows. First, a large volume of narrowband signal samples is collected for training a distribution model. Distribution models are well known to those skilled in pattern recognition fields such as speech modeling. The distribution model for the narrowband signal 304 is similar to the models used in speech modeling, with a few exceptions. Specifically, unlike speech, which is regarded as broadband with a Gaussian distribution, the narrowband noise in the narrowband signal 304 has a "gamma" distribution density function. This distribution model is known as a "gamma mixture model". By contrast, speech applications such as speaker or language recognition usually use a "Gaussian mixture model". The two models are very similar; only the underlying distribution function differs. The model training procedure follows the "Estimate-Maximize" (EM) algorithm, more commonly called Expectation-Maximization, which is widely used in speech modeling. The EM algorithm is an iterative likelihood-maximization method that estimates a set of model parameters from a training data set. The feature vector is generated directly from the logarithm of the power spectrum. By contrast, speech models usually apply further compression, such as a DCT or cepstrum coefficients. No such compression is applied here because the signal of interest is distributed over a narrow band, and band averaging, which would attenuate it against the broadband background, is undesirable. At runtime, the model is used to estimate the narrowband noise power spectral density (PSD).

The algorithm for such a model proceeds as follows.

First, the signal x(t) is transformed from the time domain to the frequency domain:
X(k) = fft(x(t))
where k is the frequency index.

Next, the power spectrum is obtained from the frequency-domain signal X(k):
S_yy(k) = X(k) .* conj(X(k))
where "conj" denotes the complex conjugate.

Next, a feature vector V(k) is obtained from the logarithm of the power spectrum:
V(k) = log(S_yy(k))
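The power spectrum and log-spectral feature steps can be sketched as below. The function names `power_spectrum` and `log_features` are hypothetical, and the small constant `eps`, added to avoid taking the logarithm of zero in an empty bin, is an assumption not present in the description.

```python
import math


def power_spectrum(X):
    """S_yy(k) = X(k) * conj(X(k)), returned as real numbers."""
    return [(x * x.conjugate()).real for x in X]


def log_features(S_yy, eps=1e-12):
    """V(k) = log(S_yy(k)); `eps` guards against log(0) (an assumption)."""
    return [math.log(s + eps) for s in S_yy]


# Example with three frequency-domain values.
X = [complex(3, 4), complex(0, 2), complex(1, 0)]
S_yy = power_spectrum(X)
V = log_features(S_yy)
```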

The term "feature vector" is widely used in pattern recognition. Basically, any pattern matching involves 1) a pre-trained model that defines an a priori distribution in a feature space, and 2) feature vectors observed at runtime. The task is to match the feature vectors against the model. Given a pre-trained gamma model (Model), the probability P_n(k) that narrowband noise is present can be obtained from the observed feature V(k):
P_n(k) = Gamma(Model, V(k))

The narrowband noise PSD is updated adaptively:
S_nn(k) = [α*S_nn(k) + (1 - α)*S_yy(k)]*P_n(k) + S_nn(k)*(1 - P_n(k))
If P_n(k) = 0, that is, no narrowband noise is present, S_nn(k) does not change. If P_n(k) = 1, that is, frequency k consists entirely of narrowband noise, then
S_nn(k) = α*S_nn(k) + (1 - α)*S_yy(k)
holds. This is essentially a statistical periodogram average, where α is a smoothing factor.
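A minimal sketch of the adaptive update follows. It assumes the smoothed term is weighted by P_n(k) as a whole, so that both limiting cases described in the text hold: S_nn(k) is unchanged when P_n(k) = 0, and the update reduces to periodogram averaging when P_n(k) = 1. The function name `update_noise_psd` and the default smoothing factor of 0.9 are illustrative assumptions.

```python
def update_noise_psd(S_nn, S_yy, P_n, alpha=0.9):
    """Blend the smoothed noise PSD estimate with the previous estimate.

    For each bin k:
      S_nn(k) <- [alpha*S_nn(k) + (1 - alpha)*S_yy(k)] * P_n(k)
                 + S_nn(k) * (1 - P_n(k))
    """
    return [
        (alpha * snn + (1.0 - alpha) * syy) * p + snn * (1.0 - p)
        for snn, syy, p in zip(S_nn, S_yy, P_n)
    ]


# Example: the same starting estimate under three noise probabilities.
S_nn = [2.0, 2.0, 2.0]
S_yy = [10.0, 10.0, 10.0]
P_n = [0.0, 1.0, 0.5]
updated = update_noise_psd(S_nn, S_yy, P_n)
```

With these inputs the first bin keeps its previous estimate, the second moves to the fully smoothed value, and the third lands halfway between the two.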

Given the estimated noise PSD, estimating the clean speech signal is straightforward. Examples of algorithms for performing such estimation are well known and are based on the MMSE approaches described in Non-Patent Documents 1 and 2. The disclosures of both documents are incorporated herein by reference.
Non-Patent Document 1: Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-32, pp. 1109-1121, December 1984
Non-Patent Document 2: Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-33, pp. 443-445, April 1985
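The cited MMSE amplitude estimators are too involved to show briefly. As a rough stand-in, the sketch below uses a simpler Wiener-style spectral gain driven by the estimated noise PSD; it is not the Ephraim-Malah estimator. The function names `wiener_gain` and `apply_gain` and the gain floor of 0.05 are assumptions for illustration.

```python
def wiener_gain(S_yy, S_nn, gain_floor=0.05):
    """Per-bin gain max(1 - S_nn/S_yy, gain_floor).

    This is a simplified Wiener-style gain, swapped in for the MMSE
    estimators cited in the text; the floor limits over-suppression.
    """
    gains = []
    for syy, snn in zip(S_yy, S_nn):
        if syy <= 0.0:
            gains.append(gain_floor)
        else:
            gains.append(max(1.0 - snn / syy, gain_floor))
    return gains


def apply_gain(X, gains):
    """Scale each frequency-domain value by its gain, keeping the phase."""
    return [g * x for g, x in zip(gains, X)]


# Example: no noise, mostly noise, and noise estimate above the observation.
S_yy = [4.0, 10.0, 1.0]
S_nn = [0.0, 8.0, 2.0]
gains = wiener_gain(S_yy, S_nn)
enhanced = apply_gain([complex(2, 0), complex(1, 3), complex(0, 1)], gains)
```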

In another embodiment, the filtering may take advantage of the presence of two or more microphones on the console 102. When there are two microphones 104A, 104B on the console 102, one of them (104B) may be closer to the disk drive than the other (104A). As a result, noise from the disk drive 120 arrives at different times in the microphone input signals x_A(t) and x_B(t). As shown in FIGS. 4A-4B, this difference in arrival time results in a difference in frequency distribution when the input signals x_A(t) and x_B(t) are converted to the frequency domain as x_A(f) and x_B(f), respectively. By contrast, the broadband frequency distribution of sound from the remote source will not differ much between x_A(t) and x_B(t). The frequency distribution of the narrowband signal 304A from the microphone 104A, however, will be shifted in frequency relative to the frequency distribution 304B from the microphone 104B. The narrowband noise contribution to the frequency bins 306 can be determined by generating a feature vector V(k) from the frequency-domain signals x_A(f), x_B(f) from the two microphones 104A, 104B.

For example, a first feature vector V(k,A) is generated from the power spectrum S_yy(k,A) for the microphone 104A:
V(k,A) = log(S_yy(k,A))

A second feature vector V(k,B) is generated from the power spectrum S_yy(k,B) for the microphone 104B:
V(k,B) = log(S_yy(k,B))

The feature vector V(k) is obtained by simply concatenating V(k,A) and V(k,B):
V(k) = [V(k,A), V(k,B)]
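The concatenation for two microphones can be sketched as follows. The sketch assumes that V(k) pairs the two log-power values for each bin k, which doubles the number of feature values compared with a single microphone; the function names `log_power` and `combined_features` and the constant `eps` are hypothetical.

```python
import math


def log_power(X, eps=1e-12):
    """Log power spectrum of one microphone's frequency-domain signal."""
    return [math.log((x * x.conjugate()).real + eps) for x in X]


def combined_features(X_A, X_B):
    """Return V(k) = [V(k,A), V(k,B)] as one pair per frequency bin."""
    if len(X_A) != len(X_B):
        raise ValueError("both microphones need the same number of bins")
    return list(zip(log_power(X_A), log_power(X_B)))


# Example with two bins per microphone.
X_A = [complex(1, 0), complex(0, 2)]
X_B = [complex(3, 4), complex(1, 0)]
V = combined_features(X_A, X_B)
```

Because a narrowband source close to one microphone shows up differently in the two spectra, the paired values carry spatial information without any explicit time-delay estimate.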

The rest of the model training and runtime detection is the same, except that the model size and the dimension of the feature vector are doubled. Although the approach described above uses neither array beamforming nor any technique that depends on time difference of arrival, spatial information is in fact implicitly contained in the trained model and the runtime feature vectors, and it greatly improves detection accuracy.

Embodiments of the present invention may be used as presented herein or in combination with other user input mechanisms, including mechanisms that track or measure the angular direction or volume of sound, mechanisms that actively or passively track the position of an object, mechanisms that use machine vision, and combinations thereof. The tracked object may include ancillary controls or buttons that manipulate feedback to the system. Such feedback may include, but is not limited to, light emission from light sources, sound distortion means, or other suitable transmitters and modulators, as well as controls, buttons, pressure pads, and the like, which may influence the transmission or modulation of the same, encode state, and/or transmit commands to or from a device tracked by the system. Such devices may be part of, interact with, or influence a system used in connection with embodiments of the present invention.

While the above is a complete description of the preferred embodiment of the present invention, various alternatives, modifications, and equivalents may be used. Therefore, the scope of the present invention should be determined not with reference to the above description but with reference to the following claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein. In the following claims, the quantity of each element is one or more, unless expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations unless such a limitation is explicitly recited in a given claim using the phrase "means for".

Claims

A processor-readable medium storing a set of processor-readable instructions for implementing a noise reduction method in an electronic device having a console, wherein the console has one or more microphones, a noise source of narrowband distributed noise is located on the console, a processor is connected to the one or more microphones, and a memory is connected to the processor,
the processor-readable instructions comprising:
instructions that, when executed, cause the device to perform a step of obtaining, from the one or more microphones, a signal including a desired sound distributed over a wide band and narrowband distributed noise from the noise source located on the console;
instructions that, when executed, cause the device to perform a step of dividing the signal into a plurality of frequency bins;
instructions that, when executed, cause the device to perform, for each frequency bin, a step of determining whether a portion of the signal within that frequency bin belongs to a narrowband distribution characteristic of the noise source located on the console, by generating a feature vector from the logarithm of the power spectrum of the signal and matching the feature vector against a pre-trained model; and
instructions that, when executed, cause the device to perform a step of filtering, from data generated from the signal from the one or more microphones, the frequency bins that contain a portion of the signal belonging to the narrowband distribution characteristic.

The processor-readable medium of claim 1, wherein determining whether a portion of the signal in the frequency bin belongs to the narrowband distribution characteristic includes comparing a value corresponding to the portion of the signal in the frequency bin with a value that is derived from a known signal from the noise source located on the console and stored as a value for that frequency bin.

The processor-readable medium of claim 1, wherein:
the one or more microphones include a first microphone and a second microphone;
obtaining the signal from the one or more microphones includes obtaining a first signal from the first microphone and obtaining a second signal from the second microphone; and
determining whether the portion of the signal in the frequency bin belongs to the narrowband distribution characteristic includes determining a first feature vector from the first signal, determining a second feature vector from the second signal, forming a combined feature vector from the first signal and the second signal, and matching the combined feature vector against the model.
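
One way to picture the two-microphone case is sketched below, under stated assumptions. The claims specify a log power spectrum feature and a pre-trained model but not how the match is made. Here the combined feature for a bin is simply the pair of log powers from the two microphones, and the "model" is a hypothetical mean vector with a Euclidean distance threshold; `log_power`, `combined_features`, `matches_model`, and `max_distance` are illustrative names, not terms from the patent.

```python
import math


def log_power(power, floor=1e-12):
    """Log power in dB; the floor avoids taking the logarithm of zero."""
    return 10.0 * math.log10(max(power, floor))


def combined_features(power_mic1, power_mic2):
    """Pair up the per-bin log powers of the two microphones."""
    return [(log_power(a), log_power(b)) for a, b in zip(power_mic1, power_mic2)]


def matches_model(feature, model_mean, max_distance=3.0):
    """Stand-in for model matching: is the feature close to the model mean?"""
    return math.dist(feature, model_mean) <= max_distance


# Made-up per-bin powers for two microphones (two bins each).
features = combined_features([1.0, 100.0], [1.0, 10.0])
noise_mean = (20.0, 10.0)  # hypothetical "trained" noise signature
flags = [matches_model(f, noise_mean) for f in features]
```

A trained statistical model would replace the distance test in practice; the sketch only shows where the combined vector enters the decision.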

The processor-readable medium of claim 1, wherein dividing the signal into a plurality of frequency bins comprises:
capturing a portion of the signal delimited by a time window;
converting the windowed portion of the signal into a frequency-domain signal; and
dividing the frequency-domain signal into a plurality of frequency bins.
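
The three steps of this claim (window, transform, split into bins) can be sketched in a few lines. The Hann window, the naive discrete Fourier transform, and the name `frame_to_bins` are assumptions made for a self-contained example; the claim does not name a window shape or a transform.

```python
import cmath
import math


def frame_to_bins(samples, start, frame_len):
    """Cut one time window out of the signal and return the power in each frequency bin."""
    frame = samples[start:start + frame_len]
    n = len(frame)
    # Taper the frame with a Hann window (an assumed choice) to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    # Naive DFT over the non-negative frequencies; each index k is one frequency bin.
    power = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        power.append(abs(acc) ** 2)
    return power


# A pure tone with exactly 8 cycles in a 64-sample frame lands in bin 8.
tone = [math.sin(2.0 * math.pi * 8 * i / 64) for i in range(64)]
bins = frame_to_bins(tone, 0, 64)
peak_bin = max(range(len(bins)), key=bins.__getitem__)
```

A fast Fourier transform would normally replace the quadratic-time loop; the naive form is used here only to keep the example dependency-free.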

The processor-readable medium according to claim 1, wherein the desired sound distributed in a wide band is a voice.

The processor-readable medium of claim 1, wherein the noise source of the narrowband distributed noise is a disk drive.

The processor-readable medium of claim 1, wherein the desired sound distributed over a wide band is characterized by a Gaussian probability density function.

The processor readable medium of claim 1, wherein the narrowband noise is characterized by a gamma distribution probability density function.
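
The two density families named in the claims above can be written out directly. The sketch below is illustrative only: what quantity the densities are evaluated on, and all parameter values, are assumptions made for the example, as is the helper `more_likely_noise`.

```python
import math


def gaussian_pdf(x, mean, std):
    """Gaussian density, the family the claims associate with the wideband desired sound."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, scale):
    """Gamma density, the family the claims associate with the narrowband noise."""
    if x <= 0.0:
        return 0.0
    return (x ** (shape - 1.0) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def more_likely_noise(x, speech_params, noise_params):
    """True when the gamma (noise) density exceeds the Gaussian (speech) density at x."""
    return gamma_pdf(x, *noise_params) > gaussian_pdf(x, *speech_params)


# Hypothetical parameters, chosen only to exercise the functions.
speech = (0.0, 1.0)  # mean, standard deviation
noise = (9.0, 0.5)   # shape, scale (mean = shape * scale = 4.5)
near_noise_peak = more_likely_noise(4.0, speech, noise)
near_speech_peak = more_likely_noise(0.1, speech, noise)
```

Comparing the two densities at an observed value is one simple way to turn the two characterizations into a per-bin decision.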

An electronic device, comprising:
a console;
one or more microphones located on the console;
a narrowband noise source located on the console;
a processor connected to the microphones; and
a memory connected to the processor and storing a set of processor-readable instructions for implementing a noise reduction method,
the processor-readable instructions comprising:
instructions that, when executed, cause the device to obtain from the one or more microphones a signal containing a desired sound distributed over a wide band and narrowband distributed noise from the noise source located on the console;
instructions that, when executed, cause the device to divide the signal into a plurality of frequency bins;
instructions that, when executed, cause the device to determine, for each frequency bin, whether the portion of the signal in that frequency bin belongs to a narrowband distribution characteristic of the noise source located on the console, by generating a feature vector from the logarithm of the power spectrum of the signal and matching the feature vector against a pre-trained model; and
instructions that, when executed, cause the device to filter, from signal data generated from the signal from the one or more microphones, the frequency bins containing portions of the signal that belong to the narrowband distribution characteristic.

The apparatus of claim 9, wherein the instructions that, when executed, cause the determination, for each frequency bin, of whether the portion of the signal in that frequency bin belongs to the narrowband distribution characteristic of the noise source located on the console
include one or more instructions that, when executed, compare a value corresponding to the portion of the signal in the frequency bin with a stored value for that frequency bin, the stored value being derived from a known signal from the noise source located on the console.

The apparatus of claim 10, further comprising a lookup table stored in the memory,
wherein the lookup table includes the stored value.

The apparatus of claim 9, wherein the one or more microphones include a first microphone and a second microphone.

The instructions that, when executed, cause the apparatus to obtain the signal from the one or more microphones include
one or more instructions that, when executed, cause the apparatus to obtain a first signal from the first microphone and a second signal from the second microphone; and
determining whether the portion of the signal in the frequency bin belongs to the narrowband distribution characteristic includes determining a first feature vector from the first signal, determining a second feature vector from the second signal, forming a combined feature vector from the first signal and the second signal, and matching the combined feature vector against the model.

The apparatus of claim 9, wherein the instructions that, when executed, cause the signal to be divided into a plurality of frequency bins comprise instructions that cause the apparatus to perform:
capturing a portion of the signal delimited by a time window;
converting the windowed portion of the signal into a frequency-domain signal; and
dividing the frequency-domain signal into a plurality of frequency bins.

The apparatus according to claim 9, wherein the desired sound distributed in a wide band is a voice.

The apparatus according to claim 9, wherein the noise source of the narrow band distributed noise is a disk drive.

The apparatus of claim 9, wherein the desired sound distributed over a wide band is characterized by a Gaussian probability density function.

The apparatus of claim 9, wherein the narrowband noise is characterized by a gamma distribution probability density function.

The apparatus of claim 9, wherein the console is a video game console.

The apparatus according to claim 9, wherein the console is a digital video recorder or a cable TV set top box.

A method of reducing noise in a device having a console, wherein one or more microphones and a source of narrowband distributed noise are located on the console, the method comprising:
obtaining from the one or more microphones a signal containing a desired sound distributed over a wide band and narrowband distributed noise from the noise source located on the console;
dividing the signal into a plurality of frequency bins;
determining, for each frequency bin, whether the portion of the signal in that frequency bin belongs to a narrowband distribution characteristic of the noise source located on the console, by generating a feature vector from the logarithm of the power spectrum of the signal and matching the feature vector against a pre-trained model; and
filtering, from signal data generated from the signal from the one or more microphones, the frequency bins containing portions of the signal that belong to the narrowband distribution characteristic.
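
The final filtering step of the method can be sketched as a per-bin gain. This is a minimal sketch under assumptions: the claims do not say whether flagged bins are zeroed or merely attenuated, so the `gain` parameter and the name `filter_noise_bins` are illustrative.

```python
def filter_noise_bins(spectrum, noise_flags, gain=0.0):
    """Scale every flagged bin by `gain` and pass the other bins through unchanged.

    spectrum    -- per-bin frequency-domain values (real or complex)
    noise_flags -- per-bin booleans, True where the bin was judged to be narrowband noise
    gain        -- hypothetical attenuation factor; 0.0 removes the bin entirely
    """
    if len(spectrum) != len(noise_flags):
        raise ValueError("spectrum and noise_flags must have the same length")
    return [value * gain if flagged else value
            for value, flagged in zip(spectrum, noise_flags)]


# Made-up three-bin spectrum in which only the middle bin was flagged as noise.
spectrum = [1.0 + 0.0j, 4.0 - 2.0j, 0.5 + 0.5j]
flags = [False, True, False]
cleaned = filter_noise_bins(spectrum, flags)
```

Because the noise is narrowband, only a few bins are expected to be flagged, so most of the wideband desired sound passes through untouched.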