JP2003526109A

JP2003526109A - Channel gain correction system and noise reduction method in voice communication

Info

Publication number: JP2003526109A
Application number: JP2000509079A
Authority: JP
Inventors: Anthony P. Mauro (マウロ、アンソニー・ピー)
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 1997-09-02
Filing date: 1997-09-30
Publication date: 2003-09-02
Anticipated expiration: 2017-09-30
Also published as: EP1010169B1; DE69736198D1; DE69736198T2; EP1010169A1; JP4194749B2

Abstract

(57) Abstract: A system and method for noise suppression in a speech processing system (108) are presented. A gain estimator (220) determines a gain for each frame of the input signal, which sets the level of noise suppression. If no speech is present in the frame, the gain is set to a predetermined minimum value. If speech is present in the frame, a gain adjuster (224) determines a gain factor for each of a predetermined set of frequency channels. The gain factor for each channel is a function of the SNR (signal-to-noise ratio) of the speech in that channel. An SNR estimator (210b) generates the channel SNR from the channel energy estimate provided by an energy estimator (206b) and the channel noise energy estimate provided by a noise energy estimator (214b). The noise energy estimator (214b) updates its estimate when a speech detector (208) determines that no speech is present in the frame.

Description

Detailed Description of the Invention

【０００１】 (Technical Field) The present invention relates to speech processing. More particularly, the present invention relates to a noise suppression system and method for use in speech processing.

【０００２】 (Background Art) Transmission of speech by digital techniques has become widespread, particularly in applications such as cellular telephones and personal communication systems (PCS). This, in turn, has created interest in improving speech processing techniques. One area in which improvements are being made is the development of noise suppression techniques.

【０００３】 Noise suppression in a speech communication system generally serves to improve the overall quality of the desired audio signal by filtering environmental background noise from the desired speech signal. This speech enhancement process is particularly necessary in environments with abnormally high levels of ambient background noise, such as airplanes, moving vehicles, and noisy factories.

【０００４】 One noise suppression technique is spectral subtraction, also called spectral gain modification. In this approach, the input audio signal is divided into a number of frequency channels, and individual channels are attenuated according to their noise energy content. A background noise estimate for each frequency channel is used to generate the signal-to-noise ratio (SNR) of the speech in that channel, and the SNR is used to compute a gain factor for the channel. The gain factor then determines how much that channel is attenuated. The attenuated channels are recombined to produce a noise-suppressed output signal.

【０００５】 In specialized applications with relatively high background noise, most noise suppression techniques show significant performance limitations. One example of such an application is the vehicle speakerphone option for a cellular mobile communication system. The speakerphone option lets the driver of an automobile operate the phone hands-free. The hands-free microphone is typically placed at a considerable distance from the user, for example mounted above the visor. Because of road and wind noise, this remote microphone delivers a poor SNR to the land-end party. Speech received at the land end is usually intelligible, but continuous exposure to such background noise levels often increases listener fatigue.

【０００６】 For a noise suppression system to work properly, it is important to determine the SNR of the speech accurately. However, the limitations of currently available noise detectors make it difficult to determine the SNR of a speech signal accurately. Spectral subtraction techniques update the background noise estimate during periods when speech is absent. When speech is absent, the measured spectral energy is attributable to noise, so the noise estimate is updated from the measured spectral energy. It is therefore important to distinguish periods of speech from periods of no speech in order to obtain an accurate noise energy for computing the SNR.

【０００７】 One exemplary speech detection technique uses a voice metric calculator to make the noise update decision. A voice metric is a measure of the overall speech-like character of the channel energy. First, raw SNR estimates are used to index a voice metric table, yielding a voice metric value for each channel. The individual channel voice metric values are summed into an energy parameter, which is compared with a background noise update threshold. If the voice metric sum is at or above the threshold, the signal is said to contain speech. If the sum is below the threshold, the input frame is deemed to be noise and a background noise update is performed. However, with high background noise, sudden background noise, or a gradually increasing noise source, the measured SNR becomes large, the voice metric values become high, and the noise estimate update is consequently blocked.
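The voice metric decision described above can be sketched as follows. This is a minimal illustration, not the calculator of any particular system: the table contents, the 3 dB quantization step, the threshold, and the function names are all hypothetical values chosen for the example.

```python
# Hypothetical voice metric table: index = quantized raw channel SNR, value = metric.
VOICE_METRIC_TABLE = [0, 1, 2, 4, 8, 16, 32]
UPDATE_THRESHOLD = 20  # hypothetical background noise update threshold


def channel_metric(raw_snr_db, step_db=3.0):
    """Look up the voice metric for one channel from its raw SNR estimate."""
    index = int(max(raw_snr_db, 0.0) // step_db)
    return VOICE_METRIC_TABLE[min(index, len(VOICE_METRIC_TABLE) - 1)]


def frame_contains_speech(raw_snrs_db):
    """Sum the per-channel metrics and compare the sum with the update threshold."""
    total = sum(channel_metric(snr) for snr in raw_snrs_db)
    return total >= UPDATE_THRESHOLD


def should_update_noise(raw_snrs_db):
    """The background noise estimate is updated only for frames deemed noise."""
    return not frame_contains_speech(raw_snrs_db)
```

With these assumed values, a frame whose raw SNR jumps to 15 dB in two channels because the noise itself suddenly became louder is classified as speech, so the noise estimate is not updated; this is the weakness the paragraph points out.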

【０００８】 A refinement of the voice metric calculator technique measures the deviation of the channel energy. This method assumes that noise exhibits constant spectral energy over time, whereas speech exhibits variable spectral energy over time. Accordingly, the channel energy is integrated over time: speech is detected when the channel energy deviates substantially, and noise is detected when it deviates little. A speech detector that measures channel energy deviation can detect sudden changes in the noise level. However, the channel energy deviation method gives inaccurate results when the input speech signal has constant energy. Moreover, with a gradually increasing noise source, the changing input energy produces a large energy deviation, which blocks the noise estimate update even when an update is needed.
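A rough sketch of an energy deviation detector of the kind just described, assuming the deviation is taken as the mean absolute difference from the average energy over a short history of frames. The deviation measure, the threshold, and the function names are assumptions made for illustration.

```python
def energy_deviation(energies):
    """Mean absolute deviation of recent channel energies from their average."""
    mean = sum(energies) / len(energies)
    return sum(abs(e - mean) for e in energies) / len(energies)


def is_speech(energies, threshold):
    """Large deviation is read as speech, small deviation as noise."""
    return energy_deviation(energies) > threshold


steady_noise = [10.0, 10.0, 10.0, 10.0]
speech_like = [1.0, 20.0, 3.0, 18.0]
rising_noise = [10.0, 14.0, 18.0, 22.0]  # a gradually increasing noise source
```

With a threshold of 2.0, the steady noise is classified as noise and the speech-like sequence as speech, but the rising noise is also classified as speech. That misclassification is what keeps the noise estimate from being updated when it should be.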

【０００９】 In addition to an accurate speech detector, a noise suppression system must adjust the channel gains appropriately. The channel gains should be adjusted so that noise is suppressed without sacrificing speech quality. One method of adjusting the channel gain computes the gain as a function of the total noise estimate and the SNR of the speech signal. In general, as the total noise estimate increases, the gain factor for a given SNR decreases. A lower gain factor means greater attenuation. This technique imposes a minimum gain value to prevent over-attenuation of the channel gain when the total noise estimate is very high. Using a hard-clamped minimum gain value introduces a trade-off between noise suppression and speech quality. If the clamp is relatively low, noise suppression improves but speech quality degrades. If the clamp is relatively high, noise suppression degrades but speech quality improves.

【００１０】 To provide an improved noise suppression system, the limitations of current techniques for speech detection and channel gain computation need to be addressed. These problems and deficiencies are solved by the present invention in the manner described below.

【００１１】 (Disclosure of the Invention) The present invention is a system and method for noise suppression for use in a speech processing system. One object of the invention is to provide a speech detector that determines whether speech is present in an input signal. A reliable speech detector is needed to determine the signal-to-noise ratio (SNR) of the speech. When speech is determined to be absent, the input signal is deemed to consist entirely of noise, and the noise energy is measured. The noise energy is then used to determine the SNR. Another object of the invention is to provide an improved gain measurement element for suppressing noise.

【００１２】 According to the invention, the noise suppression system comprises a speech detector that determines whether speech is present in a frame of the input signal. The decision is based on a measure of the SNR of the speech in the input signal. An SNR estimator estimates the SNR from a signal energy estimate generated by an energy estimator and a noise energy estimate generated by a noise energy estimator. The speech presence decision is also based on the encoding rate of the input signal. In a variable rate communication system, each input frame is assigned an encoding rate selected from a predetermined set of rates based on the content of the frame. In general, the rate depends on the level of speech activity, so frames containing speech are assigned a high rate while frames not containing speech are assigned a low rate. In addition, the decision may be based on one or more mode measures that describe characteristics of the input signal. If speech is determined to be absent from the input frame, the noise energy estimator updates the noise energy estimate.

【００１３】 A channel gain estimator determines the gain for a frame of the input signal. If no speech is present in the frame, the gain is set to a predetermined minimum value. If speech is present, the gain is determined from the frequency content of the frame. In a preferred embodiment, a gain factor is determined for each of a predefined set of frequency channels. For each channel, the gain factor is determined according to the SNR of the speech on that channel, using a function suited to the characteristics of the frequency band in which the channel lies. Typically, for a predefined frequency band, the gain is set to increase linearly with increasing SNR. In addition, the minimum gain for each frequency band may be adjustable based on environmental characteristics. For example, a user-selectable minimum gain may be implemented. The channel SNR is based on the channel energy estimate generated by the energy estimator and the channel noise energy estimate generated by the noise energy estimator. The gain factors are used to adjust the gain of the signal on the various channels, and the gain-adjusted channels are combined to produce a noise-suppressed output signal.
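The gain rule in this paragraph (minimum gain when no speech is present, otherwise a gain that rises linearly with channel SNR) might be sketched as below. The dB units, the slope, the -13 dB floor, and the 0 dB ceiling are illustrative assumptions; the text fixes none of these numbers.

```python
def channel_gain_db(snr_db, min_gain_db=-13.0, slope=0.5):
    """Gain rises linearly with SNR, clamped between min_gain_db and 0 dB."""
    gain = min_gain_db + slope * snr_db
    return max(min_gain_db, min(0.0, gain))


def frame_gains_db(speech_present, channel_snrs_db, min_gain_db=-13.0):
    """Per-channel gains for one frame; every channel gets the minimum if no speech."""
    if not speech_present:
        return [min_gain_db] * len(channel_snrs_db)
    return [channel_gain_db(snr, min_gain_db) for snr in channel_snrs_db]
```

Passing `min_gain_db` as a parameter mirrors the idea of a user-selectable minimum gain: a lower floor suppresses more noise at some cost in speech quality, and a higher floor does the opposite.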

【００１４】 (Best Mode for Carrying Out the Invention) The features, objects, and advantages of the present invention will become apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify like elements throughout.

【００１５】 In speech communication systems, a noise suppressor is commonly used to suppress undesirable environmental background noise. Most noise suppressors operate by estimating the background noise characteristics of the input data signal in one or more frequency bands and subtracting an average of the estimate from the input signal. The estimate of the average background noise is updated during periods when speech is absent. To operate correctly, a noise suppressor must determine the background noise level accurately. In addition, the level of noise suppression must be adjusted correctly based on the speech and noise characteristics of the input signal. These requirements are addressed by the noise suppression system of the present invention.

【００１６】 An exemplary speech processing system 100 in which the present invention is embodied is shown in FIG. 1. System 100 comprises microphone 102, A/D converter 104, speech processor 106, transmitter 110, and antenna 112. Microphone 102 may be located in a cellular telephone together with the other elements shown in FIG. 1. Alternatively, microphone 102 may be the hands-free microphone of the vehicle speakerphone option of a cellular communication system. The vehicle speakerphone assembly is sometimes called a car kit. When microphone 102 is part of a car kit, the noise suppression function is particularly important. Because the hands-free microphone is generally positioned at some distance from the user, the received acoustic signal tends to have a poor SNR owing to road and wind noise.

【００１７】 Still referring to FIG. 1, an input audio signal comprising speech and/or background noise is received by microphone 102. Microphone 102 converts the input audio signal into an electro-acoustic signal denoted s(t). The electro-acoustic signal may be converted from an analog signal to pulse code modulated (PCM) samples by A/D converter 104. In an exemplary embodiment, the PCM samples are output by A/D converter 104 at 64 kbps and are represented as signal s(n), as shown in FIG. 1. Digital signal s(n) is received by speech processor 106, which comprises noise suppressor 108 among other elements. Noise suppressor 108 suppresses noise in signal s(n) in accordance with the present invention. In car kit applications, noise suppressor 108 measures the level of background environmental noise and adjusts the gain of the signal to mitigate the effects of that noise. In addition to noise suppressor 108, speech processor 106 generally comprises a voice coder, or vocoder (not shown), which compresses speech by extracting parameters that relate to a model of human speech generation. Speech processor 106 also comprises an echo canceller (not shown), which eliminates acoustic echo resulting from feedback between a speaker (not shown) and microphone 102.

【００１８】 Following processing by speech processor 106, the signal is output to transmitter 110, which performs modulation in accordance with a predetermined format such as code division multiple access (CDMA), time division multiple access (TDMA), or frequency division multiple access (FDMA). In the exemplary embodiment, transmitter 110 modulates the signal in accordance with a CDMA format as described in U.S. Pat. No. 4,901,307, entitled "SPREAD SPECTRUM MULTIPLE ACCESS COMMUNICATION SYSTEM USING SATELLITE OR TERRESTRIAL REPEATERS," which is assigned to the assignee of the present invention and incorporated herein by reference. Transmitter 110 then up-converts and amplifies the modulated signal, which is transmitted through antenna 112.

【００１９】 It should be recognized that noise suppressor 108 may be embodied in speech processing systems other than system 100 of FIG. 1. For example, noise suppressor 108 may be used in an electronic mail application having a voice option. In such an application, transmitter 110 and antenna 112 of FIG. 1 are not needed. Instead, the noise-suppressed signal is formatted by speech processor 106 and transmitted over the electronic mail network.

【００２０】 An exemplary embodiment of noise suppressor 108 is shown in FIG. 2. The input audio signal is received by preprocessor 202, as shown in FIG. 2. Preprocessor 202 prepares the input signal for noise suppression by performing preemphasis and frame generation. Preemphasis redistributes the power spectral density of the speech signal by emphasizing its high-frequency speech components. By effectively performing a high-pass filtering function, preemphasis improves the SNR of these components in the frequency domain. Preprocessor 202 also generates frames from the samples of the input signal. In a preferred embodiment, 10 ms frames of 80 samples each are generated. The frames may overlap in samples to improve processing accuracy. The frames may be generated by windowing and zero-padding the samples of the input signal. The preprocessed signal is output to transform element 204. In a preferred embodiment, transform element 204 generates a 128-point fast Fourier transform (FFT) for each frame of the input signal. It should be understood, however, that alternative schemes may be used to analyze the frequency components of the input signal.
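The preprocessing chain can be illustrated with a small pure-Python sketch. The frame length (80 samples per 10 ms, which implies 8 kHz sampling) and the 128-point transform come from the text; the preemphasis coefficient, the Hann window, the non-overlapping framing, and all function names are assumptions, and a naive DFT stands in for a real FFT routine.

```python
import cmath
import math

FRAME_LEN = 80   # 10 ms frames of 80 samples, i.e. 8000 samples per second
FFT_LEN = 128    # transform size used in the preferred embodiment


def preemphasize(samples, coeff=0.8):
    """First-order high-pass filter y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    return [x - coeff * (samples[i - 1] if i else 0.0) for i, x in enumerate(samples)]


def make_frames(samples):
    """Split into non-overlapping 80-sample frames, dropping any short tail."""
    last_start = len(samples) - FRAME_LEN
    return [samples[i:i + FRAME_LEN] for i in range(0, last_start + 1, FRAME_LEN)]


def window_and_pad(frame):
    """Apply a Hann window (an assumed choice), then zero-pad to 128 points."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
              for n in range(FRAME_LEN)]
    return [x * w for x, w in zip(frame, window)] + [0.0] * (FFT_LEN - FRAME_LEN)


def dft(x):
    """Naive O(N^2) DFT; it produces the same values an FFT would, only slower."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]
```

Each element of `dft(window_and_pad(frame))` is one of the 128 frequency-domain points that the later channel energy estimation operates on.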

【００２１】 The transformed components are provided to channel energy estimator 206a, which generates an energy estimate for each of N channels of the transformed signal. For each channel, one technique for updating the channel energy smooths the current channel energy against the channel energy of the previous frames, giving the updated estimate: $E_u(t) = \alpha E_{ch} + (1 - \alpha) E_u(t-1)$ (1) where the updated estimate $E_u(t)$ is defined as a function of the current channel energy $E_{ch}$ and the previous channel energy estimate $E_u(t-1)$, with $\alpha$ weighting the current frame against the past.
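Equation (1) is a first-order recursive smoother, and a short sketch makes its behavior concrete. The value of the smoothing constant and the zero initial estimate are assumptions; the text does not give them.

```python
def update_channel_energy(e_ch, e_prev, alpha=0.55):
    """One step of Equation (1): blend current energy with the previous estimate."""
    return alpha * e_ch + (1.0 - alpha) * e_prev


def smooth_energies(frame_energies, alpha=0.55, initial=0.0):
    """Run Equation (1) over successive frames and return every updated estimate."""
    estimate = initial
    history = []
    for e_ch in frame_energies:
        estimate = update_channel_energy(e_ch, estimate, alpha)
        history.append(estimate)
    return history
```

A larger alpha makes the estimate track the current frame more closely, while a smaller alpha gives a slower, smoother estimate.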

【００２２】 In a preferred embodiment, energy estimates are determined for a low-frequency channel and a high-frequency channel, so that N = 2. The low-frequency channel corresponds to the frequency range 250 to 2250 Hz, and the high-frequency channel corresponds to the frequency range 2250 to 3500 Hz. The current channel energy of the low-frequency channel may be determined by summing the energies of the FFT points corresponding to 250 to 2250 Hz, and the current channel energy of the high-frequency channel may be determined by summing the energies of the FFT points corresponding to 2250 to 3500 Hz.
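For a 128-point transform at the 8 kHz sampling rate implied by 80 samples per 10 ms, each FFT point spans 62.5 Hz, so the band edges 250, 2250, and 3500 Hz fall on points 4, 36, and 56. The sketch below sums squared magnitudes over those points. Treating each band as half-open (lower edge included, upper edge excluded) is an assumption made so the two channels do not share the 2250 Hz point.

```python
import math

SAMPLE_RATE_HZ = 8000
FFT_LEN = 128
BIN_HZ = SAMPLE_RATE_HZ / FFT_LEN  # 62.5 Hz per FFT point


def band_energy(spectrum, lo_hz, hi_hz):
    """Sum |X[k]|^2 over FFT points with lo_hz <= frequency < hi_hz."""
    lo = math.ceil(lo_hz / BIN_HZ)
    hi = math.ceil(hi_hz / BIN_HZ)
    return sum(abs(spectrum[k]) ** 2 for k in range(lo, hi))


def channel_energies(spectrum):
    """Current energies of the low (250-2250 Hz) and high (2250-3500 Hz) channels."""
    return band_energy(spectrum, 250, 2250), band_energy(spectrum, 2250, 3500)
```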

【００２３】 These energy estimates are provided to speech detector 208, which determines whether speech is present in the received audio signal. SNR estimator 210a of speech detector 208 determines the signal-to-noise ratio (SNR) of the speech on each of the N channels based on both the channel energy estimates and the channel noise energy estimates. The channel noise energy estimates are provided by noise energy estimator 214a and generally correspond to noise energy estimates smoothed over previous frames that did not contain speech.
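One way to picture the interplay between the SNR estimator and the noise energy estimator is the sketch below. Expressing the SNR in decibels, the small floor that guards against division by zero, and the smoothing constant `beta` are assumptions; the text says only that the SNR is derived from the two estimates and that the noise estimate is smoothed over speech-free frames.

```python
import math


def channel_snr_db(channel_energy, noise_energy, floor=1e-12):
    """SNR of one channel in dB from its energy and noise energy estimates."""
    return 10.0 * math.log10(max(channel_energy, floor) / max(noise_energy, floor))


def update_noise(noise_prev, channel_energy, speech_present, beta=0.1):
    """Smooth the noise estimate toward the channel energy only when no speech is present."""
    if speech_present:
        return noise_prev
    return beta * channel_energy + (1.0 - beta) * noise_prev
```

Freezing the noise estimate during speech is what keeps speech energy from leaking into the noise floor and depressing the SNR of later frames.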

【００２４】 Speech detector 208 also comprises rate decision element 212, which selects the data rate of the input signal from a predetermined set of data rates. In certain communication systems, data is encoded so that the associated data rate varies from frame to frame. Such a system is known as a variable rate communication system. A voice coder that encodes data according to a variable rate scheme is commonly called a variable rate vocoder. An exemplary embodiment of a variable rate vocoder is described in U.S. Pat. No. 5,414,796, entitled "VARIABLE RATE VOCODER," which is assigned to the assignee of the present invention and incorporated herein by reference. Using a variable rate communication channel eliminates unnecessary transmissions when there is no useful speech to transmit. Algorithms within the vocoder vary the number of information bits in each frame in accordance with changes in speech activity. For example, a vocoder with a set of four rates may produce 20-millisecond data frames containing 16, 40, 80, or 171 information bits, depending on the activity of the speaker. It is desirable to transmit each data frame in a fixed amount of time by varying the transmission rate of the communication.
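As a quick worked example, the frame sizes above translate into information bit rates once the 20 ms frame duration is taken into account (50 frames per second). The helper name is hypothetical; only the bit counts and the frame duration come from the text.

```python
FRAME_MS = 20
FRAMES_PER_SECOND = 1000 // FRAME_MS  # 50 frames per second


def info_bit_rate_bps(bits_per_frame):
    """Information bits per second carried by frames of the given size."""
    return bits_per_frame * FRAMES_PER_SECOND


RATES_BPS = {bits: info_bit_rate_bps(bits) for bits in (16, 40, 80, 171)}
# 16 -> 800, 40 -> 2000, 80 -> 4000, 171 -> 8550 bits per second
```

The lowest rate therefore carries less than a tenth of the information bits of the highest rate, which is where the transmission saving during silence comes from.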

【００２５】 Because the rate of a frame depends on the speech activity during that frame, determining the rate provides information on whether speech is present. In a system using variable rates, a decision that a frame should be encoded at the highest rate generally indicates the presence of speech, whereas a decision that a frame should be encoded at the lowest rate generally indicates the absence of speech. Intermediate rates generally indicate transitions between the presence and absence of speech.
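The mapping from rate decision to speech hint can be written down directly. In this sketch, rates are represented by their position in the rate set (0 for the lowest); that indexing, the string labels, and the function name are illustrative assumptions.

```python
def speech_hint(rate_index, num_rates=4):
    """Interpret a frame's rate decision as evidence about speech presence."""
    if not 0 <= rate_index < num_rates:
        raise ValueError("rate_index outside the rate set")
    if rate_index == num_rates - 1:
        return "speech"        # highest rate: speech generally present
    if rate_index == 0:
        return "no speech"     # lowest rate: speech generally absent
    return "transition"        # intermediate rates: onset or decay of speech
```

A detector built this way would still need a policy for transition frames, for example treating them as speech so that the noise estimate is not updated from partial speech.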

【００２６】 Rate decision element 212 may implement any of a number of rate decision algorithms. One such rate decision algorithm is disclosed in copending U.S. patent application Ser. No. 08/286,842, entitled "METHOD AND APPARATUS FOR PERFORMING REDUCED RATE VARIABLE RATE VOCODING," which is assigned to the assignee of the present invention and incorporated herein by reference. This technique provides a set of rate decision criteria referred to as mode measures. The first mode measure is the target matching signal-to-noise ratio (TMSNR) from the previously encoded frame, which indicates how well the encoding model is performing by comparing the synthesized speech signal with the input speech signal. The second mode measure is the normalized autocorrelation function (NACF), which measures the periodicity of the speech frame. The third mode measure is the zero crossings (ZC) parameter, which measures the high-frequency content of the input speech frame. The fourth measure, the prediction gain differential (PGD), determines whether the encoder is maintaining its prediction efficiency. The fifth measure is the energy differential (ED), which compares the energy in the current frame with the average frame energy. Using these mode measures, rate decision logic selects the encoding rate for the input frame.

【００２７】 Although rate decision element 212 is shown in FIG. 2 as an element included in noise suppressor 108, it should be understood that the rate information may instead be provided to noise suppressor 108 by another component of speech processor 106 (FIG. 1). For example, speech processor 106 may comprise a variable rate vocoder (not shown) that determines the encoding rate for each frame of the input signal. Instead of having noise suppressor 108 make the rate decision independently, the rate information may be provided to noise suppressor 108 by the variable rate vocoder.

【００２８】 It should also be understood that, instead of making a rate decision to determine the presence of speech, speech detector 208 may use a subset of the mode measures that contribute to the rate decision. For example, rate decision element 212 may be replaced by an NACF element (not shown), which, as described above, measures the periodicity of the speech frame. The NACF is evaluated according to the following relationship:

【数１】 [Equation 1]

【００２９】 where N is the number of samples in the speech frame, and t₁ and t₂ are the boundaries within the T samples over which the NACF is evaluated. The NACF is evaluated on the formant residual signal e(n). Formant frequencies are the resonance frequencies of speech. A short-term filter is used to filter the speech signal to obtain the formant frequencies. The residual signal that remains after filtering by this short-term filter is the formant residual signal, which contains the long-term speech information of the signal, such as the pitch.
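Because Equation 1 itself is not reproduced in this text, the following Python sketch shows only one common form of a normalized autocorrelation: the cross-correlation of the residual at a given lag, divided by the mean of the two frame energies, maximized over the lags t₁ to t₂. The function name `nacf`, its argument layout, and the normalization are assumptions made for illustration and are not taken from the patent.

```python
import math


def nacf(e, n_samples, t1, t2):
    """Largest normalized autocorrelation of residual e over lags t1..t2.

    e holds at least t2 samples of history followed by the n_samples
    samples of the current frame, oldest sample first.
    (Illustrative form only; the patent's Equation 1 may differ.)
    """
    start = len(e) - n_samples  # index of the first current-frame sample
    if start < t2:
        raise ValueError("need at least t2 samples of history")
    frame_energy = sum(e[start + n] ** 2 for n in range(n_samples))
    best = 0.0
    for lag in range(t1, t2 + 1):
        cross = sum(e[start + n] * e[start + n - lag] for n in range(n_samples))
        lagged_energy = sum(e[start + n - lag] ** 2 for n in range(n_samples))
        norm = 0.5 * (frame_energy + lagged_energy)
        if norm > 0.0:
            best = max(best, cross / norm)
    return best
```

With this normalization the result lies between 0 and 1: a strongly periodic residual gives a value close to 1 at a lag equal to its period, while a silent frame gives 0.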

【００３０】 The NACF mode measure is well suited to determining the presence of speech because the periodicity of a signal containing voiced speech differs from that of a signal that does not. Voiced speech tends to be characterized by periodic components. When voiced speech is absent, the signal generally has no periodic components. The NACF measure can therefore be a good indicator for speech detector 208.

【００３１】 Speech detector 208 may use a measure such as the NACF in place of the rate decision in situations where it is not practical to generate a rate decision. For example, if the rate decision is not available from a variable rate vocoder and noise suppressor 108 does not have the processing power to generate its own rate decision, a mode measure such as the NACF provides a desirable alternative. This may be the case, for example, in car kit applications, where processing power is generally limited.

【００３２】 In addition, it should be understood that speech detector 208 may decide on the presence of speech based on the rate decision alone, the mode measures alone, or the SNR estimates alone. Although additional measures should improve the accuracy of the decision, any one of these measures by itself may give an adequate result.

【００３３】 The rate decision (or mode measure) and the SNR estimates generated by SNR estimator 210a are provided to speech decision element 216. Based on these inputs, speech decision element 216 decides whether speech is present in the input signal. The decision regarding the presence of speech determines whether the noise energy estimate is updated. The noise energy estimate is used by SNR estimator 210a to determine the SNR of the speech in the input signal. This SNR is in turn used to calculate the level of attenuation applied to the input signal for noise suppression. If speech is determined to be present, speech decision element 216 opens switch 218a so that noise energy estimator 214a does not update the noise energy estimate. If speech is determined to be absent, the input signal is presumed to be noise, and speech decision element 216 closes switch 218a, causing noise energy estimator 214a to update the noise estimate. Although shown as switch 218a in FIG. 2, it should be understood that an enable signal provided by speech decision element 216 to noise energy estimator 214a would perform the same function.

【００３４】 In a preferred embodiment in which two channel SNRs are evaluated, speech decision element 216 generates the noise update decision according to the following procedure:

    if (rate == min)
        if ((chsnr1 > T1) or (chsnr2 > T2))
            if (ratecount > T3)
                update noise estimate
            else
                ratecount++
        else
            update noise estimate
            ratecount = 0
    else
        ratecount = 0

The channel SNR estimates provided by SNR estimator 210a are denoted chsnr1 and chsnr2. The rate of the input signal, provided by rate determination element 212, is denoted rate. A counter, ratecount, keeps track of a number of frames based on certain conditions described below.

【００３５】 Speech decision element 216 determines that no speech is present and that the noise estimate should be updated when the rate is the minimum of the variable rates, either chsnr1 is greater than threshold T1 or chsnr2 is greater than threshold T2, and ratecount is greater than threshold T3. If the rate is the minimum and chsnr1 is greater than T1 or chsnr2 is greater than T2, but ratecount does not exceed T3, ratecount is incremented by one and the noise estimate is not updated. The counter ratecount counts the frames that have the minimum rate yet have high energy in at least one of the channels, and thereby detects a sudden increase in noise level or a gradually increasing noise source. The counter, which indicates that a high-SNR signal does not contain speech, keeps counting until speech is detected in the signal. A preferred embodiment sets T1 = T2 = 5 dB and T3 = 100 frames, where 10 ms frames are evaluated.

【００３６】 If the rate is the minimum, chsnr1 is less than T1, and chsnr2 is less than T2, speech decision element 216 determines that no speech is present and that the noise estimate should therefore be updated. In addition, ratecount is reset to zero.

【００３７】 If the rate is not the minimum, speech decision element 216 determines that the frame contains speech and that the noise estimate should therefore not be updated, and ratecount is reset to zero.
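The rate-based noise update procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `RateDecisionState` and `should_update_noise` are hypothetical, the rate is reduced to a boolean "is minimum rate" flag, and the 100-frame threshold is read as T3.

```python
from dataclasses import dataclass

T1_DB = 5.0       # SNR threshold for channel 1 (preferred embodiment)
T2_DB = 5.0       # SNR threshold for channel 2 (preferred embodiment)
T3_FRAMES = 100   # frame-count threshold, assumed to be T3


@dataclass
class RateDecisionState:
    ratecount: int = 0


def should_update_noise(is_min_rate, chsnr1, chsnr2, state):
    """Return True if the noise estimate should be updated for this frame."""
    if not is_min_rate:
        # Frame is treated as speech: no update, counter reset.
        state.ratecount = 0
        return False
    if chsnr1 > T1_DB or chsnr2 > T2_DB:
        # Minimum rate but high SNR: update only after a sustained run.
        if state.ratecount > T3_FRAMES:
            return True
        state.ratecount += 1
        return False
    # Minimum rate and low SNR on both channels: plain noise.
    state.ratecount = 0
    return True
```

With these thresholds, a run of minimum-rate frames with high SNR starts updating the noise estimate on the 102nd consecutive frame, which is how a sudden, persistent rise in the noise level is eventually absorbed into the estimate.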

【００３８】 Recall that, instead of using the rate measure to determine the presence of speech, mode measures such as the NACF measure may be used. Speech decision element 216 may use the NACF measure to determine the presence of speech, in which case the noise update decision is made according to the following procedure:

    if (pitchPresent == FALSE)
        if ((chsnr1 > TH1) or (chsnr2 > TH2))
            if (pitchCount > TH3)
                update noise estimate
            else
                pitchCount++
        else
            update noise estimate
            pitchCount = 0
    else
        pitchCount = 0

where pitchPresent is defined as follows:

    if (NACF > TT1)
        pitchPresent = TRUE
        NACFcount = 0
    elseif (TT2 ≤ NACF ≤ TT1)
        if (NACFcount > TT3)
            pitchPresent = TRUE
        else
            pitchPresent = FALSE
            NACFcount++
    else
        pitchPresent = FALSE
        NACFcount = 0

Again, the channel SNR estimates provided by SNR estimator 210a are denoted chsnr1 and chsnr2. The NACF element (not shown) generates pitchPresent, a measure indicating the presence of pitch as defined above. A counter, pitchCount, keeps track of a number of frames based on certain conditions described below.

【００３９】 The measure pitchPresent indicates that pitch is present when the NACF is greater than threshold TT1. Pitch is also determined to be present if the NACF falls in the intermediate range (TT2 ≤ NACF ≤ TT1) for a number of frames greater than threshold TT3. The counter NACFcount keeps track of the number of frames for which

【数２】 TT2 ≤ NACF ≤ TT1

【００４０】 holds. In a preferred embodiment, TT1 = 0.6, TT2 = 0.4, and TT3 = 8 frames, where 10 ms frames are evaluated.

【００４１】 Speech decision element 216 determines that no speech is present and that the noise estimate should be updated when the pitchPresent measure indicates that no pitch is present (pitchPresent = FALSE), either chsnr1 is greater than threshold TH1 or chsnr2 is greater than threshold TH2, and pitchCount is greater than threshold TH3. If pitchPresent = FALSE and chsnr1 is greater than TH1 or chsnr2 is greater than TH2, but pitchCount does not exceed TH3, pitchCount is incremented by one and the noise estimate is not updated. The counter pitchCount is used to detect a sudden increase in noise level or a gradually increasing noise source. A preferred embodiment sets TH1 = TH2 = 5 dB and TH3 = 100 frames, where 10 ms frames are evaluated.

【００４２】 If pitchPresent indicates that no pitch is present, chsnr1 is less than TH1, and chsnr2 is less than TH2, speech decision element 216 determines that no speech is present and that the noise estimate should therefore be updated. In addition, pitchCount is reset to zero.

【００４３】 If pitchPresent indicates that pitch is present (pitchPresent = TRUE), speech decision element 216 determines that the frame contains speech and that the noise estimate should therefore not be updated. In addition, pitchCount is reset to zero.
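The NACF-based variant can be sketched the same way. The class name `PitchNoiseDecision` and its method names are hypothetical, and the thresholds for the SNR test and the frame counter are assumed to be TH1 = TH2 = 5 dB and TH3 = 100 frames, mirroring the rate-based procedure.

```python
TT1, TT2, TT3 = 0.6, 0.4, 8       # NACF thresholds (preferred embodiment)
TH1, TH2, TH3 = 5.0, 5.0, 100     # SNR (dB) and frame-count thresholds (assumed)


class PitchNoiseDecision:
    """Tracks NACFcount and pitchCount across frames."""

    def __init__(self):
        self.nacf_count = 0
        self.pitch_count = 0

    def pitch_present(self, nacf):
        if nacf > TT1:
            self.nacf_count = 0
            return True
        if TT2 <= nacf <= TT1:
            # Intermediate NACF: declare pitch only after a sustained run.
            if self.nacf_count > TT3:
                return True
            self.nacf_count += 1
            return False
        self.nacf_count = 0
        return False

    def should_update_noise(self, nacf, chsnr1, chsnr2):
        if self.pitch_present(nacf):
            self.pitch_count = 0
            return False
        if chsnr1 > TH1 or chsnr2 > TH2:
            if self.pitch_count > TH3:
                return True
            self.pitch_count += 1
            return False
        self.pitch_count = 0
        return True
```

Keeping both counters in one object makes the per-frame call a single method, which suits a processing loop that handles one 10 ms frame at a time.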

【００４４】 When speech is determined to be absent, switch 218a is closed and noise energy estimator 214a updates the noise estimate. Noise energy estimator 214a generally produces a noise energy estimate for each of the N channels of the input signal. Since no speech is present, the energy is presumed to be due entirely to noise. For each channel, the updated noise energy is estimated as the current channel energy smoothed with the channel energies of earlier frames that contained no speech. For example, the updated estimate may be obtained from the following relationship:

E_n(t) = β E_ch + (1 − β) E_n(t − 1)    (3)

where the updated estimate E_n(t) is defined as a function of the current channel energy E_ch and the previous channel noise energy estimate E_n(t − 1). In an exemplary embodiment, β is set to 0.1. The updated channel noise energy estimates are provided to SNR estimator 210a. These channel noise energy estimates are used to obtain updated channel SNR estimates for the next frame of the input signal.
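Equation (3) is a first-order recursive smoother applied independently to each channel. A minimal Python sketch follows; the function name `update_noise_energy` and the list-based representation of the channels are assumptions for illustration.

```python
BETA = 0.1  # smoothing constant from the exemplary embodiment


def update_noise_energy(prev_noise, channel_energy, beta=BETA):
    """Apply E_n(t) = beta * E_ch + (1 - beta) * E_n(t-1) to every channel."""
    if len(prev_noise) != len(channel_energy):
        raise ValueError("one energy value is required per channel")
    return [
        beta * e_ch + (1.0 - beta) * e_n
        for e_n, e_ch in zip(prev_noise, channel_energy)
    ]
```

For example, a previous estimate of 10 and a current channel energy of 20 give 0.1 × 20 + 0.9 × 10 = 11. With β = 0.1 each noise-only frame moves the estimate one tenth of the way toward the current energy, so a steady noise floor is tracked within a few tens of frames.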

【００４５】 The decision regarding the presence of speech is also provided to channel gain estimator 220. Channel gain estimator 220 determines the gain, and thus the level of noise suppression, for the frame of the input signal. If speech decision element 216 determines that no speech is present, the gain for the frame is set to a predetermined minimum gain level. Otherwise, the gain is determined as a function of frequency. In the preferred embodiment, the gain is calculated based on the graph shown in FIG. 3. Although presented as a graph, it should be understood that the function shown in FIG. 3 may be implemented as a look-up table in channel gain estimator 220.

【００４６】 As FIG. 3 shows, the preferred embodiment of the present invention defines a separate gain curve for each of L frequency bands. Three bands (L = 3) are shown in FIG. 3, but L may be any number equal to or greater than one. Thus, the gain factor for a channel in the low band may be determined using the low band curve, the gain factor for a channel in the mid band using the mid band curve, and the gain factor for a channel in the high band using the high band curve.

【００４７】 Noise suppression may be performed using a single gain curve for the input signal (L = 1), but it has been found that using multiple bands results in less degradation of voice quality. For environmental noise such as road and wind noise, the energy of the noise signal is greater at lower frequencies and generally decreases as frequency increases.

【００４８】 In FIG. 3, a linear equation with a fixed slope and y-intercept is used to determine the gain factor for each band. The determination of the gain factors can be described by the following relationships:

gain[low band] (dB) = slope1 * SNR + low band y-intercept    (4)
gain[mid band] (dB) = slope2 * SNR + mid band y-intercept    (5)
gain[high band] (dB) = slope3 * SNR + high band y-intercept    (6)

The preferred embodiment designates the low band as 125 to 375 Hz, the mid band as 375 to 2625 Hz, and the high band as 2625 to 4000 Hz. The slopes and y-intercepts are determined experimentally. The preferred embodiment uses the same slope, 0.39, for each of the three bands, although a different slope may be used for each frequency band. In addition, the low band y-intercept is set to −17 dB, the mid band y-intercept to −13 dB, and the high band y-intercept to −13 dB.
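Equations (4) to (6) with the preferred-embodiment constants can be written as a small look-up in Python. The function name `channel_gain_db`, the use of a channel center frequency to pick the band, and the handling of the band edges (a frequency exactly on a boundary goes to the lower band) are assumptions; the text does not state any limits on the resulting gain, so none are applied here.

```python
SLOPE = 0.39  # same slope for all three bands in the preferred embodiment

# (low edge in Hz, high edge in Hz, y-intercept in dB)
BANDS = [
    (125.0, 375.0, -17.0),    # low band
    (375.0, 2625.0, -13.0),   # mid band
    (2625.0, 4000.0, -13.0),  # high band
]


def channel_gain_db(center_hz, snr_db):
    """Gain in dB for a channel, from the linear curve of its band."""
    for low, high, intercept in BANDS:
        if low <= center_hz <= high:
            return SLOPE * snr_db + intercept
    raise ValueError("frequency outside the 125-4000 Hz range")
```

For instance, a low band channel with a 10 dB SNR gets 0.39 × 10 − 17 = −13.1 dB, while a mid band channel with the same SNR gets −9.1 dB, so low-frequency channels are attenuated more at equal SNR.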

【００４９】 An optional feature would allow the user of a device that includes the noise suppressor to select the desired y-intercept. The user could thus choose more noise suppression (a lower y-intercept) at the expense of some voice degradation. Alternatively, the y-intercept may be variable as a function of some measure determined by noise suppressor 108. For example, more noise suppression (a lower y-intercept) may be desirable when excessive noise energy is detected over a given period of time. Conversely, less noise suppression (a higher y-intercept) may be desirable when a condition such as babble is detected. During a babble condition, background talkers are present, and less noise suppression may be warranted to prevent the main talker from being cut out. Another optional feature would provide a selectable slope for the gain curves. Furthermore, it may be found that curves other than the straight lines described by equations (4) to (6) are better suited to determining the gain factors under certain circumstances.

【００５０】 For each frame that contains speech, a gain factor is determined for each of M frequency channels of the input signal, where M is the predetermined number of channels to be evaluated. The preferred embodiment evaluates sixteen channels (M = 16). Referring again to FIG. 3, the gain factors for channels whose frequency components fall within the low band are determined using the low band curve. The gain factors for channels whose frequency components fall within the mid band are determined using the mid band curve. The gain factors for channels whose frequency components fall within the high band are determined using the high band curve.

【００５１】 For each channel evaluated, the channel SNR is used to derive the gain factor from the appropriate curve. In FIG. 2, the channel SNRs are shown as being evaluated by channel energy estimator 206b, noise energy estimator 214b, and SNR estimator 210b. For each frame of the input signal, channel energy estimator 206b generates an energy estimate for each of the M channels of the transformed input signal and provides the energy estimates to SNR estimator 210b. The channel energy estimates may be updated using the relationship of equation (1) above. If speech decision element 216 determines that no speech is present in the input signal, switch 218b is closed and noise energy estimator 214b updates the channel noise energy estimates. For each of the M channels, the updated noise energy estimate is based on the channel energy estimate determined by channel energy estimator 206b. The updated estimate may be evaluated using the relationship of equation (3) above. The channel noise estimates are provided to SNR estimator 210b. Thus, SNR estimator 210b determines the channel SNR estimates for each frame of speech based on the channel energy estimates for that frame and the channel noise energy estimates provided by noise energy estimator 214b.
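The text here does not give the formula by which the SNR estimator combines the two estimates. A conventional choice, shown below purely as an assumption, is the ratio of channel energy to channel noise energy expressed in decibels; the function name `channel_snr_db` and the small floor that guards against division by zero are likewise illustrative.

```python
import math


def channel_snr_db(channel_energy, noise_energy, floor=1e-10):
    """Per-channel SNR in dB as 10*log10(E_ch / E_n) (assumed definition)."""
    if len(channel_energy) != len(noise_energy):
        raise ValueError("one value is required per channel")
    return [
        10.0 * math.log10(max(e_ch, floor) / max(e_n, floor))
        for e_ch, e_n in zip(channel_energy, noise_energy)
    ]
```

Under this definition a channel whose energy is 100 times its noise estimate has an SNR of 20 dB, and a channel whose energy equals its noise estimate has an SNR of 0 dB.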

【００５２】 Those skilled in the art will recognize that channel energy estimator 206a, noise energy estimator 214a, switch 218a, and SNR estimator 210a perform functions similar to those of channel energy estimator 206b, noise energy estimator 214b, switch 218b, and SNR estimator 210b, respectively. Thus, although shown as separate processing elements in FIG. 2, channel energy estimators 206a and 206b may be combined into one processing element, noise energy estimators 214a and 214b may be combined into one processing element, switches 218a and 218b may be combined into one processing element, and SNR estimators 210a and 210b may be combined into one processing element. As a combined element, the channel energy estimator would determine channel energy estimates both for the N channels used for speech detection and for the M channels used to determine the channel gain factors. Note that N = M is possible. Likewise, the noise energy estimator and the SNR estimator would operate on both the N channels and the M channels. The SNR estimator would then provide the N SNR estimates to speech decision element 216 and the M SNR estimates to channel gain estimator 220.

【００５３】 The channel gain factors are provided by channel gain estimator 220 to gain adjuster 224. Gain adjuster 224 also receives the FFT-transformed input signal from transform element 204. The gain of the transformed signal is adjusted according to the channel gain factors. For example, in the embodiment described above in which M = 16, the transformed (FFT) points belonging to a particular one of the sixteen channels are adjusted based on the gain factor for that channel.
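The per-channel adjustment of FFT points can be sketched as follows. The mapping of FFT bins to channels is not given in this passage, so the `channel_edges` argument is hypothetical, as is the function name `apply_channel_gains`; the sketch also assumes that the gain in dB is an amplitude gain, converted with 10^(dB/20).

```python
def apply_channel_gains(spectrum, channel_edges, gains_db):
    """Scale complex FFT bins channel by channel.

    channel_edges[i] is an inclusive (first_bin, last_bin) pair for channel i
    (hypothetical layout); gains_db[i] is that channel's gain factor in dB.
    Bins not covered by any channel are passed through unchanged.
    """
    out = list(spectrum)
    for (first, last), gain_db in zip(channel_edges, gains_db):
        scale = 10.0 ** (gain_db / 20.0)  # assumed amplitude-dB convention
        for k in range(first, last + 1):
            out[k] = spectrum[k] * scale
    return out
```

Because only the magnitude of each bin is scaled, the phase of the transformed signal is preserved, and the inverse transform yields a time-domain frame in which the noisier channels are attenuated.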

【００５４】 The gain-adjusted signal produced by gain adjuster 224 is then provided to inverse transform element 226, which in the preferred embodiment generates the inverse fast Fourier transform (IFFT) of the signal. If the frames of the input were formed with overlapping samples, post-processing element 228 adjusts the output signal for the overlap. Post-processing element 228 also performs de-emphasis if the signal underwent pre-emphasis. De-emphasis attenuates the frequency components that were emphasized during pre-emphasis. The pre-emphasis/de-emphasis process contributes effectively to noise suppression by reducing noise components that lie outside the range of the processed frequency components.

【００５５】 The various processing blocks of the noise suppressor shown in FIG. 2 may be implemented in a digital signal processor (DSP) or an application-specific integrated circuit (ASIC). Given the description of the functionality of the present invention, one skilled in the art would be able to implement the invention in a DSP or an ASIC without undue experimentation.

【００５６】 Referring now to FIG. 4, a flow chart is shown illustrating some of the steps involved in the processing described with reference to FIGS. 2 and 3. Although shown as sequential steps, those skilled in the art will recognize that the order of some of the steps may be interchanged.

【００５７】 The process begins at step 402. In step 404, transform element 204 transforms the input audio signal into a transformed signal, generally an FFT signal. In step 406, SNR estimator 210b determines the speech SNR for M channels of the input signal based on the channel energy estimates provided by channel energy estimator 206b and the channel noise energy estimates provided by noise energy estimator 214b. In step 408, channel gain estimator 220 determines the gain factors for the M channels of the input signal based on the frequencies of the channels. Channel gain estimator 220 sets the gain to the minimum level if the frame of the input signal is found to contain no speech. Otherwise, a gain factor is determined for each of the M channels based on a predetermined function. For example, referring to FIG. 3, a function defined by linear equations with fixed slopes and y-intercepts may be used, where each linear equation defines the gain for a predetermined frequency band. In step 410, gain adjuster 224 adjusts the gains of the M channels of the transformed signal using the M gain factors. In step 412, inverse transform element 226 inverse-transforms the gain-adjusted transformed signal, producing the noise-suppressed audio signal.

【００５８】 In step 414, SNR estimator 210a determines the speech SNR for N channels of the input signal based on the channel energy estimates provided by channel energy estimator 206a and the channel noise energy estimates provided by noise energy estimator 214a. In step 416, rate determination element 212 determines the encoding rate for the input signal through analysis of the input signal. Alternatively, one or more mode measures such as the NACF may be determined. In step 418, speech decision element 216 determines whether speech is present in the input signal based on the SNRs provided by SNR estimator 210a and on the rate provided by rate determination element 212 and/or the mode measures. If it is determined at decision block 420 that no speech is present, the input signal is presumed to consist entirely of noise, and in step 422 the noise estimates are updated by noise energy estimator 214a. Noise energy estimator 214a updates the noise estimates based on the channel energies determined by channel energy estimator 206a. Whether or not speech is detected, the procedure continues with the processing of the next frame of the input signal.

【００５９】 The preceding description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[Brief description of drawings]

【図１】 FIG. 1 is a block diagram of a communication system that uses a noise suppressor.

【図２】 FIG. 2 is a block diagram showing a communication system according to the present invention.

【図３】 FIG. 3 is a graph of frequency-based gain factors for implementing noise suppression according to the present invention.

【図４】 FIG. 4 is a flow chart illustrating an exemplary embodiment of the processing steps required for noise suppression as implemented by the processing elements of FIG. 2.

Continued from front page

(81) Designated states: EP (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG), AP (GH, KE, LS, MW, SD, SZ, UG, ZW), EA (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, GH, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZW

Claims

[Claims]

1. A noise suppressor for suppressing background noise of a speech signal, comprising: a signal-to-noise ratio (SNR) estimator that generates a channel SNR estimate for each frequency channel of a predetermined first set of frequency channels of the speech signal; a gain estimator that generates, for each of the frequency channels, a gain coefficient based on the corresponding one of the channel SNR estimates, the gain coefficient being derived using a gain function that defines the gain coefficient as a function of SNR; and a gain adjuster that adjusts a gain level of each of the frequency channels based on the corresponding gain coefficient, whereby the background noise of the speech signal is suppressed.
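
By way of illustration only, and not as part of the claim, the chain of SNR estimator, gain estimator, and gain adjuster recited in claim 1 might be sketched as follows. The function names, the use of decibels, the energy floor, and the particular gain function `example_gain` are all assumptions introduced for this sketch.

```python
import math

def channel_snr_db(channel_energy, noise_energy, floor=1e-12):
    """Estimate the SNR (dB) of each channel from its energy and noise energy."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(channel_energy, noise_energy)]

def gain_coefficients_db(snr_db, gain_function):
    """Generate one gain coefficient (dB) per channel from its SNR estimate."""
    return [gain_function(s) for s in snr_db]

def adjust_gain(channel_values, gains_db):
    """Scale each channel amplitude by its gain coefficient (dB -> linear)."""
    return [v * 10.0 ** (g / 20.0) for v, g in zip(channel_values, gains_db)]

def example_gain(snr_db):
    """Hypothetical gain function: more attenuation at low SNR, none at high SNR."""
    return max(-10.0, min(0.0, 0.5 * snr_db - 10.0))

snr = channel_snr_db([100.0, 1.0], [1.0, 1.0])   # high-SNR and low-SNR channel
gains = gain_coefficients_db(snr, example_gain)
adjusted = adjust_gain([2.0, 2.0], gains)
```

In this sketch the channel with the lower SNR estimate receives the larger attenuation, which is the behaviour the gain adjuster relies on to suppress background noise.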

2. The noise suppressor of claim 1, wherein the gain function is frequency dependent.

3. The noise suppressor wherein the gain function is implemented as a look-up table.
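
As a non-limiting illustration, a gain function might be realized as a look-up table indexed by quantized SNR, as sketched below. The 1 dB quantization step, the 0 to 40 dB table range, and the names `build_gain_table` and `lookup_gain_db` are assumptions of this sketch.

```python
def build_gain_table(gain_function, min_snr_db=0, max_snr_db=40):
    """Precompute a gain coefficient (dB) for each integer SNR value in the range."""
    return [gain_function(s) for s in range(min_snr_db, max_snr_db + 1)]

def lookup_gain_db(table, snr_db, min_snr_db=0):
    """Quantize the SNR to the nearest integer dB, clamp it, and read the table."""
    index = int(round(snr_db)) - min_snr_db
    index = max(0, min(len(table) - 1, index))
    return table[index]

# Hypothetical gain function used only to fill the table.
table = build_gain_table(lambda s: 0.5 * s - 20.0)
```

A table of this kind trades a small amount of memory for the cost of evaluating the gain function once per channel per frame.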

4. The noise suppressor according to claim 1, wherein the gain function is a linear function having a slope and a y-intercept.
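
For illustration only, a linear gain function having a slope and a y-intercept might take the following form. The clamping limits and the numeric values used below are assumptions of this sketch and are not taken from the claims.

```python
def linear_gain_db(snr_db, slope, y_intercept, min_gain_db=-13.0, max_gain_db=0.0):
    """Linear gain function: gain = slope * SNR + y_intercept (all in dB).

    The result is clamped to [min_gain_db, max_gain_db]; the clamp is an
    assumption added so that the sketch never amplifies a channel.
    """
    gain = slope * snr_db + y_intercept
    return max(min_gain_db, min(max_gain_db, gain))
```

With a slope of 0.5 and a y-intercept of -10 dB, for example, a channel at 10 dB SNR would be attenuated by 5 dB. Lowering the y-intercept deepens the suppression at every SNR, and raising the slope makes the gain recover faster as the SNR improves.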

5. The noise suppressor of claim 4, wherein a user can select the y-intercept.

6. The noise suppressor of claim 4, wherein the y-intercept can be adjusted based on measured noise characteristics of the speech signal.

7. The noise suppressor of claim 4, wherein the slope is user selectable.

8. The noise suppressor according to claim 4, wherein the slope can be adjusted based on measured noise characteristics of the speech signal.

9. The noise suppressor of claim 1, further comprising: a speech detector that determines the presence of speech in the speech signal; and a noise energy estimator that generates an updated channel noise energy estimate for each of the frequency channels when the speech detector determines that no speech is present in the speech signal, wherein the updated channel noise energy estimate is provided to the SNR estimator that generates the channel SNR estimates.
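
As an illustrative sketch only, the noise energy estimator might update its per-channel estimates during frames without speech as follows. The exponential smoothing and the constant `alpha` are assumptions of this sketch; the claim requires only that the estimate be updated when no speech is detected.

```python
def update_noise_estimate(noise_energy, channel_energy, speech_present, alpha=0.9):
    """Return the per-channel noise energy estimates for the next frame.

    While speech is present the estimates are held unchanged. Otherwise each
    estimate is smoothed toward the current channel energy.
    """
    if speech_present:
        return list(noise_energy)
    return [alpha * n + (1.0 - alpha) * e
            for n, e in zip(noise_energy, channel_energy)]
```

Holding the estimate during speech keeps speech energy from being mistaken for noise, which would otherwise depress the channel SNR estimates.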

10. The noise suppressor of claim 9, wherein the speech detector comprises: a signal-to-noise ratio (SNR) estimator that generates channel SNR estimates for a predetermined second set of frequency channels of the speech signal; and a speech determination element that determines the presence of speech according to the channel SNR estimates of the second set of frequency channels.

11. The noise suppressor of claim 10, wherein the speech detector further comprises a rate determination element that determines an encoding rate, from a set of variable rates, for the speech signal, and the speech determination element further determines the presence of speech according to the encoding rate.

12. The noise suppressor of claim 10, wherein the speech detector further comprises a mode measuring element that determines at least one mode measure characterizing the speech signal, and the speech determination element further determines the presence of speech according to the at least one mode measure.

13. The noise suppressor according to claim 12, wherein the mode measure is a normalized autocorrelation function (NACF).
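
For illustration, one common way to compute a normalized autocorrelation value for a frame is sketched below. The lag range, the choice of taking the maximum over lags, and the name `nacf` are assumptions of this sketch; the claims do not specify how the NACF is calculated.

```python
import math

def nacf(frame, min_lag, max_lag):
    """Return the largest normalized autocorrelation over the given lag range."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        current = frame[lag:]                 # samples x[n]
        delayed = frame[:len(frame) - lag]    # samples x[n - lag]
        num = sum(x * y for x, y in zip(current, delayed))
        den = math.sqrt(sum(x * x for x in current) * sum(y * y for y in delayed))
        if den > 0.0:
            best = max(best, num / den)
    return best

periodic = [1.0, 0.0, -1.0, 0.0] * 8   # strongly periodic, period of 4 samples
```

A value near 1 indicates a strongly periodic frame, as is typical of voiced speech, while a value near 0 indicates little periodicity.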

14. A noise suppressor for suppressing background noise of a speech signal, comprising: means for determining the presence of speech in the speech signal; means for generating a channel signal-to-noise ratio (SNR) estimate for each frequency channel of a predetermined set of frequency channels of the speech signal; means for determining, when the means for determining the presence of speech determines that speech is present, a gain coefficient for each of the frequency channels, wherein a gain function is defined for each frequency band of a set of frequency bands, each gain function defines the gain coefficient so as to increase with increasing SNR, and the gain coefficient of each frequency channel is determined based on the gain function of the frequency band whose range includes that frequency channel; and means for adjusting the gain level of each of the frequency channels based on the corresponding channel gain coefficient.
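
Purely as an illustration, the selection of a per-band gain function for each frequency channel might be sketched as follows. The band boundaries, slopes, y-intercepts, gain limits, and the choice of a linear gain function within each band are hypothetical values chosen for this sketch.

```python
MIN_GAIN_DB = -13.0   # assumed minimum gain, also used when no speech is present
MAX_GAIN_DB = 0.0     # assumed ceiling so that no channel is amplified

# Hypothetical bands: (first channel, last channel, slope, y-intercept in dB).
BANDS = [
    (0, 3, 0.5, -13.0),
    (4, 9, 0.5, -10.0),
    (10, 15, 0.25, -8.0),
]

def band_for_channel(channel):
    """Return (slope, y_intercept) of the band whose range includes the channel."""
    for first, last, slope, intercept in BANDS:
        if first <= channel <= last:
            return slope, intercept
    raise ValueError("channel %d is not covered by any band" % channel)

def channel_gains_db(snr_db, speech_present):
    """Determine one gain coefficient (dB) per channel from its SNR estimate."""
    if not speech_present:
        return [MIN_GAIN_DB] * len(snr_db)
    gains = []
    for channel, snr in enumerate(snr_db):
        slope, intercept = band_for_channel(channel)
        gain = slope * snr + intercept        # increases with increasing SNR
        gains.append(max(MIN_GAIN_DB, min(MAX_GAIN_DB, gain)))
    return gains
```

Defining the gain function per band allows, for example, less aggressive suppression in bands where speech energy is typically weaker, without changing how the per-channel SNR estimates are produced.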

15. The noise suppressor of claim 14, wherein the means for determining a gain coefficient determines a minimum gain coefficient for each of the frequency channels when the means for determining the presence of speech determines that no speech is present.

16. The noise suppressor of claim 14, wherein the gain function is implemented as a look-up table.

17. The noise suppressor of claim 14, wherein the gain function is a linear function with slope and y-intercept.

18. The noise suppressor of claim 17, wherein the y-intercept is user selectable.

19. The noise suppressor of claim 17, wherein the y-intercept is adjustable based on measured noise characteristics of the speech signal.

20. The noise suppressor of claim 17, wherein the slope is user selectable.

21. The noise suppressor of claim 17, wherein the slope is adjustable based on measured noise characteristics of the speech signal.

22. The noise suppressor of claim 14, further comprising means for generating an updated channel noise energy estimate for each of the frequency channels when the means for determining the presence of speech determines that no speech is present in the speech signal, wherein the updated channel noise energy estimate is provided to the means for generating a channel SNR estimate in order to update the channel SNR estimates.

23. The noise suppressor of claim 14, wherein the means for determining the presence of speech comprises: means for determining an encoding rate, from a set of encoding rates, for the speech signal; and means for determining the presence of speech according to the encoding rate.

24. The noise suppressor of claim 23, wherein the means for determining the presence of speech further comprises means for generating SNR estimates for a predetermined second set of frequency channels of the speech signal, and the means for determining the presence of speech further makes its determination according to the SNR estimates.

25. The noise suppressor wherein the means for determining the presence of speech comprises: means for determining at least one mode measure characterizing the speech signal; and means for determining the presence of speech according to the at least one mode measure.

26. The noise suppressor of claim 25, wherein the means for determining the presence of speech further comprises means for generating SNR estimates for another predetermined set of frequency channels of the speech signal, and the means for determining the presence of speech further makes its determination according to the SNR estimates.

27. The noise suppressor of claim 25, wherein the mode measure is a normalized autocorrelation function (NACF).

28. A method for suppressing background noise of a speech signal, comprising the steps of: transforming the speech signal into a frequency representation of the speech signal; determining the presence of speech in the speech signal; generating a channel signal-to-noise ratio (SNR) estimate for each frequency channel of a predetermined set of frequency channels of the frequency representation; determining, when speech is present in the speech signal, a gain coefficient for each of the frequency channels, wherein a gain function is defined for each frequency band of a set of frequency bands, each gain function defines the gain coefficient so as to increase with increasing SNR, and the gain coefficient of each frequency channel is determined based on the gain function of the frequency band that includes that frequency channel; adjusting the gain level of each of the frequency channels based on the corresponding gain coefficient; and inverse transforming the gain-adjusted frequency representation to generate a noise-suppressed speech signal.
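
As a non-limiting sketch, the sequence of steps in claim 28 (transform, per-channel SNR estimate, gain determination, gain adjustment, inverse transform) might be arranged for a single frame as shown below. Several simplifications are assumptions of this sketch: each DFT bin is treated as its own frequency channel, a single gain function is used for all bands, a direct DFT stands in for whatever transform an implementation would use, and the names and the -13 dB minimum gain are hypothetical.

```python
import cmath
import math

def dft(x):
    """Direct discrete Fourier transform (O(n^2), adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def suppress_frame(frame, noise_energy, speech_present, gain_function,
                   min_gain_db=-13.0):
    """Noise-suppress one frame; noise_energy holds one value per bin 0..n//2."""
    spectrum = dft(frame)
    n = len(spectrum)
    out = list(spectrum)
    for k in range(n // 2 + 1):
        if speech_present:
            energy = abs(spectrum[k]) ** 2
            snr_db = 10.0 * math.log10(max(energy, 1e-12) /
                                       max(noise_energy[k], 1e-12))
            gain_db = max(min_gain_db, min(0.0, gain_function(snr_db)))
        else:
            gain_db = min_gain_db             # minimum gain when no speech
        g = 10.0 ** (gain_db / 20.0)
        out[k] = spectrum[k] * g
        if k != 0 and n - k != k:
            out[n - k] = spectrum[n - k] * g  # mirror bin keeps the output real
    return idft(out)

frame = [math.cos(2.0 * math.pi * t / 8.0) for t in range(8)]
quiet = suppress_frame(frame, [1.0] * 5, False, lambda snr: 0.0)
```

Here `quiet` is the input frame attenuated uniformly by the assumed minimum gain, since the frame was flagged as containing no speech. A practical implementation would typically also window and overlap successive frames, which this sketch omits.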

29. The method of claim 28, wherein a minimum gain coefficient is determined for each of the frequency channels when it is determined that no speech is present in the speech signal.

30. The method of claim 28, wherein the gain function is a linear function with a slope and a y-intercept.

31. The method of claim 28, further comprising the step of generating an updated channel noise energy estimate for each of the frequency channels when the step of determining the presence of speech determines that no speech is present in the speech signal, wherein the updated channel noise energy estimate is used to generate the channel SNR estimates.

32. The method of claim 28, wherein the step of determining the presence of speech comprises the steps of: generating channel SNR estimates for a predetermined second set of frequency channels of the speech signal; and determining the presence of speech according to the channel SNR estimates of the second set of frequency channels.

33. The method of claim 32, wherein the step of determining the presence of speech further comprises the steps of: determining an encoding rate, from a set of variable rates, for the speech signal; and determining the presence of speech according to the encoding rate.

34. The method of claim 32, wherein the step of determining the presence of speech further comprises the steps of: determining at least one mode measure characterizing the speech signal; and determining the presence of speech according to the at least one mode measure.

35. The method of claim 34, wherein the mode measure is a normalized autocorrelation function (NACF).