JP2000330597A

JP2000330597A - Noise suppressing device

Info

Publication number: JP2000330597A
Application number: JP11140236A
Authority: JP
Inventors: Takeshi Kawamura; 岳河村; Takeo Kanamori; 丈郎金森; Satoru Ibaraki; 悟茨木
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-05-20
Filing date: 1999-05-20
Publication date: 2000-11-30

Abstract

PROBLEM TO BE SOLVED: To provide a robust noise suppression device that can extract clear speech despite variations in noise power and spectrum. SOLUTION: Suppression coefficients α, indexed by two parameters (S/N ratio and frequency component), are stored in a suppression coefficient data table 13. When a speech signal containing noise is input, it is converted into a frequency-domain signal by a spectrum conversion means 11. An S/N estimation means 12 estimates the S/N ratio from the noise spectrum. A suppression amount estimation means 14 retrieves the desired suppression coefficient from the suppression coefficient data table 13 and estimates the noise spectrum to be suppressed. A noise suppression means 15 suppresses the noise components in the output of the spectrum conversion means 11 using the estimated noise spectrum.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a noise suppression device capable of clearly extracting only the speech signal from a signal containing strongly white stationary noise or non-stationary noise whose power-frequency characteristic changes gradually.

[0002]

[Prior Art] Wiener filtering and spectral subtraction are known noise suppression methods for stationary noise. In either method, the estimated noise spectrum is generally multiplied by a suppression coefficient α, a scaling variable, in order to adjust the amount of noise suppression. The function of the Wiener filter is expressed by equation (1).

(Equation 1) Here S(ω) denotes the speech spectrum, X(ω) the input signal spectrum, and N(ω) the noise spectrum. A hat (^) placed over a spectrum symbol denotes an estimate, and ‖ ‖ denotes the absolute value of a frequency-domain signal.

[0003] The configuration of a conventional noise suppression device employing such a method will be described with reference to FIGS. 15 to 17. FIG. 15 is a block diagram of a conventional noise suppression device using a general Wiener filter. This device comprises: an FFT 21 that immediately performs frequency analysis of the input signal; power spectrum conversion means 22 that converts the frequency-domain input signal into a power spectrum; suppression amount estimation means 14 that estimates the amount of noise suppression for each spectral component; Wiener filter estimation means 23 that estimates the filter variables from the noise suppression amount estimated by the suppression amount estimation means 14; filter coefficient derivation means 24 that derives the actual filter coefficients; and filtering operation means 25 that suppresses noise by applying the derived filter coefficients to the input signal.

[0004] As a specific example, in the Wiener filter estimation means 23 the suppression coefficient α is a constant, and the characteristic H(ω) of the Wiener filter is expressed by equation (2).

(Equation 2)
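As an illustration only, the following Python sketch shows one common way a constant suppression coefficient α enters a Wiener-type gain: the estimated noise power is scaled by α before it is compared with the input power. The formula H = max(floor, 1 − α·N/X), the function name `wiener_gain`, and the `floor` parameter are assumptions made for this sketch. The equation images of the publication are not reproduced in this text, so the exact form used there may differ.

```python
def wiener_gain(input_power, noise_power, alpha=1.0, floor=0.0):
    """Per-bin gain for a Wiener-type filter with suppression coefficient alpha.

    input_power and noise_power are equal-length lists of per-bin power values.
    Assumed form (not taken from the publication): H = max(floor, 1 - alpha * N / X).
    """
    gains = []
    for x, n in zip(input_power, noise_power):
        if x <= 0.0:
            # No input energy in this bin: nothing to pass through.
            gains.append(floor)
            continue
        gains.append(max(floor, 1.0 - alpha * n / x))
    return gains


if __name__ == "__main__":
    # A larger alpha removes more of the estimated noise from each bin.
    print(wiener_gain([4.0, 1.0], [1.0, 2.0], alpha=1.0))
    print(wiener_gain([4.0, 1.0], [1.0, 2.0], alpha=0.5))
```

With a single constant α, every bin is treated alike regardless of its local S/N ratio, which is the limitation the later paragraphs discuss.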

[0005] The filter coefficient derivation means 24 obtains the filter coefficients by, for example, an IFFT. The filtering operation means 25 convolves the time-domain input signal X(t) with the filter coefficients obtained from the filter coefficient derivation means 24.

[0006] FIG. 16 is a characteristic diagram showing the noise suppression effect when the suppression coefficient is fixed at α = 1 in the configuration of FIG. 15. Specifically, let S(ω) be the speech spectrum averaged over 5 seconds, let X1(ω) be the spectrum of the input signal obtained by mixing speech and noise at an S/N ratio of 5 dB, and let X2(ω) be the signal spectrum after the input signal has been suppressed by the noise suppression device of FIG. 15. The graph plots the spectral distance before suppression, X1(ω) − S(ω), and the spectral distance after suppression, X2(ω) − S(ω). The closer the spectral distance is to 0 dB, the more the noise has been suppressed and the more faithfully the speech is reproduced. The figure shows that the conventional method always provides a noise suppression effect of about 10 dB regardless of frequency band. In this way, a fixed amount of noise can be suppressed over the entire frequency band.
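The spectral distance used in this comparison can be illustrated with a short Python sketch. It assumes the distance is the per-bin difference of two power spectra expressed in dB. The function name `spectral_distance_db` and the `eps` guard against log of zero are hypothetical and not taken from the publication.

```python
import math


def spectral_distance_db(signal_power, speech_power, eps=1e-12):
    """Per-bin distance in dB between a signal spectrum and the clean speech spectrum.

    0 dB means the bin matches the clean speech, a positive value means residual
    noise, and a negative value means more was removed than the speech contained.
    """
    return [
        10.0 * math.log10(max(x, eps)) - 10.0 * math.log10(max(s, eps))
        for x, s in zip(signal_power, speech_power)
    ]


if __name__ == "__main__":
    clean = [10.0, 10.0]
    before = [100.0, 100.0]   # noisy input, 10 dB above the clean speech
    after = [10.0, 1.0]       # second bin is over-suppressed
    print(spectral_distance_db(before, clean))
    print(spectral_distance_db(after, clean))
```

A negative distance in the second bin of `after` corresponds to the over-subtraction case in which the speech spectrum itself is distorted.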

[0007]

[Problems to be Solved by the Invention] However, a noise suppression device configured as described above cannot be said to suppress noise components efficiently when the level of the target speech or the level of the noise changes. In FIG. 16, for example, the spectral distance after noise suppression becomes negative in the frequency band above 400 Hz. This is because more of the noise spectrum N(ω) used in the estimation was subtracted than necessary, which distorts the speech spectrum. In speech recognition in particular, distortion of the speech spectrum lowers the recognition rate. One countermeasure is to reduce the suppression coefficient α, that is, to subtract the noise more lightly and thereby avoid spectral distortion. However, reducing α makes the noise more audible and worsens the S/N ratio after suppression. In other words, a trade-off arises: the overall noise suppression effect decreases.

[0008] The reasons suppression is inefficient include the fact that the amount to be suppressed varies with the frequency component, and the fact that suppression is applied even in spectral valleys such as those seen at several frequencies in FIG. 16, that is, at frequencies where the S/N ratio is already good.

[0009] FIG. 17 is a characteristic diagram showing the spectral distances obtained when conventional noise suppression is applied to three data sets in which speech and noise are mixed at different S/N ratios. Here the suppression coefficient is α = 0.5. The spectral distance after noise suppression is no longer negative, but it does not come close to 0 dB at any of the S/N ratios, and for some S/N ratios the noise is not suppressed efficiently. The figure therefore shows that the amount of noise suppression must be adjusted according to the S/N ratio.

[0010] The present invention has been made in view of these conventional problems. Its object is to realize a noise suppression method that optimally varies the amount of suppression according to the S/N ratio, and to provide a noise suppression device that can extract the desired speech while removing the perceived noise and without degrading the speech.

[0011]

[Means for Solving the Problems] To solve the above problems, the invention of claim 1 of the present application comprises: spectrum conversion means for converting a noisy input speech signal from the time domain to the frequency domain; S/N estimation means for estimating the S/N ratio of the input speech signal from the converted output of the spectrum conversion means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression; suppression amount estimation means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio estimated by the S/N estimation means and estimating the noise spectrum to be suppressed; and noise suppression means for suppressing the noise component contained in the converted output of the spectrum conversion means based on the suppression noise spectrum estimated by the suppression amount estimation means.
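The data flow of claim 1 (estimate the S/N ratio, look up a suppression coefficient, scale the noise estimate, subtract it from the input spectrum) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the band edges, the table values, the power-domain subtraction with clamping at zero, and the names `lookup_alpha` and `suppress` are all hypothetical and are not taken from the publication.

```python
from bisect import bisect_right

# Hypothetical table: S/N band edges in dB and one suppression coefficient per band.
# The values are placeholders chosen so that low S/N frames are suppressed harder.
SNR_EDGES_DB = [0.0, 10.0, 20.0]
ALPHAS = [2.0, 1.0, 0.5, 0.25]  # one more entry than there are edges


def lookup_alpha(snr_db):
    """Return the suppression coefficient stored for the band containing snr_db."""
    return ALPHAS[bisect_right(SNR_EDGES_DB, snr_db)]


def suppress(input_power, noise_power, snr_db):
    """Subtract alpha times the estimated noise power from each bin, clamped at zero."""
    alpha = lookup_alpha(snr_db)
    return [max(0.0, x - alpha * n) for x, n in zip(input_power, noise_power)]


if __name__ == "__main__":
    frame_power = [4.0, 1.0]
    noise_estimate = [1.0, 2.0]
    print(lookup_alpha(5.0))
    print(suppress(frame_power, noise_estimate, snr_db=5.0))
```

Keeping the coefficients in a table separates the choice of suppression strength from the suppression arithmetic, so the table can be tuned or replaced without touching the rest of the chain.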

[0012] The invention of claim 2 of the present application comprises: frame processing means for dividing a noisy input speech signal into units of a fixed number of samples and converting them into frames to output speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output by the frame processing means; input signal type determination means for generating, frame by frame from the speech data output by the frame processing means, an n-level evaluation index indicating the degree of noise-likeness or speech-likeness; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; per-spectral-component S/N estimation means for estimating the S/N ratio of each frequency component of the input speech signal based on the output of the power spectrum conversion means and the output of the noise spectrum learning means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the S/N ratio of the input speech signal and the frequency component as its two variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio of each frequency component output by the per-spectral-component S/N estimation means; suppression amount estimation means for estimating the noise spectrum to be suppressed by referring to the noise spectrum output by the noise spectrum learning means and using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing the noise component contained in the input speech signal based on the suppression noise spectrum estimated by the suppression amount estimation means and the power spectrum output by the power spectrum conversion means.
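A table with two variables, S/N ratio and frequency component, can be pictured as a small two-dimensional lookup. The Python sketch below is illustrative only: the per-bin S/N formula (input power minus learned noise power, divided by noise power), the band edges, the table values, and the names `bin_snr_db` and `alpha_for_bin` are assumptions, not details from the publication.

```python
import math
from bisect import bisect_right

# Hypothetical band edges and table values (rows: S/N band, columns: frequency band).
SNR_EDGES_DB = [0.0, 10.0]
FREQ_EDGES_HZ = [1000.0]
ALPHA_TABLE = [
    [2.0, 1.5],  # S/N below 0 dB
    [1.0, 0.8],  # S/N from 0 dB up to 10 dB
    [0.5, 0.3],  # S/N of 10 dB or more
]


def bin_snr_db(x_pow, n_pow, eps=1e-12):
    """Assumed per-bin S/N estimate in dB from input power and learned noise power."""
    speech = max(x_pow - n_pow, eps)
    return 10.0 * math.log10(speech / max(n_pow, eps))


def alpha_for_bin(x_pow, n_pow, freq_hz):
    """Pick the coefficient for one frequency bin from the two-variable table."""
    row = bisect_right(SNR_EDGES_DB, bin_snr_db(x_pow, n_pow))
    col = bisect_right(FREQ_EDGES_HZ, freq_hz)
    return ALPHA_TABLE[row][col]


if __name__ == "__main__":
    print(alpha_for_bin(101.0, 1.0, 500.0))   # high S/N, low-frequency bin
    print(alpha_for_bin(3.0, 1.0, 2000.0))    # moderate S/N, high-frequency bin
    print(alpha_for_bin(1.0, 1.0, 500.0))     # bin dominated by noise
```

Because the lookup runs once per bin, a bin with good S/N can receive a small coefficient while a noisy bin in the same frame receives a large one.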

[0013] The invention of claim 3 of the present application comprises: frame processing means for dividing a noisy input speech signal into units of a fixed number of samples and converting them into frames to output speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output by the frame processing means; input signal type determination means for generating, frame by frame from the speech data output by the frame processing means, an n-level evaluation index indicating the degree of noise-likeness or speech-likeness; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the evaluation index as its variable; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the evaluation index output by the input signal type determination means; suppression amount estimation means for estimating the noise spectrum to be suppressed by referring to the noise spectrum output by the noise spectrum learning means and using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing the noise component contained in the input speech signal based on the suppression noise spectrum estimated by the suppression amount estimation means and the power spectrum output by the power spectrum conversion means.

[0014] The invention of claim 4 of the present application comprises: frame processing means for dividing a noisy input speech signal into units of a fixed number of samples and converting them into frames to output speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output by the frame processing means; input signal type determination means for generating, frame by frame from the speech data output by the frame processing means, an n-level evaluation index indicating the degree of noise-likeness or speech-likeness; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; per-spectral-component S/N estimation means for estimating the S/N ratio of each frequency component of the input speech signal based on the output of the power spectrum conversion means and the output of the noise spectrum learning means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the n-level evaluation index, the S/N ratio of the input speech signal, and the frequency component as its three variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio of each frequency component output by the per-spectral-component S/N estimation means and the evaluation index output by the input signal type determination means; suppression amount estimation means for estimating the noise spectrum to be suppressed by referring to the noise spectrum output by the noise spectrum learning means and using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing the noise component contained in the input speech signal based on the suppression noise spectrum estimated by the suppression amount estimation means and the power spectrum output by the power spectrum conversion means.

[0015] The invention of claim 5 of the present application comprises: frame processing means for dividing a noisy input speech signal into units of a fixed number of samples and converting them into frames to output speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output by the frame processing means; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; per-spectral-component input signal type determination means for generating, frame by frame and for each frequency component from the power spectrum data output by the power spectrum conversion means, an n-level evaluation index indicating the degree of noise-likeness or speech-likeness; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the per-spectral-component input signal type determination means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the evaluation index and the frequency component as its two variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the evaluation index of each frequency component output by the per-spectral-component input signal type determination means; suppression amount estimation means for estimating the noise spectrum to be suppressed by referring to the noise spectrum output by the noise spectrum learning means and using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing the noise component contained in the input speech signal based on the suppression noise spectrum estimated by the suppression amount estimation means and the power spectrum output by the power spectrum conversion means.

[0016] The invention of claim 6 of the present application comprises: frame processing means for dividing a noisy input speech signal into units of a fixed number of samples and converting them into frames to output speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output by the frame processing means; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; per-spectral-component input signal type determination means for generating, frame by frame and for each frequency component from the power spectrum data output by the power spectrum conversion means, an n-level evaluation index indicating the degree of noise-likeness or speech-likeness; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the per-spectral-component input signal type determination means; per-spectral-component S/N estimation means for estimating a representative value of the S/N ratio within the frame, or the S/N ratio of each spectral component, using the power spectrum of the input signal within the frame output by the power spectrum conversion means and the power spectrum of the estimated noise within the frame output by the noise spectrum learning means; a suppression coefficient data table storing suppression coefficients that control the amount of noise suppression, with the S/N ratio, the evaluation index, and the frequency component as its three variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio output by the per-spectral-component S/N estimation means and the evaluation index of each frequency component output by the per-spectral-component input signal type determination means; suppression amount estimation means for estimating the noise spectrum to be suppressed by referring to the noise spectrum output by the noise spectrum learning means and using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing the noise component contained in the input speech signal based on the suppression noise spectrum estimated by the suppression amount estimation means and the power spectrum output by the power spectrum conversion means.

[0017] The invention of claim 7 of the present application comprises: pseudo speech data recording means for storing in advance pseudo speech data of a fixed frame length as time-domain speech data; speech mixing means for, only when the input speech signal is determined to be noise, mixing the pseudo speech data output by the pseudo speech data recording means with the input speech signal at a plurality of different S/N ratios and storing the result; first noise suppression means for performing noise suppression on the speech data mixed at the plurality of S/N ratios and output by the speech mixing means; spectrum comparison means for comparing the spectrum of the speech data output by the first noise suppression means with the spectrum of the pseudo speech data; a suppression coefficient data table in which the suppression coefficients that control the amount of noise suppression are updated based on the comparison result output by the spectrum comparison means and the updated suppression coefficients are stored; and second noise suppression means for retrieving a suppression coefficient from the suppression coefficient data table and performing noise suppression on the input speech signal.
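The mixing step of claim 7, in which stored pseudo speech is combined with captured noise at several S/N ratios, can be sketched in Python as follows. The sketch assumes that S/N is defined as the ratio of mean powers and that the noise is rescaled while the speech is left unchanged. The function name `mix_at_snr` is hypothetical, and both inputs are assumed to be equal-length sample lists with non-zero noise power.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mix has the requested S/N in dB."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose gain g so that speech_power / (g^2 * noise_power) == 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    pseudo_speech = [1.0, -1.0, 1.0, -1.0]
    captured_noise = [2.0, 2.0, -2.0, -2.0]
    # One mixture per target S/N; each would then be suppressed and compared
    # against the clean pseudo speech spectrum.
    for target_db in (0.0, 5.0, 10.0):
        print(target_db, mix_at_snr(pseudo_speech, captured_noise, target_db))
```

Because the clean pseudo speech is known, the suppressed output of each mixture can be compared against it, which is what makes updating the table entries possible.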

[0018] In the invention of claim 8 of the present application, the first and second noise suppression means of the noise suppression device according to claim 7 are each configured as the noise suppression device according to any one of claims 1 to 6 with the suppression coefficient data table removed.

[0019] In particular, according to the configuration of claim 1, the device holds a data table of suppression coefficients and determines the suppression coefficient by estimating the S/N ratio for each frequency component of the input signal. This allows noise suppression that adapts to changes in the sound quality and level of the input signal and of the noise.

[0020] In particular, according to the configuration of claim 2, noise and speech are distinguished frame by frame, the noise spectrum is learned only when a frame is determined to be noise, and the S/N ratio of the current frame is estimated for each frequency component from the input signal power spectrum of the current frame in order to determine the suppression coefficient. This makes the suppression amount adjustable and improves the accuracy of suppression.
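Learning the noise spectrum only from frames judged to be noise can be illustrated with the short Python sketch below. The exponential averaging rule, the `rate` parameter, and the function name `update_noise_spectrum` are assumptions chosen for illustration. The text here states only that learning happens on noise frames, not how the estimate is updated.

```python
def update_noise_spectrum(noise_est, frame_power, is_noise, rate=0.1):
    """Return the new per-bin noise estimate after seeing one frame.

    The estimate moves toward the frame's power spectrum only when the frame
    was classified as noise; speech frames leave it unchanged.
    """
    if not is_noise:
        return list(noise_est)
    return [(1.0 - rate) * n + rate * p for n, p in zip(noise_est, frame_power)]


if __name__ == "__main__":
    estimate = [1.0, 2.0]
    estimate = update_noise_spectrum(estimate, [3.0, 2.0], is_noise=True, rate=0.5)
    print(estimate)
    # A speech frame with high power must not leak into the noise estimate.
    estimate = update_noise_spectrum(estimate, [50.0, 50.0], is_noise=False)
    print(estimate)
```

Freezing the estimate during speech keeps speech energy out of the noise spectrum, so the per-bin S/N ratio computed against it stays meaningful.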

[0021] In particular, according to the configuration of claim 3, noise and speech are distinguished frame by frame and the suppression coefficient is determined from that distinction. A frame determined to be noise is heavily suppressed unconditionally, and a frame determined to be speech is suppressed so that only the speech is extracted, which improves the efficiency of suppression.
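Selecting the coefficient from a frame-level evaluation index can be sketched as a one-dimensional lookup. In the Python sketch below the number of levels, the direction of the index (0 as most speech-like), the table values, and the name `alpha_for_frame` are all hypothetical. How the index itself is computed is outside the scope of this sketch.

```python
# Hypothetical n = 4 level table: index 0 is the most speech-like frame,
# index 3 the most noise-like. Noise-like frames get a much larger coefficient.
ALPHA_BY_INDEX = [0.3, 0.6, 1.5, 4.0]


def alpha_for_frame(evaluation_index):
    """Return the coefficient for a frame, clamping out-of-range indices."""
    clamped = min(max(evaluation_index, 0), len(ALPHA_BY_INDEX) - 1)
    return ALPHA_BY_INDEX[clamped]


if __name__ == "__main__":
    for index in range(4):
        print(index, alpha_for_frame(index))
```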

[0022] In particular, according to the configuration of claim 4, the effects of claims 2 and 3 are obtained at the same time.

[0023] In particular, according to the configuration of claim 5, noise and speech are distinguished for each frequency component within a frame and the suppression coefficient is determined from that distinction. A frequency component determined to be noise is heavily suppressed unconditionally, and a frequency component determined to be speech is suppressed so that only the speech is extracted, which allows still more accurate suppression.

[0024] In particular, according to the configuration of claim 6, the effects of claims 2 and 5 are obtained at the same time.

[0025] In particular, according to the configurations of claims 7 and 8, the suppression coefficient data table is continually updated by learning, which allows noise suppression with better adaptability.

[0026]

[Embodiments of the Invention] The noise suppression device of the present invention adaptively changes the estimated noise suppression component according to the S/N ratio and frequency characteristics of the input signal, thereby extracting clear speech while removing the perceived noise and preventing degradation of the speech spectrum. This improves the noise suppression capability even when the S/N ratio is low, and extracts only the speech spectral components without leaving unpleasant sounds. Furthermore, providing the noise suppression device of the present invention for a speech recognition device, whose recognition rate drops sharply when the speech spectrum is degraded, can improve the speech recognition rate. Embodiments of the present invention are described below with reference to the drawings.

[0027] (Embodiment 1) A noise suppression device according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 5. FIG. 1 is a block diagram showing the basic configuration of the noise suppression device according to Embodiment 1. This noise suppression device comprises spectrum conversion means 11, S/N estimation means 12, a suppression coefficient data table 13, suppression amount estimation means 14, and noise suppression means 15.

[0028] The input signal is assumed to contain speech and stationary noise, or speech and slowly varying non-stationary noise. For example, suppose a driver speaks to a voice-response car navigation device, which is a voice-response device mounted in a vehicle. Road noise is then mixed into the driver's speech, so the S/N ratio of the speech signal deteriorates. In the following embodiments, such an input signal is assumed to be input to the noise suppression device.

[0029] Spectrum conversion means 11 converts an input signal containing speech and noise (non-speech) from the time domain to the frequency domain. S/N estimation means 12 distinguishes speech from non-speech and estimates the S/N ratio by analyzing, for example, the spectral distribution of the input signal in the frequency domain, the temporal change in the amplitude of the input signal, or the temporal changes in its low-amplitude and high-amplitude components. Distinguishing speech from noise is technically difficult in general; it is particularly difficult to identify the utterance of a specific speaker among many speakers, or to recognize the speaker's voice when sudden non-stationary noise occurs during the utterance. Because the environment described above is the target here, the S/N ratio can be estimated. Let N(ω) be the estimated noise spectrum and X(ω) the spectrum of the input signal; the S/N ratio is then estimated using, for example, the following equation (3).

(Equation 3)

However, the calculation is not limited to equation (3); the S/N ratio may also be calculated by the following equation (4).

(Equation 4)
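Since the images for equations (3) and (4) are not available in this text, the following is only a minimal sketch of one common way to estimate a frame-level S/N ratio from X(ω) and N(ω). It assumes both are power spectra given as lists; the function name `estimate_snr_db` and the clamping constant `eps` are hypothetical, and this is not claimed to be the formula of equation (3) or (4).

```python
import math

def estimate_snr_db(x_power, n_power, eps=1e-12):
    """Frame-level S/N in dB from the input power spectrum X and the
    estimated noise power spectrum N (an assumed definition)."""
    total = sum(x_power)
    noise = sum(n_power)
    # The difference can go negative when the noise estimate is too
    # large, so clamp it to a small positive value before the log.
    speech = max(total - noise, eps)
    return 10.0 * math.log10(speech / max(noise, eps))

# Example: total power 22, noise power 2 gives a ratio of 20 / 2 = 10.
snr = estimate_snr_db([11.0, 11.0], [1.0, 1.0])
```

The clamp is a design choice for numerical safety only; a real device would more likely hold the previous estimate when the subtraction fails.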

[0030] Suppression coefficient data table 13 is a table that stores values of the suppression coefficient α(S/N, ω), which differ according to two variables, the S/N ratio and the frequency. The suppression coefficient α is a scaling variable by which the estimated noise spectrum is multiplied. Suppression amount estimation means 14 selects the optimum suppression coefficient α from suppression coefficient data table 13 and estimates the noise spectrum to be suppressed. Noise suppression means 15 actually performs noise suppression on the frequency-domain input signal using the estimated noise spectrum output by suppression amount estimation means 14.
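The lookup of α(S/N, ω) and the scaling of the noise estimate can be sketched as follows. The bin edges and every coefficient value in `ALPHA_TABLE` are invented placeholders for illustration, not values from the patent, and the names `lookup_alpha` and `noise_to_suppress` are hypothetical.

```python
import bisect

# Hypothetical table layout: rows are S/N ranges, columns are bands.
SNR_EDGES_DB = [0.0, 10.0]        # rows: below 0 dB, 0 to 10 dB, 10 dB and up
BAND_EDGES_HZ = [1000.0, 4000.0]  # columns: low, mid, high band
ALPHA_TABLE = [
    [2.0, 1.8, 1.5],  # low S/N: suppress more strongly
    [1.5, 1.3, 1.1],
    [1.0, 0.9, 0.8],  # high S/N: suppress gently to avoid distortion
]

def lookup_alpha(snr_db, freq_hz):
    """Select alpha(S/N, omega) from the two-variable table."""
    row = bisect.bisect_right(SNR_EDGES_DB, snr_db)
    col = bisect.bisect_right(BAND_EDGES_HZ, freq_hz)
    return ALPHA_TABLE[row][col]

def noise_to_suppress(noise_power, freqs_hz, snr_db):
    """Scale each bin of the estimated noise spectrum by its alpha."""
    return [lookup_alpha(snr_db, f) * n
            for n, f in zip(noise_power, freqs_hz)]
```

A coarse binned table like this keeps memory small; interpolating between neighbouring entries would be an alternative if smoother behaviour were needed.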

[0031] FIG. 2 is a configuration diagram of the noise suppression device when a Wiener filter is used as noise suppression means 15 in FIG. 1. In addition to S/N estimation means 12, suppression coefficient data table 13, and suppression amount estimation means 14 of FIG. 1, this device comprises: an FFT 21 that performs frequency analysis of the input signal in real time; power spectrum conversion means 22 that converts the frequency-domain input signal into a power spectrum; Wiener filter estimation means 23 that estimates the filter parameters from the noise suppression amount estimated by suppression amount estimation means 14; filter coefficient derivation means 24 that includes an IFFT and derives the concrete filter coefficients; and filtering operation means 25 that suppresses noise by applying the obtained filter coefficients to the input signal.

[0032] Wiener filter estimation means 23 obtains the Wiener filter characteristic H(ω) from the estimated noise spectrum N(ω) estimated by suppression amount estimation means 14 and the spectrum X(ω) of the input signal obtained from power spectrum conversion means 22. This characteristic H(ω) is described by the following equation (5).

(Equation 5)
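The image for equation (5) is not available in this text. As a hedged illustration only, a widely used form of the Wiener gain with a suppression coefficient is H(ω) = (X(ω) − α·N(ω)) / X(ω), limited to non-negative values. The sketch below assumes X and N are power spectra and that α is given per bin; the function name `wiener_gain` and the `floor` parameter are hypothetical.

```python
def wiener_gain(x_power, n_power, alpha, floor=0.0):
    """Per-bin gain H = (X - alpha * N) / X, clamped to at least `floor`.

    An assumed textbook form, not necessarily identical to equation (5).
    """
    gains = []
    for x, n, a in zip(x_power, n_power, alpha):
        if x <= 0.0:
            # No input energy in this bin: nothing to pass through.
            gains.append(floor)
            continue
        gains.append(max((x - a * n) / x, floor))
    return gains

# Bin 0: (4 - 2*1) / 4 = 0.5. Bin 1: (1 - 2*1) / 1 is negative, so it
# is clamped to the floor.
h = wiener_gain([4.0, 1.0], [1.0, 1.0], [2.0, 2.0])
```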

[0033] Filtering operation means 25 suppresses the noise contained in the input signal by applying the filter coefficients obtained from filter coefficient derivation means 24 to the input signal. One example realization of the filter coefficients is the multiplication coefficients of the multipliers in an FIR filter. One example configuration of filtering operation means 25 is a convolution circuit consisting of a plurality of delay elements, multipliers, and an adder.
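The delay-multiply-add structure described for filtering operation means 25 corresponds to a direct-form FIR filter. A minimal software sketch of that structure follows; the function name `fir_filter` is hypothetical, and a hardware circuit would of course process samples one at a time rather than a whole list.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: a delay line, one multiplier per tap,
    and an adder that sums the products for each output sample."""
    delay_line = [0.0] * len(coeffs)
    out = []
    for sample in signal:
        # Shift the newest sample in and drop the oldest one.
        delay_line = [sample] + delay_line[:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return out

# Feeding a unit impulse returns the coefficients themselves.
response = fir_filter([1.0, 0.0, 0.0], [0.5, 0.25])
```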

[0034] FIG. 3 is a configuration diagram of the noise suppression device when spectral subtraction is used as noise suppression means 15 in FIG. 1. This device takes the speech signal as input frame by frame and comprises, in addition to S/N estimation means 12, suppression coefficient data table 13, suppression amount estimation means 14, FFT 21, and power spectrum conversion means 22, phase conversion means 31, a difference circuit 32, spectrum restoration means 33, an IFFT 34, and waveform connection means 35.

[0035] Phase conversion means 31 calculates the phase information of the input signal. Difference circuit 32 obtains the difference between the power spectrum of the input signal obtained from power spectrum conversion means 22 and the output of suppression amount estimation means 14, that is, the product of the estimated noise spectrum and the suppression coefficient α(S/N, ω). Spectrum restoration means 33 restores the suppressed spectrum from the phase information obtained from phase conversion means 31 and the suppressed power spectrum obtained from difference circuit 32. IFFT 34 converts the restored suppressed spectrum into a time-domain signal. Waveform connection means 35 joins the speech signals restored frame by frame.
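The path through phase conversion means 31, difference circuit 32, and spectrum restoration means 33 can be sketched for a single frame as below. The input is assumed to be a list of complex FFT bins, and negative differences are clamped to zero, which is a common convention that the text does not specify. The function name `spectral_subtract` is hypothetical; the FFT, IFFT, and frame joining are omitted.

```python
import cmath
import math

def spectral_subtract(spectrum, n_power, alpha):
    """Subtract alpha * N from the power spectrum of each bin and
    rebuild the complex spectrum with the original phase."""
    out = []
    for s, n, a in zip(spectrum, n_power, alpha):
        phase = cmath.phase(s)             # phase conversion means 31
        power = abs(s) ** 2                # power spectrum conversion means 22
        clean = max(power - a * n, 0.0)    # difference circuit 32 (clamped)
        # Spectrum restoration means 33: new magnitude, original phase.
        out.append(cmath.rect(math.sqrt(clean), phase))
    return out

# |3+4j|^2 = 25; removing 9 leaves 16, so the magnitude drops from 5 to 4
# while the direction of 3+4j is kept.
restored = spectral_subtract([3 + 4j], [9.0], [1.0])
```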

[0036] In Embodiment 1, S/N estimation means 12 may be replaced by input signal type determination means that determines how noise-like or speech-like the input signal is and outputs the result to suppression amount estimation means 14. S/N estimation means 12 and this input signal type determination means may also be used together. Suppression coefficient data table 13 may take as its variables the frequency and the S/N ratio; the frequency and the input signal type; or the frequency, the S/N ratio, and the input signal type. A data table using any of these sets of variables may be used.

[0037] FIG. 4 is a characteristic diagram showing the spectral distance before and after suppression when noise is suppressed by the noise suppression device of Embodiment 1. Compared with FIG. 16, FIG. 4 shows that excessive noise suppression, that is, speech distortion, is reduced, particularly in the high-frequency range.

[0038] FIG. 5 is a characteristic diagram showing the spectral distance when noise is suppressed by the noise suppression device of Embodiment 1, using noise and speech mixed at different S/N ratios. Unlike the characteristic shown in FIG. 17, the spectral distance in FIG. 5 is near 0 dB at every S/N ratio. This shows that the present embodiment suppresses noise efficiently even when the S/N ratio varies.

[0039] As described above, by holding a data table of suppression coefficients and determining the suppression coefficient from an estimate of the S/N ratio for each frequency component of the input signal, robust noise suppression (that is, suppression that adapts well to change) can be achieved in response to changes in the sound quality and level of the input signal and the noise. Because it can suppress noise with such high accuracy, the noise suppression device of the present embodiment is also effective as input preprocessing means for a speech recognition device.

[0040] (Embodiment 2) Next, a noise suppression device according to Embodiment 2 of the present invention will be described with reference to FIGS. 6 to 8. Parts identical to those of the conventional example and Embodiment 1 are given the same reference numerals, and their detailed description is omitted. FIG. 6 is a configuration diagram of the noise suppression device of the present embodiment. In addition to FFT 21, power spectrum conversion means 22, and suppression coefficient data table 13, this device comprises frame processing means 61, input signal type determination means 62, noise spectrum learning means 63, per-spectral-component S/N estimation means 64, suppression coefficient determination means 65, suppression amount estimation means 66, and noise suppression means 67.

[0041] Frame processing means 61 cuts the input speech signal into frames of fixed length. Input signal type determination means 62 receives the frame signals from frame processing means 61, distinguishes how noise-like or speech-like the signal in each frame is, and outputs the determination result. The degree of noise-likeness or speech-likeness may be determined in n levels. The determination takes the input signal frame by frame and decides whether it is speech or non-speech by analyzing, for example, the spectral distribution of the input signal in the frequency domain, the temporal change in its amplitude, or the temporal changes in its low-amplitude and high-amplitude components. When the input signal is the voice of a driver in a vehicle superimposed on road noise or engine noise, this determination is technically easy.

[0042] Noise spectrum learning means 63 learns the power spectrum of frames determined to be noise by input signal type determination means 62 and temporarily stores the learning result. Per-spectral-component S/N estimation means 64 estimates the S/N ratio for each frequency component within the frame from the frame's power spectrum obtained by power spectrum conversion means 22 and the noise spectrum learned and stored by noise spectrum learning means 63.

[0043] Suppression coefficient determination means 65 selects the optimum suppression coefficient from suppression coefficient data table 13 based on the per-component S/N ratio output by per-spectral-component S/N estimation means 64 and supplies it to suppression amount estimation means 66. Suppression amount estimation means 66 estimates the noise spectrum to be suppressed from the noise spectrum obtained from noise spectrum learning means 63 and the suppression coefficient determined by suppression coefficient determination means 65. Noise suppression means 67 actually performs noise suppression on the input signal based on the power spectrum of the input signal obtained from power spectrum conversion means 22 and the noise suppression amount estimated by suppression amount estimation means 66.

[0044] FIG. 7 is a configuration diagram of the noise suppression device when a Wiener filter is used as noise suppression means 67 in FIG. 6. As the internal configuration of noise suppression means 67, indicated by the dotted line, this device is provided with Wiener filter estimation means 23, IFFT 34, impulse response control means 77, and convolution operation means 78.

[0045] Wiener filter estimation means 23 estimates the Wiener filter characteristic H(ω) from the estimated noise spectrum N(ω) estimated by suppression amount estimation means 66 and the spectrum X(ω) of the input signal obtained from power spectrum conversion means 22. IFFT 34 converts the filter characteristic output by Wiener filter estimation means 23 into a time-domain control signal (an impulse response control signal). Impulse response control means 77 uses the impulse response control signal obtained for each frame to control the speech signal so that it changes smoothly between frames. Specifically, one example technique is, when moving from the impulse response sequence of the previous frame to that of the new frame, to change by only a fixed proportion rather than by 100%. This is effective for joining data across multiple frames without audible discontinuity.
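The "fixed proportion" update attributed to impulse response control means 77 amounts to moving each filter tap only part of the way toward its new value. A minimal sketch follows, in which the function name `smooth_impulse_response` and the default `ratio` of 0.5 are assumptions; the text gives no specific proportion.

```python
def smooth_impulse_response(previous, target, ratio=0.5):
    """Move each tap from the previous frame's impulse response toward
    the new frame's response by a fixed proportion (0 < ratio <= 1)."""
    return [p + ratio * (t - p) for p, t in zip(previous, target)]

# With ratio 0.25 each tap moves a quarter of the way to its new value.
h = smooth_impulse_response([1.0, 0.0], [0.0, 1.0], ratio=0.25)
```

A smaller ratio gives smoother transitions between frames at the cost of slower adaptation to a changing noise spectrum.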

[0046] Convolution operation means 78 convolves the input signal with the filter coefficients output as appropriate by impulse response control means 77. As suppression coefficient data table 13, a data table with two variables, the frequency and the S/N ratio, may be prepared. Instead of per-spectral-component S/N estimation means 64, S/N estimation means that estimates a representative value of the S/N ratio for each frame may be used. In that case, a data table with the S/N ratio as its only variable may be prepared.

[0047] FIG. 8 is an explanatory diagram showing the speech recognition rate when noise is suppressed by the noise suppression device of Embodiment 2. Five speakers each uttered 100 place names as the 100 test words; noise was added to the uttered speech so as to give S/N ratios from -2 dB to 10 dB, producing seven speech signals per word, which were used as the evaluation set for speech recognition. The place-name recognition rate was measured for each speech signal, and the averages were plotted. R3 denotes the recognition rate before noise suppression, R2 the recognition rate after noise suppression by the conventional method, and R1 the recognition rate after noise suppression by the method of the present embodiment. The results show that, particularly where the S/N ratio is poor, for example for speech signals with an S/N ratio of 0 dB or less, applying noise suppression markedly improves the speech recognition rate compared with applying none. In addition, the noise suppression method of the present embodiment improves the speech recognition rate further compared with the conventional noise suppression method.

[0048] As described above, Embodiment 2 distinguishes noise-likeness from speech-likeness for each frame, learns the noise spectrum only when the frame is judged to be noise, estimates the S/N ratio of the current frame for each frequency component from the power spectrum of the current frame, and determines the suppression coefficient accordingly. This processing enables robust adjustment of the suppression amount and improves the accuracy of suppression.

[0049] (Embodiment 3) Next, a noise suppression device according to Embodiment 3 of the present invention will be described with reference to FIG. 9. FIG. 9 is a configuration diagram of the noise suppression device of the present embodiment; all of its components conform to those of the noise suppression device of Embodiment 2 shown in FIG. 7. Here, the output of input signal type determination means 62 is input to suppression coefficient determination means 65, so that the suppression coefficient α is determined by input signal type determination means 62 instead of per-spectral-component S/N estimation means 64 of FIG. 7. Although a Wiener filter is shown as noise suppression means 67 indicated by the dotted line, other noise suppression means may be used. Input signal type determination means 62 may make a two-level decision between noise and speech, or may determine the degree in n levels. As suppression coefficient data table 13 here, a table with two variables, the frequency and the input signal type, may be prepared.

[0050] In this way, by distinguishing noise-likeness from speech-likeness for each frame and determining the suppression coefficient from that distinction, the input signal can be suppressed heavily and unconditionally for noise-like frames, while for speech-like frames suppression can be applied so as to extract only the speech. This makes suppression more efficient.

[0051] When the S/N ratio is used as a table variable, as in Embodiment 2, the number of variable values can make the data table large, which may be a disadvantage for hardware implementation. In the present embodiment, by contrast, n in the n-level noise-or-speech determination may be any number, and setting it to the minimum of 2 causes little trouble, which is advantageous for hardware implementation. Such a noise suppression device is therefore also effective as input preprocessing means for speech recognition functions mounted in small devices such as mobile phones and IC cards.

[0052] (Embodiment 4) Next, a noise suppression device according to Embodiment 4 of the present invention will be described with reference to FIG. 10. FIG. 10 is a configuration diagram of the noise suppression device of the present embodiment; all of its components conform to FIG. 7 of Embodiment 2. The feature of the present embodiment, however, is that two pieces of information, from input signal type determination means 62 and per-spectral-component S/N estimation means 64, are used as inputs to suppression coefficient determination means 65, which determines the suppression coefficient α. Although FIG. 10 shows a configuration in which a Wiener filter is adopted as noise suppression means 67, other noise suppression means may be used. As suppression coefficient data table 13, data sheets with three variables, the frequency, the input signal type, and the S/N ratio, may be prepared.

[0053] Instead of per-spectral-component S/N estimation means 64, S/N estimation means that estimates the S/N ratio for each frame may be used. In that case, suppression coefficient data table 13 may consist of n data sheets, one per input signal type, each with the S/N ratio as its only variable.

[0054] As described above, the present embodiment determines the noise-likeness or speech-likeness of each frame in n levels and learns the noise spectrum only when the frame is judged to be noise-like. It also continually estimates the S/N ratio for each frequency component from the power spectrum within the frame, and selects a data sheet according to the n-level determination of the frame's signal type. That is, the suppression coefficient is then determined from the selected sheet by two variables, the S/N ratio and the frequency. This processing allows the suppression amount to be adjusted even more robustly than in Embodiments 2 and 3, so that suppression can be performed with high accuracy.

[0055] (Embodiment 5) Next, a noise suppression device according to Embodiment 5 of the present invention will be described with reference to FIG. 11. FIG. 11 is a configuration diagram of the noise suppression device of the present embodiment; all of its components conform to FIG. 7 of Embodiment 2. However, per-spectral-component input signal type determination means 111 is provided in order to determine the noise-likeness or speech-likeness of each frame of the input signal in n levels for each spectral component. The information from per-spectral-component input signal type determination means 111 is used as the input to suppression coefficient determination means 65, which determines the suppression coefficient α.

[0056] The difference from Embodiment 3 is that, whereas input signal type determination means 62 in FIG. 9 made an n-level determination of the type (noise, speech, and so on) for each frame, the present embodiment makes the n-level determination for each frequency component within the frame. Although FIG. 11 shows a configuration in which a Wiener filter is adopted as noise suppression means 67, other noise suppression means may be used. As suppression coefficient data table 13, a table with two variables, the frequency and the input signal type, may be prepared.

[0057] As described above, by distinguishing noise-likeness from speech-likeness for each frequency component within the frame and determining the suppression coefficient from that distinction, the input signal can be suppressed heavily and unconditionally at noise-like frequency components, while at speech-like frequency components suppression can be applied so as to extract only the speech. Suppression can thus be performed more accurately than in Embodiment 3.

[0058] (Embodiment 6) Next, a noise suppression device according to Embodiment 6 of the present invention will be described with reference to FIG. 12. FIG. 12 is a configuration diagram of the noise suppression device of the present embodiment; all of its components conform to FIG. 7 of Embodiment 2. It resembles Embodiment 4 in that information from two sources, per-spectral-component input signal type determination means 111 and per-spectral-component S/N estimation means 64, is used as input to suppression coefficient determination means 65, which determines the suppression coefficient α. Whereas in Embodiment 4 input signal type determination means 62 made an n-level determination of the type (noise, speech, and so on) for each frame, per-spectral-component input signal type determination means 111 of the present embodiment makes an n-level determination of noise-likeness or speech-likeness for each frequency component within the frame. Although FIG. 12 shows a configuration in which a Wiener filter is adopted as noise suppression means 67, other noise suppression means may be used.

[0059] As suppression coefficient data table 13, a table with three variables, the frequency, the input signal type, and the S/N ratio, may be prepared. Instead of per-spectral-component S/N estimation means 64, S/N estimation means that estimates a representative value of the S/N ratio for each frame may be used.

[0060] As described above, by distinguishing noise-likeness from speech-likeness for each frequency component within the frame, also calculating the S/N ratio for each spectral component, and determining the suppression coefficient from these two pieces of data, the signal can be suppressed heavily and unconditionally at noise-like frequency components, while at speech-like frequency components suppression can be applied so as to extract only the speech. Suppression can thus be performed more accurately than in Embodiment 4. For example, when such a noise suppression device was placed in front of a speech recognition device, the word recognition rate described above improved from 75% to 82%.

[0061] (Embodiment 7) Next, a noise suppression device according to Embodiment 7 of the present invention will be described with reference to FIGS. 13 and 14. FIG. 13 is a block diagram showing the basic configuration of the noise suppression device according to Embodiment 7. This device comprises pseudo speech data recording means 131, speech mixing means 132, first noise suppression means 133, spectrum comparison means 134, second noise suppression means 135, and suppression coefficient data table 13.

[0062] The pseudo-speech data recording means 131 stores pseudo-speech data in the time domain. The speech mixing means 132 determines the type of the input signal and, when it judges the input to be noise, retrieves pseudo-speech data from the pseudo-speech data recording means 131 and mixes it with the input at several S/N ratios. The noise determination is not strictly necessary; the pseudo-speech data may always be mixed with the input signal. Alternatively, noise that has been learned and averaged in the spectral domain may be converted back to a time waveform and used for the mixing.
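The mixing step of paragraph [0062] might look like the sketch below. It assumes that the pseudo-speech is the signal that gets rescaled (the text does not say whether speech or noise is scaled), that S/N is measured as a ratio of mean powers over the whole segment, and that the S/N values 0, 10 and 20 dB are just example choices. The function names are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a time-domain segment."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale the pseudo-speech so that speech power / noise power equals snr_db, then add."""
    ps, pn = mean_power(speech), mean_power(noise)
    if ps == 0.0 or pn == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    gain = math.sqrt(pn / ps * 10.0 ** (snr_db / 10.0))
    return [gain * s + n for s, n in zip(speech, noise)]


def mix_at_several_snrs(speech, noise, snrs_db=(0.0, 10.0, 20.0)):
    """Return one mixed signal per requested S/N ratio, keyed by the S/N in dB."""
    return {snr: mix_at_snr(speech, noise, snr) for snr in snrs_db}
```

Each entry of the returned dictionary would then be passed to the first noise suppression means.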

[0063] The noise suppression means 133 is a first noise suppression means that suppresses noise in the speech mixed at each S/N ratio using a constant suppression coefficient α. Its configuration is identical to that of the noise suppression device of any of Embodiments 1 to 6 with the suppression coefficient data table removed. The spectrum comparison means 134 compares the spectrum of the pseudo-speech data with the spectrum of the speech data for each S/N ratio after noise suppression by the first noise suppression means 133, and evaluates the amount of suppression using the average spectral distance within the frame as the evaluation measure. The comparison and evaluation method need not be the spectral distance described above; any method that can evaluate how each spectral component is suppressed may be used.
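One way the comparison in paragraph [0063] could be computed is sketched below. The spectral distance itself is defined earlier in the description; this example substitutes a mean absolute log-power difference in dB as a stand-in, which is an assumption, as are the function names and the `floor` guard against taking the logarithm of zero.

```python
import math


def frame_spectral_distance(ref_power, test_power, floor=1e-12):
    """Mean absolute log-power difference (dB) over the bins of one frame."""
    if len(ref_power) != len(test_power):
        raise ValueError("spectra must have the same number of bins")
    total = 0.0
    for r, t in zip(ref_power, test_power):
        total += abs(10.0 * math.log10(max(t, floor) / max(r, floor)))
    return total / len(ref_power)


def average_spectral_distance(ref_frames, test_frames):
    """Average the per-frame distance over a sequence of frames."""
    distances = [frame_spectral_distance(r, t)
                 for r, t in zip(ref_frames, test_frames)]
    return sum(distances) / len(distances)
```

A distance of zero means the suppressed output has the same power spectrum as the pseudo-speech; larger values indicate either residual noise or over-suppressed speech.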

[0064] The noise suppression means 135, which is the second noise suppression means, has the same configuration as the noise suppression device of any of Embodiments 1 to 6 with the suppression coefficient data table removed. Based on the comparison result from the spectrum comparison means 134, it selects the optimum suppression coefficient α from the suppression coefficient data table 13 and suppresses the noise in the input signal. In this case as well, more robust noise suppression can be achieved.

[0065] FIG. 14 is a configuration diagram of a noise suppression device showing a more specific configuration example of Embodiment 7. This device is characterized by combining the noise suppression means 133 and the noise suppression means 135 of FIG. 13 into one: the noise suppression means 136, enclosed by the dotted line in the figure, corresponds to the noise suppression means 133 and 135 of FIG. 13. Within the noise suppression means 136, FFT 141 is the first FFT, that is, the FFT inside the first noise suppression means 133 of FIG. 13. FFT 142 is a second FFT that divides the noise-suppressed speech data into frames again and obtains the power spectrum of each frame. The data table determination means 143 determines the value of the suppression coefficient α for each S/N ratio from the spectral distance. The remaining components in FIG. 14 combine the components of Embodiment 2 shown in FIG. 7 with the pseudo-speech data recording means 131 and the speech mixing means 132 shown in FIG. 13.
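The role of the data table determination means 143 can be illustrated with the sketch below. The text does not spell out how α is derived from the spectral distance, so this example assumes a simple grid search: for each S/N ratio, try a few candidate coefficients and keep the one whose suppressed spectrum is closest to the pseudo-speech spectrum. The candidate grid, the spectral-subtraction suppression step, the dB distance, and all names are hypothetical.

```python
import math

CANDIDATE_ALPHAS = (0.5, 1.0, 2.0, 4.0)  # hypothetical search grid


def subtract(mixed_power, noise_estimate, alpha):
    """Spectral-subtraction style suppression, floored at zero."""
    return [max(p - alpha * n, 0.0) for p, n in zip(mixed_power, noise_estimate)]


def distance_db(ref_power, test_power, floor=1e-12):
    """Assumed distance: mean absolute log-power difference in dB."""
    diffs = [abs(10.0 * math.log10(max(t, floor) / max(r, floor)))
             for r, t in zip(ref_power, test_power)]
    return sum(diffs) / len(diffs)


def determine_table(cases_by_snr, noise_estimate):
    """Map each S/N ratio to the candidate alpha with the smallest distance.

    cases_by_snr maps an S/N value (dB) to a pair
    (pseudo_speech_power, mixed_power) of per-bin power spectra.
    """
    table = {}
    for snr, (reference, mixed) in cases_by_snr.items():
        table[snr] = min(
            CANDIDATE_ALPHAS,
            key=lambda a: distance_db(reference, subtract(mixed, noise_estimate, a)),
        )
    return table
```

When the learned noise estimate is lower than the actual noise, the search in this sketch settles on a coefficient above 1 to compensate, which is the kind of adaptation the table update is meant to capture.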

[0066] Such a configuration is advantageous for hardware implementation. However, the noise suppression means 133 and the noise suppression means 135 need not be combined into one; if hardware resources allow, two separate noise suppression means may be used. Changing the suppression coefficient frame by frame, as in this example, realizes more robust noise suppression.

[0067] As described above, by always adaptively calculating and using the suppression coefficient data table, noise suppression that better tracks changes in the spectral shape and level of the noise can be performed.

【００６８】[0068]

[Effects of the Invention] As described above, the noise suppression devices of claims 1, 2, 4, and 6 hold a suppression coefficient data table and determine the suppression coefficient by estimating the S/N ratio for each frequency component of the input signal. This allows robust noise suppression that adapts to changes in the sound quality and level of the input signal and the noise. Because they suppress noise with such high accuracy, these devices are also well suited as input preprocessing means for a speech recognition function.

[0069] According to the noise suppression device of claim 3, noise-likeness or speech-likeness is determined for each frame and the suppression coefficient is determined from that result. A frame that appears to be noise is unconditionally suppressed strongly, while a frame that appears to be speech is suppressed so that only the speech is extracted, which improves suppression efficiency. In the noise suppression device of claim 2, the data table becomes large because of the size of the S/N ratio variable range; by contrast, the n of the n-stage noise/speech determination can be small, which is advantageous for hardware implementation.

[0070] According to the noise suppression device of claim 5, noise and speech are distinguished for each frequency component in the frame and the suppression coefficient is determined from that distinction. Suppression is therefore more accurate than with the noise suppression device of claim 3, while the hardware scale remains small.

[0071] According to the noise suppression devices of claims 7 and 8, the suppression coefficient data table is always adaptively updated and used. These devices can therefore track changes in the sound quality and level of the input signal and the noise better than the noise suppression devices of claims 1, 2, 4, and 6.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a noise suppression device according to Embodiment 1 of the present invention.

FIG. 2 is a block diagram showing the configuration of a noise suppression device using a Wiener filter, as a specific example of Embodiment 1 of the present invention.

FIG. 3 is a block diagram showing the configuration of a noise suppression device using spectral subtraction, as a specific example of Embodiment 1 of the present invention.

FIG. 4 is a characteristic diagram showing the spectral distance for each frequency component before and after noise suppression in the noise suppression device of Embodiment 1.

FIG. 5 is a characteristic diagram showing the spectral distance for each S/N and each frequency component in the noise suppression device of Embodiment 1.

FIG. 6 is a block diagram showing the basic configuration of a noise suppression device according to Embodiment 2 of the present invention.

FIG. 7 is a block diagram showing the configuration of a noise suppression device using a Wiener filter, as a specific example of Embodiment 2 of the present invention.

FIG. 8 is an explanatory diagram showing the speech recognition rate when the noise suppression devices according to Embodiments 1 and 2 of the present invention are used as preprocessing for a speech recognition device.

FIG. 9 is a block diagram showing the configuration of a noise suppression device according to Embodiment 3 of the present invention.

FIG. 10 is a block diagram showing the configuration of a noise suppression device according to Embodiment 4 of the present invention.

FIG. 11 is a block diagram showing the configuration of a noise suppression device according to Embodiment 5 of the present invention.

FIG. 12 is a block diagram showing the configuration of a noise suppression device according to Embodiment 6 of the present invention.

FIG. 13 is a block diagram showing the configuration of a noise suppression device according to Embodiment 7 of the present invention.

FIG. 14 is a block diagram showing the configuration of a noise suppression device using a Wiener filter, as a specific example of Embodiment 7 of the present invention.

FIG. 15 is a block diagram showing a configuration example of a typical conventional noise suppression device.

FIG. 16 is a characteristic diagram showing the spectral distance for each frequency component in the conventional noise suppression device.

FIG. 17 is a characteristic diagram showing the spectral distance for each S/N and each frequency component in the conventional noise suppression device.

[Explanation of symbols]

11 Spectrum conversion means
12 S/N estimation means
13 Suppression coefficient data table
14 Suppression amount estimation means
15, 136 Noise suppression means
21 FFT
22 Power spectrum conversion means
23 Wiener filter estimation means
24 Filter coefficient derivation means
25 Filtering calculation means
31 Phase conversion means
32 Difference circuit
33 Spectrum restoration means
34 IFFT
35 Waveform connection means
61 Frame processing means
62 Input signal type determination means
63 Noise spectrum learning means
64 Per-spectral-component S/N estimation means
65 Suppression coefficient determination means
66 Suppression amount estimation means
67 Noise suppression means
77 Impulse response control means
78 Convolution operation means
111 Per-spectral-component input signal type determination means
131 Pseudo-speech data recording means
132 Speech mixing means
133 First noise suppression means
134 Spectrum comparison means
135 Second noise suppression means
141 First FFT
142 Second FFT
143 Data table determination means

Continuation of front page: (72) Inventor: Satoru Ibaraki, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture. F-term (reference): 9A001 BB02 BB03 GG01 GG03 HH01 HH16 HH17 HH18 JJ05 KK31

Claims

[Claims]

1. A noise suppression device comprising: spectrum conversion means for converting an input speech signal containing noise from the time domain to the frequency domain; S/N estimation means for estimating the S/N ratio of the input speech signal from the converted output of the spectrum conversion means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression; suppression amount estimation means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio estimated by the S/N estimation means and estimating a noise spectrum to be suppressed; and noise suppression means for suppressing a noise component contained in the converted output of the spectrum conversion means based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means.

2. A noise suppression device comprising: frame processing means for dividing an input speech signal containing noise into units of a fixed number of samples to form frames and outputting speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output from the frame processing means; input signal type determination means for generating, from the speech data output from the frame processing means, an n-stage evaluation index indicating the degree of noise-likeness or speech-likeness on a frame-by-frame basis; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; per-spectral-component S/N estimation means for estimating the S/N ratio of each frequency component of the input speech signal based on the output of the power spectrum conversion means and the output of the noise spectrum learning means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the S/N ratio and the frequency component of the input speech signal as its two variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio of each frequency component output from the per-spectral-component S/N estimation means; suppression amount estimation means for referring to the noise spectrum output from the noise spectrum learning means and estimating a noise spectrum to be suppressed using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing a noise component contained in the input speech signal based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means and the power spectrum output from the power spectrum conversion means.

3. A noise suppression device comprising: frame processing means for dividing an input speech signal containing noise into units of a fixed number of samples to form frames and outputting speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output from the frame processing means; input signal type determination means for generating, from the speech data output from the frame processing means, an n-stage evaluation index indicating the degree of noise-likeness or speech-likeness on a frame-by-frame basis; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the evaluation index as its variable; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the evaluation index output from the input signal type determination means; suppression amount estimation means for referring to the noise spectrum output from the noise spectrum learning means and estimating a noise spectrum to be suppressed using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing a noise component contained in the input speech signal based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means and the power spectrum output from the power spectrum conversion means.

4. A noise suppression device comprising: frame processing means for dividing an input speech signal containing noise into units of a fixed number of samples to form frames and outputting speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output from the frame processing means; input signal type determination means for generating, from the speech data output from the frame processing means, an n-stage evaluation index indicating the degree of noise-likeness or speech-likeness on a frame-by-frame basis; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the input signal type determination means; per-spectral-component S/N estimation means for estimating the S/N ratio of each frequency component of the input speech signal based on the output of the power spectrum conversion means and the output of the noise spectrum learning means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the n-stage evaluation index, the S/N ratio of the input speech signal, and the frequency component as its variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio of each frequency component output from the per-spectral-component S/N estimation means and the evaluation index output from the input signal type determination means; suppression amount estimation means for referring to the noise spectrum output from the noise spectrum learning means and estimating a noise spectrum to be suppressed using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing a noise component contained in the input speech signal based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means and the power spectrum output from the power spectrum conversion means.

5. A noise suppression device comprising: frame processing means for dividing an input speech signal containing noise into units of a fixed number of samples to form frames and outputting speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output from the frame processing means; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; per-spectral-component input signal type determination means for generating, from the power spectrum data output from the power spectrum conversion means, an n-stage evaluation index indicating the degree of noise-likeness or speech-likeness for each frequency component on a frame-by-frame basis; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the per-spectral-component input signal type determination means; a suppression coefficient data table storing a plurality of suppression coefficients that control the amount of noise suppression, with the evaluation index and the frequency component as its two variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the evaluation index for each frequency component output from the per-spectral-component input signal type determination means; suppression amount estimation means for referring to the noise spectrum output from the noise spectrum learning means and estimating a noise spectrum to be suppressed using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing a noise component contained in the input speech signal based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means and the power spectrum output from the power spectrum conversion means.

6. A noise suppression device comprising: frame processing means for dividing an input speech signal containing noise into units of a fixed number of samples to form frames and outputting speech data; spectrum conversion means for obtaining a frequency spectrum of the speech data output from the frame processing means; power spectrum conversion means for converting the output of the spectrum conversion means into a power spectrum; per-spectral-component input signal type determination means for generating, from the power spectrum data output from the power spectrum conversion means, an n-stage evaluation index indicating the degree of noise-likeness or speech-likeness for each frequency component on a frame-by-frame basis; noise spectrum learning means for learning a noise spectrum based on the output of the power spectrum conversion means and the output of the per-spectral-component input signal type determination means; per-spectral-component S/N estimation means for estimating a representative value of the S/N ratio in the frame, or the S/N ratio of each spectral component, using the power spectrum of the input signal in the frame output from the power spectrum conversion means and the power spectrum of the estimated noise in the frame output from the noise spectrum learning means; a suppression coefficient data table storing suppression coefficients that control the amount of noise suppression, with the S/N ratio, the evaluation index, and the frequency component as its three variables; suppression coefficient determination means for retrieving a desired suppression coefficient from the suppression coefficient data table based on the S/N ratio output from the per-spectral-component S/N estimation means and the evaluation index for each frequency component output from the per-spectral-component input signal type determination means; suppression amount estimation means for referring to the noise spectrum output from the noise spectrum learning means and estimating a noise spectrum to be suppressed using the suppression coefficient determined by the suppression coefficient determination means; and noise suppression means for suppressing a noise component contained in the input speech signal based on the noise spectrum to be suppressed that is estimated by the suppression amount estimation means and the power spectrum output from the power spectrum conversion means.

7. A learning-type noise suppression device comprising: pseudo-speech data recording means for storing in advance pseudo-speech data of a predetermined frame length as time-domain speech data; speech mixing means for mixing, only when an input speech signal is determined to be noise, the pseudo-speech data output from the pseudo-speech data recording means with the input speech signal at a plurality of different S/N ratios and holding the results; first noise suppression means for suppressing noise in the speech data mixed at the plurality of S/N ratios and output from the speech mixing means; spectrum comparison means for comparing the spectrum of the speech data output from the first noise suppression means with the spectrum of the pseudo-speech data; a suppression coefficient data table that updates the suppression coefficient controlling the amount of noise suppression based on the comparison result output from the spectrum comparison means and stores the updated suppression coefficient; and second noise suppression means for retrieving the suppression coefficient from the suppression coefficient data table and suppressing noise in the input speech signal.

8. The noise suppression device according to claim 7, wherein the first and second noise suppression means are each obtained by removing the suppression coefficient data table from the noise suppression device according to claim 1.