JPWO2016038704A1

JPWO2016038704A1 - Noise suppression device, noise suppression method, and noise suppression program

Info

Publication number: JPWO2016038704A1
Application number: JP2016547306A
Authority: JP
Inventors: 訓古田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2014-09-10
Filing date: 2014-09-10
Publication date: 2017-04-27
Anticipated expiration: 2034-09-10
Also published as: JP6261749B2; WO2016038704A1

Abstract

A noise suppression device includes: a target noise spectrum generation unit 6 that uses information related to an input signal to generate a target noise spectrum from target noise spectrum candidates, which are noise spectra generated in advance for a plurality of frequency shapes; a suppression amount limiting coefficient calculation unit 7 that calculates, on the basis of the generated target noise spectrum, a suppression amount limiting coefficient defining the upper and lower limits of the amount by which noise contained in the input signal is suppressed; and a suppression amount calculation unit 9 that calculates a spectrum suppression amount using the calculated suppression amount limiting coefficient.

Description

The present invention relates to a noise suppression device, a noise suppression method, and a noise suppression program that suppress background noise superimposed on an input signal.

With recent advances in digital signal processing technology, outdoor voice calls on mobile phones, hands-free voice calls in automobiles, and hands-free operation by voice recognition have become widespread. Because devices that provide these functions are often used in high-noise environments, background noise enters the microphone together with the speech, degrading call quality and lowering the voice recognition rate. Comfortable voice calls and accurate voice recognition therefore require noise suppression processing that suppresses the background noise mixed into the input signal.

In one conventional noise suppression method, for example, a time-domain input signal is converted into a power spectrum, which is a frequency-domain signal; a suppression amount for noise suppression is calculated from the power spectrum of the input signal and an estimated noise spectrum estimated separately from the input signal; the amplitude of the power spectrum of the input signal is suppressed using the obtained suppression amount; and the amplitude-suppressed power spectrum and the phase spectrum of the input signal are converted back to the time domain to obtain a noise-suppressed signal (see, for example, Non-Patent Document 1).

This conventional method calculates the suppression amount from the ratio of the speech power spectrum to the estimated noise power spectrum (the SN ratio). It is effective when the noise superimposed on the input signal is reasonably stationary in the time and frequency directions. When noise that is non-stationary in time and frequency is input, however, the suppression amount cannot be calculated correctly, and harsh, artificial residual noise known as musical tones occurs.

To address this problem, Patent Document 1, for example, discloses a method that makes the harsh residual noise less audible by adding the input signal (the original sound), with its level suitably adjusted, to the output signal after noise suppression.

As a different approach, Patent Document 2 discloses a method in which a single predetermined target noise spectrum is set in advance to stabilize noise suppression, and the noise suppression amount is controlled so that the residual noise spectrum approaches that target noise spectrum. This suppresses the generation of musical noise even for non-stationary noise and yields natural, stable noise suppression.

Patent Document 1: JP 2000-82999 A
Patent Document 2: European Patent Application Publication No. 1995722

Non-Patent Document 1: Y. Ephraim, D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Trans. ASSP, vol. ASSP-32, no. 6, Dec. 1984

However, because the technique of Patent Document 1 adds a processed signal to the output signal, it changes the timbre of the output signal or makes the speech signal sound noisy.

The technique of Patent Document 2 controls the spectrum of the residual noise after noise suppression, on the basis of the power in a predetermined band, so that it approaches a single predetermined target noise spectrum. It therefore does not introduce the problems of Patent Document 1, but it has the problems described below.

The technique of Patent Document 2 and its problems are described with reference to FIG. 14 and FIG. 15. FIG. 14 and FIG. 15 schematically show the prior art described in Patent Document 2; the vertical axis indicates signal amplitude (in decibels, dB) and the horizontal axis indicates frequency (0 to 4000 Hz).
FIG. 14 shows the spectrum of in-vehicle noise when a vehicle travels at high speed (70 km/h and 160 km/h). Spectrum Na is the estimated noise spectrum of the input signal when the vehicle travels at 70 km/h, and spectrum Nb is the estimated noise spectrum of the input signal when the vehicle travels at 160 km/h. Here, the estimated noise spectrum of the input signal is the spectrum estimated from the running noise mixed into the input signal.
As regions A and B show, the frequency characteristics of the noise differ with the traveling speed of the vehicle. FIG. 15 shows the result of applying the prior art of Patent Document 2 to the estimated noise spectra Na and Nb of FIG. 14.

FIG. 15(a) shows noise suppression in a vehicle traveling at 70 km/h, and FIG. 15(b) shows noise suppression in a vehicle traveling at 160 km/h.
Spectra Na and Nb are the estimated noise spectra, spectra Ra and Rb are the target noise spectra, and spectra Sa and Sb are the residual noise spectra. In the method of Patent Document 2, the maximum suppression amount for noise suppression is controlled so that, in the bands Xa and Xb used to determine the reference suppression amount, the levels of the residual noise spectra Sa and Sb match the amplitude levels of the target noise spectra Ra and Rb (see positions Ya and Yb within bands Xa and Xb). Noise suppression is then applied to the estimated noise spectra Na and Nb on the basis of the controlled maximum suppression amount. Specifically, noise suppression based on the maximum suppression amount is performed in the directions indicated by arrows Z_a1, Z_a2, and Z_a3 in FIG. 15(a) and arrows Z_b1, Z_b2, and Z_b3 in FIG. 15(b).

When, as shown in FIG. 15(a), the shape and power of the target noise spectrum Ra roughly match the shape and power of the estimated noise spectrum Na, the residual noise spectrum Sa after noise suppression is satisfactory.

On the other hand, when, as shown in FIG. 15(b), the shape and power of the target noise spectrum Rb differ greatly from those of the estimated noise spectrum Nb, the residual noise spectrum Sb after noise suppression does not match the shape and power of the target noise spectrum Rb, and further suppression control is applied to bring the residual noise spectrum Sb into line with the frequency characteristics of the target noise spectrum Rb. This produces bands that are extremely over-suppressed, as in region C (see arrow Z_c), or extremely under-suppressed, as in region D (see arrow Z_d). These bands cause the speech to sound distorted, muffled, or noisy.

The present invention has been made to solve the problems described above, and its object is to achieve good noise suppression that does not make the speech sound distorted, muffled, or noisy.

The noise suppression device according to the present invention includes: a target noise spectrum generation unit that uses information related to an input signal to generate a target noise spectrum from target noise spectrum candidates, which are noise spectra generated in advance for a plurality of frequency shapes; a suppression amount limiting coefficient calculation unit that calculates, on the basis of the generated target noise spectrum, a suppression amount limiting coefficient defining the upper and lower limits of the amount by which noise contained in the input signal is suppressed; and a suppression amount calculation unit that calculates a spectrum suppression amount using the calculated suppression amount limiting coefficient.

According to the present invention, no extremely over-suppressed or under-suppressed bands are produced, the generation of musical noise is suppressed, and good noise suppression that does not make the speech sound distorted, muffled, or noisy can be achieved.

FIG. 1 is a block diagram showing the configuration of the noise suppression device according to Embodiment 1.
FIG. 2 is a flowchart showing the operation of the noise suppression device according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of the target noise spectrum generation unit of the noise suppression device according to Embodiment 1.
FIG. 4 is a diagram showing an example of the target noise spectra stored in the target noise spectrum memory of the noise suppression device according to Embodiment 1.
FIG. 5 is a flowchart showing the operation of the target noise spectrum generation unit of the noise suppression device according to Embodiment 1.
FIG. 6 is a block diagram showing the configuration of the suppression amount limiting coefficient calculation unit of the noise suppression device according to Embodiment 1.
FIG. 7 is a flowchart showing the operation of the suppression amount limiting coefficient calculation unit of the noise suppression device according to Embodiment 1.
FIG. 8 is a diagram schematically showing the residual noise spectrum after noise suppression processing by the noise suppression device according to Embodiment 1.
FIG. 9 is a block diagram showing the configuration of the target noise spectrum generation unit of the noise suppression device according to Embodiment 2.
FIG. 10 is a flowchart showing the operation of the target noise spectrum generation unit of the noise suppression device according to Embodiment 2.
FIG. 11 is a block diagram showing the configuration of the target noise spectrum generation unit of the noise suppression device according to Embodiment 3.
FIG. 12 is a flowchart showing the operation of the target noise spectrum generation unit of the noise suppression device according to Embodiment 3.
FIG. 13 is a block diagram showing the configuration of the target noise spectrum generation unit of the noise suppression device according to Embodiment 4.
FIG. 14 is a diagram showing estimated noise spectra in a conventional noise suppression method.
FIG. 15 is a diagram showing noise suppression by a conventional noise suppression method.

Hereinafter, to describe the present invention in more detail, embodiments for carrying out the invention are described with reference to the accompanying drawings.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of the noise suppression device according to Embodiment 1.
The noise suppression device 100 of Embodiment 1 includes an input terminal 1, a Fourier transform unit 2, a power spectrum calculation unit 3, a speech/noise section determination unit 4, a noise spectrum estimation unit 5, a target noise spectrum generation unit 6, a suppression amount limiting coefficient calculation unit 7, an SN ratio calculation unit 8, a suppression amount calculation unit 9, a spectrum suppression unit 10, an inverse Fourier transform unit 11, and an output terminal 12.
The input to the noise suppression device 100, that is, the input to the input terminal 1, is a signal obtained by A/D (analog-to-digital) conversion of an audio signal such as speech or music captured through a microphone (not shown) or the like, sampled at a predetermined sampling frequency (for example, 8 kHz) and divided into frames (for example, 10 ms).
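As a rough illustration of the framing described above, the following Python sketch splits a sampled signal into 10 ms frames at 8 kHz (80 samples per frame). The function name `split_into_frames`, the use of non-overlapping frames, and the dropping of a trailing partial frame are assumptions made for illustration; the text gives only the example sampling frequency and frame length.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=10):
    """Split a sequence of samples into consecutive frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # 80 samples at 8 kHz and 10 ms
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]
```

With the example values, a 250-sample input would yield three frames of 80 samples each.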

The input terminal 1 receives the signal described above, and the Fourier transform unit 2 applies a fast Fourier transform to it to obtain the spectral component X(λ, k). The power spectrum calculation unit 3 calculates the power spectrum Y(λ, k) from the spectral component X(λ, k) produced by the Fourier transform unit 2. The speech/noise section determination unit 4 uses the power spectrum Y(λ, k) calculated by the power spectrum calculation unit 3 and the estimated noise spectrum N(λ-1, k) that the noise spectrum estimation unit 5 estimated one frame earlier to determine whether the input signal of the current frame is speech or noise.

The noise spectrum estimation unit 5 uses the power spectrum Y(λ, k) calculated by the power spectrum calculation unit 3 and the determination result of the speech/noise section determination unit 4 to obtain the estimated noise spectrum N(λ, k) of the current frame. Here, the estimated noise spectrum of the current frame is the spectrum estimated from the noise mixed into the input signal of the current frame. The target noise spectrum generation unit 6 generates the target noise spectrum R(λ, k) from the estimated noise spectrum N(λ, k) obtained by the noise spectrum estimation unit 5. Here, the target noise spectrum is the spectrum that noise suppression processing aims for when suppressing noise in the spectral component X(λ, k) of the input signal. The suppression amount limiting coefficient calculation unit 7 corrects the gain of the target noise spectrum R(λ, k) so that it fits the estimated noise spectrum N(λ, k) of the current frame, and calculates the suppression amount limiting coefficient G_floor(λ, k).

The SN ratio calculation unit 8 calculates the a posteriori SNR and the a priori SNR for each spectral component. The suppression amount calculation unit 9 uses the a posteriori SNR γ(λ, k) and the a priori SNR ξ(λ, k) calculated by the SN ratio calculation unit 8, together with the suppression amount limiting coefficient G_floor(λ, k) calculated by the suppression amount limiting coefficient calculation unit 7, to calculate the spectrum suppression amount G(λ, k), which is the noise suppression amount for each spectral component. The spectrum suppression unit 10 suppresses the spectral component X(λ, k) component by component using the spectrum suppression amount G(λ, k) to obtain the noise-suppressed speech signal spectrum S(λ, k). The inverse Fourier transform unit 11 applies an inverse Fourier transform to the speech signal spectrum S(λ, k) obtained by the spectrum suppression unit 10 to obtain the noise-suppressed speech signal s(t). The output terminal 12 outputs the noise-suppressed speech signal s(t) to the outside.

Next, the operating principle of each component of the noise suppression device 100 according to Embodiment 1 is described with reference to FIG. 1 and FIG. 2.
FIG. 2 is a flowchart showing the operation of the noise suppression device 100 according to Embodiment 1.
The input terminal 1 receives the signal described above and outputs it to the Fourier transform unit 2 as the input signal (step ST1). The Fourier transform unit 2 applies, for example, a Hanning window to the input signal from step ST1 and then performs, for example, a 256-point fast Fourier transform using Equation (1) below, converting the time-domain signal x(t) into the spectral component X(λ, k) (step ST2). The resulting spectral component X(λ, k) is output to the power spectrum calculation unit 3 and to the spectrum suppression unit 10.
X(λ, k) = FT[x(t)]    (1)
In Equation (1), λ is the frame number when the input signal is divided into frames, k is a number designating a frequency component in the frequency band of the power spectrum (hereinafter, the spectrum number), FT[·] denotes Fourier transform processing, and t is the discrete time index of the sampling.
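Step ST2 can be sketched in Python as follows, using only the standard library. The helper names `hanning`, `fft`, and `frame_to_spectrum` are hypothetical, and zero-padding a shorter frame up to 256 points is an assumption; the text does not state how a 10 ms frame is mapped onto the 256-point transform.

```python
import cmath
import math

FFT_SIZE = 256  # number of FFT points given as an example in the text

def hanning(n):
    """Symmetric Hanning window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def frame_to_spectrum(frame):
    """Window one frame, zero-pad it to FFT_SIZE (assumption), return X(lambda, k)."""
    if len(frame) > FFT_SIZE:
        raise ValueError("frame is longer than the FFT size")
    window = hanning(len(frame))
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (FFT_SIZE - len(frame))
    return fft(padded)
```

A production implementation would normally call an optimized FFT routine; the recursive version here is only meant to keep the sketch self-contained.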

The power spectrum calculation unit 3 calculates the power spectrum Y(λ, k) from the spectral component X(λ, k) of the input signal using Equation (2) below (step ST3). The resulting power spectrum Y(λ, k) is output to the speech/noise section determination unit 4, the noise spectrum estimation unit 5, the suppression amount limiting coefficient calculation unit 7, and the SN ratio calculation unit 8, all described later.

In Equation (2), Re{X(λ, k)} and Im{X(λ, k)} denote the real part and the imaginary part, respectively, of the input signal spectrum after the Fourier transform.
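A minimal Python sketch of step ST3 follows. Since Equation (2) is not reproduced in this text, the sketch assumes the common definition of a power spectrum as the sum of the squared real and imaginary parts of each spectral component; the source's equation may differ in detail (for example, by taking a square root or restricting the range of k). The function name `power_spectrum` is hypothetical.

```python
def power_spectrum(spectrum):
    """Return Y(lambda, k) for each complex spectral component X(lambda, k).

    Assumed form: Y = Re{X}^2 + Im{X}^2 (not confirmed by the text).
    """
    return [v.real * v.real + v.imag * v.imag for v in spectrum]
```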

The speech/noise section determination unit 4 uses the power spectrum Y(λ, k) input from the power spectrum calculation unit 3 and the estimated noise spectrum N(λ-1, k), estimated one frame earlier and input from the noise spectrum estimation unit 5 described later, to determine whether the input signal of the current frame λ is speech or noise (step ST4). If the input signal of the current frame λ is speech (step ST4; speech), the determination flag is set to "1 (speech)" (step ST5). If the input signal of the current frame λ is noise (step ST4; noise), the determination flag is set to "0 (noise)" (step ST6). The determination flag set in step ST5 or step ST6 is output to the noise spectrum estimation unit 5 and to the suppression amount limiting coefficient calculation unit 7 described later.

The speech/noise section determination unit 4 makes the speech/noise determination of step ST4 according to, for example, whether one or both of Equations (3) and (4) below are satisfied. If either or both of Equations (3) and (4) are satisfied, the frame is determined to be speech and the determination flag Vflag is set to "1 (speech)". If neither Equation (3) nor Equation (4) is satisfied, the frame is determined to be noise and the determination flag Vflag is set to "0 (noise)".

Here, in Equation (3), N(λ-1, k) is the estimated noise spectrum of the previous frame, and S_pow and N_pow are the sum of the power spectrum of the input signal and the sum of the estimated noise spectrum, respectively. In Equation (4), ρ_max(λ) is the maximum value of the normalized autocorrelation function. TH_FR_SN and TH_ACF are predetermined constant thresholds for the determination; preferable examples are TH_FR_SN = 3.0 and TH_ACF = 0.3, but they can be changed as appropriate according to the state of the input signal and the noise level.
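The decision logic of step ST4 can be sketched in Python as below. Equations (3) and (4) are not reproduced in this text, so their forms here are assumptions: Equation (3) is taken to compare the frame SN ratio S_pow / N_pow, expressed in decibels, with TH_FR_SN, and Equation (4) is taken to compare ρ_max(λ) with TH_ACF. The function name `is_speech` and the small constant guarding against division by zero are also hypothetical.

```python
import math

TH_FR_SN = 3.0  # example threshold from the text (assumed here to be in dB)
TH_ACF = 0.3    # example threshold from the text

def is_speech(power_spec, prev_noise_spec, rho_max):
    """Return Vflag: 1 (speech) if either assumed condition holds, else 0 (noise)."""
    s_pow = sum(power_spec)        # S_pow: sum of the input power spectrum
    n_pow = sum(prev_noise_spec)   # N_pow: sum of the previous estimated noise spectrum
    frame_snr_db = 10.0 * math.log10(max(s_pow, 1e-12) / max(n_pow, 1e-12))
    cond3 = frame_snr_db > TH_FR_SN   # assumed form of Equation (3)
    cond4 = rho_max > TH_ACF          # assumed form of Equation (4)
    return 1 if (cond3 or cond4) else 0
```

The "either or both" rule in the text corresponds to the logical OR of the two conditions.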

The maximum value ρ_max(λ) of the normalized autocorrelation function in Equation (4) can be obtained as follows. First, the normalized autocorrelation function ρ_N(λ, τ) is obtained from the power spectrum Y(λ, k) using Equation (5).

where ρ(λ, τ) = FT[Y(λ, k)]

In Equation (5), τ is the delay time and FT[·] denotes the same Fourier transform processing as above; for example, a fast Fourier transform with the same number of points as in Equation (1) (256) may be used. Equation (5) is the Wiener-Khintchine theorem, so its explanation is omitted.

Next, the maximum value ρ_max(λ) of the normalized autocorrelation function is obtained using Equation (6) below.

Here, Equation (6) means searching for the maximum value of the normalized autocorrelation function ρ_N(λ, τ) in the range τ = 16 to 96. Besides the method of Equation (5), known methods such as cepstrum analysis can be used to analyze the autocorrelation function.
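The search for ρ_max(λ) can be sketched in Python as follows. The sketch assumes that the autocorrelation is normalized by its zero-lag value ρ(λ, 0) and that the power spectrum covers the full, symmetric set of FFT bins, so that the Fourier transform of Y(λ, k) is real and can be evaluated with cosines only. Neither detail is confirmed by the text, and the function name `rho_max` is hypothetical.

```python
import math

def rho_max(power_spec, tau_min=16, tau_max=96):
    """Maximum of the normalized autocorrelation over tau_min..tau_max.

    Uses the Wiener-Khintchine relation: the autocorrelation is the
    Fourier transform of the power spectrum. Assumed normalization:
    rho_N(tau) = rho(tau) / rho(0).
    """
    n = len(power_spec)
    rho0 = sum(power_spec)  # rho(lambda, 0)
    if rho0 <= 0.0:
        return 0.0
    best = -1.0
    for tau in range(tau_min, tau_max + 1):
        # Real part of the DFT of Y at lag tau.
        rho = sum(y * math.cos(2.0 * math.pi * k * tau / n)
                  for k, y in enumerate(power_spec))
        best = max(best, rho / rho0)
    return best
```

For a strongly periodic frame the result approaches 1, while for a flat (white) spectrum it stays near 0, which is why a threshold such as TH_ACF = 0.3 can separate voiced speech from noise.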

The noise spectrum estimation unit 5 uses the power spectrum Y(λ, k) input from the power spectrum calculation unit 3 and the determination flag Vflag input from the speech/noise section determination unit 4 to estimate and update the noise spectrum according to Equation (7) below and the determination flag Vflag, and outputs the estimated noise spectrum N(λ, k) of the current frame (steps ST7 and ST8; details are given later). The estimated noise spectrum N(λ, k) is output to the target noise spectrum generation unit 6, the suppression amount limiting coefficient calculation unit 7, and the SN ratio calculation unit 8, and, as described above, is also output to the speech/noise section determination unit 4 as the estimated noise spectrum N(λ-1, k) of the previous frame.

In Equation (7), N(λ-1, k) is the estimated noise spectrum of the previous frame, which is held in storage means (not shown) such as a RAM (Random Access Memory) in the noise spectrum estimation unit 5. α is the update coefficient, a predetermined constant in the range 0 < α < 1. A preferable example is α = 0.95, but it can be changed as appropriate according to the state of the input signal and the noise level.

In Equation (7), when the determination flag Vflag = 1 (step ST5), the input signal of the current frame has been determined to be speech rather than noise, so the estimated noise spectrum N(λ-1, k) of the previous frame is output unchanged as the estimated noise spectrum N(λ, k) of the current frame (step ST7).
When the determination flag Vflag = 0 (step ST6), the input signal of the current frame has been determined to be noise, so the estimated noise spectrum N(λ-1, k) of the previous frame is updated using the power spectrum Y(λ, k) of the input signal and the update coefficient α, and the result is output as the estimated noise spectrum N(λ, k) of the current frame (step ST8).
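Steps ST7 and ST8 can be sketched in Python as below. Equation (7) is not reproduced in this text, so the update rule is an assumption: a first-order recursive average in which α weights the previous estimate and (1 - α) weights the current power spectrum. The source's equation may assign the weights differently. The function name `update_noise_spectrum` is hypothetical.

```python
ALPHA = 0.95  # example update coefficient from the text

def update_noise_spectrum(prev_noise, power_spec, vflag, alpha=ALPHA):
    """Return N(lambda, k) given N(lambda-1, k), Y(lambda, k), and Vflag."""
    if vflag == 1:
        # Speech frame: carry the previous estimate over unchanged (step ST7).
        return list(prev_noise)
    # Noise frame: assumed recursive smoothing (step ST8).
    return [alpha * n + (1.0 - alpha) * y for n, y in zip(prev_noise, power_spec)]
```

Freezing the estimate during speech frames keeps speech energy from leaking into the noise estimate.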

The target noise spectrum generation unit 6 uses the estimated noise spectrum N(λ, k) input from the noise spectrum estimation unit 5 to generate the target noise spectrum R(λ, k) needed to calculate the suppression amount limiting coefficient described later (step ST9). The generated target noise spectrum R(λ, k) is output to the suppression amount limiting coefficient calculation unit 7. Details of the target noise spectrum generation unit 6 are given later.

The suppression amount limiting coefficient calculation unit 7 uses the target noise spectrum R(λ, k) input from the target noise spectrum generation unit 6, the power spectrum Y(λ, k) input from the power spectrum calculation unit 3, the estimated noise spectrum N(λ, k) input from the noise spectrum estimation unit 5, the determination flag Vflag input from the speech/noise section determination unit 4, and the maximum suppression gain amount GMIN, a predetermined value set by the user, to correct the gain of the target noise spectrum R(λ, k) so that it fits the estimated noise spectrum N(λ, k) of the current frame, and calculates the suppression amount limiting coefficient G_floor(λ, k) (step ST10). The calculated suppression amount limiting coefficient G_floor(λ, k) is output to the suppression amount calculation unit 9. Details of the suppression amount limiting coefficient calculation unit 7 are given later.

The SN ratio calculation unit 8 uses the power spectrum Y(λ, k) input from the power spectrum calculation unit 3, the estimated noise spectrum N(λ, k) input from the noise spectrum estimation unit 5, and the spectrum suppression amount G(λ-1, k) of the previous frame input from the suppression amount calculation unit 9 described later to calculate the a posteriori SNR and the a priori SNR for each spectral component (step ST11). The calculated a posteriori SNR γ(λ, k) and a priori SNR ξ(λ, k) are output to the suppression amount calculation unit 9.

The a posteriori SNR γ(λ, k) can be obtained from the following equation (8) using the power spectrum Y(λ, k) and the estimated noise spectrum N(λ, k).

The a priori SNR ξ(λ, k) can be obtained from the following equation (9) using the spectrum suppression amount G(λ−1, k) of the previous frame and the a posteriori SNR γ(λ−1, k) of the previous frame.

In equation (9), δ is a forgetting factor, a predetermined constant in the range 0 < δ < 1; δ = 0.98 is preferable in Embodiment 1. F[·] denotes half-wave rectification, which floors the value to zero when the a posteriori SNR γ(λ, k) is negative in decibels.
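The two SNR estimates can be sketched as follows. The images for equations (8) and (9) are not reproduced in this text, so the forms used here are assumptions: the a posteriori SNR is taken as the ratio Y/N, and the a priori SNR as the widely used decision-directed update with the squared gain of the previous frame. The function names and the `eps` guard are hypothetical.

```python
def half_wave_rectify(x):
    """F[.]: pass positive values through, floor negative values to zero."""
    return x if x > 0.0 else 0.0


def posterior_snr(Y, N, eps=1e-12):
    """Assumed form of equation (8): gamma(k) = Y(k) / N(k)."""
    return [y / max(n, eps) for y, n in zip(Y, N)]


def prior_snr(gamma_now, gamma_prev, G_prev, delta=0.98):
    """Assumed form of equation (9), decision-directed:
    xi(k) = delta * gamma_prev(k) * G_prev(k)**2
            + (1 - delta) * F[gamma_now(k) - 1]
    """
    return [
        delta * gp * g * g + (1.0 - delta) * half_wave_rectify(gn - 1.0)
        for gn, gp, g in zip(gamma_now, gamma_prev, G_prev)
    ]
```

With γ − 1 as the rectifier argument, the floor takes effect exactly when γ < 1, that is, when the a posteriori SNR is negative in decibels, which agrees with the description of F[·] above.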

The suppression amount calculation unit 9 uses the a priori SNR ξ(λ, k) and the a posteriori SNR γ(λ, k) input from the SN ratio calculation unit 8, together with the suppression amount limiting coefficient G_floor(λ, k) input from the suppression amount limiting coefficient calculation unit 7, to calculate the spectrum suppression amount G(λ, k), which is the noise suppression amount for each spectral component (step ST12). The calculated spectrum suppression amount G(λ, k) is output to the spectrum suppression unit 10.

As a technique for obtaining the spectrum suppression amount G(λ, k) in the suppression amount calculation unit 9, the Joint MAP (Maximum A Posteriori) method can be applied, for example. The Joint MAP method estimates the spectrum suppression amount G(λ, k) on the assumption that the noise signal and the speech signal follow Gaussian distributions. Using the a priori SNR ξ(λ, k) and the a posteriori SNR γ(λ, k), it finds the amplitude spectrum and phase spectrum that maximize the conditional probability density function and uses those values as the estimates. In this configuration, the provisional spectrum suppression amount G_TMP(λ, k) can be expressed by the following equation (10), with ν and μ, which determine the shape of the probability density function, as parameters.
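For orientation, the gain can be sketched with the closed form usually quoted from Reference 1 for the joint MAP estimator. The image for equation (10) is not reproduced in this text, so whether it matches this form term by term is an assumption; the function name is hypothetical, and ν and μ are left as required arguments because no values are given in this passage.

```python
import math


def joint_map_gain(xi, gamma, nu, mu):
    """Assumed form of equation (10) for one spectral component:
    u = 1/2 - mu / (4 * sqrt(gamma * xi))
    G_TMP = u + sqrt(u**2 + nu / (2 * gamma))
    Requires xi > 0 and gamma > 0.
    """
    u = 0.5 - mu / (4.0 * math.sqrt(gamma * xi))
    return u + math.sqrt(u * u + nu / (2.0 * gamma))
```

For instance, `joint_map_gain(1.0, 1.0, nu=0.5, mu=2.0)` gives u = 0 and a gain of 0.5.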

After obtaining the provisional spectrum suppression amount G_TMP(λ, k) from equation (10), the suppression amount calculation unit 9 performs a flooring process, which limits the minimum value of the spectral gain, using the suppression amount limiting coefficient G_floor(λ, k) and the following equation (11), and thereby obtains the spectrum suppression amount G(λ, k).
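A minimal sketch of the flooring step, assuming equation (11) (not reproduced in this text) takes the larger of the provisional gain and the floor for each spectral component. The function name is hypothetical.

```python
def apply_floor(G_tmp, G_floor):
    """Assumed form of equation (11): G(k) = max(G_TMP(k), G_floor(k))."""
    return [max(g, f) for g, f in zip(G_tmp, G_floor)]
```

Because the floor is given per frequency bin rather than as a single constant, the residual noise can follow the shape of the target noise spectrum.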

For details of how the spectrum suppression amount is derived in the Joint MAP method, see Reference 1 below; the description is omitted here.
[Reference 1]
T. Lotter, P. Vary, "Speech Enhancement by MAP Spectral Amplitude Using a Super-Gaussian Speech Model", EURASIP Journal on Applied Signal Processing, pp. 1110-1126, No. 7, 2005

The spectrum suppression unit 10 uses the spectrum suppression amount G(λ, k) input from the suppression amount calculation unit 9 to suppress each spectral component X(λ, k) of the input signal according to the following equation (12), and obtains the noise-suppressed speech signal spectrum S(λ, k) (step ST13). The obtained speech signal spectrum S(λ, k) is output to the inverse Fourier transform unit 11.
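The suppression step can be sketched as a per-bin multiplication, assuming equation (12) (not reproduced in this text) has the form S(λ, k) = G(λ, k)·X(λ, k) with a real gain applied to a complex spectral component. The function name is hypothetical.

```python
def suppress_spectrum(G, X):
    """Assumed form of equation (12): S(k) = G(k) * X(k).

    G holds real gains and X holds complex spectral components, so the
    amplitude is scaled while the phase of X is left unchanged.
    """
    return [g * x for g, x in zip(G, X)]
```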

The inverse Fourier transform unit 11 applies an inverse Fourier transform to the speech signal spectrum S(λ, k) input from the spectrum suppression unit 10, overlaps and adds the result with the output signal of the previous frame, and obtains the noise-suppressed speech signal s(t) (step ST14). The noise-suppressed speech signal s(t) is output to the output terminal 12, which outputs it to the outside (step ST15), and the process ends.
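The resynthesis step can be sketched as below. The frame length, overlap ratio, and windowing are not given in this passage, so the sketch assumes a full two-sided spectrum, a 50% overlap, and that any windowing has already been applied; the naive inverse DFT is for illustration only, and both function names are hypothetical.

```python
import cmath


def inverse_dft(S):
    """Naive inverse DFT returning the real part of each time sample."""
    n = len(S)
    return [
        sum(S[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def overlap_add(frame, tail):
    """Add the saved tail of the previous frame to the first half of the
    current frame (assumed 50% overlap).

    Returns (output_samples, new_tail).
    """
    hop = len(frame) // 2
    output = [frame[i] + tail[i] for i in range(hop)]
    return output, frame[hop:]
```

A practical implementation would use an FFT routine instead of the O(n²) sum shown here.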

Next, the detailed configuration and operation of the target noise spectrum generation unit 6 are described with reference to FIGS. 3 to 5.
First, the configuration of the target noise spectrum generation unit 6 is described.
FIG. 3 is a block diagram showing the configuration of the target noise spectrum generation unit 6 of the noise suppression device 100 according to Embodiment 1.
The target noise spectrum generation unit 6 includes a noise power calculation unit 61, a target noise spectrum selection unit 62, and a target noise spectrum memory 63.

The noise power calculation unit 61 calculates the noise power P_N(λ) in the input signal spectrum using the estimated noise spectrum N(λ, k) input from the noise spectrum estimation unit 5. The target noise spectrum selection unit 62 refers to the target noise spectrum memory 63 and selects the target noise spectrum R(λ, k) corresponding to the noise power P_N(λ). The target noise spectrum memory 63 stores, as target noise spectra, one or more noise spectra of various frequency shapes classified by noise power pattern.

Next, the target noise spectra stored in the target noise spectrum memory 63 are described with reference to FIG. 4.
FIG. 4 is a diagram showing an example of the target noise spectra stored in the target noise spectrum memory 63 of the noise suppression device 100 according to Embodiment 1. In the example of FIG. 4, the vertical axis represents signal amplitude (decibels: dB) and the horizontal axis represents frequency (0 to 4000 Hz), assuming noise suppression for narrowband telephone speech (0 to 4000 Hz).

In the example shown in FIG. 4, the traveling speed of the vehicle is associated with the noise power, and a target noise spectrum is shown for each noise power. Specifically, the figure shows the target noise spectrum R_S1(k) for a traveling speed of 70 km/h, R_S2(k) for 130 km/h, R_S3(k) for 160 km/h, and R_S4(k) for 190 km/h. Although FIG. 4 shows target noise spectra classified by vehicle traveling speed, the classification is not limited to traveling speed. The memory may instead store target noise spectra classified by, for example, the air volume of the air conditioner, the open/closed state of windows or the roof, or the engine speed.

Next, the operation of the target noise spectrum generation unit 6 is described with reference to FIG. 5.
FIG. 5 is a flowchart showing the operation of the target noise spectrum generation unit 6 of the noise suppression device 100 according to Embodiment 1, and shows the processing of step ST9 in the flowchart of FIG. 2 in more detail.
When the estimated noise spectrum N(λ, k) is input from the noise spectrum estimation unit 5 (step ST21), the noise power calculation unit 61 uses it to calculate the noise power P_N(λ) in the input signal spectrum based on the following equation (13) (step ST22). The calculated noise power P_N(λ) is output to the target noise spectrum selection unit 62 as the analysis result.

In equation (13), N is the number of spectral components (not to be confused with the estimated noise spectrum N(λ, k)), and N = 128.
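A minimal sketch of the noise power calculation follows. The image for equation (13) is not reproduced in this text, so the form used here, the mean of the spectral components expressed in decibels, is an assumption; the function name and the `eps` guard are hypothetical.

```python
import math


def noise_power_db(N_spec, eps=1e-12):
    """Assumed form of equation (13):
    P_N = 10 * log10( (1 / N) * sum_k N(k) ),
    where N_spec holds the estimated noise spectrum for one frame
    (N = 128 components in Embodiment 1).
    """
    mean_power = sum(N_spec) / len(N_spec)
    return 10.0 * math.log10(max(mean_power, eps))
```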

The target noise spectrum selection unit 62 refers to the target noise spectrum memory 63 and, based on the following equation (14), selects the target noise spectrum R(λ, k) corresponding to the noise power P_N(λ) calculated by the noise power calculation unit 61 (step ST23). The selected target noise spectrum R(λ, k) is output to the suppression amount limiting coefficient calculation unit 7.

In equation (14), TH_N1, TH_N2, and TH_N3 are predetermined thresholds on the noise power P_N(λ), corresponding for example to vehicle traveling speeds of 70 km/h, 130 km/h, and 160 km/h. The thresholds are not tied to the vehicle traveling speed. Depending on how the noise suppression device 100 is used, the threshold values can be changed, or threshold conditions added, according to the state of the input signal and the noise level as well as the traveling speed. Here, the usage conditions of the noise suppression device 100 are, for example, the air volume of the air conditioner, the open/closed state of windows or the roof, and the engine speed mentioned above.
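The threshold-based selection can be sketched as a lookup over ascending thresholds. Equation (14) is not reproduced in this text, so the assumption here is that the three thresholds split the noise power axis into four ranges, one per stored spectrum R_S1 to R_S4; which side of each boundary is inclusive is also assumed. The function name is hypothetical.

```python
from bisect import bisect_right


def select_target(P_N, thresholds, candidates):
    """Pick the stored target spectrum whose noise power range contains P_N.

    thresholds: ascending list, e.g. [TH_N1, TH_N2, TH_N3]
    candidates: one more entry than thresholds, e.g. [R_S1, R_S2, R_S3, R_S4]
    """
    if len(candidates) != len(thresholds) + 1:
        raise ValueError("need exactly one more candidate than thresholds")
    return candidates[bisect_right(thresholds, P_N)]
```

Adding a threshold condition, as the text allows, then amounts to inserting one more threshold and one more stored spectrum.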

Next, the detailed configuration and operation of the suppression amount limiting coefficient calculation unit 7 are described with reference to FIGS. 6 and 7.
FIG. 6 is a block diagram showing the configuration of the suppression amount limiting coefficient calculation unit 7 of the noise suppression device 100 according to Embodiment 1.
The suppression amount limiting coefficient calculation unit 7 includes a power calculation unit 71 and a coefficient correction unit 72.
The power calculation unit 71 calculates the power POW_R(λ) of the target noise spectrum R(λ, k) and the power POW_N(λ) of the estimated noise spectrum N(λ, k). The coefficient correction unit 72 determines the correction amount D(λ) for the target noise spectrum R(λ, k) from the powers POW_R(λ) and POW_N(λ) calculated by the power calculation unit 71, and corrects the gain of the target noise spectrum R(λ, k) using the determined correction amount D(λ). It then calculates the suppression amount limiting coefficient G_floor(λ, k) based on the gain-corrected target noise spectrum R_ADJ(λ, k) and the power spectrum Y(λ, k) of the input signal.

FIG. 7 is a flowchart showing the operation of the suppression amount limiting coefficient calculation unit 7 of the noise suppression device 100 according to Embodiment 1, and shows the processing of step ST10 in the flowchart of FIG. 2 in more detail.
When the target noise spectrum R(λ, k) is input from the target noise spectrum generation unit 6 and the estimated noise spectrum N(λ, k) is input from the noise spectrum estimation unit 5 (step ST31), the power calculation unit 71 calculates, based on the following equation (15), the power POW_R(λ) of the target noise spectrum R(λ, k) (step ST32) and the power POW_N(λ) of the estimated noise spectrum N(λ, k) (step ST33). The calculated powers POW_R(λ) and POW_N(λ) are output to the coefficient correction unit 72.

In equation (15), POW_R(λ) is the power of the target noise spectrum R(λ, k) of the current frame, POW_N(λ) is the power of the estimated noise spectrum N(λ, k) of the current frame, and N = 128.

Based on the following equation (16), the coefficient correction unit 72 compares the power POW_R(λ) of the target noise spectrum with the power POW_N(λ) of the estimated noise spectrum multiplied by the maximum suppression gain amount GMIN (step ST34), and determines the correction amount D(λ) for the target noise spectrum R(λ, k) according to the comparison result (step ST35).

In equation (16), D_UP and D_DOWN are predetermined constants. In Embodiment 1, D_UP = 1.05 and D_DOWN = 0.95 are preferable, but they can be changed as appropriate according to the type of noise and the noise level. D_UP and D_DOWN are also not limited to a single value each; several values may be used to determine the correction amount D(λ). For example, equation (16) determines D(λ) only from which of the two powers is larger, but the unit may be configured to set a larger correction when the power difference is larger (or smaller) than a predetermined threshold, using D_UP = 1.2 (or D_DOWN = 0.8 in the smaller case). Changing the correction amount D(λ) according to the power difference in this way reduces the correction error and speeds up the correction.

In Embodiment 1, equation (15) is shown as computing the power over the full band, but the configuration is not limited to this. The power of only part of the band, for example 200 Hz to 800 Hz, may be computed and used for the power comparison in equation (16).

Next, based on the following equation (17), the coefficient correction unit 72 corrects the gain of the target noise spectrum R(λ, k) using the obtained correction amount D(λ), and obtains the gain-corrected target noise spectrum R_ADJ(λ, k) (step ST36).

When the determination flag output by the speech/noise section determination unit 4 is Vflag = 1, that is, when the current frame is judged to be speech, the unit may be configured not to perform the gain correction of equation (17). Controlling the gain correction with the determination flag Vflag in this way suppresses unnecessary gain correction when speech has been mistakenly mixed into the estimated noise, so that a stable target noise spectrum is obtained.
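Steps ST32 to ST36 can be sketched together as below. Equations (15) to (17) are not reproduced in this text, so several points are assumptions: the power is a plain sum over the spectral components, the target is raised when POW_R is below GMIN·POW_N and lowered otherwise, GMIN is applied as a factor on power, and the correction is a simple multiplication of every bin by D. All function names are hypothetical.

```python
D_UP = 1.05    # value given as preferable in the text
D_DOWN = 0.95  # value given as preferable in the text


def band_power(spectrum):
    """Assumed form of equation (15): sum of the spectral components."""
    return sum(spectrum)


def correction_amount(pow_r, pow_n, gmin):
    """Assumed form of equation (16): raise the target when its power is
    below GMIN * POW_N, lower it otherwise."""
    return D_UP if pow_r < gmin * pow_n else D_DOWN


def adjust_target(R, N_est, gmin, vflag):
    """Assumed form of equation (17): R_ADJ(k) = D * R(k).

    When vflag == 1 (frame judged to be speech) the target is returned
    unchanged, following the optional configuration described above.
    """
    if vflag == 1:
        return list(R)
    d = correction_amount(band_power(R), band_power(N_est), gmin)
    return [d * r for r in R]
```

Feeding the returned spectrum back in as `R` on the next frame would let the 5% steps accumulate until the two powers meet; whether the correction is carried across frames in that way is not stated in this passage.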

Finally, the coefficient correction unit 72 takes as input the gain-corrected target noise spectrum R_ADJ(λ, k) and the power spectrum Y(λ, k) of the input signal output by the power spectrum calculation unit 3, and calculates the suppression amount limiting coefficient G_floor(λ, k) based on the following equations (18) and (19) (step ST37). Equation (18) determines the upper and lower limits of the suppression amount, and equation (19) smooths the suppression amount limiting coefficient across frames. The obtained suppression amount limiting coefficient G_floor(λ, k) is output to the suppression amount calculation unit 9.

In equation (18), GMAX is the minimum suppression gain amount, that is, a predetermined constant of 1 or less that gives the "minimum" suppression amount of the noise suppression device 100, and GMIN is the maximum suppression gain amount described above, that is, a predetermined constant of 1 or less that gives the "maximum" suppression amount of the noise suppression device 100. In equation (19), β is a predetermined smoothing coefficient, and β = 0.1 is preferable.
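A minimal sketch of step ST37 follows. Equations (18) and (19) are not reproduced in this text, so the forms here are assumptions: the raw coefficient is the ratio R_ADJ/Y clipped to the range [GMIN, GMAX], and the smoothing is a first-order recursion in which β weights the previous frame. If the gains are defined on amplitude rather than power, a square root of the ratio would be needed instead. The function name and the `eps` guard are hypothetical.

```python
def floor_coefficient(R_adj, Y, G_floor_prev, gmin, gmax, beta=0.1, eps=1e-12):
    """Compute G_floor(k) for one frame.

    Assumed equation (18): raw(k) = clip(R_ADJ(k) / Y(k), GMIN, GMAX)
    Assumed equation (19): G_floor(k) = beta * G_floor_prev(k)
                                        + (1 - beta) * raw(k)
    """
    result = []
    for r, y, prev in zip(R_adj, Y, G_floor_prev):
        raw = min(max(r / max(y, eps), gmin), gmax)
        result.append(beta * prev + (1.0 - beta) * raw)
    return result
```

Because both inputs of the recursion lie in [GMIN, GMAX], the smoothed coefficient stays in that range as well, provided the previous value did.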

FIG. 8 is a diagram schematically showing an example of the residual noise spectrum, that is, the speech signal spectrum S(λ, k), of the output signal of the noise suppression device 100 according to Embodiment 1. In FIG. 8, the vertical axis represents signal amplitude (decibels: dB) and the horizontal axis represents frequency (0 to 4000 Hz); the description takes as an example the case where the noise suppression target is narrowband telephone speech (0 to 4000 Hz). FIG. 8(a) shows the case where the vehicle traveling speed is 70 km/h, and FIG. 8(b) shows the case where it is 160 km/h.
In FIGS. 8(a) and 8(b), the spectra N_1 and N_2 are the estimated noise spectra, the spectra R_ADJ1 and R_ADJ2 are the target noise spectra, and the spectra S_1 and S_2 are the residual noise spectra, that is, the speech signal spectra, obtained with Embodiment 1. In both FIG. 8(a) and FIG. 8(b), the residual noise spectra S_1 and S_2 show no band in which the noise is excessively or insufficiently suppressed relative to the target noise spectra R_ADJ1 and R_ADJ2. In particular, no excessive or insufficient suppression occurs in the bands indicated by region C and region D, which were discussed in connection with the prior art. Taking the vehicle traveling speed as an example, this is because the target noise spectrum selection unit 62 refers to the target noise spectrum memory 63 and selects a target noise spectrum that matches the noise conditions, such as the traveling speed, so that excessive or insufficient suppression of the estimated noise spectra N_1 and N_2 is avoided.

As described under the problem to be solved by the invention, in an actual noise environment, for example the running noise observed in the cabin of a moving vehicle, the high-frequency noise power may become large because of wind noise, engine noise, and the like. When such noise is mixed into the input signal, a conventional noise suppression method determines the overall suppression amount so that the residual noise after suppression fits the shape of a predetermined target spectrum, and bands with extreme over-suppression or under-suppression may therefore appear. In contrast, the noise suppression device 100 according to Embodiment 1 calculates the suppression amount limiting coefficient G_floor(λ, k) from the estimated noise spectrum N(λ, k) estimated from the input signal and limits the spectral gain using the calculated coefficient. As a result, no band with extreme over-suppression or under-suppression arises (see regions C and D in FIG. 8(b)), and good noise suppression is achieved.

As described above, Embodiment 1 includes the target noise spectrum generation unit 6, which generates a target noise spectrum suited to the input signal from a plurality of target noise spectra based on the estimated noise spectrum; the suppression amount limiting coefficient calculation unit 7, which calculates, based on the generated target noise spectrum, a suppression amount limiting coefficient that defines the upper and lower limits of noise suppression; the suppression amount calculation unit 9, which calculates the spectrum suppression amount using the SN ratio of the spectral components of the input signal and the suppression amount limiting coefficient; and the spectrum suppression unit 10, which suppresses the amplitude of the spectral components of the input signal using the spectrum suppression amount. With this configuration, no band with excessive noise suppression or insufficient noise suppression arises, the generation of musical noise is suppressed, and good noise suppression is achieved without the speech sounding distorted, muffled, or noisy.

Also according to Embodiment 1, the suppression amount limiting coefficient calculation unit 7 can be configured not to perform the gain correction of equation (17) when the determination flag output by the speech/noise section determination unit 4 is Vflag = 1, that is, when the current frame is judged to be speech. Making the gain correction controllable by the determination flag Vflag in this way suppresses unnecessary gain correction even when speech has been mistakenly mixed into the estimated noise, so that a stable target noise spectrum is obtained. This enables even better noise suppression.

Embodiment 2.
Embodiment 1 described the case where the target noise spectrum generation unit 6 generates the target noise spectrum based on the power of the estimated noise spectrum. Embodiment 2 describes a configuration that generates the target noise spectrum using the frequency characteristics of the estimated noise spectrum in addition to its power.

FIG. 9 is a block diagram showing the configuration of the target noise spectrum generation unit 6a of the noise suppression device 100 according to Embodiment 2. In the following, parts that are the same as or correspond to components of the target noise spectrum generation unit 6 of the noise suppression device 100 according to Embodiment 1 are given the same reference numerals as in Embodiment 1, and their description is omitted or simplified. The components of the noise suppression device 100 other than the target noise spectrum generation unit 6a are the same as in Embodiment 1, so their description is omitted.

The target noise spectrum generation unit 6a includes a frequency characteristic analysis unit 64 in addition to the noise power calculation unit 61, a target noise spectrum selection unit 62a, and a target noise spectrum memory 63a.
The target noise spectrum memory 63a stores target noise spectra of one or more frequency shapes classified by noise power pattern and, in addition, target noise spectra of one or more frequency shapes classified by the frequency characteristic pattern of the estimated noise spectrum. The frequency characteristic analysis unit 64 normalizes the estimated noise spectrum N(λ, k) using the noise power P_RS(m) of a target noise spectrum stored in the target noise spectrum memory 63a and the noise power P_N(λ) of the estimated noise spectrum, and calculates the squared error D_N(λ, m) between the normalized estimated noise spectrum and the target noise spectrum. The target noise spectrum selection unit 62a refers to the target noise spectrum memory 63a and selects the target noise spectrum R(λ, k) using the squared error D_N(λ, m) calculated by the frequency characteristic analysis unit 64.

Next, the operation of the target noise spectrum generation unit 6a of the noise suppression device 100 according to Embodiment 2 is described.
FIG. 10 is a flowchart showing the operation of the target noise spectrum generation unit 6a of the noise suppression device 100 according to Embodiment 2. In the following, steps that are the same as those of the target noise spectrum generation unit 6 of the noise suppression device 100 according to Embodiment 1 are given the same reference signs as in FIG. 5, and their description is omitted or simplified.
When the noise power calculation unit 61 has calculated the noise power P_N(λ) in the input signal spectrum (step ST22), the frequency characteristic analysis unit 64 normalizes the estimated noise spectrum N(λ, k) using the noise power P_RS(m) of a target noise spectrum stored in the target noise spectrum memory 63a and the noise power P_N(λ) calculated in step ST22 (step ST41), and calculates the squared error D_N(λ, m) between the target noise spectrum and the normalized estimated noise spectrum using the following equation (20) (step ST42). The calculated squared error D_N(λ, m) is output to the target noise spectrum selection unit 62a.

In equation (20), m is the index that designates the target noise spectrum R_Sm(k) shown in FIG. 4.

The target noise spectrum selection unit 62a receives the squared error D_N(λ, m) calculated by the frequency characteristic analysis unit 64 and selects from the target noise spectrum memory 63a the target noise spectrum R(λ, k) for which D_N(λ, m) is smallest, that is, the one closest to the frequency shape of the estimated noise spectrum of the current frame (step ST43). The selected target noise spectrum R(λ, k) is output to the suppression amount limiting coefficient calculation unit 7.
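Steps ST41 to ST43 can be sketched as a nearest-shape search. Equation (20) is not reproduced in this text, so the sketch assumes linear-domain spectra, a normalization that scales the estimated noise spectrum to the total power of each candidate, and a plain sum of squared differences over the bins. The function names are hypothetical.

```python
def squared_error_to_target(N_est, target, eps=1e-12):
    """Assumed form of equation (20) for one candidate m:
    scale N(k) so its total power equals that of the target, then
    D_N = sum_k (R_Sm(k) - N_norm(k))**2.
    """
    scale = sum(target) / max(sum(N_est), eps)
    return sum((t - scale * n) ** 2 for t, n in zip(target, N_est))


def select_nearest(N_est, candidates):
    """Return (index of the candidate with the smallest error, all errors)."""
    errors = [squared_error_to_target(N_est, c) for c in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return best, errors
```

Normalizing before comparing means the choice depends on spectral shape only, not on the overall noise level.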

As described above, in Embodiment 2 the target noise spectrum generation unit 6a includes the target noise spectrum memory 63a, which stores target noise spectra of one or more frequency shapes classified by noise power pattern and target noise spectra of one or more frequency shapes classified by the frequency characteristic pattern of the estimated noise spectrum; the frequency characteristic analysis unit 64, which normalizes the estimated noise spectrum N(λ, k) using the noise power of the target noise spectra stored in the target noise spectrum memory 63a and the noise power of the estimated noise spectrum, and calculates the squared error D_N between the normalized estimated noise spectrum and each target noise spectrum; and the target noise spectrum selection unit 62a, which refers to the target noise spectrum memory 63a and selects the target noise spectrum using the squared error calculated by the frequency characteristic analysis unit 64. The suppression process can therefore use the target noise spectrum that is closest to the frequency shape of the estimated noise spectrum of the current frame. This enables still better noise suppression in which the speech does not sound distorted, muffled, or noisy.

Embodiment 3.
Embodiment 2 described a configuration in which the target noise spectrum selection unit 62a selects the target noise spectrum with the smallest squared error D_N(λ, m). Embodiment 3 describes a configuration in which a single target noise spectrum is synthesized from a plurality of target noise spectra and output.

FIG. 11 is a block diagram showing the configuration of the target noise spectrum generation unit 6b of the noise suppression device 100 according to Embodiment 3. In the following, parts that are the same as or equivalent to components of the target noise spectrum generation unit 6a of the noise suppression device 100 according to Embodiment 2 are given the same reference signs as in Embodiment 2, and their description is omitted or simplified.

In the target noise spectrum generation unit 6b of Embodiment 3, a weighted average processing unit 65 is added downstream of the target noise spectrum selection unit 62b.
The target noise spectrum selection unit 62b refers to the target noise spectrum memory 63a and selects a plurality of target noise spectra R(λ, k) using the squared error D_N(λ, m) calculated by the frequency characteristic analysis unit 64. "A plurality" means, for example, selecting the two target noise spectra R(λ, k) with the smallest squared errors D_N(λ, m). The weighted average processing unit 65 performs weighted averaging on the plurality of target noise spectra R(λ, k) selected by the target noise spectrum selection unit 62b and obtains a single averaged target noise spectrum.

Next, the operation of the target noise spectrum generation unit 6b of the noise suppression device 100 according to Embodiment 3 is described. FIG. 12 is a flowchart showing the operation of the target noise spectrum generation unit 6b of the noise suppression device 100 according to Embodiment 3. Steps identical to those of the target noise spectrum generation unit 6a of the noise suppression device 100 according to Embodiment 2 are given the same reference signs as in FIG. 10, and their description is omitted or simplified.
When the frequency characteristic analysis unit 64 has calculated the squared error D_N(λ, m) between the target noise spectrum and the normalized estimated noise spectrum (step ST42), the target noise spectrum selection unit 62b receives the squared error D_N(λ, m) and selects from the target noise spectrum memory 63a, for example, the two target noise spectra with the smallest squared errors D_N(λ, m) (step ST51). The weighted average processing unit 65 performs weighted averaging of the two target noise spectra selected by the target noise spectrum selection unit 62b using equation (21) and obtains a single averaged target noise spectrum R_SYN(λ, k) (step ST52). The averaged target noise spectrum R_SYN(λ, k) is output to the suppression amount limiting coefficient calculation unit 7.

In equation (21), R_RS1(k) is the target noise spectrum selected in first place and R_RS2(k) is the target noise spectrum selected in second place. This is only one example; depending on the squared error values, different target noise spectra may be selected. Here w is a weighting coefficient. Setting w = 0.8 for the first-place target noise spectrum is a suitable example, but the value can be changed as appropriate according to the state of the input signal and the squared error values.
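Equation (21) itself does not survive in this text. A form consistent with the description above is given here as an assumption only: it takes the weight on the second-place spectrum to be 1 − w so that the two weights sum to one, which the text does not state explicitly.

```latex
R_{SYN}(\lambda, k) = w \cdot R_{RS1}(k) + (1 - w) \cdot R_{RS2}(k)
```

With the suggested value w = 0.8, the first-place spectrum would contribute 80% and the second-place spectrum 20% of each frequency bin.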

For simplicity, the above describes weighted averaging with two target noise spectra, but the number of target noise spectra used is not limited to two; weighted averaging may be performed with three or more. In that case, the weighting coefficient w may be changed as appropriate according to the number of target noise spectra used and the form of the input signal. Here, the form of the input signal refers to, for example, differences in the degree of variation of the noise signal spectrum contained in the input signal or differences in the power of the noise signal spectrum.
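The synthesis with two or more spectra can be sketched as below. The function names are hypothetical. The text leaves the weights open, so they are supplied by the caller, and dividing by the sum of the weights is an added assumption that keeps the overall level unchanged when the weights do not sum to one.

```python
from typing import List, Sequence


def weighted_average_spectrum(spectra: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Per-bin weighted average of several target noise spectra."""
    if not spectra or len(spectra) != len(weights):
        raise ValueError("need one weight per spectrum")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    n_bins = len(spectra[0])
    if any(len(s) != n_bins for s in spectra):
        raise ValueError("spectra must have the same number of bins")
    return [sum(w * s[k] for w, s in zip(weights, spectra)) / total
            for k in range(n_bins)]


def synthesize_target(candidates: Sequence[Sequence[float]],
                      errors: Sequence[float],
                      weights: Sequence[float]) -> List[float]:
    """Average the len(weights) candidates with the smallest squared errors.

    weights[0] applies to the first-place candidate, weights[1] to the
    second-place candidate, and so on.
    """
    if len(weights) > len(candidates) or len(errors) != len(candidates):
        raise ValueError("inconsistent numbers of candidates, errors, or weights")
    ranked = sorted(range(len(errors)), key=errors.__getitem__)[:len(weights)]
    return weighted_average_spectrum([candidates[i] for i in ranked], weights)
```

Calling `synthesize_target(candidates, errors, [0.8, 0.2])` corresponds to the two-spectrum case with w = 0.8; passing three weights extends it to the top three candidates.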

Performing weighted averaging over a plurality of target noise spectra that approximate the estimated noise spectrum, and obtaining a single averaged target noise spectrum in this way, stabilizes the target noise spectrum when, for example, the squared error is large against every target noise spectrum stored in the target noise spectrum memory 63a and no single target noise spectrum can be settled on.

As described above, Embodiment 3 includes the target noise spectrum selection unit 62b, which selects a plurality of target noise spectra from the target noise spectrum memory 63a based on the squared error between the target noise spectrum and the normalized estimated noise spectrum, and the weighted average processing unit 65, which performs weighted averaging of the selected target noise spectra to obtain a single averaged target noise spectrum. Therefore, even when the single target noise spectrum that most closely approximates the frequency shape of the estimated noise spectrum of the current frame cannot be determined, a single averaged target noise spectrum can be selected. This stabilizes target noise spectrum selection and enables good noise suppression.

Embodiment 3 described a configuration that performs weighted averaging of target noise spectra based on the squared error between the target noise spectrum and the estimated noise spectrum, but the invention is not limited to that configuration. For example, the weighted average processing unit 65 may be added downstream of the target noise spectrum selection unit 62 of the target noise spectrum generation unit 6 shown in FIG. 3 of Embodiment 1. In that case, weighted averaging is performed using, for example, a plurality of target noise spectra whose noise power is close.

Embodiment 4.
Embodiments 1 to 3 described target noise spectrum generation units 6, 6a, and 6b that generate the target noise spectrum using the noise spectrum estimated from the input signal. Embodiment 4 describes a configuration that generates the target noise spectrum using external information other than the input signal.

FIG. 13 is a block diagram showing the configuration of the target noise spectrum generation unit 6c of the noise suppression device 100 according to Embodiment 4. In the following, parts that are the same as or equivalent to components of the target noise spectrum generation unit 6b of the noise suppression device 100 according to Embodiment 3 are given the same reference signs as in Embodiment 3, and their description is omitted or simplified.

The target noise spectrum selection unit 62c accepts external information as input. When the noise suppression device 100 is applied to a vehicle, the external information can be, for example, the air conditioner's airflow level, the open/closed state of doors, windows, or the roof, or the rotational speed of the engine or motor. The external information may also be input from a user operation, that is, information selecting a target noise spectrum according to the user's preference. For example, when the air conditioner airflow is used as the external information and the external information "air conditioner airflow = low" is input, the target noise spectrum selection unit 62c selects the preset target noise spectrum corresponding to "airflow = low" from the target noise spectrum memory 63b. When the external information "air conditioner airflow = high" is input, it selects the preset target noise spectrum corresponding to "airflow = high" from the target noise spectrum memory 63b. In addition, the target noise spectrum selection unit 62c selects a target noise spectrum corresponding to the estimated noise spectrum.

The target noise spectrum memory 63b stores, in addition to target noise spectra of one or more frequency shapes classified by noise power pattern and target noise spectra of one or more frequency shapes classified by the frequency characteristic pattern of the estimated noise spectrum, target noise spectra of one or more frequency shapes classified by the external information patterns described above. The weighted average processing unit 65 uses equation (21) to perform weighted averaging of the target noise spectrum corresponding to the external information and the target noise spectrum corresponding to the estimated noise spectrum, and obtains and outputs a single averaged target noise spectrum R_SYN(λ, k).
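The combination of an external-information lookup with the estimate-based selection might look like the sketch below. The key strings, the function name, and the choice of which spectrum receives the weight w are all assumptions made for illustration; the text states only that the two spectra are combined by weighted averaging. Falling back to the estimate-based spectrum for an unknown key is likewise a design choice of this sketch.

```python
from typing import Dict, List, Sequence


def blend_with_external(external_info: str,
                        estimate_based: Sequence[float],
                        external_table: Dict[str, Sequence[float]],
                        w: float) -> List[float]:
    """Blend the spectrum stored for external_info with the estimate-based one.

    w is the weight on the external-information spectrum (assumption);
    the estimate-based spectrum receives 1 - w.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    external = external_table.get(external_info)
    if external is None:
        # No stored spectrum for this external state: use the estimate-based one.
        return list(estimate_based)
    if len(external) != len(estimate_based):
        raise ValueError("spectra must have the same number of bins")
    return [w * x + (1.0 - w) * e for x, e in zip(external, estimate_based)]
```

Because the external state (for example, a change in airflow setting) is known immediately, while the noise estimate takes several frames to adapt, a blend of this kind can shift the target spectrum as soon as the setting changes.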

As described above, Embodiment 4 includes the target noise spectrum selection unit 62c, which selects a target noise spectrum according to external information in addition to the estimated noise spectrum, and the weighted average processing unit 65, which performs weighted averaging of the target noise spectrum corresponding to the estimated noise spectrum and the target noise spectrum corresponding to the external information to obtain a single averaged target noise spectrum. A plurality of target noise spectra selected using external information in addition to the noise signal input from the microphone can therefore be weighted and averaged, which improves the accuracy of the target noise spectrum. This improves the response speed with which the target noise spectrum changes and enables better noise suppression.

In Embodiments 1 to 4, the suppression amount calculation unit 9 calculates the noise suppression amount G(λ, k) based on the Joint MAP method, and the spectrum suppression unit 10 performs noise suppression using the calculated G(λ, k). The calculation of the noise suppression amount is not limited to the Joint MAP method, however, and other methods can be applied. For example, the minimum mean-square error short-time spectral amplitude method detailed in Non-Patent Document 1 above, or the spectral subtraction method detailed in Reference 2 below, can be applied.
[Reference 2]
S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Trans. on ASSP, Vol. 27, No. 2, pp. 113-120, Apr. 1979).

Embodiments 1 to 4 described a configuration in which suppression amount control is performed over the entire band of the input signal, but the invention is not limited to this. For example, suppression amount control may be performed only in the low band or only in the high band as needed, or only in a specific frequency band, such as around 500 to 800 Hz. Suppression amount control restricted to such a limited frequency band is effective against narrowband noise such as wind noise and the rotational noise of automobile engines and motors.
Furthermore, the example shown in FIG. 8 assumed that the noise suppression target is narrowband telephone speech (0 to 4000 Hz), but the target is not limited to narrowband telephone speech; the invention is also applicable to, for example, wideband telephone speech and acoustic signals of 0 to 8000 Hz.
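Restricting suppression amount control to a band such as 500 to 800 Hz amounts to choosing which frequency bins the control applies to. The sketch below illustrates one way to do that; the sampling rate of 8000 Hz (consistent with a 0 to 4000 Hz band) and the FFT size of 256 used in the example are assumptions, as are the function names.

```python
from typing import List, Sequence


def band_bins(f_low_hz: float, f_high_hz: float,
              sample_rate_hz: float, fft_size: int) -> List[int]:
    """Indices of the non-negative-frequency bins whose centers lie in the band."""
    n_bins = fft_size // 2 + 1
    return [k for k in range(n_bins)
            if f_low_hz <= k * sample_rate_hz / fft_size <= f_high_hz]


def apply_in_band(uncontrolled: Sequence[float],
                  controlled: Sequence[float],
                  bins: Sequence[int]) -> List[float]:
    """Use the controlled suppression amount inside the band, the uncontrolled one elsewhere."""
    if len(uncontrolled) != len(controlled):
        raise ValueError("gain vectors must have the same length")
    out = list(uncontrolled)
    for k in bins:
        out[k] = controlled[k]
    return out
```

At 8000 Hz sampling with a 256-point transform, the bin spacing is 31.25 Hz, so `band_bins(500, 800, 8000, 256)` covers bins 16 through 25.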

In Embodiments 1 to 4, the noise-suppressed speech signal is sent in digital data form to various speech and audio processing devices such as a speech encoding device, a speech recognition device, a speech storage device, or a hands-free call device. The noise suppression device 100 of Embodiments 1 to 4 can be implemented on a DSP (digital signal processor), alone or together with the other devices mentioned above, or by execution as a software program. The program may be stored in a storage device of the computer that executes it, or distributed on a storage medium such as a CD-ROM. The program can also be provided over a network. Besides being sent to the various speech and audio processing devices, the noise-suppressed signal can also be D/A (digital-to-analog) converted, amplified by an amplifier, and output directly as a speech signal from a loudspeaker or the like.

Embodiments 1 to 4 were described using noise during vehicle travel as an example, but the invention is not limited to this. It is also applicable to, for example, noise during train travel, aircraft noise, operating noise of lifts such as elevators, factory noise, and crowd noise, and achieves the same effects as described in each of Embodiments 1 to 4.

Within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

Because the noise suppression device according to the present invention enables high-quality noise suppression, it is suitable for improving the sound quality of voice communication systems such as car navigation systems, mobile phones, and intercoms, hands-free call systems, video conferencing systems, monitoring systems, and the like into which voice communication, voice storage, or speech recognition systems have been introduced, and for improving the recognition rate of speech recognition systems.

1 input terminal, 2 Fourier transform unit, 3 power spectrum calculation unit, 4 speech/noise section determination unit, 5 noise spectrum estimation unit, 6, 6a, 6b, 6c target noise spectrum generation unit, 7 suppression amount limiting coefficient calculation unit, 8 SN ratio calculation unit, 9 suppression amount calculation unit, 10 spectrum suppression unit, 11 inverse Fourier transform unit, 12 output terminal, 61 noise power calculation unit, 62, 62a, 62b, 62c target noise spectrum selection unit, 63, 63a, 63b target noise spectrum memory, 64 frequency characteristic analysis unit, 65 weighted average processing unit, 71 power calculation unit, 72 coefficient correction unit, 100 noise suppression device.

The noise suppression device according to the present invention includes: a target noise spectrum generation unit that generates a target noise spectrum from target noise spectrum candidates, which are noise spectra corresponding to a plurality of frequency shapes generated in advance, using information related to the input signal; a suppression amount limiting coefficient calculation unit that calculates, based on the generated target noise spectrum, a suppression amount limiting coefficient that defines upper and lower limits of the suppression amount of the noise included in the input signal; and a suppression amount calculation unit that calculates a spectrum suppression amount using the calculated suppression amount limiting coefficient. The target noise spectrum generation unit includes: a noise power calculation unit that calculates the noise power of an estimated noise spectrum; a target noise spectrum selection unit that selects target noise spectra from the plurality of target noise spectrum candidates using the noise power calculated by the noise power calculation unit; and a weighted average processing unit that obtains a weighted average of the plurality of target noise spectra selected by the target noise spectrum selection unit and acquires an averaged target noise spectrum.

Claims

A noise suppression device that calculates a spectrum suppression amount for suppressing noise included in an input signal, using spectral components obtained by converting the input signal from the time domain to the frequency domain and an estimated noise spectrum estimated from the input signal, suppresses the amplitude of the spectral components of the input signal using the calculated spectrum suppression amount, and generates a noise-suppressed signal by converting the amplitude-suppressed spectral components to the time domain, the noise suppression device comprising:
a target noise spectrum generation unit that generates a target noise spectrum from target noise spectrum candidates, which are noise spectra corresponding to a plurality of frequency shapes generated in advance, using information related to the input signal;
a suppression amount limiting coefficient calculation unit that calculates, based on the target noise spectrum generated by the target noise spectrum generation unit, a suppression amount limiting coefficient that defines upper and lower limits of the suppression amount of the noise included in the input signal; and
a suppression amount calculation unit that calculates the spectrum suppression amount using the suppression amount limiting coefficient calculated by the suppression amount limiting coefficient calculation unit.

The noise suppression device according to claim 1, wherein the target noise spectrum generation unit comprises:
a noise power calculation unit that calculates the noise power of the estimated noise spectrum; and
a target noise spectrum selection unit that selects a target noise spectrum from the plurality of target noise spectrum candidates using the noise power calculated by the noise power calculation unit.

The target noise spectrum generator is
A noise power calculator for calculating the noise power of the estimated noise spectrum;
Using the noise power calculated by the noise power calculator, a frequency characteristic analyzer that analyzes frequency characteristics of the estimated noise spectrum;
2. A target noise spectrum selection unit that selects a target noise spectrum from the plurality of target noise spectrum candidates using a frequency characteristic of the estimated noise spectrum analyzed by the frequency characteristic analysis unit. The noise suppressor described.

The noise suppression device according to claim 2 or 3, wherein the target noise spectrum generation unit comprises a weighted average processing unit that obtains a weighted average of a plurality of target noise spectra selected by the target noise spectrum selection unit and acquires an averaged target noise spectrum.

A noise suppression method that calculates a spectrum suppression amount for suppressing noise included in an input signal, using spectral components obtained by converting the input signal from the time domain to the frequency domain and an estimated noise spectrum estimated from the input signal, suppresses the amplitude of the spectral components of the input signal using the calculated spectrum suppression amount, and generates a noise-suppressed signal by converting the amplitude-suppressed spectral components to the time domain, the method comprising:
a target noise spectrum generation step in which a target noise spectrum generation unit generates a target noise spectrum from target noise spectrum candidates, which are noise spectra corresponding to a plurality of frequency shapes generated in advance, using information related to the input signal;
a suppression amount limiting coefficient calculation step in which a suppression amount limiting coefficient calculation unit calculates, based on the target noise spectrum, a suppression amount limiting coefficient that defines upper and lower limits of the suppression amount of the noise included in the input signal; and
a suppression amount calculation step in which a suppression amount calculation unit calculates the spectrum suppression amount using the suppression amount limiting coefficient.

A noise suppression program for causing a computer to execute a procedure for calculating a spectrum suppression amount for suppressing noise included in an input signal, using spectral components obtained by converting the input signal from the time domain to the frequency domain and an estimated noise spectrum estimated from the input signal, a procedure for suppressing the amplitude of the spectral components of the input signal using the calculated spectrum suppression amount, and a procedure for generating a noise-suppressed signal by converting the amplitude-suppressed spectral components to the time domain, the noise suppression program comprising:
a target noise spectrum generation procedure for generating a target noise spectrum from target noise spectrum candidates, which are noise spectra corresponding to a plurality of frequency shapes generated in advance, using information related to the input signal;
a suppression amount limiting coefficient calculation procedure for calculating, based on the target noise spectrum generated by the target noise spectrum generation procedure, a suppression amount limiting coefficient that defines upper and lower limits of the suppression amount of the noise included in the input signal; and
a suppression amount calculation procedure for calculating the spectrum suppression amount using the suppression amount limiting coefficient calculated by the suppression amount limiting coefficient calculation procedure.