JP5526524B2 - Noise suppression device and noise suppression method

Info

Publication number: JP5526524B2
Application number: JP2008274858A
Authority: JP
Inventors: 恩彩劉
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-10-24
Filing date: 2008-10-24
Publication date: 2014-06-18
Anticipated expiration: 2028-10-24
Also published as: CN101727911B; JP2010102201A; KR20100045934A; CN101727911A; KR101088558B1

Description

The present invention relates to a noise suppression device and a noise suppression method.

Conventionally, various devices have been proposed, including audio reproduction devices that drive a load such as a loudspeaker according to an input signal, voice communication devices that transmit speech between remote locations, and speech recognition devices that understand the meaning of speech by distinguishing and recognizing its type. For each of these devices to reproduce, transmit, or recognize speech accurately, it is preferable to remove the influence of the noise contained in that speech.
Known noise suppression techniques include, for example, those disclosed in Patent Documents 1 and 2 and Non-Patent Documents 1 to 5 below.
Patent Document 1: JP 2007-226264 A
Patent Document 2: JP 2005-257748 A
Non-Patent Document 1: S. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans., Vol. ASSP-27, No. 2, pp. 113-120, 1979.
Non-Patent Document 2: M. Berouti et al., “Enhancement of Speech Corrupted by Acoustic Noise”, Proceedings of ICASSP, pp. 201-211, 1979.
Non-Patent Document 3: Lim and Oppenheim, “Enhancement and Bandwidth Compression of Noisy Speech”, Proc. IEEE, Vol. 67, No. 12, pp. 1586-1604, 1979.
Non-Patent Document 4: Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator”, IEEE Trans., Vol. ASSP-32, No. 6, pp. 1109-1121, 1984.
Non-Patent Document 5: Junko Sasaki and Masafumi Tanaka, “Study of low distortion noise reduction method using masking effect”, IEICE Technical Report, EA98-106, pp. 37-42, 1998.

The techniques disclosed in these documents all relate, in essence, to the so-called spectral subtraction method, that is, a method of suppressing noise by subtracting, by some appropriate means, from the level of the amplitude spectrum in the frequency domain. Each of these techniques provides a certain noise suppression effect.

However, problems remain that these documents neither disclose nor solve.
For example, the spectral subtraction method described above estimates the noise spectrum contained in the input signal and subtracts that noise spectrum estimate from the amplitude spectrum, and this approach is highly likely to generate so-called musical noise. The reason is that the noise spectrum estimate does not necessarily reflect the actual noise spectrum. In some frequency bands noise may remain even after the estimate has been subtracted, while in other frequency bands too much may be subtracted. Consequently, when the amplitude spectrum after noise subtraction is transformed back into the time domain, a mixture of sine waves at random frequencies may appear, and reproducing it may produce a very harsh noise (that is, musical noise).
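The over-subtraction and residual-noise cases described above can be illustrated with a minimal sketch. It assumes plain magnitude subtraction clamped at zero; the function name `spectral_subtract` and the three-band example values are hypothetical and are not taken from the cited documents.

```python
def spectral_subtract(magnitudes, noise_estimate):
    """Subtract an estimated noise magnitude from each band, clamping at zero."""
    return [max(y - n, 0.0) for y, n in zip(magnitudes, noise_estimate)]


# Hypothetical noise-only frame with three bands; the actual noise differs per band.
noisy = [1.0, 0.25, 0.75]
# A flat estimate that cannot match every band at once (assumption for illustration).
estimate = [0.5, 0.5, 0.5]

residual = spectral_subtract(noisy, estimate)
# Bands 0 and 2 keep some residual noise; band 1 is over-subtracted and clamped to zero.
# Isolated surviving bands like these are what is heard as musical noise.
```

Because which bands survive changes randomly from frame to frame, the surviving components sound like short tones at random frequencies.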

To suppress this musical noise, it has been proposed to add the original sound to the amplitude spectrum from which the noise spectrum estimate has been subtracted (see Non-Patent Document 5).
In that technique, however, the original sound addition ratio is determined from an estimate of the SNR of the input signal, so the noise suppression processing is somewhat inflexible. For example, Non-Patent Document 5 takes an SNR of 10 to 15 dB as a reference and sets the original sound addition ratio to 0.05 when the SNR is above the reference and to 0.5 when it is below. The ratio is thus set without regard to the amount of noise that one actually wants to suppress, or ought to suppress, which somewhat limits the noise suppression effect obtained in practice. The same applies to Patent Document 2 mentioned above.

Furthermore, an input signal contains portions occupied mainly by speech (speech portions) and portions containing almost no speech (noise portions).
Suppose that, as in Non-Patent Document 1 mentioned above, the spectral subtraction method is applied to the speech portions while a fixed gain is applied to the noise portions to suppress noise. If the fixed gain is too small, the amount of background noise increases at a transition from a noise portion to a speech portion; if the fixed gain is too large, the amount of background noise instead decreases at that transition. When such a signal is reproduced, the listener is likely to find it unnatural.

An object of the present invention is to provide a noise suppression device and a noise suppression method capable of solving at least some of the problems described above.

To solve the problems described above, a noise suppression device according to the present invention comprises: noise spectrum estimating means for estimating, for each of K frequency bands (where K is a natural number of 2 or more), a noise spectrum contained in an input signal on the basis of the input signal; first gain calculating means for calculating a noise suppression gain for each of the K frequency bands on the basis of the estimation result of the noise spectrum estimating means; and original sound adding means for adding the input signal, at a first ratio, to a first noise-suppressed signal obtained by applying the noise suppression gain to the input signal, the first ratio being determined on the basis of the difference between the noise suppression gain and a target noise suppression gain that indicates the targeted degree of noise suppression.

According to the present invention, the original sound adding means first adds the input signal to the first noise-suppressed signal. Therefore, even if the noise spectrum estimate causes over-subtraction from the amplitude spectrum as described above, the affected portion is, so to speak, filled in by the original sound, and the generation of musical noise is suppressed very effectively.
Moreover, in the present invention the “first ratio”, which serves as the original sound addition ratio, is determined on the basis of the difference between the noise suppression gain and the target noise suppression gain, so the noise suppression effect obtained in practice is highly effective. This is because the “first ratio” is determined in relation to the target noise suppression gain, rather than by a simple comparison between an SNR estimate and a reference value as in the example above, and because the target noise suppression gain can itself be set according to the amount of noise that one actually wants to suppress, or ought to suppress.
The “target noise suppression gain” referred to in the present invention may be given manually via an operation unit or the like provided outside the noise suppression device according to the present invention, or may be computed automatically by any appropriate method.
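As a rough illustration of the original sound addition described above, the following sketch applies a per-band gain and then adds back the input at a ratio derived from the gap to the target gain. It assumes linear gains between 0 and 1 and takes the ratio as max(0, tg − G), which the text introduces only later as one optional form; the function name `add_original_sound` and the sample values are hypothetical.

```python
def add_original_sound(spectrum, gains, target_gain):
    """Apply per-band gains, then add the input back at a per-band ratio.

    Assumption: the ratio is max(0, target_gain - gain), one possible way of
    deriving it from the difference between the two gains.
    """
    output = []
    for y, g in zip(spectrum, gains):
        ratio = max(0.0, target_gain - g)   # first ratio for this band
        suppressed = g * y                  # first noise-suppressed signal
        output.append(suppressed + ratio * y)
    return output


# Hypothetical amplitude spectrum and noise suppression gains for three bands.
spectrum = [1.0, 1.0, 2.0]
gains = [0.0, 0.5, 1.0]

result = add_original_sound(spectrum, gains, target_gain=0.25)
```

Under these assumptions, a band whose gain fell to zero through over-subtraction is refilled up to the target level, while bands whose gain already exceeds the target are left unchanged.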

The noise suppression device according to the present invention may further comprise second gain calculating means for calculating an average gain, over the K frequency bands, of the noise suppression gain, wherein the original sound adding means adds the input signal, at a second ratio, to a second noise-suppressed signal obtained by applying the average gain to the input signal, the second ratio being determined on the basis of the difference between the average gain and the target noise suppression gain.
According to this aspect, the input signal is added to the second noise-suppressed signal at the “second ratio”.
One feature of this aspect is the use of the “average gain”. If the noise suppression gains are G(1), G(2), ..., G(K), the average gain Gave is obtained, for example, as Gave = (G(1) + G(2) + ... + G(K)) / K. If, further, the amplitude spectrum obtained by transforming the input signal into the frequency domain is Y(1), Y(2), ..., Y(K), the output signal in the frequency domain is obtained as Gave·Y(1), Gave·Y(2), ..., Gave·Y(K) (this is one specific example of the “second noise-suppressed signal obtained by applying the average gain” in this aspect).
First, because of this average gain, cases such as the over-subtraction from the amplitude spectrum caused by the noise spectrum estimate, described above, do not arise, so the generation of musical noise is suppressed very effectively.
In addition, in this aspect the original sound is added at the second ratio determined on the basis of the difference between the average gain and the target noise suppression gain, so musical noise is suppressed even more effectively.
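The average gain and its application can be sketched as follows. The arithmetic mean follows the example formula given in the text; the function names `average_gain` and `apply_average_gain` and the numeric values are hypothetical.

```python
def average_gain(gains):
    """Arithmetic mean of the K per-band noise suppression gains G(1)..G(K)."""
    return sum(gains) / len(gains)


def apply_average_gain(spectrum, gains):
    """Scale every band Y(k) by the same average gain Gave."""
    g_ave = average_gain(gains)
    return [g_ave * y for y in spectrum]


# Hypothetical per-band gains and amplitude spectrum (K = 3).
gains = [0.0, 0.5, 1.0]
spectrum = [2.0, 4.0, 6.0]

second_suppressed = apply_average_gain(spectrum, gains)
```

Since every band is scaled by the same factor, the spectral shape of the frame is preserved and no band is zeroed in isolation, which is why over-subtraction in individual bands does not arise here.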

The relationship between “adding the input signal to the second noise-suppressed signal at the second ratio” in this aspect and “adding the input signal to the first noise-suppressed signal at the first ratio” described earlier is as follows.
First, note the conceptual distinction. The “noise suppression gain” is involved both in generating the “first noise-suppressed signal” and in determining the “first ratio”, and the former is subjected to original sound addition based on the latter. Likewise, the “average gain” is involved both in generating the “second noise-suppressed signal” and in determining the “second ratio”, and the former is subjected to original sound addition based on the latter.
On this premise, the present invention admits a configuration in which, whenever the “average gain” of this aspect is calculated, the “original sound adding means” performs only the original sound addition at the “second ratio” based on that “average gain” and does not perform the addition at the “first ratio”. In this case, this aspect takes precedence, so to speak, over the aspect involving the “first ratio”.
Alternatively, the present invention also admits a configuration in which original sound addition at the “first ratio” is performed for some parts of the input signal and addition at the “second ratio” for other parts (in this case the former parts relate to the “first noise-suppressed signal” and the latter parts to the “second noise-suppressed signal”).
Furthermore, the scope of the present invention specifically includes a configuration in which original sound addition at the “second ratio” is performed not only on the “second noise-suppressed signal” but also on the “first noise-suppressed signal” (this configuration is discussed later as one of the aspects of the present invention). This implies that the “second ratio”, determined on the basis of the “average gain”, may also be applied to the “first noise-suppressed signal”, which is produced using the “noise suppression gain” rather than the “average gain” (and, in some cases, vice versa).

In the noise suppression device according to the present invention, the original sound adding means may also calculate a smoothing ratio obtained by smoothing the first or second ratio along the time axis, and add the input signal to the first or second noise-suppressed signal at the smoothing ratio.
According to this aspect, a smoothing ratio obtained by smoothing the first or second ratio along the time axis is calculated. That is, the smoothing ratio is the “first ratio” or “second ratio” after it has been determined on the basis of the “difference” described above and then further subjected to smoothing. “Smoothing along the time axis” means that, if the smoothing ratios are OGsmt-T(1), OGsmt-T(2), ..., OGsmt-T(r), ... in time order (r being an appropriate integer), they are calculated, for example, as OGsmt-T(r) = δ·OGsmt-T(r−1) + (1−δ)·og using an appropriate smoothing coefficient δ (where og is the “first ratio” or “second ratio”).
As a result, the first or second ratio (more precisely, the smoothed first or second ratio, that is, the “smoothing ratio” of this aspect) does not change abruptly over time, so the continuity and consistency of the noise suppression processing are maintained.
In the present invention it is preferable, as described later, to process the signal frame by frame, the frames being obtained by dividing the signal over time. In that case the “time axis” of this aspect can be understood, more specifically, as the axis along which those frames are arranged one after another. A more detailed specific example is described in the embodiment below, in particular with regard to equation (7).
In connection with this aspect, the “first ratio” or “second ratio” is embodied in the embodiment described later as the “original sound addition rate og”, and the “smoothing ratio” as the “original sound addition ratio OG_t”.
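The recursive smoothing given as an example above can be sketched as a first-order recursion over frames. The names `smooth_ratio` and `smooth_sequence`, the initial value of 0.0, and the sample inputs are assumptions made for illustration.

```python
def smooth_ratio(previous, og, delta):
    """One smoothing step: delta * previous smoothed value + (1 - delta) * og."""
    return delta * previous + (1.0 - delta) * og


def smooth_sequence(ratios, delta, initial=0.0):
    """Smooth a per-frame sequence of ratios og along the time (frame) axis."""
    smoothed = []
    previous = initial  # assumed starting value before the first frame
    for og in ratios:
        previous = smooth_ratio(previous, og, delta)
        smoothed.append(previous)
    return smoothed


# Hypothetical per-frame ratios that jump from 1.0 to 0.0.
history = smooth_sequence([1.0, 1.0, 0.0], delta=0.5)
```

A value of δ closer to 1 makes the smoothed ratio follow changes in og more slowly, which is what prevents abrupt frame-to-frame changes in the amount of original sound added.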

An aspect of the present invention that includes the “average gain” concept described above may further comprise speech detecting means for detecting the presence or absence of speech in the input signal and thereby dividing the input signal over time into speech frames, which contain the speech, and noise frames, which do not, wherein the second noise-suppressed signal is obtained by applying the average gain to the portion of the input signal corresponding to the noise frames.
According to this aspect, the average gain described above is applied to the noise frames, and more preferably only to the noise frames, to generate the second noise-suppressed signal (which is then subjected to original sound addition at the second ratio). Since musical noise arises relatively easily in noise frames, this aspect applies the average gain precisely where it is needed and can therefore be regarded as one of the most suitable aspects for obtaining the musical noise suppression effect.
In this aspect, the terms “contain” and “do not contain” with respect to speech must not be read in an absolute sense. Conceptually one can imagine two extremes, a frame filled entirely with speech and a frame with no speech at all, but this aspect is of course not limited to the case where “speech frame” and “noise frame” refer to these two extremes, nor to the case where only the latter counts as a “noise frame” and everything else as a “speech frame”. In other words, this aspect does not require that a frame classified as a “noise frame” contain no “speech” whatsoever; the distinction between “speech frame” and “noise frame” may be drawn at any suitable point between the two extremes.
In this sense, the terms “contain” and “do not contain” in this aspect, and the distinction between “speech frame” and “noise frame”, are relative.

In this aspect, the first noise-suppressed signal may be obtained by applying the noise suppression gain to the portion of the input signal corresponding to the speech frames.
According to this aspect, the noise suppression gain described above is applied to the speech frames to generate the first noise-suppressed signal (which is then subjected to original sound addition at the first ratio). This aspect can coexist with the aspect described immediately before it. In that case, preferably, speech frames use only the ordinary noise suppression gain, which has not been averaged, and noise frames use only the average gain, which has. Given that noise is not very noticeable in speech frames and that the opposite holds in noise frames, the processing of this aspect provides a highly rational, efficient, and effective noise suppression effect.
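The frame-dependent choice of gain described above can be sketched as follows. The speech/noise decision is taken as a given boolean, since the text does not specify a detector here; the function name `suppress_frame` and the sample values are hypothetical.

```python
def suppress_frame(spectrum, gains, is_speech):
    """Return the noise-suppressed spectrum for one frame.

    Speech frames use the per-band noise suppression gains as they are
    (first noise-suppressed signal); noise frames use their average
    (second noise-suppressed signal).
    """
    if is_speech:
        return [g * y for g, y in zip(gains, spectrum)]
    g_ave = sum(gains) / len(gains)
    return [g_ave * y for y in spectrum]


# Hypothetical two-band frame.
spectrum = [2.0, 4.0]
gains = [0.25, 0.75]

speech_out = suppress_frame(spectrum, gains, is_speech=True)
noise_out = suppress_frame(spectrum, gains, is_speech=False)
```

This sketch covers only the gain application; the original sound addition at the first or second ratio would follow as a separate step.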

In this aspect, the original sound adding means may add the input signal to the first noise-suppressed signal at the second ratio instead of the first ratio.
According to this aspect, original sound addition at the “second ratio” is performed not only on the “second noise-suppressed signal” but also on the “first noise-suppressed signal”. That is, the “first noise-suppressed signal”, which is produced using the “noise suppression gain” rather than the “average gain”, is also subjected to original sound addition at the “second ratio” determined on the basis of the “average gain”.
As a result, the “second ratio” is the only original sound addition ratio in use, which makes the processing more efficient.

In the aspect in which the “noise suppression gain” is applied to “speech frames” to obtain the “first noise-suppressed signal”, the original sound adding means may be configured so that, when adding the input signal to the first noise-suppressed signal of a speech frame, it treats the second ratio already calculated for the noise frame nearest to that speech frame as the first ratio for that speech frame.
According to this aspect, the original sound addition for a speech frame uses, as its original sound addition ratio (that is, the “first ratio”), the ratio that was used in the most recently processed noise frame (that is, the “second ratio”). In other words, the value is in substance the “second ratio” of the most recent noise frame, but it is treated as if it were the “first ratio” of the speech frame.
Accordingly, in this aspect the noise suppression processing performed in the most recent noise frame is carried over, so to speak, into the noise suppression processing of the speech frame that follows, and the consistency of the noise suppression processing is preserved at a transition from a noise frame to a speech frame. This prevents the amount of noise from changing abruptly at such a transition.
This aspect presupposes that adding the original sound to the first noise-suppressed signal at the first ratio and adding the original sound to the second noise-suppressed signal at the second ratio coexist, with the first noise-suppressed signal corresponding to the speech frames of the input signal and the second noise-suppressed signal to the noise frames.

Alternatively, in the aspect in which the “noise suppression gain” is applied to “speech frames” to obtain the “first noise-suppressed signal”, the original sound adding means may be configured as follows. When adding the input signal to the second noise-suppressed signal of a noise frame, it calculates the second ratio for that noise frame as a provisional second ratio, then uses the first or second ratio of the frame immediately preceding that noise frame to calculate a smoothing ratio obtained by smoothing the provisional second ratio along the time axis, and adds the input signal to the second noise-suppressed signal with this smoothing ratio serving as the second ratio. When adding the input signal to the first noise-suppressed signal of a speech frame, it adds the input signal with the first or second ratio of the frame immediately preceding that speech frame serving as the first ratio for that speech frame.
According to this aspect, the first or second ratio is suitably determined for speech frames and noise frames respectively. For noise frames, the smoothing ratio is calculated, so the consistency and continuity of the noise suppression processing are maintained. For speech frames, the noise spectrum of the “most recent noise frame” described above is suitably maintained (that is, according to this aspect, once the second ratio has been set for a noise frame, its value continues to be held as the new first ratio for as long as speech frames follow). A more detailed specific example is described in the embodiment below, in particular with regard to equation (7).
In this aspect, a “frame” without further qualification may be either a “speech frame” or a “noise frame”.
This aspect also presupposes that adding the original sound to the first noise-suppressed signal at the first ratio and adding the original sound to the second noise-suppressed signal at the second ratio coexist, with the first noise-suppressed signal corresponding to the speech frames of the input signal and the second noise-suppressed signal to the noise frames.
Furthermore, the “first ratio”, “second ratio”, “smoothing ratio”, and so on of this aspect are embodied in the embodiment described later as the “original sound addition rate og”, the “original sound addition ratio OG_t”, and the like. Note, however, that the terms of the former group do not correspond one-to-one to the terms of the latter group.
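The frame-by-frame rule described above (smooth in noise frames, hold in speech frames) can be sketched as a single update function. This is a sketch under stated assumptions, not the embodiment's equation (7), which is not shown in this section: the provisional second ratio is taken as max(0, tg − Gave), the smoothing is the first-order recursion with coefficient δ, and the name `update_ratio` and all numeric values are hypothetical.

```python
def update_ratio(previous, is_speech, gains, target_gain, delta):
    """Return the original sound addition ratio for the current frame.

    previous: ratio used in the immediately preceding frame (speech or noise).
    """
    if is_speech:
        # Speech frame: reuse the preceding frame's ratio as the first ratio.
        return previous
    # Noise frame: provisional second ratio from the average gain (assumed form),
    # then smoothed against the preceding frame's ratio.
    g_ave = sum(gains) / len(gains)
    provisional = max(0.0, target_gain - g_ave)
    return delta * previous + (1.0 - delta) * provisional


# Hypothetical sequence: noise, speech, speech, noise.
frames = [
    (False, [0.0, 0.5]),
    (True, [0.9, 0.8]),
    (True, [0.7, 0.9]),
    (False, [0.25, 0.25]),
]

ratio = 0.0  # assumed initial value
ratio_history = []
for is_speech, gains in frames:
    ratio = update_ratio(ratio, is_speech, gains, target_gain=0.75, delta=0.5)
    ratio_history.append(ratio)
```

In this sketch the ratio stays constant across the two speech frames and resumes smoothing from that held value at the next noise frame, so the amount of original sound added does not jump at either transition.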

In the noise suppression device according to the present invention, the first or second ratio may also be obtained by the following equation (A).
og = max(0, tg − G)   (A)
Here, og is the first or second ratio to be obtained, tg is the target noise suppression gain, G is the noise suppression gain or the average gain, and max(a, b) is a function that returns the larger of a and b.
According to this aspect, the first or second ratio is suitably calculated. A more detailed specific example is described in the embodiment below, in particular with regard to equation (5).
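A short worked example may help in reading equation (A). It assumes that original sound addition produces G·Y + og·Y in each band, so that the combined gain is G + og; this reading and the numeric values are assumptions for illustration and are not stated in this form in the text.

```latex
\begin{align*}
og &= \max(0,\; tg - G) \\
G + og &= \max(G,\; tg) \\
tg = 0.3,\ G = 0.1 &\;\Rightarrow\; og = 0.2, \quad G + og = 0.3 \\
tg = 0.3,\ G = 0.8 &\;\Rightarrow\; og = 0, \quad G + og = 0.8
\end{align*}
```

Under this assumption the target noise suppression gain acts as a lower bound on the combined gain: bands suppressed below tg are raised back to tg, and bands already above tg receive no added original sound.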

Meanwhile, to solve the problems described above, a noise suppression method according to the present invention includes: a noise spectrum estimating step of estimating, for each of K frequency bands (where K is a natural number of 2 or more), a noise spectrum contained in an input signal on the basis of the input signal; a first gain calculating step of calculating a noise suppression gain for each of the K frequency bands on the basis of the estimation result of the noise spectrum estimating step; and an original sound adding step of adding the input signal, at a first ratio, to a first noise-suppressed signal obtained by applying the noise suppression gain to the input signal, the first ratio being determined on the basis of the difference between the noise suppression gain and a target noise suppression gain that indicates the targeted degree of noise suppression.

The present invention clearly provides operational effects that do not differ in essence from those described above for the noise suppression device according to the present invention.

The noise suppression method according to the present invention may further include a second gain calculation step of calculating an average gain of the noise suppression gains over the K frequency bands, and may be configured so that, in the original sound addition step, the input signal is added, at a second ratio determined on the basis of the difference between the average gain and the target noise suppression gain, to a second noise-suppressed signal obtained by applying the average gain to the input signal.
This aspect clearly provides operational effects that do not differ in essence from those described above for the aspect of the noise suppression device in which the input signal is added to the second noise-suppressed signal at the second ratio.
The relationship between "adding the input signal to the second noise-suppressed signal at the second ratio" in this aspect and "adding the input signal to the first noise-suppressed signal at the first ratio" mentioned earlier is the same as explained above.

The noise suppression method according to the present invention may further include a voice detection step of detecting the presence or absence of voice in the input signal and thereby dividing the input signal, over time, into voice frames that contain the voice and noise frames that do not, and may be configured so that the second noise-suppressed signal is obtained by applying the average gain to the portion of the input signal that corresponds to the noise frames.
This aspect clearly provides operational effects that do not differ in essence from those described above for the aspect of the noise suppression device in which the average gain is applied to noise frames.
The meaning of the terms "contain" and "not contain" in this aspect is the same as explained above.

Beyond the above, more specific aspects of the present invention, and the operational effects they provide, are made clear in the description of the embodiments that follows.

<First Embodiment>
A first embodiment of the present invention is described below with reference to FIG. 1. In FIG. 1 and in the other drawings referred to below (including graphs such as FIG. 6), the dimensional proportions of the parts may differ from the actual ones where appropriate.

As shown in FIG. 1, the noise suppression device 1 consists of a time/frequency conversion unit 10, a noise spectrum estimation unit 20, a noise suppression gain calculation unit 30, a noise period/noise suppression gain calculation unit 40, an original sound addition rate calculation unit 50, an original sound addition gain calculation unit 60, a frequency/time conversion unit 70, and a voice detection unit 80.

The time/frequency conversion unit 10 applies a Fourier transform to the time-domain input signal to convert it into a frequency-domain signal. The Fourier transform is preferably performed by dividing the input signal over time into a predetermined number of frames and applying an appropriate window function to each frame.
The frequency-domain signal is separated into an amplitude spectrum and a phase spectrum. The phase spectrum is sent unchanged to the frequency/time conversion unit 70 described later. The amplitude spectrum is sent to the noise spectrum estimation unit 20 and the subsequent units, where it undergoes the processing described below.
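The framing, windowing, and amplitude/phase separation described above can be sketched as follows. This is a minimal illustration, not the device's implementation: the Hann window, the non-overlapping frames, the naive DFT, and the names `hann`, `dft`, and `analyze` are all assumptions made for the example (the text only asks for "an appropriate window function").

```python
import cmath
import math


def hann(length):
    """Hann window; an assumed choice of 'appropriate window function'."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def dft(frame):
    """Naive O(L^2) discrete Fourier transform (an FFT would be used in practice)."""
    size = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / size) for i, x in enumerate(frame))
        for k in range(size)
    ]


def analyze(signal, frame_len):
    """Split the signal into non-overlapping frames; return (amplitude, phase) per frame."""
    window = hann(frame_len)
    result = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = dft(frame)
        amplitude = [abs(c) for c in spectrum]       # goes on to the gain stages
        phase = [cmath.phase(c) for c in spectrum]   # passed through unchanged
        result.append((amplitude, phase))
    return result
```

Keeping the phase separate mirrors the text: only the amplitude spectrum is modified, and the phase is reused as is when the time-domain signal is rebuilt.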

The time-domain input signal is also supplied to the voice detection unit 80, which detects the presence or absence of a voice signal in the input signal. When the input signal is divided into frames as described above, voice detection is performed frame by frame (the first embodiment assumes such processing). Here, "voice" means in particular sound that is meaningful to a person, such as conversation, spoken words, music, and various signals. In other words, if the input signal were reproduced by suitable reproduction means, reproducing the "voice signal" in that input signal would yield such sound.
The voice signal is detected, for example, according to whether the level of the input signal exceeds a predetermined threshold. The present invention can, however, adopt various other methods. For example, a method that estimates the probability of occurrence of a voice signal using probabilistic or statistical techniques may be adopted, or the detection may operate not on the input signal itself but on the signal after the Fourier transform (that is, the frequency-domain signal mentioned above).
In the following, a frame in which the voice detection unit 80 determines that a voice signal is present may be called a "voice frame", and a frame in which it determines that no voice signal is present may be called a "noise frame". Presence and absence here are not meant in an absolute sense. Because the presence or absence of a voice signal may be judged against a predetermined threshold, as noted above, the possibility that a "noise frame" contains something that could strictly be called a voice signal is not excluded.
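The threshold-based detection mentioned above might look like the following sketch. The level measure (mean absolute amplitude), the function names, and the string labels are assumptions for illustration; the text only says that the input level is compared with a predetermined threshold.

```python
def frame_level(frame):
    """Mean absolute amplitude of one frame (an assumed 'level' measure)."""
    return sum(abs(s) for s in frame) / len(frame)


def is_voice_frame(frame, threshold):
    """True when the frame level exceeds the predetermined threshold."""
    return frame_level(frame) > threshold


def classify_frames(signal, frame_len, threshold):
    """Label each non-overlapping frame as 'voice' or 'noise'."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        labels.append("voice" if is_voice_frame(frame, threshold) else "noise")
    return labels
```

As the text notes, a frame labelled "noise" by such a rule can still contain low-level voice; the label only reflects the threshold decision.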

The noise spectrum estimation unit 20 calculates an estimate of the noise spectrum on the basis of the amplitude spectrum. In the first embodiment in particular, the noise spectrum is estimated for each of a predetermined number of frequency bands on the basis of the following equation (1).

[Equation (1) is not reproduced in this text.]

Here, N_t(n) is the noise spectrum estimate in the frame currently being processed, and N_{t-1}(n) is the noise spectrum estimate in the immediately preceding frame (so the subscript "t" denotes the frame currently being processed). Y(n) is the input amplitude spectrum, n is the number assigned to a frequency band, and β is a smoothing coefficient. The frequency range is divided into N bands, where N is no greater than the K of the "K frequency bands" referred to in the present invention (that is, N ≤ K). In equation (1), case A denotes the case in which the noise spectrum estimation unit 20 processes a noise frame, and case B denotes the case in which it processes a voice frame.
In this way, the noise spectrum estimation unit 20 changes the expression it uses to obtain N_t(n) depending on whether the frame currently being processed is a noise frame or a voice frame. When processing a voice frame (case B), it uses the immediately preceding noise spectrum estimate unchanged as N_t(n); when processing a noise frame (case A), it obtains N_t(n) by smoothing the input amplitude spectrum along the time axis.
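The two cases of equation (1) can be sketched as below. Case B (hold the previous estimate) follows directly from the description. For case A the exact formula is not available in this text, so the sketch assumes a common first-order recursive form, N_t(n) = β·N_{t-1}(n) + (1 - β)·Y(n); the function name and the default β are likewise illustrative.

```python
def update_noise_estimate(prev_noise, amplitude, is_noise_frame, beta=0.9):
    """Return N_t(n) for all bands given N_{t-1}(n) and Y(n)."""
    if not is_noise_frame:
        # Case B (voice frame): keep the previous estimate unchanged.
        return list(prev_noise)
    # Case A (noise frame): smooth Y(n) along the time axis.
    # Assumed form: beta * N_{t-1}(n) + (1 - beta) * Y(n).
    return [beta * n_prev + (1.0 - beta) * y
            for n_prev, y in zip(prev_noise, amplitude)]
```

With this form, a β closer to 1 makes the estimate follow the input more slowly; the actual weighting used in the patent may differ.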

The noise suppression gain calculation unit 30 calculates the noise suppression gain on the basis of the amplitude spectrum and the noise spectrum estimate N_t(n) obtained by equation (1). In the first embodiment in particular, the noise suppression gain is calculated by the following equation (2).

[Equation (2) is not reproduced in this text.]

Here, max(a, b) is a function that returns the larger of a and b (the same applies below).
According to equation (2), if Y(n) < N_t(n) holds between the input amplitude spectrum Y(n) and the noise spectrum estimate N_t(n), then G(n) = 0, and if Y(n) > N_t(n) holds, then G(n) = (Y(n) - N_t(n)) / Y(n).
The noise suppression gain calculated by the noise suppression gain calculation unit 30 is supplied to the original sound addition gain calculation unit 60 either via the noise period/noise suppression gain calculation unit 40 or directly, depending on whether the voice detection unit 80 has classified the frame as a noise frame or a voice frame. The noise suppression device 1 shown in FIG. 1 includes a switch for this purpose (see the arc-shaped arrow in the figure).
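The behaviour stated for equation (2) corresponds to G(n) = max(0, Y(n) - N_t(n)) / Y(n), a spectral-subtraction style gain. A minimal sketch follows; the function name and the guard for Y(n) = 0 are assumptions added so the example runs safely.

```python
def noise_suppression_gain(amplitude, noise):
    """Per-band gain G(n) = max(0, Y(n) - N_t(n)) / Y(n)."""
    gains = []
    for y, n in zip(amplitude, noise):
        if y <= 0.0:
            # Assumption: avoid division by zero for an empty band.
            gains.append(0.0)
        else:
            gains.append(max(0.0, y - n) / y)
    return gains
```

For example, a band with Y(n) = 4 and N_t(n) = 1 gets a gain of 0.75, while a band where the noise estimate exceeds the input gets a gain of 0.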

The noise period/noise suppression gain calculation unit 40 (hereinafter sometimes called the "noise period gain calculation unit 40" for simplicity) calculates the noise suppression gain to be applied to noise frames. In the first embodiment, this gain is calculated as follows.
First, g, expressed by the following equation (3), is calculated from the noise suppression gain G(n) obtained by equation (2).

[Equation (3) is not reproduced in this text.]

As the right-hand side of equation (3) shows, g is the average of the noise suppression gain of equation (2) over the frequency bands n.
Next, the average noise suppression gain g of equation (3) is smoothed by the following equation (4).

[Equation (4) is not reproduced in this text.]

Here, μ is a smoothing coefficient, G_t is the noise suppression gain for the noise frame currently being processed, and G_{t-1} is the noise suppression gain for the noise frame processed immediately before it.
As with the case A expression in equation (1), equation (4) refers to the gain of the previously processed frame when obtaining the noise suppression gain of the frame currently being processed, so it performs smoothing along the time axis (the same applies to equation (7) described later).
G_t in equation (4) is the noise suppression gain that the noise period gain calculation unit 40 is to obtain for application to noise periods (hereinafter sometimes called the "noise period gain" for simplicity).
The noise period gain calculation unit 40 applies the noise period gain G_t obtained in this way uniformly to all frequency bands. To express this, the uniformly applied G_t is written below as G1(n). In this case, G1(0), G1(1), ..., G1(N-1) are all equal to G_t.
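The averaging and smoothing steps can be sketched as follows. The average over bands follows the description of equation (3). The smoothing in equation (4) is assumed here to take the form G_t = μ·G_{t-1} + (1 - μ)·g, since the original formula is not available in this text; the function names are illustrative.

```python
def noise_period_gain(gains, prev_gain, mu=0.9):
    """Return G_t from the per-band gains G(n) and the previous G_{t-1}."""
    # Equation (3): average of G(n) over the N frequency bands.
    g = sum(gains) / len(gains)
    # Equation (4), assumed form: first-order smoothing along the time axis.
    return mu * prev_gain + (1.0 - mu) * g


def broadcast_gain(gain, num_bands):
    """G1(n): the same noise period gain applied to every band."""
    return [gain] * num_bands
```

Using one smoothed value for every band removes band-to-band and frame-to-frame fluctuation of the gain during noise periods.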

The original sound addition rate calculation unit 50 calculates the rate at which the original sound signal is added to the noise-suppressed signal. In the first embodiment in particular, this original sound addition rate og is obtained from the following equation (5).

[Equation (5) is not reproduced in this text.]

Here, tg is the target noise suppression gain and is given by the following equation (6).

[Equation (6) is not reproduced in this text.]

TG in equation (6) is the target noise suppression amount, expressed in dB. TG (or tg) may be set manually from outside the device through an operation unit or the like (not shown), or may be computed automatically by any suitable method.
According to equation (5), if tg < G_t holds between the target noise suppression gain tg and the noise period gain G_t, then og = 0, and if tg ≥ G_t holds, then og = tg - G_t.
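A small sketch of equations (5) and (6) follows. The form og = max(0, tg - G_t) matches the behaviour stated above and equation (A). The dB conversion is an assumption: the sketch treats TG as a positive attenuation in dB and uses tg = 10^(-TG/20), because the original equation (6) is not available in this text and its sign convention is unknown. The function names are illustrative.

```python
def target_gain_from_db(target_suppression_db):
    """Assumed equation (6): tg = 10 ** (-TG / 20), TG as positive attenuation in dB."""
    return 10.0 ** (-target_suppression_db / 20.0)


def original_sound_addition_rate(tg, noise_period_gain):
    """Equation (5): og = max(0, tg - G_t)."""
    return max(0.0, tg - noise_period_gain)
```

Under this assumption a target of 20 dB corresponds to tg = 0.1, so original sound is added only while the noise period gain is below 0.1.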

The original sound addition gain calculation unit 60 calculates the noise suppression gain after original sound addition on the basis of the original sound addition rate og. In the first embodiment, this gain is calculated as follows.
First, OG_t, expressed by the following equation (7), is calculated from the original sound addition rate og obtained by equation (5).

[Equation (7) is not reproduced in this text.]

Here, OG_t is the original sound addition ratio in the frame currently being processed, OG_{t-1} is the original sound addition ratio in the immediately preceding frame, and λ is a smoothing coefficient. Case A and case B in equation (7) have the same meaning as in equation (1) (the same applies to equation (8) below).
In this way, the original sound addition gain calculation unit 60 changes the expression it uses to obtain OG_t depending on whether the frame currently being processed is a noise frame or a voice frame. When processing a voice frame (case B), it uses the immediately preceding original sound addition ratio unchanged as OG_t; when processing a noise frame (case A), it obtains OG_t by smoothing the original sound addition rate og along the time axis.
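The update of the original sound addition ratio can be sketched in the same way as the noise estimate. Case B (hold) follows the description. Case A is assumed to be OG_t = λ·OG_{t-1} + (1 - λ)·og, since the original equation (7) is not available in this text; the function name and default λ are illustrative.

```python
from typing import Optional


def update_addition_ratio(prev_ratio: float, og: Optional[float],
                          is_noise_frame: bool, lam: float = 0.9) -> float:
    """Return OG_t given OG_{t-1} and, for noise frames, the rate og."""
    if not is_noise_frame or og is None:
        # Case B (voice frame): og is not computed, so the previous ratio is kept.
        return prev_ratio
    # Case A (noise frame), assumed form: smooth og along the time axis.
    return lam * prev_ratio + (1.0 - lam) * og
```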

Next, the original sound addition gain calculation unit 60 obtains the noise suppression gain after original sound addition from the following equation (8).

[Equation (8) is not reproduced in this text.]

Here, G1(n) is, as explained above, the noise period gain applied uniformly to all frequency bands in a noise frame.
According to equation (8), the noise suppression gain after original sound addition, G2(n) (hereinafter sometimes called the "corrected gain G2(n)" for simplicity), is obtained according to the same case distinction as in equation (7).
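One plausible reading of equation (8) is sketched below. It assumes an additive form, G2(n) = G1(n) + OG_t for noise frames and G2(n) = G(n) + OG_t for voice frames. The original formula is not available in this text, so this form is an assumption; it is chosen because it reduces to G1(n) or G(n) when OG_t = 0. The function and parameter names are illustrative.

```python
def corrected_gain(gain_per_band, noise_period_gain, addition_ratio, is_noise_frame):
    """Return G2(n) for all bands (assumed additive form of equation (8))."""
    if is_noise_frame:
        # Case A: start from the uniform noise period gain G1(n).
        base = [noise_period_gain] * len(gain_per_band)
    else:
        # Case B: start from the per-band gain G(n) of equation (2).
        base = list(gain_per_band)
    # Adding OG_t to the gain is equivalent to mixing the input back in at that ratio.
    return [b + addition_ratio for b in base]
```

Because the gain is later multiplied by Y(n), adding OG_t to the gain has the same effect as adding OG_t·Y(n), the original sound, to the suppressed spectrum.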

The multiplier 11 shown in FIG. 1 multiplies the amplitude spectrum Y(n) by the corrected gain G2(n) obtained as described above. That is, the operation S(n) = G2(n)·Y(n) is performed, which yields the final noise-suppressed amplitude spectrum S(n).

Finally, the frequency/time conversion unit 70 generates a time-domain output signal from the noise-suppressed amplitude spectrum S(n) obtained as described above and the phase spectrum supplied directly from the time/frequency conversion unit 10. In the first embodiment, because the time/frequency conversion unit 10 applies a Fourier transform, the frequency/time conversion unit 70 performs an inverse Fourier transform.
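The last two stages, S(n) = G2(n)·Y(n) followed by the inverse transform with the untouched phase, can be sketched as follows. The naive inverse DFT and the name `synthesize` are assumptions for illustration, and the sketch assumes one gain value per DFT bin; overlap-add and window compensation, which a practical system would likely need, are omitted.

```python
import cmath
import math


def synthesize(amplitude, phase, gain):
    """Apply the gain per bin, restore the phase, and return time-domain samples."""
    size = len(amplitude)
    # S(n) = G2(n) * Y(n), recombined with the original phase spectrum.
    spectrum = [g * y * cmath.exp(1j * p) for g, y, p in zip(gain, amplitude, phase)]
    # Naive inverse DFT; the real part is taken as the output sample.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * i / size)
             for k, c in enumerate(spectrum)) / size).real
        for i in range(size)
    ]
```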

Next, the operation and effects of the noise suppression device 1 according to the first embodiment described above are explained with reference to FIGS. 2 to 4 in addition to FIG. 1.
First, the time/frequency conversion unit 10 applies a Fourier transform to the input signal and, as shown in FIG. 1, decomposes the result into the amplitude spectrum Y(n) and the phase spectrum (step S101 in FIG. 2). As described above, the time/frequency conversion unit 10 performs this processing frame by frame.
In parallel, the voice detection unit 80 detects the presence or absence of a voice signal in the input signal (step S102 in FIG. 2). This detection makes it possible to sort the input signal into voice frames and noise frames, and the voice detection unit 80 performs that sorting as well.

Next, the noise spectrum estimation unit 20 obtains the noise spectrum estimate N_t(n) for each frequency band n of predetermined width from the amplitude spectrum Y(n) and equation (1). As described above, different processing is performed depending on whether the frame currently being processed is a noise frame or a voice frame (see step S103 in FIG. 2). As FIG. 2 shows, from the calculation of N_t(n) up to the output signal generation by the multiplier 11 shown in FIG. 1 (step S104 in FIG. 2), the processing differs substantially in content between noise frames and voice frames. The description below is therefore divided into [I] processing for noise frames and [II] processing for voice frames.
As shown in FIG. 1, this branching is implemented by switching a switch according to the detection result of the voice detection unit 80.

[I] First, in the noise frame processing, the noise spectrum estimate N_t(n) is obtained by the case A expression of equation (1) (step S201 in FIG. 2). As described above, this is done by smoothing the input amplitude spectrum Y(n).

Next, the noise suppression gain G(n) is calculated from the noise spectrum estimate N_t(n) and equation (2) (step S202 in FIG. 2). This is the work of the noise suppression gain calculation unit 30 in FIG. 1. As described above, if Y(n) > N_t(n) holds, then G(n) = (Y(n) - N_t(n)) / Y(n); otherwise G(n) = 0. This yields, for example, a noise suppression gain like that in FIG. 3(C) (FIG. 3(B) illustrates the noise spectrum estimate N_t(n), and FIG. 3(A) illustrates the amplitude spectrum of the input signal).

Next, using equations (3) and (4), the average g of the noise suppression gain G(n) over the frequency bands is taken and then smoothed, giving the noise period gain G_t (step S203 in FIG. 2). This averaged and smoothed noise period gain G_t becomes G1(n), common to all frequency bands. This is the work of the noise period gain calculation unit 40.
Thus, one major feature of the first embodiment is that the noise suppression gain G(n) obtained by equation (2) is not used as is. Instead, the noise period gain G_t, obtained by averaging G(n) over the frequency bands with equation (3) and smoothing it along the time axis with equation (4), is used as the noise period gain G1(n) for all frequency bands.
FIG. 3(D) illustrates an example of the result of averaging the noise suppression gain G(n) (see also the broken line in FIG. 3(C)).

Next, the original sound addition rate og is obtained from the noise period gain G_t and equation (5) (step S204 in FIG. 2). This is the work of the original sound addition rate calculation unit 50 in FIG. 1. Here, the setting of the target noise suppression gain tg, or equivalently the target noise suppression amount TG, acts as one dominant factor. If the noise period gain G_t is larger than the target noise suppression gain tg, the original sound addition rate og is set to 0; otherwise, og is set according to the noise period gain G_t (that is, og = tg - G_t). Choosing between the two determines how the sound quality improvement brought by adding the original sound is enjoyed in relation to the target noise suppression amount TG. In the latter case, the main aim is to improve sound quality by adding the original sound within the margin defined by the target noise suppression amount (that is, the difference between tg and G_t). In the former case, G_t > tg already holds and no margin for sound quality improvement remains, so og is set to 0 (which, if anything, keeps the amount of noise from increasing). In short, equations (5) and (6) implement processing that takes adherence to the target noise suppression amount as the criterion and, where a margin for adding the original sound remains, improves sound quality within that margin.
Thus, another major feature of the first embodiment is that the original sound addition rate og is obtained using the noise period gain G_t.
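As a worked illustration of the two cases of equation (5), take tg = 0.1 purely as an assumed example value (the text gives no numeric values). A noise period gain below the target leaves a margin that is filled with original sound, while a gain above the target leaves none:

```latex
\begin{align*}
G_t = 0.04 &: \quad og = \max(0,\; 0.1 - 0.04) = 0.06 \\
G_t = 0.30 &: \quad og = \max(0,\; 0.1 - 0.30) = 0
\end{align*}
```

In the first case the suppression is deeper than the target, so up to 0.06 of the original sound can be restored without exceeding the target gain; in the second case nothing is added.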

Next, the original sound addition ratio OG_t is obtained from the original sound addition rate og and the case A expression of equation (7) (step S205 in FIG. 2). As described above, OG_t is obtained by smoothing og along the time axis. The noise suppression gain after original sound addition, that is, the corrected gain G2(n), is then obtained from this OG_t and equation (8). All of this is the work of the original sound addition gain calculation unit 60.
The corrected gain G2(n) is thus a gain determined by taking into account both the averaged and smoothed noise period gain G1(n) and the degree of original sound addition.

To cover the period immediately after the device starts up, an initial value is preferably set appropriately for N_{t-1}(n) in equation (1) (this initial N_{t-1}(n) can of course also be used when calculating the noise spectrum estimate N_t(n) in the voice frame processing described later). The same applies to G_{t-1} in equation (4) and OG_{t-1} in equation (7).

[II] In the voice frame processing, on the other hand, the steps are basically almost the same as in the noise frame processing described above. That is, as in the noise frame processing, the noise spectrum estimate N_t(n) and the noise suppression gain G(n) based on it are obtained (see steps S301 and S202 in FIG. 2), and the corrected gain G2(n) is obtained on the basis of the original sound addition ratio OG_t (steps S303 and S304 in FIG. 2).
However, the voice frame processing has the following differences from the noise frame processing, along with some points to note.

(i) The noise spectrum estimate N_t(n) is obtained not by the case A expression of equation (1) but by the case B expression (step S301 in FIG. 2). Because this expression is N_t(n) = N_{t-1}(n), the voice frame processing can be said to maintain the status quo. More specifically, if the frame before the voice frame in question was a noise frame, the noise spectrum estimate N_{t-1}(n) calculated in that noise frame is used unchanged in the processing of the voice frame. If the frame before the voice frame was itself a voice frame and the frame before that was a noise frame, the noise spectrum estimate N_{t-2}(n) calculated in that noise frame is used unchanged, and so on.
In short, a voice frame uses the noise spectrum estimate N_{t-p}(n) calculated in the most recent noise frame (where p is the number of frames counted from the frame immediately before the voice frame back to that most recent noise frame, both ends included).

(ii) The same holds for the calculation of the original sound addition ratio OG_t using equation (7). Because the case B expression of equation (7) is OG_t = OG_{t-1}, the voice frame processing again maintains the status quo (see step S303 in FIG. 2).
Put in the same terms as above, a voice frame uses the original sound addition ratio OG_{t-p}(n) calculated in the most recent noise frame (where p is the number of frames counted from the frame immediately before the voice frame back to that most recent noise frame, both ends included).

(iii) The calculation of the noise suppression gain G(n) itself is performed in the same way using equation (2), regardless of whether the frame is a voice frame or a noise frame. The boxes for [noise frame processing] and [voice frame processing] are drawn joined together at step S202 in FIG. 2 to symbolize this (although the value of N_t(n) in equation (2) naturally differs between the two kinds of frame, according to case A or case B of equation (1)).

(iv) Speech-frame processing does not perform the processing of equations (3) and (4), that is, the averaging and smoothing of the noise suppression gain G(n) (see step S203 in FIG. 2 and the area to its right in the figure). As a result, no valid noise-period gain G_t exists, so the processing of equation (5), that is, the calculation of the original-sound addition rate og, is not performed either (see step S204 in FIG. 2 and the area to its right in the figure).

(v) The corrected gain G2(n) that is finally calculated is obtained not from the expression labeled case A in equation (8) but from the expression labeled case B (step S304 in FIG. 2). The difference is that noise-frame processing uses the noise-period gain G1(n), which has undergone averaging and smoothing, whereas speech-frame processing uses the noise suppression gain G(n) obtained from equation (2) as is.
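The hold behavior in items (i) and (ii) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `track_noise_estimate`, the smoothing constant, the initial value, and the exponential-smoothing form used for noise frames (a stand-in for case A of equation (1), which is not reproduced here) are all assumptions. Only the speech-frame branch, N_t(n) = N_{t-1}(n), is taken directly from the text.

```python
def track_noise_estimate(frames, is_speech, smoothing=0.9):
    """Return the per-frame noise spectrum estimates N_t(n).

    frames    : list of amplitude spectra Y(n), one list of floats per frame
    is_speech : list of bools, True where the frame was classified as speech
    smoothing : hypothetical smoothing constant for the noise-frame update
    """
    estimate = list(frames[0])  # assumed initial value: the first frame itself
    history = []
    for spectrum, speech in zip(frames, is_speech):
        if not speech:
            # Noise frame (assumed stand-in for case A): smooth toward Y(n).
            estimate = [smoothing * n + (1.0 - smoothing) * y
                        for n, y in zip(estimate, spectrum)]
        # Speech frame (case B): N_t(n) = N_{t-1}(n), so nothing changes.
        history.append(list(estimate))
    return history


frames = [[1.0, 2.0], [1.2, 1.8], [9.0, 7.0], [8.0, 6.0], [1.1, 2.1]]
flags = [False, False, True, True, False]
estimates = track_noise_estimate(frames, flags)
# Both speech frames reuse the estimate from the most recent noise frame.
print(estimates[2] == estimates[1], estimates[3] == estimates[1])
```

The same hold-on-speech pattern would apply to the original-sound addition ratio OG_t of item (ii), with the noise-frame update replaced by whatever case A of equation (7) prescribes.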

After the processing of [I] and [II] above, the corrected gain G2(n) is obtained in either case. Multiplying the original amplitude spectrum Y(n) by this corrected gain G2(n) gives the noise-suppressed amplitude spectrum S(n) (step S104 in FIG. 2).
For simplicity, FIG. 3(E) shows the result of simply multiplying the amplitude spectrum Y(n) of FIG. 3(A) by the averaged noise suppression gain (that is, g) of FIG. 3(C). As described above, the first embodiment additionally adjusts the gain to account for the degree of original-sound addition (see equation (8), in particular the role of OG_t(n)). Even so, FIG. 3(E) captures the essence of the processing for the case where original-sound addition is left out of consideration: if OG_t(n) = 0 in equation (8), the corrected gain G2(n) is simply equal to G1(n) or G(n).

The noise suppression device 1 configured and operating as described above provides the following effects.
First, the noise suppression device 1 of the first embodiment suppresses the noise contained in the input signal very suitably. In the first embodiment, "suitably" covers in particular the points set out below.

(1) First, the first embodiment very effectively prevents so-called musical noise. Musical noise here means noise that arises after the noise spectrum estimate is subtracted from the amplitude spectrum of the input signal.
For example, a noise suppression gain based on the noise spectrum estimate can be obtained simply as (Y(n) - N(n)) / Y(n), which appears in equation (2). If this is applied as is by the multiplier 11 shown in FIG. 1, the noise-suppressed amplitude spectrum is S(n) = {(Y(n) - N(n)) / Y(n)} · Y(n) = Y(n) - N(n). In other words, the noise-suppressed amplitude spectrum S(n) is obtained by simply subtracting the noise spectrum estimate from the amplitude spectrum of the input signal.
However, the noise spectrum estimate is only an estimate and does not necessarily reflect the actual noise spectrum. In some frequency bands noise may therefore remain after the subtraction, and in others over-subtraction may occur (in which case the result is set to 0, since a negative amplitude spectrum is not meaningful). FIG. 4 illustrates this conceptually: in FIG. 4(C), the solid lines show residual noise (see reference sign KN) and the broken lines show over-subtraction (see reference sign HS). (FIGS. 4(A) and 4(B) are identical to FIGS. 3(A) and 3(B). The portion indicated by reference sign HSt in FIG. 4(C) illustrates a case where Y(n) - N(n) = 0 happens to hold.)
When such an amplitude spectrum S(n) is inverse-Fourier-transformed into the time domain, the signal resembles a sum of sine waves at multiple random frequencies, and when reproduced it is heard as a very harsh sound. This is musical noise.
Thus musical noise arises mainly because the actual noise spectrum, which strictly speaking is unknowable, does not match the noise spectrum estimate.
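The mechanism described above can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function name `spectral_subtraction` and the sample values are invented for illustration, and only the rule S(n) = Y(n) - N(n) with negative results set to 0 comes from the text.

```python
def spectral_subtraction(spectrum, noise_estimate):
    """Compute S(n) = Y(n) - N(n), clamping negative results to 0."""
    return [max(y - n, 0.0) for y, n in zip(spectrum, noise_estimate)]


# A noise-only frame whose actual spectrum fluctuates around the estimate.
noisy_frame = [1.5, 0.5, 1.0, 2.0, 0.75, 1.25]
noise_estimate = [1.0] * len(noisy_frame)

residual = spectral_subtraction(noisy_frame, noise_estimate)
print(residual)  # [0.5, 0.0, 0.0, 1.0, 0.0, 0.25]
```

Bands 0, 3, and 5 keep residual noise (the KN case), bands 1 and 4 are over-subtracted and clamped to 0 (the HS case), and band 2 is the case where Y(n) - N(n) = 0 happens to hold (the HSt case). The isolated surviving peaks are what become the random sine-like components after the inverse transform.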

In the first embodiment, such musical noise is suppressed very effectively. This is because, in noise-frame processing, the averaged and smoothed noise-period gain G_t is used to obtain the corrected gain G2(n), which is then applied to the amplitude spectrum Y(n) (see FIG. 3(E)). Noise suppression is thus performed while the frequency structure of the original amplitude spectrum is preserved, so musical noise is very unlikely to arise.

(1-i) The averaging (equation (3)) and the smoothing (equation (4)) performed to obtain the noise-period gain G_t each have their own significance. As FIG. 3 makes clear, the former mainly serves to suppress musical noise; the latter mainly serves to maintain the continuity of the noise suppression processing over time. With the latter, the noise-period gain G_t(n) does not change abruptly as time passes, so if, for example, the signal contained in the noise frame is reproduced, the listener perceives nothing unnatural. (The other smoothing operations performed in the first embodiment, namely case A of equation (1) and case A of equation (7), have essentially the same significance.)
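The following sketch shows why a single averaged gain preserves the shape of the spectrum. The per-band gain is a simplified stand-in for equation (2), and the first-order recursive smoothing is an assumed form of equation (4), since neither equation is reproduced in this passage; the function names and the constant `smoothing` are hypothetical.

```python
def per_band_gain(spectrum, noise_estimate):
    """Simplified stand-in for equation (2): max(Y(n) - N(n), 0) / Y(n)."""
    return [max(y - n, 0.0) / y if y > 0.0 else 0.0
            for y, n in zip(spectrum, noise_estimate)]


def noise_period_gain(gains, previous_gain, smoothing=0.8):
    """Average G(n) over frequency, then smooth the average over time."""
    g = sum(gains) / len(gains)  # averaging over the frequency bands n
    # Assumed smoothing form: blend with the previous frame's gain.
    return smoothing * previous_gain + (1.0 - smoothing) * g


spectrum = [1.5, 0.5, 1.0, 2.0, 0.75, 1.25]
noise_estimate = [1.0] * len(spectrum)

gains = per_band_gain(spectrum, noise_estimate)
g_t = noise_period_gain(gains, previous_gain=0.2)
suppressed = [g_t * y for y in spectrum]
print(g_t, suppressed)
```

Because every band is scaled by the same factor, no band is zeroed and no isolated peak is left behind; the ratios between bands in `suppressed` are the same as in `spectrum`.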

(2) Second, whereas (1) above concerns the prevention of musical noise in noise-frame processing, the first embodiment also better prevents musical noise in speech-frame processing. This is because, as described above, speech-frame processing obtains the corrected gain G2(n) by using the noise suppression gain G(n) (see equation (2)) essentially as is, without averaging or smoothing (case B of equation (8), or item [II](v) above).

(3) Third, the first embodiment keeps the noise suppression processing consistent at the transition from a noise frame to a speech frame. This is because, as described above, speech-frame processing uses N_{t-p}(n), calculated in the most recent noise frame, as the noise spectrum estimate N_t(n) (see item [II](i) above).
To summarize (2) and (3): in the first embodiment, effective noise suppression is performed in speech frames while the noise suppression performed in noise-frame processing (and in particular its effect) is respected, so that the flow between the two kinds of frame is more natural. Consequently, if the noise suppression device 1 of the first embodiment is connected to some sound reproduction means, the listener perceives nothing unnatural, such as a change in the apparent loudness of the noise, at the transition from a noise frame to a speech frame.

To suppress musical noise in speech frames, one could also replace S(n) = Y(n) - N(n) above with S(n) = Y(n) - αN(n) and increase the value of α (> 0). This, however, has the drawback that the sound quality is very likely to deteriorate severely. Making α small, on the other hand, leaves the musical noise insufficiently suppressed.
Musical noise could also be suppressed by adding a constant value (a noise floor) to the portions where the noise-suppressed amplitude spectrum has become 0, shown by the broken lines in FIG. 4(C) (that is, the portions indicated by reference signs HS and HSt). The idea is to raise portions HS and HSt so as to mask the residual portions KN, or at least make them less conspicuous. (When this technique is combined with the α technique, α can be set smaller, which also helps prevent deterioration of sound quality.)
However, adding such a noise floor amounts to increasing the absolute amount of noise. This is a problem with respect to the original aim of noise suppression, and, depending on how the noise floor level is set, the noise suppression effect is likely to become very inadequate.
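The two conventional techniques discussed above can be sketched together. This is an illustrative sketch only: the function name, the parameter names `alpha` and `floor`, and the sample values are assumptions, and the code follows the textual description (subtract αN(n), then give a constant value to the bands that became 0) rather than any formula from the cited literature.

```python
def oversubtract_with_floor(spectrum, noise_estimate, alpha=1.0, floor=0.0):
    """Compute S(n) = Y(n) - alpha * N(n); bands that would be <= 0 get `floor`."""
    result = []
    for y, n in zip(spectrum, noise_estimate):
        s = y - alpha * n
        result.append(s if s > 0.0 else floor)
    return result


spectrum = [1.5, 0.5, 1.0, 2.0, 0.75, 1.25]
noise_estimate = [1.0] * len(spectrum)

# Large alpha: the residual peaks vanish, but so would any weak speech.
print(oversubtract_with_floor(spectrum, noise_estimate, alpha=2.0))
# Noise floor: zeroed bands are raised so the residual peaks stand out less.
print(oversubtract_with_floor(spectrum, noise_estimate, alpha=1.0, floor=0.25))
```

With `alpha=2.0` every band in this example is driven to 0, which shows how aggressive subtraction also removes signal content. With `floor=0.25` the zeroed bands are raised to 0.25, which masks the residual peaks at the cost of putting noise back.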

From this viewpoint too, the noise suppression device 1 of the first embodiment is clearly superior. The first embodiment does not mechanically increase the amount subtracted, as the use of α does, so there is almost no risk of degrading the sound quality; nor does it simply add a noise floor, so noise suppression that has already been achieved is not sacrificed. And, as already described, musical noise is nevertheless effectively suppressed.

(4) In the noise suppression device 1 of the first embodiment, original-sound addition is performed as described with reference to equations (5) to (7) and steps S205 and S303 in FIG. 2, so the noise suppression effect is achieved still more effectively. Original-sound addition can be expected to give an effect similar to the noise-floor addition described above, namely masking of the residual portions KN in FIG. 4(C), so suppression of musical noise and prevention of sound-quality deterioration become more effective. (The noise floor, however, is strictly "constant", which is the decisive difference from using the "original sound".)
In the above, to bring out the effects of the noise suppression device 1 of the first embodiment more clearly, some of the explanation contrasts them with the technique using α or the technique using a noise floor. The present invention is not, however, intended to exclude techniques that suppress musical noise using α or a noise floor. These techniques can be combined with the present invention and its various aspects, and such a combination makes it possible to enjoy the merits of those techniques while making the effects of the present invention and its aspects more pronounced.

Moreover, the first embodiment does not merely perform original-sound addition; it is characterized by the following points.
(4-i) First, the original-sound addition ratio (that is, OG_t) is determined from the original-sound addition rate og, which in turn is determined by the relative magnitudes of the noise-period gain G_t and the target noise suppression gain tg. Specifically, as already described, the original-sound addition processing treats the targeted degree of noise suppression (that is, tg) as one of the dominant factors, and the original-sound addition rate og is determined in relation to it. The processing based on the noise-period gain G_t and the original-sound addition processing are therefore used in a balanced way, so that the noise suppression effect, the musical-noise suppression effect, and the sound-quality improvement effect are obtained more effectively.

(4-ii) In this original-sound addition processing too, speech-frame processing uses OG_{t-p}, calculated in the most recent noise frame, as the original-sound addition ratio OG_t (see item [II](ii) above). This is the same in essence as the idea described above, in which the preceding noise spectrum estimate N_{t-1}(n) is used unchanged as the noise spectrum estimate N_t(n) in a speech frame. In other words, the original-sound addition processing also keeps the noise suppression processing consistent at transitions between noise frames and speech frames.

<Second Embodiment>
A second embodiment of the present invention is described below with reference to FIGS. 5 to 7. The second embodiment differs from the first embodiment in the speech detection processing; in all other respects it is identical to the first embodiment unless otherwise noted. The description below therefore focuses on the differences, and other points are simplified or omitted. Reference signs in the drawings are likewise reused, except where they relate to the differences.

As shown in FIG. 5, the noise suppression device 1' of the second embodiment has a configuration in which a speech detection unit 801 is connected downstream of the noise suppression gain calculation unit 30. The speech detection unit 801 uses the noise suppression gain G(n) calculated by equation (2) to detect whether a speech signal is present in the input signal, that is, to distinguish speech frames from noise frames.

In the second embodiment, the following technique is used to detect whether a speech signal is present.
First, Var, given by equation (9), is calculated from the noise suppression gain G(n) obtained by equation (2).

Here g is the g given by equation (3) used in the first embodiment; in short, it is the average of G(n) over the frequency bands n. (In the second embodiment, g is calculated not only by the noise-period gain calculation unit 40 but also by the speech detection unit 801. The result calculated by either one may, of course, be shared between the two.)
As is clear from its form, Var in equation (9) is the variance of G(n).

Next, it is determined whether Var exceeds a predetermined value. The significance of this determination is as follows.
In general, the noise suppression gain G(n) calculated by equation (2) looks very different depending on whether or not a speech signal is present. FIGS. 6 and 7 show an example: FIG. 6 is an example of G(n) calculated when a speech signal is present, and FIG. 7 an example when it is not. Comparing the two figures, it is easy to see that the variance of G(n) differs greatly between the two cases. That is, if the variance of G(n) for a frame is reasonably large, the frame can be judged with considerable confidence to contain a speech signal; otherwise it can be judged not to.
This is the significance of the magnitude test on Var. To restate: given a predetermined value VB, if Var > VB the frame contains a speech signal and is classified as a "speech frame"; if Var ≤ VB the frame contains no speech signal and is classified as a "noise frame".
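The variance test can be sketched as follows. Equation (9) is not reproduced in this text, so the population-variance form Var = (1/N) Σ (G(n) - g)² used here is an assumption based on the description of Var as the variance of G(n) about its average g. The function names, the sample gains, and the threshold value are likewise hypothetical.

```python
def gain_variance(gains):
    """Variance of G(n) about its mean g over the frequency bands (1/N form)."""
    g = sum(gains) / len(gains)
    return sum((x - g) ** 2 for x in gains) / len(gains)


def is_speech_frame(gains, threshold):
    """Classify as speech when Var > VB, and as noise when Var <= VB."""
    return gain_variance(gains) > threshold


speech_like = [0.9, 0.1, 0.8, 0.05, 0.95, 0.2]   # widely scattered gains
noise_like = [0.2, 0.25, 0.15, 0.2, 0.22, 0.18]  # gains clustered together
vb = 0.05  # hypothetical threshold VB

print(is_speech_frame(speech_like, vb), is_speech_frame(noise_like, vb))
```

In practice the threshold VB would have to be tuned to the gain formula and the signal conditions; the value above only separates these two made-up examples.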

In the configuration of FIG. 5, unlike that of FIG. 1, the noise spectrum estimation unit 20 cannot use the speech detection result. That is, the noise spectrum estimation unit 20 calculates the noise spectrum estimate N_t(n) without relying on any distinction between speech frames and noise frames.
In this case the noise spectrum estimate N_t(n) may be obtained, for example, by equations (10) and (11).

Here PA_t(n) is the smoothed amplitude spectrum of the input signal in the frame currently being processed, PA_{t-1}(n) is the smoothed amplitude spectrum in the immediately preceding frame, α is a smoothing coefficient, and γ and β are control parameters. In equation (11), case C denotes the case where PA_t(n) > N_{t-1}(n) holds, and case D denotes all other cases.

In this case, the combination of equation (10) and the expression labeled case D in equation (11) is substantially equivalent to the expression labeled case A in equation (1).
The expression labeled case C in equation (11), on the other hand, has no counterpart in equation (1). However, as noted above, it applies when PA_t(n) > N_{t-1}(n) holds, that is, when the amplitude spectrum in the frame currently being processed exceeds the noise spectrum estimate of the immediately preceding frame. Case C can therefore be taken to suggest that the frame currently being processed may be a speech frame. (If the condition is satisfied for many values of n (= 0, 1, 2, 3, ...), that possibility is higher. It remains, however, no more than a suggestion.)
In this sense, equations (10) and (11) have something in common with equation (1).
In either case, the noise spectrum estimate is still calculated suitably.

The second embodiment provides the following effects.
First, the second embodiment clearly provides operation and effects that do not differ in essence from those of the first embodiment. That is, effects (1) to (4) described for the first embodiment are obtained in substantially the same way.

In addition, as a comparison of FIG. 1 and FIG. 5 makes clear, the second embodiment offers benefits such as improved processing efficiency and a simpler circuit configuration. This is because, whereas speech detection in the first embodiment is performed independently, speech detection in the second embodiment is made dependent on, and performed using, the noise suppression gain G(n).
In the present invention the noise suppression gain G(n) must be calculated in any case, so using that result for speech detection as well plainly makes the processing more efficient and streamlined. Moreover, the detection performance is considerably high (compare FIGS. 6 and 7).

Embodiments of the present invention have been described above, but the noise suppression device according to the present invention is not limited to those embodiments, and various modifications are possible.
(1) In the first and second embodiments, the noise-period gain G_t is averaged along the frequency axis and smoothed along the time axis, but the present invention is not limited to this form. As already noted, averaging and smoothing have different main purposes, so the smoothing in particular may be omitted in some cases. As FIG. 3(E) shows, averaging alone still provides a certain degree of musical-noise suppression.

(2) In the first and second embodiments, the noise-period gain G_t is obtained through the averaging of equation (3) and the smoothing of equation (4), but the present invention is not bound to the forms of equations (3) and (4).
First, the noise suppression gain average g is not limited to the form given by equation (3).
In equation (3), g is calculated using all N frequency bands (the 0th, 1st, 2nd, ..., (N-1)th bands, N in total), but g may instead be calculated using only some of them. For example, the bands used may exclude the very low range (bands near the DC component), the very high range (bands near the Nyquist frequency), or both.
Different weights may also be applied to individual frequency bands when calculating g. For example, a particular weighting coefficient may be applied only to particular frequency bands, or weighting coefficients that increase or decrease continuously or stepwise may be applied across all frequency bands.
Next, the noise-period gain G_t is not limited to the form given by equation (4).
In equation (4), G_t is obtained by smoothing the noise suppression gain average g along the time axis, but G_t may instead be calculated, for example, as the average of g over adjacent frames.
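The variations in modification (2) might look like the following. These helpers are hypothetical illustrations: the names are invented, the band limits are arbitrary, and normalizing the weighted average by the sum of the weights is an assumption that the text does not specify.

```python
def band_limited_mean(gains, first_band, last_band):
    """Average G(n) over bands first_band..last_band inclusive."""
    selected = gains[first_band:last_band + 1]
    return sum(selected) / len(selected)


def weighted_mean(gains, weights):
    """Weighted average of G(n), normalized by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, gains)) / sum(weights)


def adjacent_frame_mean(g_current, g_previous):
    """Alternative to recursive smoothing: average g over two adjacent frames."""
    return 0.5 * (g_current + g_previous)


gains = [0.0, 0.5, 0.25, 0.75, 1.0]
# Skip band 0 (near DC) and band 4 (near the Nyquist frequency).
print(band_limited_mean(gains, 1, 3))
# Emphasize the middle band.
print(weighted_mean(gains, [1, 1, 2, 1, 1]))
print(adjacent_frame_mean(0.5, 0.25))
```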

(3) In addition, in the first and second embodiments the averaged and smoothed noise-period gain G_t, or G1(n), is applied to all frequency bands (see case A of equation (8), or FIG. 3(E)), but the present invention is not limited to this form either.
For example, the noise-period gain G_t, or G1(n), may be applied only to the frequency bands that remain after excluding the very low range, the very high range, or both. In that case, a fixed gain is preferably applied to the excluded frequency bands.
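A band-selective application of the gain, as in modification (3), could be sketched like this. The function name, the band limits, and the fixed gain value are assumptions chosen for illustration.

```python
def apply_band_limited_gain(spectrum, noise_period_gain, first_band, last_band,
                            fixed_gain=1.0):
    """Apply the noise-period gain inside [first_band, last_band] and a
    fixed gain to the excluded bands."""
    return [
        (noise_period_gain if first_band <= n <= last_band else fixed_gain) * y
        for n, y in enumerate(spectrum)
    ]


spectrum = [4.0, 2.0, 2.0, 4.0]
# Bands 1 and 2 get the noise-period gain; bands 0 and 3 get the fixed gain.
print(apply_band_limited_gain(spectrum, 0.5, 1, 2, fixed_gain=0.25))
```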

(4) In the first and second embodiments, the noise suppression gain G(n) is calculated by equation (2), but the present invention is not limited to this form. For example, the Wiener filter method, the MMSE (minimum mean-square error) method, or the like may be used instead (see Non-Patent Documents 3 and 4 above). Alternatively, the SNR (signal-to-noise ratio) may be estimated and the noise suppression gain G(n) obtained from that SNR.
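As one example of an SNR-based gain, the sketch below uses the common Wiener-type rule G = SNR / (1 + SNR). The patent does not specify this formula; it is shown only as a familiar instance of the approach, and the crude power-subtraction SNR estimate, the function names, and the requirement that the noise amplitude be positive are all assumptions.

```python
def wiener_gain(snr):
    """Wiener-type gain G = SNR / (1 + SNR) for a non-negative power SNR."""
    return snr / (1.0 + snr)


def estimate_snr(amplitude, noise_amplitude):
    """Crude per-band power SNR estimate, clamped at 0.

    Assumes noise_amplitude > 0.
    """
    noise_power = noise_amplitude ** 2
    return max(amplitude ** 2 - noise_power, 0.0) / noise_power


# A band whose amplitude is twice the noise estimate.
snr = estimate_snr(2.0, 1.0)
print(snr, wiener_gain(snr))
```

A band at or below the noise estimate gets an SNR of 0 and therefore a gain of 0, while a band far above it gets a gain approaching 1.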

(5) In the second embodiment, speech frames and noise frames are distinguished by taking the variance of the noise suppression gain G(n) along the frequency axis using equation (9), but the present invention is not limited to this form.
For example, the standard deviation may of course be used in place of the variance, and the variance or standard deviation along the time axis may also be used. Speech frames and noise frames may also be distinguished on the basis of, for example, how many of the per-band noise suppression gains G(n) fall within the range bounded by two predetermined reference values. (If that number is relatively large, the gains G(n) can be judged to be concentrated in one region, so their spread is small and the frame is classified as a noise frame.) The determination techniques described above may also be combined where appropriate. In that case, for example, the degree of spread is judged by referring both to the variance and to the number of gains G(n) that fall within the range.
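The counting criterion in modification (5) can be sketched as follows. The function name, the two reference values, and the minimum count are hypothetical; the text states only that a relatively large count indicates a noise frame.

```python
def is_noise_frame_by_count(gains, lower, upper, min_count):
    """Classify as noise when at least min_count gains lie in [lower, upper]."""
    inside = sum(1 for x in gains if lower <= x <= upper)
    return inside >= min_count


speech_like = [0.9, 0.1, 0.8, 0.05, 0.95, 0.2]   # widely scattered gains
noise_like = [0.2, 0.25, 0.15, 0.2, 0.22, 0.18]  # gains clustered together

print(is_noise_frame_by_count(noise_like, 0.1, 0.3, min_count=5))
print(is_noise_frame_by_count(speech_like, 0.1, 0.3, min_count=5))
```

A combined rule could require both a small variance and a large count before a frame is treated as noise.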

A block diagram showing the configuration of the noise suppression device according to the first embodiment of the present invention.
A flowchart showing the flow of the noise suppression processing according to the first embodiment.
An explanatory diagram for describing the noise suppression processing according to the first embodiment.
An explanatory diagram for describing conventional noise suppression processing.
A block diagram showing the configuration of the noise suppression device according to the second embodiment of the present invention.
A graph showing a calculation example of the noise suppression gain G(n) when a voice signal is included.
A graph showing a calculation example of the noise suppression gain G(n) when no voice signal is included.

Explanation of symbols

1, 1' ... noise suppression device; 10 ... time/frequency conversion unit; 20 ... noise spectrum estimation unit; 30 ... noise suppression gain calculation unit; 40 ... noise-period noise suppression gain calculation unit (noise-period gain calculation unit); 50 ... original sound addition rate calculation unit; 60 ... original sound addition gain calculation unit; 70 ... frequency/time conversion unit; 11 ... multiplier
Y(n) ... amplitude spectrum of the input signal; N(n) ... noise spectrum estimate; G(n) ... noise suppression gain; g ... average noise suppression gain; G_t, G1(n) ... noise suppression gain to be applied in noise periods (noise-period gain); og ... original sound addition rate; tg ... target noise suppression gain; TG ... target noise suppression amount; OG ... original sound addition ratio; G2(n) ... noise suppression gain after original sound addition (corrected gain)

Claims

A noise suppression device comprising:
noise spectrum estimation means for estimating, based on an input signal, a noise spectrum included in the input signal for each of K frequency bands (where K is a natural number of 2 or more);
first gain calculation means for calculating a noise suppression gain for each of the K frequency bands based on an estimation result by the noise spectrum estimation means; and
original sound adding means for adding the input signal, at a first ratio determined based on a difference between the noise suppression gain and a target noise suppression gain indicating a target level of noise suppression, to a first noise-suppressed signal obtained by applying the noise suppression gain to the input signal.

The noise suppression device according to claim 1, further comprising second gain calculation means for calculating an average gain, which is the average of the noise suppression gain over the K frequency bands,
wherein the original sound adding means adds the input signal, at a second ratio determined based on a difference between the average gain and the target noise suppression gain, to a second noise-suppressed signal obtained by applying the average gain to the input signal.

The noise suppression device according to claim 1 or 2, wherein the original sound adding means
calculates a smoothed ratio by smoothing the first or second ratio on the time axis, and
adds the input signal at the smoothed ratio to the first or second noise-suppressed signal.

The noise suppression device according to claim 2, further comprising voice detection means for detecting the presence or absence of voice in the input signal and thereby classifying the input signal, along the time axis, into voice frames that include the voice and noise frames that do not include the voice,
wherein the second noise-suppressed signal is obtained by applying the average gain to the portion of the input signal corresponding to the noise frames.

The noise suppression device according to claim 4, wherein the first noise-suppressed signal is obtained by applying the noise suppression gain to the portion of the input signal corresponding to the voice frames.

The noise suppression device according to claim 5, wherein the original sound adding means adds the input signal to the first noise-suppressed signal at the second ratio instead of the first ratio.

The noise suppression device according to claim 5, wherein, when adding the input signal to the first noise-suppressed signal for a voice frame, the original sound adding means adds the input signal to the first noise-suppressed signal while treating the second ratio already calculated for the noise frame nearest to that voice frame as the first ratio for that voice frame.

The noise suppression device according to claim 5 or 7, wherein the original sound adding means,
when adding the input signal to the second noise-suppressed signal for a noise frame, calculates the second ratio for that noise frame as a provisional second ratio, calculates a smoothed ratio by smoothing the provisional second ratio on the time axis using the first or second ratio of the frame immediately preceding that noise frame, and adds the input signal to the second noise-suppressed signal while treating the smoothed ratio as the second ratio, and,
when adding the input signal to the first noise-suppressed signal for a voice frame, adds the input signal to the first noise-suppressed signal while treating the first or second ratio of the frame immediately preceding that voice frame as the first ratio for that voice frame.

The noise suppression device according to claim 1, wherein the first or second ratio is obtained by the following expression (A):
og = max(0, tg − G) (A)
where og is the first or second ratio to be obtained,
tg is the target noise suppression gain,
G is the noise suppression gain or the average gain, and
max(a, b) is a function that returns the larger of a and b.
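
For illustration only, expression (A) and the subsequent addition of the input signal can be sketched in Python as follows. The function names are hypothetical, and two points are assumptions of this sketch rather than statements of the claims: that the gains are expressed on a linear scale, and that adding the input signal at the ratio og to the noise-suppressed signal G·Y yields G·Y + og·Y.

```python
def original_sound_ratio(tg, g):
    """Expression (A): og = max(0, tg - G)."""
    return max(0.0, tg - g)


def add_original_sound(y, g, tg):
    """Add the input y, at the ratio og, to the noise-suppressed signal g * y.

    Assumes the addition is g * y + og * y, i.e. an effective gain of g + og.
    """
    og = original_sound_ratio(tg, g)
    return g * y + og * y


if __name__ == "__main__":
    # When G is below the target gain tg, og makes up the difference.
    print(original_sound_ratio(0.3, 0.1))
    # When G already exceeds tg, no original sound is added.
    print(original_sound_ratio(0.3, 0.5))
```

Under these assumptions the effective gain g + og equals max(g, tg), so the suppression never goes deeper than the target level.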

A noise suppression method including:
a noise spectrum estimation step of estimating a noise spectrum included in an input signal for each of K frequency bands (where K is a natural number of 2 or more);
a first gain calculation step of calculating a noise suppression gain for each of the K frequency bands based on an estimation result in the noise spectrum estimation step; and
an original sound adding step of adding the input signal, at a first ratio determined based on a difference between the noise suppression gain and a target noise suppression gain indicating a target level of noise suppression, to a first noise-suppressed signal obtained by applying the noise suppression gain to the input signal.

The noise suppression method according to claim 10, further including a second gain calculation step of calculating an average gain, which is the average of the noise suppression gain over the K frequency bands,
wherein, in the original sound adding step, the input signal is added, at a second ratio determined based on a difference between the average gain and the target noise suppression gain, to a second noise-suppressed signal obtained by applying the average gain to the input signal.

The noise suppression method according to claim 11, further including a voice detection step of detecting the presence or absence of voice in the input signal and thereby classifying the input signal, along the time axis, into voice frames that include the voice and noise frames that do not include the voice,
wherein the second noise-suppressed signal is obtained by applying the average gain to the portion of the input signal corresponding to the noise frames.