JP2014021438A

JP2014021438A - Noise suppression device and program thereof

Info

Publication number: JP2014021438A
Application number: JP2012162697A
Authority: JP
Inventors: Nobumasa Seiyama; 信正清山; Reiko Saito; 礼子齋藤; Atsushi Imai; 篤今井; Tomoyasu Komori; 智康小森
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp; NHK Engineering System Inc
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2012-07-23
Filing date: 2012-07-23
Publication date: 2014-02-03
Anticipated expiration: 2032-07-23
Also published as: JP6027804B2

Abstract

PROBLEM TO BE SOLVED: To provide a noise suppression device and program that obtain good results, free of unnatural noise, from noise suppression processing. SOLUTION: A noise suppression device comprises: a plurality of suppression processing units, each of which outputs noise-suppressed speech data by processing input speech data with a different noise suppression method; a weight calculation unit that calculates a weighting coefficient for each noise suppression method on the basis of the noise-suppressed speech data output by the suppression processing units; and a speech integration unit that mixes the noise-suppressed speech data by multiplying each unit's output by the weighting coefficient calculated for its noise suppression method.

Description

The present invention relates to speech processing. In particular, it relates to a noise suppression device and program capable of suppressing noise mixed into speech.

Audio for television and radio broadcasts, including live broadcasts, is not always recorded in an environment suited to recording. In particular, when relaying from the scene of emergency news coverage, power may have to be supplied by an on-site generator, and various kinds of noise inevitably get mixed into the recording. To obtain clear, broadcast-quality speech even in such situations, a technique for suppressing the mixed-in noise with high quality is required.

Among conventional methods for suppressing a noise component added to speech, one of the best known is spectral subtraction, described in Non-Patent Document 1. This method estimates the average noise spectrum and subtracts it from the spectrum of the noisy input signal in order to reduce the noise.

Another technique is the Wiener filter method, which constructs a linear filter that minimizes the mean square error between the original speech signal and the estimated speech signal, and uses it to recover the original speech signal from the noisy input signal. Non-Patent Document 2 describes the Wiener filter method.

Another technique is the MMSE-STSA method, which estimates the signal-to-noise ratio at each frequency from the amplitude spectrum of the noisy input signal and the average of the estimated noise spectrum, and restores the short-time amplitude spectrum so as to minimize the mean square error between the short-time amplitude spectra of the original and estimated speech signals. Non-Patent Document 3 describes the MMSE-STSA method.

Yet another method is the signal subspace method, which estimates the original speech component by separating the noisy input signal into a signal subspace consisting of speech and noise and a noise subspace consisting of noise only. Non-Patent Document 4 describes the signal subspace method.

Non-Patent Document 1: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, 1979.
Non-Patent Document 2: J. S. Lim and A. V. Oppenheim, "All-pole modeling of degraded speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 3, pp. 197-210, 1978.
Non-Patent Document 3: Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
Non-Patent Document 4: Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, 1995.

The spectral subtraction method described in Non-Patent Document 1 can produce an unnatural noise component (musical noise) in the processed speech. This is because the amplitude spectrum of the noise-suppressed speech is obtained by half-wave rectification: whenever the difference between the spectrum of the noisy speech and the spectrum of the noise estimate is negative, the value is replaced with zero. As a result, small isolated peaks appear at random frequency positions in each processing frame; these are perceived as musical noise and degrade the quality of the noise-suppressed speech. In addition, the noise estimate itself contains errors, and the resulting residual noise is perceived, which further degrades the sound quality.
Similarly, the methods described in Non-Patent Documents 2 to 4 depend on the accuracy of the noise estimate, so sound quality degradation caused by estimation errors is unavoidable.
The methods described in Non-Patent Documents 3 and 4 suppress the noise portions better than those described in Non-Patent Documents 1 and 2, but degradation of the speech portions is more easily perceived.
Thus none of these methods yields high-quality noise-suppressed speech, and this problem has remained unsolved.
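The half-wave rectification step that gives rise to musical noise can be illustrated with a minimal sketch. The function name `spectral_subtract` and all magnitude values below are hypothetical and chosen only for illustration; a full spectral subtraction implementation would also handle framing, the Fourier transform, and phase.

```python
def spectral_subtract(noisy_mag, noise_mag_mean):
    """Subtract the estimated mean noise magnitude in each frequency bin and
    replace negative differences with zero (half-wave rectification)."""
    return [max(y - d, 0.0) for y, d in zip(noisy_mag, noise_mag_mean)]


# Hypothetical magnitude spectra of two consecutive noise-only frames.
noise_mean = [1.0, 1.0, 1.0, 1.0]
frame_a = [0.8, 1.3, 0.9, 1.0]
frame_b = [1.2, 0.7, 1.0, 1.4]

residual_a = spectral_subtract(frame_a, noise_mean)
residual_b = spectral_subtract(frame_b, noise_mean)

# Bins where a residual peak survives differ from frame to frame.
peaks_a = [k for k, v in enumerate(residual_a) if v > 0]
peaks_b = [k for k, v in enumerate(residual_b) if v > 0]
```

In this toy case the surviving peaks sit in different bins in the two frames, which is the frame-to-frame randomness that is heard as musical noise.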

The present invention has been made in recognition of the above problem, and provides a noise suppression device and program that obtain better results than conventional noise suppression methods.

[1] To solve the above problem, a noise suppression device according to one aspect of the present invention includes: a plurality of suppression processing units, each of which outputs noise-suppressed speech data by processing input speech data with a different noise suppression method; a weight calculation unit that calculates a weighting coefficient for each noise suppression method based on the noise-suppressed speech data output from the plurality of suppression processing units; and a speech integration unit that multiplies each piece of noise-suppressed speech data output from the suppression processing units by the weighting coefficient calculated for its noise suppression method, and mixes the results.
The speech input to the noise suppression device contains noise. With the above configuration, each suppression processing unit outputs noise-suppressed speech data produced by a different noise suppression method. Each piece of noise-suppressed speech data contains a speech component (clean speech) and a noise component, the latter suppressed by the respective method. Because different methods are used, the noise component is suppressed differently in each output. The speech integration unit mixes these outputs, and this mixing weakens the energy of the noise component. The data mixed by the speech integration unit may be time-domain signal data or frequency-domain signal data.
Furthermore, with the above configuration, the speech integration unit applies a weight that depends on the noise suppression method when it mixes the noise-suppressed speech data. The data can therefore be mixed in a more favorable ratio for each method, which improves the noise suppression effect.

[2] In one aspect of the present invention, in the above noise suppression device, the weight calculation unit calculates correlation coefficients between the pieces of noise-suppressed speech data output from the plurality of suppression processing units, and calculates the weighting coefficients so that a noise suppression method that correlates more highly with the other noise suppression methods receives a larger weighting coefficient.
The larger the correlation coefficients between a piece of noise-suppressed speech data and the others, the stronger its speech (clean speech) component can be said to be relative to its noise component. Noise-suppressed speech data from a method whose speech component dominates its noise component can therefore be mixed in more strongly (with a larger weight), which further improves the noise suppression effect. As one example, the weighting coefficients are calculated so that the weighting coefficient for a given noise suppression method is proportional to the sum, over the other noise suppression methods, of its cross-correlation coefficients with them.
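The example in the preceding paragraph (weights proportional to the sum of cross-correlation coefficients with the other methods) might be sketched as follows. This is only an illustration under stated assumptions: the Pearson correlation coefficient is used as the correlation measure, negative sums are clipped to zero, and equal weights are used as a fallback, none of which is specified in the text above. The names `correlation` and `correlation_weights` are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0


def correlation_weights(outputs):
    """Weight each method by the sum of its correlations with all other
    methods, then normalize so that the weights sum to 1."""
    count = len(outputs)
    raw = []
    for i in range(count):
        total = sum(correlation(outputs[i], outputs[j])
                    for j in range(count) if j != i)
        raw.append(max(total, 0.0))  # assumption: clip negative sums to zero
    norm = sum(raw)
    if norm == 0:
        return [1.0 / count] * count  # assumption: fall back to equal weights
    return [r / norm for r in raw]


# Hypothetical outputs of three methods for one frame: the first two agree
# closely, the third is dominated by something the others do not share.
outputs = [
    [0.0, 1.0, 0.0, -1.0],
    [0.1, 0.9, -0.1, -1.0],
    [1.0, -1.0, 1.0, -1.0],
]
weights = correlation_weights(outputs)
```

Here the third output correlates weakly with the other two and therefore receives the smallest weight.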

[3] In one aspect of the present invention, in the above noise suppression device, the weight calculation unit calculates, based on the noise-suppressed speech data output from the plurality of suppression processing units, adaptive filter coefficients for the noise-suppressed speech data of each noise suppression method, taking the noise-suppressed speech data of another noise suppression method as the desired signal, and calculates the weighting coefficients so that a larger adaptive filter coefficient yields a larger weighting coefficient.
The larger the calculated adaptive filter coefficient, the stronger the speech (clean speech) component of that noise-suppressed speech data can be said to be relative to its noise component. Noise-suppressed speech data from a method whose speech component dominates its noise component can therefore be mixed in more strongly (with a larger weight), which further improves the noise suppression effect. As one example, the weighting coefficients are calculated so that the weighting coefficient for a given noise suppression method is proportional to the sum, over the other noise suppression methods, of the adaptive filter coefficients obtained when each of those methods is taken as the desired signal.

[4] In one aspect of the present invention, the above noise suppression device further includes a frequency characteristic calculation unit that calculates frequency characteristic data based on the noise-suppressed speech data output from each of the plurality of suppression processing units, and an amplitude characteristic calculation unit that calculates amplitude characteristic data based on the frequency characteristic data; the weight calculation unit calculates the weighting coefficient for each noise suppression method based on the amplitude characteristic data; and the speech integration unit mixes the noise-suppressed speech data by multiplying the amplitude characteristic data by the weighting coefficients and mixing the results.
With this configuration, the time-domain signal data output by the suppression processing units is converted into frequency-domain signal data, and the speech integration unit mixes the speech signals in the frequency domain. The mixed frequency-domain speech signal may be converted back into a time-domain speech signal as appropriate. The frequency characteristic calculation unit calculates the frequency characteristic data from the noise-suppressed speech data by performing a Fourier transform. Note that both the frequency characteristic data and the amplitude characteristic data are themselves noise-suppressed speech data, processed by the suppression processing units with their respective noise suppression methods.

[5] One aspect of the present invention is a program for causing a computer to execute: a plurality of suppression processing steps, each of which outputs noise-suppressed speech data by processing input speech waveform data with a different noise suppression method; a weight calculation step of calculating a weighting coefficient for each noise suppression method based on the noise-suppressed speech data output in the plurality of suppression processing steps; and a speech integration step of multiplying each piece of noise-suppressed speech data output from the suppression processing steps by the weighting coefficient calculated for its noise suppression method in the weight calculation step, and mixing the results.

According to the present invention, a better noise suppression result can be obtained than by using any one of the conventional noise suppression methods alone.

A block diagram showing the functional configuration of the noise suppression device according to the first embodiment of the present invention.
A block diagram showing the detailed functional configuration of the weight calculation unit according to the same embodiment.
A graph (speech waveform) showing an example of the noise suppression result obtained by the noise suppression device of the same embodiment.
A graph (speech spectrum) showing an example of the noise suppression result obtained by the noise suppression device of the same embodiment.
A graph showing the change over time of the weighting coefficient values calculated by the weight calculation unit during noise suppression processing by the noise suppression device of the same embodiment.
A graph showing the change over time of the cross-correlation coefficients calculated by the cross-correlation coefficient calculation unit during noise suppression processing by the noise suppression device of the same embodiment.
A block diagram showing the functional configuration of the noise suppression device according to the second embodiment.
A graph (speech waveform) showing an example of the noise suppression result obtained by the noise suppression device of the same embodiment.
A graph (speech spectrum) showing an example of the noise suppression result obtained by the noise suppression device of the same embodiment.
A block diagram showing the functional configuration of the weight calculation unit according to the third embodiment.

Next, embodiments of the present invention will be described with reference to the drawings.
In the following description, when a formula or an expression within a formula is referred to, an expression (a character such as a variable) that carries a hat "^" above it is written together with "^" inside square brackets. For example, [x^] denotes the character x with a hat above it. Likewise, an expression that carries a tilde above it is written together with "˜" inside square brackets; for example, [x˜] denotes the character x with a tilde above it. An expression that carries absolute value signs is written between vertical bars "|"; for example, |x| denotes the absolute value of x.

[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of the noise suppression device according to the first embodiment. As illustrated, the noise suppression device 1 includes a speech input unit 11, a waveform cutout unit 12, I suppression processing units 14-1 to 14-I (where I is an integer of 2 or more), a noise-suppressed speech matrix storage unit 15, a weight calculation unit 16, a speech integration unit 17, a waveform superposition unit 18, and a speech output unit 19.

The speech input unit 11 acquires speech signal data from outside.
The waveform cutout unit 12 cuts the speech acquired by the speech input unit 11 into appropriate analysis frames.
Each of the suppression processing units 14-1 to 14-I applies a noise suppression method to the input speech and outputs noise-suppressed speech data. The units apply their methods to each analysis frame that has been cut out. The suppression processing units 14-1 to 14-I use I kinds (I ≥ 2) of noise suppression methods whose properties differ from one another; the effect of the present embodiment is obtained because each unit performs noise suppression with a method of a different nature. An even better effect is obtained when I ≥ 3. Examples of usable methods include the spectral subtraction, Wiener filter, MMSE-STSA, and signal subspace methods mentioned above. Other noise suppression methods may also be used.

The noise-suppressed speech matrix storage unit 15 stores the noise-suppressed speech data produced by the suppression processing units 14-1 to 14-I. Specifically, it stores the data in the form of a noise-suppressed speech matrix, built by arranging side by side the noise-suppressed speech vectors generated by the respective units. Each noise-suppressed speech vector holds the data for one analysis frame.
The weight calculation unit 16 reads the noise-suppressed speech data that was output from the suppression processing units 14-1 to 14-I and temporarily stored in the noise-suppressed speech matrix storage unit 15, and based on this data calculates a weighting coefficient for each noise suppression method. It calculates the weighting coefficients so that the noise suppression result after mixing is optimal, and does so for each analysis frame. The calculation method is described in detail later.

The speech integration unit 17 mixes the noise-suppressed speech data output from the suppression processing units 14-1 to 14-I. When mixing, it weights the mixing ratio of each noise suppression method using the weighting coefficient for that method, as calculated by the weight calculation unit 16. The mixing is performed for each analysis frame.
The waveform superposition unit 18 takes the per-frame speech waveform data mixed by the speech integration unit 17 and generates speech waveform data by superimposing the frames, each offset from the previous one by the frame shift width. The resulting waveform is thus speech that has undergone noise suppression by a plurality of methods and been mixed according to the weighting coefficients.
The speech output unit 19 outputs the speech generated by the waveform superposition unit 18 to the outside.
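The superposition performed by the waveform superposition unit 18 can be sketched as a plain overlap-add. This is a minimal illustration with a hypothetical function name `overlap_add`; it assumes any windowing needed for smooth reconstruction has already been applied to the frames, which the description above does not detail.

```python
def overlap_add(frames, shift):
    """Sum frames into one waveform, offsetting each frame by `shift` samples
    relative to the previous one."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (shift * (len(frames) - 1) + frame_len)
    for m, frame in enumerate(frames):
        start = m * shift
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


# Three toy frames of length 4 with a shift of half the frame length.
frames = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
waveform = overlap_add(frames, shift=2)
```

With a shift of half the frame length, every interior sample is the sum of exactly two overlapping frames.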

The speech input unit 11 acquires a speech signal from outside. When the speech is acquired as an analog signal, the speech input unit 11 performs analog-to-digital (AD) conversion. It then supplies the digitized speech data to the waveform cutout unit 12. As one example, this data has a sampling frequency of 16 kHz and a quantization depth of 16 bits. The speech acquired by the speech input unit 11 contains noise.

The data that the speech input unit 11 supplies to the waveform cutout unit 12 is denoted as the noisy speech y(n), where n is the sample index in the time series and y(n) is the value of that sample. The noisy speech y(n) is modeled as the sum of the noise-free speech x(n) and the noise d(n), following the additive noise model of Equation (1) below.
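Equation (1) itself does not survive in this text. Given the additive noise model and the symbols defined in the preceding paragraph, it presumably reads:

```latex
y(n) = x(n) + d(n) \tag{1}
```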

The waveform cutout unit 12 cuts the noisy speech y(n) acquired from the speech input unit 11 into appropriate analysis frames. For example, the analysis window width N is set to 256 samples (16 milliseconds, = 256 / 16 kHz) and the shift width to half the window width (N/2), that is, 128 samples (8 milliseconds). The analysis window width may be set to a different value as appropriate. The n-th sample in the m-th frame cut out is denoted y(m, n). For simplicity, the vector of noisy speech cut out in the m-th frame is denoted y, as in Equation (2) below.
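Equation (2) is not reproduced in this text. A reconstruction consistent with the surrounding definitions is given below; the sample index range 0 to N-1 is an assumption (the original may count from 1).

```latex
\mathbf{y} = \bigl[\, y(m,0),\; y(m,1),\; \ldots,\; y(m,N-1) \,\bigr]^{T} \in \mathbb{R}^{N} \tag{2}
```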

In Equation (2), the superscript "T" on the right-hand side denotes transposition, and the blackboard-bold "R" denotes the set of real numbers. That is, the vector y is an N-dimensional column vector with real elements. The waveform cutout unit 12 supplies the data of this vector y to each of the suppression processing units 14-1 to 14-I.
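The frame cutting described above (window of 256 samples, shift of 128 samples at 16 kHz) can be sketched as follows. The function name `cut_frames` is hypothetical, no window function is applied, and dropping a trailing partial frame is an assumption rather than something the text specifies.

```python
def cut_frames(signal, window=256, shift=128):
    """Return the length-`window` frames taken every `shift` samples.
    A trailing partial frame is dropped (an assumption of this sketch)."""
    frames = []
    start = 0
    while start + window <= len(signal):
        frames.append(signal[start:start + window])
        start += shift
    return frames


SAMPLE_RATE_HZ = 16000
signal = [float(n) for n in range(1024)]  # placeholder standing in for y(n)
frames = cut_frames(signal)

# Durations implied by the example parameters in the text.
window_ms = 1000.0 * 256 / SAMPLE_RATE_HZ
shift_ms = 1000.0 * 128 / SAMPLE_RATE_HZ
```

Each element of `frames` corresponds to one vector y for a given frame index m.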

Each of the suppression processing units 14-1 to 14-I processes the supplied noisy speech vector y with its own noise suppression method. The i-th suppression processing unit 14-i (i = 1, 2, ..., I) obtains noise-suppressed speech using its noise suppression method Fi. It is desirable that the units use I noise suppression methods whose properties differ from one another. Regarding each noise suppression method Fi as a kind of function, the processing by the suppression processing units 14-1 to 14-I is expressed by Equation (3) below.
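Equation (3) is missing from this text. Based on the description of Fi as a function of y and of the N-row, I-column noise-suppressed speech matrix, it can plausibly be reconstructed as:

```latex
\hat{\mathbf{x}}_i = F_i(\mathbf{y}), \quad i \in \{1, 2, \ldots, I\}, \qquad
\hat{\mathbf{X}} = \bigl[\, \hat{\mathbf{x}}_1,\; \hat{\mathbf{x}}_2,\; \ldots,\; \hat{\mathbf{x}}_I \,\bigr] \in \mathbb{R}^{N \times I} \tag{3}
```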

In Equation (3), [x^]i is the noise-suppressed speech vector that the suppression processing unit 14-i calculates from the noisy speech vector y. [X^] is the noise-suppressed speech matrix (N rows, I columns) whose column vectors are the noise-suppressed speech vectors [x^]1 to [x^]I. The noise-suppressed speech vectors [x^]i are assumed to be aligned with one another. The symbol I also denotes the set of noise suppression method indices; that is, I = {1, 2, ..., I}.

Each of the suppression processing units 14-1 to 14-I writes the noise-suppressed speech vector [x^]i that it has calculated into the noise-suppressed speech matrix storage unit 15.

Next, the weight calculation unit 16 obtains the weighting coefficients for speech integration based on the noise-suppressed speech matrix thus obtained. Let wi be the weighting coefficient corresponding to each noise suppression method Fi; the column vector w whose elements are these weighting coefficients is defined as in Equation (4) below.
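Equation (4) is not reproduced here. From the definition of w as the column vector of the I weighting coefficients, it presumably has the form:

```latex
\mathbf{w} = \bigl[\, w_1,\; w_2,\; \ldots,\; w_I \,\bigr]^{T} \in \mathbb{R}^{I} \tag{4}
```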

The method of calculating the vector w, whose elements are the weighting coefficients wi, is described in detail later.
Next, the speech integration unit 17 calculates the integrated noise-suppressed speech vector [x˜] by multiplying the noise-suppressed speech matrix [X^], read from the noise-suppressed speech matrix storage unit 15, by the vector w of weighting coefficients wi for the respective noise suppression methods. The integrated vector [x˜] is thus given by Equation (5) below: it is the mixture obtained by multiplying each noise-suppressed speech vector [x^]i by its weighting coefficient wi and summing (a sum of products with the weighting coefficients).
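Equation (5) does not survive in this text. The matrix-vector product and sum-of-products form described above correspond to:

```latex
\tilde{\mathbf{x}} = \hat{\mathbf{X}}\,\mathbf{w} = \sum_{i=1}^{I} w_i\, \hat{\mathbf{x}}_i \tag{5}
```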

The integrated noise-suppressed speech vector [x˜] calculated by Equation (5) is an N-dimensional column vector.
Equation (6) below expresses the relationship between the target speech vector x, each noise-suppressed speech vector [x^]i, and the error vector ei between them. E is the matrix (N rows, I columns) whose column vectors are the error vectors e1, e2, ..., eI.
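Equation (6) is missing from this text. One reconstruction consistent with the description is shown below; the sign convention (estimate minus target) is an assumption, and the original may define the error with the opposite sign.

```latex
\mathbf{e}_i = \hat{\mathbf{x}}_i - \mathbf{x}, \qquad
\mathbf{E} = \bigl[\, \mathbf{e}_1,\; \mathbf{e}_2,\; \ldots,\; \mathbf{e}_I \,\bigr] \in \mathbb{R}^{N \times I} \tag{6}
```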

Equation (7) below follows from Equation (6); that is, Equation (7) expresses each noise-suppressed speech vector [x^]i.
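Equation (7) is not reproduced here. Assuming the error is defined as the estimate minus the target speech, that is, ei = [x^]i - x, rearranging gives:

```latex
\hat{\mathbf{x}}_i = \mathbf{x} + \mathbf{e}_i \tag{7}
```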

Substituting Equation (7) into Equation (3), the integrated noise-suppressed speech vector [x~] of Equation (5) can be expressed in terms of the target speech vector x and the error matrix E, as in Equation (8) below.
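Equations (6) to (8) are not reproduced in this text. The following is one reading that is consistent with the surrounding description, offered as a sketch only. It assumes the sign convention e_i = [x^]_i - x and that the weight coefficients sum to 1, as the normalization described for the weight coefficients suggests.

```latex
\begin{align}
  \mathbf{e}_i &= \hat{\mathbf{x}}_i - \mathbf{x}, \qquad
  E = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_I] \\
  \hat{\mathbf{x}}_i &= \mathbf{x} + \mathbf{e}_i \\
  \tilde{\mathbf{x}} &= \hat{X}\mathbf{w}
     = \sum_{i=1}^{I} w_i \left(\mathbf{x} + \mathbf{e}_i\right)
     = \mathbf{x} + E\mathbf{w}
     \quad \text{when } \sum_{i=1}^{I} w_i = 1
\end{align}
```

Under these assumptions the second term, Ew, is the residual error of the mixture.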

If a weight coefficient vector w is found that minimizes the second term on the right-hand side of Equation (8) (the product of the error matrix and the weight coefficient vector), the integrated noise-suppressed speech vector [x~] approaches the target speech vector x. In practice, however, the target speech vector x and the error matrix E are unknown. Therefore, an optimization method for the weight coefficient vector w is used to obtain the optimum weight coefficient vector w_opt. The optimization method used in this embodiment relies on correlation coefficients.

FIG. 2 is a block diagram showing the detailed functional configuration of the weight calculation unit. As illustrated, the weight calculation unit 16 includes a cross-correlation coefficient calculation unit 201, a cross-correlation coefficient addition unit 202, and a weight coefficient normalization unit 203.
The cross-correlation coefficient calculation unit 201 calculates the cross-correlation coefficients between the noise-suppressed speech vectors produced by the respective noise suppression methods. That is, it calculates the cross-correlation coefficients between noise suppression methods for each analysis frame.
For a given noise suppression method, the cross-correlation coefficient addition unit 202 adds up (takes the sum of) the cross-correlation coefficients between that method and each of the other noise suppression methods. This value is the basis of the weight coefficient for that method.
The weight coefficient normalization unit 203 normalizes the weight coefficient values calculated for each noise suppression method by the cross-correlation coefficient addition unit 202. Specifically, it adjusts them so that the sum of the weight coefficients over all noise suppression methods becomes, for example, 1.
The processing of each unit is described in detail below.

The procedure for calculating the weight coefficients is described below with reference to the block diagram of FIG. 2.
First, the cross-correlation coefficient calculation unit 201 obtains the cross-correlation coefficients between the noise-suppressed speech vectors [x^]_i produced by the respective noise suppression methods. The cross-correlation coefficient xcor_i,j between the noise-suppressed speech vectors [x^]_i and [x^]_j is calculated by Equation (9) below.

In Equation (9), E[] denotes the expected value. That is, the cross-correlation coefficient calculated by Equation (9) is the covariance of the noise-suppressed speech vectors [x^]_i and [x^]_j divided by their respective standard deviations. The vectors [x^]_i and [x^]_j are obtained with noise suppression methods that have different properties. Therefore, the cross-correlation coefficient xcor_i,j between the noise-suppressed speech signals is expected to be high in speech sections and low in non-speech sections (noise sections).
The cross-correlation coefficient calculation unit 201 calculates the cross-correlation coefficient for every combination of i and j with i, j ∈ I and i ≠ j, and passes the results to the cross-correlation coefficient addition unit 202.
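The covariance-over-standard-deviations definition above can be sketched as follows. The sketch estimates the expected values by averaging over the samples of one frame, which is an assumption, as is the choice to return 0 for a constant frame. The names `xcor` and `pairwise_xcor` are hypothetical.

```python
import math


def xcor(a, b):
    """Correlation coefficient: covariance of a and b over both standard deviations."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    std_a = math.sqrt(sum((p - mean_a) ** 2 for p in a) / n)
    std_b = math.sqrt(sum((q - mean_b) ** 2 for q in b) / n)
    if std_a == 0.0 or std_b == 0.0:
        return 0.0  # assumption: treat a constant frame as uncorrelated
    return cov / (std_a * std_b)


def pairwise_xcor(frames):
    """Return {(i, j): xcor} for every ordered pair of methods with i != j."""
    count = len(frames)
    return {(i, j): xcor(frames[i], frames[j])
            for i in range(count) for j in range(count) if i != j}
```

Because the coefficient is symmetric in i and j, an implementation could compute only the pairs with i < j; the full table is kept here so that the later summation over j is a simple lookup.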

Next, the cross-correlation coefficient addition unit 202 uses the cross-correlation coefficients xcor_i,j calculated by the cross-correlation coefficient calculation unit 201 to calculate the weight coefficient [w^]_i for each noise-suppressed speech vector [x^]_i. The weight coefficient [w^]_i is calculated by Equation (10) below.

Here, n is an exponent that sets the strength of the weight coefficient; for example, n = 2. The value of n may be changed as appropriate through settings or the like. As Equation (10) shows, the weight coefficient [w^]_i is based on the sum of the correlation coefficients involving the noise-suppressed speech vector [x^]_i. In other words, [w^]_i is obtained by taking the sum over j (where i ≠ j) of the correlation coefficients between [x^]_i and [x^]_j and raising that sum to the n-th power.

Next, the weight coefficient normalization unit 203 normalizes the weight coefficients obtained by Equation (10) so that the weight coefficient vector [w^] satisfies Equation (4). The normalized weight coefficient vector is expressed by Equation (11).

The weight coefficient vector [w^] obtained in this way is taken as the optimum weight coefficient vector w_opt; that is, w_opt = [w^]. The weight calculation unit 16 outputs this weight coefficient vector w_opt.
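The summation, exponentiation, and normalization steps can be sketched as below. Equations (10) and (11) are not reproduced in this text, so the sketch follows the prose description only. The function name `weights_from_xcor`, the equal-weight fallback, and the sample correlation values are hypothetical.

```python
def weights_from_xcor(xcor_table, num_methods, n=2):
    """Weight per method: (sum of its correlations with the others) ** n, then normalized.

    xcor_table maps every ordered pair (i, j) with i != j to a correlation coefficient.
    """
    raw = []
    for i in range(num_methods):
        total = sum(xcor_table[(i, j)] for j in range(num_methods) if j != i)
        raw.append(total ** n)
    norm = sum(raw)
    if norm == 0.0:
        # Assumption: fall back to equal weights when every sum is zero.
        return [1.0 / num_methods] * num_methods
    return [r / norm for r in raw]


# Hypothetical correlation coefficients for three methods in one frame.
pairs = {(0, 1): 0.8, (0, 2): 0.4, (1, 2): 0.6}
table = {}
for (i, j), value in pairs.items():
    table[(i, j)] = value
    table[(j, i)] = value  # the correlation coefficient is symmetric
weights = weights_from_xcor(table, 3)
```

In this example the second method correlates best with the other two, so it receives the largest weight, and the weights sum to 1.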

Returning to the block diagram of FIG. 1, the speech integration unit 17 applies the optimum weight coefficient vector w_opt supplied from the weight calculation unit 16 to Equation (5), that is, substitutes w_opt for w in Equation (5), to calculate the optimum integrated noise-suppressed speech vector. The speech integration unit 17 calculates the optimum integrated noise-suppressed speech vector [x~]_opt by Equation (12) below.

In Equation (12), m is the frame index and n is the index of the sample within the frame. As shown in Equation (13) below, c is a constant obtained by averaging the cross-correlation coefficients of Equation (9); it is used to set the degree to which non-speech sections (noise sections) are suppressed.

In Equation (13), k is an exponent that sets the strength of the constant; for example, k = 2. The value of k may be changed as appropriate through settings or the like.
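The roles of c and k can be illustrated with the following sketch. Equations (12) and (13) are not reproduced in this text, so the form used here is an assumption: c is taken to be the mean of the pairwise cross-correlation coefficients raised to the power k, and it scales the weighted mixture of each frame. The function names and sample values are hypothetical.

```python
def suppression_gain(xcor_values, k=2):
    """Assumed form of c: (mean of the pairwise correlation coefficients) ** k."""
    mean = sum(xcor_values) / len(xcor_values)
    return mean ** k


def scale_frame(mixed_frame, c):
    """Apply the per-frame gain c to an already mixed frame."""
    return [c * sample for sample in mixed_frame]


speech_gain = suppression_gain([0.9, 0.8, 1.0])  # highly correlated outputs
noise_gain = suppression_gain([0.1, 0.2, 0.3])   # weakly correlated outputs
```

Under this assumed form, frames in which the methods agree (speech) pass almost unchanged, while frames in which they disagree (noise) are attenuated; a larger k strengthens the attenuation.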

The waveform superimposing unit 18 obtains the noise-suppressed speech [x~](n) by shifting the time waveform [x~]_opt(m, n) calculated by Equation (12) by the shift width for each frame and superimposing the shifted frames.
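The shift-and-superimpose step is a standard overlap-add, which can be sketched as follows. The function name `overlap_add` and the sample frames are hypothetical; the shift width is given in samples.

```python
def overlap_add(frames, shift):
    """Overlap-add equal-length frames, advancing `shift` samples per frame."""
    if not frames:
        return []
    n = len(frames[0])
    out = [0.0] * (shift * (len(frames) - 1) + n)
    for m, frame in enumerate(frames):
        start = m * shift
        for k, value in enumerate(frame):
            out[start + k] += value  # overlapping samples are summed
    return out


# Two frames of length 4 with a shift width of 2 samples.
signal = overlap_add([[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]], 2)
```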

FIGS. 3 and 4 are graphs showing examples of the results of noise suppression by the noise suppression device of this embodiment. Panels (a) to (g) of FIG. 3 each show a speech waveform; the horizontal axis is time and the vertical axis is amplitude. Panels (a) to (g) of FIG. 4 each show a speech spectrum; the horizontal axis is time and the vertical axis is frequency. The horizontal axes of FIGS. 3 and 4 are in seconds, and the vertical axis of FIG. 4 is in hertz. FIG. 4 represents the time course of the strength of each frequency component by grayscale density: the darker the shade (that is, the closer to black), the stronger the component. In this example, three noise suppression methods were used, and the suppression processing units 14-1, 14-2, and 14-3 each executed suppression processing. In each of FIGS. 3 and 4, (a) is the clean speech, (b) is the added noise, (c) is the noise-mixed speech input to the noise suppression device, (d) is the speech whose noise was suppressed by the noise suppression device of this embodiment, (e) is the noise-suppressed speech from noise suppression method 1 (output of the suppression processing unit 14-1), (f) is the noise-suppressed speech from noise suppression method 2 (output of the suppression processing unit 14-2), and (g) is the noise-suppressed speech from noise suppression method 3 (output of the suppression processing unit 14-3).

Compared with noise suppression methods 1 to 3, the present method limits degradation in the speech sections and effectively suppresses noise in the non-speech sections (noise sections). For example, in FIGS. 3 and 4, the noise portion of (d), the result of this embodiment, is as small as that of (g), noise suppression method 3, while the speech portion of (d) is as clear as that of (e), noise suppression method 1, with little information loss or distortion. Also, in FIG. 4, (g) noise suppression method 3 loses high-frequency information (upper part of the graph) in the speech portion, whereas (d) this embodiment retains it, as (e) noise suppression method 1 does. These differences can be seen in the graphs; an evaluation using an objective measure is described later.

FIG. 5 is a graph showing the time variation of the weight coefficient values calculated by the weight calculation unit 16 during noise suppression processing by the noise suppression device of this embodiment. In each of panels (a) to (c), the horizontal axis represents time, in seconds, and the vertical axis represents the weight coefficient value. The weight coefficient values shown in FIG. 5 correspond to the noise suppression results shown in FIGS. 3 and 4. In this graph, the weight coefficients corresponding to the suppression processing units 14-1, 14-2, and 14-3 are w1 in (a), w2 in (b), and w3 in (c), respectively. A larger weight coefficient indicates a larger contribution to the integrated noise-suppressed speech. As the graph shows, the suppression processing unit 14-2 (noise suppression method 2) contributes most in the speech sections, and the suppression processing unit 14-3 (noise suppression method 3) contributes most in the non-speech sections (noise sections). That a different noise suppression method dominates depending on whether the section is speech or non-speech is consistent with the speech spectra shown in FIG. 4.

FIG. 6 is a graph showing the time variation of the cross-correlation coefficients calculated by the cross-correlation coefficient calculation unit 201 during noise suppression processing by the noise suppression device of this embodiment. In each of panels (a) to (d), the horizontal axis represents time, in seconds. The vertical axis of (a) to (c) is the cross-correlation value, and the vertical axis of (d) is the value of the constant obtained by averaging the cross-correlation values according to Equation (13). The graph labeled "M1:M2" in (a) shows the time variation of the cross-correlation coefficient xcor_1,2 between noise suppression methods 1 and 2. The graph labeled "M2:M3" in (b) shows the time variation of the cross-correlation coefficient xcor_2,3 between noise suppression methods 2 and 3. The graph labeled "M3:M1" in (c) shows the time variation of the cross-correlation coefficient xcor_3,1 between noise suppression methods 3 and 1. The graph labeled "c" in (d) shows the time variation of the degree to which non-speech sections (noise sections) are suppressed. Every cross-correlation coefficient is high in speech sections and low in non-speech sections (noise sections). The degree c of suppressing non-speech sections (noise sections) can therefore be used to strengthen the noise suppression effect.

Next, the objective evaluation value of the processing result of the noise suppression device of this embodiment is described. There are various ways to evaluate a noise suppression method objectively, but a measure that deviates little from subjective evaluation results is preferable. Here, the frequency-weighted segmental SNR (hereinafter "fwSNRseg") is used as the objective evaluation value. fwSNRseg can be calculated by Equation (14) below.

In Equation (14), B_j is the weight for the j-th frequency band (j = 1, 2, ..., K). K is the number of frequency bands; for example, K = 25. M is the total number of frames in the signal. |X(m, j)| is the filter-bank amplitude of the j-th frequency band in the m-th frame of the clean speech. |[X^](m, j)| is the filter-bank amplitude of the j-th frequency band in the m-th frame of the noise-suppressed signal. Because fwSNRseg applies a perceptual weight for each frequency band to the segmental SNR, it correlates well with the results of subjective listening tests. A larger fwSNRseg value indicates a better result. The objective evaluation value fwSNRseg is also described in the following reference.
Reference: Tribolet, J., Noll, P., McDermott, B., and Crochiere, R. E., "A study of complexity and quality of speech waveform coders," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 586-590, 1978.
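As an illustration of how such a measure is computed, the sketch below uses the commonly cited definition of the frequency-weighted segmental SNR: for each frame, the band-weighted average of log10(|X|^2 / (|X| - |X^|)^2), averaged over frames and multiplied by 10. Equation (14) is not reproduced in this text, so whether it matches this form term for term is an assumption. The function name `fw_snr_seg` and the small constant `eps`, which avoids division by zero, are also hypothetical.

```python
import math


def fw_snr_seg(clean, enhanced, band_weights, eps=1e-12):
    """Frequency-weighted segmental SNR in dB.

    clean[m][j] and enhanced[m][j] are filter-bank amplitudes for frame m, band j;
    band_weights[j] plays the role of B_j.
    """
    weight_sum = sum(band_weights)
    total = 0.0
    for clean_frame, enhanced_frame in zip(clean, enhanced):
        acc = 0.0
        for b, x, x_hat in zip(band_weights, clean_frame, enhanced_frame):
            acc += b * math.log10((x * x + eps) / ((x - x_hat) ** 2 + eps))
        total += acc / weight_sum
    return 10.0 * total / len(clean)


# One frame, two bands: a 10 % amplitude error in each band gives about 20 dB.
score = fw_snr_seg([[1.0, 1.0]], [[0.9, 0.9]], [1.0, 2.0])
```

Practical implementations often clamp the per-band value to a fixed dB range before averaging; that step is omitted here.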

The evaluation results for this embodiment using the above fwSNRseg are shown in Table 1 below.

Table 1 shows the objective evaluation values (fwSNRseg) computed between the clean speech and the output of each noise suppression method and of the method proposed in this embodiment. These values also show that the noise suppression result of this embodiment is of higher quality than that obtained by using any of noise suppression methods 1 to 3 alone.

As described above, when noise-suppressed speech signals obtained by multiple noise suppression methods with different properties are mixed in the time domain, weighting the output of each noise suppression method with weighting coefficients calculated from correlation coefficients reduces the energy of the noise components and boosts the energy of the speech portions appropriately, so that high-quality noise-suppressed speech can be obtained simply.

[Second Embodiment]
Next, a second embodiment is described. Matters that are the same as in the preceding embodiment are omitted, and the description focuses on matters specific to this embodiment.
FIG. 7 is a block diagram showing the functional configuration of the noise suppression device according to the second embodiment. Functional blocks that perform the same processing as in the preceding embodiment are given the same reference numerals as in that description. As illustrated, the noise suppression device 2 includes a voice input unit 11, a waveform cutout unit 12, a frequency characteristic calculation unit 22, a phase characteristic calculation unit 24, I suppression processing units 14-1 to 14-I (I is an integer of 2 or more), I frequency characteristic calculation units 25-1 to 25-I, I amplitude characteristic calculation units 26-1 to 26-I, a noise suppression amplitude characteristic matrix storage unit 35, a weight calculation unit 36, a speech integration unit 37, a frequency characteristic calculation unit 38, a speech waveform calculation unit 39, a waveform superimposing unit 18, and a voice output unit 19.

The voice input unit 11 and the waveform cutout unit 12 each have the same functions as in the first embodiment.
The frequency characteristic calculation unit 22 calculates frequency characteristic data by Fourier transform from the speech data (noise-mixed speech) cut out by the waveform cutout unit 12.
The phase characteristic calculation unit 24 calculates phase characteristic data from the frequency characteristic data obtained by the frequency characteristic calculation unit 22.

The suppression processing units 14-1 to 14-I have the same functions as the suppression processing units in the first embodiment. Each of them uses a noise suppression method with different properties.
The frequency characteristic calculation units 25-1 to 25-I calculate frequency characteristic data by Fourier transform from the noise-suppressed speech data calculated by the suppression processing units 14-1 to 14-I, respectively.
The amplitude characteristic calculation units 26-1 to 26-I calculate amplitude characteristic data from the frequency characteristic data obtained by the frequency characteristic calculation units 25-1 to 25-I, respectively.
The noise suppression amplitude characteristic matrix storage unit 35 stores the amplitude characteristic data of the noise-suppressed speech obtained by the amplitude characteristic calculation units 26-1 to 26-I. Specifically, it stores the data in the form of a noise suppression amplitude characteristic matrix formed by arranging the amplitude characteristic vectors generated by the amplitude characteristic calculation units 26-1 to 26-I.

The weight calculation unit 36 reads the amplitude characteristic data of the noise-suppressed speech stored in the noise suppression amplitude characteristic matrix storage unit 35 and calculates a weight coefficient for each noise suppression method based on this data. The weight calculation unit 36 calculates the weight coefficients so that the noise suppression result after mixing is optimal.
The speech integration unit 37 mixes the amplitude characteristic data output from the amplitude characteristic calculation units 26-1 to 26-I. In doing so, it weights the mixing ratio of each noise suppression method using the weight coefficient for that method. The speech integration unit 37 mixes the noise-suppressed speech data for each analysis frame. The weight coefficients are those calculated by the weight calculation unit 36. In this embodiment, the data mixed by the speech integration unit 37 is frequency-domain speech signal data.

The frequency characteristic calculation unit 38 calculates the frequency characteristic data of the mixed speech from the mixed amplitude characteristic data output by the speech integration unit 37 (data mixed with the optimum weighting) and the phase characteristic data of the input speech calculated by the phase characteristic calculation unit 24.
The speech waveform calculation unit 39 obtains the time waveform data of the noise-suppressed speech by inverse Fourier transform from the frequency characteristic data obtained by the frequency characteristic calculation unit 38. This time waveform data is data for each analysis frame.
The waveform superimposing unit 18 and the voice output unit 19 each have the same functions as in the first embodiment.

The processing procedure of the noise suppression device 2 is described in detail below. In this embodiment as well, the sampling frequency is 16 kHz and the quantization bit depth is 16 bits. The length N of the noise-mixed speech vector is 256 samples (about 16 milliseconds).
The voice input unit 11 acquires speech from outside. The waveform cutout unit 12 cuts out the speech waveform for each analysis frame.

The suppression processing units 14-1 to 14-I suppress noise in the noise-mixed speech vector y cut out for each frame by the waveform cutout unit 12, each using a noise suppression method F_i with different properties. Through this processing, the suppression processing units 14-1 to 14-I output the noise-suppressed speech vectors [x^]_i (i = 1, 2, ..., I). The length of each noise-suppressed speech vector is also N (256).

The frequency characteristic calculation units 25-1 to 25-I obtain the frequency characteristic vectors [X^]_i for the noise-suppressed speech vectors [x^]_i supplied from the suppression processing units 14-1 to 14-I, respectively. Each unit multiplies its noise-suppressed speech vector by a suitable window function (for example, the Hamming window w_hamm(n) = 0.54 - 0.46 cos(2πn/N), n = 1, ..., N) and applies the discrete Fourier transform (FFT) to the windowed signal to calculate the frequency characteristic vector [X^]_i. The number of FFT points is N.

The amplitude characteristic calculation units 26-1 to 26-I calculate the amplitude characteristic vector |[X^]_i| of each noise-suppressed speech signal by taking the absolute value of the frequency characteristic vector [X^]_i supplied from the frequency characteristic calculation units 25-1 to 25-I, respectively. Each unit writes the amplitude characteristic vector |[X^]_i| it has calculated to the noise suppression amplitude characteristic matrix storage unit 35.
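The windowing, transform, and absolute-value steps can be sketched as follows. For self-containment the sketch uses a naive O(N^2) discrete Fourier transform instead of an FFT routine; both produce the same values. The window follows the formula given in the text, with n running from 1 to N. The function names are hypothetical.

```python
import cmath
import math


def hamming(n_points):
    """w_hamm(n) = 0.54 - 0.46 cos(2*pi*n/N) for n = 1..N, as given in the text."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / n_points)
            for n in range(1, n_points + 1)]


def dft(signal):
    """Naive discrete Fourier transform; an FFT would return the same values."""
    n_points = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


def amplitude_spectrum(frame):
    """Window the frame, transform it, and take the absolute value of each bin."""
    window = hamming(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    return [abs(value) for value in spectrum]


# A constant frame of 8 samples: bin 0 equals the sum of the window, 8 * 0.54.
amplitudes = amplitude_spectrum([1.0] * 8)
```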

The noise suppression amplitude characteristic matrix storage unit 35 stores the noise suppression amplitude characteristic matrix [X^], whose row vectors are the amplitude characteristic vectors |[X^]_i| of the noise-suppressed speech from the respective noise suppression methods. The series of processing steps for calculating the noise suppression amplitude characteristic matrix [X^] is expressed by Equation (15) below.

In Equation (15), the outlined bold "C" denotes the set of complex numbers. Re() denotes taking the real part of a complex number, and Im() denotes taking the imaginary part. w_hamm is the window function described above.
The weight calculation unit 36 calculates the weight coefficients described below.
The speech integration unit 37 obtains the integrated noise suppression amplitude characteristic vector |[X~]| using the weights calculated by the weight calculation unit 36. Specifically, the noise suppression amplitude characteristic matrix [X^], formed from the amplitude characteristic vectors of the noise-suppressed speech signals, is multiplied by the matrix W whose column vectors are the weight coefficient vectors w_i corresponding to the noise suppression methods F_i, and the diagonal components are taken, giving the integrated noise suppression amplitude characteristic vector |[X~]| as in Equation (16) below.

The resulting integrated noise suppression amplitude characteristic vector |[X~]| is the mixture obtained by multiplying each bin |[X^]_i[l]| of the amplitude characteristic vector of each noise-suppressed speech signal by the weight coefficient w_i[l] and summing. In Equation (16), diag[] denotes the column vector obtained by extracting the diagonal components of a matrix. Each weight coefficient vector w_i is an N-dimensional row vector, and the matrix W has I rows and N columns.
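The bin-by-bin mixture can be computed directly, without forming the full matrix product and then extracting its diagonal. The following sketch does so; the function name `mix_amplitudes` and the sample values are hypothetical, and both inputs are lists of I rows with one entry per frequency bin.

```python
def mix_amplitudes(amplitudes, weight_matrix):
    """mixed[l] = sum over i of weight_matrix[i][l] * amplitudes[i][l]."""
    num_methods = len(amplitudes)
    num_bins = len(amplitudes[0])
    return [sum(weight_matrix[i][l] * amplitudes[i][l] for i in range(num_methods))
            for l in range(num_bins)]


# Two methods, two bins; the weights in each column sum to 1.
mixed_amplitude = mix_amplitudes([[2.0, 4.0], [6.0, 8.0]],
                                 [[0.5, 0.25], [0.5, 0.75]])
```

Computing only the diagonal entries costs I multiplications per bin, whereas the full N-by-N product would compute off-diagonal entries that are then discarded.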

As in the first embodiment, the weight calculation unit 36 obtains the optimum weight coefficient matrix W_opt by an optimization method, for example one that uses cross-correlation coefficients. The weight calculation procedure is described below.
First, the weight calculation unit 36 obtains the cross-correlation coefficients between the amplitude characteristic vectors |[X^]_i| of the noise-suppressed speech from the respective noise suppression methods, as in Equation (17) below. That is, the cross-correlation coefficient xcor_i,j is the covariance of the amplitude characteristic vectors |[X^]_i| and |[X^]_j| divided by their respective standard deviations.

Next, the weight calculation unit 36 uses the obtained cross-correlation coefficients xcor_i,j to calculate, by Equation (18) below, the weight coefficient vector [w^]_i for the amplitude characteristic vector |[X^]_j| of each noise-suppressed speech signal. Here, n is an exponent that sets the strength of the weight coefficient; for example, n = 2.

The calculation in Equation (18) adds up the correlation coefficients involving the amplitude characteristic vector |[X^]_j| of each noise-suppressed speech signal, and the result is common to all bins. The weight calculation unit 36 then obtains the weight coefficient matrix [W^] from the weight coefficient vectors [w^]_j.

The weight calculation unit 36 normalizes the weight coefficient matrix [W^] according to equation (20) below so that it satisfies equation (16). The right side of equation (20) is

The weight coefficient matrix [W^] obtained in this way is taken as the optimum weight coefficient matrix Wopt. That is, Wopt = [W^].
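The weight calculation of this embodiment can be sketched as follows. Equations (18) to (20) are not reproduced in this text, so the form used here is an assumption based on the description: for each method i, the correlation coefficients with every other method j are raised to the power n and added, the same value is used for all bins, and the values are normalized so that the weights of all methods sum to 1. The function name and the uniform fallback are hypothetical.

```python
def correlation_weights(xcor_matrix, exponent=2, num_bins=1):
    """Build an I x N weight matrix from an I x I correlation matrix.

    Assumed form: raw weight of method i = sum over j != i of
    xcor_matrix[i][j] ** exponent, then normalization over methods.
    """
    num_methods = len(xcor_matrix)
    raw = [sum(xcor_matrix[i][j] ** exponent
               for j in range(num_methods) if j != i)
           for i in range(num_methods)]
    total = sum(raw)
    if total == 0.0:
        # Assumption for this sketch: fall back to uniform weights.
        normalized = [1.0 / num_methods] * num_methods
    else:
        normalized = [r / total for r in raw]
    # The weight is common to all bins, so each row repeats one value.
    return [[w] * num_bins for w in normalized]


xc = [[1.0, 0.9, 0.1],
      [0.9, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
W = correlation_weights(xc, exponent=2, num_bins=4)
print([row[0] for row in W])
```

In this example the third method correlates poorly with the other two and therefore receives the smallest weight.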

The speech integration unit 37 applies this optimum weight coefficient matrix Wopt to equation (16), that is, sets W = Wopt, and obtains the optimum integrated noise-suppressed amplitude characteristic vector |[X~]opt| as shown in equation (21) below.

Meanwhile, the frequency characteristic calculation unit 22 obtains the frequency characteristic vector Y from the noise-mixed speech vector y cut out for each frame.
The phase characteristic calculation unit 24 then calculates the phase characteristic vector ∠Y from the frequency characteristic vector Y.
The phase characteristic vector ∠Y is calculated by equation (22) below.

The frequency characteristic calculation unit 38 then calculates the frequency characteristic vector [X~]opt from the optimum integrated noise-suppressed amplitude characteristic vector |[X~]opt| and the phase characteristic vector ∠Y of the noise-mixed speech, as in equation (23) below.
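Taking the phase of the noise-mixed spectrum and attaching it to the mixed amplitude can be sketched as below. Equations (22) and (23) are not reproduced in this text; the sketch assumes the usual polar form, in which the phase is the argument of each complex bin and the output bin is amplitude × exp(j·phase). The function names are hypothetical.

```python
import cmath


def phase_of(spectrum):
    """Phase (argument) of each complex bin of the noise-mixed spectrum Y."""
    return [cmath.phase(c) for c in spectrum]


def combine(amplitude, phase):
    """Rebuild complex bins from an amplitude vector and a phase vector."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


Y = [1 + 1j, -2j, 3 + 0j]            # noise-mixed spectrum (toy values)
mixed_amplitude = [0.5, 1.0, 2.0]    # integrated noise-suppressed amplitude
X_opt = combine(mixed_amplitude, phase_of(Y))
print(X_opt)
```

Each output bin keeps the direction of the corresponding bin of Y in the complex plane while its magnitude is replaced by the mixed amplitude.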

The speech waveform calculation unit 39 then takes the inverse Fourier transform (IFFT) of the frequency characteristic [X~]opt obtained above to obtain, for each frame, the noise-suppressed time waveform [x~]opt(m, n) (n = 1, ..., N). Here, m is the frame index and n is the index of the sample within the frame.

The waveform superposition unit 18 then divides the time waveform [x~]opt(m, n) by the Hamming window whamm(n) and multiplies it by an appropriate window function (for example, the Hanning window whann(n) = 0.5 − 0.5 cos(2πn/T)). The windowed data are then shifted by the shift width for each frame and superimposed to obtain the noise-suppressed speech [x~](n).
The voice output unit 19 outputs the noise-suppressed speech calculated by the waveform superposition unit 18 to the outside.
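The window replacement and superposition can be sketched as follows. The Hanning window is the one quoted in the text; the Hamming window formula (0.54 − 0.46 cos(2πn/T)), the choice of T as the frame length, the 0-based sample index, and the function names are assumptions made for this sketch.

```python
import math


def hamming(n, T):
    # Assumed standard Hamming window; it is never zero, so division is safe.
    return 0.54 - 0.46 * math.cos(2 * math.pi * n / T)


def hann(n, T):
    # Hanning window as quoted in the text.
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / T)


def overlap_add(frames, shift):
    """Undo the Hamming window, apply the Hanning window, then shift each
    frame by `shift` samples and add the frames together."""
    frame_len = len(frames[0])
    out = [0.0] * (shift * (len(frames) - 1) + frame_len)
    for m, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            rewindowed = sample / hamming(n, frame_len) * hann(n, frame_len)
            out[m * shift + n] += rewindowed
    return out


# Three Hamming-windowed frames of a constant signal, half-frame shift.
T = 8
frames = [[hamming(n, T) for n in range(T)] for _ in range(3)]
print(overlap_add(frames, shift=T // 2))
```

With a shift of half the frame length, overlapping Hanning windows add up to 1, so the fully overlapped part of the output recovers the constant input.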

FIGS. 8 and 9 are graphs showing examples of the results of noise suppression by the noise suppression device of the present embodiment. Panels (a) to (g) of FIG. 8 each show a speech waveform, with time on the horizontal axis and amplitude on the vertical axis. Panels (a) to (g) of FIG. 9 each show a speech spectrum, with time on the horizontal axis and frequency on the vertical axis. The horizontal axes of FIGS. 8 and 9 are in seconds, and the vertical axis of FIG. 9 is in hertz. As in FIG. 4, FIG. 9 represents the time course of the strength of each frequency component by gray-scale density. As in FIGS. 3 and 4, respectively, FIGS. 8 and 9 show examples of (a) clean speech, (b) additive noise, (c) the noise-mixed speech input to the noise suppression device, (d) speech with noise suppressed by the present embodiment, (e) noise-suppressed speech by noise suppression method 1, (f) noise-suppressed speech by noise suppression method 2, and (g) noise-suppressed speech by noise suppression method 3.
It can be seen that, compared with noise suppression methods 1 to 3, the method of the present embodiment also limits degradation in the speech segments while effectively suppressing noise in the non-speech (noise) segments.

Next, the objective evaluation values of the processing results of the noise suppression device of the present embodiment are described. As in the first embodiment, fwSNRseg is used as the objective evaluation value. The evaluation results of the present embodiment using fwSNRseg are shown in Table 2 below.

As the table shows, the method of the second embodiment gives better results than the original noise suppression methods 1 to 3. The method of the second embodiment also gives better results than the method of the first embodiment.
In the second embodiment, the time-domain speech signals output from the suppression processing units 14-1 to 14-I are converted into frequency-domain signals, the weight coefficients are calculated from the cross-correlation values between the frequency-domain signals, and the frequency-domain signals are mixed according to these weight coefficients. As described above, when noise-suppressed speech obtained by a plurality of noise suppression methods with different properties is mixed in the frequency domain, weighting the noise-suppressed speech from each method by weight coefficients calculated with an optimization method reduces the energy of the noise components and amplifies the energy of the speech portions appropriately, so that high-quality noise-suppressed speech can be obtained with precision.

[Third Embodiment]
Next, a third embodiment is described. Matters that are the same as in the preceding embodiments are omitted, and the description focuses on matters specific to this embodiment. The noise suppression device of this embodiment has a configuration similar to that of the first embodiment but differs in how the weight coefficients are calculated. That is, the noise suppression device of this embodiment has a configuration in which the weight calculation unit 16 in the functional block diagram of FIG. 1 is replaced with the weight calculation unit 56 described below.

Before the weight calculation unit 56 calculates the weight coefficients, the waveform cutout unit 12 cuts the input noise-mixed speech into appropriate analysis frames, as in the first embodiment. For the data cut out for each frame, each of the suppression processing units 14-1 to 14-I obtains its noise-suppressed speech by one of I noise suppression methods with different properties.

FIG. 10 is a block diagram showing the functional configuration of the weight calculation unit of this embodiment. As illustrated, the weight calculation unit 56 includes an adaptive filter coefficient calculation unit 221, an adaptive filter coefficient addition unit 222, and a weight coefficient normalization unit 223. The weight calculation unit 56 uses an adaptive filter to obtain the optimum weight coefficient vector wopt corresponding to the noise suppression methods. Various adaptive filter methods exist; this embodiment uses the normalized LMS algorithm as an example. The normalized LMS algorithm normalizes the coefficient correction term of the LMS algorithm by the norm of the filter state vector.

Based on the noise-suppressed speech vectors from the respective noise suppression methods, the adaptive filter coefficient calculation unit 221 obtains, for the noise-suppressed speech vector of a given noise suppression method, the adaptive filter coefficients that take the noise-suppressed speech vector of another noise suppression method as the desired signal.
For a given noise suppression method, the adaptive filter coefficient addition unit 222 adds up, over all of the other noise suppression methods, the adaptive filter coefficients obtained for that method with each other method as the desired signal. This sum is the basis of the weight coefficient value for that noise suppression method.
The weight coefficient normalization unit 223 normalizes the weight coefficient value for each noise suppression method calculated by the adaptive filter coefficient addition unit 222. Specifically, it adjusts the values so that the sum of the weight coefficients over all noise suppression methods is, for example, 1.

The procedure for calculating the weight coefficients using an adaptive filter is described below.
The adaptive filter coefficient calculation unit 221 first reads from the noise-suppressed speech matrix storage unit 15 (FIG. 1) the data of the noise-suppressed speech vectors [x^]i (i = 1, ..., I) obtained by the suppression processing units 14-1 to 14-I with the respective noise suppression methods, and obtains the adaptive filter coefficients. Specifically, for the noise-suppressed speech vector [x^]i, it obtains the adaptive filter coefficient h_{k+1}(i, j) with the noise-suppressed speech vector [x^]j (i ≠ j) of another noise suppression method as the desired signal. The adaptive filter coefficient h_{k+1}(i, j) is calculated by equation (24) below.

The adaptive filter coefficient calculation unit 221 obtains the converged adaptive filter coefficients from the recurrence in equation (24). Here, α is a step-size parameter that determines the degree of convergence of the adaptive filter, and β is a stabilization parameter that prevents division by zero. The adaptive filter coefficients are updated at every sample. It is assumed that the suppression processing units 14-i and 14-j use noise suppression methods with different properties. The adaptive filter coefficient h_{k+1}(i, j) corresponds to the degree of cross-correlation between the noise-suppressed speech signals. Accordingly, h_{k+1}(i, j) is expected to be high in speech segments and low in non-speech (noise) segments.
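A per-sample normalized LMS update of the kind described here can be sketched as below. Equation (24) is not reproduced in this text, so the sketch uses the textbook normalized LMS recurrence with a single tap: h_{k+1} = h_k + α e_k x_k / (β + x_k²), where e_k = d_k − h_k x_k. The single tap, the zero initial value, the default parameter values, and the function name are assumptions.

```python
def nlms_one_tap(x, d, alpha=0.5, beta=1e-8):
    """One-tap normalized LMS.

    x: input signal (noise-suppressed speech of method i).
    d: desired signal (noise-suppressed speech of another method j).
    alpha: step-size parameter; beta: stabilizer against division by zero.
    Returns the coefficient after the last sample.
    """
    h = 0.0
    for xk, dk in zip(x, d):
        error = dk - h * xk                          # estimation error e_k
        h += alpha * error * xk / (beta + xk * xk)   # normalized correction
    return h


# When the desired signal is a scaled copy of the input, the coefficient
# converges toward the scale factor.
x = [1.0, -2.0, 3.0, -1.0, 2.0] * 20
d = [0.7 * v for v in x]
print(nlms_one_tap(x, d))
```

The division by β + x_k² is what distinguishes the normalized variant from plain LMS: the size of the correction no longer depends on the input level.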

Next, using the adaptive filter coefficients h_{k+1}(i, j) obtained by equation (24), the adaptive filter coefficient addition unit 222 calculates the weight coefficient [w^]i for each noise-suppressed speech vector [x^]i by equation (25) below.

As equation (25) shows, the weight coefficient [w^]i is the sum, over all j (where i ≠ j), of the adaptive filter coefficients associated with the noise-suppressed speech vector [x^]i.

The weight coefficient normalization unit 223 then performs the normalization in equation (26) so that the weight coefficient vector [w^] satisfies equation (4).

The weight coefficient vector [w^] obtained in this way is taken as the optimum weight coefficient vector wopt. That is, wopt = [w^].
The processing after the weight calculation unit 56 has calculated the optimum weight coefficient vector wopt is the same as in the first embodiment. That is, the speech integration unit 17 (FIG. 1) applies the optimum weight coefficient vector wopt to equation (5) (w = wopt) and obtains the optimum integrated noise-suppressed speech vector [x~]opt according to equation (27) below.
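The addition, normalization, and time-domain mixing steps of this embodiment can be sketched as follows. Equations (25) to (27) are not reproduced in this text; the sketch follows the verbal description (sum over j ≠ i, weights normalized to sum to 1, weighted sum of the frames). The function names, the matrix layout `h[i][j]`, and the uniform fallback are assumptions.

```python
def filter_weights(h):
    """Weights from converged adaptive filter coefficients.

    h[i][j]: coefficient for method i with method j as the desired signal
    (the diagonal is ignored). Returns weights that sum to 1.
    """
    num_methods = len(h)
    raw = [sum(h[i][j] for j in range(num_methods) if j != i)
           for i in range(num_methods)]
    total = sum(raw)
    if total == 0.0:
        # Assumption for this sketch: fall back to uniform weights.
        return [1.0 / num_methods] * num_methods
    return [r / total for r in raw]


def mix_frames(frames, weights):
    """Weighted sum of the I noise-suppressed frames, sample by sample."""
    return [sum(w * f[n] for w, f in zip(weights, frames))
            for n in range(len(frames[0]))]


h = [[0.0, 0.8, 0.4],
     [0.6, 0.0, 0.2],
     [0.1, 0.3, 0.0]]
w_opt = filter_weights(h)
print(w_opt)
print(mix_frames([[1.0, 1.0], [4.0, 4.0], [7.0, 7.0]], w_opt))
```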

In equation (27), m is the frame index and n is the index of the sample within the frame.
The waveform superposition unit 18 then shifts the time waveforms [x~]opt(m, n) by the shift width for each frame and superimposes them. This yields the noise-suppressed speech [x~](n).

Next, the objective evaluation values of the processing results of the noise suppression device of the present embodiment are described. As in the first and second embodiments, fwSNRseg is used as the objective evaluation value. The evaluation results of the present embodiment using fwSNRseg are shown in Table 3 below.

As the table shows, the method of the third embodiment gives better results than the original noise suppression methods 1 to 3. It also gives better results than the methods of the first and second embodiments (whose results are shown in Tables 1 and 2, respectively).
As described above, when noise-suppressed speech obtained by a plurality of noise suppression methods with different properties is mixed in the time domain, weighting the noise-suppressed speech from each method by weight coefficients calculated with an adaptive filter reduces the energy of the noise components and amplifies the energy of the speech portions appropriately, so that high-quality noise-suppressed speech can be obtained easily.

[Fourth Embodiment]
Next, a fourth embodiment is described. Matters that are the same as in the preceding embodiments are omitted, and the description focuses on matters specific to this embodiment.

In the first embodiment described above, the time-domain speech signals output from the suppression processing units 14-1 to 14-I are mixed according to weight coefficients. The cross-correlation values of the data (noise-suppressed speech vectors) output from the suppression processing units 14-1 to 14-I are taken, and the weight coefficients are obtained from these cross-correlation values.
In the second embodiment described above, the time-domain speech signals output from the suppression processing units 14-1 to 14-I are converted into frequency-domain signals, the weight coefficients are calculated from the cross-correlation values between the frequency-domain signals, and the frequency-domain signals are mixed according to these weight coefficients.
In the third embodiment described above, the time-domain speech signals are mixed according to weight coefficients. In that embodiment, however, the weight coefficients are calculated from adaptive filter values taken between the data (noise-suppressed speech vectors) output from the suppression processing units 14-1 to 14-I.
The fourth embodiment combines the features of the second and third embodiments. That is, adaptive filter values are calculated between the frequency-domain signals, and the weight coefficients are calculated from these adaptive filter values. The frequency-domain signals are then mixed according to the calculated weight coefficients.

That is, the noise suppression device of this embodiment has a configuration similar to the functional block diagram shown in FIG. 7 and differs only in how the weight calculation unit calculates the weight coefficients. Based on the amplitude characteristic vectors of the noise-suppressed speech output from the amplitude characteristic calculation units 26-1 to 26-I, the weight calculation unit of this embodiment calculates, for each amplitude characteristic vector, the adaptive filter values that take the other amplitude characteristic vectors as the desired signal. For each amplitude characteristic vector, the weight calculation unit then sums the adaptive filter values that take the other amplitude characteristic vectors (the other noise suppression methods) as the desired signal, and normalizes the results so that the weight coefficients sum to 1 overall. The speech integration unit mixes the amplitude characteristic vectors while weighting them by the obtained weight coefficients. The mixed frequency-domain noise-suppressed speech signal is then converted back into a time-domain signal, the waveforms of the individual time windows are superimposed, and the resulting noise-suppressed speech is output.

The functions of the noise suppression device in each of the embodiments described above may be implemented by a computer. In that case, a program for implementing these functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may also include media that hold the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold the program for a certain period, such as volatile memory inside a computer system acting as the server or client in that case. The program may implement only part of the functions described above, or may implement them in combination with a program already recorded in the computer system.

[Modification]
The invention may also be carried out with the following modification.
In the first to fourth embodiments, the weight coefficients are set by calculating cross-correlation values or adaptive filter coefficient values, thereby extracting waveforms that are highly correlated between the signals obtained by two different noise suppression methods. In the modification, instead, an evaluation function with the weight coefficients as parameters is set appropriately, and its value is calculated for the noise suppression result (the result of mixing the signals from the plurality of different noise suppression methods). The parameters that optimize the evaluation function value are then found. The parameters form a multidimensional vector (I×1 or I×N dimensions) and are optimized using, for example, the steepest descent method.
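The steepest descent step of the modification can be sketched generically as below. The text does not specify the evaluation function, so `toy_evaluation` is a purely hypothetical placeholder (a quadratic with a known minimum) used only to show the optimizer running; the numerical gradient, step size, iteration count, and function names are likewise assumptions.

```python
def steepest_descent(evaluate, initial, step=0.1, iterations=200, eps=1e-6):
    """Minimize `evaluate` over a parameter vector by steepest descent,
    using a central-difference estimate of the gradient."""
    params = list(initial)
    for _ in range(iterations):
        gradient = []
        for k in range(len(params)):
            up = params[:]
            up[k] += eps
            down = params[:]
            down[k] -= eps
            gradient.append((evaluate(up) - evaluate(down)) / (2 * eps))
        # Move against the gradient.
        params = [p - step * g for p, g in zip(params, gradient)]
    return params


def toy_evaluation(weights):
    # Hypothetical placeholder: squared distance from an arbitrary target.
    target = [0.2, 0.8]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


print(steepest_descent(toy_evaluation, [0.5, 0.5]))
```

In an actual device the evaluation function would score the mixed noise suppression result, and a constraint such as the weights summing to 1 would have to be handled separately; neither is covered by this sketch.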

Although a plurality of embodiments of the invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like that do not depart from the gist of the invention.

The present invention can be used for speech processing in general. As one example, it can be used in equipment that records audio for broadcast programs and the like.

DESCRIPTION OF SYMBOLS
1, 2 Noise suppression device
11 Voice input unit
12 Waveform cutout unit
14-1 to 14-I Suppression processing unit
15 Noise-suppressed speech matrix storage unit
16, 36, 56 Weight calculation unit
17, 37 Speech integration unit
18 Waveform superposition unit
19 Voice output unit
22 Frequency characteristic calculation unit
24 Phase characteristic calculation unit
25-1 to 25-I Frequency characteristic calculation unit
26-1 to 26-I Amplitude characteristic calculation unit
35 Noise-suppressed amplitude characteristic matrix storage unit
38 Frequency characteristic calculation unit
39 Speech waveform calculation unit
201 Cross-correlation coefficient calculation unit
202 Cross-correlation coefficient addition unit
203 Weight coefficient normalization unit
221 Adaptive filter coefficient calculation unit
222 Adaptive filter coefficient addition unit
223 Weight coefficient normalization unit

Claims

A noise suppression device comprising:
a plurality of suppression processing units that each output noise-suppressed speech data by processing input speech data with a mutually different noise suppression method;
a weight calculation unit that calculates a weight coefficient for each noise suppression method based on the noise-suppressed speech data output from the plurality of suppression processing units; and
a speech integration unit that mixes the noise-suppressed speech data by multiplying each item of noise-suppressed speech data output from the plurality of suppression processing units by the weight coefficient for the corresponding noise suppression method calculated by the weight calculation unit.

The noise suppression device according to claim 1, wherein
the weight calculation unit calculates correlation coefficients between the noise-suppressed speech data output from the plurality of suppression processing units, and calculates the weight coefficients so that a noise suppression method having a higher correlation with the other noise suppression methods is given a larger weight coefficient.

The noise suppression device according to claim 1, wherein
the weight calculation unit calculates, based on the noise-suppressed speech data output from the plurality of suppression processing units, adaptive filter coefficients for the noise-suppressed speech data of each noise suppression method with the noise-suppressed speech data of another noise suppression method as the desired signal, and calculates the weight coefficients so that a larger calculated adaptive filter coefficient gives a larger weight coefficient.

The noise suppression device according to any one of claims 1 to 3, further comprising:
a frequency characteristic calculation unit that calculates frequency characteristic data based on the noise-suppressed speech data output from each of the plurality of suppression processing units; and
an amplitude characteristic calculation unit that calculates amplitude characteristic data based on the frequency characteristic data,
wherein
the weight calculation unit calculates the weight coefficient for each noise suppression method based on the amplitude characteristic data, and
the speech integration unit mixes the noise-suppressed speech data by multiplying the amplitude characteristic data by the weight coefficients and mixing the results.

A program for causing a computer to execute:
a plurality of suppression processing steps that each output noise-suppressed speech data by processing waveform data of input speech with a mutually different noise suppression method;
a weight calculation step of calculating a weight coefficient for each noise suppression method based on the noise-suppressed speech data output in the plurality of suppression processing steps; and
a speech integration step of mixing the noise-suppressed speech data by multiplying each item of noise-suppressed speech data output in the plurality of suppression processing steps by the weight coefficient for the corresponding noise suppression method calculated in the weight calculation step.