JP2010066478A

JP2010066478A - Noise suppressing device and noise suppressing method

Info

Publication number: JP2010066478A
Application number: JP2008232241A
Authority: JP
Inventors: Tomoya Takatani; 智哉高谷; Evan Jani; エバンジャニ
Original assignee: Nara Institute of Science and Technology NUC; Toyota Motor Corp
Current assignee: Nara Institute of Science and Technology NUC; Toyota Motor Corp
Priority date: 2008-09-10
Filing date: 2008-09-10
Publication date: 2010-03-25

Abstract

PROBLEM TO BE SOLVED: To maintain high sound quality while effectively suppressing noise.
SOLUTION: A noise suppressing device 10 includes: Fourier transformation sections 2 and 6 for performing Fourier transformation on a mixed measurement signal including a voice signal and a noise signal, and on a noise estimation signal which is an estimated noise signal; a mask function calculation section 8 for calculating a mask function H(f, t), which is a subtraction coefficient, on the basis of the mixed measurement signal and the noise estimation signal Fourier-transformed by the Fourier transformation sections 2 and 6; and a subtraction processing section 9 for calculating a voice estimation signal, which is estimated as the voice signal, by subtracting the noise estimation signal from the mixed measurement signal using the mask function H(f, t) calculated by the mask function calculation section 8.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a noise suppression device and a noise suppression method, and more particularly to a noise suppression device and a noise suppression method that can achieve high sound quality while suppressing noise.

Conventionally, a speech recognition apparatus is known that uses a spectral subtraction method to extract only the speech signal from a mixed observation signal in which other speech signals, environmental noise signals, or the like are mixed with the user's speech signal, which is the target signal (see, for example, Patent Document 1). Spectral subtraction estimates the target speech signal by subtracting the power spectrum of a separately estimated noise signal from the power spectrum of the noisy mixed observation signal. In this subtraction, the power spectrum of the noise signal is multiplied by a coefficient called the subtraction coefficient, which corrects that power spectrum.
Patent Document 1: JP 2007-248534 A

When the speech signal is s(t), the noise signal is n(t), and the mixed observation signal is x(t), the following equation (4) generally holds.
x(t) = s(t) + n(t)    (4)
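Because the Fourier transform is linear, the additive model (4) carries over bin by bin to the transformed signals, which is what makes removal by subtraction in the frequency domain possible. The short check below uses a naive DFT written only for illustration, with toy signals that are not from the source:

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


# Toy 16-sample frame: a sinusoid standing in for s(t), a ramp standing in for n(t).
s = [math.sin(2 * math.pi * 3 * k / 16) for k in range(16)]
n = [0.1 * k for k in range(16)]
x = [sk + nk for sk, nk in zip(s, n)]  # x(t) = s(t) + n(t), equation (4)

X, S, N = dft(x), dft(s), dft(n)
# Largest deviation from X(f) = S(f) + N(f) over all bins (rounding error only).
max_err = max(abs(Xf - (Sf + Nf)) for Xf, Sf, Nf in zip(X, S, N))
print(max_err < 1e-9)
```

The additivity holds for complex spectra, not for power spectra: |X|² = |S|² + |N|² + 2Re(S·N*). Power-domain subtraction therefore implicitly relies on the cross term being small, that is, on speech and noise being uncorrelated.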

Here, if the separately estimated noise estimation signal is n1(t), the speech estimation signal obtained by spectral subtraction can be expressed by the following equation (5).

In equation (5), X(f, t) and N1(f, t) are the signals obtained by applying a short-time Fourier transform to x(t) and n1(t), respectively. β is the subtraction coefficient, and angle(Y) is a function that returns the phase angle of a complex number Y.
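Equation (5) itself is not reproduced in this text. A common form of power spectral subtraction that uses exactly the symbols defined above, assumed here for illustration, is |S1|² = |X|² − β|N1|² with the phase angle(X) reused; clipping negative powers to zero is a further common choice, not something the text states. A minimal per-frame sketch (the function name is hypothetical):

```python
import cmath
import math


def spectral_subtract(X, N1, beta):
    """Conventional power spectral subtraction for one frame (assumed form).

    X, N1: complex spectra of the mixture and of the noise estimate.
    beta:  fixed subtraction coefficient.
    The magnitude comes from |X|^2 - beta*|N1|^2, floored at zero (an assumption);
    the phase is taken unchanged from the mixture, i.e. angle(X).
    """
    out = []
    for Xf, Nf in zip(X, N1):
        power = abs(Xf) ** 2 - beta * abs(Nf) ** 2
        magnitude = math.sqrt(max(power, 0.0))
        out.append(cmath.rect(magnitude, cmath.phase(Xf)))
    return out


# Bin 0: 25 - 9 = 16, so magnitude 4 with the phase of 3+4j.
# Bin 1: 1 - 4 < 0, so the bin is floored to zero.
S1 = spectral_subtract([3 + 4j, 1 + 0j], [3 + 0j, 2 + 0j], beta=1.0)
```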

Equation (5) therefore shows that the amount of noise remaining in the speech estimation signal s1(t), which is the output signal, depends on the value of the subtraction coefficient β. For example, increasing β improves suppression performance but degrades sound quality, whereas decreasing β improves sound quality but lowers suppression performance. Because suppression performance and sound quality trade off against each other in this way, it is difficult to set a single subtraction coefficient β that satisfies both at the same time.
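The trade-off can be seen in a two-bin toy example using the same assumed power-subtraction form, with made-up powers and the speech-noise cross term taken as zero: one noise-only bin whose noise estimate is too small, and one speech bin whose noise estimate is exact.

```python
def output_power(x_pow, n1_pow, beta):
    """Power left in one bin after subtracting beta * |N1|^2 (floored at zero)."""
    return max(x_pow - beta * n1_pow, 0.0)


# Hypothetical two-bin example (numbers chosen for illustration only):
#  - a noise-only bin whose noise estimate is too small (|N1|^2 = 0.49 vs true 1.0)
#  - a speech bin (|S|^2 = 4, |N|^2 = 1) whose noise estimate is exact (|N1|^2 = 1.0)
for beta in (1.0, 2.0, 3.0):
    residual_noise = output_power(1.0, 0.49, beta)  # ideally 0
    speech_kept = output_power(5.0, 1.0, beta)      # ideally 4
    print(beta, round(residual_noise, 2), round(speech_kept, 2))
```

With β = 1 the speech bin keeps its true power of 4 but 0.51 of the noise power remains in the noise-only bin; with β = 3 the noise-only bin is cleared but the speech power drops to 2. No single fixed β suits both bins, because the accuracy of the noise estimate differs between them.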

The present invention has been made to solve such problems, and its main object is to provide a noise suppression device and a noise suppression method capable of maintaining high sound quality while effectively suppressing noise.

One aspect of the present invention for achieving the above object is a noise suppression device comprising: a Fourier transform unit that performs a Fourier transform on a mixed observation signal including a speech signal and a noise signal, and on a noise estimation signal that is an estimated noise signal; a mask function calculation unit that calculates a mask function serving as a subtraction coefficient, based on the mixed observation signal and the noise estimation signal Fourier-transformed by the Fourier transform unit; and a subtraction processing unit that calculates a speech estimation signal, estimated to be the speech signal, by subtracting the noise estimation signal from the mixed observation signal using the mask function calculated by the mask function calculation unit. According to this aspect, high sound quality can be maintained while noise is effectively suppressed.

In this aspect, the mask function calculation unit may calculate the mask function so that it includes two certainty factors, each expressed as a ratio between the Fourier-transformed noise estimation signal and the Fourier-transformed mixed observation signal.

Further, in this aspect, the mask function calculation unit may calculate the certainty factors P_f(t) and P_b(f) using equations (1) and (2), respectively.

In this aspect, the mask function calculation unit may also calculate the mask function H(f, t) using equation (3).

This aspect may further include an inverse Fourier transform unit that performs an inverse Fourier transform on the speech estimation signal calculated by the subtraction processing unit.

Another aspect of the present invention for achieving the above object may be a noise suppression method comprising: a Fourier transform step of performing a Fourier transform on a mixed observation signal including a speech signal and a noise signal, and on a noise estimation signal estimated to be the noise signal; a mask function calculation step of calculating a mask function serving as a subtraction coefficient, based on the mixed observation signal and the noise estimation signal Fourier-transformed in the Fourier transform step; and a subtraction processing step of calculating a speech estimation signal, estimated to be the speech signal, by subtracting the noise estimation signal from the mixed observation signal using the mask function calculated in the mask function calculation step.

According to the present invention, high sound quality can be maintained while noise is effectively suppressed.

The best mode for carrying out the present invention is described below by way of an embodiment, with reference to the accompanying drawings. FIG. 1 is a block diagram showing the system configuration of a noise suppression device according to an embodiment of the present invention.

The noise suppression device 10 according to this embodiment includes a mixed observation signal input unit 1, a first discrete Fourier transform unit 2, a phase calculation unit 3, a first spectrum calculation unit 4, a noise estimation signal input unit 5, a second discrete Fourier transform unit 6, a second spectrum calculation unit 7, a mask function calculation unit 8, a subtraction processing unit 9, an inverse discrete Fourier transform unit 11, and a speech estimation signal output unit 12.

The main hardware of the noise suppression device 10 is a microcomputer having a CPU (Central Processing Unit) that performs control processing, arithmetic processing, and the like; a ROM (Read Only Memory) that stores the control programs, arithmetic programs, and the like executed by the CPU; and a RAM (Random Access Memory) that temporarily stores processing data and the like. The first discrete Fourier transform unit 2, phase calculation unit 3, first spectrum calculation unit 4, second discrete Fourier transform unit 6, second spectrum calculation unit 7, mask function calculation unit 8, subtraction processing unit 9, and inverse discrete Fourier transform unit 11 may be implemented, for example, by programs stored in the ROM and executed by the CPU.

A mixed observation signal x(t) containing a speech signal and a noise signal is input to the input terminal of the mixed observation signal input unit 1. Here, the speech signal is the target signal, such as the user's voice, and the noise signal is so-called noise, such as surrounding voices and environmental noise. The mixed observation signal x(t) may also be subjected to linear filtering to enhance the speech. The mixed observation signal input unit 1 outputs the input mixed observation signal x(t) to the first discrete Fourier transform unit 2.

The first discrete Fourier transform unit 2 applies a well-known Fourier transform to the input mixed observation signal x(t) to obtain the Fourier-transformed mixed observation signal X(f, t), which it outputs to the phase calculation unit 3, the first spectrum calculation unit 4, and the mask function calculation unit 8.

Based on the Fourier-transformed mixed observation signal X(f, t) from the first discrete Fourier transform unit 2, the phase calculation unit 3 calculates the phase angle(X(f, t)) using the following equation (6).
angle(X(f, t)) = arctan(B / A)    (6)

Here, X(f, t) is written as X(f, t) = A + Bi, where i is the imaginary unit and A and B are real numbers. The phase calculation unit 3 outputs the calculated phase angle(X(f, t)) to the subtraction processing unit 9.
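For X(f, t) = A + Bi, the quotient of B and A alone does not determine the phase: it loses the quadrant when A < 0 and is undefined when A = 0. Implementations therefore normally use the four-quadrant arctangent atan2(B, A), which Python exposes as math.atan2 and, for complex numbers, as cmath.phase. The comparison below is illustrative and not taken from the source:

```python
import cmath
import math


def phase_naive(z):
    """arctan(B / A) for z = A + Bi: correct only when A > 0."""
    return math.atan(z.imag / z.real)


def phase_robust(z):
    """Four-quadrant phase, equivalent to math.atan2(z.imag, z.real)."""
    return cmath.phase(z)


z = -1 - 1j  # third quadrant
print(phase_naive(z))   # 0.785... (pi/4): wrong quadrant
print(phase_robust(z))  # -2.356... (-3*pi/4)
```

Only the robust version lets the subtraction stage rebuild a bin exactly as magnitude times exp(i·phase).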

The first spectrum calculation unit 4 calculates the power spectrum |X(f, t)|² of the Fourier-transformed mixed observation signal X(f, t) from the first discrete Fourier transform unit 2 and outputs it to the mask function calculation unit 8.

A noise estimation signal n1(t), which is an estimated noise signal, is input to the input terminal of the noise estimation signal input unit 5. A well-known noise estimation algorithm can be used to estimate the noise signal. The noise estimation signal input unit 5 outputs the input noise estimation signal n1(t) to the second discrete Fourier transform unit 6.

The second discrete Fourier transform unit 6 applies a well-known Fourier transform to the input noise estimation signal n1(t) to obtain the Fourier-transformed noise estimation signal N1(f, t), which it outputs to the mask function calculation unit 8 and the second spectrum calculation unit 7.

The second spectrum calculation unit 7 calculates the power spectrum |N1(f, t)|² of the Fourier-transformed noise estimation signal N1(f, t) from the second discrete Fourier transform unit 6 and outputs it to the mask function calculation unit 8.

Based on the power spectrum |X(f, t)|² of the mixed observation signal X(f, t) from the first spectrum calculation unit 4 and the power spectrum |N1(f, t)|² of the noise estimation signal N1(f, t) from the second spectrum calculation unit 7, the mask function calculation unit 8 calculates a soft mask function (mask function) H(f, t) corresponding to the subtraction coefficient β.

Here, the subtraction coefficient β determines the suppression performance when noise is suppressed. As equation (7) of the subtraction processing unit 9, described later, shows, increasing β improves suppression performance, whereas decreasing β lowers it.

First, the mask function calculation unit 8 calculates two certainty factors P_f(t) and P_b(f), each expressed as a ratio between the power spectrum |N1(f, t)|² of the noise estimation signal N1(f, t) and the power spectrum |X(f, t)|² of the mixed observation signal X(f, t), using the following equations (1) and (2), respectively.

Next, based on the calculated certainty factors P_f(t) and P_b(f), the mask function calculation unit 8 calculates the soft mask function H(f, t) using the following equation (3).

In equation (3), I is set to, for example, 1. The minimum subtraction coefficient δ_o and the maximum subtraction coefficient δ_m are set to optimum values, determined experimentally, that give the best suppression performance and sound quality (described later).

In this way, using the two certainty factors P_f(t) and P_b(f), each expressed as a ratio between the power spectrum |N1(f, t)|² of the noise estimation signal N1(f, t) and the power spectrum |X(f, t)|² of the mixed observation signal X(f, t), allows a soft mask function H(f, t) that satisfies both suppression performance and sound quality to be set optimally and automatically. The mask function calculation unit 8 outputs the calculated soft mask function H(f, t) to the subtraction processing unit 9.
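Equations (1) to (3) are not reproduced in this text, so their exact form cannot be given here. Purely as a hypothetical illustration of the idea described above (two ratio-based certainty factors, one indexed by frame t and one by frequency f, combined into a mask bounded by the minimum δ_o and maximum δ_m subtraction coefficients), the sketch assumes that P_f(t) is the noise-to-mixture power ratio summed over frequency for frame t, that P_b(f) is the same ratio summed over time for bin f, and that H(f, t) interpolates linearly between δ_o and δ_m according to their product. None of these formulas is taken from the patent, and the function names are hypothetical.

```python
def certainty_factors(X_pow, N_pow, eps=1e-12):
    """HYPOTHETICAL stand-ins for P_f(t) and P_b(f); not the patent's (1), (2).

    X_pow, N_pow: power spectrograms |X|^2 and |N1|^2, indexed [t][f].
    Returns per-frame and per-bin noise-to-mixture power ratios, capped at 1.
    """
    T, F = len(X_pow), len(X_pow[0])
    p_frame = [min(sum(N_pow[t]) / (sum(X_pow[t]) + eps), 1.0) for t in range(T)]
    p_bin = [min(sum(N_pow[t][f] for t in range(T)) /
                 (sum(X_pow[t][f] for t in range(T)) + eps), 1.0) for f in range(F)]
    return p_frame, p_bin


def soft_mask(p_frame, p_bin, delta_o, delta_m):
    """HYPOTHETICAL H, indexed [t][f]: maps P_f(t) * P_b(f) in [0, 1] onto
    [delta_o, delta_m], so noisier frames and bins get a larger coefficient."""
    return [[delta_o + (delta_m - delta_o) * pt * pb for pb in p_bin]
            for pt in p_frame]


# Two frames x two bins: bin 0 holds only speech, bin 1 holds only noise.
X_pow = [[4.0, 1.0], [4.0, 2.0]]
N_pow = [[0.0, 1.0], [0.0, 2.0]]
pf, pb = certainty_factors(X_pow, N_pow)
H = soft_mask(pf, pb, delta_o=1.0, delta_m=3.0)
```

In this toy input the speech-only bin keeps the minimum coefficient δ_o, while the noise-only bin receives a larger coefficient in the frame where noise makes up a larger share of the power.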

Using the soft mask function H(f, t) calculated by the mask function calculation unit 8, the subtraction processing unit 9 subtracts the noise estimation signal N1(f, t) from the mixed observation signal X(f, t) to calculate the speech estimation signal s1(f, t), which is estimated to be the speech signal.

More specifically, the subtraction processing unit 9 calculates the speech estimation signal s1(f, t) using the following equation (7), based on the phase angle(X(f, t)) from the phase calculation unit 3, the Fourier-transformed mixed observation signal X(f, t) and noise estimation signal N1(f, t), and the soft mask function H(f, t) calculated by the mask function calculation unit 8.

In equation (7), γ is set to an optimum value that optimizes suppression performance and sound quality, as described later.
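Equation (7) is not reproduced in this text either. As an assumption only, one reading consistent with the inputs listed above is conventional power spectral subtraction in which the fixed β is replaced by the time-frequency mask H(f, t) and the mixture phase angle(X(f, t)) is reused. The role of γ cannot be recovered from the text, so the sketch leaves it out. The function name is hypothetical.

```python
import cmath
import math


def masked_subtraction(X, N1, H):
    """Spectral subtraction with a per-bin coefficient H[t][f] (assumed form).

    X, N1: complex STFTs indexed [t][f]; H: real subtraction coefficients [t][f].
    Magnitude from max(|X|^2 - H*|N1|^2, 0); phase reused from X.
    gamma from equation (7) is deliberately not modelled.
    """
    S1 = []
    for X_t, N_t, H_t in zip(X, N1, H):
        row = []
        for Xf, Nf, h in zip(X_t, N_t, H_t):
            magnitude = math.sqrt(max(abs(Xf) ** 2 - h * abs(Nf) ** 2, 0.0))
            row.append(cmath.rect(magnitude, cmath.phase(Xf)))
        S1.append(row)
    return S1


# One frame, two bins, each with its own coefficient (1.0 and 3.0).
S1 = masked_subtraction([[3 + 4j, 2j]], [[3 + 0j, 1 + 0j]], [[1.0, 3.0]])
```

The only change from fixed-β subtraction is that the coefficient is looked up per time-frequency bin, which is what lets the amount of subtraction follow the local reliability of the noise estimate.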

The subtraction processing unit 9 outputs the calculated speech estimation signal s1(f, t) to the inverse discrete Fourier transform unit 11.

The inverse discrete Fourier transform unit 11 applies an inverse Fourier transform to the input speech estimation signal s1(f, t) to obtain the time-domain speech estimation signal s1(t), which it outputs to the speech estimation signal output unit 12.
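The text only says that an inverse Fourier transform is applied. When the forward transform is a short-time transform over overlapping frames, the frames must also be recombined. A common choice, assumed here together with the frame length of 16 and hop of 8, is a periodic Hann analysis window with 50 % overlap: its shifted copies sum to one, so plain overlap-add of the per-frame inverse DFTs reconstructs the signal wherever two frames overlap.

```python
import cmath
import math

FRAME = 16  # frame length (assumption)
HOP = 8     # 50 % overlap (assumption)
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * k / FRAME) for k in range(FRAME)]  # periodic Hann


def dft(frame):
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    # Real part only: spectra of real frames are conjugate-symmetric.
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]


def stft(x):
    """Windowed DFT of each frame; output indexed [t][f]."""
    starts = range(0, len(x) - FRAME + 1, HOP)
    return [dft([x[s + k] * WINDOW[k] for k in range(FRAME)]) for s in starts]


def istft(frames, length):
    """Inverse DFT of each frame followed by overlap-add (no synthesis window)."""
    y = [0.0] * length
    for t, spectrum in enumerate(frames):
        for k, value in enumerate(idft(spectrum)):
            y[t * HOP + k] += value
    return y


x = [math.sin(0.3 * k) + 0.2 * math.cos(1.7 * k) for k in range(64)]
y = istft(stft(x), len(x))
# Samples covered by two overlapping frames are reconstructed exactly.
interior_error = max(abs(a - b) for a, b in zip(x[HOP:-HOP], y[HOP:-HOP]))
```

The first and last half-frames are covered by only one window and are not restored; real systems usually pad the signal to avoid this.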

The speech estimation signal output unit 12 outputs the speech estimation signal s1(t) received from the inverse discrete Fourier transform unit 11, which is the final output signal, from its output terminal.

In a conventional noise suppression device, if there is no correlation between the noise signal and the speech signal and the noise estimation signal can be assumed to match the noise signal exactly, the subtraction coefficient β can be set to, for example, 1. In this case, the speech estimation signal s1(f, t) can be expressed by the following equation (8).

In practice, however, the noise signal is very difficult to estimate accurately, and estimation errors cause noise to leak into the speech estimation signal (the output signal) or cause parts of the speech estimation signal to be removed. Suppose, for example, that the noise estimation signal is expressed by the following equation (9), where N(f, t) denotes the short-time Fourier transform of the true noise signal n(t).
N1(f, t) = 0.7 × N(f, t)    (9)

In this case, the speech estimation signal s1(f, t) can be expressed by the following equation (10). In practice, s1(f, t) also contains errors from the estimation process in addition to this amplitude modulation.
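Equation (10) is not reproduced here, but the effect of an estimate scaled to 0.7 of the true noise can be worked through under two stated assumptions: power-domain subtraction as in the conventional method, and a negligible speech-noise cross term, so that |X|² ≈ |S|² + |N|². Then |N1|² = 0.49|N|², and a coefficient β leaves (1 − 0.49β)|N|² of noise power. The helper name below is hypothetical.

```python
def residual_noise_fraction(scale, beta):
    """Fraction of true noise power left after subtracting beta * |scale * N|^2.

    Assumes |X|^2 = |S|^2 + |N|^2 (no speech-noise cross term) and that the
    noise estimate is N1 = scale * N, as in equation (9) with scale = 0.7.
    A negative value means speech power is being removed (over-subtraction).
    """
    return 1.0 - beta * scale ** 2


print(residual_noise_fraction(0.7, 1.0))       # about 0.51: half the noise power remains
print(residual_noise_fraction(0.7, 1 / 0.49))  # about 0.0: exact compensation
print(residual_noise_fraction(0.7, 3.0))       # about -0.47: over-subtraction
```

The compensating coefficient 1/scale² follows the accuracy of the estimate, so if that accuracy drifts from frame to frame, the best fixed β drifts with it.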

Depending on the subtraction coefficient β set during system operation, the suppression performance (the ability to suppress noise) and the sound quality of the speech estimation signal s1(f, t) vary, for example, as shown in Table 1 below.

As Table 1 shows, the optimum value of the subtraction coefficient β depends on the accuracy of the noise estimation signal. Moreover, because the speech signal is non-stationary and the accuracy of the noise estimation signal changes from moment to moment, the optimum value of β changes along with it.

The noise suppression device 10 according to this embodiment therefore uses the soft mask function H(f, t) so that, as described below, the subtraction coefficient β can be set optimally and automatically to improve suppression performance while maintaining high sound quality.

FIG. 2 is a flowchart showing an example of the processing flow of the noise suppression method performed by the noise suppression device according to this embodiment.

As shown in FIG. 2, the first discrete Fourier transform unit 2 applies a Fourier transform to the input mixed observation signal x(t) and calculates the Fourier-transformed mixed observation signal X(f, t) (Fourier transform step) (step S100).

The second discrete Fourier transform unit 6 applies a well-known Fourier transform to the input noise estimation signal n1(t) and calculates the Fourier-transformed noise estimation signal N1(f, t) (Fourier transform step) (step S101).

Next, the phase calculation unit 3 calculates the phase angle(X(f, t)) from the Fourier-transformed mixed observation signal X(f, t) supplied by the first discrete Fourier transform unit 2 (step S102).

The first spectrum calculation unit 4 calculates the power spectrum |X(f, t)|² of the Fourier-transformed mixed observation signal X(f, t) from the first discrete Fourier transform unit 2 (step S103).

Further, the second spectrum calculation unit 7 calculates the power spectrum |N1(f, t)|² of the Fourier-transformed noise estimation signal N1(f, t) from the second discrete Fourier transform unit 6 (step S104).

The mask function calculation unit 8 then calculates the certainty factors P_f(t) and P_b(f) (step S105) and, based on them, calculates the soft mask function H(f, t) (mask function calculation step) (step S106).

The subtraction processing unit 9 calculates the speech estimation signal s1(f, t) using the soft mask function H(f, t) calculated by the mask function calculation unit 8 (subtraction processing step) (step S107). The inverse discrete Fourier transform unit 11 then applies an inverse Fourier transform to the speech estimation signal s1(f, t) (step S108), and the speech estimation signal output unit 12 outputs the resulting speech estimation signal s1(t) from its output terminal (step S109).
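The flow of FIG. 2 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: a naive DFT stands in for the discrete Fourier transform units, the frame length of 16 and the 50 % overlap periodic Hann window are arbitrary choices, the subtraction uses an assumed power-subtraction form (without γ), and the mask step is a pluggable function whose default is a constant β, because equations (1) to (3) are not reproduced here. All function names are hypothetical.

```python
import cmath
import math

FRAME, HOP = 16, 8  # assumed frame length and hop (50 % overlap)
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * k / FRAME) for k in range(FRAME)]  # periodic Hann


def dft(v):
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def idft(V):
    n = len(V)
    return [sum(V[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)).real / n
            for k in range(n)]


def stft(x):
    """Windowed DFT of each frame, indexed [t][f]."""
    return [dft([x[s + k] * WINDOW[k] for k in range(FRAME)])
            for s in range(0, len(x) - FRAME + 1, HOP)]


def fixed_beta_mask(X_pow, N_pow, beta=1.0):
    """Placeholder for steps S105-S106: a constant coefficient, i.e. conventional
    spectral subtraction. A function implementing equations (1)-(3) would replace it."""
    return [[beta] * len(row) for row in X_pow]


def suppress_noise(x, n1, mask_fn=fixed_beta_mask):
    """Steps S100-S109 of FIG. 2 under the assumptions stated above."""
    X = stft(x)                                                # S100
    N1 = stft(n1)                                              # S101
    phase = [[cmath.phase(v) for v in row] for row in X]       # S102
    X_pow = [[abs(v) ** 2 for v in row] for row in X]          # S103
    N_pow = [[abs(v) ** 2 for v in row] for row in N1]         # S104
    H = mask_fn(X_pow, N_pow)                                  # S105-S106
    S1 = []                                                    # S107 (assumed form)
    for xp_row, np_row, h_row, ph_row in zip(X_pow, N_pow, H, phase):
        S1.append([cmath.rect(math.sqrt(max(xp - h * npow, 0.0)), ph)
                   for xp, npow, h, ph in zip(xp_row, np_row, h_row, ph_row)])
    s1 = [0.0] * len(x)                                        # S108: inverse DFT
    for t, spectrum in enumerate(S1):                          #       + overlap-add
        for k, value in enumerate(idft(spectrum)):
            s1[t * HOP + k] += value
    return s1                                                  # S109: output signal


x = [math.sin(0.3 * k) for k in range(64)]
clean = suppress_noise(x, [0.0] * 64)  # zero noise estimate: x is reproduced away from the edges
silent = suppress_noise(x, x)          # estimate equals the mixture: everything is subtracted
```

Any replacement mask_fn should give the same value at bins f and FRAME − f so that the modified spectra stay conjugate-symmetric and the inverse DFT stays real; a mask computed elementwise from the power spectra satisfies this automatically.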

FIGS. 3(a) and 3(b) show an example of the results of a comparison test between the noise suppression device 10 according to this embodiment and conventional noise suppression devices. FIG. 3(a) shows the suppression performance of each device as the relationship between the phase angle (horizontal axis) and the SNR (signal-to-noise ratio) value (vertical axis). FIG. 3(b) shows the sound quality of each device as the relationship between the phase angle (horizontal axis) and the CD value (vertical axis).

In FIGS. 3(a) and 3(b), solid line (1) is the actually observed speech data without noise suppression. Solid line (2) is speech data after noise suppression by a conventional noise suppression device using the well-known beamformer method. Solid lines (3) and (4) are speech data after noise suppression by conventional noise suppression devices with the subtraction coefficient set to β = 2 and β = 5, respectively. Solid line (5) is speech data after noise suppression by the noise suppression device 10 according to this embodiment.

As shown in FIG. 3(a), the data obtained with the noise suppression device 10 according to this embodiment (solid line (5)) has a higher SNR value over the entire phase range than the data obtained with the conventional devices (solid lines (2) to (4)), indicating good noise suppression. Further, as shown in FIG. 3(b), the data for solid line (5) keeps a comparatively high CD value over the entire phase range, so high sound quality is maintained. In other words, compared with the conventional noise suppression devices, the noise suppression device 10 according to this embodiment can maintain high sound quality while achieving high suppression performance.

As described above, in the noise suppression device 10 according to this embodiment, the mask function calculation unit 8 calculates the certainty factors P_f(t) and P_b(f), each expressed as a ratio between the power spectrum |N1(f, t)|² of the noise estimation signal N1(f, t) and the power spectrum |X(f, t)|² of the mixed observation signal X(f, t), and then calculates the soft mask function H(f, t) from them. The subtraction processing unit 9 then calculates the speech estimation signal s1(f, t) using the calculated soft mask function H(f, t).

As a result, the soft mask function H(f, t), which plays the role of the subtraction coefficient β, can be set optimally and automatically so that high sound quality is maintained while high suppression performance is achieved. That is, high sound quality can be maintained while noise is effectively suppressed.

The noise suppression device 10 according to this embodiment may be applied, for example, to a speech recognition system, and is applicable to any system that removes a noise signal from a mixed observation signal containing a speech signal and a noise signal.

Although the best mode for carrying out the present invention has been described using one embodiment, the present invention is not limited to this embodiment, and various modifications and substitutions can be made to it without departing from the gist of the present invention.

FIG. 1 is a block diagram showing the system configuration of a noise suppression device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing flow of a noise suppression method performed by the noise suppression device according to the embodiment of the present invention.
FIG. 3(a) shows the suppression performance of each noise suppression device as the relationship between phase and SNR value; FIG. 3(b) shows the sound quality of each noise suppression device as the relationship between phase angle and CD value.

Explanation of symbols

1 Mixed observation signal input unit
2 First discrete Fourier transform unit
3 Phase calculation unit
4 First spectrum calculation unit
5 Noise estimation signal input unit
6 Second discrete Fourier transform unit
7 Second spectrum calculation unit
8 Mask function calculation unit
9 Subtraction processing unit
10 Noise suppression device
11 Inverse discrete Fourier transform unit
12 Speech estimation signal output unit

Claims

A noise suppression device comprising:
a Fourier transform unit that performs a Fourier transform on a mixed observation signal including a speech signal and a noise signal, and on a noise estimation signal that is an estimated noise signal;
a mask function calculation unit that calculates a mask function serving as a subtraction coefficient, based on the mixed observation signal and the noise estimation signal Fourier-transformed by the Fourier transform unit; and
a subtraction processing unit that calculates a speech estimation signal, estimated to be the speech signal, by subtracting the noise estimation signal from the mixed observation signal using the mask function calculated by the mask function calculation unit.

The noise suppression device according to claim 1, wherein
the mask function calculation unit calculates the mask function so that it includes two certainty factors, each expressed as a ratio between the Fourier-transformed noise estimation signal and the Fourier-transformed mixed observation signal.

The noise suppression device according to claim 2, wherein
the mask function calculation unit calculates the certainty factors P_f(t) and P_b(f) using the following equations (1) and (2), respectively.

where N1(f, t) is the Fourier-transformed noise estimation signal and X(f, t) is the Fourier-transformed mixed observation signal.

The noise suppression device according to claim 3, wherein
the mask function calculation unit calculates the mask function H(f, t) using the following equation (3).

where δ_o is the minimum subtraction coefficient and δ_m is the maximum subtraction coefficient.

The noise suppression device according to claim 2, further comprising
an inverse Fourier transform unit that performs an inverse Fourier transform on the speech estimation signal calculated by the subtraction processing unit.

A noise suppression method comprising:
a Fourier transform step of performing a Fourier transform on a mixed observation signal including a speech signal and a noise signal, and on a noise estimation signal estimated to be the noise signal;
a mask function calculation step of calculating a mask function serving as a subtraction coefficient, based on the mixed observation signal and the noise estimation signal Fourier-transformed in the Fourier transform step; and
a subtraction processing step of calculating a speech estimation signal, estimated to be the speech signal, by subtracting the noise estimation signal from the mixed observation signal using the mask function calculated in the mask function calculation step.