JP6517124B2

JP6517124B2 - Noise suppression device, noise suppression method, and program

Info

Publication number: JP6517124B2
Application number: JP2015209729A
Authority: JP
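Inventors: Tomoko Kawase (川瀬智子); Kenta Niwa (丹羽健太); Masakiyo Fujimoto (藤本雅清); Noriyoshi Kamado (鎌土記良); Tomohiro Nakatani (中谷智広); Shoko Araki (荒木章子); Kazunori Kobayashi (小林和則)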
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-10-26
Filing date: 2015-10-26
Publication date: 2019-05-22
Anticipated expiration: 2035-10-26
Also published as: JP2017083566A

Description

The present invention relates to a technique for suppressing noise in acoustic signals picked up by a multichannel microphone array in order to enhance a speaker's voice.

Among the techniques that use a multichannel microphone array for noise suppression, a directional microphone array based on a delay-and-sum array is widely used as a beamforming technique whose direction is easy to control (see, for example, Non-Patent Document 1). In the method described in Non-Patent Document 1, the inter-microphone delays with which sound arriving from a specific direction reaches the microphones are calculated. The signal received by each microphone with a smaller delay is then delayed to match the signal received by the microphone with a larger delay. As a result, the signals from the specific direction are brought into phase across all received signals, and the output gain is maximized when their sum is taken as the output.
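The delay-and-sum principle described above can be sketched in Python with NumPy. This is a minimal illustration and not the method of Non-Patent Document 1. It assumes a linear array, a far-field plane wave, a speed of sound of 343 m/s, and delays rounded to whole samples. The function name `delay_and_sum` and its parameters are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def delay_and_sum(x, mic_positions, theta, fs):
    """Delay-and-sum beamformer steered toward angle theta (hypothetical helper).

    x: (M, N) array, one row of samples per microphone.
    mic_positions: (M,) microphone coordinates in metres along the array axis.
    theta: steering angle in radians from broadside (positive toward +axis).
    fs: sampling frequency in Hz.
    """
    # Relative arrival time of a far-field plane wave from theta at each microphone:
    # microphones further along the +axis are reached earlier when theta > 0.
    tau = -mic_positions * np.sin(theta) / SPEED_OF_SOUND
    # Delay the early microphones so every channel lines up with the latest one.
    delays = np.round((tau.max() - tau) * fs).astype(int)
    num_mics, num_samples = x.shape
    y = np.zeros(num_samples)
    for m in range(num_mics):
        d = delays[m]
        y[d:] += x[m, : num_samples - d]
    # Average so that an in-phase target keeps unit gain.
    return y / num_mics
```

For an impulse arriving from the steering direction, the shifted channels coincide, so the output contains a single unit-amplitude peak. Sound from other directions adds with mismatched phases and is attenuated.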

One form of multichannel noise suppression with a directional microphone array applies a filter after the beamforming stage (see, for example, Patent Document 1). In the method described in Patent Document 1, signal amounts are estimated from the signals picked up for each direction, and a Wiener filter is designed from the estimated signal amounts.

Patent Document 1: Japanese Patent No. 4724054

Non-Patent Document 1: Toshiro Oga, Yutaka Kaneda, Yoshio Yamazaki, "Acoustic Systems and Digital Processing" (音響システムとディジタル処理), The Institute of Electronics, Information and Communication Engineers, 1995.

However, when many sound sources are densely present, the signals from the individual sources interfere with one another. This makes it difficult to separate the sources on the basis of spatial information alone. As a result, conventional beamforming performs worse at noise suppression in such situations.

In view of this problem, an object of the present invention is to provide a noise suppression technique that models signals arriving from a specific direction with high accuracy and thereby improves noise suppression performance.

To solve the above problem, the noise suppression device of the present invention includes the following units:

- A model parameter storage unit that stores the parameters of a signal component model. In this model, the output probabilities of at least one of a clean speech signal and a noise signal, and of a silence signal, are each represented by a Gaussian mixture containing a plurality of normal distributions.
- A specific direction selection unit that selects a specific-direction frequency-domain signal. A plurality of angle-region signals are picked up, each emphasizing the sound arriving from one of several angle regions in different directions, and each is divided into a plurality of band components to obtain frequency-domain signals. From these, the unit selects the signal belonging to the angle-region signal of the angle region in the desired direction.
- A specific direction signal modeling unit that uses the parameters of the signal component model to generate a specific direction signal model, expressed as a Gaussian mixture containing a plurality of normal distributions. It then calculates the posterior probabilities of this model when the specific-direction frequency-domain signal is observed.
- A gain coefficient calculation unit that weights per-frequency-band gain coefficients, calculated from the parameters of the signal component model, by the posterior probabilities of the specific direction signal model and sums them.
- A multiplication unit that multiplies the signal amount in each frequency band of the specific-direction frequency-domain signal by the corresponding gain coefficient.

With the noise suppression technique of the present invention, a signal arriving from a specific direction can be modeled with high accuracy by combining the signal amount of each angle region with a signal amount derived from a statistical model. This model allows the Wiener filter to be designed with high accuracy, which improves noise suppression performance.

FIG. 1 is a diagram illustrating the functional configuration of the noise suppression device of the first embodiment.
FIG. 2 is a diagram illustrating the processing procedure of the noise suppression method of the first embodiment.
FIG. 3 is a diagram illustrating the functional configuration of the model parameter storage unit.
FIG. 4 is a diagram for explaining the directivity of a beamformer unit.
FIG. 5 is a diagram for explaining an example division of the steering direction regions set in the beamformer units.
FIG. 6 is a diagram illustrating the functional configuration of a beamformer unit.
FIG. 7 is a diagram illustrating the functional configuration of the specific direction selection unit.
FIG. 8 is a diagram illustrating the functional configuration of the signal amount estimation unit.
FIG. 9 is a diagram for explaining an example of the directivity of the beamformer units.
FIG. 10 is a diagram illustrating the functional configuration of the specific direction signal modeling unit of the first embodiment.
FIG. 11 is a diagram illustrating the functional configuration of the gain coefficient calculation unit.
FIG. 12 is a diagram illustrating the functional configuration of the noise suppression device of the second embodiment.
FIG. 13 is a diagram illustrating the functional configuration of the specific direction signal modeling unit of the second embodiment.
FIG. 14 is a diagram illustrating the functional configuration of the noise suppression device of the third embodiment.
FIG. 15 is a diagram illustrating the functional configuration of the specific direction signal modeling unit of the third embodiment.

Embodiments of the present invention are described in detail below. In the drawings, components with the same function are given the same reference numerals, and redundant descriptions are omitted.

[First Embodiment]
In the prior art, when many sound sources are densely present, the sounds from the individual sources interfere with one another. The power spectral density (PSD) estimated from the picked-up signals therefore has a large estimation error. In the first embodiment, a clean speech model for each state is trained in advance and used to correct the estimation error of the speech PSD contained in the picked-up signals. A Wiener filter designed with the corrected speech PSD suppresses noise more accurately than conventional methods.

In each embodiment of the present invention, speech signals (clean speech signals and silence signals) and noise signals are defined as follows.

Even a recording made in a soundproof room or similar noise-free environment contains extremely small, white noise. A signal observed in such an environment is defined as a "silence signal". A silence signal can be regarded as a kind of noise, but it arises from electrical factors such as the circuitry of the recording equipment or the transmission path. By contrast, car running noise and wind noise arise from acoustic factors: they are observed as sound waves propagating through the air. Noise due to electrical factors is therefore distinguished from noise due to acoustic factors, and only the latter is defined as a "noise signal".

When someone speaks in an environment where a silence signal is observed, the speech is observed superimposed on the silence signal. This superimposed signal is defined as a "clean speech signal". In an environment without any noise signal, clean speech signals are therefore observed between stretches of silence signal. Silence signals and clean speech signals are collectively defined as "speech signals".

As shown in FIG. 1, the noise suppression device of the first embodiment includes the following components:

- a model parameter storage unit 10;
- a microphone array 11 consisting of M (≥ 2) microphones 11-1 to 11-M;
- Q (≥ 2) beamformer units 12-1 to 12-Q;
- Q frequency domain conversion units 13-1 to 13-Q;
- a specific direction selection unit 14;
- a signal amount estimation unit 15;
- a vector element extraction unit 16;
- a specific direction signal modeling unit 17;
- a gain coefficient calculation unit 18;
- a multiplication unit 19;
- an inverse frequency domain conversion unit 20.

The noise suppression method of the first embodiment is carried out by this device performing each of the steps described below.

The noise suppression device is, for example, a special-purpose device formed by loading a special program into a known or dedicated computer with a central processing unit (CPU), a main storage device (RAM: Random Access Memory), and similar components. The device executes each process under the control of the CPU, for example. Data input to the device and data produced by each process are stored, for example, in the main storage device and read out as needed by other processes. At least some of the processing units may be implemented in hardware such as an integrated circuit. Each storage unit can be implemented by, for example:

- a main storage device such as RAM;
- an auxiliary storage device such as a hard disk, an optical disk, or a semiconductor memory element such as flash memory;
- middleware such as a relational database or a key-value store.

The processing procedure of the noise suppression method of the first embodiment is described below with reference to FIG. 2.

(Model parameter storage unit)
As shown in FIG. 3, the model parameter storage unit 10 includes a silence GMM storage unit 10A and a clean speech GMM storage unit 10B. These store the parameters of probabilistic models of the silence signal and the clean speech signal, respectively, prepared in advance.

In this embodiment, the probabilistic model is a Gaussian mixture model (GMM) containing a plurality of normal distributions. Adding more normal distributions to a GMM raises estimation accuracy but lowers processing speed. In practice, the number of normal distributions is therefore preferably between 2 and 512, and most preferably about 32.

Each normal distribution is parameterized by a mixture weight w^S_{j,k}, a mean μ^S_{j,k}(ω), and a variance σ^S_{j,k}²(ω). Here, j is the GMM type (j = 0: silence GMM, j = 1: clean speech GMM), k is the index of the normal distribution, and ω is the frequency bin index. The GMM feature is, for example, the signal power in each frequency bin. Many well-known techniques exist for constructing GMMs, so a detailed description is omitted (see, for example, Reference 1 below).
[Reference 1] Seiichi Nakagawa, "Speech Recognition Based on Probabilistic Models" (確率モデルによる音声認識), The Institute of Electronics, Information and Communication Engineers

That is, the model parameter storage unit 10 stores the model parameters of a probabilistic model, hereinafter also called the speech model parameters. In this model, the output probabilities of the clean speech signal and of the silence signal are each represented by a GMM containing a plurality of normal distributions. The speech model parameter set Φ^S is given by equation (1) and collects the mixture weights, means, and variances of these distributions.
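One possible way to hold Φ^S in code is sketched below. Equation (1) is not reproduced in this text, so the following details are assumptions for illustration: the array layout, the class name `SpeechModel`, and the diagonal-covariance Gaussian over per-bin power features.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpeechModel:
    """Hypothetical container for Phi^S (j = 0: silence GMM, j = 1: clean speech GMM)."""

    w: np.ndarray    # mixture weights w^S_{j,k}, shape (J, K); each row sums to 1
    mu: np.ndarray   # means mu^S_{j,k}(omega), shape (J, K, Omega)
    var: np.ndarray  # variances sigma^S_{j,k}^2(omega), shape (J, K, Omega)

    def component_loglik(self, feat):
        """log N(feat; mu_{j,k}, diag(var_{j,k})) for every (j, k).

        feat: (Omega,) feature vector, e.g. per-bin signal power (assumed).
        Returns an array of shape (J, K).
        """
        diff = feat[None, None, :] - self.mu
        return -0.5 * np.sum(np.log(2 * np.pi * self.var) + diff**2 / self.var, axis=-1)
```

Storing the two GMMs in one stacked array lets later steps vectorize over the state index j and the component index k.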

(Microphone array)
In step S11, the microphone array 11 picks up M signals x_m(n) (m = 1, …, M) with the M microphones 11-1 to 11-M, where n is the sample index of the discrete-time signal. The signals x_m(n) are input to each of the beamformer units 12-1 to 12-Q.

(Beamformer units)
In step S12, each beamformer unit 12-1 to 12-Q points a directional beam BM, such as the one shown in FIG. 4, toward one of the Q direction regions Θ₁ to Θ_Q defined in advance in FIG. 5. It emphasizes and picks up the sound emitted in that direction region and outputs the result. The output signals y₁(n), y₂(n), …, y_Q(n) of the beamformer units 12-1 to 12-Q are input to the frequency domain conversion units 13-1 to 13-Q, respectively.

FIG. 6 shows the configuration of one of the beamformer units, for example beamformer unit 12-q; all beamformer units perform the same processing. The input signals x₁(n) to x_M(n) are fed to the filter processing units FC-1 to FC-M, respectively. Each filter processing unit FC-m convolves its input with the predetermined filter coefficients W_qm(n), as in equation (2), and outputs the resulting signal x'_qm(n).

The output signals of the filter processing units FC-1 to FC-M are fed to the adder ADD. The adder sums them as in equation (3) to obtain the beamformer output signal y_q(n) (q = 1, …, Q).
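Equations (2) and (3) describe a filter-and-sum beamformer: each channel x_m(n) is convolved with its FIR coefficients W_qm(n), and the M filtered channels are summed. Below is a minimal NumPy sketch. The function name is hypothetical, the coefficients are assumed to be supplied by the caller, and the convolution output is truncated to the input length.

```python
import numpy as np


def filter_and_sum(x, W):
    """Output y_q(n) of one beamformer unit q (hypothetical helper).

    x: (M, N) microphone signals x_m(n).
    W: (M, L) FIR coefficients W_qm(n) for this beamformer, assumed given.
    Returns y_q(n) with the same length N as the input.
    """
    num_mics, num_samples = x.shape
    # Filter each channel: x'_qm(n) = sum_k W_qm(k) x_m(n - k), truncated to N samples
    filtered = np.stack(
        [np.convolve(x[m], W[m])[:num_samples] for m in range(num_mics)]
    )
    # Adder ADD: y_q(n) = sum_m x'_qm(n)
    return filtered.sum(axis=0)
```

A delay-and-sum beamformer is the special case in which each W_qm(n) is a single shifted impulse.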

The filter coefficients W_qm(n) are designed so that the directivity D_q(ω, θ) of each beamformer unit 12-q emphasizes sound emitted in the q-th direction region Θ_q defined in advance in FIG. 5 and suppresses sound emitted from other directions.

(Frequency domain conversion units)
In step S13, the frequency domain conversion units 13-1 to 13-Q divide the input signals y₁(n), y₂(n), …, y_Q(n) into short frames (for example, about 256 samples at a sampling frequency of 16,000 Hz). They apply the discrete Fourier transform to each frame and output the resulting frequency components (for example, Ω of them) as the signals Y₁(ω, l), Y₂(ω, l), …, Y_Q(ω, l), where ω is the frequency bin index and l is the frame index. These output signals are input to both the specific direction selection unit 14 and the signal amount estimation unit 15.
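The framing and DFT of step S13 can be sketched as a short-time Fourier transform. The 256-sample frame follows the example in the text. The Hann window and the 128-sample hop (50 % overlap) are assumptions, since the text does not specify a window or a frame shift.

```python
import numpy as np


def stft(y, frame_len=256, hop=128):
    """Split y into frames, window them, and take the DFT of each frame.

    Returns Y with shape (Omega, L): Omega = frame_len // 2 + 1 one-sided bins
    of a real signal, and L frames, indexed as Y[omega, l].
    Window type and hop size are assumptions, not taken from the text.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack(
        [y[l * hop : l * hop + frame_len] * window for l in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T
```

At 16 kHz with 256-sample frames, the bin spacing is 62.5 Hz, so a 2 kHz tone peaks in bin 32.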

(Specific direction selection unit)
In step S14, the specific direction selection unit 14 selects, from the outputs of the frequency domain conversion units, the signal of the beamformer unit whose directional beam points at the direction region to be emphasized, and outputs it. Its output signal Y_S(ω, l) is input to the specific direction signal modeling unit 17 and the multiplication unit 19.

FIG. 7 shows the configuration of the specific direction selection unit 14. The unit receives the frequency-domain signals Y₁(ω, l), Y₂(ω, l), …, Y_Q(ω, l) from the beamformer units 12-1 to 12-Q via the frequency domain conversion units 13-1 to 13-Q. It selects the signal corresponding to the q_s-th direction region, which is the region to be emphasized, and outputs it as Y_S(ω, l).

(Signal amount estimation unit)
In step S15, the signal amount estimation unit 15 uses the outputs of the frequency domain conversion units to obtain the power of the summed sound signals emitted by the sound sources in the direction regions Θ₁ to Θ_Q. It collects these powers into a single estimated signal power vector X_est(ω, l), which is input to the vector element extraction unit 16.

FIG. 8 shows the configuration of the signal amount estimation unit 15. The frequency-domain signals Y₁(ω, l), Y₂(ω, l), …, Y_Q(ω, l) are fed to the power calculation units PW-1 to PW-Q, respectively. These compute and output the power values |Y₁(ω, l)|², |Y₂(ω, l)|², …, |Y_Q(ω, l)|², which are fed to the region aggregation unit 15A. The region aggregation unit computes two averages: the average power of the signals emitted from a predetermined set S of regions to be picked up, and the average power of the signals emitted from a set N of regions to be suppressed. It outputs the aggregated power vector Y(ω, l) formed from these two averages.

Here, N_S is the number of regions in set S, and N_N is the number of regions in set N. Every direction region (1 to Q) is assigned in advance to either set S or set N. For example, when Q = 4, the sets may be defined as S = {1, 2} and N = {3, 4}.
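The region aggregation described above can be sketched for a single time-frequency point as follows. The function name and the 1-based index convention are hypothetical. The averaging follows the prose, because the defining equations are not reproduced in this text.

```python
import numpy as np


def aggregate_power(Y, S, N):
    """Aggregated power vector Y(omega, l) for one time-frequency point.

    Y: complex array of shape (Q,) holding Y_1(omega, l), ..., Y_Q(omega, l).
    S, N: 1-based indices of the direction regions to pick up / suppress.
    Returns [mean over q in S of |Y_q|^2, mean over q in N of |Y_q|^2].
    """
    power = np.abs(Y) ** 2
    s_idx = np.asarray(S) - 1  # convert to 0-based indices
    n_idx = np.asarray(N) - 1
    return np.array([power[s_idx].mean(), power[n_idx].mean()])
```

With Q = 4, S = {1, 2} and N = {3, 4}, the powers [1, 9, 4, 0] aggregate to [5, 2].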

The aggregated power vector Y(ω, l) is input to the multiplication unit 15B. The other input of the multiplication unit 15B, the power estimation matrix T⁻¹(ω), comes from the inverse matrix calculation unit 15C. That unit receives the aggregation gain matrix T(ω) defined by equation (6) and outputs its inverse T⁻¹(ω).

As shown in FIG. 9, each element of the aggregation gain matrix T is derived from the average directivity of the beamformer units 12-1 to 12-Q over each direction region. For example, the directivity averaged over the directions within each region can be used, as in equation (7).

ε_pq(ω) is the average directivity of beamformer unit 12-p over the q-th direction region. The directivity can be obtained from the filter coefficients W_qm(n), for example with the well-known technique described in Non-Patent Document 1.

As shown in equation (8), the multiplication unit 15B multiplies the aggregated power vector Y(ω, l) by the power estimation matrix T⁻¹(ω) for each frequency component and outputs the estimated signal power vector X_est(ω, l).

As shown in equation (9), the estimated signal power vector has two components, X_est(ω, l) = [|S_bf(ω, l)|², |N_bf(ω, l)|²]ᵀ. The first component is the estimated power of the sound-collection region signal, and the second is the estimated power of the suppression region signal.
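Equations (8) and (9) amount to solving a 2 × 2 linear system in every frequency bin. Below is a minimal NumPy sketch. The matrices T(ω) are taken as given because equation (6) is not reproduced here, and the function name is hypothetical. `np.linalg.solve` is used instead of forming T⁻¹(ω) explicitly; both give the same result, but solving is numerically preferable.

```python
import numpy as np


def estimate_powers(Y_agg, T):
    """Compute X_est(omega, l) = T(omega)^{-1} Y(omega, l) for every bin of one frame.

    Y_agg: (Omega, 2) aggregated power vectors.
    T: (Omega, 2, 2) aggregation gain matrices T(omega), assumed given.
    Returns (Omega, 2): column 0 is |S_bf|^2, column 1 is |N_bf|^2.
    """
    # Batched solve over the leading frequency axis; b is shaped (Omega, 2, 1).
    return np.linalg.solve(T, Y_agg[..., None])[..., 0]
```

The second column of the result is the estimated suppression region power |N_bf(ω, l)|² that step S16 extracts.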

In this way, the signal amount estimation unit 15 estimates the signal power (signal amount) by aggregating the direction regions into a sound-collection region and a suppression region.

The processing of the signal amount estimation unit is not limited to the method above; any known signal amount estimation technique can be used (see, for example, Patent Document 1 and Reference 2 below).
[Reference 2] Japanese Patent No. 4856662

(Vector element extraction unit)
In step S16, the vector element extraction unit 16 extracts the estimated suppression region signal power |N_bf(ω, l)|² from the estimated signal power vector X_est(ω, l) and outputs it. This output is input to both the specific direction signal modeling unit 17 and the gain coefficient calculation unit 18.

(Specific direction signal modeling unit)
In step S17, the specific direction signal modeling unit 17 generates a specific direction signal model for correcting the signal amount estimated by the signal amount estimation unit 15. It uses two inputs for this: the output |N_bf(ω, l)|² of the vector element extraction unit 16 and the speech model parameters Φ^S stored in the model parameter storage unit 10.

Using this model, the unit then calculates, for the observed specific direction signal Y_S(ω, l), the posterior probability α_j(l) of each state j and the posterior probability β_{j,k}(l) of each distribution k. These posterior probabilities indicate how well each state and each distribution of the model represent the observed specific direction signal. They are input to the gain coefficient calculation unit 18.

FIG. 10 shows the configuration of the specific direction signal modeling unit 17. The state probability storage unit 17C stores the probability a_j that the speech is in state j. The unit's inputs, the estimated suppression region signal power |N_bf(ω, l)|² and the speech model parameters Φ^S, are passed to the model parameter generation unit 17A.

The model parameter generation unit 17A calculates the parameters of the specific direction signal model as follows. The specific direction signal is the sum of clean speech and noise, so the mean μ^YS_{j,k}(ω, l) of the model is calculated by equation (10).

The variance σ^YS_{j,k}²(ω) and the mixture weight w^YS_{j,k} of the specific direction signal model are given by equations (11) and (12), respectively.

The model parameter generation unit 17A then outputs the specific direction signal model parameters Φ^YS defined by equation (13). These are input to the probability calculation unit 17B.
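Equations (10) to (12) are not reproduced in this text. The sketch below shows one plausible reading under a stated assumption. If the GMM features are linear power spectra, then speech and noise powers add, so the noise estimate |N_bf(ω, l)|² is added to each speech mean while the variances and weights are carried over unchanged. The function name is hypothetical, and this is an illustration, not the patent's exact formula.

```python
import numpy as np


def specific_direction_model(w_s, mu_s, var_s, noise_psd):
    """Build Phi^YS for one frame from Phi^S and |N_bf(omega, l)|^2.

    Assumption (not confirmed by the text): features are linear power spectra,
    so means add, while variances and mixture weights are copied from Phi^S.
    w_s: (J, K); mu_s, var_s: (J, K, Omega); noise_psd: (Omega,).
    """
    mu_ys = mu_s + noise_psd[None, None, :]  # speech mean plus noise power per bin
    var_ys = var_s.copy()                    # variances carried over (assumed)
    w_ys = w_s.copy()                        # mixture weights carried over (assumed)
    return w_ys, mu_ys, var_ys
```

Because the noise estimate changes every frame, Φ^YS must be rebuilt frame by frame, while Φ^S stays fixed.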

The probability calculation unit 17B receives the specific direction signal Y_S(ω, l), the model parameters Φ^YS, and the state probabilities a_j. From Φ^YS, it calculates the likelihood p(Y_S | j) of the specific direction signal for each state j and the likelihood p(Y_S | j, k) for each distribution k. It then evaluates equation (14) with these likelihoods and Φ^YS to obtain the posterior probability β_{j,k}(l) of distribution k.

It also evaluates equation (15) with the likelihoods p(Y_S | j), p(Y_S | j, k) and the state probabilities a_j to obtain the posterior probability α_j(l) of state j.
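Equations (14) and (15) are not reproduced in this text. The sketch below therefore assumes the usual GMM responsibilities:

- β_{j,k} = w_{j,k} p(Y_S | j, k) / p(Y_S | j), with p(Y_S | j) = Σ_k w_{j,k} p(Y_S | j, k);
- α_j = a_j p(Y_S | j) / Σ_j' a_j' p(Y_S | j').

The feature is assumed to be |Y_S(ω, l)|², and each Gaussian is assumed to have diagonal covariance. Both the function names and these forms are hypothetical. The computation works in the log domain to avoid underflow.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def posteriors(feat, w, mu, var, a):
    """State posteriors alpha_j(l) and component posteriors beta_{j,k}(l) for one frame.

    feat: (Omega,) observed feature, assumed to be |Y_S(omega, l)|^2.
    w: (J, K) and mu, var: (J, K, Omega) are the parameters of Phi^YS.
    a: (J,) state priors a_j.
    """
    diff = feat[None, None, :] - mu
    log_pjk = -0.5 * np.sum(np.log(2 * np.pi * var) + diff**2 / var, axis=-1)  # log p(Y_S|j,k)
    log_wp = np.log(w) + log_pjk
    log_pj = logsumexp(log_wp, axis=1)                # log p(Y_S|j)
    beta = np.exp(log_wp - log_pj[:, None])           # beta_{j,k}(l); rows sum to 1
    log_apj = np.log(a) + log_pj
    alpha = np.exp(log_apj - logsumexp(log_apj, axis=0))  # alpha_j(l); sums to 1
    return alpha, beta
```

When the observation lies near the silence GMM means, α_0(l) approaches 1, so the silence-state gains dominate the weighted sum computed in step S18.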

Alternatively, α_j(l) may be given by equations (16) and (17).

Here, |S_bf(ω, l)|² is the speech PSD: the element of X_est(ω, l) left after the vector element extraction unit removes |N_bf(ω, l)|². Equations (16) and (17) also use the averages of |S_bf(ω, l)|² and |N_bf(ω, l)|² over frequency.

(Gain coefficient calculation unit)
In step S18, the gain coefficient calculation unit 18 reads the speech model parameters Φ^S from the model parameter storage unit 10. It then calculates the Wiener filter gain coefficient R(ω, l) from the output |N_bf(ω, l)|² of the vector element extraction unit 16 and the posterior probabilities α_j(l) and β_{j,k}(l) received from the specific direction signal modeling unit 17. The gain coefficient R(ω, l) is input to the multiplication unit 19.

FIG. 11 shows the configuration of the gain coefficient calculation unit 18. The suppression region signal estimated power |N_bf(ω,l)|² and the speech model parameters Φ^S input to the gain coefficient calculation unit 18 are passed to the SN ratio estimation unit 18A.

The gain coefficient of the basic Wiener filter is calculated by equation (18).

In this embodiment, to improve the accuracy of the speech PSD used to compute the gain coefficient, the SN ratio estimation unit 18A replaces the speech PSD in the Wiener filter defined by equation (18) with the speech model parameters Φ^S and calculates the gain coefficient R_{j,k}(ω,l) by equation (19).
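Equations (18) and (19) are not reproduced in this text. A minimal sketch, assuming (18) is the usual Wiener gain R = speech PSD / (speech PSD + noise PSD) and that the speech model means μ^S_{j,k}(ω) are log power spectra, so that exp(μ^S_{j,k}(ω)) stands in for the speech PSD of component (j, k) in (19). The function names and the small `eps` guard are hypothetical.

```python
import numpy as np


def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Assumed form of eq. (18): speech PSD / (speech PSD + noise PSD)."""
    return speech_psd / (speech_psd + noise_psd + eps)


def model_speech_gains(speech_log_mean, noise_psd_bf):
    """Assumed form of eq. (19): per-component gains R_{j,k}(omega, l).

    speech_log_mean: (J, K, W) clean-speech GMM means mu^S_{j,k}(omega),
        assumed to be log power spectra.
    noise_psd_bf: (W,) suppression region estimated power |N_bf(omega, l)|^2.
    Returns an array of shape (J, K, W).
    """
    speech_psd = np.exp(speech_log_mean)          # model PSD replaces estimated PSD
    return wiener_gain(speech_psd, noise_psd_bf[None, None, :])
```

Only the speech term is taken from the model; the noise term remains the beamforming-based estimate for the current frame.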

The weighted addition unit 18B receives the output R_{j,k}(ω,l) of the SN ratio estimation unit 18A and the outputs α_j(l) and β_{j,k}(l) of the specific direction signal modeling unit 17, and calculates the gain coefficient R(ω,l) by equation (20), summing the gain coefficients R_{j,k}(ω,l) weighted by the posterior probabilities α_j(l) and β_{j,k}(l).
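Assuming equation (20) weights each R_{j,k}(ω,l) by the product α_j(l)β_{j,k}(l), that is R(ω,l) = Σ_j Σ_k α_j(l) β_{j,k}(l) R_{j,k}(ω,l), the weighted addition reduces to a single tensor contraction. The product form is an assumption of this sketch; the text states only that the posteriors serve as the weights.

```python
import numpy as np


def combine_gains(gains, alpha, beta):
    """Hypothetical form of eq. (20): posterior-weighted sum of component gains.

    gains: (J, K, W) per-component gains R_{j,k}(omega, l).
    alpha: (J,) state posteriors; beta: (J, K) distribution posteriors.
    Returns R(omega, l) with shape (W,).
    """
    weights = alpha[:, None] * beta               # alpha_j * beta_{j,k}
    return np.einsum("jk,jkw->w", weights, gains)
```

Because α_j(l) sums to one over j and β_{j,k}(l) sums to one over k, the weights form a convex combination, so R(ω,l) stays within the range of the component gains.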

(Multiplication unit)
In step S19, the multiplication unit 19 multiplies the input gain coefficient R(ω,l) by the output signal Y_S(ω,l) of the specific direction selection unit 14, component by component at each frequency, and outputs the result. The output signal Y_SR(ω,l) of the multiplication unit 19 is input to the inverse frequency domain conversion unit 20.

(Inverse frequency domain conversion unit)
In step S20, the inverse frequency domain conversion unit 20 applies an inverse discrete Fourier transform to the output signal Y_SR(ω,l) of the multiplication unit 19 and outputs the signal y(n) restored to the time domain. This output signal y(n) is the signal picked up by the noise suppression device with the desired sound emphasized.
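Steps S19 and S20 can be sketched as a bin-wise product followed by a per-frame inverse DFT. The frame length, hop size, one-sided (real-FFT) spectrum layout, and overlap-add resynthesis are assumptions of this sketch; the text specifies only the inverse discrete Fourier transform. In a real system the analysis window and hop would also have to satisfy the overlap-add condition.

```python
import numpy as np


def apply_gain_and_resynthesize(Y_s, R, frame_len, hop):
    """Hypothetical sketch of steps S19-S20.

    Y_s: (L, W) complex one-sided spectra Y_S(omega, l), W = frame_len // 2 + 1.
    R: (L, W) real gains R(omega, l).
    Returns the time-domain signal y(n) reconstructed by overlap-add.
    """
    Y_sr = R * Y_s                                     # step S19: bin-wise product
    frames = np.fft.irfft(Y_sr, n=frame_len, axis=1)   # step S20: inverse DFT per frame
    n_frames = frames.shape[0]
    y = np.zeros((n_frames - 1) * hop + frame_len)
    for l in range(n_frames):
        y[l * hop : l * hop + frame_len] += frames[l]  # overlap-add
    return y
```

With rectangular, non-overlapping frames (hop equal to the frame length) and R(ω,l) = 1, the function returns the original signal, which is a convenient sanity check.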

The first embodiment described a configuration in which the noise suppression device includes a microphone array and Q beamformer units and emphasizes and picks up the sound emitted in each of the Q directional regions. The configuration is not limited to this, however, as long as the sound emitted in each directional region can be emphasized and picked up. For example, the noise suppression device may include Q directional microphones in place of the microphone array and the Q beamformer units, and be configured to emphasize and pick up the sound emitted in each directional region. Even in a configuration using directional microphones, the beamformer units may additionally be used so that the sound emitted in each directional region is emphasized and picked up with higher accuracy. These variations apply equally to the embodiments described below.

[Second Embodiment]
In the first embodiment, the estimation error of the speech PSD contained in the picked-up signal was corrected using a clean speech model. When the sound collection environment is constrained and the characteristics of the noise can be anticipated, a noise signal resembling the expected noise can be prepared and a per-state noise model can be learned in advance. The second embodiment uses this noise model to correct the estimation error of the noise PSD contained in the picked-up signal.

The following describes the configuration of the noise suppression device and the processing procedure of the noise suppression method of the second embodiment, focusing on the differences from the first embodiment.

As shown in FIG. 12, in the noise suppression device of the second embodiment, the model parameter storage unit 10 stores the noise model parameters Φ^N, and the output of the vector element extraction unit 16 is the sound collection region signal estimated power |S_bf(ω,l)|².

(Model parameter storage unit)
The model parameter storage unit of the second embodiment stores the model parameters (hereinafter also called noise model parameters) of a probability model in which the output probabilities of a noise signal and of a silence signal are each represented by a Gaussian mixture containing a plurality of normal distributions. The noise model parameters Φ^N are given by equation (21), where μ^N_{j,k}(ω) is the mean of each normal distribution, σ^N_{j,k}²(ω) its variance, and w^N_{j,k} its mixing weight.

(Vector element extraction unit)
The vector element extraction unit 16 of the second embodiment extracts the sound collection region signal estimated power |S_bf(ω,l)|² from the input estimated signal power vector X_est(ω,l) and outputs it. The output signal |S_bf(ω,l)|² of the vector element extraction unit 16 is input to both the specific direction signal modeling unit 17 and the gain coefficient calculation unit 18.

(Specific direction signal modeling unit)
The specific direction signal modeling unit 17 of the second embodiment generates the specific direction signal model using the output signal |S_bf(ω,l)|² of the vector element extraction unit 16 and the noise model parameters Φ^N stored in the model parameter storage unit 10. Specifically, the model parameter generation unit 17A calculates the parameters Φ^YS of the specific direction signal model by equations (22) to (24).
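Equations (22) to (24) are not reproduced in this text. A hypothetical form, assuming log-power features and the common log-add approximation in which the observed power is the modeled noise power plus the estimated speech power |S_bf(ω,l)|²: only the mean is adapted per frame, while the variances and mixing weights are carried over unchanged. This choice is consistent with the claims' notation, where the adapted mean μ^YS_{j,k}(ω,l) carries a frame index but σ^YS_{j,k}²(ω) and w^YS_{j,k} do not; the actual equations may differ.

```python
import numpy as np


def adapt_noise_model(noise_mean, noise_var, noise_weight, speech_psd_bf):
    """Hypothetical stand-in for equations (22)-(24).

    noise_mean, noise_var: (J, K, W) log-power means and variances of the noise GMM.
    noise_weight: (J, K) mixing weights w^N_{j,k}.
    speech_psd_bf: (W,) estimated power |S_bf(omega, l)|^2 for the current frame l.
    Returns (mean, var, weight) of the specific direction signal model for frame l.
    """
    # Log-add: observed power is approximated as model noise power + estimated speech power.
    mean = np.log(np.exp(noise_mean) + speech_psd_bf)
    # Variances and weights have no frame dependence, so they are copied as-is.
    return mean, noise_var.copy(), noise_weight.copy()
```

When the estimated speech power is zero, the adapted model collapses back to the noise model itself.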

(Gain coefficient calculation unit)
The gain coefficient calculation unit 18 of the second embodiment reads the noise model parameters Φ^N from the model parameter storage unit 10 and calculates the Wiener filter gain coefficient R(ω,l) using the output signal |S_bf(ω,l)|² of the vector element extraction unit 16 and the posterior probabilities α_j(l) and β_{j,k}(l) received from the specific direction signal modeling unit 17. Specifically, the SN ratio estimation unit 18A replaces the noise PSD in the Wiener filter defined by equation (18) with the noise model parameters Φ^N and calculates the gain coefficient R_{j,k}(ω,l) by equation (25).

[Third Embodiment]
The first embodiment corrected the estimation error of the speech PSD using a clean speech model, and the second embodiment corrected the estimation error of the noise PSD using a noise model. In the third embodiment, both a clean speech model and a noise model are learned in advance, and the estimation errors of both the speech PSD and the noise PSD are corrected.

The following describes the configuration of the noise suppression device and the processing procedure of the noise suppression method of the third embodiment, focusing on the differences from the first embodiment.

As shown in FIG. 14, the noise suppression device of the third embodiment does not include the signal amount estimation unit 15 or the vector element extraction unit 16, and the model parameter storage unit 10 stores both the speech model parameters Φ^S and the noise model parameters Φ^N.

(Specific direction signal modeling unit)
The specific direction signal modeling unit 17 of the third embodiment generates the specific direction signal model using the speech model parameters Φ^S and the noise model parameters Φ^N stored in the model parameter storage unit 10. Specifically, the model parameter generation unit 17A calculates the parameters Φ^YS of the specific direction signal model by equations (26) to (28).

Here, j_S and j_N denote the state numbers of the speech model and the noise model, respectively, and k_S and k_N denote the distribution numbers of the speech model and the noise model, respectively.
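Equations (26) to (28) are not reproduced in this text. One plausible reading of the indices j_S, j_N, k_S, k_N is that every speech component (j_S, k_S) is paired with every noise component (j_N, k_N). The sketch below assumes that pairing, multiplies the mixing weights, and combines the log-power means with the log-add approximation. As a swapped-in technique not stated in the text, it propagates the variances with a first-order (vector Taylor series style) linearization of log(e^s + e^n).

```python
import numpy as np


def combine_speech_noise_models(mu_s, var_s, w_s, mu_n, var_n, w_n):
    """Hypothetical stand-in for equations (26)-(28).

    mu_s, var_s: (J_S, K_S, W) speech GMM log-power means and variances; w_s: (J_S, K_S).
    mu_n, var_n: (J_N, K_N, W) noise GMM log-power means and variances; w_n: (J_N, K_N).
    Returns mean, var indexed [j_S, j_N, k_S, k_N, W] and weight indexed [j_S, j_N, k_S, k_N].
    """
    ms = mu_s[:, None, :, None, :]                # (J_S, 1, K_S, 1, W)
    mn = mu_n[None, :, None, :, :]                # (1, J_N, 1, K_N, W)
    total = np.exp(ms) + np.exp(mn)
    mean = np.log(total)                          # log-add approximation of the mean
    g = np.exp(ms) / total                        # d(mean)/d(speech log-power)
    var = (g ** 2 * var_s[:, None, :, None, :]
           + (1.0 - g) ** 2 * var_n[None, :, None, :, :])  # first-order variance
    weight = w_s[:, None, :, None] * w_n[None, :, None, :]
    return mean, var, weight
```

Because the pairing multiplies the number of components, the combined model has K_S × K_N distributions for each (j_S, j_N) state pair, which grows quickly with larger mixtures.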

(Gain coefficient calculation unit)
The gain coefficient calculation unit 18 of the third embodiment reads the speech model parameters Φ^S and the noise model parameters Φ^N from the model parameter storage unit 10 and calculates the gain coefficient R(ω,l) using the posterior probabilities α_j(l) and β_{j,k}(l) received from the specific direction signal modeling unit 17. Specifically, the SN ratio estimation unit 18A replaces, in the Wiener filter defined by equation (18), the speech PSD with the speech model parameters Φ^S and the noise PSD with the noise model parameters Φ^N, and calculates the gain coefficient R_{j,k}(ω,l) by equation (29).
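The gains of equations (25) and (29) differ only in which PSD is taken from a model. Assuming (18) is the usual Wiener gain, speech PSD / (speech PSD + noise PSD), and that the model means are log power spectra, both can be sketched as follows. The function names and the pairwise index layout for (29) are hypothetical.

```python
import numpy as np


def noise_model_gains(noise_mean, speech_psd_bf):
    """Sketch of eq. (25): speech PSD from |S_bf(omega, l)|^2, noise PSD from exp(mu^N_{j,k}).

    noise_mean: (J, K, W) noise GMM log-power means; speech_psd_bf: (W,).
    Returns R_{j,k}(omega, l) with shape (J, K, W).
    """
    noise_psd = np.exp(noise_mean)
    return speech_psd_bf / (speech_psd_bf + noise_psd)


def paired_model_gains(speech_mean, noise_mean):
    """Sketch of eq. (29): both PSDs taken from model means.

    speech_mean: (J_S, K_S, W); noise_mean: (J_N, K_N, W).
    Returns gains indexed [j_S, j_N, k_S, k_N, W].
    """
    s = np.exp(speech_mean)[:, None, :, None, :]
    n = np.exp(noise_mean)[None, :, None, :, :]
    return s / (s + n)
```

In the third-embodiment variant the gains no longer depend on the current frame at all; frame dependence enters only through the posterior weights used in the weighted addition.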

The key points of the noise suppression technique of this invention are as follows. When speech is picked up in noise, the speech PSD and the noise PSD can be estimated accurately from the spatial information obtained by beamforming. In addition, when data resembling the input speech or noise can be prepared in advance, building a statistical model allows the speech PSD or the noise PSD to be corrected to a more accurate value, so that observed sound consisting of speech mixed with noise can be modeled accurately. A method that designs the Wiener filter by feeding the observed sound directly into an observed-sound model makes no use of spatial information and therefore offers limited room for improvement; here, the specific direction signal obtained by beamforming is fed to the model instead. Combining spatial information with sound source information in this way makes it possible to design a high-performance Wiener filter that achieves a large amount of noise suppression without distorting the speech.

The present invention is not limited to the embodiments described above, and modifications may of course be made as appropriate without departing from the spirit of the invention. The various processes described in the embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, according to the processing capacity of the device executing them or as required.

[Program, recording medium]
When the various processing functions of each device described in the above embodiments are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer realizes the various processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer temporarily in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized solely through execution instructions and result acquisition. The program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that define the computer's processing).

Although in this embodiment the device is configured by executing a predetermined program on a computer, at least part of the processing may instead be implemented in hardware.

10 Model parameter storage unit
11 Microphone array
11-1 to 11-M Microphones
12-1 to 12-Q Beamformer units
13-1 to 13-Q Frequency domain conversion units
14 Specific direction selection unit
15 Signal amount estimation unit
16 Vector element extraction unit
17 Specific direction signal modeling unit
18 Gain coefficient calculation unit
19 Multiplication unit
20 Inverse frequency domain conversion unit

Claims

A noise suppression device comprising:
a model parameter storage unit that stores parameters of a signal component model in which the output probabilities of a silence signal and of at least one of a clean speech signal and a noise signal are each represented by a Gaussian mixture containing a plurality of normal distributions;
a specific direction selection unit that selects a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a specific direction signal modeling unit that generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and at least one of a suppression region signal estimated power, calculated using the angular region signals of the angular regions corresponding to a predetermined suppression region, and a sound collection region signal estimated power, calculated using the angular region signals of the angular regions corresponding to a predetermined sound collection region, and that calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation unit that adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication unit that multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band;
wherein, with j denoting the state number, k the distribution number, ω the frequency bin number, l the frame number, and a_j the probability that the speech is in state j,
the parameters of the signal component model consist of a mean μ_{j,k}(ω), a variance σ_{j,k}²(ω), and a mixing weight w_{j,k}, and
the specific direction signal modeling unit includes:
a model parameter generation unit that uses the mean μ_{j,k}(ω), the variance σ_{j,k}²(ω), and the mixing weight w_{j,k} of the signal component model to obtain the mean μ^YS_{j,k}(ω,l), the variance σ^YS_{j,k}²(ω), and the mixing weight w^YS_{j,k} of the specific direction signal model; and
a probability calculation unit that uses the mean μ^YS_{j,k}(ω,l), the variance σ^YS_{j,k}²(ω), and the mixing weight w^YS_{j,k} of the specific direction signal model to obtain the likelihood p(Y_S|j) of the specific direction frequency domain signal for state j and the likelihood p(Y_S|j,k) of the specific direction frequency domain signal for distribution k, and calculates the posterior probability α_j(l) for state j and the posterior probability β_{j,k}(l) for distribution k.

A noise suppression device comprising:
a model parameter storage unit that stores parameters of a signal component model in which the output probabilities of a clean speech signal and of a silence signal are each represented by a Gaussian mixture containing a plurality of normal distributions;
a specific direction selection unit that selects a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a signal amount estimation unit that, taking the sum of the signals contained in each angular region as the signal amount of that angular region, estimates the signal amount of each angular region by multiplying the signal amounts of the frequency domain signals by the inverse of a gain matrix whose elements are parameters obtained from the directivity characteristic of each angular region;
a specific direction signal modeling unit that generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and a suppression region signal estimated power obtained by aggregating the signal amounts of the angular regions corresponding to a predetermined suppression region, and that calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation unit that adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication unit that multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band.

A noise suppression device comprising:
a model parameter storage unit that stores parameters of a signal component model in which the output probabilities of a noise signal and of a silence signal are each represented by a Gaussian mixture containing a plurality of normal distributions;
a specific direction selection unit that selects a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a signal amount estimation unit that, taking the sum of the signals contained in each angular region as the signal amount of that angular region, estimates the signal amount of each angular region by multiplying the signal amounts of the frequency domain signals by the inverse of a gain matrix whose elements are parameters obtained from the directivity characteristic of each angular region;
a specific direction signal modeling unit that generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and a sound collection region signal estimated power obtained by aggregating the signal amounts of the angular regions corresponding to a predetermined sound collection region, and that calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation unit that adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication unit that multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band.

A noise suppression method in which a model parameter storage unit stores parameters of a signal component model in which the output probabilities of a silence signal and of at least one of a clean speech signal and a noise signal are each represented by a Gaussian mixture containing a plurality of normal distributions, the method comprising:
a specific direction selection step of selecting a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a specific direction signal modeling step in which a specific direction signal modeling unit generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and at least one of a suppression region signal estimated power, calculated using the angular region signals of the angular regions corresponding to a predetermined suppression region, and a sound collection region signal estimated power, calculated using the angular region signals of the angular regions corresponding to a predetermined sound collection region, and calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation step in which a gain coefficient calculation unit adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication step in which a multiplication unit multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band;
wherein, with j denoting the state number, k the distribution number, ω the frequency bin number, l the frame number, and a_j the probability that the speech is in state j,
the parameters of the signal component model consist of a mean μ_{j,k}(ω), a variance σ_{j,k}²(ω), and a mixing weight w_{j,k}, and
the specific direction signal modeling step includes:
a model parameter generation step of using the mean μ_{j,k}(ω), the variance σ_{j,k}²(ω), and the mixing weight w_{j,k} of the signal component model to obtain the mean μ^YS_{j,k}(ω,l), the variance σ^YS_{j,k}²(ω), and the mixing weight w^YS_{j,k} of the specific direction signal model; and
a probability calculation step of using the mean μ^YS_{j,k}(ω,l), the variance σ^YS_{j,k}²(ω), and the mixing weight w^YS_{j,k} of the specific direction signal model to obtain the likelihood p(Y_S|j) of the specific direction frequency domain signal for state j and the likelihood p(Y_S|j,k) of the specific direction frequency domain signal for distribution k, and calculating the posterior probability α_j(l) for state j and the posterior probability β_{j,k}(l) for distribution k.

A noise suppression method in which a model parameter storage unit stores parameters of a signal component model in which the output probabilities of a clean speech signal and of a silence signal are each represented by a Gaussian mixture containing a plurality of normal distributions, the method comprising:
a specific direction selection step of selecting a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a signal amount estimation step in which a signal amount estimation unit, taking the sum of the signals contained in each angular region as the signal amount of that angular region, estimates the signal amount of each angular region by multiplying the signal amounts of the frequency domain signals by the inverse of a gain matrix whose elements are parameters obtained from the directivity characteristic of each angular region;
a specific direction signal modeling step in which a specific direction signal modeling unit generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and a suppression region signal estimated power obtained by aggregating the signal amounts of the angular regions corresponding to a predetermined suppression region, and calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation step in which a gain coefficient calculation unit adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication step in which a multiplication unit multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band.

A noise suppression method in which a model parameter storage unit stores parameters of a signal component model in which the output probabilities of a noise signal and of a silence signal are each represented by a Gaussian mixture containing a plurality of normal distributions, the method comprising:
a specific direction selection step of selecting a specific direction frequency domain signal belonging to the angular region signal of the angular region in a desired direction, from among a plurality of frequency domain signals obtained by dividing each of a plurality of angular region signals into a plurality of band components, the angular region signals having been picked up by emphasizing the sound arriving from each of a plurality of angular regions in different directions;
a signal amount estimation step in which a signal amount estimation unit, taking the sum of the signals contained in each angular region as the signal amount of that angular region, estimates the signal amount of each angular region by multiplying the signal amounts of the frequency domain signals by the inverse of a gain matrix whose elements are parameters obtained from the directivity characteristic of each angular region;
a specific direction signal modeling step in which a specific direction signal modeling unit generates a specific direction signal model represented by a Gaussian mixture containing a plurality of normal distributions, using the parameters of the signal component model and a sound collection region signal estimated power obtained by aggregating the signal amounts of the angular regions corresponding to a predetermined sound collection region, and calculates the posterior probability of the specific direction signal model when the specific direction frequency domain signal is observed;
a gain coefficient calculation step in which a gain coefficient calculation unit adds gain coefficients for each frequency band, calculated using the parameters of the signal component model, weighted according to the posterior probability of the specific direction signal model; and
a multiplication step in which a multiplication unit multiplies the signal amount of each frequency band of the specific direction frequency domain signal by the gain coefficient of the corresponding frequency band.

A program for causing a computer to function as each unit of the noise suppression device according to any one of claims 1 to 3.