JP2015026956A

JP2015026956A - Voice signal processing device and program

Info

Publication number: JP2015026956A
Application number: JP2013154825A
Authority: JP
Inventors: 克之高橋; Katsuyuki Takahashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2013-07-25
Filing date: 2013-07-25
Publication date: 2015-02-05
Anticipated expiration: 2033-07-25
Also published as: JP6221463B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice signal processing device capable of performing flooring processing that balances noise suppression performance and voice quality. SOLUTION: A voice signal processing device that suppresses a noise component contained in an input voice signal by coherence filter processing includes: means for calculating coherence filter coefficients; means for acquiring an index value that represents the arrival direction of interfering voice contained in the input voice signal; means for determining, according to the acquired index value, a flooring threshold that takes a larger value the closer the arrival direction of the interfering voice is to the arrival direction of the target voice; and means for performing flooring by applying the determined flooring threshold to the calculated coherence filter coefficient of each frequency.

Description

The present invention relates to an audio signal processing device and program, and can be applied, for example, to communication equipment and communication software that handle audio signals in telephones, video conferencing devices, and the like. (In this specification, sound signals such as speech signals and acoustic signals are collectively called "audio signals".)

One method of suppressing the noise component contained in an acquired audio signal is the coherence filter method. As described in Patent Document 1, the coherence filter method suppresses noise components whose arrival direction is strongly biased, by multiplying each frequency component by the cross-correlation of two signals that have blind spots to the left and to the right.

Patent Document 1: JP 2013-061421 A

Noise suppression processing in general, not only coherence filter processing, can distort the sound and make it sound unnatural. One cause of distortion is excessive suppression resulting from misestimation of the noise characteristics. To prevent excessive suppression, a flooring process is sometimes used: the noise suppression coefficient, which is controlled according to the noise conditions and similar factors, is compared with a predetermined threshold, and if the coefficient is smaller than the threshold, the threshold is used as the noise suppression coefficient instead. If the threshold used for flooring (hereinafter, the flooring threshold) is too large, suppression performance is insufficient; if it is too small, sound distortion increases.

There is therefore a demand for an audio signal processing device and program capable of performing flooring processing that balances noise suppression performance and sound quality.

A first aspect of the present invention is an audio signal processing device that suppresses a noise component contained in an input audio signal by coherence filter processing, the device comprising: (1) coherence filter coefficient calculation means for calculating coherence filter coefficients; (2) interfering voice arrival direction index value acquisition means for obtaining an index value representing the arrival direction of interfering voice contained in the input audio signal; (3) flooring threshold determination means for determining, according to the acquired index value, a flooring threshold that takes a larger value the closer the arrival direction of the interfering voice is to the arrival direction of the target voice; and (4) flooring processing means for performing flooring by applying the determined flooring threshold to the calculated coherence filter coefficient of each frequency.

An audio signal processing program according to a second aspect of the present invention causes a computer mounted in an audio signal processing device that suppresses a noise component contained in an input audio signal by coherence filter processing to function as: (1) coherence filter coefficient calculation means for calculating coherence filter coefficients; (2) interfering voice arrival direction index value acquisition means for obtaining an index value representing the arrival direction of interfering voice contained in the input audio signal; (3) flooring threshold determination means for determining, according to the acquired index value, a flooring threshold that takes a larger value the closer the arrival direction of the interfering voice is to the arrival direction of the target voice; and (4) flooring processing means for performing flooring by applying the determined flooring threshold to the calculated coherence filter coefficient of each frequency.

With the audio signal processing device and program of the present invention, the applied flooring threshold is switched according to the arrival direction of the interfering voice, so flooring processing that balances noise suppression performance and sound quality can be performed.

FIG. 1 is a block diagram showing the overall configuration of the audio signal processing device of the first embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the coherence filter processing unit in the first embodiment.
FIG. 3 is an explanatory diagram showing the properties of the directional signals from the directivity forming unit in the first embodiment.
FIG. 4 is an explanatory diagram showing the characteristics of the two directivities formed by the directivity forming unit in the first embodiment.
FIG. 5 is an explanatory diagram showing the behavior of the coherence for each arrival direction.
FIG. 6 is an explanatory diagram showing the correspondence (conversion table) between the coherence and the flooring thresholds held by the flooring threshold determination unit in the first embodiment.
FIG. 7 is an explanatory diagram showing an example of the conversion function applied when the flooring threshold determination unit in the first embodiment determines the flooring thresholds by evaluating a conversion function.
FIG. 8 is a flowchart showing the flooring process executed by the flooring processing unit in the first embodiment.
FIG. 9 is an explanatory diagram showing the directivity of the noise signal generated in the second embodiment.

(A) First Embodiment
A first embodiment of the audio signal processing device and program according to the present invention is described in detail below with reference to the drawings.

The audio signal processing device and program according to the first embodiment use the coherence filter method.

In general, the value of the coherence filter coefficient varies greatly with the arrival direction of noise, in particular of human speech other than the speaker's (interfering voice). Therefore, when flooring is performed, the desired flooring effect cannot be obtained unless the flooring threshold is controlled according to the arrival direction.

The present inventor recognized that the coherence filter coefficient behaves as follows depending on the arrival direction of the interfering voice.

(a) The value increases as the arrival direction of the interfering voice approaches the front (the direction of the target voice source), and decreases as it moves to the side.
(b) When the arrival direction of the interfering voice is close to the front, the coherence filter coefficient values in the high frequency range increase particularly strongly.
Summarizing these behaviors: the closer the arrival direction of the interfering voice is to the front, the lower the suppression performance of the coherence filter, and in particular the weaker the suppression of high-frequency interfering voice components.

Based on the behavior recognized above, the first embodiment controls the flooring threshold according to the arrival direction of the interfering voice.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the configuration of the audio signal processing device according to the first embodiment. The portions other than the pair of microphones m1 and m2 may be implemented in hardware, or by software (an audio signal processing program) executed by a CPU together with that CPU; whichever implementation is adopted, the device can be represented functionally as in FIG. 1.

In FIG. 1, the audio signal processing device 10 according to the first embodiment includes a pair of microphones m1 and m2, an FFT (fast Fourier transform) unit 11, a coherence filter processing unit 12, and an IFFT (inverse fast Fourier transform) unit 13.

The microphones m1 and m2 are placed a predetermined distance (or an arbitrary distance) apart, and each captures the surrounding sound. Each microphone is omnidirectional (or has only a very gentle directivity toward the front). The audio signals (input signals) captured by microphones m1 and m2 are converted into digital signals s1(n) and s2(n) by corresponding A/D converters (not shown) and supplied to the FFT unit 11. Here n is an index representing the input order of samples and is a positive integer; a smaller n denotes an older input sample and a larger n a newer one.

The FFT unit 11 receives the input signal sequences s1(n) and s2(n) from the microphones m1 and m2 and applies a fast Fourier transform (or a discrete Fourier transform) to them, so that the input signals s1 and s2 are represented in the frequency domain. For the transform, analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples, are constructed from s1(n) and s2(n). An example of constructing analysis frame FRAME1(K) from input signal s1(n) is given by equation (1); FRAME2(K) is constructed in the same way.

K is an index representing the order of frames and is a positive integer; a smaller K denotes an older analysis frame and a larger K a newer one. In the following description, unless otherwise noted, K denotes the index of the latest analysis frame to be analyzed.
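Equation (1) itself is not reproduced in this text, so the framing step can only be sketched. The function below is a minimal illustration that assumes consecutive, non-overlapping frames of N samples and drops any trailing partial frame; the function name and these framing choices are assumptions, not the patent's definition.

```python
from typing import List


def make_frames(s: List[float], n_per_frame: int) -> List[List[float]]:
    """Split s(n) into consecutive analysis frames of N samples each.

    Assumptions (not stated in the text): frames do not overlap and a
    trailing partial frame is discarded. frames[K - 1] plays the role of
    FRAME(K) for K = 1, 2, ..., so older frames come first.
    """
    n_frames = len(s) // n_per_frame
    return [s[k * n_per_frame:(k + 1) * n_per_frame] for k in range(n_frames)]
```

For example, `make_frames(list(range(10)), 4)` yields two frames, `[0, 1, 2, 3]` and `[4, 5, 6, 7]`, consistent with "larger K is newer".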

The FFT unit 11 applies a fast Fourier transform to each analysis frame to obtain the frequency domain signals X1(f,K) and X2(f,K), which it supplies to the coherence filter processing unit 12. Here f is an index representing frequency. X1(f,K) is not a single value; as shown in equation (2), it consists of spectral components at multiple frequencies f1 to fm. Each component of X1(f,K) is a complex number with a real part and an imaginary part. The same applies to X2(f,K) and to B1(f,K) and B2(f,K), described later.

X1(f,K) = {X1(f1,K), X1(f2,K), …, X1(fm,K)}  …(2)

In the coherence filter processing unit 12 described later, of the frequency domain signals X1(f,K) and X2(f,K), X1(f,K) is treated as the main signal and X2(f,K) as the sub signal. Alternatively, X2(f,K) may be treated as the main signal and X1(f,K) as the sub signal (see equation (8), described later).
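To make the per-frame spectrum of equation (2) concrete, the sketch below computes a naive discrete Fourier transform of one real-valued frame and keeps the non-negative frequency bins, each a complex number. It is an illustration only (O(N^2), stdlib-only); a real implementation would use an FFT library. The function name is hypothetical, and bin k is assumed to correspond to the frequency k·fs/N.

```python
import cmath
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive O(N^2) DFT of a real frame, returning bins 0..N/2.

    Each returned bin is one complex spectral component X(f_k, K) with
    real and imaginary parts; bin k corresponds to frequency k * fs / N.
    """
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]
```

A unit impulse `[1.0, 0.0, 0.0, 0.0]` gives a flat spectrum of three bins equal to 1, and a constant frame puts all its energy in bin 0.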

The coherence filter processing unit 12, whose detailed configuration is shown in FIG. 2 (described later), performs coherence filter processing to obtain a signal Y(f,K) in which the noise components are suppressed, and supplies it to the IFFT unit 13.

The IFFT unit 13 applies an inverse fast Fourier transform to the noise-suppressed signal Y(f,K) to obtain the output signal y(n), a time domain signal.

FIG. 2 is a block diagram showing the detailed configuration of the coherence filter processing unit 12.

In FIG. 2, the coherence filter processing unit 12 includes an input signal receiving unit 21, a directivity forming unit 22, a filter coefficient calculation unit 23, an arrival direction estimation unit 24, a flooring threshold determination unit 25, a flooring processing unit 26, a filter processing unit 27, and a filtered-signal transmission unit 28.

The input signal receiving unit 21 receives the frequency domain signals X1(f,K) and X2(f,K) output from the FFT unit 11.

The directivity forming unit 22 forms two directional signals (first and second directional signals) B1(f,K) and B2(f,K), each with strong directivity in a specific direction. Any existing method can be used to form them; for example, they can be computed according to equations (3) and (4).

The meaning of the formulas for the first and second directional signals B1(f,K) and B2(f,K) is explained below, taking equation (3) as an example and referring to FIGS. 3 and 4. Suppose that a sound wave arrives from direction θ shown in FIG. 3(A) and is captured by the pair of microphones m1 and m2 placed a distance l apart. The sound wave then reaches the two microphones at different times. Since the path difference d of the sound is d = l × sin θ, the arrival time difference τ is given by equation (5), where c is the speed of sound.

τ = l × sin θ / c  …(5)

For the sound arriving from direction θ, the signal s1(t − τ), obtained by delaying the input signal s1(t) by τ, is identical to the input signal s2(t). Therefore the difference signal y(t) = s2(t) − s1(t − τ) is a signal from which the sound arriving from direction θ has been removed. As a result, the pair of microphones (microphone array) m1 and m2 acquires the directivity characteristics shown in FIG. 3(B).
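The delay-and-subtract idea behind equation (5) can be checked numerically. The sketch below computes τ from equation (5) and forms y(t) = s2(t) − s1(t − τ) with the delay rounded to a whole number of samples; the speed of sound of 340 m/s, the whole-sample delay, and the function names are illustrative assumptions. A real beamformer would use fractional delays, which is one reason the text moves to the frequency domain.

```python
import math
from typing import List


def arrival_time_difference(l: float, theta_deg: float, c: float = 340.0) -> float:
    """Equation (5): tau = l * sin(theta) / c, in seconds (c = 340 m/s assumed)."""
    return l * math.sin(math.radians(theta_deg)) / c


def delay_and_subtract(s1: List[float], s2: List[float], delay: int) -> List[float]:
    """y(t) = s2(t) - s1(t - delay), with the delay in whole samples.

    Samples of s1 before its start are treated as zero.
    """
    return [s2[t] - (s1[t - delay] if t >= delay else 0.0) for t in range(len(s2))]
```

If m2 receives exactly a 3-sample-delayed copy of m1 (a source from θ), the output is all zeros; a source with a different delay is not cancelled. For instance, with l = 0.034 m and θ = 90 degrees, τ = 0.1 ms, i.e. 0.8 samples at 8000 Hz, which illustrates why whole-sample rounding is coarse.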

The above describes the computation in the time domain, but the same holds in the frequency domain; the corresponding formulas are equations (3) and (4) above. As an example, assume that the arrival direction θ is ±90 degrees. Then the first directional signal B1(f) has strong directivity to the right, as shown in FIG. 4(A), and the second directional signal B2(f) has strong directivity to the left, as shown in FIG. 4(B). The following description assumes θ = ±90 degrees, but θ is not limited to ±90 degrees.
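Equations (3) and (4) are not reproduced in this text. As a sketch only: a time delay τ multiplies a spectrum by exp(−j2πfτ), so the frequency-domain counterpart of y(t) = s2(t) − s1(t − τ) is X2(f) − X1(f)·exp(−j2πfτ). The function below assumes B1 takes this form and B2 is its mirror image with the microphones swapped (a null toward −θ). The function name and the bin-to-frequency mapping f = k·fs/N are illustrative assumptions, not the patent's own formulas.

```python
import cmath
from typing import List, Tuple


def directional_signals(x1: List[complex], x2: List[complex], fs: float,
                        n_fft: int, tau: float) -> Tuple[List[complex], List[complex]]:
    """Assumed frequency-domain delay-and-subtract beams.

    B1(f) = X2(f) - X1(f) * exp(-j 2 pi f tau)   # null toward +theta
    B2(f) = X1(f) - X2(f) * exp(-j 2 pi f tau)   # null toward -theta
    Bin k is taken to have frequency f = k * fs / n_fft.
    """
    b1, b2 = [], []
    for k, (a, b) in enumerate(zip(x1, x2)):
        phase = cmath.exp(-2j * cmath.pi * (k * fs / n_fft) * tau)
        b1.append(b - a * phase)
        b2.append(a - b * phase)
    return b1, b2
```

When X2 is exactly X1 delayed by τ (a source from +θ), every bin of B1 cancels to zero while B2 keeps energy, which is the complementary blind-spot pair that the coherence coefficient compares.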

The filter coefficient calculation unit 23 calculates the coherence filter coefficient coef(f,K) from the first and second directional signals B1(f,K) and B2(f,K) according to equation (6).
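Equation (6) is likewise not reproduced here. The sketch below assumes one common normalized cross-correlation form, |B1·B2*| / (½(|B1|² + |B2|²)), which always lies in [0, 1] because |a||b| ≤ (|a|² + |b|²)/2. Under this assumption a frontal sound reaches both beams with equal strength and gives a value near 1, while a lateral sound is nulled in one beam and gives a value near 0, matching the behavior described in the text. The formula, the function name, and the `eps` guard are assumptions.

```python
from typing import List


def coherence_coefficients(b1: List[complex], b2: List[complex],
                           eps: float = 1e-12) -> List[float]:
    """Assumed form of equation (6), evaluated per frequency bin:

    coef(f) = |B1(f) * conj(B2(f))| / (0.5 * (|B1(f)|^2 + |B2(f)|^2))

    eps keeps silent bins (B1 = B2 = 0) from dividing by zero.
    """
    return [abs(a * b.conjugate()) / (0.5 * (abs(a) ** 2 + abs(b) ** 2) + eps)
            for a, b in zip(b1, b2)]
```

Equal-magnitude beams give a coefficient of about 1, a bin where one beam is fully nulled gives 0, and beams of magnitudes 2 and 1 give 0.8.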

The arrival direction estimation unit 24 obtains an index value from which the arrival direction of the noise can be estimated and supplies it to the flooring threshold determination unit 25. Here, the unit 24 calculates the coherence COH(K) as this index value. As shown in equation (7), COH(K) is the arithmetic mean of the coherence filter coefficients coef(f,K) over all frequencies.
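Equation (7) is a plain average, so it can be sketched directly from the description above: one scalar COH(K) per frame, taken over every frequency bin of coef(f,K). The function name is illustrative.

```python
from typing import List


def coherence(coef: List[float]) -> float:
    """Equation (7) as described: COH(K) = arithmetic mean of coef(f, K) over all bins."""
    if not coef:
        raise ValueError("coef must contain at least one frequency bin")
    return sum(coef) / len(coef)
```

Applied frame by frame, for example `[coherence(c) for c in coef_per_frame]`, this gives the COH(K) sequence whose value range FIG. 5 relates to the noise arrival direction.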

FIG. 5 illustrates the behavior of the coherence. As FIG. 5 shows, the range of values taken by the coherence changes with the arrival direction of the noise. This property can be used to estimate the arrival direction of the noise.

The flooring threshold determination unit 25 determines the flooring thresholds based on the coherence COH(K) calculated by the arrival direction estimation unit 24. It determines thresholds that become larger as the arrival direction of the noise approaches the front (that is, as COH(K) becomes larger), and it makes the threshold for a higher band larger than the threshold for a lower band. Any specific configuration may be used as long as it can determine such thresholds; for example, the unit 25 may determine the thresholds using a conversion table, or by evaluating a conversion function.

FIG. 6 shows the correspondence (conversion table) between the coherence COH(K) and the flooring thresholds for bands Θ, Ψ, and Φ, used in the former (table-based) method.

A different flooring threshold is set for each band Θ, Ψ, and Φ. For example, if the sampling frequency of the audio signals s1(n) and s2(n) is 8000 Hz, three thresholds are set: flooring threshold Θ for the coherence filter coefficients from 0 to 2000 Hz, flooring threshold Ψ for those from 2000 to 3000 Hz, and flooring threshold Φ for those from 3000 to 4000 Hz. The thresholds Θ, Ψ, and Φ are set so that they take larger values the closer the arrival direction is to the front (the larger the coherence COH(K)).
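To apply a per-band threshold, each FFT bin has to be assigned to band Θ, Ψ, or Φ. The sketch below uses the band edges from the 8000 Hz example above. The mapping f = k·fs/N, the string labels, and the choice to assign an edge frequency (exactly 2000 Hz or 3000 Hz) to the upper band are assumptions.

```python
def band_of_bin(k: int, fs: float, n_fft: int) -> str:
    """Return the band ('theta', 'psi' or 'phi') of FFT bin k.

    Edges follow the 8000 Hz example in the text (0-2000, 2000-3000,
    3000-4000 Hz). Placing an exact edge frequency in the upper band is
    an assumption, as is the mapping f = k * fs / n_fft.
    """
    f_hz = k * fs / n_fft
    if f_hz < 2000.0:
        return "theta"
    if f_hz < 3000.0:
        return "psi"
    return "phi"
```

With N = 256 at 8000 Hz, bins 0 to 63 fall in Θ, bins 64 to 95 in Ψ, and bins 96 to 128 in Φ.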

The flooring threshold determination unit 25 determines which range of the conversion table, A ≤ COH(K) < B, B ≤ COH(K) < C, C ≤ COH(K) < D, … (where A < B < C < D < …), the given coherence COH(K) belongs to. It then reads out the set of per-band flooring thresholds for Θ, Ψ, and Φ associated with that range and supplies it to the flooring processing unit 26.

In the example of FIG. 6, when the coherence COH(K) is in the range A ≤ COH(K) < B, the threshold set Θ = α, Ψ = δ, Φ = ζ is read out; when B ≤ COH(K) < C, the set Θ = β, Ψ = ε, Φ = ξ is read out; and when C ≤ COH(K) < D, the set Θ = γ, Ψ = η, Φ = ω is read out. As described above, the high-frequency coherence filter coefficients increase particularly strongly as the arrival direction approaches the front. Accordingly, the specific threshold values α, β, γ, δ, ε, η, ζ, ξ, and ω are set so that the per-band thresholds satisfy Θ < Ψ < Φ.
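The table lookup of FIGS. 6 can be sketched with a sorted list of range boundaries and a binary search. The text gives neither the boundaries A to D nor the values α to ω, so every number below is hypothetical. The values were chosen only so that each row satisfies Θ < Ψ < Φ and every band's threshold grows as COH(K) grows.

```python
import bisect
from typing import Dict, List

# Hypothetical boundaries A < B < C < D for COH(K); not taken from the patent.
BOUNDS: List[float] = [0.0, 0.3, 0.6, 1.01]
# Hypothetical threshold sets; the patent's alpha..omega values are not given.
TABLE: List[Dict[str, float]] = [
    {"theta": 0.05, "psi": 0.08, "phi": 0.10},  # A <= COH < B: alpha, delta, zeta
    {"theta": 0.10, "psi": 0.15, "phi": 0.20},  # B <= COH < C: beta, epsilon, xi
    {"theta": 0.20, "psi": 0.30, "phi": 0.40},  # C <= COH < D: gamma, eta, omega
]


def lookup_thresholds(coh: float) -> Dict[str, float]:
    """Return the per-band flooring thresholds for the range containing COH(K).

    bisect_right makes each range closed at its lower bound and open at its
    upper bound, matching A <= COH(K) < B and so on.
    """
    i = bisect.bisect_right(BOUNDS, coh) - 1
    if not 0 <= i < len(TABLE):
        raise ValueError(f"COH(K)={coh} is outside the table range")
    return TABLE[i]
```

A value exactly on a boundary, such as COH(K) = 0.3, selects the upper row, just as B ≤ COH(K) < C does in the text.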

FIG. 7 shows an example of a conversion function used when the flooring threshold determination unit 25 determines the flooring thresholds by evaluating a conversion function. A conversion function is set for each value of the coherence COH(K) (or for each range of values).

Because the flooring threshold must be larger at higher frequencies, the conversion function is set so that its slope is larger in higher frequency bands. When straight lines with a different slope for each band are used as the flooring threshold, the slopes of these lines are made larger the closer the arrival direction of the noise is to the front. The function is not limited to straight lines; any curve, for example curves with a different curvature for each frequency band, may be used as the function that determines the flooring threshold.
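One way to realize the straight-line variant is a continuous piecewise-linear threshold over frequency whose per-band slopes grow with frequency and are scaled by COH(K). The sketch below stands in for the per-COH functions of FIG. 7 with a single family parameterized by COH(K). The base value, band edges, slopes, and the linear scaling by COH(K) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical (start_hz, end_hz, slope per kHz at COH(K) = 1.0); slopes grow with frequency.
BANDS: List[Tuple[float, float, float]] = [
    (0.0, 2000.0, 0.02),
    (2000.0, 3000.0, 0.05),
    (3000.0, 4000.0, 0.10),
]
BASE = 0.02  # hypothetical threshold at 0 Hz


def threshold_from_function(f_hz: float, coh: float) -> float:
    """Piecewise-linear flooring threshold at frequency f_hz.

    Each band contributes coh * slope * (covered width in kHz), so the line
    is steeper in higher bands and every slope grows as COH(K) grows, i.e.
    as the noise direction approaches the front.
    """
    th = BASE
    for start, end, slope in BANDS:
        if f_hz <= start:
            break
        th += coh * slope * (min(f_hz, end) - start) / 1000.0
    return th
```

At COH(K) = 1.0 this gives 0.04 at 1000 Hz and 0.21 at 4000 Hz; lowering COH(K) flattens every segment, so all thresholds drop.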

The following descriptions of the functions and operation of each unit assume that the conversion table of FIG. 6 is used.

The flooring processing unit 26 performs flooring by applying the flooring thresholds determined by the flooring threshold determination unit 25 to the coherence filter coefficients coef(f,K). For example, if frequency fi belongs to the lowest band Θ, the unit 26 floors the coherence filter coefficient coef(fi,K) with the threshold determined for band Θ, namely α, β, or γ (see FIG. 6). Likewise, if fi belongs to the second-lowest band Ψ, it applies the threshold determined for band Ψ, namely δ, ε, or η (see FIG. 6), and if fi belongs to the highest band Φ, it applies the threshold determined for band Φ, namely ζ, ξ, or ω (see FIG. 6).

The filter processing unit 27 uses the floored coherence filter coefficients flcoef(f,K) to perform coherence filter processing on the main frequency domain signal X1(f,K), as shown in equation (8), and obtains the noise-suppressed signal (filtered signal) Y(f,K). Equation (8) denotes a separate multiplication for each frequency.

Y(f,K) = X1(f,K) × flcoef(f,K)  …(8)

The physical meaning of coherence filter processing is as follows. The coherence filter coefficient coef(f,K) (and likewise the floored coefficient flcoef(f,K)) is the cross-correlation of signal components that have blind spots on the left and on the right. It can therefore be associated with the arrival direction of the input sound: a large correlation indicates a sound component arriving from the front with no directional bias, and a small correlation indicates a component whose arrival direction is biased to the right or left. Multiplying by the coherence filter coefficient coef(f,K) is thus a process that suppresses noise components arriving from the side.
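Equation (8) is a bin-by-bin gain, so it can be sketched directly: each complex bin of the main spectrum X1 is scaled by the real floored coefficient for that bin, which changes its magnitude but not its phase. The function name is illustrative.

```python
from typing import List


def apply_coherence_filter(x1: List[complex], flcoef: List[float]) -> List[complex]:
    """Equation (8): Y(f, K) = X1(f, K) * flcoef(f, K), one multiplication per bin."""
    if len(x1) != len(flcoef):
        raise ValueError("spectrum and coefficient lengths differ")
    return [x * g for x, g in zip(x1, flcoef)]
```

Because flooring guarantees flcoef(f,K) ≥ TH(K), no bin is attenuated by more than its band's threshold allows.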

The filtered-signal transmission unit 28 supplies the noise-suppressed signal Y(f,K) to the downstream IFFT unit 13. It also increments K by 1 to start processing of the next frame.

(A-2) Operation of the First Embodiment
Next, the operation of the audio signal processing device 10 of the first embodiment is described with reference to the drawings, first the overall operation and then the detailed operation of the coherence filter processing unit 12.

The signals s1(n) and s2(n) input from the pair of microphones m1 and m2 are converted by the FFT unit 11 from the time domain into the frequency domain signals X1(f,K) and X2(f,K), which are supplied to the coherence filter processing unit 12. The coherence filter processing unit 12 performs coherence filter processing and supplies the resulting noise-suppressed signal Y(f,K) to the IFFT unit 13. The IFFT unit 13 converts the noise-suppressed signal Y(f,K), a frequency domain signal, into a time domain signal y(n) by inverse fast Fourier transform and outputs y(n).
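The final IFFT step can be sketched for one frame. If only bins 0 to N/2 of a real signal's spectrum are kept, the remaining bins follow from Hermitian symmetry, X[N−k] = conj(X[k]), and the inverse transform is real. This naive stdlib-only inverse is an illustration. How frames are stitched back into y(n), for example by overlap-add, is not described in this text and is left out. The function name is hypothetical.

```python
import cmath
from typing import List


def idft_real(half: List[complex], n: int) -> List[float]:
    """Inverse DFT of one real frame from its bins 0..n/2 (n even assumed).

    Missing bins are rebuilt by Hermitian symmetry X[n - k] = conj(X[k]);
    the imaginary parts of the result are rounding noise and are dropped.
    """
    full = list(half) + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

A flat half-spectrum `[1, 1, 1]` with N = 4 reconstructs the unit impulse `[1, 0, 0, 0]`, and `[4, 0, 0]` reconstructs a constant frame of ones.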

Next, the detailed operation of the coherence filter processing unit 12 is described. The processing of one frame is described below; this frame-by-frame processing is repeated for every frame.

When a new frame begins and the FFT unit 11 supplies the frequency domain signals X1(f,K) and X2(f,K) of the new frame (current frame K), the directivity forming unit 22 calculates the first and second directional signals B1(f,K) and B2(f,K) according to equations (3) and (4). Based on these directional signals, the filter coefficient calculation unit 23 then calculates the coherence filter coefficients coef(f,K) according to equation (6).

From the calculated coherence filter coefficients coef(f, K), the arrival direction estimation unit 24 calculates the coherence COH(K) according to equation (7). COH(K) serves as an index value from which the direction of arrival of the noise can be estimated. The flooring threshold determination unit 25 determines which range of the conversion table the calculated COH(K) falls into: at least A but less than B, at least B but less than C, at least C but less than D, and so on. It then reads from the conversion table the set of flooring thresholds associated with that range, one threshold for each of the bands Θ, Ψ and Φ. The flooring processing unit 26 applies the read flooring thresholds to the coherence filter coefficients coef(f, K) to execute the flooring process.
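Claim 2 states that COH(K) is the arithmetic mean of the coherence filter coefficients over all frequencies, which is presumably what equation (7) expresses. The range edges A, B and C and the nine threshold values below are hypothetical placeholders, because the section gives no numbers. The table also assumes two further things: that Θ, Ψ and Φ are the low, middle and high bands, and that COH(K) rises as the interferer moves toward the front.

```python
import bisect
from typing import List, Tuple

# Hypothetical conversion table (illustrative numbers only).
# Lower edges of the COH ranges A, B, C; the last range is open-ended here.
COH_EDGES = [0.0, 0.3, 0.6]
# One (Theta, Psi, Phi) threshold set per range, assumed to be low/mid/high bands.
THRESHOLD_SETS: List[Tuple[float, float, float]] = [
    (0.05, 0.10, 0.15),  # A <= COH < B: interferer assumed far from the front
    (0.10, 0.20, 0.30),  # B <= COH < C
    (0.15, 0.30, 0.45),  # C <= COH:     interferer assumed near the front
]


def mean_coherence(coef: List[float]) -> float:
    """COH(K): arithmetic mean of coef(f, K) over all frequency bins (claim 2)."""
    return sum(coef) / len(coef)


def lookup_thresholds(coh: float) -> Tuple[float, float, float]:
    """Return the (Theta, Psi, Phi) thresholds for the range that COH(K) falls into."""
    idx = bisect.bisect_right(COH_EDGES, coh) - 1
    idx = max(0, min(idx, len(THRESHOLD_SETS) - 1))  # clamp values outside the table
    return THRESHOLD_SETS[idx]
```

A 3 x 3 table like this has nine entries, which matches the nine symbols α to ω used for TH(K) in the flowchart description.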

FIG. 8 is a flowchart showing the flooring process executed by the flooring processing unit 26.

The flooring processing unit 26 determines whether the coherence filter coefficient coef(fi, K) at a given frequency fi is smaller than the flooring threshold TH(K) (= α, β, γ, δ, ε, η, ζ, ξ or ω) read for the band Θ, Ψ or Φ to which fi belongs (step S1). If coef(fi, K) is smaller than TH(K), TH(K) is set as the floored coherence filter coefficient flcoef(fi, K) (step S2). If coef(fi, K) is equal to or greater than TH(K), coef(fi, K) is used unchanged as flcoef(fi, K) (step S3).
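Steps S1 to S3 amount to taking the larger of each coefficient and the threshold of its band. In the sketch below, the band boundaries of 1 kHz and 3 kHz that separate Θ, Ψ and Φ are hypothetical, because the section does not state them.

```python
from typing import List, Sequence


def band_index(freq_hz: float, band_edges_hz: Sequence[float]) -> int:
    """0 = band Theta, 1 = band Psi, 2 = band Phi (edges are hypothetical)."""
    for i, edge in enumerate(band_edges_hz):
        if freq_hz < edge:
            return i
    return len(band_edges_hz)


def floor_coefficients(coef: Sequence[float], freqs_hz: Sequence[float],
                       thresholds: Sequence[float],
                       band_edges_hz: Sequence[float] = (1000.0, 3000.0)) -> List[float]:
    """Flooring per FIG. 8: flcoef(fi, K) = max(coef(fi, K), TH(K) of fi's band)."""
    flcoef = []
    for c, f in zip(coef, freqs_hz):
        th = thresholds[band_index(f, band_edges_hz)]  # TH(K) for the band of fi
        if c < th:               # step S1
            flcoef.append(th)    # step S2: raise the coefficient to the floor
        else:
            flcoef.append(c)     # step S3: keep the coefficient unchanged
    return flcoef
```

The explicit branch mirrors the flowchart. In practice, `max(c, th)` gives the same result.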

Using the floored coherence filter coefficients flcoef(f, K), the filter processing unit 27 applies coherence filter processing to the main frequency domain signal X1(f, K) as shown in equation (8). The filtered signal transmission unit 28 supplies the resulting noise-suppressed signal (filtered signal) Y(f, K) to the IFFT unit 13. The frame variable K is then incremented by 1, and processing moves on to the next frame.
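Equation (8) is not shown in this section. The sketch assumes the usual coherence filter form, in which each bin of the main signal X1(f, K) is multiplied by its floored coefficient. Because flcoef(f, K) never falls below TH(K), the floor also caps the suppression in each bin. This cap is the noise-suppression-versus-sound-quality trade-off that the threshold controls.

```python
import math
from typing import List, Sequence


def apply_coherence_filter(X1: Sequence[complex], flcoef: Sequence[float]) -> List[complex]:
    """Assumed form of eq. (8): Y(f, K) = flcoef(f, K) * X1(f, K)."""
    return [g * x for g, x in zip(flcoef, X1)]


def max_attenuation_db(threshold: float) -> float:
    """A gain that never drops below TH limits suppression to -20*log10(TH) dB."""
    return -20.0 * math.log10(threshold)
```

For example, a floor of 0.1 limits the suppression in any bin to 20 dB. Raising the floor keeps more of the target speech but removes less noise.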

(A-3) Effects of the First Embodiment
In the first embodiment, the flooring threshold is determined according to the direction of arrival of the interfering speech and is then applied to the coherence filter coefficients. The flooring process can therefore balance noise suppression performance against sound quality and achieve its intended effect.

In particular, the high-band flooring threshold is set on the basis of a characteristic behavior: the high-frequency coherence filter coefficients increase as the direction of arrival approaches the front. This further enhances the effect described above.

As a result, improved call quality can be expected in communication devices, such as video conference devices and mobile phones, to which the audio signal processing device or program of the first embodiment is applied.

(B) Second Embodiment
Next, a second embodiment of the audio signal processing device and program according to the present invention will be described.

The first embodiment uses the coherence COH(K) as the index value representing the direction of arrival of the interfering speech. The second embodiment uses the SN ratio SNR(K) as that index value instead.

FIG. 1, used in the description of the first embodiment, also represents the overall configuration of the audio signal processing device of the second embodiment. FIG. 2 likewise represents the detailed configuration of its coherence filter processing unit.

However, as noted above, the arrival direction estimation unit 24 differs from that of the first embodiment: it calculates the SN ratio SNR(K) rather than the coherence COH(K). The flooring threshold determination unit 25 determines the band-specific flooring thresholds from the calculated SNR(K).

The method by which the arrival direction estimation unit 24 of the second embodiment calculates the SN ratio SNR(K) is described below.

The arrival direction estimation unit 24 calculates the noise signal N(f, K) from the frequency domain signals X1(f, K) and X2(f, K) according to equation (9). As shown in FIG. 9, the calculation of equation (9) corresponds to forming a directivity pattern with a blind spot (null) toward the front, so only components arriving from the left and right are obtained. The target direction is assumed to be the front; for example, the target speaker is assumed to be directly in front of the microphones. Components arriving from the sides can therefore be regarded as noise.

N(f, K) = X1(f, K) - X2(f, K)   ...(9)

Next, the arrival direction estimation unit 24 calculates the SN ratio SNR(K) in the current frame K from the main frequency domain signal X1(f, K) and the noise signal N(f, K) according to equation (10). The denominator of equation (10) is the level of the noise signal, and the numerator is the level of the target sound signal. Because the target sound is assumed to arrive from the front and the noise from the sides (left and right), equation (10) gives an estimate of the SN ratio. In equation (10), η is a parameter greater than 0 and less than 1.
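Equation (9) is given above, but equation (10) is not reproduced in this section. In the sketch below, the SNR estimator is therefore a hypothetical stand-in. It treats η as a recursive smoothing constant for both levels, which is one plausible reading of a parameter restricted to 0 < η < 1, not the patent's formula. Under the front-target assumption, an interferer near the front leaks little into the front-nulled beam N(f, K), so SNR(K) tends to be high. An interferer from the side drives SNR(K) down.

```python
from typing import List, Optional, Sequence


def noise_signal(X1: Sequence[complex], X2: Sequence[complex]) -> List[complex]:
    """Eq. (9): N(f, K) = X1(f, K) - X2(f, K), a beam with a null toward the front."""
    return [a - b for a, b in zip(X1, X2)]


class SnrEstimator:
    """Hypothetical stand-in for eq. (10): smoothed target level over smoothed noise level."""

    def __init__(self, eta: float = 0.9, eps: float = 1e-12) -> None:
        if not 0.0 < eta < 1.0:
            raise ValueError("eta must satisfy 0 < eta < 1")
        self.eta = eta                          # assumed role: recursive smoothing constant
        self.eps = eps                          # guards against division by zero
        self.p_target: Optional[float] = None   # smoothed level of X1 (numerator)
        self.p_noise: Optional[float] = None    # smoothed level of N (denominator)

    def update(self, X1: Sequence[complex], X2: Sequence[complex]) -> float:
        """Process frame K and return SNR(K) as a linear power ratio."""
        N = noise_signal(X1, X2)
        p_t = sum(abs(x) ** 2 for x in X1)
        p_n = sum(abs(n) ** 2 for n in N)
        if self.p_target is None or self.p_noise is None:
            self.p_target, self.p_noise = p_t, p_n   # first frame: no history yet
        else:
            self.p_target = self.eta * self.p_target + (1.0 - self.eta) * p_t
            self.p_noise = self.eta * self.p_noise + (1.0 - self.eta) * p_n
        return self.p_target / max(self.p_noise, self.eps)
```

A frontal source (X1 equal to X2) drives the noise beam to zero and the ratio very high. A source that arrives with opposite phase at the two microphones doubles N and gives a ratio of 0.25.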

The SN ratio SNR(K) calculated as above is used as the index value representing the direction of arrival of the interfering speech. A flooring threshold corresponding to that direction is then determined in the same manner as in the first embodiment.

In the second embodiment as well, the flooring threshold is determined according to the direction of arrival of the interfering speech and is applied to the coherence filter coefficients. The second embodiment therefore provides the same effects as the first embodiment.

(C) Other Embodiments
Various modifications have already been mentioned in the description of each embodiment above. Further modifications, such as the following, are also possible.

The first embodiment uses the coherence COH(K), and the second embodiment uses the SN ratio SNR(K), as the index value representing the direction of arrival of the interfering speech. Any other index value that represents this direction may also be used, and several index values may be used together. For example, the flooring threshold TH(K) may be determined according to the combination of the range into which COH(K) falls and the range into which SNR(K) falls.
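The combined variant in the last sentence can be sketched as a two-dimensional lookup. All range edges and table entries are hypothetical, because the section gives no values. Expressing SNR(K) in dB is also an assumption. A real table might hold one threshold per band instead of a single TH(K).

```python
import bisect
from typing import List, Sequence

# Hypothetical range edges (lower bounds) for each index value.
COH_RANGE_EDGES = [0.0, 0.3, 0.6]
SNR_RANGE_EDGES_DB = [-10.0, 0.0, 10.0]
# Hypothetical TH(K) table: rows = COH range, columns = SNR range.
TH_TABLE: List[List[float]] = [
    [0.05, 0.08, 0.10],
    [0.08, 0.12, 0.16],
    [0.10, 0.16, 0.22],
]


def range_index(value: float, edges: Sequence[float]) -> int:
    """Index i of the range [edges[i], edges[i + 1]) containing value, clamped to the table."""
    i = bisect.bisect_right(edges, value) - 1
    return max(0, min(i, len(edges) - 1))


def combined_threshold(coh: float, snr_db: float) -> float:
    """TH(K) chosen from the combination of the COH(K) range and the SNR(K) range."""
    return TH_TABLE[range_index(coh, COH_RANGE_EDGES)][range_index(snr_db, SNR_RANGE_EDGES_DB)]
```

Entries grow along both axes, because each index is assumed to rise as the interferer approaches the front.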

Each of the above embodiments switches the flooring threshold (or the function that calculates it) among three bands. Any number of bands may be used, provided there are at least two.
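Claims 4 to 6 describe the threshold as a monotonically increasing function of frequency, for example a step function or a piecewise linear function whose slope grows toward high frequencies. The sketch below evaluates both forms for any number of bands from two upward. All knot frequencies and threshold values are hypothetical.

```python
import bisect
from typing import Sequence


def step_threshold(freq_hz: float, band_edges_hz: Sequence[float],
                   levels: Sequence[float]) -> float:
    """Step function: band i covers [edge[i-1], edge[i]); requires at least two bands."""
    if len(levels) < 2 or len(levels) != len(band_edges_hz) + 1:
        raise ValueError("need at least two bands and exactly one level per band")
    return levels[bisect.bisect_right(band_edges_hz, freq_hz)]


def polyline_threshold(freq_hz: float, knots_hz: Sequence[float],
                       values: Sequence[float]) -> float:
    """Piecewise-linear threshold through (knot, value) points, held constant outside them."""
    if freq_hz <= knots_hz[0]:
        return values[0]
    if freq_hz >= knots_hz[-1]:
        return values[-1]
    i = bisect.bisect_right(knots_hz, freq_hz) - 1
    t = (freq_hz - knots_hz[i]) / (knots_hz[i + 1] - knots_hz[i])
    return values[i] + t * (values[i + 1] - values[i])
```

Consider the knots (0, 2000, 4000) Hz with values (0.05, 0.1, 0.3). The slope is 0.025 per kHz below 2 kHz and 0.1 per kHz above it, which matches the "larger slope on the higher-frequency side" shape of claim 6.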

In each of the above embodiments, processing performed on frequency domain signals may, where possible, be performed on time domain signals instead. Conversely, processing performed on time domain signals may, where possible, be performed on frequency domain signals.

Each of the above embodiments uses the coherence filter method alone for noise suppression. It may also be combined with other noise suppression techniques (see Patent Document 1), such as the voice switch method, the Wiener filter method, or spectral subtraction.

Each of the above embodiments shows an audio signal processing device or program that processes signals captured by a pair of microphones immediately, but the audio signals processed by the present invention are not limited to these. For example, the present invention can also be applied to a pair of audio signals read from a recording medium, or to a pair of audio signals transmitted from a remote device.

Reference Signs List
10 ... audio signal processing device, 11 ... FFT unit, 12 ... coherence filter processing unit, 13 ... IFFT unit, m1, m2 ... microphones, 21 ... input signal receiving unit, 22 ... directivity forming unit, 23 ... filter coefficient calculation unit, 24 ... arrival direction estimation unit, 25 ... flooring threshold determination unit, 26 ... flooring processing unit, 27 ... filter processing unit, 28 ... filtered signal transmission unit.

Claims

1. An audio signal processing apparatus that suppresses noise components included in an input audio signal by coherence filter processing, the apparatus comprising:
coherence filter coefficient calculating means for calculating a coherence filter coefficient;
interfering speech arrival direction index value acquiring means for acquiring an index value representing the direction of arrival of interfering speech included in the input audio signal;
flooring threshold determining means for determining, according to the acquired index value, a flooring threshold that takes a larger value as the direction of arrival of the interfering speech is closer to the direction of arrival of the target speech; and
flooring processing means for performing flooring by applying the determined flooring threshold to the calculated coherence filter coefficient at each frequency.

2. The audio signal processing apparatus according to claim 1, wherein the interfering speech arrival direction index value acquiring means calculates, as the index value, a coherence that is the arithmetic mean of the coherence filter coefficients over all frequencies.

3. The audio signal processing apparatus according to claim 1, wherein the interfering speech arrival direction index value acquiring means calculates an SN ratio of the input audio signal as the index value.

4. The audio signal processing apparatus according to claim 1, wherein the flooring threshold determining means determines the flooring threshold according to a monotonically increasing function of frequency, such that the flooring threshold for a high-frequency coherence filter coefficient is larger than the flooring threshold for a low-frequency coherence filter coefficient.

5. The audio signal processing apparatus according to claim 4, wherein the monotonically increasing function is a step function.

6. The audio signal processing apparatus according to claim 4, wherein the monotonically increasing function is a piecewise linear function whose slope is larger on the higher-frequency side.

7. The audio signal processing apparatus according to claim 4, wherein the monotonically increasing function is formed by joining a plurality of curved functions.

8. An audio signal processing program that causes a computer mounted in an audio signal processing apparatus, which suppresses noise components included in an input audio signal by coherence filter processing, to function as:
coherence filter coefficient calculating means for calculating a coherence filter coefficient;
interfering speech arrival direction index value acquiring means for acquiring an index value representing the direction of arrival of interfering speech included in the input audio signal;
flooring threshold determining means for determining, according to the acquired index value, a flooring threshold that takes a larger value as the direction of arrival of the interfering speech is closer to the direction of arrival of the target speech; and
flooring processing means for performing flooring by applying the determined flooring threshold to the calculated coherence filter coefficient at each frequency.