JP2015025914A

JP2015025914A - Voice signal processor and program

Info

Publication number: JP2015025914A
Application number: JP2013154826A
Authority: JP
Inventors: 克之高橋; Katsuyuki Takahashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2013-07-25
Filing date: 2013-07-25
Publication date: 2015-02-05

Abstract

PROBLEM TO BE SOLVED: To provide an audio signal processor capable of suppressing the occurrence of musical noise even when noise components are suppressed by the coherence filter method. SOLUTION: The present invention relates to an audio signal processor that suppresses noise components contained in input audio signals by coherence filter processing. The processor includes: means for calculating a coherence filter coefficient; and means for smoothing the calculated coherence filter coefficient in the frequency domain and then applying it to the coherence filter processing.

Description

The present invention relates to an audio signal processing apparatus and program, and can be applied, for example, to communication devices and communication software that handle audio signals, such as telephones and video conference apparatuses. (In this specification, sound signals such as speech signals and acoustic signals are collectively called "audio signals".)

One method for suppressing noise components contained in an acquired audio signal is the coherence filter method. As described in Patent Document 1, the coherence filter method suppresses noise components whose arrival direction is strongly biased by multiplying the signal, frequency by frequency, by the cross-correlation of two signals that have blind spots (nulls) to the left and to the right.

Patent Document 1: JP 2013-061421 A

However, although the coherence filter method is effective in suppressing noise components, it also generates an unwanted tonal component called musical noise, which impairs the naturalness of the sound.

There is therefore a demand for an audio signal processing apparatus and program that can suppress the occurrence of musical noise even when noise components are suppressed by the coherence filter method.

A first aspect of the present invention is an audio signal processing apparatus that suppresses noise components contained in an input audio signal by coherence filter processing, characterized by comprising: (1) coherence filter coefficient calculation means for calculating a coherence filter coefficient; and (2) coefficient smoothing means for smoothing the calculated coherence filter coefficient in the frequency domain and then applying it to the coherence filter processing.

An audio signal processing program according to a second aspect of the present invention causes a computer mounted on an audio signal processing apparatus that suppresses noise components contained in an input audio signal by coherence filter processing to function as: (1) coherence filter coefficient calculation means for calculating a coherence filter coefficient; and (2) coefficient smoothing means for smoothing the calculated coherence filter coefficient in the frequency domain and then applying it to the coherence filter processing.

According to the present invention, the coherence filter coefficient, once obtained, is smoothed in the frequency domain before being used for the coherence filter processing. It is therefore possible to provide an audio signal processing apparatus and program that can suppress the occurrence of musical noise even when noise components are suppressed by the coherence filter method.

FIG. 1 is a block diagram showing the overall configuration of the audio signal processing apparatus of the first embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the coherence filter processing unit in the first embodiment.
FIG. 3 is an explanatory diagram showing the properties of the directivity signal from the directivity forming unit in the first embodiment.
FIG. 4 is an explanatory diagram showing the characteristics of the two directivities formed by the directivity forming unit in the first embodiment.
FIG. 5 is a flowchart showing the detailed operation of the coherence filter processing unit in the first embodiment.
FIG. 6 is a block diagram showing the detailed configuration of the coherence filter processing unit in the second embodiment.
FIG. 7 is an explanatory diagram showing the behavior of the coherence for each arrival direction.
FIG. 8 is an explanatory diagram showing the correspondence (conversion table) between the coherence and the averaging parameter stored in the averaging parameter storage unit in the second embodiment.
FIG. 9 is an explanatory diagram showing the directivity of the noise signal generated in the third embodiment.

(A) First Embodiment
A first embodiment of the audio signal processing apparatus and program according to the present invention is described in detail below with reference to the drawings.

The inventor of the present application recognized that musical noise arises, when noise components are suppressed by the coherence filter method, because applying the coherence filter coefficients makes particular frequency components conspicuously larger or smaller than their neighbors, which produces isolated points in the frequency domain.

The audio signal processing apparatus and program of the first embodiment smooth each coherence filter coefficient using the coherence filter coefficients of adjacent frequency components. This suppresses the occurrence of isolated points in the frequency domain and is intended to reduce musical noise.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the configuration of the audio signal processing apparatus according to the first embodiment. The part other than the pair of microphones m1 and m2 can be implemented in hardware, or by a CPU together with software (an audio signal processing program) executed by the CPU. Whichever implementation is adopted, the apparatus can be represented functionally by FIG. 1.

In FIG. 1, the audio signal processing apparatus 10 according to the first embodiment includes a pair of microphones m1 and m2, an FFT (fast Fourier transform) unit 11, a coherence filter processing unit 12, and an IFFT (inverse fast Fourier transform) unit 13.

The microphones m1 and m2 are placed a predetermined distance (or an arbitrary distance) apart, and each captures the surrounding sound. Each microphone is omnidirectional (or has only a very gentle directivity toward the front). The audio signals (input signals) captured by the microphones m1 and m2 are converted into digital signals s1(n) and s2(n) by corresponding A/D converters (not shown) and supplied to the FFT unit 11. Here n is a positive integer index representing the input order of the samples. In this text, a smaller n denotes an older input sample and a larger n a newer one.

The FFT unit 11 receives the input signal sequences s1(n) and s2(n) from the microphones m1 and m2 and applies the fast Fourier transform (or the discrete Fourier transform) to them, so that the input signals s1 and s2 can be expressed in the frequency domain. To perform the fast Fourier transform, analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples, are built from the input signals s1(n) and s2(n). Equation (1) gives an example of building the analysis frame FRAME1(K) from the input signal s1(n); the analysis frame FRAME2(K) is built in the same way.

Here K is a positive integer index representing the order of the frames. In this text, a smaller K denotes an older analysis frame and a larger K a newer one. In the following description, unless otherwise noted, K is the index of the latest analysis frame, that is, the frame currently being analyzed.
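The framing step can be sketched as follows. This is only an illustration under stated assumptions: equation (1) is not reproduced in this text, so the sketch assumes consecutive, non-overlapping frames of N samples and drops a trailing partial frame. The function name `split_into_frames` is hypothetical.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive analysis frames.

    Assumption (not confirmed by the source): frames do not overlap,
    and a trailing partial frame is discarded.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(samples) // frame_len
    return [list(samples[k * frame_len:(k + 1) * frame_len])
            for k in range(n_frames)]


# Ten samples with N = 4 give two complete frames; samples 9 and 10 are dropped.
frames = split_into_frames(list(range(1, 11)), 4)
```

A real implementation would typically also apply a window function and overlap between frames; neither is described in this section, so both are left out.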

The FFT unit 11 applies the fast Fourier transform to each analysis frame to obtain the frequency domain signals X1(f, K) and X2(f, K), and supplies them to the coherence filter processing unit 12. Here f is an index representing frequency. X1(f, K) is not a single value; as shown in equation (2), it consists of spectral components at a plurality of frequencies f1 to fm. Each component of X1(f, K) is a complex number with a real part and an imaginary part. The same applies to X2(f, K) and to B1(f, K) and B2(f, K) described later.

X1(f, K) = {X1(f1, K), X1(f2, K), …, X1(fm, K)} … (2)
The coherence filter processing unit 12, described later, treats the frequency domain signal X1(f, K) as the main signal and the frequency domain signal X2(f, K) as the sub signal. The roles may be swapped, with X2(f, K) as the main signal and X1(f, K) as the sub signal (see equation (8) below).

The coherence filter processing unit 12 has the detailed configuration shown in FIG. 2, described later. It executes the coherence filter processing, obtains a signal Y(f, K) in which noise components are suppressed, and supplies it to the IFFT unit 13.

The IFFT unit 13 applies the inverse fast Fourier transform to the noise-suppressed signal Y(f, K) to obtain the output signal y(n), which is a time domain signal.

FIG. 2 is a block diagram showing the detailed configuration of the coherence filter processing unit 12.

In FIG. 2, the coherence filter processing unit 12 includes an input signal receiving unit 21, a directivity forming unit 22, a filter coefficient calculation unit 23, a filter coefficient smoothing processing unit 24, a filter processing unit 25, and a post-filter signal transmission unit 26.

In the coherence filter processing unit 12, the units 21 to 26 operate together to execute the processing shown in the flowchart of FIG. 5, described later.

The input signal receiving unit 21 receives the frequency domain signals X1(f, K) and X2(f, K) output from the FFT unit 11.

The directivity forming unit 22 forms two directivity signals (first and second directivity signals) B1(f, K) and B2(f, K), each with strong directivity in a specific direction. Any existing method can be used to form the directivity signals B1(f, K) and B2(f, K); for example, they can be obtained by the calculations of equations (3) and (4).

The meaning of the formulas for the first and second directivity signals B1(f, K) and B2(f, K) is explained below with reference to FIGS. 3 and 4, taking equation (3) as an example. Suppose a sound wave arrives from the direction θ shown in FIG. 3(A) and is captured by the pair of microphones m1 and m2, which are placed a distance l apart. The sound wave then reaches the two microphones at different times. The path difference d is d = l × sin θ, so with c as the speed of sound, the arrival time difference τ is given by equation (5).

τ = l × sin θ / c … (5)
The signal s1(t − τ), obtained by delaying the input signal s1(t) by τ, is identical to the input signal s2(t). The difference signal y(t) = s2(t) − s1(t − τ) is therefore a signal from which the sound arriving from the direction θ has been removed. As a result, the pair of microphones (microphone array) m1 and m2 has the directivity characteristic shown in FIG. 3(B).
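The delay-and-subtract idea above can be checked numerically with a minimal sketch. The microphone spacing, sampling rate, speed of sound (340 m/s) and test signal are all hypothetical values chosen so that τ is exactly two samples; the function names are illustrative, not from the source.

```python
import math


def arrival_time_difference(spacing_m, theta_deg, sound_speed=340.0):
    """Equation (5): tau = l * sin(theta) / c, in seconds."""
    return spacing_m * math.sin(math.radians(theta_deg)) / sound_speed


def delay_and_subtract(s1, s2, delay):
    """y(n) = s2(n) - s1(n - delay); samples before the start count as 0."""
    return [s2[n] - (s1[n - delay] if n >= delay else 0.0)
            for n in range(len(s2))]


# Hypothetical setup: 16 kHz sampling, 4.25 cm spacing, source at 90 degrees.
fs = 16000
tau = arrival_time_difference(0.0425, 90.0)   # 0.0425 / 340 = 1.25e-4 s
delay = round(tau * fs)                       # 2 samples

# s2 is s1 delayed by tau, as for a source located in the direction theta.
s1 = [math.sin(0.3 * n) for n in range(32)]
s2 = [s1[n - delay] if n >= delay else 0.0 for n in range(32)]

# The difference signal cancels the sound arriving from that direction.
y = delay_and_subtract(s1, s2, delay)
```

In this idealized case every sample of `y` is zero, which corresponds to the blind spot in the direction θ. Sound from other directions arrives with a different delay and is not cancelled.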

The calculation above was described in the time domain, but the same holds when it is performed in the frequency domain. The corresponding formulas are equations (3) and (4) mentioned above. As an example, assume that the arrival direction θ is ±90 degrees. The first directivity signal B1(f) then has strong directivity to the right, as shown in FIG. 4(A), and the second directivity signal B2(f) has strong directivity to the left, as shown in FIG. 4(B). The following description assumes θ = ±90 degrees, but θ is not limited to ±90 degrees.
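As a hedged illustration of why the time domain operation carries over to the frequency domain, the relation below uses only the standard time-shift property of the Fourier transform. It is not a reproduction of equations (3) and (4), which do not appear in this text. Here $f$ is taken as a frequency in Hz, whereas the document uses $f$ as a frequency index, so a practical formula would also involve the sampling rate and the FFT length.

```latex
y(t) = s_2(t) - s_1(t-\tau)
\quad\Longleftrightarrow\quad
Y(f) = X_2(f) - X_1(f)\, e^{-j 2\pi f \tau}
```

A delay of $\tau$ in time becomes a multiplication by $e^{-j 2\pi f \tau}$ at each frequency, so the blind spot can be formed by one complex multiplication and one subtraction per frequency component.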

The filter coefficient calculation unit 23 calculates the coherence filter coefficient coef(f, K) from the first and second directivity signals B1(f, K) and B2(f, K) according to equation (6).
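Equation (6) is not reproduced in this text. The document only states that the coefficient is a cross-correlation of the two directivity signals, computed per frequency. The sketch below therefore uses one common normalized form, the real part of B1 times the conjugate of B2 divided by the mean power of the two signals. This is an assumption made for illustration, not necessarily the patent's formula, and the function name and the `eps` guard are hypothetical.

```python
def coherence_coefficients(b1, b2, eps=1e-12):
    """Per-frequency normalized cross-correlation of two complex spectra.

    ASSUMED form (not taken from the source):
        coef(f) = Re(B1(f) * conj(B2(f))) / (0.5 * (|B1(f)|^2 + |B2(f)|^2))
    eps guards against division by zero in silent bins.
    """
    coef = []
    for x, y in zip(b1, b2):
        cross = (x * y.conjugate()).real
        power = 0.5 * (abs(x) ** 2 + abs(y) ** 2)
        coef.append(cross / (power + eps))
    return coef


# Identical components correlate fully; components 90 degrees apart in phase do not.
same = coherence_coefficients([1 + 1j], [1 + 1j])
orthogonal = coherence_coefficients([1 + 0j], [0 + 1j])
```

With this form the coefficient is close to 1 when the two directivity signals agree and close to 0 when they are uncorrelated, which matches the qualitative behavior the document describes for frontal and lateral sound.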

The filter coefficient smoothing processing unit 24 smooths the coherence filter coefficient coef(f, K) by bringing the filter coefficient value at each frequency closer to the filter coefficient values at nearby frequencies. For example, the filter coefficient smoothing processing unit 24 performs the smoothing by the weighted averaging shown in equation (7). In equation (7), fi is the frequency currently being processed (the frequency of interest), and f(i−1) is the immediately preceding frequency in the frequency domain (the frequency of the preceding FFT frequency point). The weighting coefficient α satisfies 0.0 < α < 1.0.

ave_coef(fi, K) = α × coef(fi, K) + (1 − α) × ave_coef(f(i−1), K) … (7)
Equation (7) computes a weighted average of the coefficient coef(fi, K) at the frequency of interest fi and the smoothed coherence filter coefficient ave_coef(f(i−1), K), which reflects the frequency components f1 to f(i−1) below the frequency of interest. Because the coherence filter coefficients at lower frequencies also contribute to the smoothed coherence filter coefficient ave_coef(fi, K) obtained in this way, the occurrence of isolated points in the frequency domain can be suppressed.
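The recursion of equation (7) can be sketched as follows. The document does not say how the lowest frequency f1 is handled, so the sketch assumes ave_coef(f1, K) = coef(f1, K); the function name is hypothetical.

```python
def smooth_coefficients(coef, alpha):
    """Equation (7), applied across frequency within one frame:
    ave_coef(f_i) = alpha * coef(f_i) + (1 - alpha) * ave_coef(f_(i-1)).

    Assumption: the lowest bin starts the recursion unsmoothed.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0.0 < alpha < 1.0")
    if not coef:
        return []
    smoothed = [coef[0]]
    for c in coef[1:]:
        smoothed.append(alpha * c + (1.0 - alpha) * smoothed[-1])
    return smoothed


# An isolated dip at the third bin is pulled toward its lower neighbors.
ave_coef = smooth_coefficients([1.0, 1.0, 0.0, 1.0], alpha=0.5)
```

With α = 0.5 the isolated value 0.0 becomes 0.5 and the following bin becomes 0.75. The dip is softened, and part of it is spread to the higher bins because the recursion runs in one direction only.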

The smoothing calculation performed by the filter coefficient smoothing processing unit 24 is not limited to equation (7); other smoothing formulas may be used. For example, a simple average or a weighted average may be taken over the coherence filter coefficient values at several nearby frequencies centered on, and including, the frequency of interest. In that case the unsmoothed values are used for the neighboring frequencies as well.
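The centered simple average mentioned above can be sketched like this. The window half-width and the handling of the band edges (the window is truncated there) are assumptions, as the document does not specify them; the function name is hypothetical.

```python
def centered_average(coef, half_width=1):
    """Simple average of the unsmoothed coefficients in a window centered on
    each frequency bin. Assumption: the window is truncated at the band edges."""
    n = len(coef)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = coef[lo:hi]
        out.append(sum(window) / len(window))
    return out


# The isolated dip at the middle bin is shared symmetrically with both neighbors.
averaged = centered_average([1.0, 1.0, 0.0, 1.0, 1.0], half_width=1)
```

Unlike the one-directional recursion of equation (7), this variant uses neighbors on both sides, so the smoothing is symmetric around the frequency of interest.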

The filter processing unit 25 applies the smoothed coherence filter coefficient ave_coef(f, K) to the main frequency domain signal X1(f, K), as shown in equation (8), and obtains the noise-suppressed signal (filtered signal) Y(f, K). Equation (8) represents a multiplication performed separately at each frequency.

Y(f, K) = X1(f, K) × ave_coef(f, K) … (8)
The physical meaning of the coherence filter processing is as follows. The coherence filter coefficient coef(f, K) (and likewise the smoothed coherence filter coefficient ave_coef(f, K)) is the cross-correlation of signal components that have blind spots to the left and to the right. It can therefore be associated with the arrival direction of the input sound: a large correlation indicates a component arriving from the front, with no bias in arrival direction, and a small correlation indicates a component whose arrival direction is biased to the right or left. Multiplying by the coherence filter coefficient coef(f, K) thus amounts to suppressing noise components arriving from the side.
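Equation (8) is a per-frequency multiplication of a complex spectrum by a real gain, which can be sketched as below. The function name and the sample values are hypothetical.

```python
def apply_coherence_filter(x1, ave_coef):
    """Equation (8): Y(f) = X1(f) * ave_coef(f), computed bin by bin."""
    if len(x1) != len(ave_coef):
        raise ValueError("spectrum and coefficients must have the same length")
    return [x * g for x, g in zip(x1, ave_coef)]


# A bin with gain 0.5 is halved; a bin with gain 0.0 (lateral sound) is removed.
y_spec = apply_coherence_filter([2 + 2j, 1 - 1j], [0.5, 0.0])
```

Because the gain is real, only the magnitude of each component changes and its phase is preserved.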

The post-filter signal transmission unit 26 supplies the noise-suppressed signal Y(f, K) to the IFFT unit 13 in the subsequent stage. It also increments K by 1 and starts the processing of the next frame.

(A-2) Operation of the First Embodiment
Next, the operation of the audio signal processing apparatus 10 of the first embodiment is described with reference to the drawings, first the overall operation and then the detailed operation of the coherence filter processing unit 12.

The signals s1(n) and s2(n) input from the pair of microphones m1 and m2 are converted by the FFT unit 11 from the time domain into the frequency domain signals X1(f, K) and X2(f, K), which are supplied to the coherence filter processing unit 12. The coherence filter processing unit 12 executes the coherence filter processing and supplies the resulting noise-suppressed signal Y(f, K) to the IFFT unit 13. The IFFT unit 13 converts the noise-suppressed signal Y(f, K), a frequency domain signal, into the time domain signal y(n) by the inverse fast Fourier transform and outputs y(n).

Next, the detailed operation of the coherence filter processing unit 12 is described with reference to the flowchart of FIG. 5. FIG. 5 shows the processing for one frame; the processing shown in FIG. 5 is repeated for every frame.

When a new frame begins and the frequency domain signals X1(f, K) and X2(f, K) of the new frame (the current frame K) are supplied from the FFT unit 11, the first and second directivity signals B1(f, K) and B2(f, K) are calculated according to equations (3) and (4) (step S1). The coherence filter coefficient coef(f, K) is then calculated from these directivity signals according to equation (6) (step S2).

Then, for each frequency (frequency component) fi, the coherence filter coefficient coef(f, K) is smoothed as shown in equation (7), and the smoothed coherence filter coefficient ave_coef(f, K) is obtained (step S3).

The smoothed coherence filter coefficient ave_coef(f, K) is applied to the main frequency domain signal X1(f, K) in the coherence filter processing of equation (8). The resulting noise-suppressed signal (filtered signal) Y(f, K) is supplied to the IFFT unit 13, the frame variable K is incremented by 1 (step S4), and processing moves on to the next frame.

(A-3) Effects of the First Embodiment
According to the first embodiment, the coherence filter processing uses the smoothed coherence filter coefficient, obtained by smoothing the coherence filter coefficient, in place of the coherence filter coefficient itself. This prevents the isolated points in the frequency domain that multiplication by the coherence filter coefficient would otherwise produce, and reduces the musical noise generated by the coherence filter processing.

As a result, improved call sound quality can be expected in communication devices, such as video conference apparatuses and mobile phones, to which the audio signal processing apparatus or program of the first embodiment is applied.

(B) Second Embodiment
Next, a second embodiment of the audio signal processing apparatus and program according to the present invention is described in detail with reference to the drawings.

The degree to which musical noise occurs in the noise-suppressed signal also varies with the direction from which the noise arrives. In the second embodiment, therefore, the contribution of other frequency components to the smoothing of equation (7) is controlled according to the arrival direction of the noise.

(B-1) Configuration of the Second Embodiment
The overall configuration of the audio signal processing apparatus according to the second embodiment can also be represented by FIG. 1, which was used in the description of the first embodiment. However, the internal configuration of the coherence filter processing unit (hereinafter denoted by reference numeral 12A) differs from that of the first embodiment.

FIG. 6 is a block diagram showing the detailed configuration of the coherence filter processing unit 12A of the second embodiment. Parts that are the same as, or correspond to, those in FIG. 2 are given the same reference numerals.

In FIG. 6, the coherence filter processing unit 12A of the second embodiment includes, in addition to the input signal receiving unit 21, the directivity forming unit 22, the filter coefficient calculation unit 23, the filter coefficient smoothing processing unit 24, the filter processing unit 25, and the post-filter signal transmission unit 26, an arrival direction estimation unit 27, an averaging parameter determination unit 28, and an averaging parameter storage unit 29.

The input signal receiving unit 21, the directivity forming unit 22, the filter coefficient calculation unit 23, the filter coefficient smoothing processing unit 24, the filter processing unit 25, and the post-filter signal transmission unit 26 are the same as in the first embodiment, and their functional description is omitted. The filter coefficient smoothing processing unit 24 of the second embodiment differs from that of the first embodiment in one respect: when executing the calculation of equation (7), it applies the averaging parameter α(K) supplied by the averaging parameter determination unit 28 instead of a fixed averaging parameter α.

The arrival direction estimation unit 27 obtains an index value from which the arrival direction of the noise can be estimated and supplies it to the averaging parameter determination unit 28. As this index value, the arrival direction estimation unit 27 calculates the coherence COH(K). As shown in equation (9), the coherence COH(K) is the arithmetic mean of the coherence filter coefficients coef(f, K) over all frequencies.
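Equation (9) is not reproduced in this text. Based only on the verbal definition above (the arithmetic mean over all frequencies) and the frequencies $f_1$ to $f_m$ of equation (2), it can be written as the following sketch. The exact summation range is an assumption.

```latex
\mathrm{COH}(K) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{coef}(f_i, K)
```

A single scalar per frame is thus obtained from the per-frequency coefficients, which is what allows one averaging parameter to be chosen for the whole frame.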

FIG. 7 is an explanatory diagram showing the behavior of the coherence. As FIG. 7 shows, the range of values taken by the coherence changes with the arrival direction of the noise. This property can be used to estimate the arrival direction of the noise.

The averaging parameter determination unit 28 refers to the averaging parameter storage unit 29 on the basis of the coherence COH(K) calculated by the arrival direction estimation unit 27, and determines the averaging parameter α(K) to be used by the filter coefficient smoothing processing unit 24.

The closer the arrival direction of the noise (interfering speech and the like) is to the front, the more isolated points tend to appear in the coherence filter coefficients in the frequency domain, so it is desirable to smooth over a larger number of nearby frequency components. Therefore, when the arrival direction is close to the front (in other words, when the coherence COH(K) is large), the averaging parameter α is made smaller to increase the contribution of other frequency components. Conversely, when the noise arrives from the side (in other words, when the coherence COH(K) is small), the averaging parameter α is made larger to reduce the contribution of other frequency components.

The averaging parameter determination unit 28 determines an averaging parameter α(K) that realizes this control. Its specific configuration does not matter as long as it can determine such an averaging parameter α(K). For example, it may determine α(K) using a conversion table, or it may determine α(K) by evaluating a conversion function. FIG. 6 shows the configuration for the former case, in which the averaging parameter storage unit 29 is provided.

As shown in FIG. 8, the averaging parameter storage unit 29 stores a conversion table that associates each range of the coherence COH(K) with the averaging parameter α(K) to be applied when the calculated coherence COH(K) falls within that range.

The averaging parameter determination unit 28 determines which range of the conversion table the given coherence COH(K) belongs to: A or more and less than B, B or more and less than C, C or more and less than D, and so on (where A < B < C < D < ...). It then supplies the value β, γ, δ, ... (where β > γ > δ > ...) associated with that range to the filter coefficient smoothing processing unit 24 as the averaging parameter α(K). For example, if the coherence COH(K) is B or more and less than C, the averaging parameter determination unit 28 supplies an averaging parameter α(K) whose value is γ to the filter coefficient smoothing processing unit 24.
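The range lookup described above can be sketched as follows. This is only an illustrative sketch: the boundary values standing in for A, B, C, D and the parameter values standing in for β, γ, δ are hypothetical, as are the names `BOUNDS`, `ALPHAS` and `averaging_parameter`; the actual contents of the table in FIG. 8 are not given in this text. Clamping of out-of-range values is likewise an assumption.

```python
from bisect import bisect_right

# Hypothetical range boundaries A < B < C < D for COH(K).
BOUNDS = [0.0, 0.3, 0.6, 1.0]
# Hypothetical averaging parameters beta > gamma > delta, one per range.
ALPHAS = [0.9, 0.6, 0.3]


def averaging_parameter(coh: float) -> float:
    """Return alpha(K) for the half-open range [lower, upper) containing coh."""
    # bisect_right makes the lower boundary inclusive and the upper exclusive.
    idx = bisect_right(BOUNDS, coh) - 1
    # Assumption: values outside [A, D) are clamped to the first or last range.
    idx = max(0, min(idx, len(ALPHAS) - 1))
    return ALPHAS[idx]


if __name__ == "__main__":
    for coh in (0.1, 0.3, 0.75):
        print(coh, averaging_parameter(coh))
```

A larger coherence selects a smaller α, which matches the ordering β > γ > δ stated for the conversion table.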

(B-2) Operation of the Second Embodiment
Next, the operation of the audio signal processing apparatus of the second embodiment will be described. Since the overall operation is the same as in the first embodiment, only the operation of the coherence filter processing unit 12A of the second embodiment is described below.

When a new frame begins and the frequency domain signals X1(f, K) and X2(f, K) of the new frame (current frame K) are supplied from the FFT unit 11, the first and second directional signals B1(f, K) and B2(f, K) are calculated according to equations (3) and (4). The coherence filter coefficient coef(f, K) is then calculated from these directional signals B1(f, K) and B2(f, K) according to equation (6).

After that, the coherence COH(K), which is the arithmetic mean of the coherence filter coefficients coef(f, K) over all frequencies, is calculated according to equation (9), and the averaging parameter α(K) corresponding to the range to which the calculated coherence value belongs is read from the conversion table.

Then, using the retrieved averaging parameter α(K), the smoothing of the coherence filter coefficient coef(f, K) shown in equation (7) is performed for each frequency (frequency component) fi, and the smoothed coherence filter coefficient ave_coef(f, K) is obtained.

The obtained smoothed coherence filter coefficient ave_coef(f, K) is applied to the main frequency domain signal X1(f, K) in the coherence filter processing shown in equation (8), and the resulting noise-suppressed signal (filtered signal) Y(f, K) is output from the coherence filter processing unit 12A.
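The last three steps of this flow (coherence, smoothing, filtering) can be illustrated with a small sketch. Equations (7), (8) and (9) are not reproduced in this part of the text, so the forms below are assumptions: the coherence is taken as the arithmetic mean of the coefficients, as stated above; the smoothing is assumed to be a weighted blend of each bin with the mean of its immediate neighbours, with α weighting the bin itself; and the filtering is assumed to be a per-bin multiplication. All function names are hypothetical.

```python
def coherence(coef):
    """COH(K): arithmetic mean of the filter coefficients over all bins."""
    return sum(coef) / len(coef)


def smooth_coefficients(coef, alpha):
    """Blend each bin with the mean of its adjacent bins (assumed form).

    A smaller alpha gives the neighbouring bins a larger contribution.
    """
    n = len(coef)
    smoothed = []
    for i in range(n):
        neighbours = [coef[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if not neighbours:  # single-bin spectrum: nothing to blend with
            smoothed.append(coef[i])
            continue
        neighbour_mean = sum(neighbours) / len(neighbours)
        smoothed.append(alpha * coef[i] + (1.0 - alpha) * neighbour_mean)
    return smoothed


def apply_filter(x1, ave_coef):
    """Y(f, K): scale each bin of the main spectrum by its smoothed coefficient."""
    return [x * c for x, c in zip(x1, ave_coef)]


if __name__ == "__main__":
    coef = [0.2, 0.2, 1.0, 0.2, 0.2]   # one isolated peak at bin 2
    x1 = [1 + 1j] * len(coef)          # dummy main-channel spectrum
    ave = smooth_coefficients(coef, alpha=0.5)
    print(coherence(coef), ave, apply_filter(x1, ave))
```

In this toy input, the isolated peak at bin 2 is pulled towards its neighbours, which is the kind of isolated point the smoothing is intended to reduce.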

(B-3) Effects of the Second Embodiment
According to the second embodiment, the averaging parameter to be applied is determined according to the arrival direction of the noise before the coherence filter coefficients are smoothed, so musical noise can be reduced regardless of the arrival direction of the noise.

As a result, improved call quality can be expected in communication devices such as videoconferencing systems and mobile phones to which the present invention, and in particular the audio signal processing apparatus or program of the second embodiment, is applied.

(C) Third Embodiment
Next, a third embodiment of the audio signal processing apparatus and program according to the present invention will be described.

The second embodiment uses the coherence COH(K) as the index value representing the arrival direction of noise (such as interfering speech). The third embodiment uses the SN ratio SNR(K) instead of the coherence COH(K) as that index value.

The overall configuration of the audio signal processing apparatus of the third embodiment can also be represented by FIG. 1, which was used in the description of the first embodiment. The detailed configuration of the coherence filter processing unit (12A) of the third embodiment can likewise be represented by FIG. 6, which was used in the description of the second embodiment.

However, as described above, the arrival direction estimation unit 27 differs from that of the second embodiment in that it calculates the SN ratio SNR(K) rather than the coherence COH(K). The averaging parameter determination unit 28 determines the averaging parameter on the basis of the calculated SN ratio SNR(K).

The method by which the arrival direction estimation unit 27 of the third embodiment calculates the SN ratio SNR(K) is described below.

The arrival direction estimation unit 27 calculates the noise signal N(f, K) from the frequency domain signals X1(f, K) and X2(f, K) according to equation (10). As shown in FIG. 9, the calculation of equation (10) corresponds to forming a directivity with a blind spot (null) toward the front, so only the components arriving from the left and right are obtained. Since the target direction is assumed to be the front (for example, the target speaker is assumed to be in front), the components arriving from the sides can be regarded as noise.

N(f, K) = X1(f, K) - X2(f, K) … (10)

Next, the arrival direction estimation unit 27 calculates the SN ratio SNR(K) in the current frame K from the main frequency domain signal X1(f, K) and the noise signal N(f, K) according to equation (11). The denominator of equation (11) is the level of the noise signal, and the numerator is the level of the target sound signal. Because the target sound is assumed to arrive from the front and the noise from the sides (left and right), the SN ratio can be estimated by equation (11). In equation (11), η is a parameter that takes a value in the range 0 < η < 1.
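The effect of equation (10) can be checked with a short sketch. Only equation (10) is implemented here; equation (11) and the role of η are not reproduced in this text, so no SN ratio is computed. The function names, the test spectra, and the use of a pure phase rotation to mimic a lateral source are assumptions made for illustration.

```python
import cmath


def noise_signal(x1, x2):
    """Equation (10): N(f, K) = X1(f, K) - X2(f, K), bin by bin."""
    return [a - b for a, b in zip(x1, x2)]


def mean_level(spectrum):
    """Mean magnitude over all bins (a simple level measure, assumed)."""
    return sum(abs(v) for v in spectrum) / len(spectrum)


if __name__ == "__main__":
    x1 = [1 + 0j, 0.5 - 0.5j, -0.3 + 0.8j]

    # Frontal source: both microphones observe the same spectrum,
    # so the difference cancels completely (null toward the front).
    print(mean_level(noise_signal(x1, x1)))

    # Lateral source (assumed model): the second channel is a
    # phase-rotated copy, so the difference no longer cancels.
    x2 = [v * cmath.exp(-1j * 0.8) for v in x1]
    print(mean_level(noise_signal(x1, x2)))
```

A frontal component cancels in N(f, K) while a lateral component survives, which is why N(f, K) can serve as the noise estimate under the stated assumption that the target sound arrives from the front.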

The SN ratio SNR(K) calculated in this way is used as the index value representing the arrival direction of the noise, and the averaging parameter corresponding to that arrival direction is determined in the same manner as in the second embodiment described above.

In the third embodiment as well, the coherence filter coefficients are smoothed using an averaging parameter determined according to the arrival direction of the noise, so the same effects as in the second embodiment can be obtained.

(D) Other Embodiments
Various modifications have already been mentioned in the descriptions of the above embodiments. Further modifications such as those exemplified below are also possible.

In the second embodiment, the coherence COH(K) is used as the index value representing the arrival direction of the interfering speech, and in the third embodiment, the SN ratio SNR(K) is used. However, any other index value that represents the arrival direction of the interfering speech may be used, and a plurality of index values may be used at the same time. For example, the averaging parameter α(K) may be controlled according to the combination of the range to which the coherence COH(K) belongs and the range to which the SN ratio SNR(K) belongs.
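One possible way to realize the combined control mentioned above is a two-dimensional table indexed by the COH(K) range and the SNR(K) range. The sketch below is purely hypothetical: the boundaries, the table entries, and the name `combined_averaging_parameter` are placeholders chosen for illustration and carry no tuning information from the source.

```python
from bisect import bisect_right

# Hypothetical interior boundaries; each list splits its index into ranges.
COH_EDGES = [0.3, 0.6]   # three COH(K) ranges
SNR_EDGES = [1.0]        # two SNR(K) ranges

# Hypothetical alpha(K) values: rows follow the COH range, columns the SNR range.
ALPHA_TABLE = [
    [0.9, 0.8],
    [0.6, 0.5],
    [0.3, 0.2],
]


def combined_averaging_parameter(coh: float, snr: float) -> float:
    """Pick alpha(K) from the cell selected by the two range indices."""
    row = bisect_right(COH_EDGES, coh)   # lower bound inclusive
    col = bisect_right(SNR_EDGES, snr)
    return ALPHA_TABLE[row][col]


if __name__ == "__main__":
    print(combined_averaging_parameter(0.45, 2.0))
```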

The number of coherence COH(K) ranges in the conversion table mentioned in the description of the second embodiment may be any number of two or more and is not limited to a particular number.

In each of the above embodiments, processing performed on frequency domain signals may, where possible, be performed on time domain signals instead, and conversely, processing performed on time domain signals may, where possible, be performed on frequency domain signals.

In each of the above embodiments, the coherence filter method is applied alone as the noise suppression technique, but it may be used in combination with other noise suppression techniques (see Patent Document 1), such as the voice switch method, the Wiener filter method, and the frequency subtraction method.

Each of the above embodiments describes an audio signal processing apparatus and program that immediately process the signals captured by a pair of microphones, but the audio signals to be processed by the present invention are not limited to these. For example, the present invention can also be applied to processing a pair of audio signals read from a recording medium, or a pair of audio signals transmitted from an opposing device.

DESCRIPTION OF SYMBOLS: 10 … audio signal processing apparatus, 11 … FFT unit, 12, 12A … coherence filter processing unit, 13 … IFFT unit, m1, m2 … microphones, 21 … input signal receiving unit, 22 … directivity forming unit, 23 … filter coefficient calculation unit, 24 … filter coefficient smoothing processing unit, 25 … filter processing unit, 26 … filtered signal transmission unit, 27 … arrival direction estimation unit, 28 … averaging parameter determination unit, 29 … averaging parameter storage unit.

Claims

1. An audio signal processing apparatus that suppresses noise components included in an input audio signal by coherence filter processing, the apparatus comprising:
coherence filter coefficient calculating means for calculating a coherence filter coefficient; and
coefficient smoothing means for smoothing the calculated coherence filter coefficient in the frequency domain and then applying the smoothed coefficient to the coherence filter processing.

2. The audio signal processing apparatus according to claim 1, wherein the coefficient smoothing means includes averaging means for smoothing a frequency component of the coherence filter coefficient by averaging that frequency component with adjacent frequency components.

3. The audio signal processing apparatus according to claim 2, further comprising: noise direction reflection value calculating means for calculating a value reflecting the arrival direction of a noise component in the input audio signal; and averaging parameter determining means for determining, according to the calculated value reflecting the arrival direction, an averaging parameter indicating the degree to which the adjacent frequency components are reflected in the averaging.

4. The audio signal processing apparatus according to claim 3, wherein the noise direction reflection value calculating means calculates, as the value reflecting the arrival direction of the noise component, a coherence that is the arithmetic mean of the coherence filter coefficients over all frequencies.

5. The audio signal processing apparatus according to claim 3, wherein the noise direction reflection value calculating means calculates an SN ratio of the input audio signal as the value reflecting the arrival direction of the noise component.

6. An audio signal processing program that causes a computer mounted on an audio signal processing apparatus, which suppresses noise components included in an input audio signal by coherence filter processing, to function as:
coherence filter coefficient calculating means for calculating a coherence filter coefficient; and
coefficient smoothing means for smoothing the calculated coherence filter coefficient in the frequency domain and then applying the smoothed coefficient to the coherence filter processing.