JP6102144B2

JP6102144B2 - Acoustic signal processing apparatus, method, and program

Info

Publication number: JP6102144B2
Application number: JP2012209711A
Authority: JP
Inventors: 克之高橋
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2012-09-24
Filing date: 2012-09-24
Publication date: 2017-03-29
Anticipated expiration: 2032-09-24
Also published as: JP2014068052A

Description

The present invention relates to an acoustic signal processing apparatus, method, and program, and can be applied, for example, to communication apparatuses such as telephones and video conferencing systems that use a plurality of (multi-channel) microphones.

In recent years, techniques for processing acoustic signals and speech signals captured with multi-channel microphones have come into practical use (in this specification, acoustic signals and speech signals may be referred to collectively as acoustic signals). In such systems, even microphones of the same model number differ in sensitivity, and acoustic feature quantities cannot be calculated accurately unless this sensitivity difference is calibrated. Conventionally, this has been handled either by measuring the sensitivity of each microphone in advance and setting a correction gain according to the sensitivity difference, or by comparing the input levels of the channels and automatically setting correction gains that bring each level to the average value. However, the former is laborious, and the latter cancels not only the microphone sensitivity difference but also genuine differences between the acquired input signals, so the accuracy of the acoustic feature quantities calculated downstream of the calibration circuit cannot be guaranteed.

One way to mitigate this problem is to compare input levels and calculate the calibration gain only in those sections of the input signal that contain a signal component arriving from the front of the microphones. This rests on the premise that, for a signal arriving from the front, the distance from the sound source to each microphone is equal, so the acoustic difference between the signal components reaching the microphones is very small and any remaining difference between the channels can be expected to be due to microphone sensitivity alone. The technique described in Patent Document 1 is a prior art that detects, among the acoustic components input to the microphones, only the signal component arriving from a specific direction.

The technique described in Patent Document 1 applies weighted CSP (Cross-power Spectrum Phase) coefficients, which take small values in frequency bands that are susceptible to sounds other than the target speech (non-target speech), to the speech signals acquired from two microphones placed a predetermined distance apart, in order to perform at least one of gain adjustment and extraction of the target speech section.

Patent Document 1: JP 2011-113044 A

However, the weighted CSP coefficient used in the technique of Patent Document 1 is a feature quantity that distinguishes a predetermined frequency band from other frequency bands. Depending on the relationship between the frequency band of the non-target speech and that of the target speech, or on the spectrum of the target speech, it may therefore be unsuitable as a feature quantity representing the microphone sensitivity difference.

There is therefore a need for an acoustic signal processing apparatus, method, and program that can measure and calibrate the microphone sensitivity difference among a plurality of acoustic signals from multi-channel microphones, regardless of the frequency band of the input acoustic signals.

A first aspect of the present invention is an acoustic signal processing apparatus that calibrates differences in microphone sensitivity among a plurality of input acoustic signals, comprising: (1) a first directivity forming unit that applies delay-subtraction processing to the input acoustic signals to form a first directivity signal having a directivity characteristic with a null (blind spot) in a first predetermined direction; (2) a second directivity forming unit that applies delay-subtraction processing to the input acoustic signals to form a second directivity signal having a directivity characteristic with a null in a second predetermined direction different from the first predetermined direction; (3) a coherence calculation unit that obtains a coherence from the first and second directivity signals; (4) a front arrival signal detection unit that detects, based on the coherence, signal sections arriving from the front; (5) a calibration gain calculation unit that, using the input acoustic signals in the signal sections arriving from the front, performs the same operation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated index values, and calculates a calibration gain for each input acoustic signal from its index value and the target sensitivity; and (6) a plurality of calibration gain multiplication units, each of which calibrates the corresponding input acoustic signal with the obtained calibration gain.

A second aspect of the present invention is an acoustic signal processing method for calibrating differences in microphone sensitivity among a plurality of input acoustic signals, wherein: (1) a first directivity forming unit applies delay-subtraction processing to the input acoustic signals to form a first directivity signal having a directivity characteristic with a null in a first predetermined direction; (2) a second directivity forming unit applies delay-subtraction processing to the input acoustic signals to form a second directivity signal having a directivity characteristic with a null in a second predetermined direction different from the first predetermined direction; (3) a coherence calculation unit obtains a coherence from the first and second directivity signals; (4) a front arrival signal detection unit detects, based on the coherence, signal sections arriving from the front; (5) a calibration gain calculation unit, using the input acoustic signals in the signal sections arriving from the front, performs the same operation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated index values, and calculates a calibration gain for each input acoustic signal from its index value and the target sensitivity; and (6) each of a plurality of calibration gain multiplication units calibrates the corresponding input acoustic signal with the calibration gain supplied to it.

An acoustic signal processing program according to a third aspect of the present invention causes a computer mounted in an acoustic signal processing apparatus that calibrates differences in microphone sensitivity among a plurality of input acoustic signals to function as: (1) a first directivity forming unit that applies delay-subtraction processing to the input acoustic signals to form a first directivity signal having a directivity characteristic with a null in a first predetermined direction; (2) a second directivity forming unit that applies delay-subtraction processing to the input acoustic signals to form a second directivity signal having a directivity characteristic with a null in a second predetermined direction different from the first predetermined direction; (3) a coherence calculation unit that obtains a coherence from the first and second directivity signals; (4) a front arrival signal detection unit that detects, based on the coherence, signal sections arriving from the front; (5) a calibration gain calculation unit that, using the input acoustic signals in the signal sections arriving from the front, performs the same operation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated index values, and calculates a calibration gain for each input acoustic signal from its index value and the target sensitivity; and (6) a plurality of calibration gain multiplication units, each of which calibrates the corresponding input acoustic signal with the obtained calibration gain.

According to the acoustic signal processing apparatus, method, and program of the present invention, the microphone sensitivity difference in multi-channel input acoustic signals can be calibrated regardless of the frequency band of the input acoustic signals.

A block diagram showing the configuration of the acoustic signal processing apparatus according to the first embodiment.
An explanatory diagram showing the nature of the directivity signals from the directivity forming units in the first embodiment.
An explanatory diagram showing the directivity characteristics produced by the two directivity forming units in the first embodiment.
A block diagram showing the detailed configuration of the calibration gain calculation unit in the first embodiment.
A flowchart showing the operation of the calibration gain calculation unit in the first embodiment.
A flowchart showing the operation of the calibration gain calculation unit in a modification of the first embodiment.
A block diagram showing the detailed configuration of the calibration gain calculation unit in the second embodiment.
A block diagram showing the detailed configuration of the calibration gain observation section length control unit in the second embodiment.
An explanatory diagram showing the structure of the observation section length storage unit in the second embodiment.
A flowchart showing the operation of the calibration gain calculation unit in the second embodiment.
A flowchart showing the operation of the calibration gain observation section length control unit in the second embodiment.

(A) First Embodiment
A first embodiment of the acoustic signal processing apparatus, method, and program according to the present invention is described in detail below with reference to the drawings.

The first embodiment uses coherence as a feature quantity related to the arrival direction of an acoustic signal, so that the microphone sensitivity difference among a plurality of acoustic signals can be measured and calibrated regardless of the frequency band of the input acoustic signals.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the configuration of the acoustic signal processing apparatus according to the first embodiment. The portion other than the pair of microphones m_1 and m_2 can be implemented as software (an acoustic signal processing program) executed by a CPU; even in that case, it can be represented functionally as shown in FIG. 1.

In FIG. 1, the acoustic signal processing apparatus 1 of the first embodiment includes microphones m_1 and m_2, an FFT unit 10, a first directivity forming unit 11, a second directivity forming unit 12, a coherence calculation unit 13, a front arrival signal detection unit 14, a calibration gain calculation unit 15, and calibration gain multiplication units 16 and 17.

The pair of microphones m_1 and m_2 are placed a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. The acoustic signals (input signals) captured by the microphones m_1 and m_2 are converted into digital signals s1(n) and s2(n) by corresponding AD converters (not shown) and supplied to the FFT unit 10. Here n is an index representing the input order of the samples and is a positive integer. In this description, a smaller n denotes an older input sample and a larger n a newer one.

The FFT unit 10 receives the input signal sequences s1(n) and s2(n) from the microphones m_1 and m_2 and applies a fast Fourier transform (or discrete Fourier transform) to them, so that the input signals s1 and s2 can be expressed in the frequency domain. For the fast Fourier transform, analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples, are constructed from the input signals s1(n) and s2(n). Equation (1) shows an example of constructing the analysis frame FRAME1(K) from the input signal s1(n); the analysis frame FRAME2(K) is constructed in the same way.
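As a rough illustration of this framing step, the following Python sketch splits a sample sequence into analysis frames. It assumes consecutive, non-overlapping frames, which the text does not state (it says only that each frame consists of N samples); the function name `split_into_frames` is hypothetical, and Python's 0-based indexing differs from the 1-based indices n and K used in the description.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive frames of frame_len samples.

    ASSUMPTION: frames do not overlap. Trailing samples that do not fill
    a whole frame are dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


s1 = list(range(1, 11))            # stand-in for s1(1) ... s1(10)
frames = split_into_frames(s1, 4)  # two full frames; samples 9 and 10 are left over
```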

K is an index representing the order of the frames and is a positive integer. In this description, a smaller K denotes an older analysis frame and a larger K a newer one. In the following description, unless otherwise noted, the index of the latest analysis frame to be analyzed is K.

The FFT unit 10 applies a fast Fourier transform to each analysis frame to obtain the frequency-domain signals X1(f, K) and X2(f, K), and supplies the obtained signals to the first directivity forming unit 11 and the second directivity forming unit 12. Here f is an index representing frequency. X1(f, K) is not a single value but consists of the spectral components at a plurality of frequencies f1 to fm, as shown in equation (2). The same applies to X2(f, K) and to B1(f, K) and B2(f, K) described later.

X1(f, K) = {X1(f1, K), X1(f2, K), …, X1(fm, K)} … (2)
The first directivity forming unit 11 forms, from the frequency-domain signals X1(f, K) and X2(f, K), a signal B1(f, K) with strong directivity in a specific direction. The second directivity forming unit 12 forms, from the same signals, a signal B2(f, K) with strong directivity in a specific direction different from the former. Existing methods can be used to form B1(f, K) and B2(f, K); for example, equation (3) yields B1(f, K) with strong directivity to the right, and equation (4) yields B2(f, K) with strong directivity to the left. The frame index K is omitted in equations (3) and (4) because it plays no part in the calculation.

The meaning of these equations is explained using equation (3) as an example, with reference to FIGS. 2 and 3. Suppose that a sound wave arrives from the direction θ shown in FIG. 2(A) and is captured by the pair of microphones m_1 and m_2, which are placed a distance l apart. The sound wave then reaches the two microphones at different times. Letting d be the difference in path length, d = l × sin θ, so with c denoting the speed of sound the arrival time difference τ is given by equation (5).

τ = l × sin θ / c … (5)
The signal s1(t − τ), obtained by delaying the input signal s1(t) by τ, is identical to the input signal s2(t). The difference signal y(t) = s2(t) − s1(t − τ) is therefore a signal from which the sound arriving from the direction θ has been removed. As a result, the microphone array formed by m_1 and m_2 has the directivity characteristic shown in FIG. 2(B).
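The delay-and-subtract idea can be checked numerically. The sketch below is only an illustration under stated assumptions: the sound speed (340 m/s), sampling rate (16 kHz), and microphone spacing (0.0425 m) are hypothetical values chosen so that τ at θ = 90 degrees is exactly two samples, and the helper names are not from the source. It handles only 0 ≤ θ ≤ 90 degrees.

```python
import math

SOUND_SPEED = 340.0     # m/s, assumed value for c
FS = 16000              # Hz, assumed sampling rate
MIC_DISTANCE = 0.0425   # m, assumed spacing l (2-sample delay at 90 degrees)


def arrival_delay_samples(theta_deg):
    """tau = l * sin(theta) / c as in equation (5), rounded to whole samples."""
    tau = MIC_DISTANCE * math.sin(math.radians(theta_deg)) / SOUND_SPEED
    return round(tau * FS)


def delayed(signal, d):
    """Return the signal delayed by d >= 0 samples, zero-padded at the start."""
    return [0.0] * d + signal[:len(signal) - d]


def delay_subtract(s1, s2, d):
    """y(n) = s2(n) - s1(n - d): cancels sound for which s2 equals s1 delayed by d."""
    return [b - a for a, b in zip(delayed(s1, d), s2)]


source = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(64)]
d = arrival_delay_samples(90.0)

# Sound from the null direction: m_2 receives what m_1 received d samples earlier.
y_null = delay_subtract(source, delayed(source, d), d)
# Sound from the opposite side: m_1 is the delayed channel, so nothing cancels.
y_other = delay_subtract(delayed(source, d), source, d)
```

In this setup `y_null` is zero everywhere, while `y_other` keeps substantial energy, which is the null-in-one-direction behaviour the paragraph describes.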

The above describes the operation in the time domain, but the same holds in the frequency domain; equations (3) and (4) above are the corresponding expressions. As an example, assume that the arrival direction θ is ±90 degrees. Then the directivity signal B1(f) from the first directivity forming unit 11 has strong directivity to the right, as shown in FIG. 3(A), and the directivity signal B2(f) from the second directivity forming unit 12 has strong directivity to the left, as shown in FIG. 3(B).
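For a single frequency bin, a delay of τ seconds corresponds to multiplication by exp(−j2πfτ), so the same subtraction can be written on complex spectra. The following Python sketch is a generic delay-and-subtract in the frequency domain, not a transcription of equations (3) and (4): the function names, the sign convention, the 1 kHz test frequency, and the spacing and sound speed are all assumptions, and the two functions are labeled by microphone rather than by "right" and "left" because the text does not say which microphone is on which side.

```python
import cmath
import math


def delay_phase(freq_hz, tau):
    """Phase factor exp(-j*2*pi*f*tau) representing a delay of tau seconds."""
    return cmath.exp(-2j * math.pi * freq_hz * tau)


def null_for_m1_first(x1, x2, freq_hz, tau):
    """Cancel a wave that reaches m_1 first and m_2 tau seconds later."""
    return x2 - x1 * delay_phase(freq_hz, tau)


def null_for_m2_first(x1, x2, freq_hz, tau):
    """Cancel a wave that reaches m_2 first and m_1 tau seconds later."""
    return x1 - x2 * delay_phase(freq_hz, tau)


TAU = 0.0425 / 340.0   # assumed l / c, i.e. theta = 90 degrees
FREQ = 1000.0          # Hz, assumed test frequency

# Wave from the m_1 side: X2 is X1 delayed by TAU.
x1_side, x2_side = 1 + 0j, (1 + 0j) * delay_phase(FREQ, TAU)
cancelled = null_for_m1_first(x1_side, x2_side, FREQ, TAU)
passed = null_for_m2_first(x1_side, x2_side, FREQ, TAU)

# Wave from the front: both microphones receive the same spectrum.
front_a = null_for_m1_first(1 + 0j, 1 + 0j, FREQ, TAU)
front_b = null_for_m2_first(1 + 0j, 1 + 0j, FREQ, TAU)
```

For the side arrival one output is cancelled and the other is not, whereas for the frontal arrival the two outputs are identical; this asymmetry is what a correlation between the two directivity signals can pick up.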

The coherence calculation unit 13 obtains the coherence COH by applying the operations of equations (6) and (7) to the directivity signals B1(f) and B2(f) obtained as described above. B2(f)* in equation (6) is the complex conjugate of B2(f). The frame index K is omitted from equations (6) and (7) because it plays no part in these calculations.
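A minimal Python sketch of a coherence of this kind is given below. It follows the two-step structure the description gives (a per-frequency correlation that uses the complex conjugate of B2(f), followed by an average over all frequency components), but the specific normalization, Re(B1·B2*) divided by the mean power of the two signals, is an assumption, as are the function names and the choice to count an empty bin as zero correlation.

```python
def power(z):
    """Squared magnitude of a complex value."""
    return z.real ** 2 + z.imag ** 2


def coherence(b1, b2):
    """Average over frequency bins of a normalized correlation of b1 and b2.

    ASSUMPTION: each bin contributes Re(B1 * conj(B2)) / (0.5 * (|B1|^2 + |B2|^2)).
    A bin in which both signals are zero contributes 0.
    """
    coefs = []
    for x, y in zip(b1, b2):
        mean_power = 0.5 * (power(x) + power(y))
        if mean_power == 0.0:
            coefs.append(0.0)
        else:
            coefs.append((x * y.conjugate()).real / mean_power)
    return sum(coefs) / len(coefs) if coefs else 0.0


spectrum = [1 + 1j, 2 - 1j, 0.5j]
coh_same = coherence(spectrum, spectrum)                    # identical signals
coh_opposite = coherence(spectrum, [-v for v in spectrum])  # sign-inverted copy
coh_orthogonal = coherence([1 + 0j], [0 + 1j])              # 90-degree phase shift
```

With this normalization the value lies between −1 and 1: identical directivity signals give 1, and signals with no in-phase component give 0.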

The front arrival signal detection unit 14 compares the coherence COH(K) with a front arrival signal detection threshold Θ. If COH(K) is larger than Θ, the frame is regarded as a signal section arriving from the front and 1.0 is assigned to the detection result variable VAD_RES(K); if COH(K) is smaller than Θ, the frame is regarded as a signal section arriving from a direction other than the front and 0.0 is assigned to VAD_RES(K). The unit then supplies VAD_RES(K) to the calibration gain calculation unit 15.
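The threshold decision can be sketched in a few lines of Python. The threshold value 0.4 and the sample coherence values are made up for illustration, and the handling of a coherence exactly equal to the threshold is an assumption because the description covers only the "larger" and "smaller" cases.

```python
def detect_front_arrival(coh, threshold):
    """Return VAD_RES for one frame: 1.0 if coh exceeds the threshold, else 0.0.

    ASSUMPTION: a coherence exactly equal to the threshold is treated as
    non-frontal.
    """
    return 1.0 if coh > threshold else 0.0


coh_per_frame = [0.05, 0.42, 0.81, 0.30]   # hypothetical COH(K) values
vad_res = [detect_front_arrival(c, threshold=0.4) for c in coh_per_frame]
```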

The reason why the magnitude of the coherence indicates whether the input signal arrives from the front is briefly explained here.

The coherence COH can be understood as the correlation between the signal arriving from the right and the signal arriving from the left (equation (6) calculates the correlation for one frequency component, and equation (7) averages the correlation values over all frequency components). A small coherence COH therefore means that the correlation between the two directivity signals B1 and B2 is small, and a large COH means that the correlation is large. When the correlation is small, the arrival direction of the input is strongly biased to either the right or the left; that is, the signal arrives from somewhere other than the front. When COH is large, the arrival direction is not biased, so the input signal arrives from the front. The magnitude of the coherence can thus be used to determine whether the arrival direction of the input signal is the front.

Incidentally, at the stage where the calibration gain is being determined, the two input signals still contain the sensitivity difference between the microphones m_1 and m_2, so the calculated coherence COH is, strictly speaking, not exact. However, with a difference on the order of a microphone sensitivity mismatch, the property that COH is large when the input sound arrives from the front and small when it arrives from elsewhere is preserved, so the detection of front-arriving signals is not impaired.

The calibration gain calculation unit 15 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the input signals s1(n) and s2(n) from the detection result variable VAD_RES(K) and the input signal analysis frames FRAME1(K) and FRAME2(K). It supplies CALIB_GAIN_1CH to the calibration gain multiplication unit 16 and CALIB_GAIN_2CH to the calibration gain multiplication unit 17.

The calibration gain multiplication unit 16 multiplies the input signal s1(n) by the calibration gain CALIB_GAIN_1CH and outputs the resulting signal y1(n).

The calibration gain multiplication unit 17 multiplies the input signal s2(n) by the calibration gain CALIB_GAIN_2CH and outputs the resulting signal y2(n).

In the signals y1(n) and y2(n) obtained by multiplying by the calibration gains, the sensitivity difference between the pair of microphones m_1 and m_2 has been corrected (calibrated).

FIG. 4 is a block diagram showing the detailed configuration of the calibration gain calculation unit 15.

In FIG. 4, the calibration gain calculation unit 15 includes a detection result and input signal reception unit 21, a calibration gain calculation execution determination unit 22, a calibration gain calculation execution unit 23, a storage unit 24, and a calibration gain transmission unit 25.

The detection result and input signal reception unit 21 receives the detection result variable VAD_RES(K) and the input signal analysis frames FRAME1(K) and FRAME2(K).

If the detection result variable VAD_RES(K) is 1.0 (that is, if the input sound arrives from approximately the front), the calibration gain calculation execution determination unit 22 controls the calibration gain calculation execution unit 23 so that it calculates the calibration gains. If VAD_RES(K) is 0.0 (that is, if the input sound arrives from a direction other than the front), it causes no calculation to be performed and instead sets the values stored in the storage unit 24 as the calibration gains.

When the calibration gain calculation execution determination unit 22 indicates that the current frame is one for which the calibration gains are to be calculated, the calibration gain calculation execution unit 23 calculates, based on the input signal analysis frames FRAME1(K) and FRAME2(K), the calibration gain CALIB_GAIN_1CH for the input signal s1(n) and the calibration gain CALIB_GAIN_2CH for the input signal s2(n). The calibration gain calculation execution unit 23 of the first embodiment calculates CALIB_GAIN_1CH and CALIB_GAIN_2CH according to equations (8) to (12).

Equation (8) calculates LEVEL_1CH, the average of the absolute values of all elements of the current frame (the K-th frame) of the input signal s1(n) captured by the microphone m_1; this value can be regarded as reflecting the sensitivity of the microphone m_1. Equation (9) likewise calculates LEVEL_2CH, the average of the absolute values of all elements of the current frame (the K-th frame) of the input signal s2(n) captured by the microphone m_2; this value can be regarded as reflecting the sensitivity of the microphone m_2. Alternatively, for example, the sum of the elements of each frame over a predetermined number of frames may be used as the values LEVEL_1CH and LEVEL_2CH reflecting microphone sensitivity, or the average of the absolute values of all elements (signal components) of the latest P (P ≤ K) frames for which the detection result variable VAD_RES(K) was 1.0 may be used. In the latter case, by storing the sum of the absolute values of the elements of the latest P − 1 frames for which VAD_RES(K) was 1.0, LEVEL_1CH and LEVEL_2CH can be calculated easily when the current frames FRAME1(K) and FRAME2(K) are supplied.
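The variant that averages over the latest P frontal frames can be sketched in Python as follows. This is an illustration under assumptions: the class name `FrontFrameLevel` is hypothetical, and it keeps one stored sum per frame in a bounded queue rather than a single running total of the previous P − 1 frames, which is just one convenient way to let the oldest frame drop out.

```python
from collections import deque


class FrontFrameLevel:
    """Mean absolute value over the latest P frames judged to arrive from the front."""

    def __init__(self, num_frames):
        self.sums = deque(maxlen=num_frames)    # per-frame sums of |sample|
        self.counts = deque(maxlen=num_frames)  # per-frame sample counts

    def update(self, frame, vad_res):
        """Add the frame only when vad_res == 1.0; return the level, or None if empty."""
        if vad_res == 1.0:
            self.sums.append(sum(abs(v) for v in frame))
            self.counts.append(len(frame))
        total = sum(self.counts)
        return sum(self.sums) / total if total else None


level_1ch = FrontFrameLevel(num_frames=2)          # P = 2, chosen for the example
history = [
    level_1ch.update([1.0, -1.0], vad_res=1.0),    # first frontal frame
    level_1ch.update([10.0, 10.0], vad_res=0.0),   # non-frontal: ignored
    level_1ch.update([3.0, -3.0], vad_res=1.0),    # two frontal frames held
    level_1ch.update([5.0, 5.0], vad_res=1.0),     # oldest frontal frame drops out
]
```

The same object type would be used for both channels, so that both levels come from the same calculation.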

By calculating the average or the sum of the absolute values of the signal components over a long period as described above, a value reflecting microphone sensitivity can be obtained while suppressing the influence of momentary fluctuations in the input signal. Equations (8) and (9) are examples of formulas for a value reflecting microphone sensitivity, and, as noted above, various other formulas can be applied. However, the formula for the value LEVEL_1CH reflecting the sensitivity of the microphone m_1 and the formula for the value LEVEL_2CH reflecting the sensitivity of the microphone m_2 must be the same.

Equation (10) computes AVE_LEVEL, the average of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2, as the target sensitivity of microphones m_1 and m_2 after calibration. Alternatively, the larger or the smaller of LEVEL_1CH and LEVEL_2CH may be used as the target sensitivity.

As can be seen by moving the denominator LEVEL_1CH on its right-hand side to the left-hand side, equation (11) defines the calibration gain CALIB_GAIN_1CH such that the sensitivity LEVEL_1CH of microphone m_1 multiplied by CALIB_GAIN_1CH equals the target sensitivity AVE_LEVEL. Similarly, as can be seen by moving the denominator LEVEL_2CH on its right-hand side to the left-hand side, equation (12) defines the calibration gain CALIB_GAIN_2CH such that the sensitivity LEVEL_2CH of microphone m_2 multiplied by CALIB_GAIN_2CH equals the target sensitivity AVE_LEVEL.
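The chain from equations (8) through (12), as described in the text, can be sketched in a few lines. This is an illustrative sketch only; the function name `calibration_gains` is an assumption, and frames are assumed to be non-silent (they are used only when a front-arriving signal has been detected), so no guard against a zero level is included.

```python
def calibration_gains(frame1, frame2):
    """Return (CALIB_GAIN_1CH, CALIB_GAIN_2CH) for one pair of frames."""
    # (8), (9): mean absolute value of each channel's current frame
    level_1ch = sum(abs(x) for x in frame1) / len(frame1)
    level_2ch = sum(abs(x) for x in frame2) / len(frame2)
    # (10): target sensitivity is the average of the two levels
    ave_level = (level_1ch + level_2ch) / 2.0
    # (11), (12): gain chosen so that level * gain equals the target
    return ave_level / level_1ch, ave_level / level_2ch
```

For example, if one channel has level 1.0 and the other 0.5, the target is 0.75 and the gains are 0.75 and 1.5, so both calibrated levels become 0.75.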

The storage unit 24 stores the calibration gains CALIB_GAIN_1CH (= INIT_GAIN_1CH) and CALIB_GAIN_2CH (= INIT_GAIN_2CH) that are applied when the calibration gain calculation execution unit 23 does not calculate calibration gains. The stored gains INIT_GAIN_1CH and INIT_GAIN_2CH may be 1.0, a value that applies no calibration, or may be the most recent values calculated by the calibration gain calculation execution unit 23.

The calibration gain transmission unit 25 supplies either the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH calculated by the calibration gain calculation execution unit 23, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the storage unit 24, to the corresponding calibration gain multiplication units 16 and 17, respectively.

(A-2) Operation of the First Embodiment
Next, the operation of the acoustic signal processing apparatus 1 of the first embodiment is described with reference to the drawings, first the overall operation and then the detailed operation of the calibration gain calculation unit 15.

The signals s1(n) and s2(n) input from the pair of microphones m_1 and m_2 are each converted by the FFT unit 10 from the time domain into the frequency-domain signals X1(f, K) and X2(f, K). The first and second directivity forming units 11 and 12 then generate the directional signals B1(f, K) and B2(f, K), each having a blind spot (null) in a predetermined direction. The coherence calculation unit 13 applies the directional signals B1(f, K) and B2(f, K) to equations (6) and (7) and calculates the coherence COH(K).

In the calibration gain calculation unit 15, the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the input signals s1(n) and s2(n) are determined from the detection result storage variable VAD_RES(K) and the input signal analysis frames FRAME1(K) and FRAME2(K). The calibration gain CALIB_GAIN_1CH is supplied to the calibration gain multiplication unit 16, where the input signal s1(n) is multiplied by CALIB_GAIN_1CH, and a signal y1(n) in which the deviation of the sensitivity of microphone m_1 from the average sensitivity has been corrected (calibrated) is output. Likewise, the calibration gain CALIB_GAIN_2CH is supplied to the calibration gain multiplication unit 17, where the input signal s2(n) is multiplied by CALIB_GAIN_2CH, and a signal y2(n) in which the deviation of the sensitivity of microphone m_2 from the average sensitivity has been corrected (calibrated) is output.

Next, the operation of the calibration gain calculation unit 15 is described. FIG. 5 is a flowchart showing the operation of the calibration gain calculation unit 15.

When processing moves to a new frame, the detection result storage variable VAD_RES(K) and the input signal analysis frames FRAME1(K) and FRAME2(K) are taken into the calibration gain calculation unit 15 (step S100). It is then determined whether VAD_RES(K) is 1.0 (step S101).

If VAD_RES(K) is 1.0, the input signal analysis frames FRAME1(K) and FRAME2(K) are used to calculate, in accordance with equations (8) to (12) above, the calibration gain CALIB_GAIN_1CH for the input signal s1(n) and the calibration gain CALIB_GAIN_2CH for the input signal s2(n) (step S102). If the approach of storing the most recent values calculated by the calibration gain calculation execution unit 23 in the storage unit 24 is adopted, the calculated gains CALIB_GAIN_1CH and CALIB_GAIN_2CH are also written to the storage unit 24 in step S102.

If, on the other hand, VAD_RES(K) is 0.0, the values INIT_GAIN_1CH and INIT_GAIN_2CH held in the storage unit 24 are read out as the calibration gain CALIB_GAIN_1CH for the input signal s1(n) and the calibration gain CALIB_GAIN_2CH for the input signal s2(n) for the current frame (step S103).

The calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH obtained by calculation, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the storage unit 24, are then supplied to the corresponding calibration gain multiplication units 16 and 17, respectively (step S104).

Thereafter, the parameter K that identifies the frame being processed is incremented by 1, and processing proceeds to the next frame (step S105).
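Steps S100 to S105 amount to a per-frame choice between freshly calculated gains and stored gains. The sketch below illustrates that flow under stated assumptions: the function name `process_frame`, the dictionary used to stand in for the storage unit 24, and the `store_latest` flag (which models the optional "store the most recent values" approach) are all hypothetical.

```python
def mean_abs(frame):
    return sum(abs(x) for x in frame) / len(frame)


def process_frame(vad_res, frame1, frame2, storage, store_latest=True):
    """One pass of steps S101 to S104; returns the gains handed to the multipliers."""
    if vad_res == 1.0:                                  # S101: front-arriving signal?
        level_1ch, level_2ch = mean_abs(frame1), mean_abs(frame2)
        ave_level = (level_1ch + level_2ch) / 2.0
        gain_1ch = ave_level / level_1ch                # S102: equations (8) to (12)
        gain_2ch = ave_level / level_2ch
        if store_latest:                                # optional write-back to storage
            storage["INIT_GAIN_1CH"] = gain_1ch
            storage["INIT_GAIN_2CH"] = gain_2ch
    else:                                               # S103: fall back to stored gains
        gain_1ch = storage["INIT_GAIN_1CH"]
        gain_2ch = storage["INIT_GAIN_2CH"]
    return gain_1ch, gain_2ch                           # S104: supply to units 16 and 17


storage = {"INIT_GAIN_1CH": 1.0, "INIT_GAIN_2CH": 1.0}  # 1.0 applies no calibration
```

The caller would invoke `process_frame` once per frame and advance K afterwards (step S105).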

(A-3) Effects of the First Embodiment
According to the first embodiment, coherence is used to determine whether the acoustic signal arrives from the front, and the calibration gains are obtained from the input acoustic signal observed while the arrival direction is the front. The microphone sensitivity difference can therefore be calibrated appropriately regardless of the frequency band of the input acoustic signal.

Consequently, applying the acoustic signal processing apparatus of the first embodiment to a communication apparatus such as a video conference system or a mobile phone can be expected to improve call sound quality.

(A-4) Modification of the First Embodiment
The microphone sensitivity difference can be regarded as fixed, so the calibration gains can be regarded as not changing once they have been calculated. In that situation, continuing to update the calibration gains indefinitely wastes computation, and the updating may therefore be stopped partway through.

In this case, it suffices to add the following control to the calibration gain calculation execution determination unit 22 of the calibration gain calculation unit 15: the number of frames for which the calibration gains have been calculated is tracked with a variable COUNTER, and once COUNTER exceeds a predetermined threshold T, the calibration gains are no longer calculated and the storage unit 24 is always consulted instead. FIG. 6 is a flowchart showing the operation of the calibration gain calculation unit 15 in this modification; steps identical or corresponding to those in FIG. 5 are given the same reference numerals. As stated above, the differences from the first embodiment are that a check on the variable COUNTER is added to step S101 and that incrementing COUNTER by 1 is added to step S102.
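The COUNTER-based stop can be illustrated as below. This is a sketch under assumptions rather than the flowchart of FIG. 6 itself: the class name `GatedCalibrator`, the initial COUNTER value of 0, and the choice to always write the latest gains back to storage (so that the frozen gains are the last calculated ones) are not specified in the text.

```python
class GatedCalibrator:
    """Calculate gains only while COUNTER has not exceeded the threshold T."""

    def __init__(self, threshold_t, init_gains=(1.0, 1.0)):
        self.threshold_t = threshold_t
        self.counter = 0            # assumption: COUNTER starts at 0
        self.stored = init_gains    # stands in for the storage unit 24

    def step(self, vad_res, frame1, frame2):
        # Modified S101: require both a front-arriving frame and COUNTER <= T.
        if vad_res == 1.0 and self.counter <= self.threshold_t:
            level_1ch = sum(abs(x) for x in frame1) / len(frame1)
            level_2ch = sum(abs(x) for x in frame2) / len(frame2)
            ave_level = (level_1ch + level_2ch) / 2.0
            self.stored = (ave_level / level_1ch, ave_level / level_2ch)
            self.counter += 1       # modified S102: count calculated frames
        # Otherwise (S103) the stored gains are reused unchanged.
        return self.stored
```

With COUNTER starting at 0, gains are calculated for T + 1 qualifying frames before updates stop; a different starting value or comparison would shift that count by one.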

(B) Second Embodiment
Next, a second embodiment of the acoustic signal processing apparatus, method, and program according to the present invention is described in detail with reference to the drawings.

(B-1) Rationale for the Second Embodiment
When the acoustic signal processing apparatus is applied to a communication apparatus such as a video conference system or a mobile phone, the input signal falls broadly into three types: the voice of the target speaker arriving from the front (target speech), the voices of people other than the target speaker arriving from directions other than the front (interfering speech), and background noise. Because interfering speech is speech, an ordinary voice activity detection method cannot distinguish it from target speech, but a decision based on coherence, whose value varies with arrival direction, can. However, a section in which target speech and interfering speech overlap is judged to be target speech. If the calibration gains are calculated in this state by the method of the first embodiment, they are calculated with the interfering speech component also reflected. As a result, the premise that "the characteristic difference between the input signals s1(n) and s2(n) stems only from the microphone sensitivity difference, and the acoustic characteristic difference is negligible" no longer holds, and the estimation accuracy of the calibration gains decreases. If the interfering sound arrives from close to the front, the loss of accuracy is small, but the further the arrival direction is offset to the right or left, the larger the loss becomes.

The second embodiment is intended to prevent the estimation accuracy of the calibration gains from deteriorating even when interfering speech is present.

The loss of accuracy can be reduced by lengthening the section over which the calibration gains are calculated. In the second embodiment, therefore, the arrival direction of the interfering speech is estimated from the coherence COH in non-target speech sections (interfering speech and background noise sections), so that an appropriate calibration section length T can be set according to the arrival direction.

(B-2) Configuration of the Second Embodiment
The overall configuration of the acoustic signal processing apparatus according to the second embodiment can also be represented by FIG. 1, which was used in the description of the first embodiment. Reference numeral 1A in FIG. 1 denotes the acoustic signal processing apparatus of the second embodiment.

The acoustic signal processing apparatus 1A of the second embodiment differs from the first embodiment in its calibration gain calculation unit (denoted by reference numeral 15A in the second embodiment). The other components, namely the microphones m_1 and m_2, the FFT unit 10, the first directivity forming unit 11, the second directivity forming unit 12, the coherence calculation unit 13, the front arrival signal detection unit 14, and the calibration gain multiplication units 16 and 17, are the same as those of the first embodiment, and their description is omitted.

FIG. 7 is a block diagram showing the detailed configuration of the calibration gain calculation unit 15A of the second embodiment.

In FIG. 7, the calibration gain calculation unit 15A includes a detection result / input signal reception unit 21A, a calibration gain calculation execution determination unit 22A, a calibration gain calculation execution unit 23, a storage unit 24, a calibration gain transmission unit 25, and a calibration gain observation section length control unit 26. The calibration gain calculation execution unit 23, the storage unit 24, and the calibration gain transmission unit 25 are the same as those of the first embodiment, and their description is omitted.

The detection result / input signal reception unit 21A takes in the coherence COH(K) in addition to the detection result storage variable VAD_RES(K) and the input signal analysis frames FRAME1(K) and FRAME2(K).

The calibration gain observation section length control unit 26 determines the calibration gain observation section length T from the coherence in non-target speech sections, as described later, and supplies it to the calibration gain calculation execution determination unit 22A.

The calibration gain calculation execution determination unit 22A sets the calibration gain observation section length T as the threshold and, for frames beyond the observation section length T, performs control so that the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH are read from the storage unit 24 (see FIG. 6).

FIG. 8 is a block diagram showing the detailed configuration of the calibration gain observation section length control unit 26 of the second embodiment.

In FIG. 8, the calibration gain observation section length control unit 26 includes a coherence / detection result reception unit 31, a non-target speech coherence averaging execution determination unit 32, a non-target speech coherence average calculation unit 33, a calibration gain observation section length lookup unit 34, an observation section length storage unit 35, and a calibration gain observation section length transmission unit 36.

The coherence / detection result reception unit 31 acquires the coherence COH(K) and the detection result VAD_RES(K).

The non-target speech coherence averaging execution determination unit 32 checks whether the detection result VAD_RES(K) is 0.0. If VAD_RES(K) is 0.0, the frame is treated as a non-target sound section and the processing by the non-target speech coherence average calculation unit 33 and the calibration gain observation section length lookup unit 34 is executed. If VAD_RES(K) is 1.0, control is performed so that the processing by the non-target speech coherence average calculation unit 33 and the calibration gain observation section length lookup unit 34 is not executed.

If the current frame section is a non-target speech section, the non-target speech coherence average calculation unit 33 recalculates the average coherence over non-target speech sections in accordance with equation (13). If the current frame section is a target speech section, the average coherence over non-target speech sections is not updated and the average from the previous frame is carried over.

AVE_COH(K) = δ × COH(K) + (1.0 − δ) × AVE_COH(K−1)   (0.0 < δ < 1.0) … (13)
Equation (13) computes a weighted average of the coherence COH(K) for the input sound in the current frame section (the K-th frame counted from the start of operation) and the average AVE_COH(K−1) obtained in the immediately preceding frame section. The magnitude of δ adjusts how much the instantaneous value COH(K) contributes to the average. If δ is set to a small value close to 0, the contribution of the instantaneous value is small, so the influence of fluctuations in COH(K) on the average AVE_COH(K) is suppressed. If δ is close to 1, the contribution of the instantaneous value increases, so the averaging effect is weakened.
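The update of equation (13), together with the rule that the average is carried over unchanged during target speech frames, can be sketched as follows. The function name `update_ave_coh` and the default δ of 0.1 are assumptions chosen for illustration; the text only requires 0.0 < δ < 1.0.

```python
def update_ave_coh(prev_ave_coh, coh, vad_res, delta=0.1):
    """Return AVE_COH(K) given AVE_COH(K-1), COH(K) and VAD_RES(K)."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must satisfy 0.0 < delta < 1.0")
    if vad_res == 0.0:
        # Non-target speech frame: weighted average per equation (13).
        return delta * coh + (1.0 - delta) * prev_ave_coh
    # Target speech frame: keep the previous average as is.
    return prev_ave_coh
```

A smaller `delta` makes the average respond more slowly to each new coherence value, which matches the smoothing behaviour described for δ close to 0.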

The observation section length storage unit 35 stores the correspondence between the average coherence in non-target speech sections and the calibration gain observation section length. FIG. 9 is an explanatory diagram showing the structure of the observation section length storage unit 35. In the example of FIG. 9, the observation section length γ is associated with the range in which the average coherence AVE_COH(K) is at least D and less than C, the length β (< γ) with the range in which AVE_COH(K) is at least C and less than B, and the length α (< β) with the range in which AVE_COH(K) is at least B and less than A. In other words, the smaller the average coherence AVE_COH(K) (that is, the further the arrival direction of the interfering sound deviates from the front), the longer the calibration gain observation section length T is made.
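A table of this kind can be looked up with a simple range search. The sketch below is illustrative only: the boundary values standing in for D, C, B, A and the lengths standing in for γ, β, α are made-up numbers, and clamping values that fall outside the range D to A to the nearest entry is an assumption, since the text does not say how such values are handled.

```python
import bisect

# Hypothetical boundaries D < C < B < A and lengths gamma > beta > alpha (in frames).
BOUNDS = [0.0, 0.2, 0.4, 0.6]   # D, C, B, A
LENGTHS = [300, 200, 100]       # gamma for [D, C), beta for [C, B), alpha for [B, A)


def lookup_observation_length(ave_coh):
    """Map AVE_COH(K) to the calibration gain observation section length T."""
    # Index of the range whose lower bound is <= ave_coh (lower bound inclusive).
    i = bisect.bisect_right(BOUNDS, ave_coh) - 1
    # Assumption: out-of-range values are clamped to the first or last range.
    i = min(max(i, 0), len(LENGTHS) - 1)
    return LENGTHS[i]
```

A lower average coherence selects a longer observation section, which mirrors the relationship stated for FIG. 9.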

Because the range of coherence values differs with arrival direction, the average coherence can be associated with the arrival direction. That is, once the average coherence AVE_COH(K) is obtained, the arrival direction can be estimated. Since the arrival direction of the interfering speech is what is of interest here, sections in which the detection result VAD_RES(K) is 0.0, that is, non-target speech sections, are detected and the average coherence in those sections is calculated. The average coherence, a feature directly tied to the arrival direction of the interfering sound, is associated with an observation section length, and an appropriate observation section length is retrieved on the basis of this correspondence.

Incidentally, if the calibration gain observation section length T were fixed at a very long value, the loss of calculation accuracy caused by interfering sound from a direction far from the front could be prevented. On the other hand, when there is no interfering sound, or when the interfering sound arrives from near the front, needless calculation would continue for a long time after the calibration gains had already become appropriate. In the second embodiment, therefore, the calibration gain observation section length T is controlled according to the arrival direction of the interfering sound.

The calibration gain observation section length lookup unit 34 consults the observation section length storage unit 35, using as the key the average coherence AVE_COH(K) in non-target speech sections calculated by the non-target speech coherence average calculation unit 33, and obtains the calibration gain observation section length T.

The calibration gain observation section length transmission unit 36 supplies the calibration gain observation section length T to the calibration gain calculation execution determination unit 22A.

(B-3) Operation of the Second Embodiment
Next, the operation of the acoustic signal processing apparatus 1A of the second embodiment is described with reference to the drawings, first the detailed operation of the calibration gain calculation unit 15A and then the detailed operation of the calibration gain observation section length control unit 26.

FIG. 10 is a flowchart showing the operation of the calibration gain calculation unit 15A of the second embodiment; steps identical or corresponding to those in FIG. 6 are given the same reference numerals.

When processing moves to a new frame, the detection result storage variable VAD_RES(K), the input signal analysis frames FRAME1(K) and FRAME2(K), and the coherence COH(K) are taken into the calibration gain calculation unit 15A (step S100).

Next, the calibration gain observation section length T is determined and set (step S106). It is then determined whether VAD_RES(K) is 1.0 and the variable COUNTER is less than or equal to the threshold (calibration gain observation section length) T (step S101). The threshold T used here is the value set in step S106.

If VAD_RES(K) is 1.0 and the variable COUNTER is less than or equal to the threshold (calibration gain observation section length) T, the input signal analysis frames FRAME1(K) and FRAME2(K) are used to calculate, in accordance with equations (8) to (12) above, the calibration gain CALIB_GAIN_1CH for the input signal s1(n) and the calibration gain CALIB_GAIN_2CH for the input signal s2(n), and the variable COUNTER is incremented by 1 (step S102).

If, on the other hand, either VAD_RES(K) is 0.0 or the variable COUNTER exceeds the threshold T, the values INIT_GAIN_1CH and INIT_GAIN_2CH held in the storage unit 24 are read out as the calibration gain CALIB_GAIN_1CH for the input signal s1(n) and the calibration gain CALIB_GAIN_2CH for the input signal s2(n) for the current frame (step S103).

Next, the detailed operation of the calibration gain observation section length control unit 26 is described with reference to the flowchart of FIG. 11.

When processing moves to a new frame, the detection result storage variable VAD_RES(K) and the coherence COH(K) are taken into the calibration gain observation section length control unit 26 (step S200). It is then determined whether VAD_RES(K) is 0.0 (step S201).

If VAD_RES(K) is 0.0, the average coherence AVE_COH(K) in non-target speech sections is calculated in accordance with equation (13) above using the coherence COH(K) of the current frame (step S202). If VAD_RES(K) is 1.0, the average coherence AVE_COH(K−1) of the immediately preceding frame is used unchanged as the average coherence AVE_COH(K) of the current frame (step S203).

The observation section length storage unit 35 is then consulted with the average coherence AVE_COH(K) of the current frame as the key, and the calibration gain observation section length T is obtained (step S204).

Thereafter, the parameter K that identifies the frame being processed is incremented by 1, and processing proceeds to the next frame (step S205).

The above describes the case in which the calibration gain observation section length T is updated every frame, but updating every frame is not essential. This function may be stopped once T has been determined, or the determination of T may be resumed each time the stopped period reaches a predetermined length (for example, every 300 frames). The determination of T may also be made adjustable at will by the user of the apparatus.

(B-4) Effects of the Second Embodiment
The second embodiment provides the following effect in addition to the same effects as the first embodiment. Because the calibration gain observation section length can be set to an optimal value according to the arrival direction, the calculation error of the calibration gains can be reduced even when interfering sound is superimposed.

(C) Other Embodiments
In the above embodiments, uncalibrated input signals are fed to the FFT unit 10. Alternatively, the input signals calibrated with the calculated calibration gains may be fed to the FFT unit 10 (feedback calibration), so that the calibration gains are calculated from the calibrated input signals obtained by multiplying s1(n) and s2(n) by their respective calibration gains. This improves the accuracy of the calibration gains as the number of gain calculations increases. In this case, because information from past frames is reflected through the feedback, it suffices to apply equations (8) and (9) above and compute the mean absolute values (LEVEL_1CH, LEVEL_2CH) of all components of the current frame.

In the second embodiment (and in the modification of the first embodiment), the end of calibration gain calculation is decided by a predetermined calibration gain observation section length T. The decision is not limited to this; for example, calculation of the calibration gains may be ended when their values have converged. The values may be regarded as converged when the difference between the calibration gains before and after an update falls to or below a fixed value.
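The convergence criterion described above can be expressed as a small check. This is a sketch under assumptions: the function name `gains_converged`, the tolerance of 1e-3, and the requirement that both channels meet the tolerance at the same time are illustrative choices not stated in the text.

```python
def gains_converged(prev_gains, new_gains, tol=1e-3):
    """True when every channel's gain changed by at most tol in the last update."""
    return all(abs(new - prev) <= tol for prev, new in zip(prev_gains, new_gains))
```

A caller would compare the gains before and after each update and stop calculating once the check returns true, reading the stored gains from then on.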

In the description of the above embodiments, the components other than the microphones m_1 and m_2 were described as being provided solely for calibrating the microphone sensitivity difference, but those components may also be used for other purposes. For example, the calibration gain multiplication units 16 and 17 may double as multiplication units for a voice switch, with the product of the voice switch coefficient and the calibration gain supplied to the shared multiplication units.
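As a minimal sketch of the shared multiplication unit mentioned above, the two factors can be combined once and then applied to every sample. The function name `shared_multiplier` and its parameters are hypothetical, and the frame is assumed to be a plain list of samples.

```python
def shared_multiplier(frame, calibration_gain, voice_switch_coefficient=1.0):
    """Apply the calibration gain and a voice switch coefficient in one multiply.

    frame: list of samples for one channel (assumed representation).
    """
    # Combine the two factors first so each sample is multiplied only once.
    combined = calibration_gain * voice_switch_coefficient
    return [combined * x for x in frame]
```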

In the second embodiment, the calibration gain observation section length is obtained using a storage unit with a table structure. However, the method of obtaining the calibration gain observation section length corresponding to the coherence average value is not limited to a conversion table; for example, a conversion function may be used.
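The difference between the two approaches can be illustrated with a small sketch. All numbers below (stage boundaries, lengths in frames, the linear form of the function) are hypothetical placeholders, since no concrete values are given here; whether the length should grow or shrink with the coherence average is likewise not stated here, and the sketch arbitrarily lets it grow.

```python
import bisect

# Hypothetical stage boundaries for the coherence average and the
# observation section length (in frames) assigned to each stage.
STAGE_UPPER_BOUNDS = [0.2, 0.4, 0.6]
STAGE_LENGTHS = [100, 200, 300, 400]


def length_from_table(coherence_average):
    """Table lookup: find the stage the average belongs to, return its length."""
    stage = bisect.bisect_right(STAGE_UPPER_BOUNDS, coherence_average)
    return STAGE_LENGTHS[stage]


def length_from_function(coherence_average, low=100.0, high=400.0):
    """Conversion function: a linear map, chosen only for illustration."""
    clipped = min(max(coherence_average, 0.0), 1.0)
    return round(low + (high - low) * clipped)
```

A table gives stepwise lengths that are easy to tune per stage, whereas a function yields a continuous mapping without storing the stages.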

In each of the above embodiments, processing performed on frequency-domain signals may, where possible, be performed on time-domain signals instead, and conversely, processing performed on time-domain signals may, where possible, be performed on frequency-domain signals.

In each of the above embodiments, the signals captured by a pair of microphones are processed immediately, but the acoustic signals to be processed by the present invention are not limited to this. For example, the present invention can be applied when processing a pair of acoustic signals read from a recording medium, and also when processing a pair of acoustic signals transmitted from a counterpart device.

In each of the above embodiments, an apparatus that processes two-channel acoustic signals from two microphones is shown, but the technical idea of the present invention can also be applied to an apparatus that processes acoustic signals with a larger number of channels. In that case, signals of two channels selected from the multiple channels may be input to the FFT unit 10, the calibration gain may be calculated for each channel, and the target sensitivity may be set to the average of the sensitivities of the channels.
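The multi-channel case can be sketched as follows, using the mean of absolute values as the sensitivity index and the mean of the indices as the target sensitivity, as set out in the claims. The function names and the list-of-lists representation of the channels are assumptions for illustration, and the sketch presumes every channel contains some non-zero signal.

```python
def sensitivity_index(components):
    """Mean of the absolute values of a channel's signal components."""
    return sum(abs(x) for x in components) / len(components)


def calibration_gains(channels):
    """Return one calibration gain per channel.

    channels: list of per-channel component lists taken from sections
    judged to arrive from the front (assumed to be selected beforehand).
    """
    indices = [sensitivity_index(ch) for ch in channels]
    # Target sensitivity: average of the per-channel index values.
    target = sum(indices) / len(indices)
    # Gain: target divided by each channel's own index
    # (a silent channel with index 0 would need separate handling).
    return [target / idx for idx in indices]
```

After multiplying each channel by its gain, every channel has the same index value, namely the target sensitivity.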

DESCRIPTION OF SYMBOLS
1, 1A: acoustic signal processing apparatus; m_1, m_2: microphones; 10: FFT unit; 11: first directivity forming unit; 12: second directivity forming unit; 13: coherence calculation unit; 14: front arrival signal detection unit; 15, 15A: calibration gain calculation unit; 16, 17: calibration gain multiplication units; 21, 21A: detection result / input signal reception unit; 22, 22A: calibration gain calculation execution determination unit; 23: calibration gain calculation execution unit; 24: storage unit; 25: calibration gain transmission unit; 26: calibration gain observation section length control unit.

Claims

An acoustic signal processing apparatus that calibrates a difference in microphone sensitivity among a plurality of input acoustic signals, comprising:
a first directivity forming unit that forms a first directivity signal having a directivity characteristic with a blind spot in a first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a second directivity forming unit that forms a second directivity signal having a directivity characteristic with a blind spot in a second predetermined direction different from the first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a coherence calculation unit that obtains coherence using the first and second directivity signals;
a front arrival signal detection unit that detects a signal section arriving from the front based on the coherence;
a calibration gain calculation unit that, using each input acoustic signal in the signal section arriving from the front, performs the same calculation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated plurality of index values of microphone sensitivity, and calculates a calibration gain for each input acoustic signal from the index value of microphone sensitivity of that input acoustic signal and the target sensitivity; and
a plurality of calibration gain multiplication units that each calibrate the corresponding input acoustic signal with the obtained calibration gain.

The acoustic signal processing apparatus according to claim 1, wherein the calibration gain calculation unit comprises:
a calibration gain observation section length control unit that determines an observation section length of the calibration gain based on the coherence of sections that the front arrival signal detection unit has detected as not being signal sections arriving from the front;
a calibration gain calculation execution determination unit that performs control so that the calibration gain calculation is executed in signal sections arriving from the front within the observation section length of the calibration gain, and is not executed in signal sections beyond the observation section length of the calibration gain or in signal sections arriving from other than the front within the observation section length of the calibration gain;
a calibration gain storage unit that stores the calibration gain; and
a calibration gain calculation execution unit that, in signal sections in which the calibration gain calculation is executed, calculates and outputs the calibration gain and stores the calculated calibration gain in the calibration gain storage unit, and, in signal sections in which the calibration gain calculation is not executed, reads the calibration gain from the calibration gain storage unit and outputs it.

The acoustic signal processing apparatus according to claim 1, wherein the calibration gain observation section length control unit comprises:
a coherence average calculation unit that obtains an average value of the coherence of sections that the front arrival signal detection unit has detected as not being signal sections arriving from the front;
an observation section length storage unit that stores a correspondence between stages of the average value of coherence and observation section lengths of the calibration gain; and
a calibration gain observation section length collation unit that acquires, from the observation section length storage unit, the observation section length of the calibration gain for the stage to which the average value of coherence obtained by the coherence average calculation unit belongs.

The acoustic signal processing apparatus according to claim 1, wherein the calibration gain calculation unit calculates, for each of the input acoustic signals, an average value of the absolute values of a plurality of signal components in that input acoustic signal as the index value of microphone sensitivity for that input acoustic signal.

The acoustic signal processing apparatus according to claim 1, wherein the calibration gain calculation unit determines the average value of the calculated plurality of index values of microphone sensitivity as the target sensitivity, and calculates the calibration gain for each of the input acoustic signals by dividing the determined target sensitivity by the index value of microphone sensitivity for that input acoustic signal.

An acoustic signal processing method for calibrating a difference in microphone sensitivity among a plurality of input acoustic signals, wherein:
a first directivity forming unit forms a first directivity signal having a directivity characteristic with a blind spot in a first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a second directivity forming unit forms a second directivity signal having a directivity characteristic with a blind spot in a second predetermined direction different from the first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a coherence calculation unit obtains coherence using the first and second directivity signals;
a front arrival signal detection unit detects a signal section arriving from the front based on the coherence;
a calibration gain calculation unit, using each input acoustic signal in the signal section arriving from the front, performs the same calculation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated plurality of index values of microphone sensitivity, and calculates a calibration gain for each input acoustic signal from the index value of microphone sensitivity of that input acoustic signal and the target sensitivity; and
a plurality of calibration gain multiplication units each calibrate the corresponding input acoustic signal with the calibration gain supplied to that calibration gain multiplication unit.

An acoustic signal processing program that causes a computer mounted on an acoustic signal processing apparatus, which calibrates a difference in microphone sensitivity among a plurality of input acoustic signals, to function as:
a first directivity forming unit that forms a first directivity signal having a directivity characteristic with a blind spot in a first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a second directivity forming unit that forms a second directivity signal having a directivity characteristic with a blind spot in a second predetermined direction different from the first predetermined direction by performing a delay subtraction process on the input acoustic signals;
a coherence calculation unit that obtains coherence using the first and second directivity signals;
a front arrival signal detection unit that detects a signal section arriving from the front based on the coherence;
a calibration gain calculation unit that, using each input acoustic signal in the signal section arriving from the front, performs the same calculation on each input acoustic signal to calculate an index value of microphone sensitivity, determines a target sensitivity from the calculated plurality of index values of microphone sensitivity, and calculates a calibration gain for each input acoustic signal from the index value of microphone sensitivity of that input acoustic signal and the target sensitivity; and
a plurality of calibration gain multiplication units that each calibrate the corresponding input acoustic signal with the obtained calibration gain.