JP2018032931A - Acoustic signal processing device, program and method

Info

Publication number: JP2018032931A
Application number: JP2016162712A
Authority: JP
Inventors: 克之高橋; Katsuyuki Takahashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2016-08-23
Filing date: 2016-08-23
Publication date: 2018-03-01
Anticipated expiration: 2036-08-23
Also published as: JP6711205B2

Abstract

PROBLEM TO BE SOLVED: To provide a sensitivity calibration gain calculation method that can calculate a sensitivity calibration gain accurately, and for which a threshold is easier to set, even when an interfering sound is present. SOLUTION: An acoustic signal processing device 10 comprises: a front suppression signal generation unit 12 that generates a front suppression signal having a blind spot toward the front, based on the difference between multiple frequency-domain signals obtained by converting multiple input signals s1(n) and s2(n), obtained from multiple microphones m_1 and m_2, from the time domain to the frequency domain; a coherence calculation unit 13 that calculates coherence based on signals obtained from the multiple input signals; a feature amount calculation unit 14 that calculates a feature amount representing the relationship between the front suppression signal and the coherence; a calibration gain calculation unit 15 that detects, based on the feature amount, a target sound section that is not influenced by an interfering sound, and calculates a calibration gain for each input signal in that target sound section; and calibration gain multiplication units 16 and 17 that calibrate the corresponding input signals with the calibration gains. SELECTED DRAWING: Figure 1

Description

The present invention relates to an acoustic signal processing device, program, and method, and can be applied, for example, to acoustic signal processing used in communication devices or communication software for telephones, videophones, and the like, or in preprocessing for speech recognition.

In recent years, devices equipped with various speech processing functions, such as voice calling and speech recognition, have become widespread; smartphones and car navigation systems are examples. As these devices have spread, speech processing functions are increasingly used in harsher noise environments than before, such as crowded streets and moving vehicles. Demand is therefore growing for signal processing technology that can maintain call quality and speech recognition performance even in noisy environments.

Patent Document 1: JP 2014-68052 A

Non-Patent Document 1: Kazuyuki Hiraoka and Gen Hori, "Probability Statistics for Programming" (プログラミングのための確率統計), Ohmsha, Ltd., October 23, 2009, pp. 178-179

In recent years, acoustic signal processing techniques using multi-channel microphones have been put into practice. However, even microphones of the same model number differ in sensitivity, and acoustic features cannot be calculated accurately unless that sensitivity difference is calibrated. Until now, this has been handled either by measuring the microphone sensitivities in advance and setting correction gains according to the difference, or by comparing the input level of each channel and automatically setting correction gains that bring the levels to their average value. The former is laborious, however, and the latter removes not only the microphone sensitivity difference but also genuine differences between the acquired input signals, so the accuracy of the acoustic features calculated in later stages is not guaranteed.

One way to mitigate this problem is to compare input levels, and calculate the calibration gain, only in sections of the input signal where the signal component arrives from the front of the microphones. The premise is that, for a signal arriving from the front, the distance from the sound source to each microphone is equal, so the acoustic difference between the signal components reaching the microphones is negligible and the only remaining difference can be expected to be the microphone sensitivity. One solution built on this premise is the technique described in Patent Document 1. It exploits the fact that a feature called coherence becomes larger or smaller depending on whether the target speaker's voice arrives from the front, and calculates the microphone sensitivity difference calibration gain in signal sections where speech arrives from the front. Because coherence still varies in this way even when the microphones differ in sensitivity, the sensitivity difference can be calibrated with this technique. (Note: for the coherence calculation method, see Equation 7 of Patent Document 1.)

However, with the method of Patent Document 1, coherence also takes a large value when another speaker's voice (an interfering sound) arrives from the left or right at the same time as the target speech arrives from the front of the microphone array, so speech components that do not arrive from the front are also reflected in the calibration gain. In addition, because the sensitivity difference between microphones varies randomly from one microphone array to another, it is difficult to optimize the threshold used to detect signal sections arriving from the front, and target speech sections may be misjudged.

Therefore, to address these two problems, there is a need for a sensitivity calibration gain calculation method that can calculate the sensitivity calibration gain accurately even when an interfering sound is present, and for which the threshold is easier to set.

To solve the above problems, an acoustic signal processing device according to a first aspect of the present invention comprises: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency-domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each input signal in that target sound section; and (5) a calibration unit that calibrates each corresponding input signal with each calibration gain.

An acoustic signal processing program according to a second aspect of the present invention causes a computer to function as: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency-domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each input signal in that target sound section; and (5) a calibration unit that calibrates each corresponding input signal with each calibration gain.

In an acoustic signal processing method according to a third aspect of the present invention: (1) a front suppression signal generation unit generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency-domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each input signal in that target sound section; and (5) a calibration unit calibrates each corresponding input signal with each calibration gain.

According to the present invention, the sensitivity calibration gain can be calculated accurately even when an interfering sound is present, and the threshold can be set more easily.

FIG. 1 is a block diagram showing the overall configuration of the acoustic signal processing device according to the embodiment.
FIG. 2 is an explanatory diagram showing the directivity characteristic formed by the front suppression signal generation unit according to the embodiment.
FIG. 3 is a block diagram showing the configuration of the correlation calculation unit according to the embodiment.
FIG. 4 is a block diagram showing the configuration of the calibration gain calculation unit according to the embodiment.
FIG. 5 is a flowchart showing the processing operation of the calibration gain calculation unit according to the embodiment.

(A) Main Embodiment
Hereinafter, embodiments of the acoustic signal processing device, program, and method according to the present invention are described in detail with reference to the drawings.

(A-1) Configuration of the Embodiment
FIG. 1 is a block diagram showing the overall configuration of the acoustic signal processing device 10 according to this embodiment.

In FIG. 1, the acoustic signal processing device 10 includes a plurality of microphones m_1 and m_2 (FIG. 1 illustrates the case of two), an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 14, a calibration gain calculation unit 15, a first calibration gain multiplication unit 16, and a second calibration gain multiplication unit 17.

Note that the "feature amount calculation unit" recited in the claims includes the correlation calculation unit 14, the "calibration gain calculation unit" includes the calibration gain calculation unit 15, and the "calibration unit" includes the first calibration gain multiplication unit 16 and the second calibration gain multiplication unit 17.

In the acoustic signal processing device 10 illustrated in FIG. 1, the components other than the microphones m_1 and m_2 can be implemented as software (an acoustic signal processing program) executed by a CPU, and the functions of that program can likewise be represented by FIG. 1.

The microphones m_1 and m_2 are arranged a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. The acoustic signals (input signals) captured by the microphones m_1 and m_2 are converted by analog-to-digital (A/D) converters (not shown), and the resulting input signals s1(n) and s2(n) are each supplied to the FFT unit 11, the calibration gain calculation unit 15, and the first and second calibration gain multiplication units 16 and 17. Here, n is an index representing the input order of the samples and is a positive integer. In this text, a smaller n denotes an older input sample and a larger n a newer one.

The FFT unit 11 receives the input signals s1(n) and s2(n) from the microphones m_1 and m_2 and applies a fast Fourier transform (or a discrete Fourier transform) to them, thereby converting s1(n) and s2(n) from the time domain to the frequency domain. To perform the fast Fourier transform, the FFT unit 11 forms analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples (N is an arbitrary integer), from the input signals s1(n) and s2(n).

Equation (1) illustrates how FRAME1 is formed from the input signal s1. In equation (1), K is an index representing the order of the frames and is a positive integer. In the following, a smaller K denotes an older analysis frame and a larger K a newer one. In the description of the operation below, unless otherwise noted, the index K denotes the latest analysis frame, that is, the frame currently being analyzed.
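As an illustration only, the framing step can be sketched in Python. The sketch assumes consecutive, non-overlapping frames of N samples; the function name make_frames and the 0-based list index are assumptions made for the example and are not taken from equation (1), which is not reproduced in this text.

```python
def make_frames(samples, frame_len):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Assumption for illustration: no overlap between frames. Trailing
    samples that do not fill a whole frame are dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


s1 = [0.1 * n for n in range(10)]        # hypothetical input samples s1(n)
frames1 = make_frames(s1, frame_len=4)   # frames1[0] plays the role of FRAME1(1)
```

A later list entry corresponds to a larger K, that is, to a newer analysis frame.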

For each analysis frame, the FFT unit 11 applies the fast Fourier transform to obtain the frequency-domain signal X1(f, K) from the analysis frame FRAME1(K) formed from the input signal s1, and the frequency-domain signal X2(f, K) from the analysis frame FRAME2(K) formed from the input signal s2. The FFT unit 11 supplies X1(f, K) and X2(f, K) to both the front suppression signal generation unit 12 and the coherence calculation unit 13.

Here, f is an index representing frequency. The frequency-domain signals X1(f, K) and X2(f, K) are not single values; as in equation (2), each consists of m components (spectral components) at the frequencies f1 to fm, where m is an arbitrary integer.
X1(f, K) = {X1(f1, K), X1(f2, K), ..., X1(fm, K)} ... (2)

In equation (2), X1(f, K) is a complex number consisting of a real part and an imaginary part. The same holds for X2(f, K) and for the front suppression signal N(f, K) produced by the front suppression signal generation unit 12 described below.
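To make the complex-valued spectrum concrete, the following sketch transforms one analysis frame with a naive discrete Fourier transform. It is an illustration, not the device's implementation: a real FFT unit would use a fast algorithm, which yields the same values. The function name dft is an assumption.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one analysis frame.

    Returns one complex value per frequency bin.
    """
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(frame))
        for f in range(n)
    ]


# Hypothetical 4-sample frame: one period of a cosine, so the energy sits in bin 1
# (and in its mirror image, bin 3).
X1 = dft([1.0, 0.0, -1.0, 0.0])
```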

The front suppression signal generation unit 12 processes the signals from the FFT unit 11 so as to suppress, for each frequency component, the signal component arriving from the front. In other words, it functions as a directional filter that suppresses the front-direction component.

For example, as shown in FIG. 2, the front suppression signal generation unit 12 forms a directional filter that suppresses the front-direction component of the signals from the FFT unit 11, using a bidirectional (figure-eight) filter with a blind spot toward the front.

Specifically, based on the signals X1(f, K) and X2(f, K) from the FFT unit 11, the front suppression signal generation unit 12 performs the calculation of equation (3) to generate the front suppression signal N(f, K) for each frequency component. The calculation of equation (3) corresponds to forming the figure-eight bidirectional filter with a blind spot toward the front shown in FIG. 2.
N(f, K) = X1(f, K) - X2(f, K) ... (3)
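A minimal sketch of equation (3) follows. The spectra and the 20 % sensitivity mismatch are hypothetical values chosen for illustration. With equal sensitivities a frontal source cancels completely; with a mismatch, a residual proportional to the target component remains.

```python
def front_suppression(X1, X2):
    """Per-bin difference N(f, K) = X1(f, K) - X2(f, K), following equation (3)."""
    return [a - b for a, b in zip(X1, X2)]


target = [1 + 2j, 0.5 - 1j]   # hypothetical spectrum of a frontal source

# Equal sensitivities: the frontal source reaches both microphones identically
# and cancels in the difference.
n_matched = front_suppression(target, target)

# Microphone 2 assumed 20 % less sensitive: the residual is 0.2 times the target.
n_mismatched = front_suppression(target, [0.8 * v for v in target])
```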

In this way, the front suppression signal generation unit 12 obtains each frequency component at the frequencies f1 to fm (the power of one frame in each frequency band).

The front suppression signal generation unit 12 also calculates, according to equation (4), the average front suppression signal AVE_N(K) by averaging the front suppression signal N(f, K) over all frequencies f1 to fm.
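Equation (4) is not reproduced in this text. The sketch below assumes that the average is taken over the magnitudes |N(f, K)|, since N(f, K) is complex and a single level per frame is needed; the patent may average a different quantity, such as power. The function name is hypothetical.

```python
def average_front_suppression(N):
    """Mean magnitude of the front suppression spectrum over all m bins.

    Assumption for illustration: the magnitude |N(f, K)| is averaged.
    """
    return sum(abs(v) for v in N) / len(N)


# Hypothetical front suppression spectrum with magnitudes 5, 0, 5 and 12.
ave_n = average_front_suppression([3 + 4j, 0j, -5 + 0j, 12j])
```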

The coherence calculation unit 13 forms signals with strong directivity in specific directions from the frequency-domain signals X1(f, K) and X2(f, K) supplied by the FFT unit 11, and calculates the coherence COH(K).

The calculation of the coherence COH(K) in the coherence calculation unit 13 is described next.

From the frequency-domain signals X1(f, K) and X2(f, K), the coherence calculation unit 13 forms a signal B1(f, K) processed by a filter with strong directivity in a first direction (for example, to the left) and a signal B2(f, K) processed by a filter with strong directivity in a second direction (for example, to the right). Existing methods can be used to form the directional signals B1(f) and B2(f); in this example, equation (5) is applied to form the signal B1 with strong directivity in the first direction, and equation (6) is applied to form the signal B2 with strong directivity in the second direction.

In equations (5) and (6), S is the sampling frequency, N is the FFT analysis frame length, τ is the difference in sound arrival time between the microphones m_1 and m_2, i is the imaginary unit, and f is the frequency.
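Equations (5) and (6) are not reproduced in this text. The sketch below assumes a delay-and-subtract form that uses exactly the symbols listed above, B1 = X1 - X2·exp(-i2πfSτ/N) and B2 = X2 - X1·exp(-i2πfSτ/N), with f taken as the bin index; the patent's actual equations may differ in detail. The function and parameter names are hypothetical.

```python
import cmath


def directional_signals(X1, X2, fs, frame_len, tau):
    """Form two directional signals by delay-and-subtract (assumed form).

    fs: sampling frequency S, frame_len: FFT frame length N,
    tau: inter-microphone arrival-time difference in seconds.
    """
    B1, B2 = [], []
    for f, (a, b) in enumerate(zip(X1, X2)):
        delay = cmath.exp(-2j * cmath.pi * f * fs * tau / frame_len)
        B1.append(a - b * delay)   # strong directivity toward one side
        B2.append(b - a * delay)   # mirror image toward the other side
    return B1, B2


# A frontal source gives identical spectra on both channels, so the two
# directional signals come out with equal magnitudes in every bin.
X = [1 + 0j, 1 + 0j, 2j, -1 + 0j]
B1, B2 = directional_signals(X, X, fs=16000, frame_len=4, tau=1 / 16000)
```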

Next, the coherence calculation unit 13 applies the operations of equations (7) and (8) to the signals B1(f) and B2(f) obtained above to obtain the coherence COH(K). In equation (7), B2(f, K)^* denotes the complex conjugate of B2(f, K).

coef(f, K) represents the coherence of the component at an arbitrary frequency f (any of the frequencies f1 to fm) in the frame with an arbitrary index K (analysis frames FRAME1(K) and FRAME2(K)).
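Equations (7) and (8) are not reproduced in this text. As a hedged sketch, the code below assumes the per-bin form coef = |B1·B2^*| / (0.5·(|B1|² + |B2|²)) and takes COH as the mean of coef over all bins; the small constant eps is an addition to avoid division by zero and is not part of the source. With this form, coef reaches 1 when the two directional signals have equal magnitude and falls toward 0 as one of them dominates.

```python
def coherence(B1, B2, eps=1e-12):
    """Frame coherence as the mean of per-bin coefficients (assumed form)."""
    coefs = []
    for b1, b2 in zip(B1, B2):
        num = abs(b1 * b2.conjugate())                # uses the complex conjugate of B2
        den = 0.5 * (abs(b1) ** 2 + abs(b2) ** 2)
        coefs.append(num / (den + eps))               # eps guards silent bins
    return sum(coefs) / len(coefs)


coh_balanced = coherence([1 + 1j, 2j], [1 + 1j, 2j])   # equal magnitudes, close to 1
coh_one_sided = coherence([1 + 0j], [0j])              # one side only, equals 0
```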

When obtaining coef(f, K), the directivity directions of the signals B1(f) and B2(f) may each be any direction other than the front, provided that the two directions differ from each other. The method of calculating coef(f, K) is also not limited to the one above; for example, the calculation method described in Patent Document 1 can be applied.

The correlation calculation unit 14 obtains the average front suppression signal AVE_N(K) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between AVE_N(K) and COH(K).

The significance of having the correlation calculation unit 14 calculate the correlation coefficient between the coherence and the front suppression signal (average front suppression signal), which has directivity in directions other than the front, is explained below.

Here it is assumed that a sound source emitting the target sound is located in front of the microphones m_1 and m_2, and that interfering sounds arrive from directions other than the front (for example, from the left or right of the microphones m_1 and m_2).

For example, when no interfering sound is present and the target sound is present, the front suppression signal takes a value proportional to the magnitude of the target sound component. However, as FIG. 2 shows, the gain toward the front is smaller than the gain toward the sides, so the value is smaller than when an interfering sound is present.

The coherence COH(K) is a feature closely related to the arrival direction of the input signal, and can be regarded as the correlation between two signal components. This is because equation (7) calculates the correlation for a given frequency component, and equation (8) averages the correlation values over all frequency components. A small COH(K) therefore means that the correlation between the two signal components is small; conversely, a large COH(K) means that the correlation is large. An input signal with a small COH(K) has an arrival direction strongly biased to either the right or the left, and can be said to arrive from a direction other than the front. An input signal with a large COH(K) has little bias in arrival direction and can be said to arrive from the front.

Accordingly, when no interfering sound is present and the target sound is present, the coherence COH(K) takes a large value; when an interfering sound is present together with the target sound, the coherence COH(K) takes a small value.

Organizing the above behavior in terms of whether an interfering sound is present gives the following relationships.
・When no interfering sound is present and the target sound is present, the coherence COH(K) takes a large value and the front suppression signal takes a value proportional to the magnitude of the target sound component.
・When an interfering sound is present, the coherence COH(K) takes a small value and the front suppression signal takes a large value.

Given this behavior, introducing the correlation coefficient between the front suppression signal and the coherence COH(K) leads to the following.
・When no interfering sound is present, the correlation coefficient is positive.
・When an interfering sound is present, the correlation coefficient is negative.
Therefore, the presence or absence of an interfering sound can be determined simply by observing the sign of the correlation coefficient between the front suppression signal and the coherence. Using this behavior, a section in which this correlation coefficient is positive can be judged to contain only the target sound from the front, so the calibration gain for the sensitivity difference between the microphones m_1 and m_2 can be calculated without being affected by interfering sounds. Moreover, because the target speech section can be detected just by observing the sign of the correlation coefficient, threshold setting becomes easier than in the prior art.

The calculation of the correlation coefficient between the front suppression signal and the coherence in the correlation calculation unit 14 is described in detail below with reference to the drawings.

FIG. 3 is a block diagram showing the configuration of the correlation calculation unit 14 according to the embodiment.

In FIG. 3, the correlation calculation unit 14 includes a front suppression signal / coherence acquisition unit 31, a correlation coefficient calculation unit 32, and a correlation coefficient output unit 33.

The front suppression signal / coherence acquisition unit 31 obtains the average front suppression signal AVE_N(K) and the coherence COH(K), and the correlation coefficient calculation unit 32 calculates the correlation coefficient cor(K) from AVE_N(K) and COH(K). The correlation coefficient output unit 33 then outputs the calculated correlation coefficient cor(K) to the calibration gain calculation unit 15.

The method of calculating the correlation coefficient cor(K) is not limited; for example, the calculation method described in Non-Patent Document 1 can be applied. For example, the correlation coefficient cor(K) is obtained for each frame using equation (9). In equation (9), Cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K), σAVE_N(K) denotes the standard deviation of AVE_N(K), and σCOH(K) denotes the standard deviation of COH(K). The correlation coefficient cor(K) obtained in this way takes a value from -1.0 to 1.0.
cor(K) = Cov[AVE_N(K), COH(K)] / (σAVE_N(K) × σCOH(K)) ... (9)
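The correlation coefficient described here is the ratio of a covariance to a product of standard deviations, which can be sketched as follows. The text does not state over which frames the statistics are taken; the sketch assumes a window of recent frames passed in as two lists, and it returns 0.0 when either series is constant. Both choices, and the function name, are assumptions for illustration.

```python
import math


def correlation_coefficient(ave_n_hist, coh_hist):
    """Covariance divided by the product of standard deviations.

    ave_n_hist, coh_hist: AVE_N and COH values over the same (assumed)
    window of recent frames.
    """
    n = len(ave_n_hist)
    mean_a = sum(ave_n_hist) / n
    mean_c = sum(coh_hist) / n
    cov = sum((a - mean_a) * (c - mean_c) for a, c in zip(ave_n_hist, coh_hist)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in ave_n_hist) / n)
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in coh_hist) / n)
    if sd_a == 0.0 or sd_c == 0.0:
        return 0.0   # assumption: treat a constant series as uncorrelated
    return cov / (sd_a * sd_c)


cor_rising = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])     # both rise together
cor_opposite = correlation_coefficient([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])   # one rises, one falls
```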

The calibration gain calculation unit 15 obtains the correlation coefficient cor(K) from the correlation calculation unit 14, observes its sign, and calculates the calibration gains for the microphones m_1 and m_2 using only the input signals in sections where cor(K) is positive.

FIG. 4 is a block diagram illustrating the configuration of the calibration gain calculation unit 15 according to the embodiment.

In FIG. 4, the calibration gain calculation unit 15 includes a correlation coefficient and input signal acquisition unit 41, a calibration gain calculation execution determination unit 42, a calibration gain calculation unit 43, a calibration gain storage unit 44, and a calibration gain output unit 45.

The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, together with FRAME1(K) and FRAME2(K), which are the analysis frames of the input signals.

The calibration gain calculation execution determination unit 42 determines whether the correlation coefficient cor(K) is positive or negative in order to decide whether to execute the calibration gain calculation. When cor(K) is positive, the determination unit 42 judges that the frame belongs to a target sound section in which the input signal contains no interfering sound, and determines that the calibration gain is to be calculated in this section. When cor(K) is negative, the determination unit 42 judges that the input signal contains an interfering sound, and determines that the calibration gain is not to be calculated in this section.

The calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH, which compensate for the sensitivity difference between the microphones m_1 and m_2, according to the determination result of the calibration gain calculation execution determination unit 42.

When the calibration gain calculation execution determination unit 42 determines that the correlation coefficient cor(K) is positive, the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH. When the determination unit 42 determines that cor(K) is negative, the calibration gain calculation unit 43 does not calculate the calibration gains and instead sets the values stored in the calibration gain storage unit 44 as the calibration gains.

The method by which the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH is described next.

The calibration gain calculation unit 43 uses equations (10,1), (10,2), (11), (12,1), and (12,2) to calculate the calibration gain CALIB_GAIN_1CH for the input signal s1 and the calibration gain CALIB_GAIN_2CH for the input signal s2.
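The equations themselves do not appear in this text. Based solely on the descriptions given for them in this section, they can plausibly be reconstructed as follows. The frame length $N$ and the notation $x_{1,i}$, $x_{2,i}$ for the components of FRAME1(K) and FRAME2(K) are assumed symbols introduced here, not taken from the source.

```latex
\begin{align}
\mathrm{LEVEL\_1CH} &= \frac{1}{N}\sum_{i=0}^{N-1} \lvert x_{1,i} \rvert \tag{10,1}\\
\mathrm{LEVEL\_2CH} &= \frac{1}{N}\sum_{i=0}^{N-1} \lvert x_{2,i} \rvert \tag{10,2}\\
\mathrm{AVE\_LEVEL} &= \frac{\mathrm{LEVEL\_1CH} + \mathrm{LEVEL\_2CH}}{2} \tag{11}\\
\mathrm{CALIB\_GAIN\_1CH} &= \frac{\mathrm{AVE\_LEVEL}}{\mathrm{LEVEL\_1CH}} \tag{12,1}\\
\mathrm{CALIB\_GAIN\_2CH} &= \frac{\mathrm{AVE\_LEVEL}}{\mathrm{LEVEL\_2CH}} \tag{12,2}
\end{align}
```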

Equation (10,1) calculates LEVEL_1CH, the average of the absolute values of all components of the current (K-th) frame of the input signal s1(n) captured by the microphone m_1; this value can be regarded as reflecting the sensitivity of the microphone m_1. Equation (10,2) likewise calculates LEVEL_2CH, the average of the absolute values of all components of the current (K-th) frame of the input signal s2(n) captured by the microphone m_2; this value can be regarded as reflecting the sensitivity of the microphone m_2.

Alternatively, the sum of the absolute values of the components of each frame over a predetermined number of frames may be used as the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivity. As another example, the average of the absolute values of all elements (signal components) of the latest P (P ≤ K) frames for which the correlation coefficient cor(K) was positive may be used as LEVEL_1CH and LEVEL_2CH. In the latter case, storing the sum of the absolute values of the components of the latest P−1 frames for which cor(K) was positive makes it easy to calculate LEVEL_1CH and LEVEL_2CH once the current (K-th) frames FRAME1(K) and FRAME2(K) are given. Calculating the average or sum of the absolute values of the signal components over a long period in this way suppresses the influence of momentary fluctuations in the input signal on the value reflecting the microphone sensitivity.
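The variant that averages over the latest P frames with a positive correlation coefficient can be sketched as below. This is an illustration under assumptions rather than the patent's implementation: it keeps one stored sum per accepted frame in a bounded queue instead of a single running total, and the class name `RunningLevel` and its methods are hypothetical.

```python
from collections import deque


class RunningLevel:
    """Mean absolute amplitude over the latest P accepted frames."""

    def __init__(self, p):
        if p < 1:
            raise ValueError("P must be at least 1")
        # Per-frame sums of |sample| and per-frame sample counts;
        # the oldest entry is dropped automatically once P is exceeded.
        self._sums = deque(maxlen=p)
        self._counts = deque(maxlen=p)

    def add_frame(self, frame):
        """Call only for frames whose correlation coefficient was positive."""
        self._sums.append(sum(abs(v) for v in frame))
        self._counts.append(len(frame))

    def level(self):
        total = sum(self._counts)
        if total == 0:
            raise ValueError("no frames accepted yet")
        return sum(self._sums) / total
```

One instance per channel would be used, so that LEVEL_1CH and LEVEL_2CH are computed with the same formula.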

Equations (10,1) and (10,2) are only one example of formulas for a value reflecting the microphone sensitivity, and various other formulas can be applied as described above. However, the formula for the value LEVEL_1CH reflecting the sensitivity of the microphone m_1 and the formula for the value LEVEL_2CH reflecting the sensitivity of the microphone m_2 must be the same.

Equation (11) calculates AVE_LEVEL, the average of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2, as the target sensitivity of the microphones m_1 and m_2 after calibration. Alternatively, the larger or the smaller of LEVEL_1CH and LEVEL_2CH may be used as the target sensitivity.

As can be seen by moving the denominator LEVEL_1CH on its right-hand side to the left-hand side, equation (12,1) defines the calibration gain CALIB_GAIN_1CH so that the sensitivity LEVEL_1CH of the microphone m_1 multiplied by CALIB_GAIN_1CH equals the target sensitivity AVE_LEVEL. Similarly, equation (12,2) defines the calibration gain CALIB_GAIN_2CH so that the sensitivity LEVEL_2CH of the microphone m_2 multiplied by CALIB_GAIN_2CH equals the target sensitivity AVE_LEVEL.
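The chain of calculations described for equations (10,1) through (12,2) can be sketched as follows. The equations are not reproduced in the text, so this follows their prose description only; the function names and the decision to raise an error on empty or silent frames are assumptions.

```python
def mean_abs_level(frame):
    """Average absolute value of all components of one frame, as described for (10,1)/(10,2)."""
    if len(frame) == 0:
        raise ValueError("empty frame")
    return sum(abs(v) for v in frame) / len(frame)


def calibration_gains(frame1, frame2):
    """Return (CALIB_GAIN_1CH, CALIB_GAIN_2CH) for one pair of frames."""
    level_1ch = mean_abs_level(frame1)
    level_2ch = mean_abs_level(frame2)
    if level_1ch == 0.0 or level_2ch == 0.0:
        # Assumption: the gain is undefined for a silent channel.
        raise ValueError("silent frame: calibration gain undefined")
    ave_level = (level_1ch + level_2ch) / 2.0        # target sensitivity, (11)
    return ave_level / level_1ch, ave_level / level_2ch  # (12,1), (12,2)
```

For example, if one channel has a mean absolute level of 1.0 and the other 2.0, the target level is 1.5 and the gains are 1.5 and 0.75, so both channels land on the same level after multiplication.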

The calibration gain storage unit 44 stores the calibration gains CALIB_GAIN_1CH (= INIT_GAIN_1CH) and CALIB_GAIN_2CH (= INIT_GAIN_2CH) that are applied when the calibration gain calculation unit 43 does not calculate the calibration gains. The value 1.0, which leaves the signal uncalibrated, may be used for INIT_GAIN_1CH and INIT_GAIN_2CH, or the most recent values calculated by the calibration gain calculation unit 43 may be used.

The calibration gain output unit 45 supplies the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH calculated by the calibration gain calculation unit 43, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the calibration gain storage unit 44, to the corresponding calibration gain multiplication units 16 and 17.

The first calibration gain multiplication unit 16 outputs a post-calibration signal y1(n) obtained by multiplying the input signal s1(n) from the microphone m_1 by the calibration gain CALIB_GAIN_1CH.

The second calibration gain multiplication unit 17 outputs a post-calibration signal y2(n) obtained by multiplying the input signal s2(n) from the microphone m_2 by the calibration gain CALIB_GAIN_2CH.

(A-2) Operation of the Embodiment
Next, the overall processing and the calibration gain calculation processing in the acoustic signal processing apparatus 10 according to the embodiment are described in detail with reference to the drawings.

One frame of each of the input signals s1(n) and s2(n) is input from the microphones m_1 and m_2 to the FFT unit 11 via AD converters (not shown).

The FFT unit 11 applies a Fourier transform to the analysis frames FRAME1(K) and FRAME2(K), which are based on one frame of the input signals s1(n) and s2(n), and obtains the frequency-domain signals X1(f, K) and X2(f, K). The signals X1(f, K) and X2(f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.

The front suppression signal generation unit 12 calculates, from the signals X1(f, K) and X2(f, K), a front suppression signal N(f, K) having directivity in directions other than the front. The front suppression signal generation unit 12 then averages N(f, K) over all frequencies to generate the average front suppression signal AVE_N(K), and supplies AVE_N(K) to the correlation calculation unit 14.

Meanwhile, the coherence calculation unit 13 calculates the coherence COH(K) from the signals X1(f, K) and X2(f, K), and supplies COH(K) to the correlation calculation unit 14.

The correlation calculation unit 14 acquires the average front suppression signal AVE_N(K) and the coherence COH(K), calculates the correlation coefficient cor(K) between them, and supplies cor(K) to the calibration gain calculation unit 15.

The calibration gain calculation unit 15 acquires the correlation coefficient cor(K), observes its sign, and calculates the calibration gains for the signals s1(n) and s2(n) according to the result. The calibration gain calculation unit 15 outputs the calibration gain CALIB_GAIN_1CH for the signal s1(n) to the first calibration gain multiplication unit 16, and the calibration gain CALIB_GAIN_2CH for the signal s2(n) to the second calibration gain multiplication unit 17.

FIG. 5 is a flowchart showing the processing operation of the calibration gain calculation unit 15.

The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, and acquires the frames FRAME1(K) and FRAME2(K) of the input signals s1(n) and s2(n) (S51).

The calibration gain calculation execution determination unit 42 then determines whether the correlation coefficient cor(K) is positive or negative (S52).

When the correlation coefficient cor(K) is positive, it is assumed that no interfering sound arriving from a direction other than the front is present and that the frame belongs to a target sound section arriving from the front. The calibration gain calculation unit 43 then uses cor(K), FRAME1(K), and FRAME2(K) to calculate the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the signals s1(n) and s2(n) according to equations (10,1), (10,2), (11), (12,1), and (12,2) (S53). At this time, the calibration gain calculation unit 43 stores the calculated calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH in the calibration gain storage unit 44, thereby updating the calibration gains held there.

When the correlation coefficient cor(K) is negative, it is assumed that an interfering sound arriving from a direction other than the front is present, and the calibration gain calculation unit 43 uses the values stored in the calibration gain storage unit 44 as the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH (S54).

That is, when the initial values INIT_GAIN_1CH and INIT_GAIN_2CH are stored in the calibration gain storage unit 44, INIT_GAIN_1CH is used as CALIB_GAIN_1CH and INIT_GAIN_2CH is used as CALIB_GAIN_2CH. Alternatively, when the most recent calibration gains are stored in the calibration gain storage unit 44, those most recent gains are used as the current calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH.

The calibration gain output unit 45 then outputs the calibration gain CALIB_GAIN_1CH to the first calibration gain multiplication unit 16 and the calibration gain CALIB_GAIN_2CH to the second calibration gain multiplication unit 17 (S55). The calibration gain calculation unit 15 then updates the index K (S56) and returns to S51 to calculate the calibration gains for the next index.
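The flow of FIG. 5 (S51 to S55) can be sketched as a small stateful object. This is a hedged illustration, not the patent's implementation: it uses the variant in which the storage holds the most recently calculated gains, initialized to 1.0, and it treats a correlation coefficient of exactly zero and silent frames as "do not update", neither of which is specified in the text. The class and method names are hypothetical.

```python
class CalibrationGainCalculator:
    """Sketch of units 42 to 45: decide, calculate, store, output."""

    def __init__(self, init_gain_1ch=1.0, init_gain_2ch=1.0):
        # Plays the role of the calibration gain storage unit 44.
        self.stored = (init_gain_1ch, init_gain_2ch)

    @staticmethod
    def _mean_abs(frame):
        return sum(abs(v) for v in frame) / len(frame) if frame else 0.0

    def process(self, cor, frame1, frame2):
        """Return (CALIB_GAIN_1CH, CALIB_GAIN_2CH) for one frame index K."""
        if cor > 0.0:  # S52: target sound section
            level_1ch = self._mean_abs(frame1)
            level_2ch = self._mean_abs(frame2)
            if level_1ch > 0.0 and level_2ch > 0.0:
                ave_level = (level_1ch + level_2ch) / 2.0
                # S53: calculate the gains and update the storage.
                self.stored = (ave_level / level_1ch, ave_level / level_2ch)
        # S54 (cor not positive): fall through and reuse the stored gains.
        return self.stored  # S55: output to multiplication units 16 and 17
```

A caller would invoke `process` once per frame and multiply each input frame by the returned gain for its channel.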

Once the calibration gain has been calculated it does not fluctuate, so continuing to update it constantly wastes computation; the calibration gain calculation unit 15 may therefore stop updating partway through. That is, in the environment in which the acoustic signal processing apparatus 10 with the microphones m_1 and m_2 is used, once the calibration gains for the microphones m_1 and m_2 have been obtained at an initial stage, they need not be updated constantly and may instead be recalculated only when required.

The first calibration gain multiplication unit 16 then multiplies the signal s1(n) by the calibration gain CALIB_GAIN_1CH and outputs the post-calibration signal y1(n), and the second calibration gain multiplication unit 17 multiplies the signal s2(n) by the calibration gain CALIB_GAIN_2CH and outputs the post-calibration signal y2(n).

(A-3) Effects of the Embodiment
As described above, this embodiment exploits the characteristic behavior that the correlation coefficient between the front suppression signal and the coherence COH is negative when an interfering sound arriving from a direction other than the front is present, and positive when no interfering sound is present. This realizes a microphone sensitivity calibration method that is not affected by interfering sound and for which the designer can easily set the threshold.

Consequently, applying the process of calculating the calibration gains for the microphones m_1 and m_2 as preprocessing for various signal processing methods that use a microphone array can be expected to improve the performance of the subsequent speech processing.

(B) Other Embodiments
Various modifications have already been mentioned in the description of the above embodiment; the present invention can also be applied to the following modified embodiments.

(B-1) In the above embodiment, the correlation calculation unit calculates a correlation coefficient as the feature amount relating the front suppression signal and the coherence. The same effect as in the above embodiment is obtained if the covariance is calculated as the feature amount instead.
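The reason the covariance can stand in for the correlation coefficient in a sign test follows from the form of equation (9) as described in the text: the two standard deviations are positive whenever the correlation coefficient is defined, so dividing by them cannot change the sign. Written out, as a short derivation rather than a statement taken from the source:

```latex
\mathrm{cor}(K) = \frac{\mathrm{Cov}[\mathrm{AVE\_N}(K), \mathrm{COH}(K)]}{\sigma_{\mathrm{AVE\_N}(K)}\,\sigma_{\mathrm{COH}(K)}},
\qquad \sigma_{\mathrm{AVE\_N}(K)} > 0,\ \sigma_{\mathrm{COH}(K)} > 0
\;\Longrightarrow\;
\operatorname{sgn}\bigl(\mathrm{cor}(K)\bigr) = \operatorname{sgn}\bigl(\mathrm{Cov}[\mathrm{AVE\_N}(K), \mathrm{COH}(K)]\bigr)
```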

(B-2) The acoustic signal processing apparatus according to the present invention can be applied to a wide variety of devices that have a plurality of microphones and a speech processing function (for example, speech recognition processing), such as smartphones, tablet terminals, video conference terminals, car navigation systems, call center terminals, robots, and devices that use sound signals as sensor signals.

Further, for example, the acoustic signal processing apparatus of the present invention may be mounted on a device having a communication function, and that device may transmit the post-calibration signals over a network to a server having a predetermined speech processing function.

Furthermore, for example, a device having a plurality of microphones and a communication function may transmit the input signal of each microphone over a network to a server on which the acoustic signal processing apparatus of the present invention is mounted. In this case, the server can calculate the calibration gain for each input signal according to the correlation coefficient between the front suppression signal and the coherence, as in the above embodiment.

(B-3) The above embodiment illustrates the case of two microphones, but the present invention can also be applied to an apparatus that acquires input signals from each of three or more microphones.

DESCRIPTION OF SYMBOLS
10: acoustic signal processing apparatus; m_1 and m_2: microphones; 11: FFT unit; 12: front suppression signal generation unit; 13: coherence calculation unit;
14: correlation calculation unit; 31: front suppression signal / coherence acquisition unit; 32: correlation coefficient calculation unit; 33: correlation coefficient output unit;
15: calibration gain calculation unit; 41: correlation coefficient and input signal acquisition unit; 42: calibration gain calculation execution determination unit; 43: calibration gain calculation unit; 44: calibration gain storage unit; 45: calibration gain output unit;
16: first calibration gain multiplication unit; 17: second calibration gain multiplication unit.

Claims

An acoustic signal processing apparatus comprising:
a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on a difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain;
a coherence calculation unit that calculates a coherence based on signals obtained from the plurality of input signals;
a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit that detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each of the input signals in the target sound section; and
a calibration unit that calibrates each corresponding input signal with the respective calibration gain.

The acoustic signal processing apparatus according to claim 1, wherein the feature amount calculation unit calculates a correlation value between the front suppression signal and the coherence as the feature amount.

The acoustic signal processing apparatus according to claim 2, wherein the calibration gain calculation unit detects the target sound section in accordance with the sign of the correlation value between the front suppression signal and the coherence.

The acoustic signal processing apparatus according to claim 2 or 3, wherein the calibration gain calculation unit
determines that a section is the target sound section and calculates a calibration gain for each input signal when the correlation value between the front suppression signal and the coherence is positive, and
determines that a section is a sound section including an interfering sound and does not calculate the calibration gain when the correlation value between the front suppression signal and the coherence is negative.

An acoustic signal processing program that causes a computer to function as:
a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on a difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain;
a coherence calculation unit that calculates a coherence based on signals obtained from the plurality of input signals;
a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit that detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each of the input signals in the target sound section; and
a calibration unit that calibrates each corresponding input signal with the respective calibration gain.

An acoustic signal processing method, wherein:
a front suppression signal generation unit generates a front suppression signal having a blind spot toward the front, based on a difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from respective ones of a plurality of microphones, from the time domain to the frequency domain;
a coherence calculation unit calculates a coherence based on signals obtained from the plurality of input signals;
a feature amount calculation unit calculates a feature amount representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit detects, based on the feature amount, a target sound section that is not affected by an interfering sound, and calculates a calibration gain for each input signal in the target sound section; and
a calibration unit calibrates each corresponding input signal with the respective calibration gain.