JP6711205B2

JP6711205B2 - Acoustic signal processing device, program and method

Info

Publication number: JP6711205B2
Application number: JP2016162712A
Authority: JP
Inventors: 克之高橋 (Katsuyuki Takahashi)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2016-08-23
Filing date: 2016-08-23
Publication date: 2020-06-17
Anticipated expiration: 2036-08-23
Also published as: JP2018032931A

Description

The present invention relates to an acoustic signal processing device, program, and method. It is applicable, for example, to acoustic signal processing in communication devices or communication software used for telephones, videophones, and the like, or in the preprocessing stage of speech recognition.

In recent years, devices with voice processing functions such as voice calling and speech recognition, for example smartphones and car navigation systems, have become widespread. As a result, these functions are now used in harsher noise environments than before, such as crowded streets or the inside of a moving vehicle. There is therefore growing demand for signal processing techniques that maintain call quality and speech recognition performance even in noisy environments.

Patent Document 1: JP 2014-68052 A

Non-Patent Document 1: Kazuyuki Hiraoka and Gen Hori, "Probability and Statistics for Programming" (プログラミングのための確率統計), Ohmsha, Ltd., October 23, 2009, pp. 178-179.

In recent years, acoustic signal processing using multi-channel microphones has been put into practice. However, even microphones of the same model differ in sensitivity, and acoustic features cannot be calculated accurately unless these differences are calibrated. Conventionally this has been handled either by measuring each microphone's sensitivity in advance and setting a correction gain according to the difference, or by comparing the input levels of the channels and automatically setting correction gains that bring each channel to the average level. The former is laborious, and the latter equalizes not only the sensitivity difference between the microphones but also genuine differences between the captured input signals, so the accuracy of acoustic features computed in later stages cannot be guaranteed.

One way to mitigate this problem is to compute the calibration gain by comparing input levels only in those sections of the input signal whose components arrive from the front of the microphones. This relies on the premise that, for a signal arriving from the front, each microphone is equidistant from the source, so the acoustic characteristics of the components reaching the microphones differ only negligibly and the only remaining difference can be expected to be the microphone sensitivity. One solution based on this premise is the method described in Patent Document 1. It exploits the fact that the magnitude of a feature called coherence varies depending on whether the target speaker's voice arrives from the front, and computes the microphone sensitivity calibration gain in signal sections where the voice arrives from the front. Because this behavior of the coherence (large or small depending on whether the voice arrives from the front) is preserved even when the microphones differ in sensitivity, the sensitivity difference can be calibrated with this method. (Note: for how coherence is calculated, see Equation 7 of Patent Document 1.)

However, with the method of Patent Document 1, the coherence also takes a large value when the target voice arriving from the front of the microphone array is accompanied by the voice of another speaker (an interfering sound) arriving simultaneously from the left or right, so voice components that do not arrive from the front are also reflected in the calibration gain. Furthermore, because the sensitivity difference between microphones varies randomly from one microphone array to another, it is difficult to optimize the threshold for detecting signal sections arriving from the front, and target voice sections may be misjudged.

Therefore, to address these two problems, there is a demand for a sensitivity calibration gain calculation method that calculates the gain accurately even in the presence of interfering sound and whose threshold can be set more easily.

To solve the above problems, an acoustic signal processing device according to a first aspect of the present invention calibrates differences in microphone sensitivity among a plurality of input acoustic signals and comprises: (1) a front suppression signal generation unit that generates a front suppression signal having a null (blind spot) in the front direction by taking, for each frequency component, the difference between a first frequency domain signal (a first input acoustic signal converted from the time domain to the frequency domain) and a second frequency domain signal (a second input acoustic signal converted likewise) and averaging the resulting values of the frequency components; (2) a coherence calculation unit that calculates coherence using, based on the first and second frequency domain signals, a first directional signal given a directivity characteristic that is strong in a first direction different from the front direction and a second directional signal given a directivity characteristic that is strong in a second direction different from both the front direction and the first direction; (3) a feature amount calculation unit that calculates a correlation value representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that calculates calibration gains for the first and second input acoustic signals depending on whether the correlation value is positive or negative; and (5) a calibration unit that calibrates each input acoustic signal with its corresponding calibration gain. When the correlation value is positive, the calibration gain calculation unit detects a target sound section that arrives from the front and is not affected by interfering sound, uses the first and second input acoustic signals in that section to calculate, for each input acoustic signal, a value reflecting its microphone sensitivity, determines a target sensitivity from the calculated values, and calculates the calibration gain for each input acoustic signal from the value reflecting its microphone sensitivity and the target sensitivity. When the correlation value is negative, the calibration gain for each input acoustic signal is set to its initial value or to the most recent calibration gain, stored in the calibration gain calculation unit, that was obtained in a target sound section unaffected by interfering sound.

An acoustic signal processing program according to a second aspect of the present invention calibrates differences in microphone sensitivity among a plurality of input acoustic signals and causes a computer to function as: (1) a front suppression signal generation unit that generates a front suppression signal having a null in the front direction by taking, for each frequency component, the difference between the first frequency domain signal (the first input acoustic signal converted from the time domain to the frequency domain) and the second frequency domain signal (the second input acoustic signal converted likewise) and averaging the resulting values of the frequency components; (2) a coherence calculation unit that calculates coherence using, based on the first and second frequency domain signals, a first directional signal with strong directivity in a first direction different from the front direction and a second directional signal with strong directivity in a second direction different from both the front direction and the first direction; (3) a feature amount calculation unit that calculates a correlation value representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that calculates calibration gains for the first and second input acoustic signals depending on whether the correlation value is positive or negative; and (5) a calibration unit that calibrates each input acoustic signal with its corresponding calibration gain. When the correlation value is positive, the calibration gain calculation unit detects a target sound section that arrives from the front and is not affected by interfering sound, uses the first and second input acoustic signals in that section to calculate, for each input acoustic signal, a value reflecting its microphone sensitivity, determines a target sensitivity from the calculated values, and calculates the calibration gain for each input acoustic signal from the value reflecting its microphone sensitivity and the target sensitivity. When the correlation value is negative, the calibration gain for each input acoustic signal is set to its initial value or to the most recent calibration gain, stored in the calibration gain calculation unit, that was obtained in a target sound section unaffected by interfering sound.

An acoustic signal processing method according to a third aspect of the present invention calibrates differences in microphone sensitivity among a plurality of input acoustic signals, wherein: (1) a front suppression signal generation unit generates a front suppression signal having a null in the front direction by taking, for each frequency component, the difference between the first frequency domain signal (the first input acoustic signal converted from the time domain to the frequency domain) and the second frequency domain signal (the second input acoustic signal converted likewise) and averaging the resulting values of the frequency components; (2) a coherence calculation unit calculates coherence using, based on the first and second frequency domain signals, a first directional signal with strong directivity in a first direction different from the front direction and a second directional signal with strong directivity in a second direction different from both the front direction and the first direction; (3) a feature amount calculation unit calculates a correlation value representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit calculates calibration gains for the first and second input acoustic signals depending on whether the correlation value is positive or negative; and (5) a calibration unit calibrates each input acoustic signal with its corresponding calibration gain. When the correlation value is positive, the calibration gain calculation unit detects a target sound section that arrives from the front and is not affected by interfering sound, uses the first and second input acoustic signals in that section to calculate, for each input acoustic signal, a value reflecting its microphone sensitivity, determines a target sensitivity from the calculated values, and calculates the calibration gain for each input acoustic signal from the value reflecting its microphone sensitivity and the target sensitivity. When the correlation value is negative, the calibration gain for each input acoustic signal is set to its initial value or to the most recent calibration gain, stored in the calibration gain calculation unit, that was obtained in a target sound section unaffected by interfering sound.

According to the present invention, the sensitivity calibration gain can be calculated accurately even in the presence of interfering sound, and the threshold can be set more easily.

FIG. 1 is a block diagram showing the overall configuration of the acoustic signal processing device according to the embodiment.
FIG. 2 is an explanatory diagram showing the directivity characteristic formed by the front suppression signal generation unit according to the embodiment.
FIG. 3 is a block diagram showing the configuration of the correlation calculation unit according to the embodiment.
FIG. 4 is a block diagram showing the configuration of the calibration gain calculation unit according to the embodiment.
FIG. 5 is a flowchart showing the processing operation of the calibration gain calculation unit according to the embodiment.

(A) Main Embodiment
Embodiments of the acoustic signal processing device, program, and method according to the present invention are described in detail below with reference to the drawings.

(A-1) Configuration of the Embodiment
FIG. 1 is a block diagram showing the overall configuration of the acoustic signal processing device 10 according to this embodiment.

In FIG. 1, the acoustic signal processing device 10 has a plurality of microphones m_1 and m_2 (FIG. 1 illustrates the case of two), an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 14, a calibration gain calculation unit 15, a first calibration gain multiplication unit 16, and a second calibration gain multiplication unit 17.

The "feature amount calculation unit" recited in the claims includes the correlation calculation unit 14, the "calibration gain calculation unit" includes the calibration gain calculation unit 15, and the "calibration unit" includes the first calibration gain multiplication unit 16 and the second calibration gain multiplication unit 17.

In the acoustic signal processing device 10 illustrated in FIG. 1, the components other than the microphones m_1 and m_2 can be implemented as software (an acoustic signal processing program) executed by a CPU, and the functions of that program can be represented by FIG. 1.

The microphones m_1 and m_2 are placed a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. The acoustic signals captured by the microphones m_1 and m_2 are converted by an analog-to-digital (A/D) converter (not shown) into input signals s1(n) and s2(n), which are supplied to the FFT unit 11, the calibration gain calculation unit 15, and the first and second calibration gain multiplication units 16 and 17. Here, n is a positive integer index giving the input order of the samples: a smaller n denotes an older sample and a larger n a newer one.

The FFT unit 11 receives the input signals s1(n) and s2(n) from the microphones m_1 and m_2 and applies a fast Fourier transform (or discrete Fourier transform) to them, converting them from the time domain to the frequency domain. For this transform, the FFT unit 11 forms analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples (N an arbitrary integer) taken from s1(n) and s2(n), respectively.

Equation (1) illustrates how FRAME1 is formed from the input signal s1. In equation (1), K is a positive integer index giving the order of the frames: a smaller K denotes an older analysis frame and a larger K a newer one. In the following description, unless otherwise noted, K denotes the latest analysis frame, that is, the frame currently being analyzed.
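Equation (1) itself is not reproduced in this text. As a minimal sketch, assuming non-overlapping frames of N consecutive samples with no window function (neither overlap nor windowing is specified here), frame construction could look like the following. The helper name `make_frames` is hypothetical, and indices are 0-based, whereas the text uses positive-integer n and K.

```python
def make_frames(signal, n):
    """Split a 1-D sample sequence into consecutive, non-overlapping frames.

    Assumption: frame K (0-based here) holds samples K*n .. K*n + n - 1;
    trailing samples that do not fill a whole frame are dropped.
    """
    num_frames = len(signal) // n
    return [signal[k * n:(k + 1) * n] for k in range(num_frames)]


s1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
frames = make_frames(s1, 3)  # FRAME1(0), FRAME1(1); the last sample is left over
print(frames)  # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```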

For each analysis frame, the FFT unit 11 applies the fast Fourier transform, obtaining a frequency domain signal X1(f,K) from the analysis frame FRAME1(K) built from s1 and a frequency domain signal X2(f,K) from the analysis frame FRAME2(K) built from s2. The FFT unit 11 supplies X1(f,K) and X2(f,K) to both the front suppression signal generation unit 12 and the coherence calculation unit 13.

Here, f is an index denoting frequency. The frequency domain signals X1(f,K) and X2(f,K) are not single values; as in equation (2), each consists of m components (spectral components) at frequencies f1 to fm, where m is an arbitrary integer.
X1(f,K) = {X1(f1,K), X1(f2,K), ..., X1(fm,K)} (2)

In equation (2), each component X1(f,K) is a complex number consisting of a real part and an imaginary part. The same applies to X2(f,K) and to the front suppression signal N(f,K) produced by the front suppression signal generation unit 12 described below.
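To make the complex spectral components concrete, the sketch below computes X1(f,K) for one frame with a naive discrete Fourier transform in pure Python, standing in for the FFT unit 11. Keeping only the m = N/2 + 1 non-negative-frequency bins is an assumption; the text only states that there are m components. The function name `dft` is hypothetical.

```python
import cmath


def dft(frame):
    """Naive O(N^2) DFT of one analysis frame (a stand-in for an FFT).

    Returns the m = N // 2 + 1 non-negative-frequency bins f_1 .. f_m
    (an assumption: the text only says there are m components).
    Each bin is a complex number with a real and an imaginary part.
    """
    n = len(frame)
    m = n // 2 + 1
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(m)
    ]


X1 = dft([1.0, 0.0, -1.0, 0.0])  # one cosine cycle in a 4-sample frame
print([round(abs(x), 6) for x in X1])  # [0.0, 2.0, 0.0]
```

All of the energy lands in bin 1, the bin whose frequency matches one cycle per frame.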

For each frequency component of the signals from the FFT unit 11, the front suppression signal generation unit 12 suppresses the signal component arriving from the front direction. In other words, it functions as a directional filter that suppresses front-direction components.

For example, as shown in FIG. 2, the front suppression signal generation unit 12 forms this directional filter as a figure-eight bidirectional filter with a null (blind spot) in the front direction, thereby suppressing the front-direction component of the signals from the FFT unit 11.

Specifically, the front suppression signal generation unit 12 evaluates equation (3) on the signals X1(f,K) and X2(f,K) from the FFT unit 11 to generate a front suppression signal N(f,K) for each frequency component. The computation in equation (3) corresponds to forming the figure-eight bidirectional filter with a null in the front direction shown in FIG. 2.
N(f,K)=X1(f,K)-X2(f,K) (3)
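Equation (3) is a bin-by-bin subtraction. The sketch below (hypothetical helper name `front_suppression`, made-up sample spectra) shows why it acts as a front-facing null: a front source with equal microphone sensitivities yields identical spectra that cancel, while a lateral source, modeled here as a pure phase shift at m_2, does not.

```python
import cmath


def front_suppression(x1, x2):
    """Equation (3): N(f,K) = X1(f,K) - X2(f,K), evaluated bin by bin."""
    return [a - b for a, b in zip(x1, x2)]


# A source straight ahead reaches both microphones at the same time, so
# (with equal sensitivities) its spectra are identical and cancel.
front = [1 + 1j, 0.5 - 0.2j]
print(front_suppression(front, front))  # [0j, 0j]

# A source from the side reaches m_2 later; model that as a phase shift.
side_x2 = [x * cmath.exp(-1j * 0.8) for x in front]
print([round(abs(v), 3) for v in front_suppression(front, side_x2)])
```

If m_2 is less sensitive, for example `[0.9 * x for x in front]` as X2, even a front source leaves a nonzero residue, which is one reason sensitivity calibration matters.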

In this way, the front suppression signal generation unit 12 obtains each frequency component at frequencies f1 to fm (the power of one frame in each frequency band).

The front suppression signal generation unit 12 then calculates, according to equation (4), an average front suppression signal AVE_N(K) by averaging the front suppression signal N(f,K) over all frequencies f1 to fm.
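Equation (4) is not reproduced in this text. The sketch below assumes that AVE_N(K) is the arithmetic mean of the magnitudes |N(f,K)| over the m bins. This yields a real-valued level that can later be correlated with COH(K); averaging the powers |N(f,K)|^2 would be an equally plausible reading. The function name is hypothetical.

```python
def average_front_suppression(n_spec):
    """AVE_N(K): mean of N(f,K) over the m bins f_1 .. f_m.

    Assumption: magnitudes |N(f,K)| are averaged so the result is a real,
    non-negative level; equation (4) itself is not given in the text.
    """
    return sum(abs(v) for v in n_spec) / len(n_spec)


print(average_front_suppression([3 + 4j, 0j, 1 + 0j]))  # 2.0
```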

The coherence calculation unit 13 forms, from the frequency domain signals X1(f,K) and X2(f,K) supplied by the FFT unit 11, signals with strong directivity in specific directions and uses them to calculate the coherence COH(K).

The calculation of the coherence COH(K) in the coherence calculation unit 13 is described below.

From X1(f,K) and X2(f,K), the coherence calculation unit 13 forms a signal B1(f,K) processed by a filter with strong directivity in a first direction (for example, to the left), and a signal B2(f,K) processed by a filter with strong directivity in a second direction (for example, to the right). Any existing method can be used to form the signals B1(f) and B2(f) with strong directivity in specific directions. Here, as an example, equation (5) is applied to form the signal B1 with strong directivity in the first direction, and equation (6) is applied to form the signal B2 with strong directivity in the second direction.

In equations (5) and (6), S is the sampling frequency, N is the FFT analysis frame length, τ is the difference in sound arrival time between the microphones m_1 and m_2, i is the imaginary unit, and f is the frequency.
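Equations (5) and (6) are not reproduced in this text. One common existing method consistent with the variables listed above is a delay-and-subtract pair. The sketch below assumes B1(f) = X2(f) - X1(f)·exp(-i2πfSτ/N) and B2(f) = X1(f) - X2(f)·exp(-i2πfSτ/N), with f as the bin index so that f·S/N is the bin's frequency in Hz. This form, the function name `directional_pair`, and the 3 cm spacing in the example are assumptions, not the patent's confirmed equations.

```python
import cmath


def directional_pair(x1, x2, fs, n, tau):
    """Form B1(f,K) and B2(f,K) from X1(f,K), X2(f,K) (bins f = 0 .. m-1).

    Assumed delay-and-subtract form:
        B1(f) = X2(f) - X1(f) * exp(-i*2*pi*f*S*tau/N)
        B2(f) = X1(f) - X2(f) * exp(-i*2*pi*f*S*tau/N)
    with S = fs (Hz), N = n (frame length), tau = arrival-time difference (s).
    """
    b1, b2 = [], []
    for f, (a, b) in enumerate(zip(x1, x2)):
        delay = cmath.exp(-2j * cmath.pi * f * fs * tau / n)
        b1.append(b - a * delay)
        b2.append(a - b * delay)
    return b1, b2


fs, n, tau = 16000, 512, 0.03 / 343.0  # hypothetical: 3 cm spacing, c = 343 m/s
x1 = [1.0 + 0j, 0.5 + 0.5j, -0.3 + 0.1j]
# Lateral source that reaches m_1 first and m_2 tau seconds later.
x2_side = [x * cmath.exp(-2j * cmath.pi * f * fs * tau / n) for f, x in enumerate(x1)]
b1, b2 = directional_pair(x1, x2_side, fs, n, tau)
print([round(abs(v), 9) for v in b1])  # [0.0, 0.0, 0.0]: B1 has a null toward this side
```

For a front source (X1 equal to X2), B1 and B2 come out identical, whereas for a lateral source one of them is nulled; this contrast is what the coherence measures.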

Next, the coherence calculation unit 13 obtains the coherence COH(K) by applying the operations of equations (7) and (8) to the signals B1(f) and B2(f) obtained as above. Here, B2(f,K)^* in equation (7) denotes the complex conjugate of B2(f,K).

coef(f,K) represents the coherence of the component at an arbitrary frequency f (one of the frequencies f1 to fm) in the frame with an arbitrary index K (that is, in the analysis frames FRAME1(K) and FRAME2(K)).

In calculating coef(f,K), the directivity directions of the signals B1(f) and B2(f) may each be any direction other than the front direction, provided the two directions differ. Furthermore, the method of calculating coef(f,K) is not limited to the one above; for example, the calculation method described in Patent Document 1 can be applied.
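Equations (7) and (8) are likewise not reproduced in this text. The sketch below assumes the normalized cross-spectrum form coef(f,K) = |B1(f,K)·B2(f,K)^*| / (½(|B1(f,K)|² + |B2(f,K)|²)), which always lies in [0, 1], and takes COH(K) as the mean of coef(f,K) over the m bins, consistent with the text's description of COH(K) as an average of per-frequency correlations. The function name and the small constant `eps` that guards silent bins against division by zero are hypothetical.

```python
def coherence(b1, b2, eps=1e-12):
    """COH(K): mean over bins of the assumed per-bin coherence

        coef(f,K) = |B1 * conj(B2)| / (0.5 * (|B1|^2 + |B2|^2))

    By the AM-GM inequality each coef lies in [0, 1].
    """
    coefs = []
    for p, q in zip(b1, b2):
        num = abs(p * q.conjugate())
        den = 0.5 * (abs(p) ** 2 + abs(q) ** 2)
        coefs.append(num / (den + eps))
    return sum(coefs) / len(coefs)


print(round(coherence([1 + 1j, 2 - 1j], [1 + 1j, 2 - 1j]), 6))  # 1.0 (front: B1 == B2)
print(coherence([0j, 0j], [1 + 0j, 2j]))  # 0.0 (side: B1 nulled)
```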

The correlation calculation unit 14 acquires the average front suppression signal AVE_N(K) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between AVE_N(K) and COH(K).
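This text does not specify over which frames cor(K) is computed. The minimal sketch below assumes the sample (Pearson) correlation coefficient of AVE_N and COH over a sliding window of the most recent frames. The class name, the 20-frame `window`, and the convention of returning 0.0 when the correlation is undefined are all hypothetical choices. The toy inputs only mimic the two situations analyzed in the following paragraphs; they are not measured data.

```python
import math
from collections import deque


class CorrelationTracker:
    """Sketch of cor(K): Pearson correlation between AVE_N and COH over the
    most recent `window` frames (window length is a hypothetical parameter)."""

    def __init__(self, window=20):
        self.ave_n = deque(maxlen=window)
        self.coh = deque(maxlen=window)

    def update(self, ave_n_k, coh_k):
        self.ave_n.append(ave_n_k)
        self.coh.append(coh_k)
        n = len(self.ave_n)
        if n < 2:
            return 0.0  # not enough frames yet
        mx = sum(self.ave_n) / n
        my = sum(self.coh) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(self.ave_n, self.coh))
        sxx = sum((x - mx) ** 2 for x in self.ave_n)
        syy = sum((y - my) ** 2 for y in self.coh)
        if sxx == 0.0 or syy == 0.0:
            return 0.0  # correlation undefined: treat as "no decision"
        return sxy / math.sqrt(sxx * syy)


target_only = CorrelationTracker()
for ave, coh in [(0.1, 0.3), (0.5, 0.7), (0.9, 0.9), (0.2, 0.4)]:
    cor_target = target_only.update(ave, coh)  # AVE_N and COH rise together

with_interference = CorrelationTracker()
for ave, coh in [(0.2, 0.6), (1.5, 0.2), (2.0, 0.1), (0.4, 0.5)]:
    cor_interf = with_interference.update(ave, coh)  # AVE_N up while COH down

print(cor_target > 0, cor_interf < 0)  # True True
```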

The following explains why the correlation calculation unit 14 calculates the correlation coefficient between the coherence and the front suppression signal (average front suppression signal), which has directivity in directions other than the front.

Assume that a sound source emitting the target sound is located in front of the microphones m_1 and m_2, and that interfering sound arrives from a direction other than the front (for example, from the left or right of the microphones m_1 and m_2).

For example, when no interfering sound is present and the target sound is present, the front suppression signal takes a value proportional to the magnitude of the target sound component. However, because the gain in the front direction is small compared with the gain in the lateral directions (FIG. 2), this value is smaller than when interfering sound is present.

The coherence COH(K) is a feature closely related to the arrival direction of the input signal and can be regarded as the correlation between two signal components, because equation (7) calculates the correlation for a single frequency component and equation (8) averages these correlation values over all frequency components. A small COH(K) therefore means a small correlation between the two signal components, and a large COH(K) means a large correlation. An input signal with a small COH(K) has an arrival direction strongly biased to either the right or the left, that is, it arrives from a direction other than the front. Conversely, an input signal with a large COH(K) shows little directional bias and can be regarded as arriving from the front.

Accordingly, when no interfering sound is present and the target sound is present, COH(K) takes a large value, whereas when interfering sound is present along with the target sound, COH(K) takes a small value.

以上の挙動を妨害音の有無に着目して整理すると、以下のような関係となる。
・「妨害音が存在せず」、かつ、「目的音が存在する」場合、コヒーレンスＣＯＨ（Ｋ）は大きな値となり、正面抑圧信号は目的音成分の大きさに比例した値となる
・「妨害音が存在する」場合、コヒーレンスＣＯＨ（Ｋ）が小さい値となり、正面抑圧信号は大きい値となる。
Organizing the above behavior by the presence or absence of an interfering sound gives the following relationships:
・When “no interfering sound exists” and “the target sound exists”, the coherence COH(K) is large and the front suppression signal takes a value proportional to the magnitude of the target sound component.
・When “an interfering sound exists”, the coherence COH(K) is small and the front suppression signal is large.

ところで、上記のような挙動の場合、正面抑圧信号とコヒーレンスＣＯＨ（Ｋ）との相関係数を導入すると、以下のようなことがいえる。
・「妨害音が存在しない」場合、相関係数は正の値となる
・「妨害音が存在する」場合、相関係数は負の値となる。
従って、正面抑圧信号とコヒーレンスとの相関係数の正負を観測するだけで、妨害音の有無を判断することができる。そして、この挙動を用いると、正面抑圧信号とコヒーレンスとの相関係数の値が「正」の場合、正面方向からの目的音のみの区間と判断できるので、妨害音の影響を受けることなく、マイクｍ＿１及びｍ＿２の感度差の校正ゲインを計算することができる。また、相関係数の値の正負を観測するだけで、目的音声区間を検出できるため、従来技術とは異なり閾値設定が容易になる。 Given this behavior, introducing the correlation coefficient between the front suppression signal and the coherence COH(K) yields the following (without interference, both quantities rise together with the target sound, whereas an interfering sound raises the front suppression signal while lowering COH(K)):
・If no interfering sound exists, the correlation coefficient is positive.
・If an interfering sound exists, the correlation coefficient is negative.
Therefore, the presence or absence of an interfering sound can be determined simply by observing the sign of the correlation coefficient between the front suppression signal and the coherence. Using this behavior, when the correlation coefficient is “positive”, the section can be judged to contain only the target sound from the front, so the calibration gain for the sensitivity difference between microphones m_1 and m_2 can be calculated without being affected by interfering sounds. Furthermore, because the target voice section is detected from the sign of the correlation coefficient alone, setting a threshold is easier than in the prior art.

以下では、相関計算部１４における、正面抑圧信号とコヒーレンスとの相関係数の算出処理を、図面を参照しながら詳細に説明する。 Hereinafter, the calculation process of the correlation coefficient between the front suppression signal and the coherence in the correlation calculation unit 14 will be described in detail with reference to the drawings.

図３は、実施形態に係る相関計算部１４の構成を示すブロック図である。 FIG. 3 is a block diagram showing the configuration of the correlation calculation unit 14 according to the embodiment.

図３において、相関計算部１４は、正面抑圧信号・コヒーレンス取得部３１、相関係数計算部３２、相関係数出力部３３を有する。 In FIG. 3, the correlation calculation unit 14 includes a front suppression signal/coherence acquisition unit 31, a correlation coefficient calculation unit 32, and a correlation coefficient output unit 33.

正面抑圧信号・コヒーレンス取得部３１は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）とコヒーレンスＣＯＨ（Ｋ）とを取得し、相関係数計算部３２が、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）とコヒーレンスＣＯＨ（Ｋ）とに基づいて、相関係数ｃｏｒ（Ｋ）を算出する。そして、相関係数出力部３３は、算出した相関係数ｃｏｒ（Ｋ）を校正ゲイン計算部１５に出力する。 The front suppression signal/coherence acquisition unit 31 acquires the average front suppression signal AVE_N(K) and the coherence COH(K), and the correlation coefficient calculation unit 32 calculates the correlation coefficient cor(K) from AVE_N(K) and COH(K). The correlation coefficient output unit 33 then outputs the calculated correlation coefficient cor(K) to the calibration gain calculation unit 15.

ここで、相関係数ｃｏｒ（Ｋ）の算出方法は限定されるものではないが、例えば、非特許文献１に記載された計算方法を適用することができる。例えば、以下の式（９）を用いて、フレームごとに相関係数ｃｏｒ（Ｋ）を求める。なお、以下の（９）式において、Ｃｏｖ［ＡＶＥ＿Ｎ（Ｋ），ＣＯＨ（Ｋ）］は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）とコヒーレンスＣＯＨ（Ｋ）の共分散を示している。また、以下の（９）式において、σＡＶＥ＿Ｎ（Ｋ）は、平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）の標準偏差を示し、σＣＯＨ（Ｋ）は、コヒーレンスＣＯＨ（Ｋ）の標準偏差を示している。このようにして得られる相関係数ｃｏｒ（Ｋ）は、−１．０〜１．０の値をとる。

The method of calculating the correlation coefficient cor(K) is not limited; for example, the method described in Non-Patent Document 1 can be applied, in which cor(K) is calculated for each frame using the following equation (9):

$$
\mathrm{cor}(K) = \frac{\mathrm{Cov}[\mathrm{AVE\_N}(K),\ \mathrm{COH}(K)]}{\sigma_{\mathrm{AVE\_N}(K)}\ \sigma_{\mathrm{COH}(K)}} \qquad (9)
$$

Here, Cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K), σAVE_N(K) denotes the standard deviation of AVE_N(K), and σCOH(K) denotes the standard deviation of COH(K). The correlation coefficient cor(K) thus obtained takes values from -1.0 to 1.0.
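Equation (9) is the Pearson correlation coefficient, cor(K) = Cov[AVE_N(K), COH(K)] / (σAVE_N(K) · σCOH(K)). Because a covariance and standard deviations need several samples, the sketch below assumes (this is not stated in the text) that they are estimated over a sliding window of the most recent frames; the class name and the `window` parameter are hypothetical.

```python
import math
from collections import deque


class FrameCorrelation:
    """Hypothetical running estimate of cor(K) per Eq. (9).

    Cov, sigma_AVE_N and sigma_COH are estimated over the most recent
    `window` frames (an assumed design choice, not given in the text).
    """

    def __init__(self, window: int = 50):
        self.ave_n = deque(maxlen=window)
        self.coh = deque(maxlen=window)

    def update(self, ave_n_k: float, coh_k: float) -> float:
        """Add frame K's AVE_N(K) and COH(K) and return cor(K)."""
        self.ave_n.append(ave_n_k)
        self.coh.append(coh_k)
        m = len(self.ave_n)
        if m < 2:
            return 0.0  # not enough frames to estimate a correlation
        mean_n = sum(self.ave_n) / m
        mean_c = sum(self.coh) / m
        cov = sum((a - mean_n) * (c - mean_c) for a, c in zip(self.ave_n, self.coh)) / m
        var_n = sum((a - mean_n) ** 2 for a in self.ave_n) / m
        var_c = sum((c - mean_c) ** 2 for c in self.coh) / m
        if var_n == 0.0 or var_c == 0.0:
            return 0.0  # correlation undefined for a constant sequence
        return cov / math.sqrt(var_n * var_c)  # in [-1.0, 1.0] up to rounding
```

When the coefficient is undefined the sketch returns 0.0, which a "positive only" test treats as "no update", so a degenerate window never triggers a gain recalculation.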

校正ゲイン計算部１５は、相関計算部１４から相関係数ｃｏｒ（Ｋ）を取得し、相関係数ｃｏｒ（Ｋ）の正負を観測し、相関係数ｃｏｒ（Ｋ）が「正」の区間の入力信号のみを用いて、マイクｍ＿１とマイクｍ＿２との校正ゲインを算出する。 The calibration gain calculation unit 15 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, checks whether cor(K) is positive or negative, and calculates the calibration gains for microphones m_1 and m_2 using only the input signals in sections where cor(K) is “positive”.

図４は、実施形態に係る校正ゲイン計算部１５の構成を示すブロック図である。 FIG. 4 is a block diagram showing the configuration of the calibration gain calculation unit 15 according to the embodiment.

図４において、校正ゲイン計算部１５は、相関係数及び入力信号取得部４１、校正ゲイン計算実行判定部４２、校正ゲイン計算部４３、校正ゲイン記憶部４４、校正ゲイン出力部４５を有する。
In FIG. 4, the calibration gain calculation unit 15 includes a correlation coefficient and input signal acquisition unit 41, a calibration gain calculation execution determination unit 42, a calibration gain calculation unit 43, a calibration gain storage unit 44, and a calibration gain output unit 45.

相関係数及び入力信号取得部４１は、相関計算部１４から相関係数ｃｏｒ（Ｋ）と、入力信号の分析フレームであるＦＲＡＭＥ１（Ｋ）、ＦＲＡＭＥ２（Ｋ）とを取得するものである。 The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) and the FRAME1(K) and FRAME2(K) analysis frames of the input signal from the correlation calculation unit 14.

校正ゲイン計算実行判定部４２は、校正ゲインの計算を実行するか否かを判定するため、相関係数ｃｏｒ（Ｋ）の値が「正」であるか又は「負」であるかを判定する。すなわち、相関係数ｃｏｒ（Ｋ）の値が「正」の場合、校正ゲイン計算実行判定部４２は、入力信号には妨害音が含まれていない目的音区間と判断し、校正ゲインの計算を実行する区間であることを判定する。一方、相関係数ｃｏｒ（Ｋ）の値が「負」の場合、校正ゲイン計算実行判定部４２は、入力信号には妨害音が含まれている区間と判断し、校正ゲインの計算を実行しない区間であると判定する。 To decide whether to calculate the calibration gain, the calibration gain calculation execution determination unit 42 determines whether the correlation coefficient cor(K) is “positive” or “negative”. When cor(K) is “positive”, unit 42 judges the section to be a target sound section in which the input signal contains no interfering sound, and determines that the calibration gain is to be calculated in this section. When cor(K) is “negative”, unit 42 judges that the input signal contains an interfering sound, and determines that the calibration gain is not to be calculated in this section.

校正ゲイン計算部４３は、校正ゲイン計算実行判定部４２による判定結果に応じて、マイクｍ＿１及びｍ＿２の感度差に対する校正ゲインＬＥＶＥＬ＿ＧＡＩＮ＿１ＣＨ及びＬＥＶＥＬ＿ＧＡＩＮ＿２ＣＨを計算するものである。 According to the determination result of the calibration gain calculation execution determination unit 42, the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the sensitivity difference between microphones m_1 and m_2.

校正ゲイン計算部４３は、校正ゲイン計算実行判定部４２により相関係数ｃｏｒ（Ｋ）が「正」であると判定されると、校正ゲインＬＥＶＥＬ＿ＧＡＩＮ＿１ＣＨ及びＬＥＶＥＬ＿ＧＡＩＮ＿２ＣＨを計算する。一方、校正ゲイン計算部４３は、校正ゲイン計算実行判定部４２により相関係数ｃｏｒ（Ｋ）が「負」であると判定されると、校正ゲインを計算せず、校正ゲイン記憶部４４に記憶されている値を校正ゲインとして設定する。 When the calibration gain calculation execution determination unit 42 determines that cor(K) is “positive”, the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH. When unit 42 determines that cor(K) is “negative”, unit 43 does not calculate the calibration gains and instead sets the values stored in the calibration gain storage unit 44 as the calibration gains.
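The branch performed by units 42 to 44 (steps S52 to S54 in FIG. 5) can be sketched as follows. This is a minimal illustration that assumes LEVEL_1CH and LEVEL_2CH have already been computed (and are nonzero) for the frame, and that the stored gains start at INIT_GAIN = 1.0, one of the options allowed for the calibration gain storage unit 44; the class and method names are hypothetical.

```python
class CalibrationGainSelector:
    """Hypothetical model of units 42-44 for steps S52-S54.

    `stored` plays the role of the calibration gain storage unit 44,
    initialised to INIT_GAIN_1CH = INIT_GAIN_2CH = 1.0 (no calibration).
    """

    def __init__(self, init_gain_1ch: float = 1.0, init_gain_2ch: float = 1.0):
        self.stored = (init_gain_1ch, init_gain_2ch)

    def select(self, cor_k: float, level_1ch: float, level_2ch: float) -> tuple:
        """Return (CALIB_GAIN_1CH, CALIB_GAIN_2CH) for frame K."""
        if cor_k > 0.0:
            # S53: target-sound-only section, so recompute via Eqs. (11), (12)
            ave_level = 0.5 * (level_1ch + level_2ch)
            self.stored = (ave_level / level_1ch, ave_level / level_2ch)
        # S54: cor(K) <= 0 reuses the stored (initial or latest) gains;
        # treating cor(K) == 0 like a negative value is an assumption here.
        return self.stored
```

For example, a negative cor(K) before any positive frame returns (1.0, 1.0); a later positive frame with levels 2.0 and 1.0 returns (0.75, 1.5), and those gains are reused for subsequent negative frames.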

ここで、校正ゲイン計算部４３による校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨの計算方法を説明する。 Here, a method of calculating the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH by the calibration gain calculator 43 will be described.

校正ゲイン計算部４３は、以下の（１０，１）式、（１０，２）式、(１１)式、（１２，１）式及び（１２，２）式を用いて、入力信号ｓ１に対する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ、及び、入力信号ｓ２に対する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを計算する。

Using the following equations (10,1), (10,2), (11), (12,1) and (12,2), the calibration gain calculation unit 43 calculates the calibration gain CALIB_GAIN_1CH for the input signal s1 and the calibration gain CALIB_GAIN_2CH for the input signal s2.
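The equations themselves do not appear in this text. Reconstructed from the descriptions of (10,1) through (12,2) in the following paragraphs, they read as below, where N (the number of samples in one analysis frame) and the element indexing FRAME1(K)[n] are notation introduced here as assumptions.

$$
\begin{aligned}
\mathrm{LEVEL\_1CH} &= \frac{1}{N}\sum_{n=0}^{N-1}\bigl|\mathrm{FRAME1}(K)[n]\bigr| && (10,1)\\
\mathrm{LEVEL\_2CH} &= \frac{1}{N}\sum_{n=0}^{N-1}\bigl|\mathrm{FRAME2}(K)[n]\bigr| && (10,2)\\
\mathrm{AVE\_LEVEL} &= \frac{\mathrm{LEVEL\_1CH} + \mathrm{LEVEL\_2CH}}{2} && (11)\\
\mathrm{CALIB\_GAIN\_1CH} &= \frac{\mathrm{AVE\_LEVEL}}{\mathrm{LEVEL\_1CH}} && (12,1)\\
\mathrm{CALIB\_GAIN\_2CH} &= \frac{\mathrm{AVE\_LEVEL}}{\mathrm{LEVEL\_2CH}} && (12,2)
\end{aligned}
$$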

（１０，１）式は、マイクｍ＿１が捕捉した入力信号ｓ１（ｎ）の現フレーム（Ｋ番目のフレーム）の全ての構成要素の絶対値の平均ＬＥＶＥＬ＿１ＣＨを算出しているものであり、この算出した値ＬＥＶＥＬ＿１ＣＨはマイクｍ＿１の感度を反映した値とみなすことができる。（１０，２）式は、マイクｍ＿２が捕捉した入力信号ｓ２（ｎ）の現フレーム（Ｋ番目のフレーム）の全ての構成要素の絶対値の平均ＬＥＶＥＬ＿２ＣＨを算出しているものであり、この算出した値ＬＥＶＥＬ＿２ＣＨはマイクｍ＿２の感度を反映した値とみなすことができる。 The equation (10, 1) is for calculating the average LEVEL_1CH of the absolute values of all the constituent elements of the current frame (Kth frame) of the input signal s1(n) captured by the microphone m_1. The value LEVEL_1CH that has been set can be regarded as a value that reflects the sensitivity of the microphone m_1. The equation (10, 2) is for calculating the average LEVEL_2CH of the absolute values of all the constituent elements of the current frame (Kth frame) of the input signal s2(n) captured by the microphone m_2. The value LEVEL_2CH can be regarded as a value that reflects the sensitivity of the microphone m_2.

なお、例えば、所定フレーム数での各フレームの構成要素の絶対値の総和値を、マイク感度を反映した値ＬＥＶＥＬ＿１ＣＨ、ＬＥＶＥＬ＿２ＣＨとして用いるようにしても良い。また例えば、相関係数ｃｏｒ（Ｋ）が「正」である最新のＰ（Ｐ≦Ｋ）個のフレームを構成する全ての要素（信号成分）の絶対値の平均を、マイク感度を反映した値ＬＥＶＥＬ＿１ＣＨ、ＬＥＶＥＬ＿２ＣＨとして用いるようにしても良い。後者の場合、相関係数ｃｏｒ（Ｋ）が「正」であった最新のＰ−１個のフレームの構成要素の絶対値の総和値を保存しておくことにより、現フレーム（Ｋ番目のフレーム）ＦＲＡＭＥ１（Ｋ）、ＦＲＡＭＥ２（Ｋ）の情報が与えられたときに容易にマイク感度を反映した値ＬＥＶＥＬ＿１ＣＨ、ＬＥＶＥＬ＿２ＣＨを計算することができる。上述したように長期間の信号成分の絶対値の平均や総和値を算出することにより、瞬間的な入力信号の変動の影響を抑制してマイク感度を反映した値を算出することができる。 Alternatively, the sum of the absolute values of the elements of each frame over a predetermined number of frames may be used as the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivity. As another alternative, the average of the absolute values of all elements (signal components) of the latest P (P ≤ K) frames for which the correlation coefficient cor(K) was “positive” may be used as LEVEL_1CH and LEVEL_2CH. In the latter case, if the sums of the absolute values of the elements of the latest P-1 frames for which cor(K) was “positive” are stored, LEVEL_1CH and LEVEL_2CH can be calculated easily once the current (K-th) frames FRAME1(K) and FRAME2(K) are given. Calculating the average or sum of the absolute values of the signal components over a long period in this way suppresses the influence of momentary fluctuations in the input signal and yields a value that reflects the microphone sensitivity.
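The P-frame variant described above can be sketched in Python as follows. The sketch assumes one instance per microphone channel and that `add_frame` is called only for frames with cor(K) > 0; the class name and the choice to keep a (sum of |x|, sample count) pair per frame are illustrative assumptions.

```python
from collections import deque


class PositiveFrameLevel:
    """LEVEL_xCH as the mean |sample| over the latest P frames with cor(K) > 0.

    Only a (sum of |x|, sample count) pair is kept per frame, so a new
    frame needs just the current FRAME1(K) or FRAME2(K) plus the stored
    sums of the previous P - 1 positive frames.
    """

    def __init__(self, P: int):
        self.frames = deque(maxlen=P)  # oldest pair is dropped automatically

    def add_frame(self, frame) -> float:
        """Store the current positive frame and return the updated level."""
        self.frames.append((sum(abs(x) for x in frame), len(frame)))
        total_abs = sum(s for s, _ in self.frames)
        total_n = sum(n for _, n in self.frames)
        return total_abs / total_n
```

With P = 2, adding frames whose mean absolute values are 1, 3 and 5 returns 1.0, 2.0 and then 4.0, because the first frame has dropped out of the window.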

（１０，１）式及び（１０，２）式は、マイク感度を反映した値の算出式の一例であり、上述したように、その他、種々の算出式が適用できる。但し、マイクｍ＿１のマイク感度を反映した値ＬＥＶＥＬ＿１ＣＨの算出式と、マイクｍ＿２のマイク感度を反映した値ＬＥＶＥＬ＿２ＣＨの算出式とが同じ算出式であることを要する。 The equations (10, 1) and (10, 2) are examples of the equations for calculating the value that reflects the microphone sensitivity, and various other equations can be applied as described above. However, the calculation formula of the value LEVEL_1CH that reflects the microphone sensitivity of the microphone m_1 and the calculation formula of the value LEVEL_2CH that reflects the microphone sensitivity of the microphone m_2 need to be the same.

（１１）式は、２つのマイクｍ＿１及びｍ＿２の感度ＬＥＶＥＬ＿１ＣＨ及びＬＥＶＥＬ＿２ＣＨの平均ＡＶＥ＿ＬＥＶＥＬを、校正後のマイクｍ＿１及びｍ＿２の目標感度として算出している。なお、２つのマイクｍ＿１及びｍ＿２の感度ＬＥＶＥＬ＿１ＣＨ及びＬＥＶＥＬ＿２ＣＨの大きい方の値若しくは小さい方の値を目標感度とするようにしても良い。 Formula (11) calculates the average AVE_LEVEL of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2 as the target sensitivity of the calibrated microphones m_1 and m_2. The target sensitivity may be the larger value or the smaller value of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2.

（１２，１）式は、その右辺の分母ＬＥＶＥＬ＿１ＣＨを左辺に移項した式を考えると理解できるように、マイクｍ＿１の感度ＬＥＶＥＬ＿１ＣＨに校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを乗算した値が目標感度ＡＶＥ＿ＬＥＶＥＬになるように、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを定める式になっている。同様に、（１２，２）式は、その右辺の分母ＬＥＶＥＬ＿２ＣＨを左辺に移項した式を考えると理解できるように、マイクｍ＿２の感度ＬＥＶＥＬ＿２ＣＨに校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを乗算した値が目標感度ＡＶＥ＿ＬＥＶＥＬになるように、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを定める式になっている。 In the equation (12, 1), as can be understood by considering an equation in which the denominator LEVEL_1CH on the right side of the equation is transferred to the left side, calibration is performed so that the value obtained by multiplying the sensitivity LEVEL_1CH of the microphone m_1 by the calibration gain CALIB_GAIN_1CH becomes the target sensitivity AVE_LEVEL. It is an expression that determines the gain CALIB_GAIN_1CH. Similarly, in the expression (12, 2), as can be understood by considering an expression in which the denominator LEVEL_2CH on the right side is transferred to the left side, the value obtained by multiplying the sensitivity LEVEL_2CH of the microphone m_2 by the calibration gain CALIB_GAIN_2CH becomes the target sensitivity AVE_LEVEL. In addition, it is an expression that determines the calibration gain CALIB_GAIN_2CH.
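Equations (10,1) through (12,2), including the larger/smaller target alternatives mentioned for (11), can be sketched for a single target-sound frame pair as follows. This illustration assumes the frames are NumPy-compatible sample arrays with nonzero level; the function name and the `target` parameter are hypothetical.

```python
import numpy as np


def calibration_gains(frame1, frame2, target: str = "mean"):
    """Return (CALIB_GAIN_1CH, CALIB_GAIN_2CH) for one target-sound frame pair.

    LEVEL_xCH: mean absolute sample value (Eqs. (10,1), (10,2)).
    Target: mean of the two levels (Eq. (11)); "max" and "min" are the
    alternatives mentioned in the text.
    Gain: target / LEVEL_xCH (Eqs. (12,1), (12,2)).
    """
    level_1ch = float(np.mean(np.abs(frame1)))
    level_2ch = float(np.mean(np.abs(frame2)))
    if target == "mean":
        ave_level = 0.5 * (level_1ch + level_2ch)
    elif target == "max":
        ave_level = max(level_1ch, level_2ch)
    elif target == "min":
        ave_level = min(level_1ch, level_2ch)
    else:
        raise ValueError(f"unknown target: {target}")
    return ave_level / level_1ch, ave_level / level_2ch
```

Multiplying s1(n) and s2(n) by the returned gains, as the calibration gain multiplication units 16 and 17 do, brings both channels to the same level: for levels 2.0 and 1.0 the gains are 0.75 and 1.5, and both calibrated levels equal 1.5.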

校正ゲイン記憶部４４は、校正ゲイン計算部４３が校正ゲインを計算しない場合に適用する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ（＝ＩＮＩＴ＿ＧＡＩＮ＿１ＣＨ）及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨ（＝ＩＮＩＴ＿ＧＡＩＮ＿２ＣＨ）を記憶しているものである。このような校正ゲインＩＮＩＴ＿ＧＡＩＮ＿１ＣＨ、ＩＮＩＴ＿ＧＡＩＮ＿２ＣＨとして、校正させない値１．０を適用しても良く、また、校正ゲイン計算部４３が計算した直近の値を適用するようにしても良い。 The calibration gain storage unit 44 stores the calibration gains CALIB_GAIN_1CH (=INIT_GAIN_1CH) and CALIB_GAIN_2CH (=INIT_GAIN_2CH) that are applied when the calibration gain calculation unit 43 does not calculate the calibration gains. For INIT_GAIN_1CH and INIT_GAIN_2CH, either a value of 1.0 (which leaves the signal uncalibrated) or the most recent values calculated by the calibration gain calculation unit 43 may be used.

校正ゲイン出力部４５は、校正ゲイン計算部４３が計算で得た校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨ、若しくは、記憶部２４から読み出された校正ゲインＩＮＩＴ＿ＧＡＩＮ＿１ＣＨ及びＩＮＩＴ＿ＧＡＩＮ＿２ＣＨをそれぞれ、対応する校正ゲイン乗算部１６、１７に与えるものである。 The calibration gain output unit 45 supplies either the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH calculated by the calibration gain calculation unit 43, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the calibration gain storage unit 44, to the corresponding calibration gain multiplication units 16 and 17, respectively.

第１校正ゲイン乗算部１６は、マイクｍ＿１からの入力信号ｓ１（ｎ）に、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを乗算して得た、校正後信号ｙ１（ｎ）を出力するものである。 The first calibration gain multiplication unit 16 outputs the post-calibration signal y1(n) obtained by multiplying the input signal s1(n) from the microphone m_1 by the calibration gain CALIB_GAIN_1CH.

第２校正ゲイン乗算部１７は、マイクｍ＿２からの入力信号ｓ２（ｎ）に、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを乗算して得た、校正後信号ｙ２（ｎ）を出力するものである。 The second calibration gain multiplication unit 17 outputs the post-calibration signal y2(n) obtained by multiplying the input signal s2(n) from the microphone m_2 by the calibration gain CALIB_GAIN_2CH.

（Ａ−２）実施形態の動作
次に、実施形態に係る音響信号処理装置１０における全体処理及び校正ゲインの計算処理の動作を、図面を参照しながら詳細に説明する。
(A-2) Operation of the Embodiment
Next, the overall processing and the calibration gain calculation processing of the acoustic signal processing device 10 according to the embodiment will be described in detail with reference to the drawings.

マイクｍ＿１及びｍ＿２のそれぞれから図示しないＡＤ変換器を介して、１フレーム分の入力信号ｓ１（ｎ）及びｓ２（ｎ）がＦＦＴ部１１に入力される。 The input signals s1(n) and s2(n) for one frame are input to the FFT unit 11 from the microphones m_1 and m_2 through an AD converter (not shown).

ＦＦＴ部１１は、１フレーム分の入力信号ｓ１（ｎ）及びｓ２（ｎ）に基づく分析フレームＦＲＡＭＥ１（Ｋ）及びＦＲＡＭＥ２（Ｋ）についてフーリエ変換し、周波数領域で示される信号Ｘ１（ｆ，Ｋ）及びＸ２（ｆ，Ｋ）を取得する。ＦＦＴ部１１により生成された信号Ｘ１（ｆ，Ｋ）及びＸ２（ｆ，Ｋ）が、正面抑圧信号生成部１２及びコヒーレンス計算部１３に与えられる。 The FFT unit 11 applies a Fourier transform to the analysis frames FRAME1(K) and FRAME2(K), which are based on one frame of the input signals s1(n) and s2(n), to obtain the frequency-domain signals X1(f,K) and X2(f,K). The signals X1(f,K) and X2(f,K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.

正面抑圧信号生成部１２は、信号Ｘ１（ｆ，Ｋ）及びＸ２（ｆ，Ｋ）に基づいて、正面方向以外の方向に指向性を有する正面抑圧信号Ｎ（ｆ、Ｋ）を算出する。そして、正面抑圧信号生成部１２は、全周波数に亘って正面抑圧信号Ｎ（ｆ，Ｋ）を平均した、平均正面抑圧信号ＡＶＥ＿Ｎ（ｆ，Ｋ）を生成し、この平均正面抑圧信号ＡＶＥ＿Ｎ（Ｋ）を相関計算部１４に与える。 Based on the signals X1(f,K) and X2(f,K), the front suppression signal generation unit 12 calculates the front suppression signal N(f,K), which has directivity in directions other than the front. It then generates the average front suppression signal AVE_N(K) by averaging N(f,K) over all frequencies, and supplies AVE_N(K) to the correlation calculation unit 14.
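A minimal sketch of AVE_N(K) follows. It assumes, as the claims state, that N(f,K) is the per-frequency difference X1(f,K) − X2(f,K), and further assumes (not stated here) that the magnitude of that difference is averaged over the bins of a real FFT with no analysis window; the function name is hypothetical.

```python
import numpy as np


def average_front_suppression(frame1, frame2) -> float:
    """Hypothetical AVE_N(K): mean over bins of |X1(f,K) - X2(f,K)|."""
    X1 = np.fft.rfft(np.asarray(frame1, dtype=float))
    X2 = np.fft.rfft(np.asarray(frame2, dtype=float))
    # A source directly in front reaches both microphones at the same time,
    # so its components cancel in the difference (the null toward the front).
    N_fK = X1 - X2
    return float(np.mean(np.abs(N_fK)))
```

Identical channel frames give AVE_N(K) = 0, while a unit impulse on one channel and silence on the other gives 1.0, since the impulse has a flat unit-magnitude spectrum.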

一方、コヒーレンス計算部１３は、信号Ｘ１（ｆ，Ｋ）及びＸ２（ｆ，Ｋ）に基づいて、コヒーレンスＣＯＨを算出し、コヒーレンスＣＯＨを相関計算部１４に与える。 On the other hand, the coherence calculation unit 13 calculates the coherence COH based on the signals X1(f,K) and X2(f,K), and supplies the coherence COH to the correlation calculation unit 14.

相関計算部１４は、平均正面抑圧信号ＡＶＥ＿Ｎ（ｆ，Ｋ）とコヒーレンスＣＯＨとを取得し、平均正面抑圧信号ＡＶＥ＿Ｎ（ｆ，Ｋ）とコヒーレンスＣＯＨとの相関係数ｃｏｒ（Ｋ）を算出し、この相関係数ｃｏｒ（Ｋ）を校正ゲイン計算部１５に与える。 The correlation calculation unit 14 acquires the average front suppression signal AVE_N(K) and the coherence COH(K), calculates the correlation coefficient cor(K) between them, and supplies cor(K) to the calibration gain calculation unit 15.

校正ゲイン計算部１５は、相関係数ｃｏｒ（Ｋ）を取得し、この相関係数ｃｏｒ（Ｋ）の正負を観測し、その判断結果に応じて、各信号ｓ１（ｎ）及びｓ２（ｎ）に対する校正ゲインを算出する。また、校正ゲイン計算部１５は、信号ｓ１（ｎ）に対する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを第１校正ゲイン乗算部１６に出力し、信号ｓ２（ｎ）に対する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを第２校正ゲイン乗算部１７に出力する。 The calibration gain calculation unit 15 acquires the correlation coefficient cor(K), checks its sign, and calculates the calibration gains for the signals s1(n) and s2(n) according to the result. It outputs the calibration gain CALIB_GAIN_1CH for s1(n) to the first calibration gain multiplication unit 16 and the calibration gain CALIB_GAIN_2CH for s2(n) to the second calibration gain multiplication unit 17.

図５は、校正ゲイン計算部１５における処理動作を示すフローチャートである。 FIG. 5 is a flowchart showing the processing operation in the calibration gain calculation unit 15.

相関係数及び入力信号取得部４１は、相関係数部１４から相関係数ｃｏｒ（Ｋ）を取得し、入力信号ｓ１（ｎ）及びｓ２（ｎ）のＦＲＡＭＥ１（Ｋ）及びＦＲＡＭＥ２（Ｋ）を取得する（Ｓ５１）。 The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, together with the frames FRAME1(K) and FRAME2(K) of the input signals s1(n) and s2(n) (S51).

そして、校正ゲイン計算実行判定部４２が、相関係数ｃｏｒ（Ｋ）の値が正であるか又は負であるかを判定する（Ｓ５２）。 Then, the calibration gain calculation execution determination unit 42 determines whether the value of the correlation coefficient cor(K) is positive or negative (S52).

相関係数ｃｏｒ（Ｋ）が正の場合、正面方向以外の方向から到来した妨害音は存在せず、正面方向からの目的音区間とみなし、校正ゲイン計算部４３は、相関係数ｃｏｒ（Ｋ）、ＦＲＡＭＥ１（Ｋ）及びＦＲＡＭＥ２（Ｋ）を用いて、（１０，１）式、（１０，２）式、（１１）式、（１２，１）式、（１２，２）式に従って、信号ｓ１（ｎ）及び信号ｓ２（ｎ）に対する校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを算出する（Ｓ５３）。このとき、校正ゲイン計算部４３は、算出した校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨをそれぞれ、校正ゲイン記憶部４４に記憶して、校正ゲイン記憶部４４に記憶される校正ゲインを更新する。 When the correlation coefficient cor(K) is positive, there is no interfering sound coming from directions other than the front direction, and it is regarded as the target sound section from the front direction, and the calibration gain calculation unit 43 determines that the correlation coefficient cor(K) ), FRAME1(K), and FRAME2(K), according to equations (10,1), (10,2), (11), (12,1), and (12,2), Calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for s1(n) and signal s2(n) are calculated (S53). At this time, the calibration gain calculator 43 stores the calculated calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH in the calibration gain storage 44, respectively, and updates the calibration gains stored in the calibration gain storage 44.

相関係数ｃｏｒ（Ｋ）が負の場合、正面方向以外の方向から到来した妨害音は存在するとみなし、校正ゲイン計算部４３は、校正ゲイン記憶部４４に記憶されている値を校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨとする（Ｓ５４）。 When the correlation coefficient cor(K) is negative, it is considered that there is an interfering sound coming from a direction other than the front direction, and the calibration gain calculation unit 43 sets the value stored in the calibration gain storage unit 44 to the calibration gain CALIB_GAIN_1CH and It is set to CALIB_GAIN_2CH (S54).

つまり、校正ゲイン記憶部４４に、校正ゲインの初期値ＩＮＩＴ＿ＧＡＩＮ＿１ＣＨ、ＩＮＩＴ＿ＧＡＩＮ＿２ＣＨが格納されている場合、ＩＮＩＴ＿ＧＡＩＮ＿１ＣＨをＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨとし、ＩＮＩＴ＿ＧＡＩＮ＿２ＣＨをＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨとする。若しくは、校正ゲイン記憶部４４に、最新の校正ゲインが記憶されている場合は、校正ゲイン記憶部４４に記憶されている最新の校正ゲインを、今回の校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨ及びＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨとする。 That is, when the calibration gain storage unit 44 stores the initial values INIT_GAIN_1CH and INIT_GAIN_2CH of the calibration gain, INIT_GAIN_1CH is set to CALIB_GAIN_1CH and INIT_GAIN_2CH is set to CALIB_GAIN_2CH. Alternatively, when the latest calibration gain is stored in the calibration gain storage unit 44, the latest calibration gains stored in the calibration gain storage unit 44 are set as the current calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH.

そして、校正ゲイン出力部４５は、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを第１校正ゲイン乗算部１６に出力し、校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを第２校正ゲイン乗算部１７に出力する（Ｓ５５）。そして、校正ゲイン計算部１５は、インデックスＫを更新して（Ｓ５６）、Ｓ５１に移行して次のインデックスの校正ゲインの算出処理を行なう。 Then, the calibration gain output unit 45 outputs the calibration gain CALIB_GAIN_1CH to the first calibration gain multiplication unit 16 and outputs the calibration gain CALIB_GAIN_2CH to the second calibration gain multiplication unit 17 (S55). Then, the calibration gain calculation unit 15 updates the index K (S56), moves to S51, and performs a calibration gain calculation process for the next index.

ここで、校正ゲイン計算部１５は、校正ゲインを一度計算した後は校正ゲインが変動することは無いので、定常的に校正ゲインを更新し続けることは演算量の無駄となるので、途中から更新を停止してもよい。つまり、マイクｍ＿１及びｍ＿２を有する音響信号処理装置１０が使用される環境で、初期段階に、マイクｍ＿１及びｍ＿２に対する校正ゲインを取得した後は、定常的な校正ゲインの更新を行なう必要はなく、適宜校正ゲインの算出が必要な場合に行なうようにしてもよい。 Because the calibration gains do not change once they have been calculated, continuously updating them wastes computation, so the calibration gain calculation unit 15 may stop updating partway through. That is, in the environment where the acoustic signal processing device 10 with microphones m_1 and m_2 is used, once the calibration gains for m_1 and m_2 have been obtained in an initial stage, there is no need to keep updating them; they may instead be recalculated only when necessary.

そして、第１校正ゲイン乗算部１６は、信号ｓ１（ｎ）に校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿１ＣＨを乗算し、校正後信号ｙ１（ｎ）を出力し、第２校正ゲイン乗算部１７は、信号ｓ２（ｎ）に校正ゲインＣＡＬＩＢ＿ＧＡＩＮ＿２ＣＨを乗算し、校正後信号ｙ２（ｎ）を出力する。 Then, the first calibration gain multiplication unit 16 multiplies the signal s1(n) by the calibration gain CALIB_GAIN_1CH and outputs the post-calibration signal y1(n), and the second calibration gain multiplication unit 17 outputs the signal s2(n). The calibration gain CALIB_GAIN_2CH is multiplied, and the post-calibration signal y2(n) is output.

（Ａ−３）実施形態の効果
以上のように、この実施形態によれば、正面方向以外の方向から到達する妨害音が存在する場合、正面抑圧信号とＣＯＨとの相関係数が負であり、妨害音が存在しない場合、正面抑圧信号とＣＯＨとの相関係数が正となる、という特徴的な挙動を用いることで、妨害音声の影響を受けることなく、かつ、設計者にとって閾値設定が容易なマイク感度校正方法を実現することができる。
(A-3) Effects of the Embodiment
As described above, this embodiment exploits a characteristic behavior: the correlation coefficient between the front suppression signal and COH is negative when an interfering sound arrives from a direction other than the front, and positive when no interfering sound is present. This makes it possible to realize a microphone sensitivity calibration method that is unaffected by interfering voices and whose threshold is easy for the designer to set.

これにより、マイクアレイを用いた各種信号処理方法の前処理に、マイクｍ＿１及びｍ＿２に対する校正ゲインを算出する処理を適用することで、その後の音声処理性能の向上が期待できる。 Accordingly, by applying the process of calculating the calibration gain for the microphones m_1 and m_2 to the pre-processing of various signal processing methods using the microphone array, it is expected that the subsequent audio processing performance will be improved.

（Ｂ）他の実施形態
上述した実施形態においても種々の変形実施形態を言及したが、本発明は、以下の変形実施形態にも適用できる。
(B) Other Embodiments
Although various modified embodiments have been mentioned in the embodiment described above, the present invention can also be applied to the following modified embodiments.

（Ｂ−１）上述した実施形態において、相関計算部が、正面抑圧信号とコヒーレンスとの特徴量として相関係数を算出する場合を例示したが、正面抑圧信号とコヒーレンスとの特徴量として共分散の値を算出しても、上述した実施形態と同様な効果が得られる。 (B-1) In the above-described embodiment, the case where the correlation calculation unit calculates the correlation coefficient as the feature amount of the frontal suppression signal and the coherence is illustrated, but the covariance is calculated as the feature amount of the frontal suppression signal and the coherence. Even if the value of is calculated, the same effect as that of the above-described embodiment can be obtained.
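The covariance works equally well because cor(K) is the covariance divided by the product of two standard deviations, which are positive whenever both sequences vary, so the two quantities always share the same sign. A sketch of the covariance-based decision follows, assuming a window of recent AVE_N(K) and COH(K) values as a hypothetical input; it simply skips the normalization.

```python
import numpy as np


def is_target_section_cov(ave_n_window, coh_window) -> bool:
    """Sign test on the covariance of recent AVE_N(K) and COH(K) values."""
    x = np.asarray(ave_n_window, dtype=float)
    y = np.asarray(coh_window, dtype=float)
    cov = float(np.mean((x - x.mean()) * (y - y.mean())))
    return cov > 0.0  # same sign as the correlation coefficient
```

Unlike the correlation coefficient, the covariance scales with the signal level, so only its sign, not a fixed threshold on its magnitude, is meaningful here.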

（Ｂ−２）上述した実施形態では、本発明に係る音響信号処理装置は、複数のマイクを備えた音声処理機能（例えば、音声認識処理など）を有する装置であれば、様々な装置に適用することができ、例えば、スマートフォン、タブレット端末、テレビ会議端末、カーナビゲーションシステム、コールセンタ端末、ロボット、音信号をセンサ信号として使用する装置等に広く適用できる。 (B-2) In the above-described embodiment, the acoustic signal processing device according to the present invention is applicable to various devices as long as the device has a voice processing function (for example, voice recognition processing) including a plurality of microphones. It can be widely applied to, for example, smartphones, tablet terminals, video conference terminals, car navigation systems, call center terminals, robots, devices that use sound signals as sensor signals, and the like.

また、例えば、本発明の音響信号処理装置が通信機能を備える装置に搭載され、当該装置が、ネットワークを通じて、所定の音声処理機能を有するサーバに、校正後信号を送信するようにしてもよい。 Further, for example, the acoustic signal processing device of the present invention may be mounted on a device having a communication function, and the device may transmit the post-calibration signal to a server having a predetermined voice processing function via a network.

さらに、例えば、複数のマイクを備えた通信機能を有する装置が、ネットワークを通じて、本発明の音響信号処理装置を搭載したサーバに、各マイクの入力信号を送信するようにしてもよい。この場合、音響信号処理装置を搭載したサーバが、上述した実施形態と同様に、正面抑圧信号とコヒーレンスとの相関係数に応じて、各入力信号に対する校正ゲインを算出することができる。 Furthermore, for example, a device having a communication function, which includes a plurality of microphones, may transmit the input signal of each microphone to a server equipped with the acoustic signal processing device of the present invention through a network. In this case, the server equipped with the acoustic signal processing device can calculate the calibration gain for each input signal according to the correlation coefficient between the front suppression signal and the coherence, as in the above-described embodiment.

（Ｂ−３）上述した実施形態では、マイクが２個である場合を例示したが、３個以上のマイクのそれぞれから入力信号を取得する装置にも本発明を適用することができる。 (B-3) In the above-described embodiment, the case where the number of microphones is two is illustrated, but the present invention can be applied to an apparatus that acquires an input signal from each of three or more microphones.
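The text does not specify how the method generalizes to three or more microphones (for instance, which pair the front suppression signal and coherence are computed from). Assuming a target sound section has already been detected, one natural extension of equations (10) to (12) averages the levels of all M channels, as in this hypothetical sketch:

```python
import numpy as np


def multichannel_calibration_gains(frames) -> np.ndarray:
    """Hypothetical gains for M channels from one target-sound frame each.

    frames: array-like of shape (M, n_samples), one row per microphone.
    Level per channel = mean |sample| (cf. Eqs. (10,1), (10,2));
    target = mean of all levels (cf. Eq. (11)); gain = target / level (cf. Eq. (12)).
    """
    levels = np.mean(np.abs(np.asarray(frames, dtype=float)), axis=1)
    target = levels.mean()
    return target / levels
```

For three channels with levels 2, 1 and 3, the target is 2 and the gains are 1, 2 and 2/3, so every calibrated channel reaches level 2.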

１０…音響信号処理装置、ｍ＿１及びｍ＿２…マイク、１１・・ＦＴＴ部、１２…正面抑圧信号生成部、１３…コヒーレンス計算部、
１４…相関計算部、３１…正面抑圧信号、コヒーレンス取得部、３２…相関係数計算部、３３…相関係数出力部、
１５…校正ゲイン計算部、４１…相関係数及び入力信号取得部、４２…校正ゲイン計算実行判定部、４３…校正ゲイン計算部、４４…校正ゲイン記憶部、４５…校正ゲイン出力部、
１６…第１校正ゲイン乗算部、１７…第２校正ゲイン乗算部。 10... Acoustic signal processing device, m_1 and m_2... Microphones, 11... FFT unit, 12... Front suppression signal generation unit, 13... Coherence calculation unit,
14... Correlation calculation unit, 31... Front suppression signal/coherence acquisition unit, 32... Correlation coefficient calculation unit, 33... Correlation coefficient output unit,
15... Calibration gain calculation unit, 41... Correlation coefficient and input signal acquisition unit, 42... Calibration gain calculation execution determination unit, 43... Calibration gain calculation unit, 44... Calibration gain storage unit, 45... Calibration gain output unit,
16... First calibration gain multiplication unit, 17... Second calibration gain multiplication unit.

Claims

An acoustic signal processing device for calibrating the difference in microphone sensitivity between a plurality of input acoustic signals, comprising:
a front suppression signal generation unit that generates a front suppression signal having a blind spot in the front direction by taking, for each frequency component, the difference between a first frequency domain signal, obtained by converting a first input acoustic signal from the time domain into the frequency domain, and a second frequency domain signal, obtained by converting a second input acoustic signal from the time domain into the frequency domain, and averaging the values of the frequency components obtained by taking the difference;
a coherence calculation unit that calculates coherence using a first directional signal and a second directional signal based on the first and second frequency domain signals, the first directional signal having a directivity characteristic with strong directivity in a first direction different from the front direction, and the second directional signal having a directivity characteristic with strong directivity in a second direction different from the first direction;
a feature amount calculation unit that calculates a correlation value representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit that calculates a calibration gain for each of the first and second input acoustic signals depending on whether the correlation value is positive or negative; and
a calibration unit that calibrates each input acoustic signal with the corresponding calibration gain,
wherein the calibration gain calculation unit:
when the correlation value is positive, detects a target sound section arriving from the front that is not affected by an interfering sound; calculates, using the first and second input acoustic signals in the target sound section, a value reflecting the microphone sensitivity of each input acoustic signal; obtains a target sensitivity from the plurality of calculated values reflecting the microphone sensitivities; and calculates the calibration gain for each input acoustic signal based on the target sensitivity and the value reflecting the respective microphone sensitivity; and
when the correlation value is negative, uses, as each calibration gain, either the initial value of the calibration gain for each input acoustic signal or the latest calibration gain calculated in a target sound section not affected by an interfering sound and stored in the calibration gain calculation unit.

The acoustic signal processing device according to claim 1, wherein the calibration gain calculation unit comprises:
a calibration gain calculation execution determination unit that, when the correlation value is positive, determines the section to be a target sound section not affected by an interfering sound in which each calibration gain is calculated, and, when the correlation value is negative, determines the section to be a target sound section affected by an interfering sound in which the calibration gains are not calculated;
a calibration gain storage unit that stores each calibration gain; and
a calibration gain calculation unit that, based on the determination by the calibration gain calculation execution determination unit, calculates each calibration gain and stores the calculated calibration gains in the calibration gain storage unit in a target sound section not affected by an interfering sound, and outputs the latest calibration gains stored in the calibration gain storage unit in a target sound section affected by an interfering sound.

The acoustic signal processing device according to claim 1 or 2, wherein the calibration gain calculation unit calculates, for each input acoustic signal, the average of the absolute values of a plurality of signal components in that input acoustic signal as the value reflecting its microphone sensitivity.

The acoustic signal processing device according to claim 1, wherein the calibration gain calculation unit determines the average of the plurality of calculated values reflecting microphone sensitivity as the target sensitivity, and calculates the calibration gain for each input acoustic signal by dividing that target sensitivity by the value reflecting the microphone sensitivity of the input acoustic signal.
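Claims 3 and 4 together reduce to a simple ratio. In the notation below, which is introduced here for illustration only (M channels, the i-th input acoustic signal having N signal components x_i(n)), the sensitivity value A_i, the target sensitivity A_tgt and the calibration gain G_i are:

```latex
\[
A_i = \frac{1}{N}\sum_{n=1}^{N} \lvert x_i(n) \rvert, \qquad
A_{\mathrm{tgt}} = \frac{1}{M}\sum_{i=1}^{M} A_i, \qquad
G_i = \frac{A_{\mathrm{tgt}}}{A_i}
\quad\Longrightarrow\quad G_i A_i = A_{\mathrm{tgt}} \ \text{for every } i.
\]
```

For example, with two channels and A_1 = 0.8, A_2 = 1.2, the target sensitivity is 1.0, so G_1 = 1.25 and G_2 is about 0.833; after calibration both channels have a mean absolute value of 1.0.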

An acoustic signal processing program for calibrating differences in microphone sensitivity among a plurality of input acoustic signals, the program causing a computer to function as:
a front suppression signal generation unit that generates a front suppression signal having a blind spot in the front direction by averaging, over frequency components, the per-component difference between a first frequency domain signal, obtained by converting a first input acoustic signal from the time domain to the frequency domain, and a second frequency domain signal, obtained by converting a second input acoustic signal from the time domain to the frequency domain;
a coherence calculation unit that calculates coherence, based on the first and second frequency domain signals, from a first directional signal having strong directivity in a first direction different from the front direction and a second directional signal having strong directivity in a second direction that differs from both the front direction and the first direction;
a feature amount calculation unit that calculates a correlation value representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit that calculates each calibration gain for the first and second input acoustic signals depending on whether the correlation value is positive or negative; and
a calibration unit that calibrates each corresponding input acoustic signal with its calibration gain,
wherein the calibration gain calculation unit is
configured, when the correlation value is positive, to detect a target sound section arriving from the front that is unaffected by interfering sound, to calculate a value reflecting the microphone sensitivity of each input acoustic signal from the first and second input acoustic signals in that target sound section, to obtain a target sensitivity from the plurality of calculated values, and to calculate each calibration gain for each input acoustic signal based on the target sensitivity and the value reflecting the corresponding microphone sensitivity;
and configured, when the correlation value is negative, to use, as the calibration gains, either the initial calibration gain for each input acoustic signal or the latest calibration gains that were calculated in a target sound section unaffected by interfering sound and stored in the calibration gain calculation unit.
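The claims name the front suppression signal and the coherence without giving formulas. The Python below is one plausible construction under stated assumptions, not the method fixed by the claims: the front suppression value is the bin-averaged magnitude of X1(f) - X2(f); the two directional signals are formed by delay-and-subtract beamforming with an assumed inter-microphone delay `tau`; and coherence is the bin average of |B1 B2*| normalized by the mean power of the two beams. All function names and parameters are hypothetical.

```python
import numpy as np

def to_frequency_domain(frame):
    """Hann-windowed real FFT of one time-domain frame."""
    frame = np.asarray(frame, dtype=float)
    return np.fft.rfft(frame * np.hanning(len(frame)))

def front_suppression(X1, X2):
    """Mean over bins of |X1(f) - X2(f)|. A frontal source reaches both
    microphones in phase and cancels, giving a blind spot toward the front."""
    return float(np.mean(np.abs(X1 - X2)))

def directional_signals(X1, X2, fs, n_fft, tau):
    """Delay-and-subtract beams, each with a null toward one side.

    tau: assumed inter-microphone travel time (s) for a source at 90 degrees.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # bin centre frequencies in Hz
    shift = np.exp(-2j * np.pi * freqs * tau)    # phase shift of delay tau
    B1 = X2 - X1 * shift
    B2 = X1 - X2 * shift
    return B1, B2

def coherence(B1, B2, eps=1e-12):
    """Bin average of |B1 B2*| / (0.5 (|B1|^2 + |B2|^2)); lies in [0, 1]."""
    num = np.abs(B1 * np.conj(B2))
    den = 0.5 * (np.abs(B1) ** 2 + np.abs(B2) ** 2) + eps
    return float(np.mean(num / den))
```

In this sketch a frontal source yields identical X1 and X2, so the front suppression value is zero while the two beams become identical and their per-bin coherence is at its maximum wherever they are nonzero.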

An acoustic signal processing method for calibrating differences in microphone sensitivity among a plurality of input acoustic signals, wherein:
a front suppression signal generation unit generates a front suppression signal having a blind spot in the front direction by averaging, over frequency components, the per-component difference between a first frequency domain signal, obtained by converting a first input acoustic signal from the time domain to the frequency domain, and a second frequency domain signal, obtained by converting a second input acoustic signal from the time domain to the frequency domain;
a coherence calculation unit calculates coherence, based on the first and second frequency domain signals, from a first directional signal having strong directivity in a first direction different from the front direction and a second directional signal having strong directivity in a second direction that differs from both the front direction and the first direction;
a feature amount calculation unit calculates a correlation value representing the relationship between the front suppression signal and the coherence;
a calibration gain calculation unit calculates each calibration gain for the first and second input acoustic signals depending on whether the correlation value is positive or negative; and
a calibration unit calibrates each corresponding input acoustic signal with its calibration gain,
wherein the calibration gain calculation unit is
configured, when the correlation value is positive, to detect a target sound section arriving from the front that is unaffected by interfering sound, to calculate a value reflecting the microphone sensitivity of each input acoustic signal from the first and second input acoustic signals in that target sound section, to obtain a target sensitivity from the plurality of calculated values, and to calculate each calibration gain for each input acoustic signal based on the target sensitivity and the value reflecting the corresponding microphone sensitivity;
and configured, when the correlation value is negative, to use, as the calibration gains, either the initial calibration gain for each input acoustic signal or the latest calibration gains that were calculated in a target sound section unaffected by interfering sound and stored in the calibration gain calculation unit.
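The claims also leave open how the correlation value between the front suppression signal and the coherence is computed. Purely as an assumption, the Python sketch below uses a Pearson correlation coefficient over a sliding window of recent frames (the class name and the default window of 50 frames are hypothetical). A positive result would then select the gain-calculation branch, and a negative one the hold branch.

```python
from collections import deque

import numpy as np

class CorrelationTracker:
    """Pearson correlation between per-frame front suppression values and
    coherence values over the most recent `window` frames (window assumed)."""

    def __init__(self, window=50):
        self.front = deque(maxlen=window)   # oldest frames drop out automatically
        self.coh = deque(maxlen=window)

    def update(self, front_suppression_value, coherence_value):
        """Add one frame's features and return the current correlation value."""
        self.front.append(front_suppression_value)
        self.coh.append(coherence_value)
        a = np.array(self.front, dtype=float)
        b = np.array(self.coh, dtype=float)
        if len(a) < 2 or a.std() == 0.0 or b.std() == 0.0:
            return 0.0  # correlation undefined; treated as "not positive"
        return float(np.corrcoef(a, b)[0, 1])
```

Returning 0.0 when the coefficient is undefined keeps the caller on the hold branch, so gains are never recalculated from too little evidence.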