JP2000341658A

JP2000341658A - Speaker direction detecting system

Info

Publication number: JP2000341658A
Application number: JP11147468A
Authority: JP
Inventors: Yasumasa Morishita; 康正森下
Original assignee: NEC Engineering Ltd
Current assignee: NEC Engineering Ltd
Priority date: 1999-05-27
Filing date: 1999-05-27
Publication date: 2000-12-08

Abstract

PROBLEM TO BE SOLVED: To provide a speaker direction detecting system that allows the voice input part of a television conference device to be miniaturized and that improves stability by reducing speaker direction detection errors, even when a signal arriving from a direction other than that of the speaker is superimposed on the speaker's voice signal. SOLUTION: An echo canceller 8 estimates the transfer function of the arrival delay from the output signals of two microphones 1 and 2 arranged in a horizontal direction. A maximum tap detecting circuit 10 searches the H register holding the estimated transfer function for the tap having the maximum value, and supplies the tap number to a tap number/speaker direction converting circuit 12. In the same way, an echo canceller 9 estimates the transfer function of the arrival delay from the output signals of two microphones 2 and 3 arranged in a vertical direction. A maximum tap detecting circuit 11 searches the H register holding the estimated transfer function for the tap having the maximum value, and supplies the tap number to the tap number/speaker direction converting circuit 12. The tap number/speaker direction converting circuit 12 converts the two maximum tap numbers into a horizontal angle and a vertical angle to obtain a speaker direction signal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention: The present invention relates to a speaker direction detecting system, and more particularly to a speaker direction detecting system for pointing an image-input video camera toward the microphone that picks up the speaker's voice, as in a television conference apparatus.

[0002]

2. Description of the Related Art: Japanese Patent Application Laid-Open No. Hei 7-123311 discloses a conventional technique that receives the audio signal of a subject and detects its direction in order to control the imaging angle of the camera of an imaging apparatus. That technique uses a unidirectional microphone and a bidirectional microphone as audio input sources. The output signals of these two microphones are synchronized and passed through a microphone sensitivity adjustment circuit, the phase difference is calculated, and the direction of the subject is detected.

[0003]

Problems to be Solved by the Invention: To reduce the size of a television conference apparatus, it is conceivable to house the microphones inside the apparatus. In that case, however, achieving the ideal unidirectional pattern with the unidirectional microphone used in the technique of the above publication requires a large sound-collecting opening in the apparatus housing around the microphone. This is a disadvantage when trying to realize a housing that is both small and sufficiently strong.

[0004] Moreover, in situations where a television conference apparatus is used, there is not necessarily only one speaker; rather, several people are often present around the apparatus. In ordinary conversation, the audio signal input to the television conference apparatus consists mostly of the voice of a single speaker. However, another speaker may interrupt and begin talking partway through, and non-stationary non-speech signals (noise signals), such as the sound of a pencil tapping a desk or of pages being turned while someone looks through a handout, may well be superimposed.

[0005] Therefore, when the output signal of a speaker direction detecting device is used to control the imaging angle of a video camera, a speaker direction detection error causes the camera to point in a direction other than that of the speaker, which is a serious inconvenience for users of the television conference apparatus.

[0006] An object of the present invention is to provide a speaker direction detection system that allows the voice input unit of a television conference apparatus to be miniaturized and that reduces speaker direction detection errors and improves stability, even when a signal arriving from a direction other than that of the speaker is superimposed on the speaker's voice signal.

[0007]

Means for Solving the Problems: According to the present invention, there is provided a speaker direction detection system comprising delay time estimating means for estimating the delay time between the signals output from first and second microphones, and speaker direction calculating means for calculating the speaker direction from the estimated delay time.

[0008] The delay time estimating means has an echo canceller configuration in which the signal output from the first microphone is the main signal input and the signal output from the second microphone is the reference signal input. The echo canceller comprises an adaptive FIR filter that receives the reference signal as its input, a subtractor that subtracts the output of the adaptive FIR filter from the main signal, and control means for controlling the learning operation of the adaptive FIR filter so that the subtraction output becomes substantially zero.

[0009] The main signal is delayed by a fixed time before being input to the echo canceller. The control means controls the updating of the tap coefficients of the adaptive FIR filter so that the subtraction output becomes substantially zero. The delay time estimating means has means for detecting the maximum value of the tap coefficients of the adaptive FIR filter, and the speaker direction calculating means has means for calculating the speaker direction according to the tap number corresponding to this maximum value.

[0010] The system further comprises means for calculating the erasure amount of the echo canceller, and means for stopping the updating of the result of the speaker direction calculating means when this calculated amount falls below a predetermined threshold.

[0011] Furthermore, the first and second microphones are arranged at a fixed interval in the horizontal (or vertical) direction, and the delay time estimating means and the speaker direction calculating means calculate the angle of the speaker in the horizontal (or vertical) direction. In addition, the first microphone and a third microphone are arranged at a fixed interval in the vertical (or horizontal) direction, and the system further comprises second delay time estimating means for estimating the delay time between the signals output from the first and third microphones, and second speaker direction calculating means for calculating the angle of the speaker in the vertical (or horizontal) direction from this estimated delay time.

[0012] In the speaker direction detection system according to the present invention, an echo canceller is provided in the part that estimates the arrival delay time between the two audio signals obtained from two omnidirectional microphones placed a fixed distance apart. The echo canceller estimates the transfer function of the arrival delay, and the estimated transfer function is held in the H register, which stores the tap coefficients of the FIR filter constituting the echo canceller. A maximum tap detection circuit detects the tap number having the maximum amplitude in this estimated transfer function, and treats the position of the maximum-amplitude part of the transfer function as the arrival delay time. The system is thus configured to obtain the arrival delay time from the estimated transfer function, convert it into a speaker direction, and output it.

[0013]

Embodiments of the Invention: Embodiments of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 shows a speaker direction detection circuit 13 as one embodiment of the present invention. The speaker direction detection circuit 13 has omnidirectional microphones 1 to 3 for audio signal input. The outputs of microphones 1 to 3 are supplied to A/D (analog-to-digital) converters 4 to 6, respectively, and converted into digital signals. The output signal of the A/D converter 5 is supplied to a delay circuit 7 and is therefore output with a time delay relative to the output signals of the A/D converters 4 and 6.

[0014] FIG. 2 shows an example of the arrangement of microphones 1 to 3. Microphones 2 and 1 are placed a fixed distance apart in the horizontal direction of the apparatus, and these two microphones are used to detect the horizontal angle θ of the speaker direction. Microphones 2 and 3 are placed a fixed distance apart in the vertical direction of the apparatus, and these two microphones are used to detect the vertical angle φ of the speaker direction.

[0015] An echo canceller 8 is provided to obtain the transfer function of the horizontal delay between the outputs of microphones 1 and 2. The echo canceller 8 receives the output signal of the A/D converter 4 as the reference signal and the output signal of the delay circuit 7 as the main signal. When the echo canceller 8 performs its learning operation, the transfer function of the arrival delay of the audio signals input to microphones 1 and 2 is estimated, and the estimation result is stored in the H register inside the FIR (Finite Impulse Response) digital filter 17 (see FIG. 3) that constitutes the echo canceller 8.

[0016] A maximum tap detection circuit 10 is also provided. The maximum tap detection circuit 10 receives the contents of the H register (the tap coefficients) from the echo canceller 8, searches them for the tap having the maximum value, and supplies that tap number to the tap number/speaker direction conversion circuit 12.

[0017] Similarly, an echo canceller 9 is provided to obtain the transfer function of the vertical delay. The echo canceller 9 receives the output signal of the A/D converter 6 as the reference signal and the output signal of the delay circuit 7 as the main signal. When the echo canceller 9 performs its learning operation, the transfer function of the arrival delay of the audio signals input to microphones 3 and 2 is estimated, and the estimation result is stored in the H register inside the echo canceller 9.

[0018] The maximum tap detection circuit 11 receives the contents of the H register (the tap coefficients) from the echo canceller 9, searches them for the tap having the maximum value, and supplies that tap number to the tap number/speaker direction conversion circuit 12. The tap number/speaker direction conversion circuit 12 converts the tap numbers supplied from the maximum tap detection circuits 10 and 11 into the horizontal angle θ and the vertical angle φ of the speaker direction, respectively, relative to the front direction of the apparatus, and outputs them.

[0019] FIG. 3 shows the configuration of the echo cancellers 8 and 9 shown in FIG. 1. Each of the echo cancellers 8 and 9 has leak integrators 14 and 15, a coefficient update control circuit 16, an adaptive FIR filter 17, a subtractor 18, and an H register correction amount calculation circuit 19. Each of the echo cancellers 8 and 9 is supplied with a main signal y and a reference signal x.

[0020] The subtractor 18 receives the main signal y and the output signal ŷ of the FIR filter 17, and generates the residual signal e by subtracting ŷ from y. The H register correction amount calculation circuit 19 calculates the correction amount Δh for the tap coefficients of the adaptive FIR filter 17 so that the residual signal e is minimized, and supplies it to the adaptive FIR filter 17.

[0021] FIG. 4 shows the configuration of the adaptive FIR filter 17. The FIR filter 17 has an X register 20, a multiplication circuit 21, an H register 22, and an addition circuit 23. The input reference signal x is supplied to the X register 20, which consists of one-sample delay elements 100_1 to 100_(N-1) (N is a positive integer representing the tap length), each producing a delay of one sampling period. The reference signal data supplied to the one-sample delay element 100_1 is transferred to the adjacent one-sample delay element on every clock. In other words, the X register is a tapped delay line.

[0022] The reference signal x is also supplied to the multiplier 200_0 and multiplied by the tap coefficient supplied from the H register 22. The output signals of the one-sample delay elements 100_1 to 100_(N-1) are supplied to the multipliers 200_1 to 200_(N-1), respectively, and multiplied by the tap coefficients supplied from the H register 22. The output signals of the multipliers 200_0 to 200_(N-1) are supplied to the addition circuit 23, which computes their sum and outputs it as the output signal ŷ of the FIR filter 17.
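The filter structure of FIG. 4 can be sketched in a few lines of Python. This is only an illustration: the class name `AdaptiveFIR` and the attributes `x_reg` and `h_reg` are hypothetical, and the sketch stores the current sample together with the N-1 delayed samples in one list, whereas the description feeds the current sample directly to multiplier 200_0.

```python
class AdaptiveFIR:
    """Tapped delay line (X register) plus tap coefficients (H register)."""

    def __init__(self, num_taps):
        self.x_reg = [0.0] * num_taps  # x_reg[k] holds x(t-k); x_reg[0] is the newest sample
        self.h_reg = [0.0] * num_taps  # tap coefficients h(0) .. h(N-1)

    def push(self, x):
        """Shift in one reference sample and return y_hat = sum_k h(k) * x(t-k)."""
        self.x_reg = [x] + self.x_reg[:-1]          # one-sample shift of the delay line
        return sum(h * xk for h, xk in zip(self.h_reg, self.x_reg))
```

With a single nonzero coefficient at tap 1, the output is simply the input delayed by one sample, which is the behavior the later tap-peak search relies on.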

[0023] The leak integrator 14 is provided as a simple means of obtaining the power of the main signal y; it receives the main signal y and outputs its power Ly. The leak integrator 15 is provided as a simple means of obtaining the power of the reference signal x; it receives the reference signal x and outputs its power Lx. These leak integrators have the configuration shown in FIG. 5, but since they are well known to those skilled in the art, a detailed description is omitted.
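Since the details of FIG. 5 are not given in the text, the following is only a generic sketch of a leaky integrator used as a power estimator. The first-order recursion, the function name `leak_integrate`, and the smoothing constant `alpha` are assumptions and are not taken from the source.

```python
def leak_integrate(samples, alpha=0.01):
    """Running power estimate: L[n] = (1 - alpha) * L[n-1] + alpha * x[n]**2."""
    level = 0.0
    history = []
    for x in samples:
        level = (1.0 - alpha) * level + alpha * x * x   # leak old value, add new energy
        history.append(level)
    return history
```

For a constant-amplitude input the estimate settles at the squared amplitude; a smaller `alpha` gives a smoother but slower estimate.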

[0024] The coefficient update control circuit 16 is provided as means for generating a signal that permits or stops the coefficient update of the adaptive FIR filter 17. The operation of the coefficient update control circuit 16 will be described with reference to FIG. 6.

[0025] The coefficient update control circuit 16 operates once per sampling period. It is supplied with the output signal Ly of the leak integrator 14 and the output signal Lx of the leak integrator 15 (step 601). Next, to detect the presence or absence of an audio signal from the magnitude of its power, Lx is compared with a threshold Ln (step 602). If Lx is equal to or less than the threshold Ln, an update stop signal is output (step 605) and the operation ends. If Lx is greater than the threshold Ln, the process proceeds to step 603. In step 603, Ly is compared with the threshold Ln in the same way. If Ly is equal to or less than the threshold Ln, an update stop signal is output (step 605) and the operation ends. If Ly is greater than the threshold Ln, a coefficient update permission signal is output (step 604) and the operation ends.
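The decision flow of steps 602 to 605 can be written as a small function. The name `coefficient_update_allowed` and the boolean return convention (True for the permission signal, False for the stop signal) are illustrative choices, not part of the source.

```python
def coefficient_update_allowed(lx, ly, ln):
    """Permit adaptation only when both signal powers exceed the threshold ln."""
    if lx <= ln:       # step 602: reference signal power too low
        return False   # step 605: update stop signal
    if ly <= ln:       # step 603: main signal power too low
        return False   # step 605: update stop signal
    return True        # step 604: coefficient update permission signal
```

Gating adaptation this way keeps the filter from drifting while both microphones are picking up only background noise.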

[0026] In the adaptive FIR filter that is a component of the echo canceller, the reference signal x stored in the X register is convolved with the tap coefficients stored in the H register to generate the signal ŷ, and the signal ŷ is then subtracted from the main signal y to generate the residual signal e. The residual signal e is therefore given by

[0027]

(Equation 1)

[0028] If the adaptive FIR filter has converged and the residual signal e can be regarded as zero, the above equation becomes

[0029]

(Equation 2)

Thus, in the echo canceller 8, the signal delay that is the difference between the main signal y and the reference signal x is expressed by the transfer function h. In other words, h is the transfer function corresponding to the difference between the transfer function of the acoustic space from the sound source to microphone 1 and the transfer function of the acoustic space from the sound source to microphone 2 with the delay of the delay circuit 7 added.
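Equations 1 and 2 are images in the original publication and do not survive in this text. A reconstruction consistent with the description above (convolution of x with the H register, then subtraction from y) might read as follows. The discrete time index t, the tap index k running from 0 to N-1, and the exact notation are assumptions.

```latex
% Residual signal (cf. Equation 1)
e(t) = y(t) - \hat{y}(t) = y(t) - \sum_{k=0}^{N-1} h_k \, x(t-k)

% After convergence, with e(t) \approx 0 (cf. Equation 2)
y(t) = \sum_{k=0}^{N-1} h_k \, x(t-k)
```

Read this way, the second relation says that the main signal is the reference signal filtered by h, so a pure delay between the two microphones appears as a single dominant tap in h.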

[0030] If the distance from the sound source to microphone 1 equals the distance from the sound source to microphone 2, there is no difference between the two transfer functions, and the position of the tap having the maximum value in the transfer function h is determined uniquely by the delay of the delay circuit 7. If the two distances are not equal, the position of the tap having the maximum value in the transfer function h shifts, in proportion to the difference in distance, from the position determined by the delay of the delay circuit 7. The sound source direction can therefore be identified by working backward from this shift in the maximum tap position. The same applies to the echo canceller 9.

[0031] Even when the reference signal and the main signal input to the echo canceller differ in magnitude, the amplitude of the transfer function as a whole merely scales with the ratio of the two magnitudes, and the position of the tap having the maximum value is unaffected. In other words, the position of the tap having the maximum value does not depend on the sensitivity of the microphones that serve as the audio signal input means.

[0032] Various algorithms exist for correcting the H register coefficients of the adaptive FIR filter that is a component of the echo canceller. For example, when the learning identification method is used, the H register is corrected according to the following equation, in which the subscript k is the tap number of the adaptive FIR filter, Δh is the correction amount of the H register, and μ is a coefficient that determines the speed and stability of adaptation.

[0033]

(Equation 3)

Needless to say, the correction algorithm of the echo canceller is not limited to the learning identification method described above; the LMS (Least Mean Square) method or other algorithms can also be used.
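Equation 3 is not reproduced in this text. The learning identification method is generally understood to correspond to the normalized LMS (NLMS) update, so the sketch below uses the usual NLMS form as an assumption. The function name `nlms_identify`, the regularization term `eps`, and the synthetic white-noise test signal are illustrative only.

```python
import random

def nlms_identify(x, y, num_taps, mu=0.5, eps=1e-8):
    """Adapt FIR taps h so that sum_k h[k] * x[t-k] tracks y[t]; return h."""
    h = [0.0] * num_taps
    x_reg = [0.0] * num_taps                      # x_reg[k] holds x[t-k]
    for xt, yt in zip(x, y):
        x_reg = [xt] + x_reg[:-1]
        y_hat = sum(hk * xk for hk, xk in zip(h, x_reg))
        e = yt - y_hat                            # residual signal
        power = sum(xk * xk for xk in x_reg) + eps
        # Assumed NLMS form: delta_h[k] = mu * e * x[t-k] / sum_i x[t-i]^2
        h = [hk + mu * e * xk / power for hk, xk in zip(h, x_reg)]
    return h

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]      # reference signal
delay, gain = 5, 0.3
y = [0.0] * delay + [gain * v for v in x[:-delay]]     # main signal: y[t] = gain * x[t - delay]
h = nlms_identify(x, y, num_taps=16)
peak = max(range(len(h)), key=lambda k: h[k])          # index of the largest tap
```

In this noise-free toy case the largest tap lands on the true delay, and the gain of 0.3 only scales the tap height, which matches the remark that microphone sensitivity does not move the peak.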

[0034] The operation of the maximum tap detection circuit 10 will be described with reference to FIG. 7. Here, N is a constant representing the tap length of the adaptive FIR filter, k is a variable holding the tap number, and m is a variable for storing the maximum value. First, the maximum tap detection circuit 10 receives the contents of the H register from the echo canceller 8 (step 501). Next, the variables i, k, and m are each initialized to zero (step 502). The variable i is then compared with N (step 503). If i is less than N, the process proceeds to step 504; if i is equal to or greater than N, the tap number k is output (step 507) and the process ends.

[0035] In step 504, the value of the i-th tap h(i) of the H register is compared with the variable m. If h(i) is greater than m, the process proceeds to step 505; if h(i) is equal to or less than m, the process proceeds to step 506. Step 505 handles the case in which a tap with a value larger than any tap already searched has been found: h(i) is assigned to the variable m, and i is assigned to the variable k. In step 506, the variable i is incremented by 1 and the process returns to step 503. Through this operation, the tap number having the maximum value in the H register is detected. The maximum tap detection circuit 11 operates in the same way: it searches the H register of the echo canceller 9 for the tap having the maximum value and outputs its tap number.
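The flowchart steps 502 to 507 translate directly into a loop. The function name `detect_max_tap` is hypothetical; the variable names i, k, and m follow the description.

```python
def detect_max_tap(h_reg):
    """Return the tap number of the largest coefficient, following steps 502-507."""
    i, k, m = 0, 0, 0.0            # step 502: initialize i, k, m to zero
    n = len(h_reg)                 # N, the tap length
    while i < n:                   # step 503: stop when i >= N
        if h_reg[i] > m:           # step 504: compare h(i) with m
            m, k = h_reg[i], i     # step 505: record new maximum and its tap number
        i += 1                     # step 506
    return k                       # step 507: output tap number k
```

Two properties follow from the flow as described: the comparison uses the signed coefficient value rather than its magnitude, and because m starts at zero, a register with no positive coefficient yields tap number 0.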

[0036] The operation of the tap number/speaker direction conversion circuit 12 will now be described. First, referring to FIG. 1, the delay circuit 7 lies in the signal path from microphone 2 to the main signal inputs of the echo cancellers 8 and 9. The delay circuit 7 is provided to ensure that the echo cancellers 8 and 9 can converge reliably regardless of the direction in which the speaker is located relative to the apparatus. This is because, as is generally known, one condition for an echo canceller to converge is that causality is not violated: the main signal must be input to the echo canceller later than the reference signal.

[0037] Therefore, the delay amount of the delay circuit 7 may be set, for example, to at least the time obtained by dividing the distance between the two microphones by the speed of sound in air. Consequently, the transfer functions estimated by the echo cancellers 8 and 9 include the delay amount of the delay circuit 7.
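As a worked example of the delay sizing rule above, the following sketch converts the minimum delay into whole samples. The source does not state a sampling rate or a value for the speed of sound, so the `sample_rate_hz` parameter and the default of 343 m/s are assumptions, as is the function name.

```python
import math

def min_delay_samples(mic_spacing_m, sample_rate_hz, speed_of_sound=343.0):
    """Smallest whole-sample delay that is at least mic_spacing / speed_of_sound."""
    travel_time_s = mic_spacing_m / speed_of_sound     # worst-case inter-microphone delay
    return math.ceil(travel_time_s * sample_rate_hz)   # round up to whole samples
```

For microphones 10 cm apart sampled at 16 kHz, the travel time is about 0.29 ms, or roughly 4.7 samples, so a delay of at least 5 samples would satisfy the rule.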

[0038] The method of converting to the speaker direction in the horizontal direction will be described with reference to FIG. 8. Let L be the distance between the microphones, v the speed of sound in air, Dc the delay added by the delay circuit 7, and kh the maximum tap number supplied from the maximum tap detection circuit 10. Then the following relation holds:

[0039]

(Equation 4)

Therefore, the angle θ with respect to the front direction of the apparatus is obtained as

[0040]

(Equation 5)
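Equations 4 and 5 are not reproduced in this text, so the exact form used in the publication is unknown. Under a far-field model, a path difference of L·sin θ corresponds to the tap offset from the inserted delay, which suggests a conversion like the sketch below. The sampling rate parameter, the expression of Dc in samples, the sign convention of the angle, and the function name `tap_to_angle_deg` are all assumptions.

```python
import math

def tap_to_angle_deg(k_max, delay_samples, mic_spacing_m, sample_rate_hz,
                     speed_of_sound=343.0):
    """Convert a peak tap number into an arrival angle relative to the front direction."""
    tau = (k_max - delay_samples) / sample_rate_hz   # inter-microphone delay in seconds
    s = speed_of_sound * tau / mic_spacing_m         # sin(theta) under a far-field model
    s = max(-1.0, min(1.0, s))                       # clamp against rounding and noise
    return math.degrees(math.asin(s))
```

A peak exactly at the inserted delay maps to 0 degrees (speaker straight ahead). Because taps are whole samples, the angular resolution is limited by the sampling rate and microphone spacing. The same conversion would apply to the vertical pair using the tap number from the maximum tap detection circuit 11.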

[0041] Similarly, the method of converting to the speaker direction in the vertical direction will be described with reference to FIG. 9. Let L be the distance between the microphones, v the speed of sound in air, Dc the delay added by the delay circuit 7, and kv the maximum tap number supplied from the maximum tap detection circuit 11. Then the following relation holds:

[0042]

(Equation 6)

Therefore, the angle φ with respect to the front direction of the apparatus is obtained as

[0043]

(Equation 7)

[0044] Another embodiment of the present invention will now be described. Its basic configuration is the same as that of the first embodiment above, but it further improves the stability of the horizontal angle signal and the vertical angle signal output by the tap number/speaker direction conversion circuit 12. FIGS. 10 and 11 show this other embodiment, and parts equivalent to those in FIGS. 1 and 3 are denoted by the same reference numerals. As shown in FIG. 10, in addition to the configuration of the speaker direction detection circuit 13 of FIG. 1, erasure amount calculation circuits 30 and 31 and an update control circuit 32 are provided to improve the stability of the horizontal and vertical angle output signals for the speaker direction.

[0045] The configuration of the erasure amount calculation circuits 30 and 31 will be described with reference to FIG. 12. Each erasure amount calculation circuit consists of a division circuit 35, a logarithm calculation circuit 36, and a leak integrator 37, and determines the erasure amount of the echo canceller. The erasure amount calculation circuit is supplied with the output signal Ly of the leak integrator 14 and the output signal Le of the leak integrator 34 (see FIG. 11). The leak integrator 34 calculates the power Le of the residual signal e, which is the output of the echo canceller.

[0046] First, the signals Ly and Le are supplied to the division circuit 35, which divides Le by Ly to obtain the ratio of Le to Ly and supplies it to the logarithm calculation circuit 36. The logarithm calculation circuit 36 takes the common logarithm of the ratio of Le to Ly and supplies it to the leak integrator 37. The leak integrator 37 smooths this common logarithm and outputs it. The erasure amounts of the echo cancellers obtained in this way are each supplied to the update control circuit 32.
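The divide, logarithm, and smoothing chain can be sketched as below. As described, the circuit forms log10(Le/Ly), which becomes more negative as cancellation improves, while the update control treats a larger erasure amount as better convergence. This sketch therefore flips the sign and scales to decibels so that larger means better. That sign choice, the dB scaling, the first-order smoothing, and the names `erasure_amount_db`, `alpha`, and `floor` are assumptions and are not confirmed by the source.

```python
import math

def erasure_amount_db(ly, le, prev=0.0, alpha=0.05, floor=1e-12):
    """Smoothed cancellation measure in dB; larger means better cancellation (assumed sign)."""
    ratio = max(le, floor) / max(ly, floor)          # division circuit 35: Le / Ly
    instant = -10.0 * math.log10(ratio)              # logarithm circuit 36, sign flipped (assumption)
    return (1.0 - alpha) * prev + alpha * instant    # leak integrator 37: smoothing
```

Calling the function once per sample with the previous return value as `prev` gives a slowly varying convergence indicator; for example, a residual power 100 times below the main signal power corresponds to an instantaneous value of 20 dB.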

[0047] The update control circuit 32 is provided to judge whether the echo canceller has converged and thereby to confirm the validity of the estimated transfer function. The configuration and operation of the update control circuit 32 will be described with reference to FIG. 13. The update control circuit 32 has magnitude comparators 38 and 39, switches 40 and 41, and memories 42 and 43. The magnitude comparator 38 is supplied with the erasure amount of the echo canceller 8, which estimated the horizontal transfer function, compares it with a threshold Lht, and outputs the result to the switch 40. The switch 40 is supplied with the horizontal angle signal from the tap number/speaker direction conversion circuit. If the erasure amount is greater than the threshold Lht, the switch closes and the horizontal angle signal is supplied to the memory 42. If the erasure amount is equal to or less than the threshold Lht, the switch opens and the supply of the horizontal angle signal stops. The memory 42 always outputs its held contents as the horizontal angle signal.

[0048] Similarly, the magnitude comparator 39 receives the erasure amount of the echo canceller 9, which estimates the transfer function in the vertical direction, compares it with a threshold Lvt, and outputs the result to the switch 41. The switch 41 receives the vertical angle signal from the tap number/speaker direction conversion circuit. When the erasure amount is larger than the threshold Lvt, the switch closes and the vertical angle signal is supplied to the memory 43. When the erasure amount is equal to or less than the threshold Lvt, the switch opens and the supply of the vertical angle signal stops. The memory 43 always outputs its held content as the vertical angle signal.
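The comparator, switch, and memory chain described for each axis behaves like a gated sample-and-hold, which can be sketched as below. The class name `AngleGate`, its parameter names, and the initial memory content are hypothetical; the text does not say what the memories hold before the first update. The erasure amount is treated here simply as a given number.

```python
class AngleGate:
    """One axis of the update control: comparator, switch and memory."""

    def __init__(self, threshold, initial_angle=0.0):
        self.threshold = threshold        # Lht or Lvt
        self.held_angle = initial_angle   # memory content (initial value assumed)

    def step(self, erasure_amount, angle):
        # Comparator: the switch closes only when the erasure amount
        # is strictly larger than the threshold.
        if erasure_amount > self.threshold:
            self.held_angle = angle       # memory is overwritten
        # The memory content is always output, whether or not it was updated.
        return self.held_angle
```

One instance with threshold Lht would serve the horizontal angle and a second instance with threshold Lvt the vertical angle. When the erasure amount is at or below the threshold, `step` keeps returning the last accepted angle.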

[0049] The operation described above yields horizontal and vertical angle output signals with improved stability. In this embodiment, the output of the speaker direction signal can thus be switched between updating and holding while the erasure amount of the echo canceller is monitored. Even when noise, such as a person in another direction tapping a desk with a pencil, is superimposed on the speaker's voice, or when a speaker in another direction breaks into the conversation so that the voices of several speakers are summed, the speaker direction signal detected immediately beforehand can be held. This further improves the accuracy of the detected speaker direction, and the object of the present invention is achieved.

[0050]

[Effects of the Invention] As described above, the present invention provides the following effects. First, sensitivity adjustment of the microphones serving as the audio signal input means can be made unnecessary. Second, even when a noise signal other than the speaker's voice signal is superimposed on the input voice signal, detection errors in the speaker direction can be reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing the arrangement of the microphones used in the present invention and the relationship between the horizontal and vertical angles of the arrival direction of the speaker's voice signal with respect to the front direction of the apparatus.

FIG. 3 is a block diagram showing the configuration of an echo canceller used in one embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of an adaptive FIR filter used in one embodiment of the present invention.

FIG. 5 is a block diagram showing the configuration of a leak integrator used in one embodiment of the present invention.

FIG. 6 is a flowchart showing the operation of a coefficient update control circuit used in one embodiment of the present invention.

FIG. 7 is a flowchart illustrating the operation of a maximum tap detection circuit.

FIG. 8 is a diagram explaining the calculation of the horizontal angle in the tap number/speaker direction conversion circuit used in one embodiment of the present invention.

FIG. 9 is a diagram explaining the calculation of the vertical angle in the tap number/speaker direction conversion circuit used in one embodiment of the present invention.

FIG. 10 is a block diagram of another embodiment of the present invention.

FIG. 11 is a diagram showing the configuration of an echo canceller used in another embodiment of the present invention.

FIG. 12 is a block diagram showing the configuration of an erasure amount calculation circuit used in another embodiment of the present invention.

FIG. 13 is a block diagram showing the configuration of an update control circuit used in another embodiment of the present invention.

[Explanation of symbols]

1-3 Microphone
4-6 A/D converter
7 Delay circuit
8, 9 Echo canceller
10, 11 Maximum tap detection circuit
12 Tap number/speaker direction conversion circuit
13 Speaker direction detection circuit
14, 15, 34, 37 Leak integrator
16 Coefficient update control circuit
17 Adaptive FIR filter circuit
18 Subtractor
19 H register correction amount calculation circuit
20 X register
21 Multiplication circuit
22 H register
23 Addition circuit
24, 27 Multiplier
25 Adder
26 One-sample delay
30, 31 Erasure amount calculation circuit
32 Update control circuit
35 Division circuit
36 Logarithm calculation circuit
38, 39 Magnitude comparator
40, 41 Switch
42, 43 Memory

Continued from the front page
(51) Int.Cl.⁷ Identification symbol FI Theme code (reference) H04R 3/00 320 H04N 5/232 Z 5J083 // H04N 5/232 G10L 3/00 511 5K046
F-term (reference) 5C022 AA12 AB62 5C064 AA02 AC09 5D015 DD02 5D020 BB00 BB09 5J023 DA05 DB03 DC07 5J083 AA05 AC17 AC18 AD02 AE08 AF01 BE20 BE53 BE57 CA10 5K046 HH01 HH53 HH79

Claims

[Claims]

1. A speaker direction detection system comprising: delay time estimating means for estimating a delay time between signals output by first and second microphones; and speaker direction calculating means for calculating a speaker direction based on the estimated delay time.

2. The speaker direction detection system according to claim 1, wherein the delay time estimating means has an echo canceller configuration in which a signal output from the first microphone is used as a main signal input and a signal output from the second microphone is used as a reference signal input.

3. The speaker direction detection system according to claim 2, wherein the echo canceller comprises: an adaptive FIR filter that receives the reference signal; a subtractor that subtracts the output of the adaptive FIR filter from the main signal; and control means for controlling a learning operation of the adaptive FIR filter so that the output of the subtractor becomes substantially zero.

4. The speaker direction detection system according to claim 2, wherein the main signal is delayed by a predetermined time and input to the echo canceller.

5. The speaker direction detection system according to claim 3, wherein the control means updates the tap coefficients of the adaptive FIR filter so that the output of the subtractor becomes substantially zero.

6. The speaker direction detection system according to claim 3, further comprising means for detecting the maximum value of the tap coefficients of the adaptive FIR filter, wherein the speaker direction calculating means includes means for calculating the speaker direction according to the tap number corresponding to the maximum value.

7. The speaker direction detection system according to claim 2, further comprising: means for calculating an erasure amount of the echo canceller; and means for stopping the update of the speaker direction calculating means when the calculated erasure amount is less than a predetermined threshold value.

8. The speaker direction detection system according to claim 1, wherein the first and second microphones are arranged at a fixed interval in the horizontal (or vertical) direction, and the delay time estimating means and the speaker direction calculating means calculate the angle of the speaker in the horizontal (or vertical) direction.

9. The speaker direction detection system according to any one of claims 1 to 8, wherein the first microphone and a third microphone are arranged at a fixed interval in the vertical (or horizontal) direction, the system further comprising: second delay time estimating means for estimating a delay time between signals output by the first and third microphones; and second speaker direction calculating means for calculating the angle of the speaker in the vertical (or horizontal) direction based on the estimated delay time.