JP4929685B2

JP4929685B2 - Remote conference equipment

Info

Publication number: JP4929685B2
Application number: JP2005330730A
Authority: JP
Inventors: 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-11-15
Filing date: 2005-11-15
Publication date: 2012-05-09
Anticipated expiration: 2025-11-15
Also published as: CN101310558A; CN101310558B; JP2007142595A

Abstract

A teleconference device includes a loudspeaker array and microphone arrays arranged on both sides of the loudspeaker array. A plurality of focal points are set in front of the respective microphone arrays, symmetrically with respect to the center line of the loudspeaker array, and a bundle of sound collection beams directed toward the focal points is output. By calculating the difference between the sound collection beams directed toward focal points that are symmetric with respect to the center line, the sound component that wraps around from the loudspeaker array SPA into the microphones is cancelled. The sum of squares of the peak values of each difference signal over a particular time is then used to estimate which pair of focal points is nearest to the sound source. Finally, by comparing the sums of squares of the peak values of the sound collection beams directed toward the two mutually symmetric focal points, the position of the talker can be determined.

Description

The present invention relates to an apparatus that includes a microphone array and a speaker array and reproduces received audio together with its sound field, and more particularly to identifying the position of a talker or sound source using the microphone array.

Conventionally, means for receiving audio from a transmitting side and reproducing the sound field of that audio have been proposed (see Patent Documents 1 to 3). In such apparatuses, audio signals picked up by a plurality of microphones or the like are transmitted, and the receiving side reproduces the transmitting side's sound field using a plurality of speakers. This has the advantage that the position of the talker can be identified from the sound.

Patent Document 1 discloses, among other things, a method of creating stereophonic audio information that reproduces the sound field of the transmission source by transmitting audio information received by a plurality of microphone arrays and outputting it from the same number of speaker arrays.

The method of Patent Document 1 can indeed transmit the sound field of the transmission source itself, so that the position of the talker can be identified from the sound, but it has problems such as consuming a large amount of line resources. For this reason, means for identifying and transmitting talker position information have been disclosed (see, for example, Patent Document 2).

Patent Document 2 discloses an apparatus in which a talker's voice is captured by microphones, talker position information is generated from the talker information obtained from the microphones, and this position information is multiplexed with the audio information and transmitted. On the receiving side, the speaker to be driven is switched according to the received talker position information, so that the talker's voice and position are reproduced.

Patent Document 3 describes a conference system that, because it is impractical to give every talker a microphone in a conference with many participants, uses a microphone control unit to shift the phases of the audio signals input to the microphones and combine them, thereby identifying the talker. In Patent Document 3, the phase shift pattern corresponding to each talker's seat position is varied, the phase pattern that maximizes the voice level is determined, and the talker's position is identified from that pattern.

Patent Document 1: Japanese Patent Laid-Open No. 2-114799
Patent Document 2: Japanese Patent Laid-Open No. 9-261351
Patent Document 3: Japanese Patent Laid-Open No. 10-145763

However, the above patent documents have the following problems.

As described above, the method of Patent Document 1 has problems such as consuming a large amount of line resources.

The methods of Patent Documents 2 and 3 can generate talker position information from the talker information obtained from the microphones. However, this position detection is disturbed by sound from the speaker that outputs the audio transmitted from the counterpart device, so that a sound source is mistakenly judged to lie in a direction different from the actual one and the microphone array (the camera in Patent Document 3) is steered in that wrong direction.

Accordingly, an object of the present invention is to enable a remote conference device to estimate the true sound source even when the sound output from the speaker, which reproduces the audio transmitted from the counterpart device, wraps around into the microphones and is picked up.

To solve the above problems, the present invention is configured as follows.

(1) The present invention comprises: a speaker array consisting of a plurality of speakers that output sound; first and second microphone arrays provided so as to pick up sound on both sides of the speaker array in its longitudinal direction; sound collection area setting means that, by applying delay processing to the audio signals picked up by the microphones of the first microphone array and the second microphone array and combining them, sets a plurality of first sound collection areas and a plurality of second sound collection areas at positions symmetric to each other with respect to the longitudinal center line of the speaker array; difference signal calculating means that, among the audio signals picked up from the plurality of first sound collection areas and the plurality of second sound collection areas, calculates the difference signal of the audio signals picked up from each pair of sound collection areas at symmetric positions; first sound source position estimating means that selects a sound collection area pair whose difference signal has a large signal strength; and second sound source position estimating means that selects, from the sound collection area pair selected by the first sound source position estimating means, the sound collection area whose picked-up audio signal has the greater strength, and estimates that the sound source is located in that sound collection area. The sound collection area setting means has a function of further setting a plurality of narrow sound collection areas within the sound collection area selected by the second sound source position estimating means and generating a plurality of narrow sound collection beams each focused on one of the narrow sound collection areas, and the invention further comprises third sound source position estimating means that estimates that the sound source is located in the narrow sound collection area, among the plurality of narrow sound collection areas, whose picked-up audio signal has a large strength.

The sound collection area setting means takes mutually symmetric positions as sound collection areas and generates first and second sound collection beams focused on those areas. The sound transmitted from the counterpart device and output from the speaker array is emitted substantially symmetrically toward both of the pair of microphone arrays. The sound output from the speaker array can therefore be assumed to enter the first and second sound collection beams substantially equally, and because the difference signal calculating means calculates the difference signal of the first and second sound collection beams, the sound output from the speaker array is cancelled. The same holds when the difference between the effective values of the sound collection beams is calculated: the sound output from the speaker array can be assumed to reach the focal points of the two beams substantially equally, so it is likewise cancelled.

By contrast, sound that enters the microphone arrays from sources other than the speaker array does not vanish when such a difference is taken. Typically, when a talker speaks on only one microphone array side and a sound collection beam is directed toward that talker, the talker's voice enters one sound collection beam and nothing enters the beam on the opposite side, so the talker's voice itself, or its phase-inverted version, remains in the calculated difference. Even if there are sound sources on both sides, their sounds differ, so in most cases the sounds entering the pair of microphone arrays are asymmetric and the talker's voice remains after the difference is taken. The presence of the talker's voice can likewise be extracted when the effective values are used.

The first sound source position estimating means estimates that the sound source is located in one of the two areas of a sound collection area pair whose difference signal is large. The second sound source position estimating means compares the audio signals picked up from the two areas of that pair and estimates in which of them the sound source is located. Thus, according to the present invention, the position of the sound source (including a talker's voice; the same applies hereinafter) can be estimated correctly even when sound output from the speakers may wrap around into the microphones and be picked up.
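The two estimation steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the use of plain Python lists of samples, and the mean-square strength measure are hypothetical choices, and the beams are assumed to be already formed and time-aligned.

```python
def mean_square(samples):
    """Time average of squared sample values over the window."""
    return sum(x * x for x in samples) / len(samples)


def estimate_area(right_beams, left_beams):
    """Return (pair_index, side) for the most likely source area.

    right_beams[i] and left_beams[i] are equal-length sample lists from the
    beams focused on the i-th symmetric pair of sound collection areas
    (hypothetical data layout).
    """
    # Step 1: the symmetric loudspeaker component appears in both beams of a
    # pair, so it largely cancels in the sample-wise difference.
    diff_strength = [
        mean_square([r - l for r, l in zip(rb, lb)])
        for rb, lb in zip(right_beams, left_beams)
    ]
    pair = max(range(len(diff_strength)), key=diff_strength.__getitem__)

    # Step 2: within the selected pair, the stronger beam marks the side.
    right_level = mean_square(right_beams[pair])
    left_level = mean_square(left_beams[pair])
    return pair, ("right" if right_level >= left_level else "left")
```

In this sketch, a wraparound component that is identical in both beams of a pair contributes nothing to the difference, so only the pair containing an asymmetric source stands out in step 1.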

The effective value of an audio signal is obtained by calculating, in real time, the time average of the squares of the peak values over a specific time. The signal strength of a difference signal is compared using, for example, the time average of the squares of its peak values over a predetermined time, or the sum of squares of the gains at a plurality of predetermined frequencies after an FFT. The signal strength of a difference of effective values can be calculated as the time average of that difference, or as the time average of its square, using data covering a predetermined time longer than the window used to calculate the effective values. The same applies hereinafter.
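The two strength measures mentioned above might look like the following minimal sketch. The function names are hypothetical, and a naive DFT is used in place of an FFT purely to keep the example dependency-free. Note that the text's effective value is the mean of squares without a square root, which gives the same ordering as a conventional RMS value when levels are compared.

```python
import cmath
import math


def mean_square(samples):
    """Time average of squared sample values over the window."""
    return sum(x * x for x in samples) / len(samples)


def band_strength(samples, bins):
    """Sum of squared DFT magnitudes at the chosen frequency bins."""
    n = len(samples)
    total = 0.0
    for k in bins:
        # Naive DFT coefficient for bin k (an FFT would give the same value).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        total += abs(coeff) ** 2
    return total
```

Restricting the comparison to a few predetermined bins, for instance those covering the speech band, is one plausible reason to prefer the frequency-domain measure; which bins to use is not specified in the text.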

In the present invention, a plurality of narrow sound collection areas are further set within the sound collection area in which the second sound source position estimating means has estimated the sound source to be located, and a narrow sound collection beam is generated for each of them. The third sound source position estimating means selects the narrow sound collection area with a large signal strength. Because the position of the sound source is narrowed down in stages in this way, it can be estimated in a shorter time than if a fine-grained estimate were attempted from the outset.
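The benefit of narrowing down in stages can be illustrated with a simplified one-dimensional sketch. Everything here is an assumption made for illustration: areas are modelled as intervals along the table edge, `measure` is a hypothetical callback returning the signal strength of a beam focused on an interval, and four sub-areas per stage is an arbitrary choice.

```python
def split(area, parts):
    """Divide a (start, end) interval into equal sub-intervals."""
    start, end = area
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def coarse_to_fine(measure, span, parts=4, stages=2):
    """Narrow down the source interval in stages; return (area, beams_used)."""
    area, beams_used = span, 0
    for _ in range(stages):
        candidates = split(area, parts)
        beams_used += len(candidates)
        # Keep the sub-area whose beam reports the greatest strength.
        area = max(candidates, key=measure)
    return area, beams_used
```

With four areas per stage and two stages, eight beams are evaluated to reach a resolution that would take sixteen beams if the fine grid were scanned directly.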

According to the present invention, the position of a sound source can be estimated correctly in a remote conference device even when sound output from the speakers may wrap around into the microphones and be picked up.

<First Embodiment>
The configuration and usage of a remote conference device according to the first embodiment of the present invention will be described with reference to FIG. 1. The remote conference device of the first embodiment outputs the audio transmitted from a counterpart device through the speaker array so as to reproduce the position of the talker on the counterpart side. It also picks up a talker's voice with the microphone arrays, detects that talker's position, and transmits the picked-up audio and the position information to the counterpart device.

FIG. 1 shows the external appearance and a usage form of the remote conference device. FIG. 1(A) is an external perspective view of the device, FIG. 1(B) is a bottom view of the device as seen along arrows A-A, and FIG. 1(C) shows a usage form of the device.

As shown in FIG. 1(A), the remote conference device 1 has an elongated rectangular parallelepiped device body and legs 11. The body is supported by the legs 11 so that it is raised a predetermined distance above the installation surface. On the bottom surface of the remote conference device 1, a speaker array SPA, in which a plurality of speakers SP1 to SP4 are arranged in a straight line along the longitudinal direction of the device body, is provided facing downward. The speaker array SPA emits sound downward from the bottom surface of the device, and this sound is reflected by the installation surface, such as a conference desk, and reaches the conference participants (see FIG. 1(C)).

As shown in FIGS. 1(A) and 1(B), a microphone array in which microphones are arranged in a straight line is provided on each of the two longitudinal side surfaces of the device body (hereinafter called the right side surface (the upper edge in FIG. 1(B)) and the left side surface (the lower edge in FIG. 1(B))). That is, a microphone array MR consisting of microphones MR1 to MR4 is provided on the right side surface, and a microphone array ML consisting of microphones ML1 to ML4 is provided on the left side surface. Using the microphone arrays MR and ML, the remote conference device 1 picks up the voice of a conference participant who is speaking and detects that talker's position.

Although not shown in FIG. 1, the remote conference device 1 contains a transmission unit 2 (see FIG. 4) and a reception unit 3 (see FIG. 6). The transmission unit 2 processes the sound picked up by the microphone arrays MR and ML, estimates the position of the talker (the source need not be a human voice and may be sound emitted by an object; the same applies hereinafter), and multiplexes this position with the picked-up sound for transmission. The reception unit 3 forms the audio received from the counterpart device into a beam and outputs it from the speakers SP1 to SP4.

In FIG. 1 the microphone arrays MR and ML are placed symmetrically with respect to the center line 101 of the speaker array SPA, but in the device of the first embodiment they need not be symmetric. Even if the microphone arrays MR and ML are asymmetric, it suffices for the transmission unit (see FIG. 4) to perform signal processing so that the left and right sound collection areas (see FIG. 3) are formed symmetrically.

Next, the usage form of the remote conference device 1 will be described with reference to FIG. 1(C). The remote conference device 1 is normally placed at the center of a conference desk 100. Talker 998 and/or talker 999 sit on the left and right sides, or on one side, of the conference desk 100. The sound output by the speaker array SPA is reflected by the conference desk 100 and reaches the talkers on the left and right. Because the speaker array SPA outputs the sound as a beam, the sound can be localized at a specific position for the talkers on both sides. The beamforming performed by the speaker array SPA is described in detail later.

The microphone arrays MR and ML pick up the talkers' voices. The signal processing unit (transmission unit) connected to the microphone arrays MR and ML detects the position of the talker from the differences in the arrival times of the sound at the microphone units MR1 to MR4 and ML1 to ML4.

For ease of illustration, FIG. 1 shows four speakers and four microphones per array, but the device of the first embodiment is not limited to four: one or many speakers and microphones may be provided, and the microphone arrays MR and ML and the speaker array SPA may each be arranged in multiple rows rather than a single row. In the following description, the individual speakers and microphones of the arrays are therefore denoted with a subscript i, for example SPi (i = 1 to N) for speakers SP1 to SPN and MLi (i = 1 to N) for microphones ML1 to MLN. For example, SPi with i = 1 corresponds to SP1.

Next, with reference to FIG. 2, the beamforming of sound by the speaker array SPA (the sound beam) and the sound collection beams formed by the microphone arrays ML and MR will be described.

FIG. 2(A) illustrates the sound beam. The signal processing unit (reception unit) that supplies audio signals to the speaker units SP1 to SPN of the speaker array SPA delays the audio signal received from the counterpart device by the delay times DS1 to DSN shown in the figure before supplying it to the respective speaker units. In this figure, the delay pattern is such that the speaker closest to the virtual sound source position (focal point FS) emits the sound with no delay, and each speaker farther from the virtual sound source position emits it after a delay corresponding to the additional distance. With this delay pattern, the sound output from the speaker units SP1 to SPN spreads out forming a wavefront like that of sound emitted from the virtual sound source in the figure, so that the conference participants hear the sound as if the talker on the counterpart side were located at the virtual sound source position.
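The delay pattern DS1 to DSN described above can be sketched as follows. The coordinates, the function name, and the speed-of-sound constant are assumptions introduced for illustration; the text itself only states that the nearest speaker gets no delay and that farther speakers are delayed according to their extra distance.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def speaker_delays(speaker_positions, virtual_source):
    """Delays DS1..DSN in seconds for a virtual source at `virtual_source`."""
    distances = [math.dist(p, virtual_source) for p in speaker_positions]
    nearest = min(distances)
    # The nearest speaker plays immediately; the others wait for the time
    # sound would need to travel their additional distance.
    return [(d - nearest) / SPEED_OF_SOUND for d in distances]
```

For a virtual source centered behind a four-speaker array, the two inner speakers would get zero delay and the two outer speakers an equal, positive delay.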

FIG. 2(B) illustrates the sound collection beam. The audio signals input to the microphone units MR1 to MRN are delayed by the delay times DM1 to DMN shown in the figure and then combined. In this figure, the delay pattern is such that the signal picked up by the microphone farthest from the sound collection area (focal point FM) is input to the adder with no delay, and the signal from each microphone nearer to the sound collection area is input to the adder after a delay corresponding to how much nearer it is. With this delay pattern, the audio signals become equidistant from the sound collection area (focal point FM) in terms of sound propagation. In the combined signal, sound from this sound collection area is reinforced in phase while sound from other areas is cancelled by the phase mismatch. By delaying the sound input to a plurality of microphones so that they are equidistant from a given sound collection area in terms of propagation and then combining them, only the sound from that sound collection area can be picked up.
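The delay-and-combine processing described above corresponds to what is commonly called delay-and-sum beamforming. A minimal sketch follows, assuming whole-sample delays, 2-D coordinates in meters, and hypothetical function names; a practical implementation would likely use fractional delays and per-channel weighting.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def mic_delays(mic_positions, focus, sample_rate):
    """Delays DM1..DMN in whole samples; the farthest microphone gets zero."""
    distances = [math.dist(p, focus) for p in mic_positions]
    farthest = max(distances)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate)
            for d in distances]


def delay_and_sum(channels, delays):
    """Delay each channel by its sample count, then add them sample by sample."""
    length = len(channels[0])
    out = [0.0] * length
    for samples, delay in zip(channels, delays):
        for t in range(delay, length):
            out[t] += samples[t - delay]
    return out
```

A pulse from the focal point reaches the nearer microphone first, so delaying that channel lines the two pulses up and they add, whereas sound from elsewhere stays misaligned and does not reinforce.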

In the remote conference device of this embodiment, each of the microphone arrays MR and ML simultaneously forms sound collection beams toward a plurality of sound collection areas (four in FIG. 3). A talker's voice can thus be picked up wherever in these sound collection areas the talker is, and the talker's position can be detected from the sound collection area in which the voice was picked up.

Next, detection of the sound source position using the sound collection beams and the operation of picking up sound from that position will be described with reference to FIG. 3. FIG. 3 is a plan view looking down on the remote conference device and the talkers from above, that is, a view along arrows B-B in FIG. 1(C), and illustrates how the microphone arrays form sound collection beams.

≪Sound source position detection and sound collection method that excludes demon sound sources≫
First, the principle of the sound source position detection and sound collection method of this remote conference device will be described. For this explanation, it is assumed that no sound beam is being output from the speaker array SPA.

Here, the processing of the signals picked up by the microphone array MR on the right side surface is described. The transmission unit 2 of the remote conference device 1 (see FIG. 4) uses the delay-and-combine processing described above to form sound collection beams focused on the four sound collection areas 411R to 414R. These sound collection areas are determined by assuming the positions where talkers attending a conference using the remote conference device 1 may be located.

The talker (sound source) is considered to be in whichever of the sound collection areas 411R to 414R yields the highest picked-up audio signal level. For example, when the sound source 999 is in the sound collection area 414R as shown in FIG. 3, the level of the audio signal picked up from the sound collection area 414R is higher than the levels picked up from the other sound collection areas 411R to 413R.

Similarly, for the microphone array ML on the left side surface, four sound collection beams are formed approximately line-symmetrically to those on the right side, and the area with the highest picked-up audio signal level among the sound collection areas 411L to 414L is detected. The axis of this line symmetry is made to coincide approximately with the axis of the speaker array SPA.

The above is the principle of the sound source position detection and sound collection method of the remote conference device of this embodiment.

When no sound is output from the speaker array SPA and the microphone arrays MR and ML pick up no wraparound sound, the sound source position can be detected and the sound picked up correctly according to this principle. A remote conference device, however, transmits and receives audio signals bidirectionally, so sound is emitted from the speaker array SPA at the same time as the microphone arrays MR and ML are picking up sound.

The audio signals supplied to the speakers of the speaker array SPA are given a delay pattern like that shown in FIG. 2(A), so as to form the same wavefront as if the sound had arrived from a virtual sound source position set behind the speaker array. The audio signals picked up by the microphone array MR, on the other hand, are delayed with a pattern like that shown in FIG. 2(B) so that signals arriving from a given sound collection area are time-aligned, and are then combined.

If the virtual sound source position of the speaker array SPA coincides with one of the sound collection areas of the microphone array MR, the delay pattern applied to the speakers SP1 to SPN of the speaker array SPA and the delay pattern applied to the picked-up signals of the microphone array MR for that sound collection area are exactly the reverse of each other, and the sound emitted from the speaker array SPA that wraps around into the microphone array MR is combined at a high level.

When processed by the general sound source position detection method described above, this wraparound audio signal combined at a high level is erroneously recognized as a sound source that does not actually exist (a demon sound source).

Unless this demon sound source is cancelled, the audio signal that arrived from the counterpart device is sent straight back, causing echo, and the voice of the real sound source (talker) can no longer be detected and picked up.

The above description concerns the microphone array MR, but exactly the same applies to the microphone array ML because of the left-right symmetry.

That is, because the sound beam is reflected by the conference desk 100 and radiated symmetrically to the left and right, demon sound sources appear symmetrically in the right microphone array MR and the left microphone array ML.

The sound levels of the left sound collection areas 411L to 414L and the right sound collection areas 411R to 414R are therefore compared. Even if an area has a high level and would be estimated to contain a sound source, when the corresponding area on the opposite side has a similarly high level, it is judged to be a demon sound source produced by wraparound of the sound beam from the speaker array SPA and is excluded from sound collection. This makes it possible to detect and pick up the sound of the true sound source and prevents echo caused by wraparound sound.

To this end, the transmission unit of this remote conference device compares the audio signal levels picked up from the sound collection areas 411L to 414L of the left microphone array ML with those picked up from the sound collection areas 411R to 414R of the right microphone array MR. Pairs whose left and right levels are almost equal are excluded, and when the levels of a left-right pair differ greatly, the sound source is judged to be in the area with the higher level.
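The level comparison described above might be sketched as follows. The text does not say how "almost equal" and "differ greatly" are decided, so the `ratio` threshold, the function name, and the tie-breaking by highest level are all assumptions for illustration.

```python
def pick_true_source(left_levels, right_levels, ratio=2.0):
    """Return (side, index) of the likely real source, or None."""
    best = None
    best_level = 0.0
    for i, (left, right) in enumerate(zip(left_levels, right_levels)):
        high, low = max(left, right), min(left, right)
        # Nearly equal levels on both sides suggest loudspeaker wraparound,
        # so such a pair is treated as a demon sound source and skipped.
        if high < ratio * low:
            continue
        if high > best_level:
            best_level = high
            best = ("left" if left > right else "right", i)
    return best
```

In this sketch a pair that is loud on both sides is ignored even though it has the highest level, and the quieter but clearly one-sided pair is reported instead.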

そして相手装置には、その大きい方の音声信号のみを送信するとともに、その信号（デジタル信号）のサブコード等にその音声信号を検出した収音エリアの位置を表す位置情報を付加する。 Then, only the larger audio signal is transmitted to the counterpart device, and position information indicating the position of the sound collection area where the audio signal is detected is added to the subcode of the signal (digital signal).

The configuration of the signal processing unit (transmission unit) that performs the demon sound source exclusion process described above is explained below. The narrow sound collection beams 431 to 434 in FIG. 3 are explained later, in the description of the second embodiment with reference to FIG. 7.

≪Configuration of the transmission unit that forms sound collection beams≫
FIG. 4 is a block diagram showing the configuration of the transmission unit 2 of the remote conference device 1. In the figure, thick arrows indicate that multiple channels of audio signals are transmitted, thin arrows indicate that a single audio signal is transmitted, and broken arrows indicate that instruction inputs are transmitted.

The first beam generation unit 231 and the second beam generation unit 232 in the figure are signal processing units that each form four sound collection beams, focused on the right sound collection areas 411R to 414R and the left sound collection areas 411L to 414L shown in FIG. 3, respectively.

第１ビーム生成部２３１には、Ａ／Ｄ変換器２１１を介して右側マイクアレイＭＲの各マイクユニットＭＲ１〜ＭＲＮが収音した音声信号が入力される。また、同様に、第２ビーム生成部２３２には、Ａ／Ｄ変換器２１２を介して左側マイクアレイＭＬの各マイクユニットＭＬ１〜ＭＬＮが収音した音声信号が入力される。 The first beam generator 231 receives an audio signal picked up by each microphone unit MR1 to MRN of the right microphone array MR via the A / D converter 211. Similarly, the second beam generation unit 232 receives an audio signal collected by the microphone units ML1 to MLN of the left microphone array ML via the A / D converter 212.

The first beam generation unit 231 and the second beam generation unit 232 each form four sound collection beams, collect sound from the four sound collection areas 411R to 414R and 411L to 414L respectively, and output the collected audio signals to the difference value calculation circuit 22 and the selectors 271 and 272.

図５は、第１ビーム形成部２３１の詳細構成を示す図である。第１ビーム生成部２３１では、各収音エリア４１ｊ（ｊ＝１〜Ｋ）に対応する複数の遅延処理部４５ｊを有している。各遅延処理部４５ｊでは、各収音エリア４１ｊに焦点を持つ収音ビーム出力ＭＢｊを生成するため、ディレイパターンのデータ４０ｊに基づき、各マイク出力毎に音声信号を遅延させる。各遅延処理部４５ｊは、ＲＯＭ上に記憶したディレイパターンのデータ４０ｊを入力して、ディレイ４６ｊｉ（ｊ＝１〜Ｋ、ｉ＝１〜Ｎ）にディレイ量を設定する。 FIG. 5 is a diagram illustrating a detailed configuration of the first beam forming unit 231. The first beam generation unit 231 includes a plurality of delay processing units 45j corresponding to the sound collection areas 41j (j = 1 to K). Each delay processing unit 45j delays an audio signal for each microphone output based on the delay pattern data 40j in order to generate a sound collection beam output MBj having a focus on each sound collection area 41j. Each delay processing unit 45j inputs the delay pattern data 40j stored in the ROM, and sets the delay amount in the delay 46ji (j = 1 to K, i = 1 to N).

そして、加算部４７ｊは、これらディレイがかけられたディジタル音声信号を加算して、マイクビーム出力ＭＢｊ（ｊ＝１〜Ｋ）として出力する。この収音ビーム出力ＭＢｊは、それぞれ、図３に示す収音エリア４１ｊへ焦点を結ぶ収音ビームとなる。そして、各遅延処理部４５ｊが演算した収音ビーム出力ＭＢｊは、それぞれ差分値計算回路２２等に出力される。 Then, the adder 47j adds these delayed digital audio signals and outputs the result as a microphone beam output MBj (j = 1 to K). Each of the sound collection beam outputs MBj is a sound collection beam that focuses on the sound collection area 41j shown in FIG. The sound collection beam output MBj calculated by each delay processing unit 45j is output to the difference value calculation circuit 22 and the like.

また、図５では第１ビーム形成部２３１について説明したが、第２ビーム形成部２３２も、これと同様の構成である。 In addition, although the first beam forming unit 231 has been described with reference to FIG. 5, the second beam forming unit 232 has the same configuration.
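As a rough illustration of the delay-and-sum operation performed by each delay processing unit 45j and adder 47j, the following Python sketch delays each microphone channel by a whole number of samples and sums the results. The function name, the use of integer sample delays, and the example delay pattern are assumptions made for illustration; the description only states that the delay amounts are read from delay pattern data stored in ROM.

```python
def delay_and_sum(mic_signals, delays):
    """Form one sound collection beam output MBj.

    mic_signals: list of N equal-length sample lists (one per microphone).
    delays: list of N non-negative integer delays in samples
            (a hypothetical stand-in for delay pattern data 40j).
    """
    if len(mic_signals) != len(delays):
        raise ValueError("one delay per microphone is required")
    length = len(mic_signals[0])
    beam = [0.0] * length
    for signal, delay in zip(mic_signals, delays):
        for n in range(delay, length):
            # Sample n of the delayed channel is sample n - delay of the input.
            beam[n] += signal[n - delay]
    return beam


# A pulse that reaches microphone 0 at sample 2 and microphone 1 at sample 0.
mics = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
aligned = delay_and_sum(mics, [0, 2])     # delays compensate the path difference
misaligned = delay_and_sum(mics, [0, 0])  # no compensation
```

With the compensating delay pattern the two pulses add coherently at one sample, whereas without it they remain separate; this is the sense in which a delay pattern gives the beam a focus.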

In FIG. 4, the difference value calculation circuit 22 compares the volume levels of the audio signals collected in sound collection areas at left-right symmetric positions and calculates their difference. That is, if the signal level of a sound collection area A is denoted P(A), the difference value calculation circuit 22 calculates
D(411) = |P(411R) − P(411L)|
D(412) = |P(412R) − P(412L)|
D(413) = |P(413R) − P(413L)|
D(414) = |P(414R) − P(414L)|
and outputs the calculated difference values D(411) to D(414) to the first estimation unit 251.

The difference value calculation circuit 22 may be configured to subtract the signal waveforms of the audio signals collected in the left and right sound collection areas directly and output a difference signal, or to subtract the volume level values obtained by integrating the effective (RMS) value of each audio signal over a fixed period and output the result once per period.
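The level-based variant of the difference calculation can be sketched as follows. This is a minimal Python sketch, assuming that the level P(A) of an area is the sum of squared samples over one frame; the function names and the sample frames are hypothetical.

```python
def level(samples):
    """Volume level of one beam output: sum of squared samples over the frame."""
    return sum(s * s for s in samples)


def difference_values(right_beams, left_beams):
    """Return D(j) = |P(jR) - P(jL)| for each symmetric pair of areas."""
    return [abs(level(r) - level(l)) for r, l in zip(right_beams, left_beams)]


# Hypothetical frames for two area pairs.
echo = [0.5, -0.5, 0.5, -0.5]    # wrap-around sound, identical on both sides
talker = [1.0, 1.0, -1.0, -1.0]  # present only in the second right-side area
right = [echo, [e + t for e, t in zip(echo, talker)]]
left = [echo, echo]
d = difference_values(right, left)
```

In this example the pair that contains only the symmetric wrap-around sound yields a difference of zero, while the pair that also contains the talker yields a clearly non-zero value.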

When the difference value calculation circuit 22 outputs a difference signal, a BPF 241 may be inserted between the difference value calculation circuit 22 and the first estimation unit 251 to make estimation by the first estimation unit 251 easier. The BPF 241 is set to pass, out of the frequency range of conversational speech, the band around 1 kHz to 2 kHz in which the sound collection beams can control directivity well.

In this way, by taking the difference between the volume levels of the signals collected in left and right sound collection areas located symmetrically about the center line of the speaker array SPA, the sound components that wrap around symmetrically from the speaker array SPA to the left and right microphone arrays MR and ML are cancelled, so the wrap-around audio signal is not erroneously recognized as a sound source, that is, as a demon sound source.

The first estimation unit 251 selects the largest of the difference values input from the difference value calculation circuit 22 and selects the pair of sound collection areas for which that largest difference value was calculated. To pass the audio signals of these sound collection areas to the second estimation unit 252, the first estimation unit 251 outputs to the selectors 271 and 272 a selection signal that causes those signals to be output to the second estimation unit 252.

Based on this selection signal, the selector 271 selects, from the signals of the four sound collection areas collected as beams by the right-side beam generation unit 231, the signal of the sound collection area selected by the first estimation unit 251, and supplies it to the second estimation unit 252 and the signal selection unit 26. Likewise, the selector 272 selects, from the signals of the four sound collection areas collected as beams by the left-side beam generation unit 232, the signal of the sound collection area selected by the first estimation unit 251, and supplies it to the second estimation unit 252 and the signal selection unit 26.

The second estimation unit 252 receives the audio signals of the sound collection areas estimated by the first estimation unit 251 and selectively output by the selectors 271 and 272. It compares the audio signals of the left and right sound collection areas and judges the one with the higher level to be the audio signal of the true sound source. The second estimation unit 252 outputs information indicating the direction and distance of the sound collection area containing the true sound source to the multiplexing unit 28 as position information 2522, and instructs the signal selection unit 26 to selectively input the audio signal of the true sound source to the multiplexing unit 28.
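The two-stage estimation performed by the first estimation unit 251 and the second estimation unit 252 can be sketched as follows. This is an illustrative Python sketch only; the function name, the return format, and the example level values are assumptions and do not appear in the description.

```python
def locate_source(right_levels, left_levels):
    """Two-stage estimate in the spirit of estimation units 251 and 252.

    Returns (area_index, side), where side is "R" or "L".
    """
    diffs = [abs(r - l) for r, l in zip(right_levels, left_levels)]
    # Stage 1: the pair with the largest left/right difference.
    j = max(range(len(diffs)), key=lambda k: diffs[k])
    # Stage 2: within that pair, the louder side is taken as the true source.
    side = "R" if right_levels[j] >= left_levels[j] else "L"
    return j, side


# Hypothetical levels for areas 411 to 414 (indices 0 to 3).
right_levels = [9.0, 1.0, 1.2, 6.0]
left_levels = [9.1, 1.0, 1.0, 1.5]
result = locate_source(right_levels, left_levels)
```

Note that the first pair is the loudest overall but is nearly equal on both sides, so it is treated as wrap-around sound; the fourth pair, where the levels differ most, is selected instead.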

The multiplexing unit 28 multiplexes the position information 2522 input from the second estimation unit 252 with the audio signal 261 of the true sound source selected by the signal selection unit 26, and transmits the multiplexed signal to the counterpart device.

なお、これらの推定部２５１、２５２は、音源位置の推定を一定期間ごとに繰り返して行う。例えば０．５秒ごとに繰り返す。この場合、０．５秒分の信号波形または振幅実効値を比較すればよい。このように所定期間ごとに繰り返し音源位置を推定して収音エリアを切り換えるようにすれば、話者の移動に対応した収音をすることができる。 Note that these estimation units 251 and 252 repeatedly perform estimation of the sound source position at regular intervals. For example, repeat every 0.5 seconds. In this case, the signal waveform or the effective amplitude value for 0.5 seconds may be compared. In this way, if the sound collection area is switched by repeatedly estimating the sound source position every predetermined period, it is possible to collect sound corresponding to the movement of the speaker.

When the true sound source position coincides with the position of a demon sound source caused by wrap-around, the difference signal obtained by subtracting the left and right signal waveforms may be output to the counterpart device as the collected signal. This is because the difference signal cancels only the demon sound source waveform and preserves the signal waveform of the true sound source.

To handle a speaker who straddles two sound collection areas or who moves, the following alternative is also conceivable. The first estimation unit 251 selects two sound collection areas in descending order of difference signal intensity and outputs their intensity ratio. The second estimation unit 252 compares the left and right signals of the strongest pair, or of both pairs, to estimate on which side the true sound source lies. The signal selection unit 26 combines the two audio signals on the side selected by the first estimation unit 251 and the second estimation unit 252, weighted by the indicated intensity ratio, and outputs the result as the output signal 261. If the sounds at two positions are always combined with weights given by the signal intensity ratio in this way, a crossfade like the one described above is always applied as the speaker moves, and the sound image localization moves naturally.
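The intensity-weighted combination of two area signals can be illustrated with a short Python sketch. It assumes that the weights are the two intensities normalized so that they sum to one; the description specifies only that the signals are weighted by the intensity ratio, so the normalization and the function name are assumptions.

```python
def crossfade(signal_a, signal_b, strength_a, strength_b):
    """Mix two equal-length signals with weights proportional to their strengths."""
    total = strength_a + strength_b
    if total == 0:
        # Neither area carries any signal; output silence.
        return [0.0] * len(signal_a)
    weight_a = strength_a / total
    weight_b = strength_b / total
    return [weight_a * a + weight_b * b for a, b in zip(signal_a, signal_b)]


# Area A is three times as strong as area B, so it receives weight 0.75.
mixed = crossfade([1.0, 1.0], [0.0, 2.0], 3.0, 1.0)
```

As the speaker moves from one area to the next, the intensity ratio changes gradually and the mix shifts accordingly, which gives the crossfade behaviour mentioned above.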

≪Configuration of the receiving unit 3 that forms the sound beam≫
Next, the internal configuration of the receiving unit 3 is described with reference to FIG. 6. The receiving unit 3 includes an audio signal receiving unit 31 that receives an audio signal from the counterpart device and separates position information from the subcode of the audio signal; a parameter calculation unit 32 that determines, from the position information separated by the audio signal receiving unit 31, the position at which the audio signal is to be localized and calculates directivity control parameters for localizing the sound image at that position; a directivity control unit 33 that controls the directivity of the received audio signal based on the parameters input from the parameter calculation unit 32; D/A converters 34i (i = 1 to N) that convert the directivity-controlled audio signals into analog signals; and amplifiers 35i (i = 1 to N) that amplify the analog audio signals output from the D/A converters 34i (i = 1 to N). The analog audio signals output from the amplifiers 35i are supplied to the external speakers SPi (i = 1 to N) shown in FIG. 1.

音声信号受信部３１は、インターネットや公衆電話回線等を介して相手装置と通信をする機能部であり、通信インタフェースやバッファメモリ等を備えている。音声信号受信部３１は、相手装置から位置情報２５２２をサブコードとして含む音声信号３０を受信する。受信した音声信号のサブコードから位置情報を分離してパラメータ算出部３２に入力するとともに、音声信号を指向性制御部３３に入力する。 The audio signal receiving unit 31 is a functional unit that communicates with a partner apparatus via the Internet, a public telephone line, or the like, and includes a communication interface, a buffer memory, and the like. The audio signal receiving unit 31 receives the audio signal 30 including the position information 2522 as a subcode from the counterpart device. The position information is separated from the subcode of the received audio signal and input to the parameter calculation unit 32, and the audio signal is input to the directivity control unit 33.

The parameter calculation unit 32 calculates the parameters used by the directivity control unit 33. It sets a focal point at the position given by the received position information and calculates the delay amounts to be applied to the audio signals supplied to the individual speaker units so that the audio signal has directivity as if it were emitted from that focal point.
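One common way to obtain such delay amounts is to delay each speaker so that all outputs arrive at the focal point at the same time. The Python sketch below illustrates this under stated assumptions: a two-dimensional layout in metres, a speed of sound of 343 m/s, a 48 kHz sample rate, and a focal point in front of the array. None of these values or names comes from the description, and the actual calculation in the parameter calculation unit 32 may differ.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def focus_delays(speaker_positions, focus, sample_rate=48000):
    """Per-speaker delays (in samples) so that all outputs reach `focus` together.

    speaker_positions: list of (x, y) coordinates in metres.
    focus: (x, y) coordinate of the focal point in metres.
    """
    distances = [math.dist(p, focus) for p in speaker_positions]
    farthest = max(distances)
    # The farthest speaker emits first (zero delay); nearer ones wait.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


# Three speakers 20 cm apart, focal point 1 m in front of the middle speaker.
speakers = [(-0.2, 0.0), (0.0, 0.0), (0.2, 0.0)]
delays = focus_delays(speakers, focus=(0.0, 1.0))
```

The middle speaker is closest to the focal point and therefore receives the largest delay, while the two outer speakers, being equally far away, receive none.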

The directivity control unit 33 processes the audio signal received by the audio signal receiving unit 31 separately for each output channel of the speakers SPi (i = 1 to N), based on the parameters set by the parameter calculation unit 32. That is, a processing unit is provided in parallel for each of the speakers SPi (i = 1 to N). Each processing unit applies a delay amount and the like to the audio signal based on the parameters (delay amount parameters and the like) calculated by the parameter calculation unit 32, and outputs the result to the corresponding D/A converter 34i (i = 1 to N).

Ｄ／Ａ変換器３４ｉ（ｉ＝１〜Ｎ）は、指向性制御部３３から出力された各出力系統ごとのデジタル音声信号をアナログ信号に変換して出力する。アンプ３５ｉ（ｉ＝１〜Ｎ）は、Ｄ／Ａ変換器３４ｉ（ｉ＝１〜Ｎ）から出力されたアナログの音声信号をそれぞれ増幅して、スピーカＳＰｉ（ｉ＝１〜Ｎ）に出力する。 The D / A converter 34 i (i = 1 to N) converts the digital audio signal for each output system output from the directivity control unit 33 into an analog signal and outputs the analog signal. The amplifier 35i (i = 1 to N) amplifies the analog audio signal output from the D / A converter 34i (i = 1 to N), and outputs the amplified signal to the speaker SPi (i = 1 to N). .

The receiving unit 3 described above reproduces, in the local device, the positional relationship of the sound sources at the counterpart device: it forms the audio signal received from the counterpart device into a beam based on the position information, outputs it from the speaker array SPA installed on the bottom surface of the device body, and thereby reproduces directivity as if the sound were output from a virtual sound source position.

＜Second Embodiment＞
Next, a remote conference device according to the second embodiment is described with reference to FIG. 7. This embodiment is an application of the first embodiment shown in FIG. 4; identical parts are given the same reference numerals and the earlier description applies to them. FIG. 3 is also referred to as a supplementary aid in describing the sound collection beams.

In the first embodiment, the true sound source is assumed to lie in one of the two areas of the sound collection area pair with the largest difference signal, and the second estimation unit 252 estimates which one. This embodiment additionally provides detailed position search beam (narrow beam) generation functions 2313 and 2323, which search the sound collection area estimated by the second estimation unit 252 to contain the true sound source in finer detail so that the sound source position can be detected accurately.

第２推定部２５２が、図３に図示するように、真の音源９９９が収音エリア４１４Ｒに存在すると推定すると、第２推定部２５２は、この推定結果を第１ビーム生成部２３１に通知する。このように、第２推定部２５２では、マイクアレイＭＲ、ＭＬのどちら側に真の音源があるのか推定しているので、推定結果の通知２５２３、２５２４は、いずれか一方にしか入力されない。もし、左側エリアに真の音源が存在すると推定した場合には、第２推定部２５２は、第２ビーム生成部２３２にその推定結果を通知する。 When the second estimation unit 252 estimates that the true sound source 999 is present in the sound collection area 414R as illustrated in FIG. 3, the second estimation unit 252 notifies the first beam generation unit 231 of the estimation result. . As described above, since the second estimation unit 252 estimates which side of the microphone array MR or ML has the true sound source, the notifications 2523 and 2524 of the estimation results are input to only one of them. If it is estimated that a true sound source is present in the left area, the second estimation unit 252 notifies the second beam generation unit 232 of the estimation result.

第１ビーム生成部２３１は、この通知に基づき、詳細位置探索用ビーム生成機能２３１３を動作させて、図３の狭収音エリア４３１〜４３４を焦点とする狭ビームを生成して、さらに詳細に音源９９９の位置を探索する。 Based on this notification, the first beam generation unit 231 operates the detailed position search beam generation function 2313 to generate a narrow beam focusing on the narrow sound collection areas 431 to 434 in FIG. The position of the sound source 999 is searched.

The device of the second embodiment also includes a third estimation unit 253 and a fourth estimation unit 254, which select, from the sound collection beams of the detailed position search beam generation functions 2313 and 2323, the two with the highest signal intensity. Of the estimation units 253 and 254, however, only the one on the side estimated by the second estimation unit 252 operates.

図３の例では、狭収音エリア４３１〜４３４へ指向させた収音ビームから音声信号を収音しており、真の音源９９９は、収音エリア４３４と収音エリア４３３に跨がった位置に存在している。この場合、第３推定部２５３は、信号強度の大きい順に収音エリア４３４、４３３から収音した音声信号を選択する。第３推定部２５３は、選択した２つの音声信号の信号強度に応じて、この選択した収音エリアの焦点位置を比例配分して話者の位置を推定・出力するとともに、選択した２つの音声信号を重みづけ合成して音声信号として出力する。 In the example of FIG. 3, the sound signal is collected from the sound collection beam directed to the narrow sound collection areas 431 to 434, and the true sound source 999 straddles the sound collection area 434 and the sound collection area 433. Exists in position. In this case, the third estimation unit 253 selects the sound signals collected from the sound collection areas 434 and 433 in descending order of signal strength. The third estimation unit 253 estimates and outputs the position of the speaker by proportionally allocating the focal position of the selected sound collection area according to the signal strengths of the two selected sound signals, and the two selected sound signals. The signals are weighted and synthesized and output as an audio signal.
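The proportional allocation of the two focal positions can be sketched as a strength-weighted average. The following Python sketch assumes two-dimensional focal coordinates and linear weighting by signal strength; the function name and the coordinates are hypothetical.

```python
def interpolate_position(focus_a, focus_b, strength_a, strength_b):
    """Estimate the speaker position between two focal points.

    The estimate lies closer to the focal point whose beam is stronger.
    """
    total = strength_a + strength_b
    if total == 0:
        raise ValueError("both signal strengths are zero")
    weight_a = strength_a / total
    return tuple(weight_a * a + (1.0 - weight_a) * b
                 for a, b in zip(focus_a, focus_b))


# Hypothetical focal points of two adjacent narrow areas; the first beam is
# three times as strong, so the estimate lies a quarter of the way towards
# the second focal point.
position = interpolate_position((1.0, 2.0), (2.0, 2.0), 3.0, 1.0)
```

The same weights could also be used for the weighted mix of the two selected audio signals.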

以上は、右側エリアの第１ビーム生成部２３１（詳細位置探索用ビーム生成機能２３１３）および第３推定部２５３について説明したが、左側エリアの第２ビーム形成部２３２（詳細位置探索用ビーム生成機能２３２３）および第４推定部２５４についても同様の構成であるとともに同様の処理動作を実行する。 The first beam generation unit 231 (detail position search beam generation function 2313) and the third estimation unit 253 in the right area have been described above, but the second beam formation unit 232 (detail position search beam generation function in the left area) has been described. 2323) and the fourth estimation unit 254 have the same configuration and execute similar processing operations.

なお、以上で示した第２の実施形態の装置の詳細位置探索の機能は、話者が頻繁に移動する場合には、処理が追いつかない場合もある。そこで、第２推定部２５２から出力される話者の位置が一定時間留まっている場合にのみ、この機能を働かせることも考えられる。この場合、第２推定部２５２から出力される話者の位置が一定時間以内に移動する場合には、図７に示した構成を備えていても、図４に示した第１実施形態と同様の動作を行うようにすればよい。 Note that the detailed position search function of the apparatus of the second embodiment described above may not be able to catch up when the speaker moves frequently. Therefore, it is also conceivable to use this function only when the position of the speaker output from the second estimation unit 252 remains for a certain period of time. In this case, when the position of the speaker output from the second estimation unit 252 moves within a certain time, even if the configuration shown in FIG. 7 is provided, it is the same as in the first embodiment shown in FIG. It is sufficient to perform the operation.
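The idea of enabling the detailed search only while the coarse estimate is stable can be sketched as a simple check over recent estimates. In this Python sketch the history list, the `hold_frames` parameter, and the tuple representation of a coarse position are all assumptions made for illustration.

```python
def use_detailed_search(position_history, hold_frames):
    """Return True if the last `hold_frames` coarse estimates are identical.

    position_history: coarse estimates from the second estimation unit,
                      oldest first, e.g. (area_index, side) tuples.
    hold_frames: number of consecutive identical estimates required (>= 1).
    """
    if hold_frames < 1:
        raise ValueError("hold_frames must be at least 1")
    if len(position_history) < hold_frames:
        return False
    recent = position_history[-hold_frames:]
    return all(p == recent[0] for p in recent)


# The speaker moved from area index 2 to area index 3 and then stayed there.
history = [(2, "R"), (3, "R"), (3, "R"), (3, "R")]
stable = use_detailed_search(history, hold_frames=3)
moving = use_detailed_search(history, hold_frames=4)
```

When the check fails, the device would simply fall back to the coarse operation of the first embodiment.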

なお、この絞込み推定を行う推定部２５３、２５４は、本発明の「第３の音源位置推定手段」に相当する。 Note that the estimation units 253 and 254 that perform the narrowing estimation correspond to the “third sound source position estimation unit” of the present invention.

＜Third Embodiment＞
Next, the transmission unit of a remote conference device according to the third embodiment of the invention is described with reference to FIG. 8, which is a block diagram of this transmission unit. The transmission unit 2 of this embodiment differs in the following respects: the input of the difference value calculation circuit 22 is the output of the A/D converters 211 and 212; a third beam generation unit 237 is provided to generate sound collection beams from the output signals of the difference value calculation circuit 22; a fourth beam generation unit 238 and a fifth beam generation unit 239 are provided; and the selectors 271 and 272 are absent. Other parts are given the same reference numerals and the above description applies to them. Only the differences and key points of this embodiment are described below.

As shown in FIG. 8, the outputs of the A/D converters 211 and 212 are input directly to the difference value calculation circuit 22. For this reason, in the device of this embodiment the microphones MRi and MLi are equal in number (N each) and are placed at mutually symmetric positions. The difference value calculation circuit 22 calculates "(audio signal of microphone MRi) − (audio signal of microphone MLi)" for each i = 1 to N. As in the device of the embodiment shown in FIG. 4, this cancels the sound that wraps around from the speaker array SPA into the microphone arrays MR and ML.

ここで、この第３の実施形態の装置では、それぞれのマイクＭＲｉ、ＭＬｉは、スピーカアレイＳＰＡの長手方向の中心線に関して略左右対称である必要がある。差分値計算回路２２で各マイク同士で回り込み音声をキャンセルするためである。なお、この差分値計算回路２２は、遠隔会議装置１のマイクアレイＭＲ、ＭＬの起動中は、常時計算を行う。 Here, in the apparatus of the third embodiment, each of the microphones MRi and MLi needs to be substantially symmetrical with respect to the center line in the longitudinal direction of the speaker array SPA. This is because the difference value calculation circuit 22 cancels the wraparound sound between the microphones. The difference value calculation circuit 22 always performs calculation while the microphone arrays MR and ML of the remote conference apparatus 1 are activated.
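The per-microphone subtraction can be illustrated with a small Python sketch. It assumes an idealized case in which the wrap-around sound reaches each symmetric microphone pair with identical waveforms and the talker reaches only the right-side microphone; the function name and sample values are hypothetical.

```python
def mic_difference(right_mics, left_mics):
    """Compute (MRi signal) - (MLi signal) for each symmetric microphone pair."""
    return [[r - l for r, l in zip(right, left)]
            for right, left in zip(right_mics, left_mics)]


echo = [0.25, -0.5, 0.25]  # wrap-around sound, equal at MRi and MLi
talker = [1.0, 0.0, -1.0]  # speech reaching only the right-side microphone
right_mics = [[e + t for e, t in zip(echo, talker)]]
left_mics = [echo]
diff = mic_difference(right_mics, left_mics)
```

In this idealized case the difference signal contains only the talker's waveform. In practice the cancellation depends on how closely the microphone placement and the wrap-around paths are symmetric.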

第３ビーム生成部２３７は、差分値計算回路２２の出力信号の束を基にして、第１ビーム生成部２３１、第２ビーム生成部２３２と同様に、仮想的な４つの収音エリアを焦点とする収音ビームを出力する。この仮想的な収音エリアは、スピーカアレイＳＰＡの中心線１０１に関して左右対称に設定した収音エリア対（４１１Ｒと４１１Ｌ、４１２Ｒと４１２Ｌ、４１３Ｒと４１３Ｌ、４１４Ｒと４１４Ｌ：図３参照）に対応し、第３ビーム生成部２３７が出力する音声信号は、第１の実施形態における差分信号Ｄ（４１１）、Ｄ（４１２）、Ｄ（４１３）、Ｄ（４１４）と同様のものである。この差分信号を、ＢＰＦ２４１を通して第１推定部２５１に出力すれば、図４で示した装置の第１推定部２５１と同様に音源位置の推定を行うことができる。この推定結果２５１１、２５１２は、第４ビーム生成部２３８、第５ビーム生成部２３９に出力される。 The third beam generation unit 237 focuses four virtual sound collection areas based on the bundle of output signals of the difference value calculation circuit 22 in the same manner as the first beam generation unit 231 and the second beam generation unit 232. The sound collection beam is output. This virtual sound collection area corresponds to a pair of sound collection areas (411R and 411L, 412R and 412L, 413R and 413L, 414R and 414L: see FIG. 3) set symmetrically with respect to the center line 101 of the speaker array SPA. The audio signal output from the third beam generation unit 237 is the same as the differential signals D (411), D (412), D (413), and D (414) in the first embodiment. If this difference signal is output to the first estimation unit 251 through the BPF 241, the sound source position can be estimated in the same manner as the first estimation unit 251 of the apparatus shown in FIG. The estimation results 2511 and 2512 are output to the fourth beam generation unit 238 and the fifth beam generation unit 239.

図８の第４ビーム生成部２３８、第５ビーム生成部２３９について説明する。第４ビーム生成部２３８、第５ビーム生成部２３９には、Ａ／Ｄ変換器２１１、２１２が出力するデジタル音声信号が直接入力されている。このデジタル音声信号に基づいて、第１推定部２５１から入力された推定結果２５１１，２５１２が指示する収音エリアを焦点とする収音ビームを生成し、その収音エリアの音声信号を取り出す。すなわち、この第４ビーム生成部２３８、第５ビーム生成部２３９が生成する収音ビームが、第１実施形態において、セレクタ２７１，２７２が選択した収音ビームに対応する。 The fourth beam generation unit 238 and the fifth beam generation unit 239 in FIG. 8 will be described. Digital audio signals output from the A / D converters 211 and 212 are directly input to the fourth beam generation unit 238 and the fifth beam generation unit 239. Based on this digital audio signal, a sound collection beam with the sound collection area indicated by the estimation results 2511 and 2512 input from the first estimation unit 251 as a focus is generated, and the sound signal in the sound collection area is extracted. That is, the sound collection beams generated by the fourth beam generation unit 238 and the fifth beam generation unit 239 correspond to the sound collection beams selected by the selectors 271 and 272 in the first embodiment.

このように、この第４ビーム生成部２３８、第５ビーム生成部２３９は、第１推定部２５１から指示された収音ビームで収音した１系統の音声出力のみを出力する。この第４ビーム生成部２３８、第５ビーム生成部２３９が、各収音ビームの焦点である収音エリアから収音した音声信号は、第２推定部２５２に入力される。 As described above, the fourth beam generation unit 238 and the fifth beam generation unit 239 output only one system of sound output collected by the sound collection beam instructed by the first estimation unit 251. The sound signals collected by the fourth beam generation unit 238 and the fifth beam generation unit 239 from the sound collection area that is the focal point of each sound collection beam are input to the second estimation unit 252.

The subsequent operation is the same as in the first embodiment. The second estimation unit 252 compares the two audio signals and determines that the sound source exists in the sound collection area with the higher level. The second estimation unit 252 outputs information indicating the direction and distance of the sound collection area containing the true sound source to the multiplexing unit 28 as position information 2522, and instructs the signal selection unit 26 to selectively input the audio signal of the true sound source to the multiplexing unit 28. The multiplexing unit 28 multiplexes the position information 2522 input from the second estimation unit 252 with the audio signal 261 of the true sound source selected by the signal selection unit 26, and transmits the multiplexed signal to the counterpart device.

In the third embodiment shown in FIG. 8 as well, as in the second embodiment, the estimation can be performed in multiple stages, searching for the sound source position coarsely at first and then narrowing it down. In that case, when one search is complete, the second estimation unit 252 outputs instruction inputs 2523 and 2524, which direct a search over a narrower range, to the fourth and fifth beam generation units 238 and 239. The instruction is output only to the beam generation unit on the side where the sound source exists. On receiving the instruction, that beam generation unit reads from the ROM the delay pattern corresponding to the narrower range and rewrites the delay pattern data 40j with it.

なお、第１，第３の実施形態では、第１推定部２５１が、左右の収音エリア４１１Ｒ〜４１４Ｒ、４１１Ｌ〜４１４Ｌからそれぞれ１つずつの収音エリア（４１ｊＲ、４１ｊＬ）を選択し、さらに、第２推定部２５２が、４１ｊＲ、４１ｊＬのどちらに真の音源が存在するかを推定しているが、必ずしも第２推定部を設ける必要はない。 In the first and third embodiments, the first estimation unit 251 selects one sound collection area (41jR, 41jL) from each of the left and right sound collection areas 411R to 414R, 411L to 414L, and further The second estimation unit 252 estimates in which of 41jR and 41jL the true sound source exists, but the second estimation unit is not necessarily provided.

たとえば、遠隔会議装置を右側または左側片方のみで使用している場合など、真の音源の反対側に雑音源がない場合には、収音エリア４１ｊＲ、４１ｊＬの両方の音声の合成信号（または差分信号）をそのまま収音信号として相手装置に出力しても問題ないからである。 For example, when there is no noise source on the opposite side of the true sound source, such as when the teleconferencing device is used only on the right side or the left side, a synthesized signal (or difference) of both the sound collection areas 41jR and 41jL This is because there is no problem even if the signal is directly output to the counterpart device as a sound collection signal.

The numerical values and the like given in these embodiments do not limit the present invention. In addition, where the figures above show signals exchanged between functional blocks, a configuration in which part of one block's function is handled by the other block may also provide the same effects as the embodiments described above.

FIG. 1 is a diagram showing the external appearance and manner of use of the teleconference device according to the first embodiment of the invention.
FIG. 2 is a diagram explaining the sound beam and the sound collection beam of the teleconference device.
FIG. 3 is a diagram explaining the sound collection areas set for the microphone arrays of the teleconference device.
FIG. 4 is a block diagram of the transmission unit of the teleconference device.
FIG. 5 is a configuration diagram of the first beam generation unit of the teleconference device.
FIG. 6 is a block diagram of the reception unit of the teleconference device.
FIG. 7 is a block diagram of the transmission unit of the teleconference device according to the second embodiment of the invention.
FIG. 8 is a block diagram of the transmission unit of the teleconference device according to the third embodiment of the invention.

Explanation of symbols

1 … teleconference device, 2 … transmission unit, 22 … difference value calculation circuit, 231 … first beam generation unit, 232 … second beam generation unit, 237 … third beam generation unit, 238 … fourth beam generation unit, 239 … fifth beam generation unit, 251 … first estimation unit, 252 … second estimation unit, 253 … third estimation unit, 254 … fourth estimation unit, 26 … signal selection unit, 271, 272 … selectors, 28 … multiplexing unit
3 … reception unit, 31 … data reception unit, 32 … parameter calculation unit, 33 … directivity control unit
45j (j = 1 to K) … delay processing unit, 40j (j = 1 to K) … delay pattern memory, 461i (i = 1 to N) … delay, 47j (j = 1 to K) … microphone input synthesis unit
SPi (i = 1 to M) … speaker, SPA … speaker array
ML, MR … microphone arrays, MLi (i = 1 to N), MRi (i = 1 to N) … microphones
100 … desk, 101 … center line
411R to 414R, 411L to 414L … sound collection areas, 999 … sound source (talker)

Claims

A remote conferencing apparatus comprising:
a speaker array including a plurality of speakers for outputting sound;
first and second microphone arrays provided on both sides of the speaker array in its longitudinal direction to pick up sound;
sound collection area setting means for setting a plurality of first sound collection areas and a plurality of second sound collection areas at positions symmetric with respect to the longitudinal center line of the speaker array, by applying delay processing to and synthesizing the audio signals picked up by the microphones of the first microphone array and of the second microphone array, respectively;
difference signal calculation means for calculating, from the audio signals picked up from the plurality of first sound collection areas and the plurality of second sound collection areas, a difference signal between the audio signals picked up from each pair of sound collection areas at symmetric positions;
first sound source position estimation means for selecting a sound collection area pair whose difference signal has a high signal intensity; and
second sound source position estimation means for selecting, from the sound collection area pair selected by the first sound source position estimation means, the sound collection area whose picked-up audio signal has the higher intensity, and estimating that the sound source position is in that sound collection area,
wherein the sound collection area setting means further has a function of setting a plurality of narrow sound collection areas within the sound collection area selected by the second sound source position estimation means and generating narrow sound collection beams each focused on one of the plurality of narrow sound collection areas, and
wherein the apparatus further comprises third sound source position estimation means for estimating that the sound source position is in the narrow sound collection area, among the plurality of narrow sound collection areas, whose picked-up audio signal has a large intensity.