JP2007151103A

JP2007151103A - Teleconference device


Publication number: JP2007151103A
Application number: JP2006294683A
Authority: JP
Inventors: Makoto Tanaka (田中 良); Takuya Tamaru (田丸 卓也); Katsuichi Osakabe (刑部 勝一)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-11-02
Filing date: 2006-10-30
Publication date: 2007-06-14
Anticipated expiration: 2026-10-30
Also published as: JP4867579B2

Abstract

PROBLEM TO BE SOLVED: In a teleconference device that reproduces received speech together with its sound field, to reproduce the sound field of the sound source accurately through a speaker array while limiting the consumption of transmission-line resources. SOLUTION: A teleconference device 1 functions as both a transmitter unit 1A and a receiver unit 1B. The transmitter unit 1A performs transmission 20 by sending an audio signal 27B, derived from the sound collection signals of a microphone array Mi (i = 1 to N), together with position information 27A. The position information is obtained by forming a plurality of sound collection beams directed in particular directions and selecting the beam with the maximum volume. In the receiver unit 1B, a parameter calculation unit 32 sets a virtual sound source from the data in the received signal 30 and sets delay parameters accordingly. A virtual sound source generation signal processing unit 33 forms a sound emission beam according to these parameters and outputs it to the speakers SPi. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a remote conference apparatus that transmits speech together with position information and reproduces the received speech and its sound field.

Remote conference apparatuses that receive speech from the transmitting side and reproduce its sound field have previously been proposed (see Patent Documents 1 and 2).

Patent Document 1 describes the following apparatus. A microphone picks up the talker's voice, talker position information is generated from the talker information obtained from the microphone, and this position information is multiplexed with the voice information and transmitted. On the receiving side, the loudspeaker that emits the sound is switched according to the received talker position information, so that the talker's voice and position are reproduced.

Patent Document 2 discloses, among other things, a method of creating stereophonic audio information that reproduces the sound field at the transmission source by transmitting the audio information picked up by a plurality of microphone arrays and outputting it to the same number of speaker arrays.

Patent Document 1: JP-A-9-261351
Patent Document 2: JP-A-2-114799

However, in Patent Document 1 the reproduced sound is localized solely by adjusting the volume balance among several loudspeakers, which makes accurate sound image localization difficult.

In Patent Document 2, the sound field information picked up by the microphone arrays is output as it is. Natural sound can therefore be reproduced, but a signal with as many channels as there are array speakers must be transmitted to the remote site, which consumes a large amount of transmission-line resources.

Accordingly, an object of the present invention is to provide an apparatus that reproduces the sound field of the remote (transmitting) side by outputting the audio signal to a speaker array based on received sound source position information, so that the sound field is reproduced accurately while the consumption of transmission-line resources is kept low.

The present invention solves the above problems by the following means.

(1) The present invention comprises:
a microphone array in which a plurality of microphones that pick up sound and output it as sound collection signals are arranged in an array;
a sound collection beam forming unit that forms sound collection beams, each directed to one of a plurality of sound collection areas, by delay-adding the sound collection signals output from the plurality of microphones;
position information detection means for detecting, as position information, the sound collection area corresponding to the sound collection beam with the maximum volume among the plurality of sound collection beams;
transmitting means for transmitting the sound collection signal output of the microphones and the position information;
an array speaker in which a plurality of speakers are arranged in an array;
a receiving unit that receives an audio signal and position information from outside; and
a signal processing unit that processes the received audio signal so as to form a sound emission beam whose virtual sound source position is a position determined from the received position information, and supplies the processed signal to the plurality of speakers.

The apparatus of the present invention provides the functions of one unit in a system in which two or more remote conference apparatuses are connected to one another via a communication line or network; that is, it includes both a receiving-side unit and a transmitting-side unit.
In the transmitting-side unit, the sound source position information detection means detects the position of the sound source, and the microphones pick up the sound of that source and convert it into an electrical signal.

The sound source position information detection means forms, from the sound collection signals of the plurality of microphones, sound collection beams directed to a plurality of positions, and detects the position of the sound collection area of the loudest of these beams as the talker's position information. The transmission unit transmits the audio signal obtained by the microphone array together with the position information.

In the receiving-side unit, the receiving unit receives the audio signal of the sound source and the position information of that source from the counterpart apparatus. The signal processing unit processes the received audio signal so that a sound emission beam is formed whose virtual sound source lies behind the speaker unit, at a position determined by the received position information, and supplies the processed signal to the speaker unit. For example, the coordinates of the virtual sound source can be expressed in a coordinate plane whose origin is the center of the apparatus and whose Y axis points toward the rear.
The above describes one of the two remote conference apparatuses; sound collection and emission of audio signals and position information in the opposite direction work in the same way.

With these configurations, conference participants can converse in a realistic spatial arrangement, as if they were talking with a virtual sound source located in the counterpart's conference room on the far side of the remote conference apparatus. Moreover, because the position information of the sound source is received together with the audio signal of that source, rather than one channel per array speaker, the drawback of Patent Document 2, namely heavy consumption of transmission-line resources, is eliminated. Nor is it necessary, as in Patent Document 2, for the transmitting and receiving sides to have the same number of speakers.

The sound collection signal output transmitted by the transmitting means may be any signal derived from the microphones: one of the microphone signals, a combination of several of them (for example, a simple sum), a sound collection beam obtained by delay-adding the signals of multiple microphones, or a combination of several sound collection beams.

(2) The present invention is characterized in that
the position information detection means, after detecting the sound collection area of the sound collection beam with the maximum volume, forms acquisition sound collection beams each directed to one of a plurality of subdivided sound collection areas obtained by further dividing that area, and detects the position information based on the subdivided sound collection areas corresponding to a plurality of acquisition sound collection beams selected in descending order of volume.

In this configuration, the position information detection means forms acquisition sound collection beams directed to the subdivided sound collection areas and selects those with the highest volume, so a finer area can be located as the position information. Moreover, rather than forming and comparing a sound collection beam for every fine area from the start, the selection is narrowed down in two stages. This reduces the number of sound collection beams that must be formed and thus simplifies the hardware used for the position information detection means.

(3) The present invention is characterized in that
the position information is detected by dividing the interval between the subdivided sound collection areas corresponding to the plurality of acquisition sound collection beams selected in descending order of volume in proportion to the strengths of the selected beams, and
the signal processing unit combines the outputs of the selected acquisition sound collection beams using the same proportional distribution.

In this configuration, the position information detection means determines the position between the selected subdivided sound collection areas by proportional distribution according to the strengths of the selected acquisition beams. Positions between the subdivided areas can therefore be interpolated, yielding more precise position information. Likewise, because the signal processing unit combines the outputs of the selected acquisition beams according to the same proportional distribution, the region between the sound collection areas of the individual beams is interpolated. Continuously interpolating with balanced weights in this way also lets the sound collection beam change smoothly when the sound source moves.

According to the present invention, conference participants can converse in a realistic spatial arrangement, as if talking with a virtual sound source in the counterpart's conference room on the far side of the remote conference apparatus.

A remote conference apparatus used for voice communication according to the present embodiment is described with reference to FIG. 1, which shows an external view (A) and a schematic functional diagram (B) of the apparatus.

First, the configuration of the remote conference apparatus 1 is described with reference to FIG. 1(A). As shown in FIG. 1(A), the remote conference apparatus 1 externally has a speaker array composed of a plurality of speakers SPi (i = 1 to N, where N is an integer) and a microphone array composed of a plurality of microphones Mi (i = 1 to N), and is placed, for example, on a desk 101. Two such apparatuses installed at distant sites are connected by a communication line and exchange audio with each other.

Each speaker SPi (i = 1 to N) of the speaker array is connected to an audio processing channel that handles its signal independently, so every speaker can output sound independently. Likewise, each microphone Mi (i = 1 to N) of the microphone array can independently convert its output into digital data. The coordinate system X, Y, Z is described later.

Next, the functions of the remote conference apparatus 1 are outlined with reference to FIG. 1(B). As shown on the left of FIG. 1(B), when audio is sent, the talker 102A on the transmitting side speaks toward the remote conference apparatus 1. In the transmitting-side unit 1A, each microphone Mi (i = 1 to N) of the microphone array picks up the voice; the unit detects the position of the talker 102A from the collected sound and transmits the collected audio signal together with position information indicating that position. The position information is computed from the relative strengths of several sound collection beam patterns, each formed by applying a delay to every collected electrical signal and summing the results (delay-and-sum). The position information can be expressed, for example, in the XYZ coordinate system shown in FIGS. 1(A) and 1(B): relative to the sound receiving surface of the remote conference apparatus 1, the vertical direction is the Z axis, the left-right direction is the X axis, and the depth direction perpendicular to the receiving surface is the Y axis. Instead of the X and Z coordinates, the position can also be given in polar coordinates as a distance R from the center of the sound receiving surface and an angle θ.

As shown on the right of FIG. 1(B), on the receiving side the receiving-side unit 1B calculates the position of the virtual sound source VS from the transmitted audio signal and position information, computes from it the delay to apply to the signal output to each speaker SPi, and outputs the speech uttered at the transmitting-side unit 1A, with these delays applied, to the speakers SPi. The sound field of the speech at the transmitting-side unit 1A is thereby reproduced accurately and conveyed to the listener 102B on the receiving side. Because it is natural to place the virtual sound source VS facing the listener 102B, who speaks toward the receiving-side unit 1B, the X coordinate (left-right as seen by the listener 102B) and the Y coordinate (depth) used at the receiving-side unit 1B are the sign-reversed values of those actually measured at the transmitting-side unit 1A. The same applies when polar coordinates are used.
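The sign reversal described above can be sketched in Python as follows. The `Position` type and the function name are hypothetical, units are assumed to be metres, and only the Cartesian case is covered: X and Y change sign and Z is kept.

```python
from typing import NamedTuple


class Position(NamedTuple):
    x: float  # left-right (m)
    y: float  # depth, perpendicular to the array face (m)
    z: float  # up-down (m)


def to_virtual_source(measured: Position) -> Position:
    """Map the talker position measured by the sending unit to the virtual
    source position used by the receiving unit.

    X and Y change sign so the virtual talker faces the listener from the far
    side of the speaker array; Z (height) is unchanged.
    """
    return Position(-measured.x, -measured.y, measured.z)


# A talker 0.4 m to the right of and 1.5 m in front of the sending unit
vs = to_virtual_source(Position(0.4, 1.5, 0.2))
print(vs)  # Position(x=-0.4, y=-1.5, z=0.2)
```

Applying the mapping twice returns the original position, which matches the symmetric, two-way use of the apparatus.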

In the above description of FIG. 1, separate reference signs are used for the transmitting-side unit 1A and the receiving-side unit 1B to distinguish them conceptually. In practice, the remote conference apparatus 1 of this embodiment supports two-way communication, and the listener 102B can also send speech to the talker 102A through the microphone array of the receiving-side unit 1B. That is, each remote conference apparatus integrally includes both the transmitting-side unit 1A and the receiving-side unit 1B of FIG. 1(B).
In this embodiment the human voice is used as an example of the sound carried by the remote conference apparatus 1, but the apparatus 1 is not limited to the human voice; it may carry music, for example. The same applies hereinafter.

Here, the method by which the transmitting-side unit 1A searches for the talker's position is briefly described with reference to FIG. 5(A) (FIG. 5 is described in detail later). As shown in FIG. 5(A), a detection beam forming unit inside the remote conference apparatus 1 (corresponding to the detection beam forming unit 21 of FIG. 2, described later) uses a delay pattern for each area to form sound collection beams directed to a plurality of sound collection areas 111 to 114, which are assumed as possible positions of the talker 102A.

Each sound collection beam is formed by compensating the delay differences caused by the different distances from the target position to each microphone Mi (i = 1 to N), so that they match for that position, and then summing the digital audio signals. Sound arriving from that position is thus reinforced, while sound arriving from other positions tends to cancel, which gives the beam its directivity toward that position.
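A minimal delay-and-sum sketch of this beam forming, assuming point microphones at known coordinates, a speed of sound of 343 m/s, and delays rounded to whole samples (a practical implementation would usually apply fractional delays). The function names are illustrative, not taken from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, focus, fs):
    """Whole-sample delays that time-align sound arriving from `focus`.

    The microphone farthest from the focus hears the sound last, so it gets
    zero delay; nearer microphones are delayed to line up with it.
    """
    dists = [math.dist(m, focus) for m in mic_positions]
    farthest = max(dists)
    return [round((farthest - d) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(signals, delays):
    """Delay each channel by its sample count, then average the channels."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for t in range(d, n):
            out[t] += sig[t - d]
    return [v / len(signals) for v in out]


# Impulses reaching three microphones at samples 2, 3 and 4 line up at sample 4.
chans = [[1.0 if t == k else 0.0 for t in range(8)] for k in (2, 3, 4)]
print(delay_and_sum(chans, [2, 1, 0]))  # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Sound from the focus adds coherently to full amplitude, whereas a source elsewhere would arrive with mismatched delays and be partly averaged away.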

If one of the sound collection areas 111 to 114 coincides with the position of the talker 102A, the beam directed to that area has the highest volume. The detection beam forming unit inside the remote conference apparatus 1 therefore forms four sound collection beams directed to the areas 111 to 114 simultaneously and searches for the direction with the highest volume. An acquisition beam forming unit inside the apparatus (corresponding to the acquisition beam forming unit 22 of FIG. 2, described later, which has the same circuit configuration as the detection beam forming unit) then sets subdivided areas 131 to 134 that further divide the loudest direction found in this way (area 114 in FIG. 5), and forms a sound collection beam directed to each of them. The remote conference apparatus 1 selects the loudest beams among those directed to the subdivided areas 131 to 134. Searching in two stages in this way locates the talker 102A accurately and quickly.
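The two-stage search can be sketched as follows. Areas are modelled here as azimuth intervals in degrees over an assumed field of -60 to 60 degrees, and the beam volume is supplied by a caller-provided `level` function; these representations and names are hypothetical stand-ins for the beam forming hardware.

```python
from typing import Callable, List, Tuple

Area = Tuple[float, float]  # (start, end) azimuth in degrees


def split(area: Area, parts: int) -> List[Area]:
    """Divide an azimuth interval into `parts` equal sub-intervals."""
    lo, hi = area
    step = (hi - lo) / parts
    return [(lo + i * step, lo + (i + 1) * step) for i in range(parts)]


def two_stage_search(level: Callable[[Area], float],
                     field: Area = (-60.0, 60.0),
                     parts: int = 4) -> List[Tuple[Area, float]]:
    """Coarse pass over `parts` areas, then a fine pass inside the loudest one.

    Forms 2 * parts beams in total (8 for parts=4) instead of parts**2 (16)
    for a single fine pass; returns the two loudest sub-areas with their levels.
    """
    coarse = split(field, parts)
    best = max(coarse, key=level)
    fine = [(area, level(area)) for area in split(best, parts)]
    fine.sort(key=lambda item: item[1], reverse=True)
    return fine[:2]


# Stand-in for beam volume: a talker at 40 degrees, louder for nearer area centres
def level(area: Area) -> float:
    centre = (area[0] + area[1]) / 2
    return 1.0 / (1.0 + abs(centre - 40.0))


result = two_stage_search(level)
print([area for area, _ in result])  # [(37.5, 45.0), (30.0, 37.5)]
```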

The beam forming units either operate continuously or operate on instruction from the beam position calculation unit 24, which performs position detection at predetermined intervals, for example every 0.5 seconds.

Next, the internal configuration of the remote conference apparatus 1 is described with reference to FIG. 2, a block diagram of that configuration.
First, the configuration of the transmitting-side unit 1A is described with reference to FIG. 2(A). To process the collected sound digitally, the transmitting-side unit 1A includes the microphones Mi and an A/D converter 29. To estimate where the talker is speaking and output the position information as a control signal 241, it includes a detection beam forming unit 21, which forms sound collection beams directed to a plurality of positions in front of the microphones where a talker is assumed to be, a BPF 23 (band-pass filter), and a beam position calculation unit 24. To search the vicinity of the position detected by the beam position calculation unit 24 in more detail and form a sound collection beam directed to that position, it includes an acquisition beam forming unit 22, a signal selection unit 25, and a signal addition unit 26. It also includes a multiplexing unit 28 that multiplexes the position information 27A with the sound collection beam. Each component is described below.

The microphones Mi (i = 1 to N) in FIG. 2(A) are the N microphones shown in FIG. 1; they convert the voice of the talker 102A on the transmitting side into audio signals.
The A/D converter 29, which can be implemented with an A/D conversion IC, converts the audio signals from the microphones Mi into digital audio signals and sends them to the detection beam forming unit 21 and the acquisition beam forming unit 22.

The detection beam forming unit 21 simultaneously forms a plurality of sound collection beams directed to areas where a talker is assumed to be. To do so it includes, for example, a ring-buffer RAM (or equivalent) for delay processing, which adjusts the delay of the audio signal picked up by each microphone Mi. For this delay processing it also includes a ROM storing a program that controls writing to and reading from the ring-buffer RAM, together with delay-control data, and a computation unit that executes the program. The remote conference apparatus 1 has a DSP that performs its signal processing, and the computation unit of the detection beam forming unit 21 is implemented as one function of that DSP.
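A ring-buffer delay line of the kind mentioned above can be sketched as follows. The class name and interface are hypothetical; a DSP implementation would typically use a power-of-two buffer with a bit mask in place of the modulo used here.

```python
class RingDelay:
    """Fixed-capacity ring buffer that returns the sample written `delay` steps ago."""

    def __init__(self, capacity: int):
        self.buf = [0.0] * capacity
        self.pos = 0  # index of the next write

    def push(self, sample: float) -> None:
        """Write one sample, overwriting the oldest once the buffer is full."""
        self.buf[self.pos] = sample
        self.pos = (self.pos + 1) % len(self.buf)

    def read(self, delay: int) -> float:
        """Read the sample written `delay` pushes ago (0 = most recent)."""
        if not 0 <= delay < len(self.buf):
            raise ValueError("delay exceeds buffer capacity")
        return self.buf[(self.pos - 1 - delay) % len(self.buf)]


r = RingDelay(4)
for s in (1.0, 2.0, 3.0):
    r.push(s)
print(r.read(0), r.read(2))  # 3.0 1.0
```

One such buffer per microphone, each read at its own delay, supplies the time-aligned samples that a delay-and-sum beam adds together.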

As described above, the detection beam forming unit 21 of FIG. 2(A) assumes a plurality of possible talker positions in front of the transmitting-side unit 1A, sets a sound collection area for each, and forms a sound collection beam directed to each of them.

In the following description, to distinguish the sound collection beams formed by the detection beam forming unit 21 from those formed by the acquisition beam forming unit 22, the latter are called acquisition beams.

The detection beam forming unit 21 of FIG. 2(A) forms sound collection beams directed to coarse sound collection areas (corresponding to 111 to 114 in FIG. 5(A)) to search a wide range.
The acquisition beam forming unit 22 has the same configuration as the detection beam forming unit 21. However, based on the result of the beam position calculation unit 24, which analyzes the sound collection beams of the detection beam forming unit 21, the acquisition beam forming unit 22 sets finer sound collection areas (the subdivided areas 131 to 134 within the sound collection area 114 in FIG. 5(A)). It forms acquisition beams directed to the subdivided areas 131 to 134 and outputs them to the signal selection unit 25. Because the detection beam forming unit 21 and the acquisition beam forming unit 22 refine the sound collection beams and acquisition beams step by step while searching for the talker 102A, the hardware can be simpler than with a fine search from the outset, the computation is faster, and the apparatus responds more quickly when the talker 102A moves.
The BPF 23 of FIG. 2(A) convolves the sound collection beams MB1 to MB4 from the detection beam forming unit 21 with a filter that removes frequencies outside the voice band needed to detect the talker 102A, and outputs sound collection beams MB'1 to MB'4. This reduces the amount of computation and speeds up the search performed by the beam position calculation unit 24. Removing these bands in the BPF 23 causes no problem, because the detection beam forming unit 21 sets only coarse sound collection areas 111 to 114.
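The patent does not specify the filter design. As one illustrative possibility, the sketch below builds a windowed-sinc (Hamming) FIR band-pass filter and applies it by direct convolution; the 300 to 3400 Hz pass band, the 101-tap length, the 16 kHz sample rate and the function names are all assumptions.

```python
import math


def bandpass_taps(f_lo, f_hi, fs, num_taps=101):
    """Windowed-sinc band-pass FIR taps (Hamming window); num_taps should be odd."""
    m = num_taps - 1
    a, b = f_lo / fs, f_hi / fs  # normalised cut-offs in cycles/sample
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        if k == 0:
            h = 2 * (b - a)
        else:
            # ideal low-pass at b minus ideal low-pass at a
            h = (math.sin(2 * math.pi * b * k) - math.sin(2 * math.pi * a * k)) / (math.pi * k)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(h * w)
    return taps


def fir_filter(x, taps):
    """Causal direct-form convolution; output has the same length as x."""
    return [sum(taps[j] * x[i - j] for j in range(len(taps)) if i - j >= 0)
            for i in range(len(x))]


fs = 16000
taps = bandpass_taps(300.0, 3400.0, fs)
tone_1k = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(800)]
tone_7k = [math.sin(2 * math.pi * 7000.0 * t / fs) for t in range(800)]
# Peak output after the filter has settled: near 1 in the pass band, near 0 outside
pass_gain = max(abs(v) for v in fir_filter(tone_1k, taps)[200:])
stop_gain = max(abs(v) for v in fir_filter(tone_7k, taps)[200:])
```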

The beam position calculation unit 24 selects the pattern with the highest volume among the sound collection beams MB'1 to MB'4 output by the BPF 23, which identifies the position of the talker 102A, and outputs this position information to the acquisition beam forming unit 22 as the control signal 241. Specifically, an ID code is assigned in advance to each of the patterns for the positions where the talker 102A is assumed to be, and the beam position calculation unit 24 outputs to the acquisition beam forming unit 22 the ID code of the position whose sound collection beam is loudest.

The volume in the beam position calculation unit 24 can be computed by applying an FFT to the digital time-series audio data and summing the squared gains at several specific frequencies. The beam position calculation unit 24 operates at predetermined intervals, for example every 0.5 seconds, so that movement of the talker 102A is reflected in the control signal 241.
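The volume measure and the selection of the loudest beam by ID code can be sketched as follows. For brevity each required DFT bin is evaluated directly rather than through a full FFT; the frame length, sample rate, analysis frequencies and ID strings such as `"A02"` are hypothetical.

```python
import cmath
import math


def band_power(frame, fs, freqs):
    """Sum of squared DFT magnitudes at the given frequencies (Hz).

    Each bin is evaluated directly, standing in for reading the same bins
    out of a full FFT.
    """
    n = len(frame)
    total = 0.0
    for f in freqs:
        k = round(f * n / fs)  # nearest DFT bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        total += abs(coeff) ** 2
    return total


def loudest_beam(beams, fs, freqs):
    """Return the ID code of the band-limited beam with the largest band power."""
    return max(beams, key=lambda beam_id: band_power(beams[beam_id], fs, freqs))


fs = 8000


def tone(amp):
    """A 256-sample, 1 kHz test frame of the given amplitude."""
    return [amp * math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]


beams = {"A01": tone(0.2), "A02": tone(1.0), "A03": tone(0.5)}  # hypothetical ID codes
print(loudest_beam(beams, fs, [500, 1000, 1500]))  # A02
```

Running this on a fresh frame at each detection interval (for example every 0.5 seconds) lets the selected ID track a talker who moves.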

The acquisition beam forming unit 22 in FIG. 2(A) receives the control signal 241 from the beam position calculation unit 24 and forms acquisition beams directed to a plurality of (four in this embodiment) sound collection areas that further subdivide the sound collection area corresponding to the loudest of the sound collection beams synthesized by the detection beam forming unit 21.
The signal selection unit 25 selects the two loudest of the acquisition beams output by the acquisition beam forming unit 22 (corresponding to the subdivided sound collection areas 133 and 134 in FIG. 5), determines the position obtained by distributing proportionally between them according to their volumes as the sound source position 102A, and outputs it to the multiplexing unit 28 as position information 27A.
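The proportional distribution performed by the signal selection unit 25 can be sketched as a volume-weighted point between the centers of the two loudest subdivided areas. This assumes "proportional distribution" means linear interpolation weighted by beam volume; the area-center coordinates and function names are hypothetical.

```python
def two_loudest(volumes):
    """Indices of the two loudest acquisition beams, loudest first."""
    order = sorted(range(len(volumes)), key=lambda j: volumes[j], reverse=True)
    return order[0], order[1]


def interpolate_position(center_a, center_b, vol_a, vol_b):
    """Volume-weighted point between two subdivided area centers.

    The louder beam pulls the estimate toward its own area (assumed
    linear weighting; the source does not give a formula).
    """
    total = vol_a + vol_b
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    wa, wb = vol_a / total, vol_b / total
    return (wa * center_a[0] + wb * center_b[0],
            wa * center_a[1] + wb * center_b[1])
```

For example, with hypothetical centers for areas 133 and 134 at x = 2.0 and x = 3.0 and volumes 3:1, the estimate lands at x = 2.25, a quarter of the way from 133 toward 134.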

The signal addition unit 26 in FIG. 2(A) combines the two acquisition beams output by the signal selection unit 25, weighting them in proportion to their volumes, and sends the combined digital audio signal 27B to the multiplexing unit 28. Because two or more acquisition beams are always combined in this way, the voice moves smoothly even when the speaker moves or a different speaker takes over.
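A short sketch of the volume-proportional mix, assuming the same linear weighting as for the position estimate (an assumption; the source gives no formula). `mix_two_beams` is a hypothetical name.

```python
def mix_two_beams(beam_a, beam_b, vol_a, vol_b):
    """Combine two acquisition beams sample by sample, weighted by volume."""
    total = vol_a + vol_b
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    wa, wb = vol_a / total, vol_b / total  # weights sum to one
    return [wa * a + wb * b for a, b in zip(beam_a, beam_b)]
```

Because the weights sum to one, the mix reduces to a single beam when one beam dominates, and shifts gradually as the two volumes change, which is what keeps the voice moving smoothly.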

The multiplexing unit 28 in FIG. 2(A) multiplexes the position information 27A generated by the signal selection unit 25 with the audio signal 27B generated by the signal addition unit 26 and performs transmission 20 to the receiving-side unit 1B.
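The source does not specify a frame format, so the following is only a hypothetical layout showing why the payload stays small: one audio channel plus a few bytes of position. The header fields (two float32 coordinates and a sample count) and the int16 PCM body are assumptions.

```python
import struct

# Hypothetical frame layout (not specified in the source):
#   little-endian float32 x, float32 y, uint16 sample count, then int16 samples.
HEADER = struct.Struct("<ffH")


def mux_frame(position, samples):
    """Pack position information and one channel of audio into one frame."""
    x, y = position
    header = HEADER.pack(x, y, len(samples))
    body = struct.pack("<%dh" % len(samples), *samples)
    return header + body


def demux_frame(frame):
    """Split a frame back into (position, samples)."""
    x, y, count = HEADER.unpack_from(frame, 0)
    samples = list(struct.unpack_from("<%dh" % count, frame, HEADER.size))
    return (x, y), samples
```

The position adds a fixed 10-byte header per frame regardless of how many microphones the sender has.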

Next, the internal configuration of the receiving-side unit 1B is described with reference to FIG. 2(B). The receiving-side unit 1B includes a receiving unit 31, a parameter calculation unit 32, a virtual sound source generation signal processing unit 33, DACs 34i (i = 1 to N), and AMPs 35i (i = 1 to N), and is connected to external speakers SPi (i = 1 to N). Each component is described below.

The receiving unit 31 in FIG. 2(B) receives, as the received signal 30, the position information 27A and the audio signal 27B from the transmitting-side unit 1A. The audio signal 27B is sent to the virtual sound source generation signal processing unit 33, and the position information 27A is sent to the parameter calculation unit 32.
The parameter calculation unit 32 sets a virtual sound source VS (see FIG. 1(B)) based on the position information 27A and calculates the distance 36i (i = 1 to N) from the position of the virtual sound source VS to each speaker SPi. It then sets the delay parameter for each speaker by dividing the distance 36i by the speed of sound.
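A minimal sketch of the delay computation, assuming 2-D coordinates in metres and the 340 m/s speed of sound used elsewhere in this document. `speaker_delays` is a hypothetical name.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, the value used elsewhere in this document


def speaker_delays(virtual_source, speaker_positions, c=SPEED_OF_SOUND):
    """Delay for each speaker SPi: distance 36i from VS divided by c (seconds)."""
    return [math.dist(virtual_source, sp) / c for sp in speaker_positions]
```

For a virtual source at (0, 6.8) and speakers at (0, 0) and (5.1, 0), the distances are 6.8 m and 8.5 m, giving delays of 0.02 s and 0.025 s. Subtracting the smallest delay from all of them would remove a common latency without changing the relative timing.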

The virtual sound source generation signal processing unit 33, the DACs 34i (i = 1 to N), and the AMPs 35i (i = 1 to N) in FIG. 2(B) are described next.
Based on the parameters set by the parameter calculation unit 32, the virtual sound source generation signal processing unit 33 processes the audio received by the receiving unit 31, that is, the audio signal 27B generated by the multiplexing unit 28 of the transmitting-side unit 1A, separately for each output channel of the speakers SPi (i = 1 to N), and outputs the results to the DACs 34i (i = 1 to N). Although the audio signal 27B input to the receiving unit 31 is a single channel, it is delayed separately for each output channel of the speakers SPi according to the delay parameters from the parameter calculation unit 32. As a result, the speakers SPi can emit a sound emission beam focused on the virtual sound source VS, achieving sound image localization as if the virtual sound source were located in front of the receiving-side unit 1B.
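The per-channel delay processing can be sketched as turning the single received channel into N delayed copies. This assumes delays are rounded to whole samples; a real implementation might use fractional-delay filtering instead. `render_channels` is a hypothetical name.

```python
def render_channels(mono, delays_s, sample_rate):
    """Produce one delayed copy of the mono signal 27B per speaker channel.

    Delays (seconds) are rounded to whole samples, an assumption made
    here for simplicity.
    """
    shifts = [round(d * sample_rate) for d in delays_s]
    length = len(mono) + max(shifts)
    channels = []
    for s in shifts:
        out = [0.0] * length
        for n, x in enumerate(mono):
            out[n + s] = x  # sample n appears s samples later on this channel
        channels.append(out)
    return channels
```

Only one audio stream crosses the network; the N speaker feeds are generated locally from it.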

The DACs 34i convert the digital audio data produced by the virtual sound source generation signal processing unit 33 into analog signals and output them.
The AMPs 35i (i = 1 to N) amplify the outputs of the DACs 34i (i = 1 to N), respectively, and output them to the speakers SPi (i = 1 to N).

A supplementary explanation of the narrowing search by the signal selection unit 25 of the transmitting-side unit 1A follows. The signal selection unit 25 may perform this search only when, for example, the acquisition beam forming unit 22 determines that the speaker has remained stationary for a specific time, because limited processing capacity may leave no time for the narrowing search. In this case, the audio signal 27B is generated as in (A) and (B) below.
(A) When the sound source stays within one of the coarsely set sound collection areas 111 to 114 for a specific time, the acquisition beam forming unit 22 detects that the control signal 241 has remained constant for a predetermined period and forms a plurality of acquisition beams directed to the subdivided sound collection areas (corresponding to 131 to 134 in FIG. 5(A)) inside the sound collection area selected by the beam position calculation unit 24.
(B) When the sound source moves between the coarsely set sound collection areas 111 to 114, the acquisition beam forming unit 22 detects the change in the control signal 241 and sets itself to the same coarse sound collection areas 111 to 114 as the detection beam forming unit 21, forming acquisition beams directed to them. In this case the acquisition beams, like the sound collection beams, are directed to coarse sound collection areas.
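The switch between cases (A) and (B) can be sketched as a small state machine driven by successive values of the control signal 241. The class name and the `hold_updates` threshold (how many identical consecutive values count as "stationary") are hypothetical; the source only says "a predetermined period".

```python
class AcquisitionModeSelector:
    """Chooses fine (subdivided) or coarse acquisition beams.

    hold_updates is a hypothetical threshold, e.g. 4 updates at 0.5 s = 2 s.
    """

    def __init__(self, hold_updates=4):
        self.hold_updates = hold_updates
        self.last_id = None
        self.count = 0

    def update(self, control_id):
        """Feed one control signal value; return ('fine', id) or ('coarse', None)."""
        if control_id == self.last_id:
            self.count += 1
        else:
            self.last_id = control_id
            self.count = 1
        if self.count >= self.hold_updates:
            return ("fine", control_id)  # case (A): subdivide this coarse area
        return ("coarse", None)          # case (B): reuse coarse areas 111 to 114
```

Any change in the control signal resets the counter, so the expensive fine search only runs once the talker has settled.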

In either case (A) or (B), the signal selection unit 25 selects the two loudest of the acquisition beams output by the acquisition beam forming unit 22 and outputs them to the signal addition unit 26, which combines them in proportion to their volumes.
Alternatively, when the sound source moves substantially, the signal addition unit 26 in FIG. 2(A) may, instead of simply combining the two loudest beams, fade out the sound from before the switch while fading in the sound after the switch, crossfading the two. This connects the sound naturally when the source moves a large distance. During the crossfade, the transmitting-side unit 1A and the receiving-side unit 1B exchange interpolated positions between the sound collection areas 111 to 114 together with the start and end times of the crossfade. The parameter calculation unit 32 also moves the position of the virtual sound source VS gradually over the crossfade time and sets the parameters according to this movement, so the sound image moves naturally.
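A sketch of the crossfade and of the gradual virtual-source movement, assuming linear gain ramps and linear interpolation of the VS position. Both are assumptions: equal-power fade curves are a common alternative, and the source does not fix the curve.

```python
def crossfade(old, new):
    """Fade out `old` while fading in `new` over their common length (linear)."""
    n = min(len(old), len(new))
    if n < 2:
        return list(new[:n])
    out = []
    for k in range(n):
        t = k / (n - 1)  # 0.0 at the start of the crossfade, 1.0 at the end
        out.append((1.0 - t) * old[k] + t * new[k])
    return out


def interpolate_vs(start, end, t):
    """Virtual source position moved gradually from start to end, t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return (start[0] + t * (end[0] - start[0]),
            start[1] + t * (end[1] - start[1]))
```

Using the same `t` for the audio gains and the VS position keeps the sound image moving in step with the audio transition.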

When the sound source moves between the coarsely set sound collection areas 111 to 114, the acquisition beam forming unit 22 is set to the same coarse areas as the detection beam forming unit 21. In this case the position information 27A output by the signal selection unit 25 is equivalent to the control signal 241, which is the position information output by the beam position calculation unit 24. Therefore, instead of the position information 27A from the signal selection unit 25, the control signal 241 obtained by the beam position calculation unit 24 may be output to the multiplexing unit 28 as position information 27C (see FIG. 2(A)).

The method by which the parameter calculation unit 32 in FIG. 2 sets its parameters is described concretely with reference to FIG. 3, a conceptual diagram of the method.
First, as shown in FIG. 3(A), the distance 36i between the virtual sound source VS and each speaker SPi of the receiving-side unit 1B is calculated. Next, as shown in FIG. 3(B), the parameters of the parameter calculation unit 32 are set. For SP1, for example, the parameter is the distance 361 divided by the speed of sound V.

As explained with reference to FIGS. 2 and 3, the receiving-side unit 1B can accurately reproduce the sound field on the speakers SPi (i = 1 to N) based only on the audio signal 27B and the position information 27A from the transmitting-side unit 1A, without transmitting N audio channels, one for each of the N microphones Mi at the source.

<Example of sound collection by the transmitting-side device>
The sound collection performed by the detection beam forming unit 21 described with reference to FIG. 2(A) is explained more concretely below with reference to FIGS. 4 and 5.

A method by which the detection beam forming unit 21 forms sound collection beams directed to the sound collection areas 111 to 114 is described with reference to FIG. 4. To explain how delays are applied to the inputs X1 to XN before summing, FIG. 4 shows virtual calculation tables 2111 to 2114. Each of the sound collection beams MB1 to MB4 to be formed is obtained by applying predetermined delays to the inputs X1 to XN and summing them. Delay pattern data 2121 to 2124 for directing a sound collection beam to each of the sound collection areas 111 to 114 are stored in a ROM (not shown) of the detection beam forming unit 21. Based on the delay pattern data 2121 to 2124, the detection beam forming unit 21 sets the delay amounts of the delays Dji (j = 1 to 4, i = 1 to N). The microphone input combining units 2141 to 2144 sum, for each of the calculation tables 2111 to 2114, the signals obtained by passing the inputs X1 to XN through the delays Dj1 to DjN, and output the results as the sound collection beams MB1 to MB4.
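The calculation tables can be sketched as plain delay-and-sum: each row of a delay table corresponds to one beam MBj and holds the integer sample delays Dj1 to DjN. Integer-sample delays and the function name are assumptions for illustration.

```python
def delay_and_sum(inputs, delay_table):
    """Form one beam per row of delay_table from microphone inputs X1..XN.

    inputs: list of N equal-length sample lists, one per microphone Mi
    delay_table: list of rows; row j holds integer sample delays Dj1..DjN
    Returns beams MB1..MBJ, each the sum of the delayed inputs.
    """
    length = len(inputs[0])
    beams = []
    for row in delay_table:
        if len(row) != len(inputs):
            raise ValueError("one delay per microphone is required")
        beam = [0.0] * length
        for x, d in zip(inputs, row):
            for n in range(d, length):
                beam[n] += x[n - d]  # input delayed by d samples
        beams.append(beam)
    return beams
```

With an impulse reaching microphone 1 one sample before microphone 2, the row `[1, 0]` aligns the two arrivals (peak 2.0), while the row `[0, 1]` spreads them apart (peak 1.0).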

Although the calculation tables 2111 to 2114 are shown separately for each of the sound collection beams MB1 to MB4 for the sake of explanation, in an implementation it is simpler to realize D11 to D41 together as a single ring-buffer memory or the like. X1 is written into this ring buffer, and output taps for D11 to D41 are provided. Likewise, D1i to D4i (i = 2 to N) each form a ring buffer that receives one input Xi and has output taps for D1i to D4i. Also in an implementation, the delay pattern data 2121 to 2124 are preferably prepared for each input Xi (i = 1 to N) rather than for each of the sound collection beams MB1 to MB4.
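A minimal sketch of one such ring buffer for a single input Xi, with one read tap per beam. The class name is hypothetical, and the delays are assumed to be whole samples.

```python
class TappedRingBuffer:
    """One ring buffer per microphone input Xi with several read taps.

    Each tap is one delay D1i..D4i in samples; all beams share the same
    stored history instead of keeping a separate delay line per beam.
    """

    def __init__(self, tap_delays):
        self.taps = list(tap_delays)
        self.size = max(self.taps) + 1
        self.buf = [0.0] * self.size
        self.pos = 0  # index where the next sample is written

    def push(self, sample):
        """Write one input sample and return the outputs of all taps."""
        self.buf[self.pos] = sample
        outs = [self.buf[(self.pos - d) % self.size] for d in self.taps]
        self.pos = (self.pos + 1) % self.size
        return outs
```

Memory grows with the longest delay only, not with the number of beams, which is why the per-input layout is convenient.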

Next, a method for detecting the sound source position is described with reference to FIG. 5(A). The detection beam forming unit 21 simultaneously forms sound collection beams directed to the plurality of sound collection areas 111 to 114. The beam position calculation unit 24 compares the volumes of these beams and monitors whether a speaker is present in any of the sound collection areas.

In FIG. 5(A), the sound collection beams directed to the sound collection areas 111 to 114 are formed by delaying the audio signals picked up by the microphones Mi (i = 1 to N) by appropriate delay times and then summing them. The following describes how the delay time for each microphone (that is, the time by which the audio picked up by that microphone is delayed) is set, using the sound collection beam focused on the sound collection area 111 as an example. As shown in FIG. 5(A), the distance 121i (i = 1 to N) from the center of the sound collection area 111 to each microphone Mi is calculated. Dividing each distance by the speed of sound (340 m/s) gives the propagation time of the sound wave from the sound collection area 111 to that microphone. The longest of these propagation times is taken as the reference time, and the difference between the reference time and the propagation time to each microphone is set as that microphone's delay time. These delay times are set in the detection beam forming unit 21.
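The delay-time rule above (reference time minus each propagation time) translates directly into a few lines. Coordinates in metres are assumed, and `mic_delays` is a hypothetical name.

```python
import math


def mic_delays(area_center, mic_positions, c=340.0):
    """Delay per microphone so sound from area_center adds in phase.

    Propagation time t_i = distance 121i / c; the longest time is the
    reference, and each microphone is delayed by (reference - t_i).
    """
    times = [math.dist(area_center, m) / c for m in mic_positions]
    reference = max(times)
    return [reference - t for t in times]
```

For an area center at (0, 3.4) and microphones at (0, 0) and (2.55, 0), the distances are 3.4 m and 4.25 m, the propagation times 10 ms and 12.5 ms, so the nearer microphone is delayed by 2.5 ms and the farther one by 0.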

By delaying the signal picked up by each microphone by the delay time set in this way and summing the results, the sound propagating from the sound collection area 111 to the microphones is combined in phase, so the sound from the sound collection area 111 is output with high gain. Sound arriving from other areas, by contrast, is combined out of phase, so the amplitudes partially cancel and the gain is low.
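The in-phase versus out-of-phase effect can be checked numerically by summing unit phasors for a single frequency. This is an idealized sketch (single tone, equal microphone gains); `beam_gain` is a hypothetical name.

```python
import cmath
import math


def beam_gain(freq_hz, arrival_times, delays):
    """Magnitude of the delay-and-sum output for a unit sinusoid.

    Each microphone contributes exp(-j*2*pi*f*(arrival + delay)); equal
    totals add in phase (gain N), unequal totals partially cancel.
    """
    total = sum(cmath.exp(-2j * math.pi * freq_hz * (t + d))
                for t, d in zip(arrival_times, delays))
    return abs(total)
```

With two microphones, arrival times of 10 ms and 12.5 ms and compensating delays of 2.5 ms and 0, a 100 Hz tone from the target area gives a gain of 2. If a source elsewhere reverses the arrival order (12.5 ms, 10 ms), the same delays leave a 5 ms mismatch, half a period at 100 Hz, and the output cancels to essentially zero.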

The above describes the sound collection beam focused on the sound collection area 111; the same applies to the beams focused on the sound collection areas 112, 113, and 114.

Next, the calculation performed by the beam position calculation unit 24 shown in FIG. 2(A) is described more concretely with reference to FIG. 5(B). In the following, the transmitting-side speaker is assumed to be at position 102A shown in FIG. 5(A) (within the subdivided sound collection area 133 inside the sound collection area 114), and the source signal is assumed to have the waveform indicated by reference numeral 240.
First, the detection beam forming unit 21 sets the sound collection areas 111 to 114 and searches for the position of the transmitting-side speaker 102A. If the speaker is at 102A in FIG. 5(A), the sound collection beam MB'4 directed to the sound collection area 114 has the highest volume. The sound collection beams directed to the sound collection areas 113, 112, and 111, which are progressively farther from the sound collection area 114, have progressively lower volumes, as shown by waveforms 243, 242, and 241. The beam position calculation unit 24 therefore selects the sound collection area (114 in FIG. 5) corresponding to the loudest of the sound collection beams MB'1 to MB'4 shown in FIG. 2.
Like the beam position calculation unit 24, the signal selection unit 25 selects and outputs the acquisition beams with the highest volume. Unlike it, however, the signal selection unit 25 selects two acquisition beams, in descending order of volume, from among the subdivided sound collection areas 131 to 134 set by the acquisition beam forming unit 22. In FIG. 5(A), these are the beams corresponding to the subdivided sound collection areas 133 and 134.

With the configuration described above, sound collection beams MBj directed toward the sound collection areas 11j can be formed as shown in FIG. 5(A). The acquisition beam forming unit 22 in FIG. 2(A) can have the same configuration.

<Specific usage and sound field reproduction example>
A specific usage of the apparatus of this embodiment and an example of sound field reproduction are described with reference to FIG. 6, which illustrates this usage. As described for FIG. 1, the remote conference device 1 provides the functions of one station for interconnection via a communication line or network, and two or more equivalent remote conference devices are connected to one another via the network.

Because the transmitting-side unit 1A and the receiving-side unit 1B shown in FIGS. 6(A) and 6(B) exchange position information together with the audio signal, the sound field of the transmitting-side speaker 102A can be transmitted even if the two sides do not have the same number of speakers SPi or microphones Mi. The following description nevertheless assumes remote conference devices 1 of identical configuration. For simplicity, two-dimensional XY coordinates are used, and the center of each of the transmitting-side unit 1A and the receiving-side unit 1B is taken as its origin, as shown in FIG. 6.

As shown in FIG. 6(A), assume that the X and Y coordinates of the transmitting-side speaker 102A are (X1, -Y1). In this embodiment, the right side of the transmitting-side unit 1A and the receiving-side unit 1B as seen from the speaker is the positive X direction (that is, toward the left of the page in FIG. 6(A) and toward the right of the page in FIG. 6(B)). The transmitting-side unit 1A of the remote conference device 1 analyzes the audio emitted by the transmitting-side speaker 102A and picked up by the microphones Mi to obtain the position information of the speaker 102A. This can be done by analyzing the audio signals of the microphones Mi (i = 1 to N) as described for FIGS. 4 and 5. In the rest of the description of FIG. 6, the position obtained in this way places the transmitting-side speaker 102A on the positive X side, as shown in FIG. 6(A).

As shown in FIG. 6(B), it is natural for the receiving-side listener 102B to perceive the transmitting-side speaker 102A as talking face to face. The X coordinate at which the virtual sound source VS standing in for the speaker 102A should be placed is therefore -X1, to the left of the origin, and the Y coordinate is Y1, because the source lies behind the unit in the depth direction. The virtual sound source VS is thus placed at (-X1, Y1). Then, as described for FIG. 3(B), the parameter calculation unit 32 in FIG. 2(B) calculates the delay and volume parameters for each speaker SPi (i = 1 to N) based on the position of the virtual sound source VS. As shown in FIG. 2, these are set in the virtual sound source generation signal processing unit 33, and the audio is output from the speakers SPi (i = 1 to N) via the DACs 34i (i = 1 to N) and the AMPs 35i (i = 1 to N).
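The mapping from (X1, -Y1) to (-X1, Y1) negates both coordinates, that is, a point reflection through the unit's origin. A one-function sketch (hypothetical name), assuming both units use the coordinate conventions of FIG. 6:

```python
def virtual_source_position(speaker_xy):
    """Map the talker position (X1, -Y1) at the sender to (-X1, Y1) here.

    Both axes are negated, so the remote talker appears face to face,
    behind the receiving array.
    """
    x, y = speaker_xy
    return (-x, -y)
```

A talker 1 m to the sender's right and 2 m in front, (1.0, -2.0), becomes a virtual source at (-1.0, 2.0): to the listener's left and 2 m behind the receiving unit.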

The following supplements the description of the above embodiment.
For ease of explanation, the transmitting-side unit 1A was described above as performing transmission, but the units 1A and 1B are not limited to transmission only or reception only. The remote conference device 1 supports two-way communication: the voice of the receiving-side listener 102B and the position information of the listener 102B are acquired by the receiving-side unit 1B and transmitted over the network to the transmitting-side unit 1A. This works in the same way as described above.

In the above description there is a single transmitting-side speaker 102A, treated as one sound source, but there may be several. In that case, a plurality of pieces of position information are prepared, and the volume at each position is detected and transmitted. The receiving-side unit performs the parameter setting shown in FIG. 3 for each speaker and adds together, for each speaker unit, the outputs to the speakers SPi (i = 1 to N) calculated from those parameters.
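A sketch of the per-speaker summation, assuming each talker's signal has already been rendered into N delayed channels with that talker's own parameters (for example with per-channel delays as described for FIG. 2(B)). `mix_sources` is a hypothetical name, and "per unit" is read here as per speaker SPi.

```python
def mix_sources(per_source_channels):
    """Sum the per-speaker outputs computed separately for each talker.

    per_source_channels[s][i] is the sample list for speaker SPi produced
    from talker s. Shorter lists are treated as zero-padded.
    """
    num_speakers = len(per_source_channels[0])
    mixed = []
    for i in range(num_speakers):
        length = max(len(src[i]) for src in per_source_channels)
        out = [0.0] * length
        for src in per_source_channels:
            for n, x in enumerate(src[i]):
                out[n] += x
        mixed.append(out)
    return mixed
```

Because delay-and-sum rendering is linear, summing the per-talker speaker feeds places each talker at its own virtual source position simultaneously.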

In FIGS. 2 and 5, the numbers of simultaneously output sound collection beams, acquisition beams, and sound collection areas are each four, but each may be larger or smaller; the number of beams formed simultaneously is not limited to four. Furthermore, if the sound collection beams are output one at a time by time division rather than simultaneously and their volumes are then compared, a single sound collection beam and a single acquisition beam suffice instead of two or more.

When multiple speakers are expected, at least the first stage of this stepwise narrowing of the position may keep several candidate positions, rather than narrowing to one, and search further in the neighborhood of each.

In the above description, the numbers of speakers SPi and microphones Mi are both N, but they need not be equal. Unlike the approach of Patent Document 2, the transmission path carries the position information of the sound source, so the sound field at the source can be reproduced even if these numbers differ.

In this embodiment, the detection beam forming unit 21 and the acquisition beam forming unit 22 set sound collection beams and acquisition beams in two stages to search for the transmitting-side speaker 102A, but the search may instead narrow the area in three or more stages.
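The multi-stage narrowing, including the option of keeping several candidates when multiple speakers are expected, can be sketched along one axis. `volume_of` is a hypothetical callback standing in for "form a beam aimed at this interval and measure its volume"; the one-dimensional geometry and the `split`/`keep` parameters are simplifying assumptions.

```python
def coarse_to_fine(volume_of, lo, hi, stages=3, split=4, keep=1):
    """Stepwise narrowing of the talker position along one axis.

    volume_of(a, b) returns the volume of a beam aimed at [a, b). Each
    stage splits every surviving interval into `split` parts and keeps the
    `keep` loudest (keep > 1 tracks several talker candidates).
    Returns the surviving intervals, loudest first.
    """
    candidates = [(lo, hi)]
    for _ in range(stages):
        parts = []
        for a, b in candidates:
            step = (b - a) / split
            parts += [(a + k * step, a + (k + 1) * step) for k in range(split)]
        parts.sort(key=lambda ab: volume_of(*ab), reverse=True)
        candidates = parts[:keep]
    return candidates
```

With `stages=2` this corresponds to the two-stage search of the embodiment (four coarse areas, then four subdivisions); each extra stage divides the interval width by `split` again.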

The position information 27C can also be detected and output by the beam position calculation unit 24 alone, without the acquisition beam forming unit 22, because comparing the sound collection beams is enough for the beam position calculation unit 24 to detect which of the sound collection areas 111 to 114 contains the sound source 102A. Nor does the beam position calculation unit 24 necessarily need proportional distribution: it can simply detect the acquisition beam with the maximum volume and output the corresponding position information 27C.

Furthermore, although the transmitting-side unit 1A in FIG. 2 was described as transmitting a sound collection beam or an acquisition beam as the audio signal 27B, it may instead simply transmit the output of one of the microphones M1 to MN, or the sum of some or all of the microphones M1 to MN. In general, any audio signal derived from the sound pickup signals of the microphones M1 to MN, for example by addition, may be used. Even then, the receiving-side unit 1B can accurately reproduce the sound field of the transmitting side based on the position information 27A or 27C while limiting the consumption of transmission line resources. However, transmitting a sound collection beam or acquisition beam directed at the speaker 102A improves the noise resistance of the remote conference device 1.

FIG. 1 is an external view and a schematic functional diagram of the remote conference device used for voice communication in this embodiment.
FIG. 2 is a block diagram showing the internal configuration of the remote conference device used for voice communication in this embodiment.
FIG. 3 is a conceptual diagram of the parameter setting method in the parameter calculation unit 32.
FIG. 4 shows the internal configuration of the detection beam forming unit of the remote conference device used for voice communication in this embodiment.
FIG. 5 is an explanatory diagram for identifying the position of the speaker.
FIG. 6 illustrates a specific usage and an example of sound field reproduction.

Explanation of symbols

1-teleconference device, 1A-transmitting-side unit, 1B-receiving-side unit,
101-desk, 102A-speaker on transmitting side, 102B-listener on receiving side,
111-114-sound collection area, 12ji (j = 1-4, i = 1-N)-distance,
131-134-subdivided sound collection area, SPi (i = 1-N)-speaker,
Mi (i = 1 to N) -microphone, MBj (j = 1 to 4) -sound collecting beam,
VS-virtual sound source, 20-transmission, 21-beam forming unit for detection,
211j (j = 1 to 4) —calculation table,
212j (j = 1 to 4) -delay pattern data,
Dji (j = 1 to 4, i = 1 to N) -delay,
214j (j = 1 to 4)-microphone input synthesis unit,
22-acquisition beam forming unit, 23-BPF, 24-beam position calculation unit,
241-control signal, 25-signal selection unit, 26-signal addition unit,
27A-position information, 27B-audio signal, 28-multiplexer, 29-A/D converter,
30-received signal, 31-receiver, 32-parameter calculator,
33-virtual sound source generation signal processing unit, 34i (i = 1 to N) -DAC,
35i (i = 1 to N) -AMP, 36i (i = 1 to N) -distance

Claims

1. A remote conference device comprising:
a microphone array in which a plurality of microphones, each collecting sound and outputting it as a sound pickup signal, are arranged in an array;
a sound collection beam forming unit that forms sound collection beams directed to a plurality of sound collection areas, respectively, by delaying and adding the sound pickup signals output from the plurality of microphones;
position information detection means for detecting, as position information, the sound collection area corresponding to the sound collection beam with the maximum volume among the plurality of sound collection beams;
transmitting means for transmitting an output based on the sound pickup signals of the microphones together with the position information;
an array speaker in which a plurality of speakers are arranged in an array;
a receiving unit for receiving an audio signal and position information from outside; and
a signal processing unit that processes the received audio signal so as to form a sound emission beam whose virtual sound source position is a position determined from the received position information, and supplies the processed signal to the plurality of speakers.

2. The remote conference device according to claim 1, wherein, after detecting the sound collection area of the sound collection beam with the maximum volume, the position information detection means forms acquisition sound collection beams directed to a plurality of subdivided sound collection areas obtained by further subdividing that sound collection area, and detects the position information based on the subdivided sound collection areas corresponding to a plurality of acquisition sound collection beams selected in descending order of volume.

3. The remote conference device according to claim 2, wherein the position information detection means detects the position information by distributing the position between the subdivided sound collection areas corresponding to the plurality of acquisition sound collection beams selected in descending order of volume, in proportion to the strengths of the selected acquisition sound collection beams, and
the signal processing unit combines the outputs of the selected plurality of acquisition sound collection beams by the same proportional distribution.