JP5515728B2

JP5515728B2 - Terminal device, processing method, and processing program

Info

Publication number: JP5515728B2
Application number: JP2009292283A
Authority: JP
Inventors: 宝浩島津
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-12-24
Filing date: 2009-12-24
Publication date: 2014-06-11
Anticipated expiration: 2029-12-24
Also published as: JP2011135272A

Description

The present invention relates to a terminal device, a processing method, and a processing program for holding a conference with a terminal device at another site. The terminal device uses a voice input unit that receives voice input from its surroundings and a voice output unit that outputs voice based on a voice signal received from the terminal device at the other site, which is connected via a network.

When a conference is held among a plurality of sites, a video conference system transmits and receives audio signals, such as speech, between the sites. The terminal device at each site receives, through a microphone, voice input such as the utterances of users participating in the conference. The terminal device transmits the sound collected by the microphone as an audio signal to the terminal device at another site via the network, and outputs audio signals transmitted from other sites as sound from a speaker.

The microphone also picks up acoustic echo, that is, sound output from the speaker that travels back into the microphone. The terminal device may therefore transmit to the terminal device at another site an audio signal that contains both the user's utterances and the acoustic echo. To remove the influence of acoustic echo, it has recently been proposed to calculate the transfer characteristic of the echo path from the speaker to the microphone and subtract the acoustic echo component from the audio signal (Patent Document 1).
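A minimal sketch of the subtraction approach described above, assuming the echo path has already been estimated as a finite impulse response `h_est` (a hypothetical name; the actual method of Patent Document 1 is not reproduced here). The predicted echo is the far-end signal convolved with `h_est`, and it is subtracted from the microphone signal sample by sample.

```python
from typing import List, Sequence


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Full linear convolution of x with h (pure Python, O(len(x) * len(h)))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def subtract_echo(mic: Sequence[float], far_end: Sequence[float],
                  h_est: Sequence[float]) -> List[float]:
    """Subtract the echo predicted from far_end and h_est from the mic signal."""
    echo_est = convolve(far_end, h_est)[:len(mic)]
    # Pad with zeros if the microphone capture is longer than the prediction.
    echo_est += [0.0] * (len(mic) - len(echo_est))
    return [m - e for m, e in zip(mic, echo_est)]


far_end = [1.0, 0.0, 0.5, 0.0]
h_true = [0.0, 0.6, 0.2]          # hypothetical speaker-to-mic echo path
near_talk = [0.0, 0.0, 0.0, 0.3]  # local speech the far end should hear
mic = [n + e for n, e in zip(near_talk, convolve(far_end, h_true))]
print([round(v, 6) for v in subtract_echo(mic, far_end, h_true)])  # [0.0, 0.0, 0.0, 0.3]
```

With an exact path estimate only the local speech remains. If `h_est` is wrong, the leftover error scales with the size of the echo itself, so a large echo leaves a large residual.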

Patent Document 1: JP 2008-261923 A

However, the technique described in Patent Document 1 gives no consideration to reducing the acoustic echo itself, from which the transfer characteristic is calculated. One problem, therefore, is that when the acoustic echo is large, it is difficult to remove the acoustic echo component from the audio signal.

To solve the problem described above, an object of the present invention is to provide a terminal device, a processing method, and a processing program that reduce acoustic echo by adjusting the output mode of the audio output unit, so that the acoustic echo can be removed accurately.

To solve the problem described above and achieve the object, the terminal device according to claim 1 includes a voice input unit that receives voice input, a voice output unit that outputs voice based on a voice signal received from a terminal device at another site connected via a network, and detection means for detecting the position of a user. The terminal device comprises: generation means for generating a test voice signal; test voice output means for converting the test voice signal generated by the generation means into test voice and causing the voice output unit to output it; measurement means for measuring acoustic echo information produced when the test voice output by the test voice output means enters the voice input unit; and change control means for changing the output mode of the voice output unit based on the acoustic echo information measured by the measurement means. The change control means performs change control of the output mode based on the position of the user detected by the detection means, and the test voice output means causes the test voice to be output when the detection means detects that the user is positioned within a predetermined range around the terminal device.

The terminal device according to claim 2 is characterized in that, in the above invention, the change control means controls changes to the directivity angle of the voice output by the voice output unit.

The terminal device according to claim 3 is characterized in that, in the above invention, it further includes acquisition means for acquiring placement information about the placement of the terminal device, and the test voice output means causes the test voice to be output when the placement information changes.

The processing method according to claim 4 is a processing method performed by a terminal device that includes a voice input unit that receives voice input, a voice output unit that outputs voice based on a voice signal received from a terminal device at another site connected via a network, and detection means for detecting the position of a user. The method includes: a generation step of generating a test voice signal; a test voice output step of converting the test voice signal generated in the generation step into test voice and causing the voice output unit to output it; a measurement step of measuring acoustic echo information produced when the test voice output in the test voice output step enters the voice input unit; and a change control step of changing the output mode of the voice output unit based on the acoustic echo information measured in the measurement step. In the change control step, change control of the output mode is performed based on the position of the user detected by the detection means, and in the test voice output step, the test voice is output when the detection means detects that the user is positioned within a predetermined range around the terminal device.

The processing program according to claim 5 is a processing program for a terminal device that includes a voice input unit that receives voice input, a voice output unit that outputs voice based on a voice signal received from a terminal device at another site connected via a network, and detection means for detecting the position of a user. The program causes a computer to execute: a generation step of generating a test voice signal; a test voice output step of converting the test voice signal generated in the generation step into test voice and causing the voice output unit to output it; a measurement step of measuring acoustic echo information produced when the test voice output in the test voice output step enters the voice input unit; and a change control step of changing the output mode of the voice output unit based on the acoustic echo information measured in the measurement step. In the change control step, change control of the output mode is performed based on the position of the user detected by the detection means, and in the test voice output step, the test voice is output when the detection means detects that the user is positioned within a predetermined range around the terminal device.

According to the invention of claim 1, the output mode of the voice output unit can be changed based on acoustic echo information measured with the test voice signal. The acoustic echo can therefore be reduced so that it can be removed accurately, and the reduced acoustic echo can then be filtered by the adaptive filter so that echo removal is performed accurately. In addition, the position of the user can be detected and used to control changes to the output mode of the voice output unit, so that this control suits where the user is. Furthermore, the test voice can be output when the user is about to use the terminal device, which optimizes the timing at which the output mode change control is executed.

According to the invention of claim 2, acoustic echo can be reduced simply by controlling changes to the angle at which the voice output unit outputs voice.

According to the invention of claim 3, the test voice can be output when the placement information of the terminal device changes, which optimizes the timing at which the output mode change control of the voice output unit is executed.

According to the invention of claim 4, the output mode of the voice output unit can be changed based on acoustic echo information measured with the test voice signal. The acoustic echo can therefore be reduced so that it can be removed accurately, and the reduced acoustic echo can then be filtered by the adaptive filter so that echo removal is performed accurately. In addition, the position of the user can be detected and used to control changes to the output mode of the voice output unit, so that this control suits where the user is. Furthermore, the test voice can be output when the user is about to use the terminal device, which optimizes the timing at which the output mode change control is executed.

According to the invention of claim 5, a computer can change the output mode of the voice output unit based on acoustic echo information measured with the test voice signal. The acoustic echo can therefore be reduced so that it can be removed accurately, and the computer can then filter the reduced acoustic echo with the adaptive filter so that echo removal is performed accurately. In addition, the computer can detect the position of the user and use it to control changes to the output mode of the voice output unit, so that this control suits where the user is. Furthermore, the computer can output the test voice when the user is about to use the terminal device, which optimizes the timing at which the output mode change control is executed.

As described above, the terminal device, processing method, and processing program according to the present invention reduce acoustic echo by adjusting the output mode of the voice output unit, so that the acoustic echo can be removed accurately.

An explanatory diagram showing an example of the video conference system according to the embodiment of the present invention.
An explanatory diagram showing an example of the functional configuration of the video conference terminal according to the embodiment.
An explanatory diagram showing an example of the functional configuration of the audio I/F according to the embodiment.
An explanatory diagram showing an example of change control of the speaker direction according to the embodiment.
An explanatory diagram showing an example of a measurement result of acoustic echo information according to the embodiment.
A flowchart showing the processing of the video conference system according to the embodiment.
An explanatory diagram showing an example of speaker angle control for a plurality of users in a modification of the embodiment.
An explanatory diagram showing an example of change control of the ON/OFF state of the speakers in a modification of the embodiment.
An explanatory diagram showing an example of change control of the output state of the speakers in a modification of the embodiment.
A flowchart showing the processing of the video conference terminal in a modification of the embodiment.

Preferred embodiments of the terminal device, processing method, and processing program according to the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment)
(Overall configuration)
A case in which the terminal device according to the embodiment of the present invention is applied to a video conference terminal used in a video conference system that holds video conferences among a plurality of sites will be described with reference to FIG. 1. FIG. 1 is an explanatory diagram showing an example of the video conference system according to the embodiment of the present invention. In FIG. 1, a video conference system 100 is configured by connecting video conference terminals 110a and 110b, installed in conference rooms A and B, via a network NW.

In the video conference system 100, the video conference terminals 110a and 110b may be installed in geographically separated conference rooms A and B and connected via a network NW such as the Internet, or installed in separate conference rooms A and B within one building and connected via a network NW such as a LAN (local area network).

Although FIG. 1 illustrates a video conference held between two conference rooms A and B, a video conference may also be held among three or more sites. Likewise, although the video conference terminals 110a and 110b are described as being connected to each other via the network, they may instead be connected via a management server installed at any location on the network. The network NW may also be a public telephone network.

The video conference system 100 causes the video conference terminals 110 to transmit and receive the video and audio of the conference held in conference rooms A and B. Specifically, the video conference terminal 110a uses the camera 212a and the microphone 214a to acquire video and audio of the users participating in the video conference in conference room A. The video conference terminal 110a packetizes the acquired video and audio and transmits them as a video signal and an audio signal via the network NW to the video conference terminal 110b in conference room B.

The video conference terminal 110a receives, via the network NW, the video and audio signals transmitted from the video conference terminal 110b in conference room B, and reproduces the video and audio from the received signals on the display 211a and the speakers 213a (SP1 to SP4). The number of speakers 213 installed in conference rooms A and B is not limited to four and can be increased or decreased as necessary.

When sound is reproduced by the speakers 213a (SP1 to SP4), acoustic echo arises from reverberation in conference room A and from sound traveling back to the microphone. As a result, the audio signal that the microphone 214a collects and the video conference terminal 110a transmits to the destination terminal 110b contains both the voices of the users participating in the conference in room A and the acoustic echo. The number of microphones 214 installed in conference rooms A and B is not limited to two and can be increased or decreased as necessary.

Here, the video conference terminal 110a generates a test audio signal that is different from the audio signal received from the video conference terminal 110b, and outputs it as test audio from the speakers 213a.

The video conference terminal 110a uses the microphone 214a to collect the reverberation and wraparound of the test audio output from the speakers 213a, and measures the acoustic echo information caused by the test audio based on that collected sound.

The video conference terminal 110a controls changes to the output mode of the speakers 213a (SP1 to SP4). Specifically, it changes the orientation of each speaker 213a (SP1 to SP4) while outputting the test audio and measures the acoustic echo information at each orientation. It then adjusts the orientation of each speaker 213a (SP1 to SP4), which sets the directivity angle of the output sound, so that the acoustic echo caused by the test audio is minimized.
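The sweep-and-select step described above can be sketched as follows. This is a hypothetical illustration, not the terminal's actual firmware: `measure_echo_db` stands in for turning a speaker, playing the test audio, and measuring the echo, and each speaker is assumed to be swept independently over the same candidate angles (the text does not say whether the speakers are adjusted jointly or one at a time).

```python
from typing import Callable, Dict, Sequence


def choose_speaker_angles(
    speakers: Sequence[str],
    candidate_angles_deg: Sequence[float],
    measure_echo_db: Callable[[str, float], float],
) -> Dict[str, float]:
    """Return, for each speaker, the candidate angle with the lowest measured echo.

    measure_echo_db(speaker, angle) is assumed to turn `speaker` to `angle`,
    play the test audio, and return the echo level picked up by the microphone.
    """
    best: Dict[str, float] = {}
    for sp in speakers:
        levels = {angle: measure_echo_db(sp, angle) for angle in candidate_angles_deg}
        best[sp] = min(levels, key=levels.get)  # angle with the smallest echo
    return best


# Hypothetical room model: echo grows with distance from each speaker's ideal angle.
ideal = {"SP1": 30.0, "SP2": -60.0, "SP3": 0.0, "SP4": 60.0}


def fake_measure(sp: str, angle: float) -> float:
    return -40.0 + abs(angle - ideal[sp]) / 10.0


print(choose_speaker_angles(list(ideal), [-60.0, -30.0, 0.0, 30.0, 60.0], fake_measure))
```

Sweeping one speaker at a time keeps the number of measurements linear in the number of speakers; a joint search over all combinations would grow exponentially.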

The video conference terminal 110a then sets an adaptive filter for filtering acoustic echo. That is, it sets the adaptive filter based on the acoustic echo produced by the speakers 213a (SP1 to SP4) whose orientations have been adjusted to minimize the echo. The video conference terminal 110a uses the adaptive filter to filter the audio signal collected by the microphone 214a, which contains the voices of users participating in the conference in room A together with acoustic echo, and transmits the filtered signal to the destination video conference terminal 110b.
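The source does not specify the adaptive filter algorithm. As a hedged illustration, the sketch below uses the normalized least-mean-squares (NLMS) update, a common choice for acoustic echo cancellation: it learns an FIR estimate `w` of the speaker-to-microphone path from the far-end signal and outputs the residual `e = mic - estimate`, which is what would be transmitted. The tap count, step size `mu`, and the simulated echo path are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_canceller(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Cancel the echo of far_end in mic with an NLMS adaptive FIR filter.

    Returns (residual signal, final filter coefficients).
    """
    w = [0.0] * taps    # current estimate of the echo path
    buf = [0.0] * taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est                      # what would be sent to the far end
        norm = eps + sum(b * b for b in buf)  # input-power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: gain 0.5 after one sample, 0.25 after two.
mic = [0.5 * (far[n - 1] if n >= 1 else 0.0) + 0.25 * (far[n - 2] if n >= 2 else 0.0)
       for n in range(len(far))]
res, w = nlms_echo_canceller(far, mic)
print([round(c, 3) for c in w[:3]])
```

The smaller the echo that reaches the microphone, the smaller the error the filter has to track, which is why reducing the echo first (by orienting the speakers) helps the filter remove the rest.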

Because the speakers 213a (SP1 to SP4) have been adjusted with the test audio so as to reduce acoustic echo, the acoustic echo during the conference is also reduced. The video conference terminal 110a can therefore reliably remove acoustic echo from the audio signal to be transmitted to the video conference terminal 110b.

In this embodiment, the configuration that removes acoustic echo by adjusting the orientation of the speakers 213a (SP1 to SP4) has been described using the video conference terminal 110a as an example, but the invention is not limited to this. Other devices, such as the video conference terminal 110b, may have the same configuration.

(Functional configuration)
The functional configuration of the video conference terminal 110 according to the embodiment of the present invention will be described with reference to FIG. 2. FIG. 2 is an explanatory diagram showing an example of the functional configuration of the video conference terminal according to the embodiment of the present invention.

In FIG. 2, the video conference terminal 110 includes a CPU (central processing unit) 201; a RAM (random access memory) 202; a ROM (read-only memory) 203; a video I/F 204 that controls video input and output to and from a display 211 and a camera 212; an audio I/F 205 that controls audio input and output to and from a speaker 213 and a microphone 214; an operation unit 206 that receives input of various kinds of information; a communication I/F 207 that controls communication with external devices; and a storage medium 208 that stores various kinds of information. The components of the video conference terminal 110 are connected to one another by a bus 200.

The CPU 201 controls the entire video conference terminal 110. Using the RAM 202 as a work area, it executes various programs read from the ROM 203.

Under the control of the CPU 201, the video I/F 204 causes the display 211 to show various kinds of information, for example: video of the local site (conference room A or B) captured by the camera 212; video decoded from the video signal received from the video conference terminal 110 at the other site (conference room B or A); and processing screens related to the video conference with the other site.

Under the control of the CPU 201, the video I/F 204 captures video of the users at the local site with the camera 212 and outputs the captured video to the storage medium 208.

Under the control of the CPU 201, the audio I/F 205 causes the speaker 213 to output various sounds, such as audio decoded from the audio signal received from the video conference terminal 110 at another site and guidance audio related to the video conference with another site.

Under the control of the CPU 201, the audio I/F 205 collects the voices of users at the local site with the microphone 214. When sound is output from the speaker 213, acoustic echo arises from reverberation and wraparound in conference room A or B, so the microphone 214 also picks up the acoustic echo at the local site. Under the control of the CPU 201, the audio I/F 205 outputs the sound collected by the microphone 214 to the storage medium 208.

Under the control of the CPU 201, the audio I/F 205 generates a test audio signal that differs from the audio received from other sites (other conference rooms), and uses it to adjust the output mode of the speaker 213 so that acoustic echo is minimized.

The audio I/F 205 according to the embodiment of the present invention will be described in detail with reference to FIG. 3. FIG. 3 is an explanatory diagram showing an example of the functional configuration of the audio I/F according to the embodiment of the present invention.

In FIG. 3, the audio I/F 205 includes a test signal generation unit 301 that generates the test audio signal; an output mode control unit 302 that controls a drive unit 305 to adjust the output mode of the speaker 213; a residual echo measurement unit 303 that measures acoustic echo information about the acoustic echo input from the microphone 214; an adaptive filter setting unit 304 that sets the adaptive filter used to remove acoustic echo; a D/A converter 310 that converts digital audio signals into analog audio; and an A/D converter 311 that converts analog audio into digital audio signals.

The test signal generation unit 301 generates the test audio signal under the control of the CPU 201. The test audio signal is the source signal for the test audio, which differs from the audio received from other sites. Under the control of the CPU 201, the audio I/F 205 converts the test audio signal into test audio with the D/A converter 310 and outputs it from the speakers 213 (SP1 to SP4).
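The text does not say what waveform the test audio signal uses. As one hypothetical choice, a short burst of seeded uniform white noise is convenient for echo measurement because it excites all frequencies and correlates sharply with itself; the function name, duration, sample rate, and amplitude below are illustrative assumptions.

```python
import random
from typing import List


def generate_test_signal(duration_s: float = 0.5, sample_rate: int = 16000,
                         amplitude: float = 0.5, seed: int = 1) -> List[float]:
    """Return a reproducible white-noise burst with samples in [-amplitude, amplitude]."""
    rng = random.Random(seed)  # local RNG so the burst is identical on every call
    n = int(duration_s * sample_rate)
    return [rng.uniform(-amplitude, amplitude) for _ in range(n)]


burst = generate_test_signal()
print(len(burst))  # 8000 samples at 16 kHz
```

A fixed seed keeps the burst identical between measurements, so echo levels recorded at different speaker angles are directly comparable.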

Specifically, the CPU 201 causes the audio I/F 205 to generate the test audio signal in cases such as when the video of the local site captured by the camera 212 shows that a user is present within a predetermined range around the video conference terminal 110, or when the installation location of the video conference terminal 110 has changed. In other words, the CPU 201 generates the test audio signal when a video conference is about to start or when the surrounding environment has changed.
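The two triggers above can be expressed as a small decision function. This is a sketch under stated assumptions: the 3 m range, the string location identifier, and the `TriggerState` structure are hypothetical, and the test is assumed to fire only when a user newly enters the range (rather than on every check while a user is present), which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriggerState:
    """What the terminal remembers between checks (hypothetical structure)."""
    last_location: Optional[str] = None
    user_was_present: bool = False


def should_run_test(state: TriggerState, user_distance_m: Optional[float],
                    location: str, max_range_m: float = 3.0) -> bool:
    """Decide whether to emit the test audio on this check.

    Fires when a user newly enters the predetermined range, or when the
    reported installation location differs from the previous one.
    """
    user_present = user_distance_m is not None and user_distance_m <= max_range_m
    arrived = user_present and not state.user_was_present
    moved = state.last_location is not None and location != state.last_location
    state.user_was_present = user_present
    state.last_location = location
    return arrived or moved


state = TriggerState()
checks = [(None, "room A"), (2.0, "room A"), (2.0, "room A"), (None, "room B")]
print([should_run_test(state, d, loc) for d, loc in checks])  # [False, True, False, True]
```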

Under the control of the CPU 201, the output mode control unit 302 drives the drive unit 305 to change the orientation of the speakers 213 (SP1 to SP4). Specifically, the CPU 201 uses the camera 212 to find the direction of the user facing each speaker 213 (SP1 to SP4), changes each speaker's orientation by predetermined angles from that direction, and causes each speaker 213 (SP1 to SP4) to output the test audio.

Here, the change control of the orientation of the speakers 213 (SP1 to SP4) according to the embodiment of the present invention will be described with reference to FIG. 4. FIG. 4 is an explanatory diagram showing an example of change control of the speaker direction according to the embodiment of the present invention.

In FIG. 4, when the CPU 201 uses the camera 212 to detect the direction of the user M facing the speakers 213 (SP1 to SP4), it sets the direction of the user M as θ1. Specifically, the CPU 201 extracts a person image from the image captured by the camera 212, and detects the direction of the user M relative to the speakers 213 (SP1 to SP4) from the direction of the camera 212 at the time of capture and the extracted person. The CPU 201 then controls the audio I/F 205 to set θ2, θ3, θ4, and θ5 at predetermined angular intervals from θ1. In the example of FIG. 4, θ1 is 0° and the angles span 60° to −60° at 30° intervals.
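The candidate angles of FIG. 4 (θ1 at the user's direction, then 30° steps out to ±60°) can be generated as below. The function name and parameters are hypothetical; the fallback direction models the case, described in the text, where no user faces the speakers and a preset direction is used as θ1 instead. The order in which θ2 to θ5 are visited is not given in the source, so the list is simply sorted from +60° to −60°.

```python
from typing import List, Optional


def candidate_angles(user_direction_deg: Optional[float], step_deg: float = 30.0,
                     steps_each_side: int = 2, fallback_deg: float = 0.0) -> List[float]:
    """Angles to try, centred on the user's direction (theta1).

    If no user was detected, fall back to a preset direction (for example the
    terminal's initial setting or the direction of a chair).
    """
    center = fallback_deg if user_direction_deg is None else user_direction_deg
    return [center + k * step_deg
            for k in range(steps_each_side, -steps_each_side - 1, -1)]


print(candidate_angles(0.0))  # [60.0, 30.0, 0.0, -30.0, -60.0]
```

Bounding the sweep around the user's direction is what keeps the final orientation from pointing somewhere the user cannot hear well.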

The CPU 201 controls the audio I/F 205 to point the speakers 213 (SP1 to SP4) in each of the directions set relative to the user and to output test audio from each speaker 213 (SP1 to SP4). Restricting the orientation to within a predetermined angle of the direction of the user M in this way prevents the speakers 213 (SP1 to SP4) from being pointed in a direction unsuitable for the user M to hear the audio.

If no user faces the speakers 213 (SP1 to SP4), θ1 may instead be set to, for example, a direction initially configured on each video conference terminal 110, or a direction in which a user could be positioned, such as toward a chair.

Returning to FIG. 3, the residual echo measurement unit 303 measures acoustic echo information relating to the acoustic echo input through the microphone 214. When the test audio signal generated by the test signal generation unit 301 is output as test audio by the speakers 213, the residual echo measurement unit 303 extracts the test audio signal. The CPU 201 converts the sound collected by the microphone 214 into an audio signal using the A/D converter 311. The residual echo measurement unit 303 detects the acoustic echo information by comparing the extracted test audio signal with the audio signal collected by the microphone 214.
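The patent does not say how the test signal and the microphone signal are compared. One plausible measure, sketched here purely as an assumption, aligns the two by cross-correlation and fits a single echo gain, reported in dB. It presumes that no one is speaking near the microphone during the test and that the microphone capture is at least `max_delay` samples longer than the reference; the function name is hypothetical.

```python
import numpy as np


def measure_echo(reference: np.ndarray, mic: np.ndarray, max_delay: int):
    """Estimate (delay in samples, echo level in dB) of `reference` in `mic`.

    Hypothetical measure: search delays 0..max_delay for the strongest
    cross-correlation, then fit one least-squares gain at that delay.
    Requires len(mic) >= len(reference) + max_delay.
    """
    n = len(reference)
    ref_energy = float(np.dot(reference, reference))
    best_delay, best_corr = 0, -np.inf
    for d in range(max_delay + 1):
        corr = abs(float(np.dot(mic[d:d + n], reference)))
        if corr > best_corr:
            best_delay, best_corr = d, corr
    segment = mic[best_delay:best_delay + n]
    # Least-squares gain of the reference inside the aligned mic segment.
    gain = float(np.dot(segment, reference)) / ref_energy
    level_db = 20.0 * np.log10(max(abs(gain), 1e-12))
    return best_delay, level_db
```

A lower `level_db` for a given speaker configuration would correspond to a smaller entry in a measurement table such as table 500.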

The measurement results of the acoustic echo information according to the embodiment of the present invention will be described with reference to FIG. 5. FIG. 5 is an explanatory diagram illustrating an example of the measurement results of the acoustic echo information.

In FIG. 5, a measurement result table 500, stored in the storage medium 208, shows the magnitude of the acoustic echo component when the direction change control shown in FIG. 4 is executed for each speaker 213 (SP1 to SP4). That is, the table records the acoustic echo of the test audio measured for every combination in which each speaker 213 (SP1 to SP4) is oriented at one of θ1 to θ5.

In the example of FIG. 5, the acoustic echo reaches its minimum state 501 for the combination in which speaker 213 (SP1) is at θ1, speaker 213 (SP2) is at θ2, speaker 213 (SP3) is at θ1, and speaker 213 (SP4) is at θ1.
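As a sketch of how table 500 could be organised, the snippet below enumerates every assignment of θ1 to θ5 to the four speakers (5^4 = 625 combinations) and picks the entry with the smallest echo. The dictionary representation and the function names are illustrative assumptions.

```python
from itertools import product

ANGLES = ("θ1", "θ2", "θ3", "θ4", "θ5")
SPEAKERS = ("SP1", "SP2", "SP3", "SP4")


def all_orientation_combinations():
    """One angle label per speaker, in SP1..SP4 order: 5**4 = 625 tuples."""
    return list(product(ANGLES, repeat=len(SPEAKERS)))


def minimum_state(table):
    """Return the combination with the smallest echo in a {combo: echo} table."""
    return min(table, key=table.get)
```

Filling `table` with one measured echo value per combination and calling `minimum_state(table)` would return a tuple such as `("θ1", "θ2", "θ1", "θ1")`, the minimum state 501 of FIG. 5.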

Returning to FIG. 3, the CPU 201 controls the audio I/F 205 to drive the drive unit 305 based on the measurement results obtained by the residual echo measurement unit 303 in response to the test audio output. Specifically, the CPU 201 causes the output mode control unit 302 to orient speaker 213 (SP1) at θ1, speaker 213 (SP2) at θ2, speaker 213 (SP3) at θ1, and speaker 213 (SP4) at θ1.

Under the control of the CPU 201, the adaptive filter setting unit 304 compares the audio signal output by the speakers 213 with the audio signal input from the microphone 214 and sets an adaptive filter that filters out the acoustic echo component. Because the adaptive filter is set with the speakers 213 in the output mode that minimizes the acoustic echo of the test audio, the acoustic echo can be filtered accurately.
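The patent does not name the adaptive algorithm used by the adaptive filter setting unit 304. Normalized LMS (NLMS) is a common choice for acoustic echo cancellation, so it is used below as a stand-in; the step size `mu`, the tap count, and the function name are assumptions. The filter models the speaker-to-microphone path from the far-end signal and subtracts its estimate from the microphone signal.

```python
import numpy as np


def nlms_echo_cancel(far_end: np.ndarray, mic: np.ndarray,
                     taps: int = 64, mu: float = 0.5, eps: float = 1e-6):
    """Normalized LMS echo canceller (assumed algorithm, not from the patent).

    far_end: signal sent to the speaker; mic: microphone signal (same length).
    Returns (echo-reduced signal, final filter weights).
    """
    w = np.zeros(taps)
    x_buf = np.zeros(taps)      # most recent far-end samples, newest first
    err = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_estimate = np.dot(w, x_buf)
        err[n] = mic[n] - echo_estimate
        # Step normalized by input power so convergence is level-independent.
        w += (mu / (eps + np.dot(x_buf, x_buf))) * err[n] * x_buf
    return err, w
```

Calibrating the speaker orientation first, as the embodiment does, gives such a filter a weaker echo to model.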

Returning to FIG. 2, the operation unit 206 accepts input of various kinds of information from a user or the like. The operation unit 206 consists of a touch panel, operation buttons, and the like; it accepts input of information relating to the video conference and outputs the input signal to the CPU 201.

The communication I/F 207 is connected through a communication line to a network NW such as the Internet, and through the network NW to other video conference terminals 110 and other external devices. The communication I/F 207 serves as the interface between the network NW and the inside of the video conference terminal 110, and controls the input and output of data to and from external devices. A modem or a LAN adapter, for example, can be used as the communication I/F 207.

Under the control of the CPU 201, the communication I/F 207 transmits the video and audio of the local site stored in the storage medium 208 to the video conference terminal 110 at another site via the network NW, as video signals and audio signals at predetermined timings.

The storage medium 208 is, for example, an HD (hard disk) or an FD (flexible disk), the latter being an example of a removable recording medium. Each storage medium 208 has its own drive device; various data are recorded under the control of the CPU 201 and read out under the control of the respective drive device.

To relate each component to each function: the CPU 201 and the audio I/F 205 shown in FIG. 2 realize the functions of the generating means, test audio output means, measuring means, and change control means of the present invention. Specifically, the test signal generation unit 301 shown in FIG. 3 realizes the generating means, the residual echo measurement unit 303 realizes the measuring means, and the output mode control unit 302 realizes the change control means. The CPU 201, the camera 212, and the video I/F 204 realize the detection means of the present invention. The speakers 213 realize the audio output unit, and the microphone 214 realizes the audio input unit of the present invention.

(Processing of the video conference system 100)
The processing of the video conference system 100 according to the embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing of the video conference system according to the embodiment.

In the flowchart of FIG. 6, the CPU 201 first determines whether a video conference has started (step S601). For example, a connection request is sent to another video conference terminal 110 via the communication I/F 207 in response to the user's operation of the operation unit 206, and the start of the conference is determined by receiving a response from the other video conference terminal 110 via the communication I/F 207.

The CPU 201 waits in step S601 for the video conference to start; once it has started (step S601: Yes), the CPU 201 sets the speakers 213 (SP1 to SP4) to their initial state (step S602). The initial state is the θ1 orientation shown in FIG. 4.

The CPU 201 controls the audio I/F 205 so that the test signal generation unit 301 generates a test audio signal (step S603), and outputs the test audio from the speakers 213 in the state set in step S602.

The CPU 201 controls the audio I/F 205 so that the residual echo measurement unit 303 measures the acoustic echo information from the input of the microphone 214 (step S604). The CPU 201 stores the measured acoustic echo information in the storage medium 208 as the measurement result table 500 (step S605).

The CPU 201 controls the audio I/F 205 to change the direction of the speakers 213 by driving the drive unit 305 (step S606), and determines whether all combinations of directions within the predetermined angle range have been tried for the speakers 213 (SP1 to SP4) (step S607).

If not all combinations have been tried (step S607: No), the CPU 201 returns to step S603, controls the audio I/F 205 so that the test signal generation unit 301 generates a test audio signal, and repeats the process.

When all combinations have been tried (step S607: Yes), the CPU 201 determines, from the measurement result table 500 stored in the storage medium 208, the setting that gives the minimum acoustic echo (step S608).

The CPU 201 sets the directions of the speakers 213 (SP1 to SP4) based on the setting determined in step S608 (step S609), and ends the series of processes.
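The loop of steps S602 to S609 can be sketched as follows. The hardware interactions are injected as two hypothetical callbacks, `set_orientation` (driving the drive unit 305) and `play_test_and_measure` (steps S603 to S605); they stand in for device-specific code the patent does not describe. Because `itertools.product` yields the all-θ1 combination first, the first iteration corresponds to the initial state of step S602.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Combo = Tuple[float, ...]


def calibrate(angles: Sequence[float], num_speakers: int,
              set_orientation: Callable[[Combo], None],
              play_test_and_measure: Callable[[], float]) -> Combo:
    """Try every orientation combination and keep the one with least echo."""
    table: Dict[Combo, float] = {}      # plays the role of table 500
    for combo in product(angles, repeat=num_speakers):   # S602, then S606
        set_orientation(combo)
        table[combo] = play_test_and_measure()           # S603 to S605
    # Loop exhausted (S607: Yes): choose the minimum-echo setting (S608).
    best = min(table, key=table.get)
    set_orientation(best)                                # S609
    return best
```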

To relate the processing of each element of the present invention to each process or function of the embodiment: the processing by the CPU 201, the audio I/F 205, and the test signal generation unit 301 in step S603 executes the generation step and the test audio output step of the processing method of the present invention. The processing by the CPU 201, the audio I/F 205, and the residual echo measurement unit 303 in step S604 executes the measurement step. The processing by the CPU 201, the audio I/F 205, and the drive unit 305 in steps S605 to S609 executes the change control step.

As described above, according to the video conference system, video conference terminal, and processing method of the embodiment of the present invention, a test audio signal is generated and the speaker orientation is set so that the acoustic echo of the test audio is minimized. Because the adaptive filter can then perform its filtering with the acoustic echo at a minimum, the acoustic echo can be removed accurately.

In addition, because the speaker orientation is changed only within a predetermined range of the direction in which the user is located, the speaker orientation can be controlled within a range suitable for the user.

In particular, even when there are multiple microphones and speakers, or multiple users, the embodiment first minimizes the acoustic echo and then removes it, so that accurate acoustic echo removal can be performed.

(Other partial modifications)
In the embodiment, the test audio signal is generated when the video conference starts in step S601 shown in FIG. 6, but the invention is not limited to this. For example, at the start of the video conference or at a predetermined cycle, the CPU 201 may capture an image of the conference room in which the video conference is held, using the camera 212 functioning as the detection means and acquisition means of the present invention. If the captured image differs from a previously captured image, the CPU 201 may treat this as a change in the surrounding environment, such as the arrangement of users or the installation location of the video conference terminal, and generate a test audio signal.

Here, the processing of the video conference terminal 110 in this modification of the embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the processing of the video conference terminal in the modification. In the flowchart of FIG. 10, processes similar to those in FIG. 6 are given the same step numbers and their description is omitted.

In the flowchart of FIG. 10, when the video conference starts in step S601, the CPU 201 captures an image of the surroundings in the conference room with the camera 212 via the video I/F 204 (step S1001), and stores the captured image data in the storage medium 208.

The CPU 201 compares the image data stored in the storage medium 208 on the previous occasion with the image data captured in step S1001, and determines whether the installation location of the video conference terminal 110 has changed (step S1002).

If the installation location has not changed (step S1002: No), the series of processes ends. That is, when the environment is the same as in the previous video conference, the effort of readjusting the speakers can be saved. If the installation location has changed (step S1002: Yes), the process proceeds to step S602 and the subsequent steps shown in FIG. 6.
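A simple way to implement the comparison in step S1002, assumed here because the patent leaves the method open, is to count how many pixels changed noticeably between the stored and the new grayscale frames. The function name and both thresholds are arbitrary illustrative values that would need tuning for a real room.

```python
import numpy as np


def environment_changed(previous: np.ndarray, current: np.ndarray,
                        pixel_threshold: float = 30.0,
                        changed_fraction: float = 0.05) -> bool:
    """True if more than `changed_fraction` of pixels differ noticeably."""
    if previous.shape != current.shape:
        return True
    # Convert to float first so uint8 subtraction cannot wrap around.
    diff = np.abs(current.astype(np.float64) - previous.astype(np.float64))
    return float(np.mean(diff > pixel_threshold)) > changed_fraction
```

A `True` result would correspond to the Yes branch of step S1002, which leads into the calibration from step S602 onward.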

Although the flowchart of FIG. 10 detects a change in installation location using image data captured after the start of the video conference, the invention is not limited to this. For example, by comparing image data captured at a predetermined cycle after the video conference starts, the terminal may detect users entering or leaving the room during the conference, a change of conference location, and so on, and respond quickly to changes in the surrounding environment.

By adjusting the timing at which the test audio signal is generated in this way, the settings of the speakers 213 can be changed precisely at times when the state of the acoustic echo may have changed, so that the acoustic echo can be filtered reliably. Changes in the surrounding environment need not be detected by the camera 212; the user's login state or a position detection means such as GPS (Global Positioning System) may be used instead, which makes the means for detecting the surrounding environment more versatile.

The embodiment does not restrict the type of test audio, and it need not be limited to a single type: the acoustic echo information may be measured using multiple types of test audio, which allows it to be measured more accurately. The test audio is also not limited to audible sound and may be inaudible. With inaudible test audio, the test can be run while the terminal is in use, for example during a video conference, without the user noticing anything unusual.
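For the multi-signal and inaudible variants, a sketch might prepare an audible sine sweep alongside a near-ultrasonic one. The 48 kHz sampling rate, the 19 to 21 kHz band, and the function names are assumptions: the band must stay below half the sampling rate, and whether it is actually inaudible depends on the listener and on the speaker and microphone responses.

```python
import numpy as np


def sine_sweep(f0: float, f1: float, duration_s: float,
               fs: int = 48_000) -> np.ndarray:
    """Linear chirp whose instantaneous frequency goes from f0 to f1 Hz."""
    t = np.arange(int(duration_s * fs)) / fs
    k = (f1 - f0) / duration_s          # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))


def test_signal_set(fs: int = 48_000) -> dict:
    """Hypothetical set of test signals: one audible, one near-ultrasonic."""
    return {
        "audible_sweep": sine_sweep(100.0, 8_000.0, 1.0, fs),
        # Must stay below the Nyquist frequency fs / 2 (24 kHz here).
        "near_ultrasonic_sweep": sine_sweep(19_000.0, 21_000.0, 1.0, fs),
    }
```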

Although the embodiment describes generating a test audio signal that differs from the audio of the other site, the invention is not limited to this. The test audio signal may be generated from part or all of the audio signal received from another site; that is, the received audio signal may be used as the test audio signal as is, or part of it may be used. This reduces the processing load of generating the test audio signal.

In the embodiment, the orientation of the speakers 213 is set to the minimum state 501 shown in the measurement result table 500, but the invention is not limited to this. The orientation may instead be set to any combination for which the magnitude of the acoustic echo is at or below a predetermined value, such as a level at which the acoustic echo can be removed. This eliminates the need to try every one of the many combinations of speaker 213 orientations, enabling faster setup and a lower processing load.

If there are multiple minimum states 501, the orientation closest to θ1, which faces the user, may be given priority. This preserves the primary function of the speakers 213, namely letting the user hear the audio.
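The threshold variant and the θ1 tie-break can be combined in one search. In this sketch, angles are given as offsets from θ1 (so θ1 = 0), `measure` is a hypothetical callback returning the echo level for a combination, and visiting combinations in order of total deviation from θ1 is a design choice of this example rather than something the patent specifies: it makes an early stop also favour orientations close to the user.

```python
from itertools import product


def deviation(combo):
    """Total angular distance of a combination from theta1 (= 0 deg)."""
    return sum(abs(a) for a in combo)


def search_with_threshold(angle_offsets, num_speakers, measure, threshold):
    """Return (chosen combination, number of measurements taken)."""
    results = {}
    # Design choice of this sketch: try combinations closest to theta1 first.
    ordered = sorted(product(angle_offsets, repeat=num_speakers), key=deviation)
    for combo in ordered:
        results[combo] = measure(combo)
        if results[combo] <= threshold:
            # Echo is already at a removable level: skip the remaining combos.
            return combo, len(results)
    # Nothing met the threshold: take the minimum, preferring combos near theta1.
    best = min(results.values())
    ties = [c for c, echo in results.items() if echo == best]
    return min(ties, key=deviation), len(results)
```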

Although the embodiment sets the direction of a single user relative to the speaker 213 as θ1, the invention is not limited to this. For multiple users, the angle of the speaker 213 may be controlled within an angular range that encompasses all of them.

Here, angle control of the speaker 213 for multiple users in a modification of the embodiment will be described with reference to FIG. 7. FIG. 7 is an explanatory diagram showing an example of speaker angle control for multiple users in the modification.

In FIG. 7, two users M1 and M2 are in front of the speaker 213. The video conference terminal 110 images the users M1 and M2 with the camera 212 and detects their directions relative to the speaker 213 from the captured image data and the orientation of the camera 212 at the time of capture. The video conference terminal 110 then outputs test audio at predetermined intervals across an angular range θ that encompasses the directions of both users M1 and M2, and sets the speaker orientation accordingly. In this way, the speaker orientation can be set appropriately even for multiple users.
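Sampling the range θ that covers several users could look like the following sketch. The step size, the optional margin, and the function name are assumptions; the user directions would come from a camera-based estimate such as the one described for FIG. 4.

```python
import math


def covering_angles(user_dirs_deg, step_deg=10.0, margin_deg=0.0):
    """Speaker angles sampled across the range spanned by all detected users."""
    lo = min(user_dirs_deg) - margin_deg
    hi = max(user_dirs_deg) + margin_deg
    # Small epsilon guards against floating-point round-down at exact multiples.
    count = int(math.floor((hi - lo) / step_deg + 1e-9)) + 1
    angles = [lo + i * step_deg for i in range(count)]
    if angles[-1] < hi - 1e-9:      # make sure the far edge is included
        angles.append(hi)
    return angles
```

For users at −20° and 25°, `covering_angles([-20.0, 25.0], step_deg=15.0)` samples −20°, −5°, 10°, and 25°.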

In the embodiment, the minimum acoustic echo state is found by changing the angle of the speakers 213, but the invention is not limited to this. The minimum acoustic echo state may instead be found based on the ON/OFF states of the speakers 213.

Here, control of the ON/OFF states of the speakers in a modification of the embodiment will be described with reference to FIG. 8. FIG. 8 is an explanatory diagram showing an example of this ON/OFF state change control in the modification.

In FIG. 8, a measurement result table 800, stored in the storage medium 208, shows the magnitude of the acoustic echo component when each speaker 213 (SP1 to SP4) is ON or OFF. Under the control of the CPU 201, the audio I/F 205 turns each speaker 213 (SP1 to SP4) ON or OFF. The table records the acoustic echo of the test audio measured for every combination of ON and OFF states of the speakers 213 (SP1 to SP4), except the combination in which all are OFF.

In the example of FIG. 8, the acoustic echo reaches its minimum state 801 when speaker 213 (SP1) is OFF, speaker 213 (SP2) is OFF, speaker 213 (SP3) is OFF, and speaker 213 (SP4) is ON.
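Enumerating the ON/OFF patterns measured for table 800 is straightforward: with four speakers there are 2^4 = 16 patterns, of which 15 remain once the all-OFF case is excluded. The helpers below, with hypothetical names, list them and pick the quietest one given a measurement callback.

```python
from itertools import product


def on_off_patterns(num_speakers=4):
    """Every ON (True) / OFF (False) pattern except all-OFF: 2**n - 1 of them."""
    return [p for p in product((False, True), repeat=num_speakers) if any(p)]


def quietest_pattern(measure, num_speakers=4):
    """Return the pattern whose measured echo is smallest."""
    return min(on_off_patterns(num_speakers), key=measure)
```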

Measuring the acoustic echo information with the speakers 213 (SP1 to SP4) switched ON and OFF in this way allows the minimum state 801 to be set with a simpler mechanism than controlling the angles of the speakers 213 (SP1 to SP4).

Instead of the angle or ON/OFF state of the speakers 213, the minimum acoustic echo state may also be found based on the output values of the speakers 213 (SP1 to SP4), with their total output held constant.

Here, control of the output states of the speakers in a modification of the embodiment will be described with reference to FIG. 9. FIG. 9 is an explanatory diagram showing an example of this output state change control in the modification.

In FIG. 9, a measurement result table 900, stored in the storage medium 208, shows the magnitude of the acoustic echo component for combinations of output states of the speakers 213 (SP1 to SP4) whose output values sum to 100. Under the control of the CPU 201, the audio I/F 205 can change the output value of each speaker 213 (SP1 to SP4) using an amplifier (not shown) or the like. That is, the table records the acoustic echo of the test audio measured for combinations in which the total output of the speakers 213 (SP1 to SP4) is kept constant.
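The constant-total condition of table 900 can be enumerated as below. The step of 25 is an assumption chosen so that the even split of FIG. 9 (25 each) is one of the candidates; the patent does not state the granularity, and the function name is hypothetical. The total is assumed to be a multiple of the step.

```python
def output_distributions(total=100, step=25, num_speakers=4):
    """All per-speaker output levels, in multiples of `step`, summing to `total`."""
    if num_speakers == 1:
        return [(total,)]
    result = []
    for first in range(0, total + 1, step):
        # Distribute what is left over the remaining speakers.
        for rest in output_distributions(total - first, step, num_speakers - 1):
            result.append((first,) + rest)
    return result
```

With these defaults there are 35 candidate distributions, each of which would be one row of a table like table 900.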

In the example of FIG. 9, the acoustic echo reaches its minimum state 901 when speaker 213 (SP1) is at 25, speaker 213 (SP2) is at 25, speaker 213 (SP3) is at 25, and speaker 213 (SP4) is at 25.

Although the embodiment and some modifications have been described above as separate examples, the invention is not limited to this; the techniques of the embodiment and the modifications may be combined as appropriate.

The methods described in the embodiment and modifications of the present invention can be realized by executing a prepared program on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program may also be distributed as a transmission medium over a network such as the Internet.

Description of Symbols
100 Video conference system
110 (110a, 110b) Video conference terminal
200 Bus
201 CPU
202 RAM
203 ROM
204 Video I/F
205 Audio I/F
206 Operation unit
207 Communication I/F
208 Storage medium
211 Display
212 Camera
213 Speaker
214 Microphone
301 Test signal generation unit
302 Output mode control unit
303 Residual echo measurement unit
304 Adaptive filter setting unit
305 Drive unit

Claims

A terminal device including an audio input unit that receives audio input, an audio output unit that outputs audio based on an audio signal received via a network from a terminal device at another site, and a detection means that detects the arrangement of users, the terminal device comprising:
a generating means for generating a test audio signal;
a test audio output means for converting the test audio signal generated by the generating means into test audio and outputting the test audio from the audio output unit;
a measuring means for measuring acoustic echo information produced when the test audio output by the test audio output means is input to the audio input unit; and
a change control means for changing the output mode of the audio output unit based on the acoustic echo information measured by the measuring means,
wherein
the change control means performs change control of the output mode based on the arrangement of the users detected by the detection means, and
the test audio output means outputs the test audio when the detection means detects that a user is present within a predetermined range around the terminal device.

The terminal device according to claim 1, wherein the change control means performs change control of the directivity angle of the audio output by the audio output unit.

The terminal device according to claim 1, further comprising an acquisition means for acquiring arrangement information relating to the arrangement of the terminal device,
wherein the test audio output means outputs the test audio when the arrangement information changes.

A processing method performed by a terminal device including an audio input unit that receives audio input, an audio output unit that outputs audio based on an audio signal received via a network from a terminal device at another site, and a detection means that detects the arrangement of users, the processing method comprising:
a generation step of generating a test audio signal;
a test audio output step of converting the test audio signal generated in the generation step into test audio and outputting the test audio from the audio output unit;
a measurement step of measuring acoustic echo information produced when the test audio output in the test audio output step is input to the audio input unit; and
a change control step of changing the output mode of the audio output unit based on the acoustic echo information measured in the measurement step,
wherein
the change control step performs change control of the output mode based on the arrangement of the users detected by the detection means, and
the test audio output step outputs the test audio when the detection means detects that a user is present within a predetermined range around the terminal device.

A processing program for a terminal device including an audio input unit that receives audio input, an audio output unit that outputs audio based on an audio signal received via a network from a terminal device at another site, and a detection means that detects the arrangement of users, the processing program causing a computer of the terminal device to execute:
a generation step of generating a test audio signal;
a test audio output step of converting the test audio signal generated in the generation step into test audio and outputting the test audio from the audio output unit;
a measurement step of measuring acoustic echo information produced when the test audio output in the test audio output step is input to the audio input unit; and
a change control step of changing the output mode of the audio output unit based on the acoustic echo information measured in the measurement step,
wherein
the change control step performs change control of the output mode based on the arrangement of the users detected by the detection means, and
the test audio output step outputs the test audio when the detection means detects that a user is present within a predetermined range around the terminal device.