JP2013141182A - Interaction environment reproduction method and device

Info

Publication number: JP2013141182A
Application number: JP2012001211A
Authority: JP
Inventors: Yasuhito Fujita; 康仁藤田; Yoichi Hata; 洋一畑; Yoshimitsu Goto; 由光後藤
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 2012-01-06
Filing date: 2012-01-06
Publication date: 2013-07-18

Abstract

PROBLEM TO BE SOLVED: To provide an interaction environment reproduction method that allows an interlocutor to confirm, in the interlocutor's own interaction environment, the echo occurrence situation in the interaction environment of the other-party interlocutor.

SOLUTION: First test sound data, recorded as a test sound source in the interaction environment where the interlocutor's own conference terminal is installed, is stored in a recording unit. The reproduced sound of the first test sound data is captured via a microphone, and the captured environmental data is stored in the recording unit as second test sound data. The stored second test sound data includes information on the interaction environment where the interlocutor's own conference terminal is installed, and the echo occurrence situation in that environment is reproduced by playing back the second test sound data via a speaker.

Description

The present invention relates to an interaction environment reproduction method for a two-way dialogue system that realizes video conferences, telephone conferences, and the like among a plurality of information terminals connected to one another via a predetermined transmission means, and to a device for implementing that method. In particular, it relates to a technique for reproducing the interaction environment of the other-party interlocutor within the interlocutor's own interaction environment (the environment in which the interlocutor's own information terminal is installed) by playing back, in that environment, audio data such as voice and environmental sound from among the electronic data to be transmitted to the other-party interlocutor's information terminal.

Conventionally, two-way dialogue systems such as video conferences and telephone conferences (sometimes called web conferences) have been used to exchange applications, documents and images, video, audio information, and the like with one or more people at remote locations. These systems use personal computers (hereinafter, PCs) connected to a network, typified by the Internet, that provides wired or wireless data communication over existing public telephone networks, optical networks, dedicated lines, and so on (see Patent Literature 1). When electronic data such as voice is exchanged in such a system, a headset connected to an audio or USB terminal, or a dedicated handset connected to the PC, may be used in addition to a microphone and speaker built into or connected to the PC.

In such two-way dialogue systems, echo that occurs in the other-party interlocutor's environment, caused by interaction environments that differ from one interlocutor to another, is a problem. For this reason, echo canceling techniques such as the one disclosed in Patent Literature 2 have been actively studied.

Patent Literature 1: Japanese Patent No. 4706439
Patent Literature 2: JP 2008-141735 A

Having studied the echo canceling techniques applied to conventional two-way dialogue systems as described above, the inventors identified the following problem. Conventional two-way dialogue systems are operated under widely varying conditions: the installation environment of the information terminal such as a PC (the interaction environment, including the degree of reverberation and the presence or absence of environmental noise), the position of the speaker relative to a sound collector such as a microphone, the speaker's voice volume, and so on. Depending on the situation, echo may occur.

Various techniques for removing echo and howling, such as that of Patent Literature 2, have therefore been proposed. In actual interaction environments, however, spatial structures (including the arrangement of related equipment), installation locations and orientations, and speakers' voice volumes vary widely, and it is difficult to achieve sufficient echo cancellation in every usage situation.

Echo in particular occurs in the other-party interlocutor's interaction environment, so the interlocutor who is the main source of the echo cannot perceive that it is occurring. It has therefore been difficult to make adjustments aimed at eliminating echo in the other-party interlocutor's interaction environment.

The present invention has been made to solve the problem described above. Its object is to provide an interaction environment reproduction method for a two-way dialogue system that realizes video conferences, telephone conferences, and the like among a plurality of information terminals connected to one another via a predetermined transmission means, and a device for implementing that method. The method enables an interlocutor to reproduce the other-party interlocutor's interaction environment (the echo occurrence situation) in the interaction environment where the interlocutor's own information terminal is installed (the interlocutor's own interaction environment).

The interaction environment reproduction method according to the present invention is for a two-way dialogue system that realizes video conferences, telephone conferences, and the like among a plurality of information terminals connected to one another via a predetermined transmission means, and is executed as a preliminary check before a dialogue starts. That is, by reproducing the echo occurrence situation in the interaction environment where the interlocutor's own information terminal is installed, the method allows the interlocutor to check the other-party interlocutor's interaction environment in advance (before the two-way dialogue starts).

Here, the predetermined transmission means is a concept that covers general communication networks, wired or wireless, such as the Internet, public lines, and mobile telephone lines, as well as premises LANs and home LANs. It means any communication path located between information terminals (information processing devices such as PCs) that send and receive packet data. The data exchanged between information terminals is electronic data (digital or analog) that includes at least audio data.

In the interaction environment reproduction method according to the present embodiment, first test sound data is stored in the recording means of the interlocutor's own information terminal as a test sound source recorded in the interaction environment where that terminal is installed. The first test sound data is preferably audio data such as voice or environmental sound; environmental sound includes music and artificially created sound effects as well as external noise, equipment noise, and the like. The first test sound data can be obtained through various routes. For example, if the first test sound data is electronic data stored in advance in the recording means of another information terminal connected to the predetermined transmission means (digital or analog data captured beforehand through the microphone of the interlocutor's own information terminal), it can be acquired via the predetermined transmission means. If no pre-recorded test sound source (first test sound data) exists on the network, electronic data captured through the microphone of the interlocutor's own information terminal may be stored in that terminal's recording means as the first test sound data.

In the interaction environment reproduction method according to the present embodiment, sound based on the first test sound data stored in the recording means as described above is played back through the speaker of the interlocutor's own information terminal while second test sound data is captured through that terminal's microphone. Like the first test sound data, the second test sound data is stored in the recording means as environmental data on the interlocutor's own interaction environment. Because the second test sound data is obtained by capturing, through the microphone, the first test sound data as played back in the environment where the interlocutor's own information terminal is installed, it corresponds to the environmental data that would normally be played back in the other-party interlocutor's interaction environment (audio data such as voice and environmental sound that the other-party interlocutor hears through a speaker).
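The relationship between the first and second test sound data can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the speaker-to-microphone path behaves like a convolution with a room impulse response; the source does not prescribe any signal model, and the function name `simulate_capture` and the sample values are hypothetical.

```python
def simulate_capture(first_test_sound, room_impulse_response):
    """Model the microphone capture as the played-back signal convolved
    with a (hypothetical) room impulse response."""
    n = len(first_test_sound) + len(room_impulse_response) - 1
    captured = [0.0] * n
    for i, x in enumerate(first_test_sound):
        for j, h in enumerate(room_impulse_response):
            captured[i + j] += x * h
    return captured


# Assumed toy data: a direct path (1.0) plus one reflection (0.3)
# arriving two samples later.
first = [1.0, 0.5, -0.5, 0.0]
room = [1.0, 0.0, 0.3]
second = simulate_capture(first, room)
```

In a real terminal the second test sound data would come from the A/D converter rather than from a simulation; the sketch only shows why the captured data carries information about the room.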

Accordingly, in the interaction environment reproduction method according to the present embodiment, playing back sound based on the second test sound data through the speaker of the interlocutor's own information terminal allows the echo occurrence situation in the interaction environment where that terminal is installed to be reproduced in the interlocutor's own interaction environment. A configuration in which the reproduced echo occurrence situation is displayed visually using a predetermined display means such as an LED is also possible. In that case, the reproduced echo occurrence situation may be judged based on volume information of the echo component contained in the second test sound data (the difference data between the first test sound data and the second test sound data), and the judgment result may be displayed visually using the predetermined display means. Visual display of the judgment result is particularly useful as an index for objectively evaluating how well the interaction environment has been prepared.
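The echo component described above, taken as the difference between the two sets of test sound data, and its volume could be computed along the following lines. This is a sketch under stated assumptions: both signals are already time-aligned and at the same gain, volume is expressed as an RMS level in dB relative to full scale, and the names `echo_component` and `rms_level_db` are hypothetical rather than taken from the source.

```python
import math


def echo_component(first, second):
    """Sample-wise difference (second - first), zero-padding the shorter input."""
    n = max(len(first), len(second))
    a = list(first) + [0.0] * (n - len(first))
    b = list(second) + [0.0] * (n - len(second))
    return [y - x for x, y in zip(a, b)]


def rms_level_db(samples, floor_db=-120.0):
    """RMS level in dB (0 dB = RMS of 1.0); silence maps to floor_db."""
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))


# Assumed toy data for illustration only.
first = [1.0, 0.5, -0.5, 0.0]
second = [1.0, 0.5, -0.2, 0.15, -0.15, 0.0]
echo = echo_component(first, second)
echo_db = rms_level_db(echo)
```

A practical implementation would first have to estimate and remove the playback delay and gain before subtracting, which the sketch deliberately leaves out.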

In the interaction environment reproduction method according to the present embodiment, the information terminals contributing to the two-way dialogue (at least the information terminal installed in the other-party interlocutor's interaction environment) may have at least an echo canceling function.

Furthermore, the interaction environment reproduction method according to the present invention may be recorded, as a computer program executable by a computer such as the PC described above, on a computer-readable and writable external recording medium (information recording medium) such as a hard disk, CD, DVD, or Blu-ray disc.

A device that implements the interaction environment reproduction method described above (the device according to the present invention) has at least a control means, a recording means, and a display means. The control means executes the method. The recording means is an electronic device capable of reading and writing electronic data. The display means visually indicates the occurrence of the echo component contained in the second test sound data stored in the recording means; this includes not only the case where the echo component is reproduced at a volume equal to or greater than a predetermined value but also the case where the volume of the echo component changes. To allow objective evaluation of how well the interaction environment has been prepared, the device may further include a judgment means. The judgment means judges the echo occurrence situation reproduced by the control means based on volume information of the echo component contained in the second test sound data. The display means then visually displays the judgment result according to preset judgment levels.
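A judgment against preset levels, as described for the judgment means and display means, might be sketched in Python as follows. The threshold values, the three-colour scheme, and the function name `judge_echo` are assumptions made for illustration; the source only states that the result is displayed according to preset judgment levels.

```python
def judge_echo(echo_db, levels=((-40.0, "green"), (-20.0, "yellow"))):
    """Map an echo volume in dB to a display label.

    `levels` is a hypothetical ascending list of (upper limit, label) pairs;
    anything at or above the last limit is reported as "red".
    """
    for limit, label in levels:
        if echo_db < limit:
            return label
    return "red"


# Example: three assumed echo volumes and the labels they would produce.
labels = [judge_echo(db) for db in (-50.0, -30.0, -10.0)]
```

The returned label would then drive whichever display means is available, for example an LED or an on-screen indicator.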

The embodiments of the present invention can be understood more fully from the following detailed description and the accompanying drawings. These embodiments are given for illustration only and should not be considered to limit the invention.

Further scope of applicability of the present invention will become apparent from the detailed description below. The detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only; various modifications and improvements within the scope of the invention will be apparent to those skilled in the art from this detailed description.

The interaction environment reproduction method and device according to the present invention provide a technique for checking, before a two-way dialogue starts, whether problems such as echo will occur in two-way dialogues such as video conferences and telephone conferences held among a plurality of information terminals via a network. By using the method to check in advance, in the interlocutor's own interaction environment, the echo that may occur in the other-party interlocutor's interaction environment, the interlocutor can take measures to eliminate echo, such as changing the placement of the microphone or speaker.

FIG. 1 shows a configuration example of a two-way dialogue system (video conference system) to which the interaction environment reproduction method according to the present embodiment can be applied, and a configuration example of a conference terminal (including the device according to the present embodiment).
FIG. 2 shows another configuration example of a two-way dialogue system (audio conference system) to which the interaction environment reproduction method according to the present embodiment can be applied.
FIG. 3 illustrates the echo generation mechanism and a configuration example of an echo canceling circuit.
FIG. 4 is a flowchart illustrating an example of the interaction environment reproduction method according to the present embodiment.
FIG. 5 illustrates the general operation of a conference terminal that executes the interaction environment reproduction method according to the present embodiment, in particular its audio interface (including the voice interface).
FIG. 6 shows the waveforms of the reproduced sound and of the echo component at each step of the interaction environment reproduction method according to the present embodiment.
FIG. 7 shows various configuration examples of the LED alarm.

Embodiments of the interaction environment reproduction method and device according to the present invention are described in detail below with reference to FIGS. 1 to 7. In the description of the drawings, identical elements are given identical reference numerals and redundant description is omitted. Audio data such as voice and environmental sound (including music and artificially created sound effects as well as external noise, equipment noise, and the like) can be used as the test sound source for reproducing the interaction environment. For simplicity, however, the following description of the embodiments is limited to the case where voice data is used as a digital sound source.

FIG. 1 shows a configuration example of a two-way dialogue system (video conference system) to which the interaction environment reproduction method according to the present embodiment can be applied, and a configuration example of a conference terminal (including the device according to the present embodiment). FIG. 1(a) shows a system configuration (minimum unit) in which interlocutor 30A and other-party interlocutor 30B hold a one-to-one two-way dialogue via a network (predetermined transmission means) 10.

On the interlocutor 30A side, a conference terminal (an information terminal such as a PC) 20A connected to the network 10 is installed. Peripheral devices forming part of the video interface are connected to the conference terminal 20A, for example an imaging unit 41A, such as a CCD camera, for imaging interlocutor 30A and the like, and a monitor (display means) 42A for displaying an image of other-party interlocutor 30B and the like. Peripheral devices forming part of the audio interface (including the voice interface) are also connected to the conference terminal 20A, for example a speaker 51A that reproduces the voice from other-party interlocutor 30B and a microphone 52A that captures the voice of interlocutor 30A.

On the other-party interlocutor 30B side, a conference terminal 20B connected to the network 10 is installed. Peripheral devices forming part of the video interface are connected to the conference terminal 20B, for example an imaging unit 41B for imaging other-party interlocutor 30B and the like, and a monitor 42B for displaying an image of interlocutor 30A and the like. Peripheral devices forming part of the audio interface are also connected to the conference terminal 20B, for example a speaker 51B that reproduces the voice from interlocutor 30A and a microphone 52B that captures the voice of other-party interlocutor 30B.

Each of the conference terminals 20A and 20B described above has the structure shown in FIG. 1(b). The conference terminals 20A and 20B are assumed to have the same structure, and the suffixes A and B are omitted from the reference numerals of the parts shown in FIG. 1(b).

The conference terminal 20 (20A, 20B) includes a data input/output unit (hereinafter, I/O) 221 as an interface for connecting to the network 10, which is the predetermined transmission means, and an I/O 222 for exchanging data with peripheral devices such as a keyboard, a pointing device, and an external storage device. The conference terminal 20 also includes: a storage unit (recording means) 230 that stores various audio and video data; an echo canceling circuit 240 for canceling echo caused by the interaction environments of interlocutor 30A and other-party interlocutor 30B; a digital/analog converter (hereinafter, D/A) 251 that allows audio data (digital data) to be output to the speaker 51 (51A, 51B); an analog/digital converter (hereinafter, A/D) 252 that converts sound (analog data) captured from the microphone 52 (52A, 52B) into digital data; a drawing unit 260 for displaying, on the monitor 42 (42A, 42B), video information sent from the network 10 via the I/O 221; and a control unit (control means) 210 that controls the operation and timing of each part of the conference terminal 20.

The display means is not limited to the monitor 42 electrically connected to the main body of the conference terminal 20; it also includes a liquid crystal monitor 261 and an LED alarm 262 mounted on the main body. The liquid crystal monitor 261 and the LED alarm 262 are useful as display means for visually indicating the degree of the echo occurrence situation reproduced by the control unit 210. In particular, when the control unit 210 functions not only as the control means that executes the interaction environment reproduction method according to the present embodiment but also as the judgment means that judges how well the interaction environment has been prepared, the control unit 210 controls the drawing unit 260 so that the judgment result is displayed on display means such as the liquid crystal monitor 261, the LED alarm 262, and the monitor 42.

Of the components described above, the control unit 210, the storage unit 230, the echo canceling circuit 240, the D/A 251, the A/D 252, the speaker 51, and the microphone 52 constitute the audio interface 200A. The control unit 210, the storage unit 230, the drawing unit 260, the monitor 42, and the imaging unit 41 constitute the video interface 200A.

The form of two-way dialogue conducted via the network 10 is not limited to the one-to-one configuration shown in FIG. 1. The interaction environment reproduction method according to the present embodiment is also effective for one-to-many and many-to-many two-way dialogues, in which, for example, a plurality of interlocutors participate in at least one of the interaction environments.

FIG. 2 shows another configuration example of a two-way dialogue system (audio conference system) to which the interaction environment reproduction method according to the present embodiment can be applied.

In the two-way dialogue system shown in FIG. 2, one conference terminal 20A connected via the network 10 is installed at venue A (interaction environment 400A), and the other conference terminal 20B is installed at a separate venue B (interaction environment 400B). This many-to-many two-way dialogue is managed by a conference server 300 connected to the network 10. The conference server 300 includes at least a storage unit 320 that stores conference schedules and recorded data of the proceedings (audio, video, text, and the like), and an I/O 310 for exchanging data with the conference terminals 20A and 20B via the network 10.

A plurality of interlocutors 31A (each corresponding to interlocutor 30A in FIG. 1(a)) participate in the interaction environment 400A at venue A. In front of the interlocutors 31A is installed a conference terminal 20A having an I/O 221 for exchanging data, via the network 10, with the conference server 300 and with the interaction environment 400B at venue B. At least a speaker 51A and a microphone 52A are connected to the conference terminal 20A; a monitor and an imaging unit may of course also be connected. Similarly, a plurality of other-party interlocutors 31B (each corresponding to other-party interlocutor 30B in FIG. 1(a)) participate in the interaction environment 400B at venue B. In front of the other-party interlocutors 31B is installed a conference terminal 20B having an I/O 221 for exchanging data, via the network 10, with the conference server 300 and with the interaction environment 400A at venue A. At least a speaker 51B and a microphone 52B are connected to the conference terminal 20B, and a monitor and an imaging unit may of course also be connected.

When voice data is exchanged in a one-to-one, one-to-many, or many-to-many two-way dialogue as described above, the system is operated under widely varying conditions: the installation environment of the conference terminal 20 (the interaction environment, including the degree of reverberation and the presence or absence of environmental noise), the position of the speaker relative to a sound collector such as a microphone, the speaker's voice volume, and so on. Depending on the situation, echo may occur.

FIG. 3(a) illustrates the echo generation mechanism. Normally, when interlocutor 30A and other-party interlocutor 30B hold a two-way dialogue via the network 10, voice A of interlocutor 30A is captured by the microphone 52A and reproduced by the speaker 51B, so that other-party interlocutor 30B hears it. Likewise, voice B of other-party interlocutor 30B is captured by the microphone 52B and reproduced by the speaker 51A, so that interlocutor 30A hears it. In the interaction environment of other-party interlocutor 30B, part of voice A reproduced from the speaker 51B may then be picked up by the microphone 52B as an echo component 36. The echo component 36 includes a component picked up directly by the microphone 52B, a component reflected by an obstacle 35, and a component reflected by other-party interlocutor 30B. As a result, in the interaction environment of interlocutor 30A, the echo component 36 is reproduced from the speaker 51A together with voice B of other-party interlocutor 30B.

A characteristic of echo in the two-way dialogue described above is that neither interlocutor 30A nor other-party interlocutor 30B can check how their own voice is reproduced in the other party's interaction environment, so the dialogue proceeds without either being able to confirm whether echo is occurring on the other side. Leaving this situation unaddressed can prevent a smooth two-way dialogue, so in recent years conference terminals equipped with an echo canceling circuit 240 as shown in FIG. 3(b) have come into use. The echo canceling circuit 240 includes: a voice switch 242 provided on the path along which line-input audio data is output from the speaker 51B via the D/A 251; a voice switch 243 and an echo suppressor 245 provided on the path along which voice B, captured from the microphone 52B via the A/D 252, is output to the line; an adaptive filter 241 for subtracting the echo component from the digital data of the captured voice B; and a voice switch control unit 244 for controlling the voice switches 242 and 243.

The adaptive filter 241 predicts the echo component 36 returning to the microphone 52B and removes it from the audio data output from the A/D 252. The echo suppressor 245 reduces the residual component left after echo removal by the adaptive filter 241. The voice switches 242 and 243, under control of the voice switch control unit 244, adjust the volume between the line input and the line output to an appropriate level so that howling is suppressed.

In principle, echo canceling is performed in one of two ways. In one's own interactive environment (for example, that of the interlocutor 30A), one's own voice component that was already transmitted and has come back mixed with the voice of the other party (for example, the other-party interlocutor 30B) is deleted from the audio data to be reproduced. Alternatively, in the other party's interactive environment (for example, that of the other-party interlocutor 30B), the voice component of the first party (for example, the interlocutor 30A) that has already been reproduced there and is mixed with the other party's voice is deleted from the audio data to be transmitted. Specifically, as one example, a pseudo echo signal resembling the echo component sent from the other party is generated, and the echo component is canceled by subtracting this pseudo echo signal from the actual echo component (echo signal). The same processing is performed when one's own voice is sent back because of reverberation or the like in the other party's interactive environment (a room, etc.). There are several algorithms for performing such echo canceling, and some of them, such as the NLMS (normalized least mean squares, also called the learning identification method) algorithm, automatically calculate the balance between the echo component that occurs and the pseudo echo component that is needed.
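
The pseudo-echo subtraction described above can be illustrated with a minimal NLMS sketch. This is not the implementation of the adaptive filter 241; it is a toy model under stated assumptions: the function name nlms_cancel, the tap count, the step size mu, the simulated echo path, and the noise test signal are all hypothetical, and no near-end speech or noise is present.

```python
import math
import random


def nlms_cancel(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated pseudo echo from the microphone signal.

    reference: samples sent to the loudspeaker (the far-end voice).
    mic: samples captured by the microphone (here, echo only).
    Returns the residual left after the pseudo echo is subtracted.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        pseudo_echo = sum(w * h for w, h in zip(weights, history))
        err = d - pseudo_echo                      # echo left after cancellation
        power = sum(h * h for h in history) + eps  # normalization term
        weights = [w + mu * err * h / power for w, h in zip(weights, history)]
        residual.append(err)
    return residual


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical test data: white noise played through a simulated echo path
# with reflections at delays of 2 and 5 samples.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.0, 0.6, 0.0, 0.0, 0.3]
mic = [sum(g * reference[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
       for n in range(len(reference))]

residual = nlms_cancel(reference, mic)
print("echo RMS:", rms(mic[-200:]), "residual RMS:", rms(residual[-200:]))
```

Because the step is divided by the recent reference power, the adaptation rate does not depend on playback volume, which is one reason normalized variants are commonly chosen for this task.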

Next, an example of the interactive environment reproduction method according to the present embodiment will be described in detail with reference to FIGS. 4 to 6. The operations described below are performed before a two-way dialogue begins, by the interlocutor 30B (FIG. 1(a)) or in the interactive environment 400B of venue B (FIG. 2). FIG. 4 is a flowchart for explaining an example of the interactive environment reproduction method according to the present embodiment. FIG. 5 is a diagram for explaining the general operation of a conference terminal that executes the method, in particular of its audio interface. FIG. 6 is a diagram showing the waveform of the reproduced sound and the waveform of the echo component in each step of the method.

The operation modes of the conference terminal 20 shown in FIG. 1(a) and elsewhere include a normal two-way dialogue mode, an echo canceling mode in which a two-way dialogue is carried out while echo canceling is performed, and a test mode for checking, before a two-way dialogue begins, the echo occurrence state on the other-party interlocutor's side (the mode in which the interactive environment reproduction method according to the present embodiment is executed). The test mode, in which the audio interface 200A of the conference terminal 20 is chiefly involved, is described in detail below.

First, the interlocutor who starts the test mode speaks for a certain period of time toward the conference terminal installed in his or her own interactive environment. The conference terminal records the interlocutor's speech (voice) at a preset recording volume value, thereby acquiring test audio data (digital data serving as a test sound source) (step ST10). Specifically, as shown in FIG. 5(a), the test voice (test sound source) captured from the microphone 52 of the audio interface 200A of the conference terminal via the A/D 252 is stored in the storage unit 230 by the control unit 210 as data 1 (first audio data corresponding to the first test sound data).

Next, the audio interface 200A reproduces data 1 stored in the storage unit 230 as the test voice, outputting it from the speaker 51 via the D/A 251 at a preset playback volume value. At the same time, the sound reproduced from the speaker 51 is captured into the audio interface 200A from the microphone 52 via the A/D 252 (step ST20). That is, as shown in FIG. 5(b), in the audio interface 200A the control unit 210 stores the audio data captured at a predetermined recording volume value (second audio data corresponding to the second test sound data) in the storage unit 230 as data 2. Data 2 stored in this way is environmental data (audio) of the interactive environment in which the test mode is executed. If the audio interface 200A has the echo canceling circuit 240, an echo canceling operation may be performed when the reproduced sound is captured for storage as data 2. The waveform of the sound reproduced from the speaker 51 in step ST20 is shown in FIG. 6(a).

In this test mode, data 2 stored in the storage unit 230 (the environmental data of the interactive environment in which the test mode is executed) is then reproduced from the speaker 51 (step ST30). That is, as shown in FIG. 5(c), in the audio interface 200A the control unit 210 reads data 2 from the storage unit 230, and the read data 2 is reproduced from the speaker 51 via the D/A 251 at a predetermined playback volume value. By listening to the sound reproduced in step ST30 (the confirmation sound), the interlocutor performing the test mode can check the echo occurrence state in his or her own interactive environment (step ST40).

If echo canceling functions effectively when data 2 is stored in the storage unit 230 in step ST20, the echo component is removed from the sound based on data 2 that is reproduced in step ST30, and the echo component at that time (the difference between data 2 and data 1) has a waveform such as that shown in FIG. 6(b).

On the other hand, if echo canceling does not function sufficiently when data 2 is stored in the storage unit 230 in step ST20, an echo component remains in the sound based on data 2 that is reproduced in step ST30, and the echo component at that time has a waveform such as that shown in FIG. 6(c). In this case, the other party would hear an echo if an actual two-way dialogue were held.

As a simple form of the echo check performed in step ST40, confirming that the amplitude of the sound waveform reproduced in step ST30 does not exceed a certain value makes it possible to distinguish an echo component from a proper input voice component. This identification is made according to whether the waveform of the sound reproduced in step ST20 and that of the sound reproduced in step ST30 are identical. It can also be made by applying a Fourier transform to the reproduced sounds and comparing their frequency components.
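
The frequency-domain comparison mentioned above might look like the following sketch. It is only an illustration under assumptions: the function names, the tolerance tol, the two-tone test clip, and the circularly delayed echo are hypothetical, and a direct DFT is used only because the clips are short.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum from a direct O(N^2) DFT (adequate for short clips)."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))) / n
            for k in range(n // 2 + 1)]


def spectral_distance(a, b):
    """Largest per-bin magnitude difference between two equal-length clips."""
    return max(abs(x - y) for x, y in zip(dft_magnitudes(a), dft_magnitudes(b)))


def sounds_match(a, b, tol=0.05):
    """Treat two clips as the same sound if their spectra differ by at most tol."""
    return spectral_distance(a, b) <= tol


# Hypothetical clips: the sound played in ST20 and two candidates for ST30.
n = 64
st20 = [math.sin(2 * math.pi * 3 * i / n) + 0.5 * math.sin(2 * math.pi * 7 * i / n)
        for i in range(n)]
st30_clean = list(st20)
# Echo modeled as a half-amplitude copy delayed by 8 samples (circular shift
# used for simplicity).
st30_echo = [st20[i] + 0.5 * st20[(i - 8) % n] for i in range(n)]

print(sounds_match(st20, st30_clean))  # True: identical spectra
print(sounds_match(st20, st30_echo))   # False: the echo changes bin magnitudes
```

Comparing magnitudes only ignores phase, so a pure delay of the whole clip would not be flagged; that is acceptable here because an echo adds a delayed copy on top of the original rather than replacing it.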

If an echo is confirmed in step ST40, the sound reproduction environment (interactive environment) is adjusted (step ST50), for example by improving the room (absorbing sound by installing curtains or the like), increasing the distance between the microphone and the speaker, adjusting the orientation of the microphone or the speaker, or adjusting the playback volume.

If the volume of the sound reproduced in step ST20 is not appropriate, the microphone volume, the speaker volume, and so on are adjusted so that the volume becomes appropriate, and the test mode is restarted from step ST10.

Data 1 stored in the storage unit 230 in step ST10 is not limited to digital data captured from the microphone 52 via the A/D 252. For example, audio data acquired under the same environment as the installation environment of the microphone 52 and stored in advance on a server connected to the network 10 (for example, in the storage unit 310 of the conference server 300) may be obtained via the I/O 221 and stored in the storage unit 230 as the test audio data. In that case, data 1 is preferably prepared in advance for each venue (conference room) to be used. If data 1 is kept on the conference server 300 or the like, it can be used as test audio data whenever needed, for example when a conference terminal is newly introduced into a venue. It is also possible to determine the minimum microphone volume needed to secure the playback volume in the other party's interactive environment (conference room) and to impose a limit so that the microphone volume cannot be lowered below that value during volume adjustment.

As described above, the control unit 210, which reproduces the echo occurrence state in the terminal's own interactive environment in step ST40, can also function as determination means that judges how well the interactive environment has been prepared on the basis of the reproduced echo occurrence state. In this case, the control unit 210 controls the drawing unit 260 so that the determination result is displayed visually on the liquid crystal monitor 261 or the LED alarm 262 attached to the conference terminal body. The determination result may also be displayed on the monitor 42 connected to the conference terminal.

FIG. 7 shows various configuration examples of the LED alarm 262 as examples of display control by the control unit 210.

In the example of FIG. 7(a), the LED alarm 262A includes LED1 (an LED labeled "OK"), which indicates that echo has been suppressed to a level at which a dialogue can begin, and LED2 (an LED labeled "NO"), which indicates that the interactive environment needs to be improved. The control unit 210, functioning as the determination means, judges the state of the interactive environment according to, for example, whether the amplitude (volume information) of the echo component contained in data 2 reproduced in step ST30 (the difference between data 1 and data 2) exceeds a preset threshold. On the basis of this determination result, the control unit 210 controls the drawing unit 260 so as to light either LED1 or LED2 of the LED alarm.
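
The two-LED decision can be sketched as follows. This is a simplified illustration, not the control unit 210's actual logic: it assumes data 1 and data 2 are already time-aligned and gain-matched lists of samples, and the function names, sample values, and threshold are hypothetical.

```python
def echo_component(data1, data2):
    """Sample-wise difference between data 2 and data 1.

    Assumes both recordings are time-aligned and gain-matched, which a real
    terminal would have to ensure separately.
    """
    return [b - a for a, b in zip(data1, data2)]


def select_led(data1, data2, threshold):
    """Return "LED1" (OK) or "LED2" (NO) from the peak echo amplitude."""
    peak = max((abs(e) for e in echo_component(data1, data2)), default=0.0)
    return "LED2" if peak > threshold else "LED1"


# Hypothetical sample values for illustration only.
data1 = [0.0, 0.5, -0.5, 0.25]
quiet_room = [0.0, 0.52, -0.49, 0.25]   # data 2 with almost no echo
echoey_room = [0.3, 0.5, -0.9, 0.25]    # data 2 with a strong echo

print(select_led(data1, quiet_room, threshold=0.1))   # LED1
print(select_led(data1, echoey_room, threshold=0.1))  # LED2
```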

The LED alarm 262B shown in FIG. 7(b) is the same as the LED alarm 262A described above in that it includes two kinds of LEDs, but differs in that it displays the determination result for the interactive environment as a level. That is, the LED alarm 262B includes LED1 (an LED labeled "OK"), which indicates that echo has been suppressed to a level at which a dialogue can begin, and LED3 (an LED labeled "NO"), which indicates at multiple levels, by adjusting its brightness, that the interactive environment needs to be improved. In this case, the control unit 210 sets a threshold for each of a plurality of levels in advance and, on the basis of the volume information of the echo component contained in data 2 reproduced in step ST30, controls the drawing unit 260 so that LED3 has a brightness corresponding to the change in volume of the echo component. When the control unit 210 also functions as the determination means, it judges the improvement level of the interactive environment according to which threshold the amplitude of the echo component contained in data 2 reproduced in step ST30 exceeds (step ST40). On the basis of this determination result, the control unit 210 controls the drawing unit 260 so as to select either LED1 or LED3 of the LED alarm and, when LED3 is selected, to light LED3 at a brightness level corresponding to the determination result.
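
A multi-level determination of this kind could be sketched as below. The threshold values, the mapping of levels to a brightness fraction, and the function names are assumptions for illustration; the source does not specify them.

```python
from bisect import bisect_left


def maintenance_level(echo_peak, thresholds):
    """Count how many thresholds the echo peak strictly exceeds."""
    return bisect_left(sorted(thresholds), echo_peak)


def led_command(echo_peak, thresholds=(0.1, 0.2, 0.4)):
    """Return which LED to light and, for LED3, a brightness in (0, 1].

    Level 0 means the echo is below every threshold, so LED1 ("OK") is lit.
    Higher levels light LED3 ("NO") more brightly.
    """
    level = maintenance_level(echo_peak, thresholds)
    if level == 0:
        return ("LED1", None)
    return ("LED3", level / len(thresholds))


print(led_command(0.05))  # echo below all thresholds: LED1
print(led_command(0.15))  # exceeds one threshold: dim LED3
print(led_command(0.50))  # exceeds all thresholds: LED3 at full brightness
```

Using strict comparison means a peak exactly equal to a threshold does not count as exceeding it, matching the "exceeds" wording of the description.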

The LED alarm 262C shown in FIG. 7(c) includes the same two kinds of LEDs as the LED alarm 262A, LED1 and LED2, and additionally a group of one or more LEDs (LED4) that displays the determination result for the interactive environment as a level. That is, the LED alarm 262C includes LED1 (an LED labeled "OK"), which indicates that echo has been suppressed to a level at which a dialogue can begin, LED2 (an LED labeled "NO"), which indicates that the interactive environment needs to be improved, and a group of one or more LEDs (LED4) that light according to how much improvement the interactive environment needs. In this case, the control unit 210 sets a threshold for each of a plurality of levels in advance and, on the basis of the volume information of the echo component contained in data 2 reproduced in step ST30, controls the drawing unit 260 so as to light whichever of LED1, LED2, and LED4 of the LED alarm 262C corresponds to the volume of the echo component. When the control unit 210 also functions as the determination means, it judges the improvement level of the interactive environment according to which threshold the amplitude of the sound based on data 2 reproduced in step ST30 exceeds (step ST40). On the basis of this determination result, the control unit 210 controls the drawing unit 260 so as to select one of LED1, LED2, and LED4 of the LED alarm 262C and, when LED4 is selected, to light the LED of the level corresponding to the determination result.

As described above, according to the present invention, an echo that originates in the interlocutor's own interactive environment and is heard in the other-party interlocutor's interactive environment can be checked in advance within the interlocutor's own interactive environment, which is its source. This makes it easier to take measures to eliminate echoes heard in the other party's interactive environment, in particular echoes that the echo canceling function of each information terminal cannot fully remove. Moreover, the interlocutor can check the echo occurrence state in advance by himself or herself, without connecting to the other party's interactive environment, where echo-related problems may arise.

From the above description of the present invention, it is apparent that the invention may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all improvements obvious to those skilled in the art are intended to be included within the scope of the following claims.

Description of symbols: 10 ... network (transmission means); 20, 20A, 20B ... conference terminal (information terminal); 30A, 31A ... interlocutor; 30B, 31B ... other-party interlocutor; 51, 51A, 51B ... speaker; 52, 52A, 52B ... microphone; 210 ... control unit (control means, determination means); 230 ... recording unit (recording means); 240 ... echo canceling circuit; 260 ... drawing unit; 261 ... liquid crystal monitor (display means); 262, 262A, 262B, 262C ... LED alarm (display means).

Claims

An interactive environment reproduction method by which, in a two-way interactive system that enables transmission and reception of electronic data including at least audio data via predetermined transmission means, an interlocutor confirms the echo occurrence state in the interactive environment in which the interlocutor's own information terminal is installed, the method comprising:
storing, in recording means of the own information terminal, first test sound data as a test sound source recorded in the interactive environment in which the own information terminal is installed;
while reproducing sound based on the first test sound data through a speaker of the own information terminal, storing in the recording means, as second test sound data, environmental data captured through a microphone of the own information terminal; and
reproducing the echo occurrence state in the interactive environment in which the own information terminal is installed by reproducing sound based on the second test sound data through the speaker.

The interactive environment reproduction method according to claim 1, wherein the echo component included in the second test sound data is visually displayed using predetermined display means as information indicating the reproduced echo occurrence state.

The interactive environment reproduction method according to claim 2, wherein the degree of the reproduced echo occurrence state is determined based on volume information of the echo component, and
the determination result of the echo occurrence state is visually displayed according to a preset determination level using said display means.

The interactive environment reproduction method according to claim 1, wherein the information terminal has at least an echo canceling function.

The interactive environment reproduction method according to claim 1, wherein the first test sound data is electronic data stored in advance in recording means of another information terminal connected to the predetermined transmission means and obtained via the predetermined transmission means.

The interactive environment reproduction method according to any one of claims 1 to 4, wherein the first test sound data is electronic data captured through a microphone of the information terminal.

A computer program for causing a computer to execute the interactive environment reproduction method according to any one of claims 1 to 6.

A computer-readable recording medium on which the computer program according to claim 7 is recorded.

An apparatus comprising:
control means for executing the interactive environment reproduction method according to any one of claims 1 to 6;
the recording means; and
display means for visually displaying an echo component included in the second test sound data stored in the recording means.

The apparatus according to claim 9, further comprising determination means for determining the degree of the echo occurrence state reproduced by the control means based on volume information of the echo component,
wherein the display means visually displays the determination result of the determination means according to a predetermined determination level.