JP2009302824A

JP2009302824A - Voice communication system

Info

Publication number: JP2009302824A
Application number: JP2008154063A
Authority: JP
Inventors: Takahiro Tanaka (田中 孝浩)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-06-12
Filing date: 2008-06-12
Publication date: 2009-12-24

Abstract

PROBLEM TO BE SOLVED: To provide a technique that can convey the situation when an error, noise, or the like occurs during a remote conference or similar session using a plurality of communication terminals.

SOLUTION: Voice uttered by a participant 1A is collected by a microphone 15 connected to a conference terminal 10A, and voice data representing the collected voice is transmitted to a conference terminal 10B. The conference terminal 10B outputs the voice data received from the conference terminal 10A to a speaker 17. A microphone 35 collects the voice emitted from the speaker 17 and outputs a voice signal. A text converter 30 performs A/D conversion on the voice signal output from the microphone 35 to generate voice data, converts the generated voice data into text to generate text data, and transmits the generated text data to a text display device 40. The text display device 40 receives the text data from the text converter 30 and displays the text indicated by the received text data on a display device 43.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a voice communication system.

In recent years, remote conference systems, in which a conference is held using a plurality of communication terminals connected via a communication network, have become widespread. In such a system the speaker and the listener do not face each other directly, so the speaker may feel anxious about whether his or her voice is properly reaching the other party. To relieve this anxiety, Patent Document 1 proposes a technique that displays on the transmitting side, in real time, the state in which the transmitted voice and image data arrive at the receiving side. Techniques for converting the content of a conversation into text, for example for later confirmation of the meeting (preparation of minutes), have also been proposed (Patent Documents 2 and 3, among others). In addition, devices such as mobile phones that sound a warning tone ("beep, beep, beep") when the connection becomes unstable have recently come into use.

Patent Document 1: JP 2005-269498 A
Patent Document 2: JP H04-362769 A
Patent Document 3: JP 2003-283674 A

When voice communication such as a remote conference is performed, the voice may become unclear because noise is mixed into it, because it is faint, or because of reverberation such as echo. The conventional techniques described above can detect a fatal problem in the communication state, but they cannot make the speaker aware of the situation when the voice has merely become unclear in this way.

The present invention has been made in view of the above background. Its object is to provide a technique that, when voice communication such as a remote conference is performed using a plurality of communication terminals, can convey the situation to the speaker when an error, noise, or the like occurs in the middle of a conversation.

To solve the above problem, the present invention provides a voice communication system having first and second terminals connected to each other via a communication network. The first terminal comprises: a first voice input terminal that receives voice data output from a sound collection device, which outputs voice data representing collected sound; a video output terminal that outputs data to a display device, which displays an image according to the supplied data; voice data transmitting means for transmitting the voice data input to the first voice input terminal to the second terminal; text data receiving means for receiving text data from the second terminal; and text data output means for outputting the text data received by the text data receiving means to the video output terminal. The second terminal comprises: a voice output terminal that outputs voice data to a sound emitting device, which emits sound according to the supplied voice data; a second voice input terminal that receives voice data output from a sound collection device, which collects the sound emitted by the sound emitting device and outputs voice data representing the collected sound; second voice data receiving means for receiving the voice data transmitted from the first terminal; output means for outputting the voice data received by the second voice data receiving means to the voice output terminal; text conversion means for converting the voice data input to the second voice input terminal into text to generate text data; and text data transmitting means for transmitting the text data generated by the text conversion means to the first terminal.

The present invention also provides a voice communication system having first and second terminals connected to each other via a communication network. The first terminal comprises: a first voice input terminal that receives voice data output from a sound collection device, which outputs voice data representing collected sound; a video output terminal that outputs data to a display device, which displays an image according to the supplied data; voice data transmitting means for transmitting the voice data input to the first voice input terminal to the second terminal; text data receiving means for receiving text data from the second terminal; and text data output means for outputting the text data received by the text data receiving means to the video output terminal. The second terminal comprises: second voice data receiving means for receiving the voice data transmitted from the first terminal; text conversion means for converting the voice data received by the second voice data receiving means into text to generate text data; and text data transmitting means for transmitting the text data generated by the text conversion means to the first terminal.

The present invention further provides a voice communication system having first and second terminals connected to each other via a communication network. The first terminal comprises: a first voice input terminal that receives voice data output from a sound collection device, which outputs voice data representing collected sound; a video output terminal that outputs data to a display device, which displays an image according to the supplied data; voice data transmitting means for transmitting the voice data input to the first voice input terminal to the second terminal; first text conversion means for converting the voice data input to the first voice input terminal into text to generate text data; text data receiving means for receiving text data from the second terminal; and message output means for comparing the text data received by the text data receiving means with the text data generated by the first text conversion means, and outputting a message corresponding to the difference between the two to at least one of the video output terminal and a first voice output terminal connected to a sound emitting device that emits sound according to the supplied voice data. The second terminal comprises: a second voice output terminal that outputs voice data to a sound emitting device, which emits sound according to the supplied voice data; a second voice input terminal that receives voice data output from a sound collection device, which collects the sound emitted by the sound emitting device and outputs voice data representing the collected sound; second voice data receiving means for receiving the voice data transmitted from the first terminal; output means for outputting the voice data received by the second voice data receiving means to the second voice output terminal; second text conversion means for converting the voice data input to the second voice input terminal into text to generate text data; and text data transmitting means for transmitting the text data generated by the second text conversion means to the first terminal.

In a preferred aspect of the present invention, the first terminal may further comprise: second text conversion means for converting the voice data input to the first voice input terminal into text to generate second text data; and message output means for comparing the text data received by the text data receiving means with the second text data generated by the second text conversion means, and outputting a message corresponding to the difference between the two to at least one of the video output terminal and a voice output terminal connected to a sound emitting device that emits sound according to the supplied voice data.
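The comparison performed by the message output means can be pictured with a minimal sketch. The specification does not say how the difference is computed or how the message is worded, so everything here is an assumption for illustration: the function name `difference_message`, the word-level comparison using Python's standard `difflib`, and the message strings. It also assumes whitespace-separated words, which would not hold for unsegmented Japanese text.

```python
import difflib


def difference_message(local_text: str, returned_text: str) -> str:
    """Compare locally recognized text with the text returned by the far end.

    local_text:    text generated at the first terminal from its own input
    returned_text: text received from the second terminal
    Both names are hypothetical; the patent only requires "a message
    corresponding to the difference between the two".
    """
    local_words = local_text.split()
    returned_words = returned_text.split()
    matcher = difflib.SequenceMatcher(None, local_words, returned_words)

    # Collect words that exist locally but were dropped or altered remotely.
    lost = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            lost.extend(local_words[i1:i2])

    if not lost:
        return "OK"
    return "Possibly not delivered: " + " ".join(lost)


if __name__ == "__main__":
    print(difference_message("this is 100 yen", "this is yen"))
```

A real terminal would likely tolerate small recognition differences between the two converters before raising a message; that tolerance is left out of this sketch.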

In a further preferred aspect of the present invention, the second terminal may comprise: third text conversion means for converting the voice data received by the voice data receiving means into text to generate third text data; and third text data transmitting means for transmitting the third text data generated by the third text conversion means to the first terminal. The first terminal may comprise third text data receiving means for receiving the third text data transmitted by the third text data transmitting means, and the message output means of the first terminal may compare the third text data received by the third text data receiving means with at least one of the text data and the second text data, and output a message corresponding to the comparison result.

In a further preferred aspect of the present invention, the text data transmitting means of the second terminal may transmit the text data generated by the text conversion means, together with identification information identifying its own terminal, to the first terminal; the text data receiving means of the first terminal may receive the text data and the identification information from the second terminal; and the text data output means of the first terminal may output the text data and the identification information received by the text data receiving means to the video output terminal.

In a further preferred aspect of the present invention, the first terminal may comprise storage means for storing the correspondence between the identification information and a display mode, and the text data output means may cause the display device to display the text indicated by the text data received by the text data receiving means in the display mode corresponding to the identification information received by the text data receiving means.
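As a rough illustration of the stored correspondence between identification information and display mode, the sketch below uses a plain dictionary lookup. The specification does not define what a display mode contains, so the `DisplayMode` fields (a color and a label), the terminal identifiers, and the fallback entry are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayMode:
    """Hypothetical display mode: the patent leaves its contents open."""
    color: str
    label: str


# Assumed contents of the storage means: identification info -> display mode.
DISPLAY_MODES = {
    "terminal-B": DisplayMode(color="blue", label="Site B"),
    "terminal-C": DisplayMode(color="green", label="Site C"),
}
DEFAULT_MODE = DisplayMode(color="black", label="Unknown site")


def render_line(terminal_id: str, text: str) -> str:
    """Return the received text tagged with the mode for its sender."""
    mode = DISPLAY_MODES.get(terminal_id, DEFAULT_MODE)
    return f"[{mode.label}|{mode.color}] {text}"


if __name__ == "__main__":
    print(render_line("terminal-B", "this is 100 yen"))
```

With more than two sites, a per-sender mode like this would let the speaker see at a glance which listener received a degraded version of the utterance.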

According to the present invention, when voice communication such as a remote conference is performed using a plurality of communication terminals, the situation can be conveyed to the speaker when an error, noise, or the like occurs in the middle of a conversation.

<First Embodiment>
<Configuration of First Embodiment>
FIG. 1 is a block diagram showing the configuration of a remote conference system 100 according to an embodiment of the present invention. The remote conference system 100 comprises a plurality of conference terminals 10A and 10B installed at respective bases, a text display device 40 installed in association with the conference terminal 10A, and a text conversion device 30 installed in association with the conference terminal 10B, all connected to one another via a communication network 20 such as the Internet. Although FIG. 1 shows two conference terminals 10A and 10B, one text conversion device 30, and one text display device 40, the number of conference terminals and other devices is not limited to this and may be larger. In the example shown in FIG. 1, the remote conference is realized by participants 1A and 1B performing voice communication using the conference terminals 10A and 10B, respectively. In the following description, for convenience, the conference terminals 10A and 10B are referred to as "conference terminal 10" when they need not be distinguished.

The text conversion device 30 is a device that converts the voices of the conference participants into text to generate text data, and is, for example, a personal computer. The text display device 40 is a device that receives the text data transmitted from the text conversion device 30 via the communication network 20 and displays the text (characters) indicated by the received text data, and is, for example, a personal computer.

FIG. 2 is a block diagram showing an example of the configuration of the conference terminal 10. In the figure, the control unit 11 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), and controls each unit of the conference terminal 10 via the bus BUS by reading and executing a computer program stored in the ROM or the storage unit 12. The storage unit 12 is storage means for storing the computer program executed by the control unit 11 and the data used during its execution, and is, for example, a hard disk device. The video output terminal 13a is an output terminal that is connected to the display device 13 and outputs data to the display device 13. The display device 13 includes a liquid crystal panel and displays various images according to the supplied data. The operation unit 14 outputs a signal corresponding to an operation by the user of the conference terminal 10. The microphone 15 collects sound and outputs an audio signal (analog signal) representing the collected sound. The audio input terminal 16a is an input terminal that is connected to the microphone 15 via an audio cable and receives the audio signal output from the microphone 15. The audio output terminal 16b is an output terminal that is connected to the speaker 17 via an audio cable and outputs an audio signal to the speaker 17. The audio processing unit 16 converts the audio signal (analog signal) output from the microphone 15 into digital data by A/D conversion. The audio processing unit 16 also converts supplied digital data into an analog signal by D/A conversion and supplies it to the speaker 17. The speaker 17 emits sound with an intensity corresponding to the analog signal output from the audio processing unit 16. The communication unit 18 is communication means for communicating with other conference terminals 10 via the communication network 20. The imaging device 19 captures video and outputs video data representing the captured video. The video input terminal 19a is an input terminal that receives the data output from the imaging device 19.

In this embodiment, the conference terminal 10 includes input and output terminals and is connected to the microphone 15 and the speaker 17 via audio cables. However, the configuration is not limited to this, and the microphone 15 and the speaker 17 may be built into the conference terminal 10. The same applies to the display device 13 and the imaging device 19: the conference terminal 10 may include a display unit and an imaging unit.

Next, the configuration of the text conversion device 30 will be described with reference to the drawings. FIG. 3 is a block diagram showing an example of the configuration of the text conversion device 30. In the figure, the control unit 31 includes a CPU, a ROM, and a RAM, and controls each unit of the text conversion device 30 via the bus BUS by reading and executing a computer program stored in the ROM or the storage unit 32. The storage unit 32 is storage means for storing the computer program executed by the control unit 31 and the data used during its execution, and is, for example, a hard disk device. The microphone 35 collects sound and outputs an audio signal (analog signal) representing the collected sound. The audio input terminal 36a is an input terminal that is connected to the microphone 35 via an audio cable and receives the audio signal output from the microphone 35. The audio processing unit 36 converts the audio signal (analog signal) output from the microphone 35 into digital data by A/D conversion. The communication unit 38 is communication means for communicating with other devices via the communication network 20. In this embodiment the text conversion device 30 is connected to an external microphone 35 via an audio cable, but the microphone 35 may instead be built into the text conversion device 30.

Next, the configuration of the text display device 40 will be described with reference to the drawings. FIG. 4 is a block diagram showing an example of the configuration of the text display device 40. In the figure, the control unit 41 includes a CPU, a ROM, and a RAM, and controls each unit of the text display device 40 via the bus BUS by reading and executing a computer program stored in the ROM or the storage unit 42. The storage unit 42 is storage means for storing the computer program executed by the control unit 41 and the data used during its execution, and is, for example, a hard disk device. The video output terminal 43a is an output terminal that is connected to the display device 43 and outputs data to the display device 43. The display device 43 includes a liquid crystal panel and displays various images under the control of the control unit 41. The communication unit 48 is communication means for communicating with other devices via the communication network 20. In this embodiment the text display device 40 is externally connected to the display device 43, but the display device 43 may instead be built into the text display device 40.

<Operation of First Embodiment>
Next, the operation of the conference terminal 10 according to this embodiment will be described. The conference terminal 10 transmits data including audio data representing the sound collected by the microphone 15 and video data representing the video captured by the imaging device 19 (hereinafter referred to as "conference data") to the other conference terminals 10. The conference terminal 10 also receives, via the communication network 20, conference data from each of the other conference terminals 10. That conference data includes audio data representing the sound collected by the microphone 15 that picks up the voice of that terminal's user, and video data representing the video captured by that terminal's imaging device 19. The conference terminal 10 emits the audio data included in the received conference data as sound from the speaker 17, and outputs the video data included in the received conference data to the display device 13 to display the video. The remote conference is thereby realized.

FIG. 5 is a diagram for explaining the operation of the system according to this embodiment. Here, an example is described in which the participant 1A using the conference terminal 10A speaks while the participant 1B using the conference terminal 10B listens. When the participant 1A speaks, the voice is collected by the microphone 15 connected to the conference terminal 10A. The control unit 11 of the conference terminal 10A transmits the audio data input to the audio input terminal 16a to the conference terminal 10B.

The conference terminal 10B receives the audio data from the conference terminal 10A and outputs the received audio data to the audio output terminal 16b. The sound represented by the received audio data is thereby emitted from the speaker 17 of the conference terminal 10B. At this time, the microphone 35 connected to the text conversion device 30 collects the sound emitted from the speaker 17 and outputs an audio signal representing the collected sound. The text conversion device 30 performs A/D conversion on the audio signal output from the microphone 35 to generate audio data, and converts the audio data into text by performing speech analysis on it, thereby generating text data. The text conversion device 30 transmits the generated text data to the text display device 40. That is, the text conversion device 30 transmits text data representing the sound collected by the microphone 35 to the text display device 40. In doing so, the text conversion device 30 detects the sound pressure of the audio signal output from the microphone 35, and detects as a break in the speech any time interval in which the sound pressure stays at or below a predetermined threshold for at least a predetermined length of time. The text conversion device 30 treats each break in the speech as a phrase boundary and performs the text conversion processing each time a phrase is detected.
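The break detection described above (sound pressure at or below a threshold for at least a given duration) can be sketched as a single pass over per-frame level values. This is only an illustrative reading of the text: the function name `find_pauses`, the representation of sound pressure as one number per frame, and the use of frame counts in place of a time length are assumptions, and the specification gives no concrete threshold or duration.

```python
from typing import List, Sequence, Tuple


def find_pauses(
    levels: Sequence[float], threshold: float, min_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges treated as breaks in the speech.

    A break is a run of at least `min_frames` consecutive frames whose
    level is at or below `threshold`. `end` is exclusive.
    """
    pauses = []
    run_start = None  # index where the current quiet run began, if any
    for i, level in enumerate(levels):
        if level <= threshold:
            if run_start is None:
                run_start = i
        else:
            # A loud frame ends the quiet run; keep it only if long enough.
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start, i))
            run_start = None
    # Handle a quiet run that lasts until the end of the input.
    if run_start is not None and len(levels) - run_start >= min_frames:
        pauses.append((run_start, len(levels)))
    return pauses


if __name__ == "__main__":
    frame_levels = [5, 6, 0, 0, 0, 7, 0, 8, 0, 0, 0, 0]
    print(find_pauses(frame_levels, threshold=1, min_frames=3))
```

Under this reading, the audio between two consecutive breaks would be handed to the text conversion step as one phrase.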

The text display device 40 receives the text data from the text conversion device 30 and outputs it to the video output terminal 43a, thereby causing the display device 43 to display the text (characters) indicated by the received text data. FIG. 6 shows an example of the screen displayed on the display device 43. As illustrated, the display device 43 displays characters showing the result of text conversion of the sound collected by the text conversion device 30. The control unit 41 of the text display device 40 updates the display content of the display device 43 each time text data is received. That is, the characters displayed on the display device 43 are updated in real time while the remote conference is in progress.

The conference participants can hold the conference while checking the screen displayed on the display device 43. For example, if the speaker's voice is not being picked up well by the microphone, it may not reach the conference terminal 10 on the other side accurately. Specifically, even if the participant 1A says "This is 100 yen", the listener 1B may hear only "This is ... yen", missing part or all of the statement. Even in such a case, according to this embodiment, the text conversion device 30 converts the sound collected by the microphone 35 into text, generates text data, and transmits it to the text display device 40, and the text display device 40 displays the text represented by the text data. The participant 1A, as the speaker, can therefore grasp to some extent how his or her statement is reaching the listener. If the screen displayed on the display device 43 shows that the statement was not conveyed accurately, the participant 1A can take various actions, such as moving closer to the microphone 15, speaking louder, or repeating what was said, so that the remote conference proceeds more smoothly.

One conceivable way to let the speaker grasp how his or her voice sounds on the listener side is to feed it back as audio. With audio feedback, however, the sound may become unclear and hard to hear, as happens when echo or howling occurs. In contrast, according to this embodiment, the voice is converted into text and reported as characters, so the speaker can objectively judge how well the statement is being conveyed. Moreover, reporting the situation as text rather than as audio reduces the amount of data fed back to the speaker and improves responsiveness.
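The data-volume point can be made concrete with rough arithmetic. The figures below are assumptions chosen only for illustration and do not come from the specification: the audio is taken to be uncompressed 16 kHz, 16-bit, mono PCM, and the utterance is taken to last two seconds. A real system would normally use a speech codec, which narrows the gap considerably.

```python
def pcm_bytes(
    seconds: float,
    sample_rate_hz: int = 16_000,  # assumed sampling rate
    bytes_per_sample: int = 2,     # assumed 16-bit samples
    channels: int = 1,             # assumed mono
) -> int:
    """Size of uncompressed PCM audio of the given duration, in bytes."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def text_bytes(text: str, encoding: str = "utf-8") -> int:
    """Size of the same utterance sent as encoded text, in bytes."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    audio = pcm_bytes(2.0)               # hypothetical 2-second utterance
    text = text_bytes("This is 100 yen")
    print(f"audio: {audio} bytes, text: {text} bytes")
```

Under these assumptions the text feedback is several thousand times smaller than the raw audio for the same utterance.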

As described above, according to this embodiment, even if an error, noise, or the like occurs in the middle of a conversation (phrase), the situation can be conveyed to the speaker. The speaker can recognize the affected part and the situation in real time and take countermeasures. Specific countermeasures include moving closer to the microphone, speaking louder (or raising the microphone volume or level), and repeating what was said.

In the present embodiment, the conference terminal 10A and the text display device 40 are configured as separate devices, but they may be configured as a single device. That is, a device having both the function of the conference terminal 10A and the function of the text display device 40 according to the present embodiment may be used. Similarly, although the conference terminal 10B and the text conversion device 30 are configured as separate devices in the present embodiment, they may be configured as a single device. That is, a device having both the function of the conference terminal 10B and the function of the text conversion device 30 according to the present embodiment may be used.

<Second Embodiment>
Next, a second embodiment of the present invention will be described. FIG. 7 is a block diagram showing the configuration of a remote conference system 200 according to the present embodiment. The remote conference system 200 consists of a plurality of conference terminals 10C and 10D, installed at respective sites and connected to one another via a communication network such as the Internet. Although FIG. 7 shows two conference terminals 10C and 10D, the number of conference terminals is not limited to two and may be larger.

The conference terminals 10C and 10D according to the present embodiment differ from the conference terminals 10A and 10B of the first embodiment in the operations performed by the control unit of the conference terminal 10C and in the operations performed by the control unit of the conference terminal 10D. The hardware configuration of the conference terminals 10C and 10D is the same as that of the conference terminal 10 described in the first embodiment, so a description of the configuration of each device is omitted below. In the following description, for convenience, the conference terminals 10C and 10D are referred to as "conference terminal 10" when they need not be distinguished.

Next, the operation of the present embodiment will be described with reference to FIG. 8. FIG. 8 shows an example of the functional configuration of the conference terminals 10C and 10D according to the present embodiment. In the figure, the transmission control unit 111, text conversion unit 112, reception control unit 117, text comparison unit 118, and message output unit 119 are realized by the control unit 11 of the conference terminal 10C reading and executing a computer program stored in the storage unit 12. Likewise, the reception control unit 113, text conversion unit 114, transmission control unit 115, and text conversion unit 116 are realized by the control unit 11 of the conference terminal 10D reading and executing a computer program stored in the storage unit 12. The arrows in the figure schematically indicate the flow of data.

In the figure, the transmission control unit 111 transmits the audio data output from the audio processing unit 16, that is, audio data representing the sound collected by the microphone 15, to the conference terminal 10D via the communication unit 18. The text conversion unit 112 converts the same audio data into text to generate text data (hereinafter "text data T1") and stores it in a predetermined storage area of the storage unit 12.

The reception control unit 113 of the conference terminal 10D receives the audio data from the conference terminal 10C via the communication unit 18 and supplies it to the audio processing unit 16 so that it is emitted as sound from the speaker 17. The reception control unit 113 also supplies the received audio data to the text conversion unit 114. The text conversion unit 114 converts the audio data supplied from the reception control unit 113 into text to generate text data (hereinafter "text data T2") and outputs it to the transmission control unit 115.

The text conversion unit 116 converts the audio data supplied from the audio processing unit 16, that is, audio data representing the sound collected by the microphone 15, into text to generate text data (hereinafter "text data T3") and outputs it to the transmission control unit 115. To avoid variation among the results of the text conversion performed by the text conversion units 112, 114, and 116, these units preferably use the same algorithm. The text conversion performed by the conference terminals 10C and 10D may use conventional text conversion technology, and a detailed description is omitted here. The transmission control unit 115 transmits the audio data output from the audio processing unit 16, the text data T2 supplied from the text conversion unit 114, and the text data T3 supplied from the text conversion unit 116 to the conference terminal 10C via the communication unit 18.

The reception control unit 117 of the conference terminal 10C receives the audio data and the text data T2 and T3 from the conference terminal 10D via the communication unit 18 and supplies the received text data T2 and T3 to the text comparison unit 118. The reception control unit 117 also supplies the received audio data to the speaker 17 via the audio processing unit 16 so that it is emitted as sound. This audio data flow is omitted from FIG. 8 to keep the drawing from becoming cluttered.

The text comparison unit 118 compares the text data T1 stored in the storage unit 12 with the received text data T2 and T3 and generates a message according to the comparison result. Specifically, for example, if there is a difference between the text data T1 and the text data T2, the text comparison unit 118 determines that data has been lost because of a network problem and generates a message to that effect. If there is a difference between the text data T2 and the text data T3, the text comparison unit 118 determines that there is a problem with sound emission by the speaker 17 or sound collection by the microphone 15, or that noise is being caused by ambient sound around the microphone 15, and generates a message to that effect. The text comparison unit 118 supplies the generated message to the message output unit 119.
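The comparison described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `diagnose`, the message strings, and the use of exact string inequality as the "difference" test are hypothetical, since the description does not specify how the difference between text data is computed.

```python
from typing import List

# Hypothetical message strings; the description only says that a message
# "to that effect" is generated.
NETWORK_MESSAGE = "Possible data loss on the network"
REMOTE_AUDIO_MESSAGE = (
    "Possible speaker, microphone, or ambient-noise problem at the remote site"
)


def diagnose(t1: str, t2: str, t3: str) -> List[str]:
    """Return messages describing where the three transcripts diverge.

    t1: text converted at the sending terminal from its microphone input
    t2: text converted at the receiving terminal from the received audio data
    t3: text converted at the receiving terminal from its own microphone input
    """
    messages = []
    # T1 vs T2: the speech changed between transmission and reception.
    if t1 != t2:
        messages.append(NETWORK_MESSAGE)
    # T2 vs T3: the speech changed between the speaker and the microphone.
    if t2 != t3:
        messages.append(REMOTE_AUDIO_MESSAGE)
    return messages


if __name__ == "__main__":
    for message in diagnose("This is 100 yen", "This is yen", "This is yen"):
        print(message)
```

Exact equality is the simplest possible test. A practical system would probably tolerate small recognition differences, but that is outside what the description states.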

The message output unit 119 outputs the message data supplied from the text comparison unit 118 to the display device 13. At this time, the message output unit 119 may also output the text data T1, T2, and T3 to the display device 13 so that the text indicated by each is displayed.

Conference participants can proceed with the conference while checking the screen shown on the display device 13. Thus, according to the present embodiment, speech is converted to text at a plurality of points and the conversion results are compared, which makes it possible to show the speaker on which path the noise occurred. Notifying the speaker of the point or path where noise occurs allows the speaker to take measures against it, either in real time or in preparation for the next conference.

<Modifications>
Embodiments of the present invention have been described above, but the present invention is not limited to those embodiments and can be implemented in various other forms. Examples are given below. The following aspects may be combined as appropriate.
(1) The above embodiments describe a remote conference held using the communication terminals according to the present invention, but the present invention is not limited to this and can also be applied, for example, to lectures or talks given via a communication network.

(2) In the second embodiment, the text data T1 generated by the text conversion unit 112, the text data T2 generated by the text conversion unit 114, and the text data T3 generated by the text conversion unit 116 are compared and a message is generated according to the comparison result. However, the three sets of text data need not all be compared with one another. For example, the text comparison unit 118 of the conference terminal 10C may compare the text data T1 with the text data T3 and output a message according to the difference between them. In short, the text comparison unit 118 need only compare at least two of the text data T1, T2, and T3 and output a message according to the comparison result.

(3) In the second embodiment, the message output unit 119 outputs the message data generated by the text comparison unit 118 to the video output terminal 13a, but the output mode of the message is not limited to this, and the speaker may instead be notified by sound. FIG. 9 shows an example of the configuration of the conference terminal 10 in this case. In FIG. 9, the message output unit 119a converts the message data (data indicating the comparison result) generated by the text comparison unit 118 into audio data and outputs it to the audio output terminal 16b. In short, the message output unit 119 (or the message output unit 119a) need only output data representing the message to at least one of the video output terminal 13a and the audio output terminal 16b.

(4) In the second embodiment, the conference terminal 10C may output at least one of the text data T1, T2, and T3 to the display device 13 in addition to the message indicating the comparison result of the text data.
Alternatively, in the second embodiment, the conference terminal 10C may output at least one of the text data T1, T2, and T3 to the display device 13 without outputting a message indicating the comparison result. Specifically, for example, the conference terminal 10C may output the text data T2 received from the conference terminal 10D (that is, the data converted from speech to text by the text conversion unit 114) to the display device 13 so that the characters indicated by the text data T2 are displayed.
When the text data T1, T2, and T3 are displayed, a defect in each indicates something different. For example, a defect in the text data T1 indicates a problem in the microphone pickup system on the conference terminal 10C side, a defect in the text data T2 indicates a communication problem, and a defect in the text data T3 indicates a problem in the path from sound emission to sound collection on the conference terminal 10D side. It is therefore preferable that, when the conference terminal 10D displays the text data, it does so in a format that shows what kind of problem is occurring in which system.

(5) In the first embodiment, the text conversion device 30 may transmit, together with the generated text data, identification information that identifies the conference terminal 10B (or the text conversion device 30) to the text display device 40. In this case, the text display device 40 receives the text data and the identification information transmitted from the text conversion device 30 and outputs them to the video output terminal. The display device 43 connected to the text display device 40 then displays the characters indicated by the received text data together with the identification information. For example, when a remote conference is held using three or more conference terminals 10, displaying each set of text data with its corresponding identification information (terminal ID) lets the participants grasp the communication status with each of the other conference terminals 10 (for example, whether errors or noise have occurred). That is, the speaker can confirm what kind of problem has occurred at which site by referring to the displayed screen.
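As a minimal sketch of pairing text data with identification information, the following Python fragment tags each piece of text with a terminal ID and groups the received text per terminal. The class name `TaggedText`, the function `group_by_terminal`, and the IDs "site-B" and "site-C" are hypothetical and chosen only for illustration; the description does not define a data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class TaggedText:
    terminal_id: str  # hypothetical identifier of the terminal that produced the text
    text: str         # text converted from the sound picked up at that terminal


def group_by_terminal(items: Iterable[TaggedText]) -> Dict[str, List[str]]:
    """Collect received texts per terminal, preserving arrival order."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.terminal_id].append(item.text)
    return dict(grouped)


if __name__ == "__main__":
    received = [
        TaggedText("site-B", "This is 100 yen"),
        TaggedText("site-C", "This is yen"),
    ]
    for terminal_id, texts in group_by_terminal(received).items():
        print(terminal_id, texts)
```

Grouping by terminal ID is one way a display could show, site by site, how the same utterance was received.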

In this aspect, the storage unit 42 of the text display device 40 may store in advance the correspondence between identification information and display modes, and when the text data and identification information are received, the control unit 41 of the text display device 40 may control the display device 43 so that the text indicated by the received text data is displayed in the display mode corresponding to the received identification information. In this way, in a conversation with multiple sites, the speaker can grasp the situation at each site and respond according to its importance.
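The stored correspondence between identification information and display modes could be as simple as a lookup table. The Python sketch below assumes a hypothetical table `DISPLAY_MODES` keyed by terminal ID, with made-up attributes (`color`, `bold`) and a plain-text rendering. None of these details come from the description, which leaves the display modes unspecified.

```python
# Hypothetical correspondence table, stored in advance.
DISPLAY_MODES = {
    "site-B": {"color": "red", "bold": True},    # e.g. a high-priority site
    "site-C": {"color": "blue", "bold": False},
}
# Fallback for terminals that have no registered display mode.
DEFAULT_MODE = {"color": "black", "bold": False}


def render(terminal_id: str, text: str) -> str:
    """Format text in the display mode registered for the given terminal."""
    mode = DISPLAY_MODES.get(terminal_id, DEFAULT_MODE)
    marker = "*" if mode["bold"] else ""
    return f"[{terminal_id}|{mode['color']}] {marker}{text}{marker}"


if __name__ == "__main__":
    print(render("site-B", "This is 100 yen"))
    print(render("site-D", "This is yen"))
```

A default entry keeps the display working when text arrives from a terminal whose ID has not been registered.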

The same applies to the second embodiment: the conference terminal 10D may transmit the generated text data together with identification information that identifies the conference terminal 10D to the conference terminal 10C, and the conference terminal 10C may output the message indicating the comparison result together with the identification information. FIG. 10 shows an example of the screen displayed on the display device 13 of the conference terminal 10C in this case. In the example of FIG. 10, the control unit 11 of the conference terminal 10C displays the texts P1, P2, ... indicated by the received text data T2 for each site (conference terminal 10C) and causes the display device 13 to display a message M1 corresponding to the comparison result of the text data.
The storage unit 12 of the conference terminal 10C may also store in advance the correspondence between identification information and display modes, and the control unit 11 of the conference terminal 10C may control the display device 13 so that the message indicating the comparison result is displayed in the display mode corresponding to the received identification information.

(6) In the second embodiment, the conference terminal 10C compares the text data during the remote conference and outputs a message indicating the comparison result. However, the comparison result need not be output in real time. It may instead be accumulated as a log in a predetermined storage area of the storage unit 12. This allows the conference participants and others to take measures against loss at the next conference (for example, improving the environment or reviewing what was heard).
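Accumulating comparison results as a log might look like the following Python sketch. The record fields, the JSON-lines format, and the function names `append_log` and `read_log` are assumptions made for illustration. The description only says that comparison results are stored in a predetermined storage area. An in-memory stream stands in for that storage area here.

```python
import io
import json


def append_log(stream, timestamp, t1, t2, t3):
    """Append one comparison record to the log as a JSON line."""
    record = {
        "time": timestamp,
        "t1": t1,
        "t2": t2,
        "t3": t3,
        "t1_matches_t2": t1 == t2,  # False suggests loss on the network
        "t2_matches_t3": t2 == t3,  # False suggests a speaker/microphone/noise issue
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(stream):
    """Read back every record, for review before the next conference."""
    stream.seek(0)
    return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    log = io.StringIO()  # stands in for the storage area
    append_log(log, 0.0, "This is 100 yen", "This is yen", "This is yen")
    for row in read_log(log):
        print(row["time"], row["t1_matches_t2"], row["t2_matches_t3"])
```

Storing the raw transcripts alongside the match flags lets a later review see both that a loss occurred and what was lost.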

(7) In the above embodiments, the program executed by the control unit 11 of the conference terminal 10 can be provided recorded on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. It can also be downloaded to the conference terminal 10 via a network such as the Internet. Furthermore, in the second embodiment, the transmission control unit 111, text conversion unit 112, reception control unit 113, text conversion unit 114, transmission control unit 115, text conversion unit 116, reception control unit 117, text comparison unit 118, and message output unit 119 shown in FIG. 8 are realized as software by the control unit 11 executing a computer program stored in the ROM or the storage unit 12, but this is not a limitation, and each unit may be implemented in hardware.

FIG. 1 is a diagram showing an example of the configuration of a remote conference system.
FIG. 2 is a block diagram showing an example of the hardware configuration of a conference terminal.
FIG. 3 is a block diagram showing an example of the hardware configuration of the text conversion device 30.
FIG. 4 is a block diagram showing an example of the hardware configuration of the text display device 40.
FIG. 5 is a diagram for explaining the operation of the remote conference system.
FIG. 6 is a diagram showing an example of a screen displayed on the display device.
FIG. 7 is a diagram showing an example of the configuration of a remote conference system.
FIG. 8 is a diagram for explaining the operation of the remote conference system.
FIG. 9 is a diagram for explaining the operation of the remote conference system.
FIG. 10 is a diagram showing an example of a screen displayed on the display device.

Explanation of symbols

10: conference terminal; 11, 31, 41: control unit; 12, 32, 42: storage unit; 13, 43: display device; 14: operation unit; 15, 35: microphone; 16, 36: audio processing unit; 17: speaker; 18, 38, 48: communication unit; 19: imaging device; 20: communication network; 30: text conversion device; 40: text display device; 100, 200: remote conference system; 111, 115: transmission control unit; 112, 114, 116: text conversion unit; 113, 117: reception control unit; 118: text comparison unit; 119: message output unit.

Claims

A voice communication system having first and second terminals connected to each other via a communication network,
The first terminal is
A first sound input terminal to which sound data output from a sound collecting device that outputs sound data representing collected sound is input;
A video output terminal that outputs data to a display device that displays an image according to the supplied data;
Audio data transmitting means for transmitting audio data input to the first audio input terminal to the second terminal;
Text data receiving means for receiving text data from the second terminal;
Text data output means for outputting the text data received by the text data receiving means to the video output terminal,
The second terminal is
An audio output terminal for outputting audio data to a sound emitting device that emits sound according to the supplied audio data;
A second sound input terminal for receiving sound data output from the sound collecting device for collecting sound emitted by the sound emitting device and outputting sound data representing the collected sound;
Second audio data receiving means for receiving audio data transmitted from the first terminal;
Output means for outputting the audio data received by the second audio data receiving means to the audio output terminal;
Text conversion means for generating text data by text-converting voice data input to the second voice input terminal;
A voice communication system comprising: text data transmission means for transmitting text data generated by the text conversion means to the first terminal.

A voice communication system having first and second terminals connected to each other via a communication network,
The first terminal is
A first sound input terminal to which sound data output from a sound collecting device that outputs sound data representing collected sound is input;
A video output terminal that outputs data to a display device that displays an image according to the supplied data;
Audio data transmitting means for transmitting audio data input to the first audio input terminal to the second terminal;
Text data receiving means for receiving text data from the second terminal;
Text data output means for outputting the text data received by the text data receiving means to the video output terminal,
The second terminal is
Second audio data receiving means for receiving audio data transmitted from the first terminal;
Text conversion means for generating text data by text-converting the voice data received by the second voice data receiving means;
A voice communication system comprising: text data transmission means for transmitting text data generated by the text conversion means to the first terminal.

A voice communication system having first and second terminals connected to each other via a communication network,
The first terminal is
A first sound input terminal to which sound data output from a sound collecting device that outputs sound data representing collected sound is input;
A video output terminal that outputs data to a display device that displays an image according to the supplied data;
Audio data transmitting means for transmitting audio data input to the first audio input terminal to the second terminal;
First text conversion means for generating text data by text-converting the voice data input to the first voice input terminal;
Text data receiving means for receiving text data from the second terminal;
Message output means for comparing the text data received by the text data receiving means with the text data generated by the first text conversion means, and outputting a message corresponding to the difference between them to at least one of the video output terminal and a first audio output terminal connected to a sound emitting device that emits sound according to supplied audio data,
The second terminal is
A second audio output terminal that outputs audio data to a sound emitting device that emits sound according to the supplied audio data;
A second sound input terminal for receiving sound data output from the sound collecting device for collecting sound emitted by the sound emitting device and outputting sound data representing the collected sound;
Second audio data receiving means for receiving audio data transmitted from the first terminal;
Output means for outputting the audio data received by the second audio data receiving means to the second audio output terminal;
Second text conversion means for generating text data by text-converting the voice data input to the second voice input terminal;
A voice communication system, comprising: text data transmission means for transmitting text data generated by the second text conversion means to the first terminal.

The first terminal is
Second text conversion means for generating text data by converting voice data input to the first voice input terminal;
Message output means for comparing the text data received by the text data receiving means with the second text data generated by the second text conversion means, and outputting a message corresponding to the difference between the two to at least one of the video output terminal and an audio output terminal connected to a sound emitting device that emits sound according to supplied audio data. The voice communication system according to claim 1.

The second terminal is
Third text conversion means for converting the voice data received by the voice data receiving means into text to generate third text data;
Third text data transmission means for transmitting the third text data generated by the third text conversion means to the first terminal, and
The first terminal is
A third text data receiving means for receiving the third text data transmitted by the third text data transmitting means;
The message output means of the first terminal compares the third text data received by the third text data receiving means with at least one of the text data and the second text data, and outputs a message corresponding to the comparison result. The voice communication system according to claim 4.

The text data transmission means of the second terminal transmits the text data generated by the text conversion means and identification information for identifying the terminal itself to the first terminal,
The text data receiving means of the first terminal receives the text data and the identification information from the second terminal;
The text data output means of the first terminal outputs the text data and identification information received by the text data receiving means to the video output terminal. The voice communication system according to item.

The first terminal is
Storage means for storing the correspondence between the identification information and the display mode;
The text data output means displays the text indicated by the text data received by the text data receiving means on the display device in a display mode corresponding to the identification information received by the text data receiving means. The voice communication system according to claim 6.