JP2009267623A

JP2009267623A - Communication device and voice communication system

Info

Publication number: JP2009267623A
Application number: JP2008112790A
Authority: JP
Inventors: Makoto Tanaka; 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-04-23
Filing date: 2008-04-23
Publication date: 2009-11-12

Abstract

PROBLEM TO BE SOLVED: To provide a technique for notifying a speaker whether a listener is actually listening to the conversation during voice communication among a plurality of communication terminals.

SOLUTION: On receiving voice data from another terminal 10, a terminal 10 decodes and mixes the received voice data and outputs it to a voice processing section 15. When the listener presses a button B1, an operation section 18 outputs an operation signal corresponding to the operation. A packet compositing section 17 transmits, to the other terminal 10, voice data representing the voice collected by a microphone MC together with notification data corresponding to the operation signal output from the operation section 18. The voice processing section 15 processes the voice data so that the sound quality deteriorates more the longer the time elapsed since the button B1 was last pressed. The voice data processed by the voice processing section 15 is supplied to a speaker SP and output as sound.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a communication device and a voice communication system.

In recent years, remote conference systems, in which a conference is held using a plurality of communication terminals connected via a communication network, have become widespread. In such a system the speaker and the listener do not meet face to face, so it is difficult for the speaker to sense the listener's reaction, and the speaker may feel uneasy about whether his or her voice is reaching the other party. To relieve this anxiety, Patent Document 1 proposes a technique for displaying on the transmitting side, in real time, the state in which the transmitted voice and image data arrive at the receiving side. Patent Document 2 proposes a technique for a voice conference system in which voice data is exchanged among a plurality of sites connected via a network: an echo of the input voice transmitted from another site is sent back to that site, so that the site receiving the echo can confirm whether its voice is being reproduced at the counterpart site with the clear voice quality it expects.
Patent Documents 1 and 2: JP 2005-269498 A, JP 2007-274176 A

However, although the techniques described in Patent Documents 1 and 2 make it possible to confirm the state in which the speaker's voice arrives at the counterpart terminal, they cannot confirm whether the listener is really listening. Even if the voice reliably reaches the counterpart terminal, the listener may not be listening to the voice reproduced there, and the speaker has had no way of confirming this.
The present invention has been made in view of the above background, and its object is to provide a technique capable of notifying a speaker whether a listener is actually listening when voice communication is performed among a plurality of communication terminals.

In order to solve the above problem, the present invention provides a communication device comprising: receiving means for receiving voice data transmitted from another terminal connected via a communication network; signal detecting means for detecting a predetermined signal output from operating means operated by an operator; notification data transmitting means for transmitting, when the signal detecting means detects the signal, notification data indicating the detection to the other terminal; voice data processing means for processing the voice data received by the receiving means, the voice data processing means processing the voice data so that the degree of processing becomes higher as the time elapsed since the signal detecting means detected the signal becomes longer; and output means for outputting the voice data processed by the voice data processing means to sound emitting means.

In a preferred aspect of the present invention, the voice data processing means may process the voice data so that the degree of deterioration in sound quality becomes higher as the time elapsed since the signal detecting means detected the signal becomes longer.

In another preferred aspect of the present invention, the voice data processing means may process the voice data so that the sound pressure becomes lower as the time elapsed since the signal detecting means detected the signal becomes longer.

In a further preferred aspect of the present invention, the operating means may include a plurality of operating elements and output a signal corresponding to the operating element operated by the operator; the signal detecting means may detect the signal corresponding to an operating element of the operating means; and the notification data transmitting means may transmit to the other terminal, when the signal detecting means detects the signal, notification data corresponding to the operating element to which the detected signal corresponds.

In a further preferred aspect of the present invention, the receiving means may receive, separately for each terminal, voice data transmitted from a plurality of other terminals connected via the communication network, and the communication device may further comprise sound pressure detecting means for detecting, for each of the other terminals, the sound pressure of the voice data received by the receiving means, and terminal specifying means for specifying, from among the plurality of other terminals, the terminal for which sound pressure was detected by the sound pressure detecting means. In this case, when the signal detecting means detects the signal, the notification data transmitting means may transmit the notification data indicating the detection to the terminal specified by the terminal specifying means.

The present invention also provides a voice communication system in which a first terminal and a second terminal are connected via a communication network. The first terminal comprises: voice data receiving means for receiving voice data transmitted from the second terminal; signal detecting means for detecting a predetermined signal output from operating means operated by an operator; notification data transmitting means for transmitting, when the signal detecting means detects the signal, notification data indicating the detection to the other terminal; voice data processing means for processing the voice data received by the receiving means, the voice data processing means processing the voice data so that the degree of processing becomes higher as the time elapsed since the signal detecting means detected the signal becomes longer; and voice data output means for outputting the voice data processed by the voice data processing means to sound emitting means. The second terminal comprises: voice data transmitting means for transmitting, to the first terminal, voice data representing voice collected by sound collecting means; notification data receiving means for receiving the notification data transmitted by the notification data transmitting means; and notifying means for announcing the content indicated by the communication data received by the notification data receiving means.

According to the present invention, when voice communication is performed among a plurality of communication terminals, the speaker can be notified of whether the listener is actually listening.

<Configuration>
FIG. 1 is a block diagram showing the configuration of a remote conference system 1 according to an embodiment of the present invention. The remote conference system 1 is configured by connecting a plurality of terminals 10a, 10b, 10c, ... installed at various locations to a communication network 20 such as the Internet. In the following description, for convenience, the terminals 10a, 10b, 10c, ... are referred to as "terminal 10" when there is no need to distinguish them. A remote conference is realized when its participants perform voice communication using the terminals 10.

FIG. 2 is a block diagram showing an example of the configuration of the terminal 10. In the figure, the microphone MC is sound collecting means that collects the voice uttered by a person participating in the conference (hereinafter "participant") and outputs a voice signal (analog signal) representing the collected voice. The voice signal output from the microphone MC is supplied to the CODEC 16, which converts it into digital data. The operation unit 18 outputs a signal corresponding to an operation by a conference participant. The operation unit 18 is provided with a dedicated button B1 with which a participant indicates that he or she is listening properly to another participant. In the following description, for convenience, the signal output from the operation unit 18 when the button B1 is pressed is referred to as the "operation signal S1". The operation signal S1 output from the operation unit 18 is supplied to both the packet synthesizing unit 17 and the voice processing unit 15. The packet synthesizing unit 17 detects the operation signal S1 output from the operation unit 18 and, at the timing of detection, generates notification data corresponding to the detected operation signal S1. The packet synthesizing unit 17 packetizes the voice data output from the CODEC 16 together with the notification data corresponding to the operation signal S1 and transmits the packets to the other terminals 10. Because this notification data is generated only when a participant actively presses the dedicated button B1, it can be used as data indicating whether the participant is listening properly to the speaker.
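The packetizing performed by the packet synthesizing unit 17, and the matching separation on the receiving side, can be illustrated with a minimal Python sketch. The wire layout used here (a 1-byte notification flag, a 2-byte payload length, then the encoded voice data) and the names `compose_packet` and `separate_packet` are assumptions made purely for illustration; the source does not specify a packet format.

```python
import struct

# Hypothetical header (not from the source): notification flag (1 byte)
# followed by the voice payload length (2 bytes), in network byte order.
HEADER = struct.Struct("!BH")


def compose_packet(voice: bytes, s1_detected: bool) -> bytes:
    """Bundle encoded voice data with a flag saying whether S1 was detected."""
    return HEADER.pack(1 if s1_detected else 0, len(voice)) + voice


def separate_packet(packet: bytes) -> tuple[bytes, bool]:
    """Split a received packet back into voice data and the notification flag."""
    flag, length = HEADER.unpack_from(packet)
    voice = packet[HEADER.size:HEADER.size + length]
    return voice, bool(flag)
```

In this sketch the notification rides in the same packet as the voice data; a real implementation could equally send it as a separate message.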

The terminal 10 is provided with one data receiving unit 22 for each of the other terminals 10 with which it performs voice communication, so that the data transmitted from each of the other terminals 10 is received individually. The packet separation unit 11 receives the data transmitted from another terminal 10 connected via the communication network 20. This data contains voice data representing the voice collected by the microphone MC of the other terminal 10 and notification data indicating the listening status of the other participant (whether or not that participant is listening properly to the speaker). The packet separation unit 11 separates the received data into voice data and notification data, outputs the voice data to the CODEC 12, and outputs the notification data to the continuity determination unit 13. The CODEC 12 decodes the received voice data and outputs it to the audio mixer 14. The audio mixer 14 mixes the voice signals output from the CODECs 12, 12, ... corresponding to the respective other terminals 10 and outputs the result to the speaker SP. The speaker SP emits sound according to the supplied voice signal.
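As a rough illustration of the mixing step, the following Python sketch sums decoded frames from several terminals sample by sample. It assumes 16-bit PCM samples and clips the sum to the int16 range; the function name `mix_frames` and the clipping strategy are assumptions, since the source does not describe how the audio mixer 14 combines the signals.

```python
def mix_frames(frames: list[list[int]]) -> list[int]:
    """Mix decoded frames (one per remote terminal) into a single frame.

    Each frame is a list of 16-bit PCM samples. Samples are summed across
    terminals and clipped to the int16 range (an assumed, simple strategy).
    """
    if not frames:
        return []
    length = min(len(frame) for frame in frames)  # tolerate unequal frame sizes
    mixed = []
    for i in range(length):
        total = sum(frame[i] for frame in frames)
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```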

The continuity determination unit 13 determines, from the notification data of the other terminal 10 supplied by the packet separation unit 11, whether the listener is listening properly to the speaker, and outputs data indicating the determination result to the display unit 19. The display unit 19 displays an image indicating the determination result according to the data supplied from the continuity determination unit 13. For example, a message or image indicating that the listener is listening properly may be displayed; any form of display may be used as long as it informs the speaker of content corresponding to the received notification data. In this embodiment the speaker is informed by displaying an image representing the content of the notification data on the display unit 19, but the manner of notification is not limited to this. For example, a voice message corresponding to the notification data may be emitted from the speaker SP; any method may be used as long as it informs the speaker of content corresponding to the notification data.
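The source does not state the criterion used by the continuity determination unit 13. One plausible rule, sketched below in Python, is to treat the listener as attentive while the most recent notification is no older than some window. The function name `is_listening` and the 30-second window are hypothetical values chosen only for illustration.

```python
from typing import Optional


def is_listening(last_notification_at: Optional[float], now: float,
                 window: float = 30.0) -> bool:
    """Assumed rule: attentive if a notification arrived within `window` seconds.

    `last_notification_at` is the arrival time of the latest notification
    data from the listener's terminal, or None if none has arrived yet.
    """
    if last_notification_at is None:
        return False
    return now - last_notification_at <= window
```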

The voice processing unit 15 processes the voice data output from the audio mixer 14 and outputs the processed voice data to the speaker SP. The voice processing unit 15 detects the operation signal S1 output from the operation unit 18 and processes the voice data so that the degree of processing becomes higher as the time elapsed since the operation signal S1 was detected (that is, since the participant pressed the button B1) becomes longer. In this embodiment, the voice processing unit 15 includes a gain adjustment unit (not shown) that adjusts the output gain of the voice data, and adjusts the output gain so that the sound pressure becomes lower as the time elapsed since the operation signal S1 was output from the operation unit 18 becomes longer.

FIG. 3 is a diagram showing an example of how the gain of the voice data output from the voice processing unit 15 changes. In the figure, the horizontal axis indicates time and the vertical axis indicates the output gain level. In the example shown in FIG. 3, starting at time t1, when a predetermined time has elapsed since the start of voice communication, the voice processing unit 15 gradually lowers the output gain from a reference level (hereinafter "reference level") as time passes. When the listener presses the button B1 at time t2, the operation unit 18 outputs the operation signal S1 corresponding to the operation to the voice processing unit 15. On receiving the operation signal S1 from the operation unit 18, the voice processing unit 15 returns the output gain to the reference level (see time t2 in FIG. 3). When the predetermined time has elapsed from time t2, the voice processing unit 15 starts gradually lowering the output gain of the voice data again (see time t3 in FIG. 3). The voice processing unit 15 then continues to lower the output gain gradually until the button B1 is pressed again (see time t4 in FIG. 3).
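The gain behavior of FIG. 3 can be sketched in Python as a function of the time elapsed since the button B1 was last pressed (or since the call started). The linear decay, the parameter names, and the default values (10-second hold, gain dropping by 0.05 per second) are assumptions for illustration; the source only says the gain is lowered gradually after a predetermined time.

```python
def output_gain(elapsed: float, hold: float = 10.0, slope: float = 0.05,
                reference: float = 1.0, floor: float = 0.0) -> float:
    """Output gain as a function of seconds since the last press of B1.

    The gain stays at `reference` for `hold` seconds (t2 to t3 in FIG. 3),
    then falls linearly by `slope` per second, never going below `floor`.
    All numeric defaults are illustrative, not values from the source.
    """
    if elapsed <= hold:
        return reference
    return max(floor, reference - slope * (elapsed - hold))
```

Pressing the button corresponds to resetting `elapsed` to zero, which returns the gain to the reference level as at times t2 and t4.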

Thus, in this embodiment, the emitted sound pressure becomes lower, and the emitted voice harder to hear, as more time passes after the button B1 is pressed. When the listener presses the button B1, the gain returns to the reference level. The listener therefore has to press the button B1 periodically, and as a result the speaker can recognize the listener's listening status from the notification data periodically transmitted from the listener's terminal 10.

<Operation>
Next, the operation of this embodiment will be described. First, the terminals 10 perform voice communication with one another. Each terminal 10 transmits voice data representing the voice collected by its microphone MC to the other terminals 10, receives voice data from the other terminals 10 via the communication network 20, and emits the received voice data as sound from the speaker SP. A remote conference is thereby realized.

The user of the terminal 10 listens to the speaker while operating the operation unit 18 to press the dedicated button B1 at predetermined intervals. When the button B1 is pressed, the operation unit 18 outputs the operation signal S1 corresponding to the operation. The voice collected by the microphone MC is encoded by the CODEC 16 and output to the packet synthesizing unit 17 as voice data. The packet synthesizing unit 17 packetizes the voice data output from the CODEC 16 together with the notification data corresponding to the operation signal S1 output from the operation unit 18, and transmits the packets to the other terminals 10.

Meanwhile, the packet separation unit 11 separates the received data into voice data and notification data, outputs the voice data to the CODEC 12, and outputs the notification data to the continuity determination unit 13. The CODEC 12 decodes the supplied voice data and outputs it to the audio mixer 14. The audio mixer 14 mixes the voice data output from the plurality of CODECs 12 and outputs the result to the voice processing unit 15.

The voice processing unit 15 adjusts the output gain so that the sound pressure becomes lower as the time elapsed since the operation signal S1 was output from the operation unit 18 becomes longer. The voice data whose gain has been adjusted by the voice processing unit 15 is output to the speaker SP, which emits the voice represented by the voice data received from the other terminals 10. The longer the time elapsed since the button B1 was pressed, the lower the sound pressure of the emitted voice.

The plurality of pieces of notification data separated by the packet separation unit 11 are supplied to the continuity determination unit 13. The continuity determination unit 13 determines from the supplied notification data whether the listener is listening properly, and outputs data indicating the determination result to the display unit 19. The display unit 19 displays an image indicating the determination result of the continuity determination unit 13 according to the data supplied from it.

Thus, in this embodiment, the emitted sound pressure becomes lower, and the emitted voice harder to hear, as more time passes after the button B1 is pressed. Because the terminal 10 processes and outputs the voice so that it deteriorates more the longer the time elapsed since the button B1 was pressed, the listener is encouraged to press the button B1 actively. In addition, the speaker can recognize that the call is established from the notification data periodically transmitted from the listener's terminal 10.

<Modification>
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below. The following aspects may be combined as appropriate.
(1) In the above embodiment, a remote conference is held using terminals according to the present invention. However, the present invention is not limited to this and can also be applied, for example, to lectures or talks given via a communication network.

(2) In the above embodiment, the voice processing unit 15 processes the voice data so that the sound pressure becomes lower as the time elapsed since the button B1 was pressed becomes longer. However, the manner of processing is not limited to this. For example, the voice processing unit 15 may process the voice data so that the degree of deterioration in sound quality becomes higher as the time elapsed since the button B1 was pressed becomes longer. Specifically, the voice processing unit 15 may process the voice data so that the proportion of noise it contains increases with the elapsed time. Alternatively, the sound quality may be degraded by raising the sound pressure level of a predetermined frequency band as the elapsed time increases. The voice processing unit 15 may thus gradually lower the sound pressure of the voice, gradually degrade its sound quality, or combine the two. In short, it suffices for the voice processing unit 15 to process and output the voice data so that the degree of processing becomes higher, and the voice emitted from the speaker SP harder to hear, the longer the time elapsed since the button B1 was last pressed.
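The noise-based variant in (2) can be sketched in Python by blending the voice samples with noise whose share grows with the elapsed time. The linear ramp, the uniform white noise, the function names, and the default durations are all assumptions for illustration; the source only states that the proportion of noise increases with the elapsed time.

```python
import random


def noise_ratio(elapsed: float, hold: float = 10.0, ramp: float = 20.0) -> float:
    """Share of noise in the output: 0 during `hold`, then rising linearly to 1."""
    return min(1.0, max(0.0, (elapsed - hold) / ramp))


def degrade(samples: list[float], elapsed: float, rng: random.Random) -> list[float]:
    """Blend samples (range -1.0 to 1.0) with uniform noise by noise_ratio(elapsed)."""
    r = noise_ratio(elapsed)
    return [(1.0 - r) * s + r * rng.uniform(-1.0, 1.0) for s in samples]
```

Passing a seeded `random.Random` keeps the sketch reproducible; a gain reduction such as the one in the embodiment could be applied to the same samples to combine both forms of processing.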

(3) In the above embodiment, the notification data indicates that the user has pressed the dedicated button B1. However, the notification data is not limited to this and may instead be generated by analyzing the voice data representing the voice collected by the microphone MC of the terminal 10. The configuration in this case will be described with reference to FIG. 4. In FIG. 4, the packet separation unit 11, CODEC 12, continuity determination unit 13, audio mixer 14, CODEC 16, packet synthesizing unit 17, and voice processing unit 15 are the same as the corresponding units shown in FIG. 1 for the above embodiment, and their detailed description is omitted here.

In FIG. 4, the voice analysis unit 21 analyzes the voice data output from the CODEC 16 and generates notification data according to the analysis result. Specifically, for example, back-channel utterances or utterances of assent (hereinafter "nodding voice"), or collation data representing the features of the nodding voice, may be stored in advance in a predetermined memory. The voice analysis unit 21 then collates the voice data output from the CODEC 16 with the collation data stored in the memory, extracts voice data whose collation result satisfies a predetermined condition (hereinafter "nodding voice data"), and generates notification data when nodding voice data is detected. In this case, the voice processing unit 15 processes the voice data so that the degree of processing becomes higher as the time elapsed since the nodding voice was last detected becomes longer. As a result, the voice becomes harder to hear the longer the time elapsed since the listener last uttered a nodding voice. In this way, the listener's listening status can be conveyed to the speaker through the listener's nodding voice, and the listener is encouraged to utter it.

The analysis is not limited to collating the voice data with collation data. For example, the voice analysis unit 21 may detect the sound pressure of the voice data output from the CODEC 16 and generate notification data according to the detected sound pressure. Specifically, the voice analysis unit 21 may generate notification data when a sound pressure equal to or higher than a predetermined threshold is detected. In this case, if no sound pressure at or above the threshold is detected for a continued period (the listener is too quiet), the emitted voice becomes harder to hear the longer that period lasts. Conversely, when a sound pressure at or above the threshold is detected continuously for a predetermined time (the listener is too noisy), the listener may be regarded as not listening to the speaker, and no notification data may be generated.
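The sound-pressure rule above can be sketched in Python using the RMS level of each frame as a stand-in for sound pressure. The frame-based formulation, the function names, and the default threshold and run length are assumptions for illustration only. The sketch generates a notification when at least one frame reaches the threshold, but suppresses it when the level stays at or above the threshold for too many consecutive frames.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of one frame; 0.0 for an empty frame."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_notify(frames: list[list[float]], threshold: float = 0.1,
                  max_loud_frames: int = 3) -> bool:
    """Decide whether to generate notification data for a window of frames."""
    loud = [rms(frame) >= threshold for frame in frames]
    if not any(loud):
        return False  # too quiet: nothing reached the threshold
    run = longest = 0
    for is_loud in loud:
        run = run + 1 if is_loud else 0
        longest = max(longest, run)
    # Sustained loudness is treated as "too noisy", so no notification.
    return longest < max_loud_frames
```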

In the above embodiment, a nodding motion may also be detected by image analysis. In this case, photographing means for photographing the user of the terminal 10 is provided, the video data output from the photographing means is analyzed, a nodding motion is detected from the analysis result, and notification data is generated when a nodding motion is detected.

(4) In the above embodiment, the terminal 10 reports the listener's listening status by displaying an image representing the content of the notification data on the display unit 19. However, the manner of notification is not limited to this. For example, the status may be reported by emitting a predetermined notification sound or by outputting a voice message. In short, any form of output may be used as long as the notification data is output so that a message or information is conveyed to the speaker by some means.

(5) The above-described embodiment covers the case where the time during which the reference level is maintained (the time from when the output gain reaches the reference level until the gain starts to decrease: time 0 to t1 and time t2 to t3 in FIG. 3) is constant. However, this time need not be constant and may be random.
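
As a rough illustration of modification (5), the sketch below draws a random hold time and then computes the output gain from the elapsed time. The function names, the uniform distribution, the linear decay shape, and the decay to zero are assumptions made for this example; the description only states that the gain stays at the reference level for some time and then decreases.

```python
import random


def draw_hold_time(min_hold, max_hold, rng=random):
    """Pick how long the reference level is held, in seconds.

    A uniform distribution is an assumption; the text only says "random".
    """
    return rng.uniform(min_hold, max_hold)


def output_gain(elapsed, hold_time, decay_time, reference_gain=1.0):
    """Output gain at `elapsed` seconds after the gain reached the reference level.

    The gain stays at reference_gain during hold_time, then falls linearly
    to zero over decay_time (the linear shape is assumed for illustration).
    """
    if elapsed <= hold_time:
        return reference_gain
    fraction = min((elapsed - hold_time) / decay_time, 1.0)
    return reference_gain * (1.0 - fraction)
```

A new hold time would be drawn each time the gain is restored to the reference level, so the listener cannot predict when the decrease will start.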

(6) In the above-described embodiment, after transmitting notification data, the terminal 10 may refrain from transmitting the next notification data until a predetermined time has elapsed. A speaker may find it annoying if the notification sound is emitted too frequently. For example, the terminal 10 may be controlled so that, once notification data has been transmitted, at least 20 seconds elapse before the next transmission.
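
The minimum-interval control in modification (6) might look like the following sketch. The class name `NotificationThrottle` and its method are hypothetical; only the 20-second example value comes from the text.

```python
class NotificationThrottle:
    """Suppress notification data sent too soon after the previous one."""

    def __init__(self, min_interval=20.0):
        # Minimum spacing between transmissions, in seconds
        # (20 seconds is the example given in the description).
        self.min_interval = min_interval
        self._last_sent = None

    def try_send(self, now):
        """Return True if notification data may be sent at time `now` (seconds)."""
        if self._last_sent is not None and now - self._last_sent < self.min_interval:
            return False  # still inside the quiet period
        self._last_sent = now
        return True
```

The caller passes the current time explicitly (for example from `time.monotonic()`), which keeps the sketch easy to test. A suppressed attempt does not restart the quiet period.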

(7) In the above-described embodiment, the terminal 10 generates notification data in response to the operation signal S1 output from the operation unit 18 when the button B1 is pressed. Alternatively, an operation unit having a plurality of operation elements (buttons or the like) may be used. In this case, the terminal 10 may detect the signal corresponding to each operation element of the operation unit and transmit notification data corresponding to the detected signal to the other terminals 10. Specifically, the operation unit may be provided with, for example, a button indicating "OK" and a button indicating "NG", and the terminal 10 may transmit notification data corresponding to the pressed button (notification data indicating "OK", notification data indicating "NG", and so on) to the other terminals 10.
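
One way to picture modification (7) is a lookup from the detected operation signal to the notification data to be sent. The signal names `S_OK` and `S_NG`, the dictionary layout, and the `build_notification` helper are hypothetical; the description does not define a data format for notification data.

```python
# Hypothetical mapping from operation signals to notification contents.
NOTIFICATION_BY_SIGNAL = {
    "S_OK": "OK",
    "S_NG": "NG",
}


def build_notification(signal, sender_id):
    """Return notification data for a detected signal, or None if unknown.

    The dict layout is an assumption made for this sketch.
    """
    kind = NOTIFICATION_BY_SIGNAL.get(signal)
    if kind is None:
        return None  # no operation element corresponds to this signal
    return {"sender": sender_id, "kind": kind}
```

Adding another button then only requires one more entry in the mapping.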

(8) In the above-described embodiment, instead of transmitting notification data to all of the other terminals 10 in voice communication, the terminal 10 may transmit it only to the terminal 10 of the participant who is speaking. In this case, the terminal 10 may detect the sound pressure of the received voice data for each of the other terminals 10, identify, from among the plurality of other terminals 10, the terminal 10 for which sound pressure is detected, and transmit the notification data to the identified terminal 10.
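
The terminal selection in modification (8) can be sketched as a filter over per-terminal levels. The function name, the dictionary input, and the use of a threshold to decide that sound pressure "is detected" are assumptions for illustration.

```python
def speaking_terminals(levels_by_terminal, threshold):
    """Return the IDs of terminals whose received level reaches the threshold.

    levels_by_terminal: mapping of terminal ID to the sound pressure level
    measured on the voice data received from that terminal.
    """
    return sorted(
        terminal
        for terminal, level in levels_by_terminal.items()
        if level >= threshold
    )
```

Notification data would then be transmitted only to the returned terminals. If the list is empty, nobody is speaking and nothing is sent.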

(9) Each unit of the terminal 10 in the above-described embodiment may be configured as hardware, or may be realized as software by having a control unit such as a CPU (Central Processing Unit) execute a computer program stored in storage means. In the latter case, the program executed by the control unit can be provided recorded on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. The program can also be downloaded to the terminal 10 via a network such as the Internet.

A block diagram showing an example of the configuration of the remote conference system.
A block diagram showing an example of the configuration of a terminal.
A diagram showing an example of the processing performed by the voice processing unit.
A block diagram showing an example of the configuration of a terminal.

Explanation of symbols

1 ... remote conference system, 10 ... terminal, 11 ... packet separation unit, 12 ... CODEC, 13 ... continuity determination unit, 14 ... audio mixer, 15 ... voice processing unit, 16 ... CODEC, 17 ... packet synthesis unit, 18 ... operation unit, 19 ... display unit, 20 ... communication network, 21 ... voice analysis unit, 22 ... data reception unit.

Claims

A communication device comprising:
receiving means for receiving voice data transmitted from another terminal connected via a communication network;
signal detection means for detecting a predetermined signal output from operation means operated by an operator;
notification data transmitting means for transmitting, when the signal detection means detects the signal, notification data indicating the detection to the other terminal;
voice data processing means for processing the voice data received by the receiving means such that the degree of processing increases as the time elapsed since the signal detection means detected the signal increases; and
output means for outputting the voice data processed by the voice data processing means to sound emitting means.

The communication device according to claim 1, wherein the voice data processing means processes the voice data such that the degree of sound quality deterioration increases as the time elapsed since the signal detection means detected the signal increases.

The communication device according to claim 1, wherein the voice data processing means processes the voice data such that the sound pressure decreases as the time elapsed since the signal detection means detected the signal increases.

The communication device according to any one of claims 1 to 3, wherein
the operation means includes a plurality of operation elements and outputs a signal corresponding to the operation element operated by the operator,
the signal detection means detects the signal corresponding to the operation element of the operation means, and
the notification data transmitting means transmits, when the signal detection means detects the signal, notification data corresponding to the operation element associated with the detected signal to the other terminal.

The communication device according to any one of claims 1 to 4, wherein
the receiving means receives, for each of a plurality of other terminals connected via the communication network, voice data transmitted from that terminal,
the communication device further comprises:
sound pressure detecting means for detecting, for each of the other terminals, the sound pressure of the voice data received by the receiving means; and
terminal specifying means for specifying, from among the plurality of other terminals, a terminal for which sound pressure is detected by the sound pressure detecting means, and
the notification data transmitting means transmits, when the signal detection means detects the signal, notification data indicating the detection to the terminal specified by the terminal specifying means.

A voice communication system in which a first terminal and a second terminal are connected via a communication network, wherein
the first terminal comprises:
voice data receiving means for receiving voice data transmitted from the second terminal;
signal detection means for detecting a predetermined signal output from operation means operated by an operator;
notification data transmitting means for transmitting, when the signal detection means detects the signal, notification data indicating the detection to the second terminal;
voice data processing means for processing the voice data received by the voice data receiving means such that the degree of processing increases as the time elapsed since the signal detection means detected the signal increases; and
voice data output means for outputting the voice data processed by the voice data processing means to sound emitting means, and
the second terminal comprises:
voice data transmitting means for transmitting voice data representing sound collected by sound collecting means to the first terminal;
notification data receiving means for receiving the notification data transmitted by the notification data transmitting means; and
notification means for reporting the content indicated by the notification data received by the notification data receiving means.