JP2010021924A

JP2010021924A - Communication apparatus

Info

Publication number: JP2010021924A
Application number: JP2008182538A
Authority: JP
Inventors: Makoto Tanaka; 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-07-14
Filing date: 2008-07-14
Publication date: 2010-01-28

Abstract

PROBLEM TO BE SOLVED: To provide a technology capable of notifying a speaker of a listener's listening status, without disturbing the speech, when voice communication is performed among a plurality of communication terminals.

SOLUTION: When a terminal 10 receives data from another terminal 10, it extracts notification data from the received data. The notification data items are supplied to a continuity determination unit 131 of a notification sound synthesis unit 13. When all of the supplied notification data items have values indicating "OK", the continuity determination unit 131 sets a switch unit so that a sound effect component is output; otherwise, it sets the switch unit so that the sound effect component is not output. A synthesized sound generation unit 133 of the notification sound synthesis unit 13 outputs the sound effect component generated from the audio signal output from a microphone MC.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a communication device.

In recent years, remote conference systems, in which a conference is held using a plurality of communication terminals connected via a communication network, have become widespread. In such a system the speaker and the listeners do not face each other directly, so it is difficult for the speaker to sense the listeners' reactions, and the speaker may feel uneasy about whether his or her voice is reaching the other parties. To relieve this anxiety, Patent Document 1 proposes a technique for displaying in real time, on the transmitting side, the state in which the transmitted audio and image data has arrived at the receiving side. Patent Document 2 proposes, for an audio conference system in which audio data is exchanged among a plurality of sites connected via a network, a technique in which the echo audio signal of the input voice transmitted from a site (the own site) is sent back to that site, so that the site can confirm whether its voice is being reproduced at the partner site with the expected clear voice quality.

Patent Document 1: JP 2005-269498 A
Patent Document 2: JP 2007-274176 A

However, the technique described in Patent Document 1 requires a display monitor and therefore cannot be applied to an audio conference apparatus that has none. The technique described in Patent Document 2 requires the speaker to listen to the echo of his or her own voice to confirm reception; because confirmation and conversation are separate activities, the speaker cannot confirm reception while talking.
The present invention has been made in view of the above background, and its object is to provide a technology that, when voice communication is performed among a plurality of communication terminals, can notify the speaker of the listeners' listening status without disturbing the speech.

To solve the above problems, the present invention provides a communication apparatus comprising: receiving means for receiving data transmitted from another terminal connected via a communication network; transmitting means for transmitting, to the other terminal, audio data representing the sound collected by sound collecting means; sound effect component generating means for generating, from the audio data representing the sound collected by the sound collecting means, a sound effect component signal representing an acoustic-effect component of that sound; determining means for analyzing the data received by the receiving means and determining, according to the analysis result, whether the data satisfies a predetermined condition; and sound emission control means for outputting the sound effect component signal generated by the sound effect component generating means to sound emitting means when the determination result of the determining means is affirmative.

In a preferred aspect of the present invention, the receiving means may receive data from a plurality of terminals, the determining means may analyze the data received by the receiving means for each terminal and determine, for each terminal, whether the data satisfies a predetermined condition according to the analysis result, and the sound effect component generating means may vary the sound effect component signal it generates according to at least one of the number and the proportion of terminals for which the determination result is affirmative.

In a further preferred aspect, the receiving means may receive data from a plurality of terminals, the determining means may analyze the received data for each terminal and determine, for each terminal, whether the data satisfies a predetermined condition according to the analysis result, and the sound emission control means may output the sound effect component signal to the sound emitting means when at least one of the number and the proportion of terminals for which the determination result is affirmative satisfies a predetermined condition.

In a further preferred aspect, position data storage means for storing, for each of the plurality of terminals, position data indicating the position of that terminal may be provided, and the sound emission control means may control the sound emitted from the sound emitting means so that its sound image is localized in a direction determined by the position data stored in the position data storage means and by the per-terminal determination results of the determining means.
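The aspect above leaves the localization method open. The following Python sketch shows one possible reading, assuming (hypothetically) that each terminal's position data is a single azimuth in degrees (-90 = hard left, +90 = hard right) and that the effect is steered toward the average direction of the terminals judged affirmative using a standard constant-power stereo pan law. The names `localization_gains`, `positions_deg`, and `judged_ok` are illustrative only.

```python
import math
from typing import Dict, Optional, Tuple


def localization_gains(positions_deg: Dict[str, float],
                       judged_ok: Dict[str, bool]) -> Optional[Tuple[float, float]]:
    """Return (left_gain, right_gain) steering the effect toward "OK" terminals.

    positions_deg: hypothetical per-terminal azimuth in degrees,
                   -90 = hard left, 0 = centre, +90 = hard right.
    judged_ok:     per-terminal determination result.
    Returns None when no terminal is judged OK (nothing to localize).
    """
    ok_angles = [positions_deg[t] for t, ok in judged_ok.items() if ok]
    if not ok_angles:
        return None
    azimuth = sum(ok_angles) / len(ok_angles)          # average direction of OK terminals
    azimuth = max(-90.0, min(90.0, azimuth))           # clamp to the panning range
    theta = (azimuth + 90.0) / 180.0 * (math.pi / 2)   # map [-90, 90] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)            # constant-power pan law
```

Because cos² + sin² = 1, the perceived loudness of the effect stays roughly constant while its direction follows whichever terminals are currently listening.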

In a further preferred aspect, sound emission mode data storage means for storing, for each terminal, sound emission mode data indicating a mode of sound emission by the sound emitting means may be provided, and the sound emission control means may cause sound to be emitted in the mode indicated by the sound emission mode data corresponding to a terminal for which the determination result of the determining means is affirmative.

In a further preferred aspect, the determining means may extract, from the data received by the receiving means, notification data indicating whether the audio data transmitted by the transmitting means is being reproduced normally at the other terminal, and make the determination according to the extracted notification data.

In a further preferred aspect, the determining means may detect the sound pressure of the audio data received by the receiving means and determine whether the detected sound pressure satisfies a predetermined condition.

In a further preferred aspect, the receiving means may receive data including audio data from the other terminal, and the determining means may analyze the audio data included in the data received by the receiving means and make the determination according to the analysis result.

Collation data storage means for storing collation data representing a voice or features of a voice may also be provided, and the determining means may collate the audio data received by the receiving means against the collation data stored in the collation data storage means and determine whether the audio data contains data whose collation result satisfies a predetermined condition.

According to the present invention, when voice communication is performed among a plurality of communication terminals, the listeners' listening status can be reported to the speaker without disturbing the speech.

<Configuration>
FIG. 1 is a block diagram showing the configuration of a remote conference system 1 according to an embodiment of the present invention. The remote conference system 1 consists of a plurality of terminals 10a, 10b, 10c, ... installed at various locations and connected to a communication network 20 such as the Internet. In the following description, the terminals 10a, 10b, 10c, ... are referred to collectively as "terminal 10" when there is no need to distinguish them. The remote conference is realized by the participants carrying out voice communication using the terminals 10.

FIG. 2 is a block diagram showing an example of the configuration of the terminal 10. In the figure, the microphone MC is sound collecting means that collects the voice uttered by a person participating in the conference (hereinafter a "participant") and outputs an audio signal (analog signal) representing the collected voice. The audio signal output from the microphone MC is supplied to the CODEC 16, which converts it into digital data. The operation unit 18 outputs signals corresponding to operations by a participant. The operation unit 18 has a dedicated button B1 with which a participant indicates that he or she is properly listening to the other participants. For convenience, the signal output from the operation unit 18 when the button B1 is pressed is hereinafter referred to as "operation signal S1". The operation signal S1 is supplied to the packet combining unit 17, which detects it and, at the timing of detection, generates notification data corresponding to it.
The packet combining unit 17 packetizes the audio data output from the CODEC 16 together with the notification data corresponding to the operation signal S1 and transmits the packets to the other terminals 10. Because the notification data is generated only when a participant actively presses the dedicated button B1, it can be used as data indicating whether the participant is properly listening to the speaker.
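As a rough illustration of what the packet combining unit 17 and the packet separation unit 11 exchange, the sketch below bundles encoded audio with a one-byte notification flag. The header layout (`>BH`: flag plus big-endian payload length) and the function names are hypothetical; the source does not specify a packet format.

```python
import struct
from typing import Tuple

# Hypothetical header: 1-byte notification flag + 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")
NOTIFY_OK, NOTIFY_NG = 1, 0


def build_packet(encoded_audio: bytes, button_pressed: bool) -> bytes:
    """Packet combining side: bundle encoded audio with notification data."""
    flag = NOTIFY_OK if button_pressed else NOTIFY_NG
    return HEADER.pack(flag, len(encoded_audio)) + encoded_audio


def split_packet(packet: bytes) -> Tuple[bytes, bool]:
    """Packet separation side: recover (encoded_audio, notification_is_ok)."""
    flag, length = HEADER.unpack_from(packet, 0)
    audio = packet[HEADER.size:HEADER.size + length]
    if len(audio) != length:
        raise ValueError("truncated packet")
    return audio, flag == NOTIFY_OK
```

Carrying the flag in every audio packet keeps the listening status synchronized with the voice stream without a separate control channel; a real system would more likely use an existing transport such as RTP.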

The terminal 10 is provided with a data receiving unit 22 for each of the other terminals 10 with which it communicates, and receives the data from each of those terminals individually. The packet separation unit 11 receives the data transmitted from another terminal 10 via the communication network 20. This data contains audio data representing the voice collected by the microphone MC of the other terminal 10 and notification data indicating the listening status of the participant there (whether that participant is properly listening to the speaker). The packet separation unit 11 separates the audio data from the notification data, outputs the audio data to the CODEC 12, and outputs the notification data to the notification sound synthesis unit 13. The CODEC 12 decodes the received audio data and outputs it to the audio mixer 14. The audio mixer 14 mixes the audio signals output from the CODECs 12, 12, ... corresponding to the respective other terminals 10 and outputs the result to the adder 15.

The notification sound synthesis unit 13 determines, from the notification data of the other terminals 10 supplied by the packet separation unit 11, whether the listeners are properly listening to the speaker, and, if the determination result is affirmative, uses the audio signal output from the microphone MC to generate a sound effect component signal representing an acoustic-effect component of that voice.
FIG. 3 is a block diagram showing an example of the configuration of the notification sound synthesis unit 13. In the figure, the continuity determination unit 131 examines the notification data transmitted from each of the other terminals 10 and determines whether the notification data satisfies a predetermined condition. In this embodiment, the continuity determination unit 131 determines that the predetermined condition is satisfied (hereinafter "continuity OK") when all of the notification data received from the other terminals 10 have the value "OK". When at least one item of notification data received from the other terminals 10 is "NG", the continuity determination unit 131 determines that the predetermined condition is not satisfied (hereinafter "continuity NG"). The switch unit 132 is switched under the control of the continuity determination unit 131: when the determination result is "continuity OK", the continuity determination unit 131 sets the switch unit 132 so that the sound effect component is output, and when the result is "continuity NG", it sets the switch unit 132 so that the sound effect component is not output.
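The embodiment's rule (emit only if every other terminal reports "OK") and the switch unit 132 can be sketched as follows. Treating an empty set of notifications as "continuity NG" is an assumption added here; the text does not cover that case. The function names are illustrative.

```python
from typing import Dict, List, Sequence


def continuity_ok(notifications: Dict[str, bool]) -> bool:
    """Continuity determination (embodiment rule): OK only if every other
    terminal reported "OK". An empty mapping is treated as NG (assumption)."""
    return bool(notifications) and all(notifications.values())


def switch_unit(effect: Sequence[float], closed: bool) -> List[float]:
    """Switch unit: pass the sound effect component through, or output silence."""
    return list(effect) if closed else [0.0] * len(effect)
```

A single "NG" from any terminal opens the switch, which matches the all-or-nothing behavior described for FIG. 3.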

The synthesized sound generation unit 133 uses the audio signal output from the microphone MC to generate a sound effect component signal representing the acoustic-effect component of the voice represented by that signal, and outputs it to the switch unit 132. Here, the synthesized sound generation unit 133 generates an audio signal (the sound effect component signal) representing the reverberation that would arise if the voice represented by the audio signal were emitted in a predetermined acoustic space.
FIG. 4 is a diagram showing an example of the processing performed by the synthesized sound generation unit 133. FIG. 4(a) shows an example of the audio signal output from the microphone MC, and FIG. 4(b) shows an example of the sound effect component signal generated by the synthesized sound generation unit 133. In both, the horizontal axis represents time and the vertical axis represents amplitude. As illustrated, the synthesized sound generation unit 133 does not produce the microphone signal with reverb applied; it generates a signal representing only the reverberation that could arise when the voice represented by the microphone signal is emitted in the predetermined space.
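The distinguishing point of FIG. 4 is that only the reverberation is output, with the direct sound removed. A minimal way to show this is a bank of parallel feedback comb filters whose output omits the dry term. This Schroeder-style comb bank and its delay and feedback values are illustrative choices, not the reverb algorithm of the synthesized sound generation unit 133, which the source does not describe.

```python
from typing import List, Sequence


def comb_wet(x: Sequence[float], delay: int, feedback: float) -> List[float]:
    """Feedback comb filter y[n] = x[n-D] + g*y[n-D]; no direct (dry) term."""
    y = [0.0] * len(x)
    for n in range(delay, len(x)):
        y[n] = x[n - delay] + feedback * y[n - delay]
    return y


def reverb_component(x: Sequence[float],
                     delays: Sequence[int] = (1116, 1188, 1277, 1356),
                     feedback: float = 0.7) -> List[float]:
    """Effect-only reverb: average of parallel combs; the dry signal is excluded.

    Delays are in samples and are illustrative values only."""
    outs = [comb_wet(x, d, feedback) for d in delays]
    return [sum(vals) / len(delays) for vals in zip(*outs)]
```

Feeding an impulse shows the property described for FIG. 4(b): the output is silent until the first delayed reflection arrives, so the speaker hears the "room" around the voice rather than a second copy of it.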

The sound effect component signal output from the notification sound synthesis unit 13 is supplied to the adder 15, which adds it to the audio signal output from the audio mixer 14 and outputs the sum to the speaker SP. The speaker SP thus emits the sound effect component in addition to the voice represented by the audio data received from the other terminals 10.
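The audio mixer 14 and the adder 15 amount to sample-wise summation. The sketch below assumes floating-point samples in [-1.0, 1.0] and hard clipping to that range, both of which are assumptions; the source says nothing about sample format or overload handling.

```python
from typing import List, Sequence


def mix_and_add(received: Sequence[Sequence[float]],
                effect: Sequence[float]) -> List[float]:
    """Mixer: sum the decoded streams; adder: add the effect component.

    Output is hard-clipped to [-1.0, 1.0] (assumed float sample range)."""
    n = len(effect)
    out = list(effect)
    for stream in received:
        if len(stream) != n:
            raise ValueError("all signals must have the same length")
        for i, s in enumerate(stream):
            out[i] += s
    return [max(-1.0, min(1.0, v)) for v in out]
```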

<Operation>
Next, the operation of this embodiment will be described. The terminals 10 perform voice communication with one another. Each terminal 10 transmits audio data representing the voice collected by its microphone MC to the other terminals 10, receives audio data from the other terminals 10 via the communication network 20, and emits the received audio data as sound from the speaker SP. A remote conference is thereby realized.

A user of the terminal 10 listens to the speaker while operating the operation unit 18 to press the dedicated button B1 at predetermined intervals. When the button B1 is pressed, the operation unit 18 outputs the operation signal S1 corresponding to the operation. The voice collected by the microphone MC is encoded by the CODEC 16 and output as audio data to the packet combining unit 17. The packet combining unit 17 packetizes the audio data output from the CODEC 16 together with the notification data corresponding to the operation signal S1 output from the operation unit 18, and transmits the packets to the other terminals 10.
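The text says the user presses B1 at predetermined intervals but not how presses are turned into "OK" or "NG". One plausible, hypothetical mapping is a timeout: report "OK" while the last press is recent enough. The class name `ListeningIndicator` and the 10-second default window are illustrative assumptions.

```python
from typing import Optional


class ListeningIndicator:
    """Turns periodic presses of button B1 into "OK"/"NG" notification data.

    Assumption (not specified in the text): the state is "OK" while the last
    press is no older than window_s seconds, and "NG" otherwise."""

    def __init__(self, window_s: float = 10.0) -> None:
        self.window_s = window_s
        self._last_press: Optional[float] = None

    def press(self, now: float) -> None:
        """Record the time at which operation signal S1 was detected."""
        self._last_press = now

    def notification(self, now: float) -> str:
        """Notification data to attach to the next outgoing packet."""
        if self._last_press is not None and now - self._last_press <= self.window_s:
            return "OK"
        return "NG"
```

Under this reading, a participant who stops pressing the button drifts back to "NG" automatically, so an absent listener cannot leave a stale "OK" behind.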

On the receiving side, the packet separation unit 11 separates the audio data and the notification data from the received data, outputs the audio data to the CODEC 12, and outputs the notification data to the notification sound synthesis unit 13. The CODEC 12 decodes the supplied audio data and outputs it to the audio mixer 14. The audio mixer 14 mixes the audio signals output from the plurality of CODECs 12 and outputs the result to the adder 15.

The notification data items separated by the packet separation units 11 are supplied to the continuity determination unit 131 of the notification sound synthesis unit 13. When all of the supplied notification data items are "OK", the continuity determination unit 131 sets the switch unit 132 so that the sound effect component signal is output; otherwise, it sets the switch unit 132 so that the sound effect component signal is not output.

The synthesized sound generation unit 133 of the notification sound synthesis unit 13 generates a sound effect component signal from the audio signal output from the microphone MC. Consequently, the notification sound synthesis unit 13 outputs the sound effect component signal to the adder 15 when all of the notification data received from the other terminals 10 are "OK". When the notification data received from the other terminals 10 include an "NG", the switch unit 132 is set so that the sound effect component signal is not output.

In the adder 15, the sound effect component signal output from the synthesized sound generation unit 133 is mixed with the audio signal output from the audio mixer 14, and the result is output to the speaker SP. The speaker SP thus emits the sound effect component represented by the sound effect component signal in addition to the voice represented by the audio data received from the other terminals 10.

As described above, according to this embodiment, the terminal 10 generates a sound effect component signal from the audio signal output from the microphone MC and reproduces the sound effect component when the result is "continuity OK". In this way, the listeners' listening status can be reported to the speaker without disturbing the speech.
Moreover, according to this embodiment, even when many sites are connected, their states are evaluated together and reported as a single sound effect, which makes it easy for the speaker to grasp the overall situation.
Further, because this embodiment outputs only the sound effect component rather than the entire voice with the effect applied, the listeners' listening status can be conveyed to the speaker more naturally.

<Modification>
Although an embodiment of the present invention has been described above, the invention is not limited to that embodiment and can be implemented in various other forms, examples of which are given below. The following aspects may be combined as appropriate.
(1) The above embodiment describes a remote conference held with terminals according to the present invention, but the invention is not limited to this; it can also be applied, for example, to lectures or talks given over a communication network.

(2) In the above embodiment, the synthesized sound generation unit 133 uses the audio signal output from the microphone MC to generate a sound effect component signal representing reverberation, but the generated signal is not limited to this. It may, for example, represent an effect such as echo or a surround effect; any audio signal representing the effect sound (acoustic component) of a spatial effect may be used.
The sound effect component signal generated by the synthesized sound generation unit 133 is also not limited to a signal representing an acoustic component. For example, an audio signal representing a harmony voice whose pitch differs from that of the input audio signal by a predetermined interval may be generated as the sound effect component signal.
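For the harmony-voice variant, an interval of n semitones corresponds to the equal-tempered frequency ratio 2 ** (n / 12). The sketch below computes that ratio and applies a deliberately naive resampling pitch shift with linear interpolation. It is an assumption-laden illustration, not the method of the source, and unlike a practical harmonizer it also changes the signal's duration.

```python
from typing import List, Sequence


def interval_ratio(semitones: float) -> float:
    """Equal-tempered frequency ratio for an interval, e.g. 7 -> perfect fifth."""
    return 2.0 ** (semitones / 12.0)


def naive_pitch_shift(x: Sequence[float], semitones: float) -> List[float]:
    """Resample x by the interval ratio using linear interpolation.

    Simplification: this also shortens or lengthens the signal by the same
    ratio; a practical harmonizer would keep the duration (for example with
    a phase vocoder or PSOLA)."""
    ratio = interval_ratio(semitones)
    out: List[float] = []
    pos = 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append(x[i] * (1.0 - frac) + nxt * frac)
        pos += ratio
    return out
```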

(3) In the above embodiment, the sound effect component is emitted only when all of the notification data received from the other terminals 10 are "OK", but the rule for deciding whether to emit it is not limited to this. For example, the sound effect component may be emitted when a majority of the notification data received from the other terminals 10 are "OK", or when the number of "OK" items is equal to or greater than a predetermined threshold. In short, the sound effect component signal need only be output when at least one of the number and the proportion of the other terminals 10 whose continuity is OK satisfies a predetermined condition.
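The alternative emission conditions in modification (3) can be expressed as interchangeable predicates over the per-terminal results. The rule names (`all`, `majority`, `at_least`) are hypothetical labels for the three examples in the text, and returning False when no terminals are connected is an added assumption.

```python
from typing import Dict


def should_emit(notifications: Dict[str, bool], rule: str = "all",
                min_count: int = 1) -> bool:
    """Decide whether to emit the sound effect component.

    rule="all":      every other terminal is OK (the embodiment's rule)
    rule="majority": more than half of the terminals are OK
    rule="at_least": at least min_count terminals are OK (threshold rule)"""
    total = len(notifications)
    ok = sum(1 for v in notifications.values() if v)
    if total == 0:
        return False  # assumption: nothing to report without listeners
    if rule == "all":
        return ok == total
    if rule == "majority":
        return 2 * ok > total
    if rule == "at_least":
        return ok >= min_count
    raise ValueError(f"unknown rule: {rule}")
```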

In the above embodiment, the terminal 10 may also vary the generated sound effect component signal according to the number or proportion of terminals 10 whose notification data is "OK". For example, the notification sound synthesis unit 13 may generate the sound effect component signal so that the acoustic effect becomes stronger (the effect becomes richer) as the number of "OK" notification data items increases, that is, as more people are listening. Because the spatial effect applied to the speaker's voice then becomes richer as more people listen, the speaker can speak more comfortably the larger the audience. As further examples, the terminal 10 may increase the sound pressure of the effect as the number of terminals 10 whose notification data is "OK" increases, may emit a chord corresponding to that number, or may emit a higher-pitched effect as the proportion of such terminals increases.
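One possible parameter mapping for the examples above (richer effect, louder effect, a chord sized by the number of "OK" terminals, higher pitch for a higher "OK" proportion) is sketched below. All constants, including the major-chord stack in `CHORD_TONES`, are arbitrary illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical chord stack (semitone offsets); one tone per "OK" terminal.
CHORD_TONES = [0, 4, 7, 12, 16, 19]


@dataclass
class EffectParams:
    wet_gain: float      # louder effect as more listeners report "OK"
    feedback: float      # longer reverb tail, i.e. a "richer" effect
    chord: List[int]     # semitone offsets, one chord tone per OK terminal
    pitch_offset: float  # semitones; higher as the OK proportion grows


def effect_params(notifications: Dict[str, bool]) -> EffectParams:
    total = len(notifications)
    ok = sum(1 for v in notifications.values() if v)
    ratio = ok / total if total else 0.0
    return EffectParams(
        wet_gain=0.1 + 0.4 * ratio,
        feedback=0.5 + 0.3 * ratio,
        chord=CHORD_TONES[:ok],
        pitch_offset=12.0 * ratio,
    )
```

Keeping the feedback below 1.0 at full attendance keeps a comb-filter reverb stable while still lengthening the tail.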

In the above embodiment, the terminal 10 may also emit audience noise or whistling instead of the sound effect component when few people are listening (when many notification data items indicate "NG").
In the above embodiment, the sound effect is emitted for "continuity OK", but this is not limiting; for example, the sound effect may instead be emitted when "continuity NG" is determined, and the behavior can be changed as appropriate according to the design of the terminal 10 and so on. What matters is that the listeners' listening status (whether they are listening to the speaker, whether voice communication has been established normally, etc.) is reported to the speaker.

(4) In the above embodiment, notification data indicating that the user pressed the dedicated button B1 is used, but the notification data is not limited to this. For example, the notification sound synthesis unit 13 may generate notification data by analyzing audio data representing the voice collected by the microphone MC of the terminal 10. The terminal 10 may also use, as notification data, data indicating whether the audio data it transmitted is being reproduced normally at the other terminals. In short, any information may be used as long as it indicates that the other party is properly listening (or that communication is connected normally).

(5) In the above embodiment, whether continuity is OK is determined from the values of the notification data received from the other terminals 10, but the determination method is not limited to this. For example, the audio data received from the other terminals 10 may be analyzed and the determination made according to the analysis result. More specifically, the sound pressure of the audio data received from another terminal 10 may be detected, and continuity may be determined to be OK when the detected sound pressure satisfies a predetermined condition.
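For the sound-pressure variant, a common measure is the RMS level in dB relative to full scale (dBFS). The sketch assumes float samples with full scale 1.0 and treats "level at or above a threshold" (here a hypothetical -50 dBFS) as the predetermined condition; the source leaves both the measure and the condition unspecified.

```python
import math
from typing import Sequence


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (1.0); -inf for silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def continuity_ok_by_level(samples: Sequence[float],
                           threshold_dbfs: float = -50.0) -> bool:
    """Continuity OK if the received audio is at or above the threshold level."""
    return level_dbfs(samples) >= threshold_dbfs
```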

As another example, the determination may be made according to whether a nodding sound is detected in the audio data received from the other terminals 10. An example of a specific configuration for this case will be described with reference to FIG. 5. In the terminal 10A shown in FIG. 5, the packet separation unit 11, the CODEC 12, the notification sound synthesis unit 13, the adder 15, the CODEC 16, and the packet synthesis unit 17 are the same as the corresponding units of the terminal 10 in the above-described embodiment, and their detailed description is omitted here. In FIG. 5, the memory 24 stores collation data representing nodding sounds or the features of nodding sounds. The voice analysis unit 23 collates the audio data received from the other terminals 10 with the collation data stored in the memory 24 and detects, in the received audio data, data whose collation result satisfies a predetermined condition (hereinafter, “nodding sound data”). When nodding sound data is detected at a predetermined frequency, the voice analysis unit 23 outputs conduction information indicating “OK” to the notification sound synthesis unit 13; otherwise, it outputs conduction information indicating “NG”. The operation of the notification sound synthesis unit 13 is the same as in the above-described embodiment, and its detailed description is omitted here.
That is, in this aspect, whether the other party is listening is determined by whether nodding sound data is detected. The nodding sound data may be detected by collation with the collation data as described above, or, alternatively, by analyzing the audio data received from the other terminals 10 and treating sounds that could not be recognized as speech as nodding sounds.
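The collation-plus-frequency rule of the voice analysis unit 23 can be sketched as a per-frame match followed by a sliding-window count. In this sketch the collation data is assumed to be a single fixed-length feature vector, the match is cosine similarity against a hypothetical threshold of 0.9, and "a predetermined frequency" is modeled as at least `min_detections` matches within the last `window_frames` frames. `NodSoundDetector` and all of its parameters are hypothetical; the text does not specify the features or the matching method.

```python
import math
from collections import deque
from typing import Deque, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class NodSoundDetector:
    """Hypothetical stand-in for voice analysis unit 23 reading collation data from memory 24."""

    def __init__(self, template: Sequence[float], match_threshold: float = 0.9,
                 window_frames: int = 50, min_detections: int = 1) -> None:
        self.template = list(template)          # collation data (assumed feature vector)
        self.match_threshold = match_threshold  # assumed condition on the collation result
        self.min_detections = min_detections    # assumed "predetermined frequency"
        self.history: Deque[bool] = deque(maxlen=window_frames)

    def push(self, frame_features: Sequence[float]) -> str:
        """Feed one frame's features; return conduction information "OK" or "NG"."""
        hit = cosine_similarity(frame_features, self.template) >= self.match_threshold
        self.history.append(hit)  # oldest result drops out once the window is full
        return "OK" if sum(self.history) >= self.min_detections else "NG"


detector = NodSoundDetector(template=[1.0, 0.0, 1.0], window_frames=3)
for features in ([0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [0.0, 1.0, 0.0]):
    print(detector.push(features))  # prints NG, then OK, then OK
```

A bounded `deque` keeps the frequency check constant in memory regardless of call length; the alternative mentioned in the text (treating unrecognizable speech as nodding sounds) would replace the similarity test with a speech recognizer's rejection output.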

(6) In the above-described embodiment, a nodding motion may also be detected by image analysis. In this case, the terminal 10 is provided with imaging means for capturing the user; the terminal 10 analyzes the video data output from the imaging means, detects nodding motions according to the analysis result, and determines whether a nodding motion has been detected.
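As a rough illustration of detecting a nodding motion from video, the sketch below assumes that some face tracker (not specified in the text) already supplies the vertical pixel position of the user's face for each frame, with y increasing downward as in standard image coordinates. It treats a nod as a downward movement of at least `min_drop` pixels followed by a return close to the starting height within `max_frames` frames. The function name and both thresholds are hypothetical.

```python
from typing import Sequence


def detect_nod(face_y: Sequence[float], min_drop: float = 10.0, max_frames: int = 15) -> bool:
    """Return True if, within max_frames frames of some starting frame, the face moves
    down by at least min_drop pixels and then comes back to within min_drop / 2 of
    its starting height."""
    n = len(face_y)
    for start in range(n):
        base = face_y[start]
        went_down = False
        for i in range(start + 1, min(n, start + max_frames + 1)):
            # Image y grows downward, so the first half of a nod increases y.
            if face_y[i] - base >= min_drop:
                went_down = True
            elif went_down and face_y[i] - base <= min_drop / 2:
                return True  # down-then-up movement completed
    return False


print(detect_nod([100, 100, 105, 112, 115, 108, 101, 100]))  # True
print(detect_nod([100] * 20))                                 # False
```

Requiring the return movement distinguishes a nod from the user simply looking down and staying there.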

(7) In the above-described embodiment, the terminal 10 may pan the emitted sound effect according to the position information of the other terminals 10. In this case, position data indicating the position of each of the other terminals 10 is stored for each terminal, and the terminal 10 controls sound image localization so that the sound is localized in a direction corresponding to the determination result for each terminal 10 and that terminal's position information.
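A simple way to localize the sound effect per terminal is stereo panning. The sketch below swaps in a standard constant-power pan law as one possible localization method, assumes the stored position data is an azimuth in degrees per terminal (-90 = hard left, +90 = hard right), and mixes the effect only for terminals whose determination result is OK. The terminal IDs, `pan_gains`, and `render_effect` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power pan law: -90 deg = hard left, 0 = centre, +90 = hard right."""
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2)  # maps to 0 .. pi/2
    return math.cos(theta), math.sin(theta)      # (left gain, right gain)


def render_effect(effect: Sequence[float], ok_by_terminal: Dict[str, bool],
                  azimuth_by_terminal: Dict[str, float]) -> List[Tuple[float, float]]:
    """Mix one copy of the effect per OK terminal, each panned to that terminal's direction."""
    left = [0.0] * len(effect)
    right = [0.0] * len(effect)
    for terminal, ok in ok_by_terminal.items():
        if not ok:
            continue  # no effect from terminals whose determination is NG
        gl, gr = pan_gains(azimuth_by_terminal.get(terminal, 0.0))  # unknown position: centre
        for i, s in enumerate(effect):
            left[i] += gl * s
            right[i] += gr * s
    return list(zip(left, right))


stereo = render_effect([1.0, 0.5], {"A": True, "B": False}, {"A": -90.0, "B": 90.0})
print(stereo)  # only terminal A contributes, entirely in the left channel
```

The constant-power law keeps the perceived loudness roughly steady as a source moves across the stereo field; a multi-speaker or binaural renderer could replace `pan_gains` without changing the per-terminal gating.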

(8) In the above-described embodiment, the terminal 10 may detect a speaker's gesture and emit a sound effect corresponding to the detection result. Since the meaning of a gesture differs from country to country, a different database may be used for each country. Specifically, the terminal 10 is provided with imaging means for capturing the user; the terminal 10 analyzes the video data output from the imaging means, detects the speaker's gesture by comparing the analysis result with collation data registered in a predetermined database, identifies the sound effect corresponding to the detected gesture by referring to a database that stores the correspondence between gestures and sound effects, and emits the identified sound effect.
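The per-country lookup step can be sketched as a table of tables. This assumes gesture recognition has already produced a label string; the country codes, gesture labels, and effect names below are placeholders rather than real correspondences, and falling back to a default table when a country has no entry is an assumption.

```python
from typing import Dict, Optional

# Gesture label -> sound effect name, one table per country (placeholder contents).
GestureTable = Dict[str, str]


def select_effect(gesture: str, country: str, databases: Dict[str, GestureTable],
                  default_country: str = "default") -> Optional[str]:
    """Return the sound effect for a detected gesture, or None if the gesture is unmapped."""
    table = databases.get(country, databases.get(default_country, {}))
    return table.get(gesture)


databases = {
    "default": {"nod": "chime"},              # hypothetical fallback table
    "XX": {"nod": "bell", "wave": "whoosh"},  # hypothetical country-specific table
}
print(select_effect("nod", "XX", databases))    # bell
print(select_effect("nod", "YY", databases))    # chime
print(select_effect("shrug", "XX", databases))  # None
```

Keeping one table per country lets the same detected gesture map to different sound effects, which is the reason the text gives for separate databases.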

(9) Each unit of the terminal 10 or the terminal 10A in the above-described embodiment may be configured as hardware, or may be realized as software by a control unit such as a CPU (Central Processing Unit) executing a computer program stored in a storage means. In this case, the program executed by the control unit may be provided recorded on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk, etc.), an optical recording medium (optical disc, etc.), a magneto-optical recording medium, or a semiconductor memory. The program may also be downloaded to each device via a network such as the Internet.

FIG. 1 is a block diagram showing an example of the configuration of the remote conference system.
FIG. 2 is a block diagram showing an example of the configuration of a terminal.
FIG. 3 is a block diagram showing an example of the configuration of the notification sound synthesis unit.
FIG. 4 is a diagram for explaining the processing performed by the synthesized sound generation unit.
FIG. 5 is a block diagram showing an example of the configuration of a terminal.

Explanation of symbols

1 ... remote conference system, 10 ... terminal, 11 ... packet separation unit, 12 ... CODEC, 13 ... notification sound synthesis unit, 14 ... audio mixer, 15 ... adder, 16 ... CODEC, 17 ... packet synthesis unit, 18 ... operation unit, 20 ... communication network, 22 ... data reception unit, 23 ... voice analysis unit, 24 ... memory, 131 ... conduction determination unit, 132 ... switch unit, 133 ... synthesized sound generation unit.

Claims

A communication apparatus comprising:
receiving means for receiving data transmitted from another terminal connected via a communication network;
transmitting means for transmitting, to the other terminal, audio data representing sound collected by sound collecting means;
sound effect component generating means for generating, using the audio data representing the sound collected by the sound collecting means, a sound effect component signal representing a sound effect component of that sound;
determining means for analyzing the data received by the receiving means and determining, according to the analysis result, whether the data satisfies a predetermined condition; and
sound emission control means for outputting the sound effect component signal generated by the sound effect component generating means to sound emitting means when the determination result of the determining means is affirmative.

The receiving means receives data from a plurality of terminals;
the determining means analyzes, for each terminal, the data received by the receiving means and determines, according to the analysis result for each terminal, whether the data satisfies a predetermined condition; and
the sound effect component generating means varies the sound effect component signal it generates according to at least one of the number and the ratio of terminals for which the determination result of the determining means is affirmative. The communication apparatus as described in.

The communication apparatus according to claim 1, wherein:
the receiving means receives data from a plurality of terminals;
the determining means analyzes, for each terminal, the data received by the receiving means and determines, according to the analysis result for each terminal, whether the data satisfies a predetermined condition; and
the sound emission control means outputs the sound effect component signal to the sound emitting means when at least one of the number and the ratio of terminals for which the determination result of the determining means is affirmative satisfies a predetermined condition.

The communication apparatus according to claim 2 or 3, further comprising position data storage means for storing, for each of the plurality of terminals, position data indicating the position of that terminal,
wherein the sound emission control means performs control so that the sound emitted from the sound emitting means is localized in a direction corresponding to the position data stored in the position data storage means and the determination result for each terminal by the determining means.

The communication apparatus according to claim 2, further comprising sound emission mode data storage means for storing, for each terminal, sound emission mode data indicating a mode of sound emission by the sound emitting means,
wherein the sound emission control means causes sound to be emitted in the sound emission mode indicated by the sound emission mode data corresponding to a terminal for which the determination result of the determining means is affirmative.

The communication apparatus according to claim 1, wherein the determining means extracts, from the data received by the receiving means, notification data indicating whether the audio data transmitted by the transmitting means is being reproduced normally at the other terminal, and makes the determination according to the extracted notification data.

The communication apparatus according to any one of the claims, wherein the determining means detects the sound pressure of the audio data received by the receiving means and determines whether the detected sound pressure satisfies a predetermined condition.

The communication apparatus according to any one of claims 1 to 5, wherein:
the receiving means receives data including audio data from the other terminal; and
the determining means analyzes the audio data included in the data received by the receiving means and makes the determination according to the analysis result.

The communication apparatus according to claim 8, further comprising collation data storage means for storing collation data representing voices or voice characteristics,
wherein the determining means collates the audio data received by the receiving means with the collation data stored in the collation data storage means and determines whether the received audio data includes audio data whose collation result satisfies a predetermined condition.