JP2015069136A - Communication conference device having sound volume adjustment function for each speaker

Info

Publication number: JP2015069136A
Application number: JP2013205220A
Authority: JP
Inventors: Tsuyoshi Tsuchiya (土屋 剛志)
Original assignee: Nakayo Inc
Current assignee: Nakayo Inc
Priority date: 2013-09-30
Filing date: 2013-09-30
Publication date: 2015-04-13

Abstract

PROBLEM TO BE SOLVED: To provide a communication conference apparatus capable of adjusting, for each speaker, the transmission volume level of the audio signal it transmits to a remote site, in response to a request from that remote site.

SOLUTION: The communication conference apparatus comprises: speaker identification means for identifying a speaker in a communication conference; and volume adjustment request signal reception means for receiving, from a remote site, a volume adjustment request signal requesting that the transmission volume level of the audio signal transmitted by the apparatus be raised or lowered. When the volume adjustment request signal is received while the audio signal is being transmitted to the remote site, the transmission volume level of the audio signal sent to the remote site is raised or lowered by a predetermined amount according to the content of the request.

Description

The present invention relates to a communication conference apparatus having a function of automatically adjusting, for each speaker, the transmission volume level of the audio signal transmitted to a remote site.

In a communication conference apparatus, when there are many participants and fewer microphones than participants, the transmission volume level of the audio signal sent to another site fluctuates with the distance between the speaking participant (the speaker) and the microphone. Speakers also differ individually in how loudly they talk. As a result, in communication conferences with many participants, some speakers are often difficult to hear.

One way to address this is a technique that locates the speaker using a device that emits electromagnetic waves or the like and a device that detects them (for example, Patent Document 1). Locating the speaker with this technique makes it possible to send the audio signal to a remote site at a volume level appropriate for each speaker.

However, the technique described in Patent Document 1 requires special equipment (the emitting device and the detecting device) to locate the speaker, and is therefore expensive.

Moreover, even if the communication conference apparatus sends the audio signal to a remote site at an appropriate transmission volume level, a particular speaker's voice may still be hard to hear depending on the environment at the receiving site or on the loudness and tone of the speaker's voice (for example, a low-pitched voice received in an environment with low-frequency noise). In such cases, it is desirable that the transmission volume level of the audio signal sent from the communication conference apparatus to a remote site be adjustable for each speaker in response to a request from that remote site.

Patent Document 1: JP 2011-199763 A

Accordingly, an object of the present invention is to provide a communication conference apparatus that requires no special equipment for locating the speaker and that can adjust, for each speaker, the transmission volume level of the audio signal sent from the apparatus to a remote site in response to a request from that remote site.

To solve the above problem, the present invention is a communication conference apparatus that processes an audio signal input from a microphone, connected to the apparatus, that picks up the voices of a plurality of speakers, or an audio signal received from one or more remote sites, and transmits the processed audio signal to a remote site. The apparatus comprises: audio signal transmission/reception means for transmitting audio signals to, or receiving them from, each of the remote sites via a network; speaker identification means for identifying the speaker of the audio signal input from the microphone; speaker-specific volume level storage means for storing information on the volume level of each speaker identified by the speaker identification means; speaker-specific volume level adjustment means for adjusting the volume level of the audio signal transmitted to a remote site by referring to the speaker-specific volume level storage means; and volume adjustment request signal reception means for receiving, from a remote site, a volume adjustment request signal requesting that the volume level of the audio signal transmitted by the apparatus be raised or lowered. When the volume adjustment request signal reception means receives the volume adjustment request signal from a remote site while the audio signal input from the microphone is being transmitted to that remote site, the speaker-specific volume level adjustment means raises or lowers the volume level of the audio signal transmitted to the remote site by a predetermined amount according to the content of the received request, and stores information on the raised or lowered volume level in the speaker-specific volume level storage means in association with the speaker identified by the speaker identification means.

According to the present invention, no special equipment is needed to locate the speaker, and the transmission volume level of the audio signal sent to a remote site is adjusted for each speaker in response to requests from the remote site that actually receives the speaker's voice, so the level can be adjusted appropriately with the receiving environment taken into account. In addition, because the adjustment result is stored and can be reused as is in the next communication conference with the same site, only minor readjustment is needed.

FIG. 1: Block diagram of the internal configuration of the communication conference apparatus 1 according to the present invention.
FIG. 2: Example of the conference room at site A, where the communication conference terminal 2A is installed.
FIG. 3: Example of the data stored in the speaker-specific volume level storage unit 17.
FIG. 4: Operation flowchart of the communication conference apparatus 1 according to the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an example block diagram of a communication conference apparatus 1 with a speaker-specific volume adjustment function to which the present invention is applied (hereinafter, "this apparatus"). This apparatus 1 is connected to the communication conference terminal 2A via a LAN (Local Area Network) 3 and to the communication conference terminals 2B and 2C via a WAN (Wide Area Network) 4. This apparatus 1 controls conference calls and relays video and audio signals for the communication conference terminals 2A, 2B, and 2C via the LAN 3 or the WAN 4. For simplicity, this embodiment is described with three communication conference terminals 2A, 2B, and 2C connected to this apparatus 1, but the number is not limited to three.

This apparatus 1 comprises a communication control unit 11, a video signal transmission/reception unit 12, a speaker identification unit 13, an audio signal transmission/reception unit 14, a speaker-specific volume level adjustment unit 15, a volume adjustment request signal receiving unit 16, and a speaker-specific volume level storage unit 17.

The communication control unit 11 establishes conference calls in response to requests from the communication conference terminals 2A, 2B, and 2C, and exchanges video and audio signals with each communication conference terminal.

The video signal transmission/reception unit 12 relays the video signals received from the communication conference terminals 2A, 2B, and 2C so that they can be transmitted to the remote sites.

The speaker identification unit 13 identifies the speaker in the communication conference by analyzing the video signal from the camera that captures the speakers, received from each communication conference terminal via the video signal transmission/reception unit 12.

General face recognition techniques can be applied to the video analysis that identifies the speaker. For example, facial feature points are extracted, their positional relationships are analyzed, and the result is matched against pre-registered data for each speaker (not shown) to identify the speaker.

Furthermore, if several people appear in the camera image, mouth movement is detected and the person whose mouth is moving is judged to be the speaker.

The audio signal transmission/reception unit 14 relays the audio signals received from the communication conference terminals 2A, 2B, and 2C so that they can be transmitted to the remote sites.

The volume adjustment request signal receiving unit 16 receives, from a remote site, a volume adjustment request signal requesting that the transmission volume level of the audio signal sent by this apparatus 1 be raised or lowered.

The speaker-specific volume level storage unit 17 stores the transmission volume level for each speaker identified by the speaker identification unit 13.

The speaker-specific volume level adjustment unit 15 retrieves the transmission volume level of the speaker identified by the speaker identification unit 13 from the speaker-specific volume level storage unit 17 and instructs the audio signal transmission/reception unit 14 to use that transmission volume level for the audio signal.
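The description does not say how a transmission volume level expressed in dB is applied to the audio itself. As a minimal sketch, assuming the level is an amplitude offset under the usual 20·log10 convention and that audio frames are lists of float samples in [-1.0, 1.0], the scaling might look like this (`db_to_gain` and `apply_level` are hypothetical names, not taken from the patent):

```python
def db_to_gain(level_db: float) -> float:
    """Convert a level offset in dB to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (level_db / 20.0)


def apply_level(samples, level_db):
    """Scale float samples in [-1.0, 1.0] by the given dB offset, clipping to that range."""
    gain = db_to_gain(level_db)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


# Hypothetical frame of audio from a quiet speaker, boosted by +6 dB
# (roughly a doubling of amplitude).
frame = [0.10, -0.20, 0.05]
boosted = apply_level(frame, 6.0)
```

Clipping is one simple way to keep boosted samples in range; a real implementation might prefer a limiter, which is outside what the description covers.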

In addition, when the volume adjustment request signal receiving unit 16 receives a volume adjustment request signal from a remote site while the audio signal transmission/reception unit 14 is transmitting an audio signal, the speaker-specific volume level adjustment unit 15 issues an instruction to raise or lower the transmission volume level of the audio signal sent to the remote site by a predetermined amount according to the content of the request, and stores information on the raised or lowered transmission volume level in the speaker-specific volume level storage unit 17 in association with the speaker identified by the speaker identification unit 13.

The communication conference terminals 2A, 2B, and 2C are installed at the sites participating in the communication conference. Each communication conference terminal in this embodiment is a video conference terminal comprising a communication control unit 21, a video signal input unit 22, a video signal output unit 23, an audio signal input unit 24, an audio signal output unit 25, an operation unit 26, a camera 27, a monitor 28, a microphone 29, and a loudspeaker 30. The operation unit 26 is used for operations such as starting and ending a communication conference.

The video signal input unit 22 receives the video signal from the camera 27, and the input video signal is sent to this apparatus 1 via the communication control unit 21. The audio signal input unit 24 receives the audio signal from the microphone 29, and the input audio signal is sent to this apparatus 1 via the communication control unit 21.

The communication control unit 21 also receives the video and audio signals of the remote sites relayed by this apparatus 1. The video signal is output to the monitor 28 via the video signal output unit 23, and the audio signal is output to the loudspeaker 30 via the audio signal output unit 25.

FIG. 2 shows an example of the conference room at site A, where the communication conference terminal 2A is installed. The communication conference terminal has a single microphone to pick up speech from several conference participants (participants A-1, A-2, and A-3). Participants A-1 and A-3 are seated close to the microphone, while participant A-2 is seated far from it.

FIG. 3 shows an example of the data stored in the speaker-specific volume level storage unit 17. For each participant (column 202) at each site (column 201) taking part in the communication conference, the transmission volume level (column 203) of the audio signal sent to the remote sites is stored.

The speaker-specific volume level storage unit 17 can store the transmission volume level of each participant as registered from the operation unit 26 of a communication conference terminal 2.

If no transmission volume level has been registered for a participant from the operation unit 26 of the communication conference terminal 2, the initial value of the transmission volume level (column 203) may be set to 0, or the participant's position may be determined from the video captured by the camera 27 of the communication conference terminal 2 and the initial value set automatically according to that position.

In FIG. 3, participant A-2 (row 212), who is attending from the conference room at site A shown in FIG. 2, where the communication conference terminal 2A is installed, is seated far from the microphone. The transmission volume level (column 203) of the audio signal sent to each site is therefore stored as +3 dB for site B and +3 dB for site C.
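The table of FIG. 3 can be pictured as a mapping from (site, participant) to a per-destination level. The sketch below is only an illustrative in-memory stand-in for the speaker-specific volume level storage unit 17; the class name, method names, and the choice of a dictionary are assumptions, and only the A-2 values (+3 dB to sites B and C) and the initial value 0 come from the description.

```python
DEFAULT_LEVEL_DB = 0.0  # initial value 0, as allowed by the description


class SpeakerVolumeStore:
    """Hypothetical stand-in for the speaker-specific volume level storage unit 17."""

    def __init__(self):
        # {(site, participant): {destination site: transmission level in dB}}
        self._levels = {}

    def register(self, site, participant, levels_db=None):
        """Register a participant, optionally with per-destination levels."""
        self._levels[(site, participant)] = dict(levels_db or {})

    def is_registered(self, site, participant):
        return (site, participant) in self._levels

    def get(self, site, participant, destination):
        """Return the stored level, or the default when nothing is stored."""
        entry = self._levels.get((site, participant), {})
        return entry.get(destination, DEFAULT_LEVEL_DB)

    def set(self, site, participant, destination, level_db):
        self._levels.setdefault((site, participant), {})[destination] = level_db


store = SpeakerVolumeStore()
store.register("A", "A-2", {"B": 3.0, "C": 3.0})  # row 212 of FIG. 3
store.register("A", "A-1")                         # no level registered: defaults to 0
```

Keying by destination site reflects that FIG. 3 holds a separate level per receiving site for the same speaker.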

FIG. 4 is an operation flowchart of the communication conference apparatus 1 according to the present invention. The flow starts at S300, when a communication conference begins. The flow is described below with reference to FIGS. 1 to 3 as well.

Processing for conference-related events such as starting and ending a communication conference is the same as in a general communication conference apparatus, so its details are omitted.

The speaker-specific volume level adjustment unit 15 monitors whether the audio in the audio signal transmission/reception unit 14 contains speech or is silent, that is, whether anyone is speaking. If someone is speaking (S302: YES), it checks whether the speaker identification unit 13 has identified the speaker.

If the speaker identification unit 13 has identified the speaker (S303: YES), the speaker-specific volume level adjustment unit 15 retrieves the volume level information for that speaker from the speaker-specific volume level storage unit 17 and instructs the audio signal transmission/reception unit 14 on the transmission volume level (S305).

For example, if the speaker identified by the speaker identification unit 13 is participant A-2 in FIG. 3, the speaker-specific volume level adjustment unit 15 extracts the transmission volume levels of the audio signal sent to the remote sites (column 203 in FIG. 3; site B = +3 dB, site C = +3 dB) from participant A-2's volume level information (row 212 in FIG. 3) in the speaker-specific volume level storage unit 17, and instructs the audio signal transmission/reception unit 14 on the transmission volume level.

The speaker identification unit 13 may be unable to identify the speaker when, for example, the speaker joined the conference partway through or has changed seats.

If the speaker has not been identified when the speaker-specific volume level adjustment unit 15 consults the speaker identification unit 13 (S303: NO), the unit judges that the current speaker is not stored in the speaker-specific volume level storage unit 17 and newly registers that speaker there (S304). The initial transmission volume level of a newly registered speaker may be set to 0, or the participant's position may be determined from the video captured by the camera 27 of the communication conference terminal 2 and the initial value of the transmission volume level (column 203) set automatically according to that position.

If the result at S302 is NO, the flow returns to S302 and monitoring continues until one of the conference participants starts speaking.

Next, if the volume adjustment request signal receiving unit 16 receives a volume adjustment request signal from a remote site while the audio signal transmission/reception unit 14 is relaying the audio signal of the speaker identified by the speaker identification unit 13 (S306: YES), the speaker-specific volume level adjustment unit 15 issues an instruction to raise or lower the transmission volume level of the audio signal sent to the remote site by a predetermined amount according to the content of the request (S307), and stores information on the raised or lowered transmission volume level in the speaker-specific volume level storage unit 17 in association with the speaker identified by the speaker identification unit 13 (S308).

Specifically, if the volume adjustment request signal receiving unit 16 receives a request from site B to raise the transmission volume level while the audio signal transmission/reception unit 14 is relaying the audio signal of speaker A-2, the speaker-specific volume level adjustment unit 15 instructs the audio signal transmission/reception unit 14 to raise the transmission volume level by the predetermined amount (for example, to change the transmission volume level for site B from +3 dB to +6 dB) and stores the new transmission volume level in the speaker-specific volume level storage unit 17.
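The A-2 example can be traced in a few lines. This is a sketch under stated assumptions: the step size of 3 dB is inferred from the "+3 dB to +6 dB" example rather than specified as the predetermined amount, and the function name, the `"up"`/`"down"` strings, and the dictionary layout are all hypothetical.

```python
STEP_DB = 3.0  # assumed "predetermined amount", inferred from the +3 dB -> +6 dB example


def handle_adjust_request(levels, speaker, requester, direction, step_db=STEP_DB):
    """Raise or lower the level sent to `requester` for `speaker` and store it.

    levels: {speaker: {destination site: level in dB}}, updated in place
            (corresponds to S307 followed by S308).
    Returns the new level for the requesting site.
    """
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    delta = step_db if direction == "up" else -step_db
    per_destination = levels.setdefault(speaker, {})
    per_destination[requester] = per_destination.get(requester, 0.0) + delta
    return per_destination[requester]


# Speaker A-2 is talking; site B asks for more volume.
levels = {"A-2": {"B": 3.0, "C": 3.0}}
new_level = handle_adjust_request(levels, "A-2", "B", "up")
```

Here only the requesting site's level changes; the description also allows other policies, which this sketch does not cover.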

If the result at S306 is NO, the flow proceeds to S309.

When the volume adjustment request signal receiving unit 16 receives a volume adjustment request signal from any remote site, the transmission volume level can be raised or lowered by the predetermined amount in the following ways:
・Change only the transmission volume level for the remote site that made the request.
(For example, raise by 3 dB only the transmission volume level of the audio signal sent to the requesting remote site.)
・Change the transmission volume levels for all remote sites at once.
(For example, raise by 3 dB the transmission volume level of the audio signal sent to every remote site.)
・Apply a larger change only to the transmission volume level for the remote site that made the request.
(For example, raise the transmission volume level of the audio signal sent to the requesting remote site by 6 dB and that of the audio signal sent to the other remote sites by 3 dB.)
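The three control methods listed above can be sketched as one function with a selectable policy. The policy names, the function signature, and the default of doubling the step for the requesting site are assumptions chosen to match the 3 dB and 6 dB examples; the description does not prescribe them.

```python
def adjust(levels, requester, delta_db, policy, boost_db=None):
    """Return new per-destination levels (dB) for the current speaker.

    levels:    {destination site: level in dB}
    requester: site that sent the volume adjustment request
    delta_db:  signed step, e.g. +3.0 for "up", -3.0 for "down"
    policy:    "requester_only", "all", or "weighted" (hypothetical names)
    boost_db:  step for the requester under "weighted"; defaults to 2 * delta_db
    """
    if policy == "requester_only":
        # Method 1: change only the requesting site's level.
        return {d: v + (delta_db if d == requester else 0.0) for d, v in levels.items()}
    if policy == "all":
        # Method 2: change every site's level by the same amount.
        return {d: v + delta_db for d, v in levels.items()}
    if policy == "weighted":
        # Method 3: larger change for the requester, normal step for the others.
        boost = 2 * delta_db if boost_db is None else boost_db
        return {d: v + (boost if d == requester else delta_db) for d, v in levels.items()}
    raise ValueError(f"unknown policy: {policy}")


current = {"B": 3.0, "C": 3.0}
only_b = adjust(current, "B", 3.0, "requester_only")
everyone = adjust(current, "B", 3.0, "all")
weighted = adjust(current, "B", 3.0, "weighted")
```

Returning a new dictionary instead of mutating the input keeps the three policies easy to compare side by side.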

Any one of the above methods may be used to raise or lower the transmission volume level by the predetermined amount, or this apparatus may provide a setting that lets the method be selected.

The fact that the transmission volume level has been raised or lowered by the predetermined amount in response to a volume adjustment request signal from a remote site may also be displayed on a monitor or the like at each site (not shown).

Because the data stored in the speaker-specific volume level storage unit 17 can be reused as is in the next communication conference with the same site, only minor readjustment is needed.

The speaker identification unit 13 monitors for the end of the utterance of the speaker identified at S303 and, when the utterance ends (S309: YES), notifies the speaker-specific volume level adjustment unit 15. If the result at S309 is NO, the flow returns to S306.

On receiving the notification from the speaker identification unit 13, the speaker-specific volume level adjustment unit 15 instructs the audio signal transmission/reception unit 14 to clear the transmission volume level (reset it to the initial value 0) (S310), and the flow returns to S302.
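The flow of FIG. 4 (S302 to S310) can be summarized as a small event loop. This is a rough sketch, not the apparatus's actual control logic: the event tuples, the `run_flow` name, the 3 dB step, and the assumption that a newly registered speaker (S304) continues on to S305 with a level of 0 are all illustrative choices.

```python
def run_flow(events, store, step_db=3.0):
    """Replay conference events and log the levels instructed to the audio unit.

    events: ("speech_start", speaker) | ("adjust", requester, "up" or "down") | ("speech_end",)
    store:  {speaker: {destination site: level in dB}}, updated in place
    """
    log = []
    current = None
    for event in events:
        kind = event[0]
        if kind == "speech_start":                    # S302: YES
            current = event[1]
            if current not in store:                  # S303: NO -> S304, register new speaker
                store[current] = {}                   # empty entry means level 0 everywhere
            log.append(("set", current, dict(store[current])))    # S305
        elif kind == "adjust" and current is not None:             # S306: YES
            _, requester, direction = event
            # Anything other than "up" is treated as a request to lower the level.
            delta = step_db if direction == "up" else -step_db
            store[current][requester] = store[current].get(requester, 0.0) + delta  # S307, S308
            log.append(("set", current, dict(store[current])))
        elif kind == "speech_end" and current is not None:         # S309: YES
            log.append(("clear", current, {}))                     # S310: reset to 0
            current = None
    return log


store = {"A-2": {"B": 3.0, "C": 3.0}}
log = run_flow(
    [("speech_start", "A-2"), ("adjust", "B", "up"), ("speech_end",), ("speech_start", "X")],
    store,
)
```

Note that the stored level for A-2 survives the clear at S310, which is what lets the next utterance, or the next conference, start from the adjusted value.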

As described above, the present invention provides a communication conference apparatus with a speaker-specific volume adjustment function in which, when a particular speaker's voice is hard to hear at a remote site receiving it, a simple operation at that remote site can adjust the transmission volume level of that speaker's audio signal sent from this apparatus to the remote site.

In this embodiment, the speaker is identified by analyzing the video signal from the camera that captures the speaker, but the present invention is not limited to this. For example, the received audio signal may be analyzed to extract a voiceprint, which is then matched against the pre-registered voiceprint of each speaker to identify the speaker. Alternatively, the side requesting the volume level adjustment may specify an ID that identifies the speaker.

Description of symbols

1 ... Communication conference apparatus
11 ... Communication control unit
12 ... Video signal transmission/reception unit
13 ... Speaker identification unit
14 ... Audio signal transmission/reception unit
15 ... Speaker-specific volume level adjustment unit
16 ... Volume adjustment request signal receiving unit
17 ... Speaker-specific volume level storage unit

2A ... Communication conference terminal
2B ... Communication conference terminal
2C ... Communication conference terminal

21 ... Communication control unit
22 ... Video signal input unit
23 ... Video signal output unit
24 ... Audio signal input unit
25 ... Audio signal output unit
26 ... Operation unit
27 ... Camera
28 ... Monitor
29 ... Microphone
30 ... Loudspeaker

3 ... LAN
4 ... WAN

Claims

A communication conference apparatus that processes an audio signal input from a microphone, connected to the apparatus, that picks up the voices of a plurality of speakers, or an audio signal received from one or more remote sites, and transmits the processed audio signal to a remote site, the apparatus comprising:
audio signal transmission/reception means for transmitting audio signals to, or receiving them from, each of the remote sites via a network; speaker identification means for identifying the speaker of the audio signal input from the microphone; speaker-specific volume level storage means for storing information on the volume level of each speaker identified by the speaker identification means; speaker-specific volume level adjustment means for adjusting the transmission volume level of the audio signal transmitted to a remote site by referring to the speaker-specific volume level storage means; and volume adjustment request signal reception means for receiving, from a remote site, a volume adjustment request signal requesting that the transmission volume level of the audio signal transmitted by the apparatus be raised or lowered,
wherein, when the volume adjustment request signal reception means receives the volume adjustment request signal from a remote site while the audio signal input from the microphone is being transmitted to that remote site,
the speaker-specific volume level adjustment means raises or lowers the transmission volume level of the audio signal transmitted to the remote site by a predetermined amount according to the content of the received volume adjustment request signal, and stores information on the raised or lowered transmission volume level in the speaker-specific volume level storage means in association with the speaker identified by the speaker identification means, the apparatus thereby having a speaker-specific volume adjustment function.

The communication conference apparatus having a speaker-specific volume adjustment function according to claim 1, further comprising:
video signal input means for inputting a video signal from a video camera that captures a plurality of speakers; and video signal transmission means for processing the input video signal and transmitting it to a remote site,
wherein the speaker identification means identifies the speaker of the audio signal by analyzing the video signal input from the video camera or by analyzing the angle of the video camera.