JPH10136101A - Video conference system

Info

Publication number: JPH10136101A
Application number: JP8285035A
Authority: JP
Inventors: Teruo Katsumi; 輝男勝見
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-10-28
Filing date: 1996-10-28
Publication date: 1998-05-22

Abstract

PROBLEM TO BE SOLVED: To provide a video conference system in which, during a multipoint video conference, the transmission voice levels of the conference terminals can be adjusted collectively from a conference terminal. SOLUTION: The control request section 35-i of any conference terminal 32-i issues a request for voice level adjustment. On receiving the request from conference terminal 32-i, a control right permission section 48 decides whether to permit it and sends the result to conference terminal 32-i. The conference terminal 32-i whose request is permitted uses its control information setting section 36-i to instruct adjustment of the voice level in the voice processing section 43-i of any conference terminal. The voice signals received from conference terminals 32-1 to 32-n are fed to voice processing sections 43-1 to 43-n respectively, where the volume level is adjusted and each signal is classified as voiced or silent; a talker discrimination section 45 then identifies the talker. A video processing section 46 sends the video from the conference terminal identified as the talker to all conference terminals.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a video conference system connecting a plurality of sites, and in particular to a video conference system that adjusts, for example, the input volume from conference terminals.

[0002]

2. Description of the Related Art

A video conference system is a system for holding a conference using video and sound via television. In a multipoint video conference system, the conference is generally held by connecting a plurality of conference terminals in a star configuration centered on a multipoint control device. The multipoint control device edits the audio and video sent from each conference terminal so that the participants appear to be present in the same conference room, and distributes the result to the conference terminals. Some multipoint control devices use voice switching: the current speaker is determined from the level of the audio input from each conference terminal, and the distributed video is switched to the video from the conference terminal determined to be the speaker. In this case, if speech levels differ between terminals, voice switching does not select the speaker's video properly. To make voice switching work properly, it has conventionally been necessary to adjust the transmission audio level at each conference terminal individually. Multipoint video conference systems that perform voice switching are described in, for example, Japanese Patent Application Laid-Open Nos. 04-35447 and 04-371058. In the system described in Japanese Patent Application Laid-Open No. 04-35447, to make voice switching appropriate, the threshold used to decide whether input audio is regarded as voiced or unvoiced is set from each conference terminal according to the conditions of the conference room in which that terminal is installed.

[0003] FIG. 6 shows an outline of the multipoint video conference system described in Japanese Patent Application Laid-Open No. 04-35447. In this system, conference terminals 12-1 to 12-n are connected to a multipoint control device 11. In the figure, the audio processing units 15-1 to 15-n, enclosed by dash-dot lines, include audio decoding units 20-1 to 20-n, audio encoding units 21-1 to 21-n, and level detection units 22-1 to 22-n. An audio signal input from conference terminal 12-i (hereinafter, i denotes any integer from 1 to n) passes through the line interface 13, is decoded by the audio decoding unit 20-i, and is summed in the audio adding unit 16. The summed audio signal is encoded by the audio encoding unit 21-i and transmitted to each conference terminal 12-i. A threshold for deciding whether the input audio signal is voiced or unvoiced is input to each audio processing unit 15-i from the corresponding conference terminal 12-i. The level detection unit 22-i calculates the average power of the audio signal decoded by the audio decoding unit 20-i; if the average power is equal to or greater than the threshold, the input audio is judged to be voiced. The speaker determination unit 17 selects, from among the audio signals judged voiced by the level detection units 22-i, the conference terminal with the highest audio level value as the speaker. Based on the result from the speaker determination unit 17, the control unit 14 controls the video processing unit 18 so that the video from the conference terminal determined to be the speaker is transmitted to all terminals participating in the conference. Thus, in this conventional example, the voiced/unvoiced threshold of each level detection unit 22-i is set from the corresponding conference terminal, so that voice-activated video switching reflects the conditions of the conference room in which each video conference terminal is installed.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

As described above, the conventional video conference system has the problem that, to make voice switching work properly, the transmission audio level must be adjusted at each conference terminal. Moreover, because each conference terminal adjusts its own transmission audio level, it is difficult to balance the transmission audio levels among the conference terminals.

[0005] Further, in the video conference system described in Japanese Patent Application Laid-Open No. 04-35447, the threshold for distinguishing voiced from unvoiced audio is set separately for each conference terminal. Consequently, at a conference terminal whose threshold is set low, a participant is determined to be the conference speaker even when speaking quietly, and that terminal's video is shown on all conference terminals. Conversely, at a conference terminal whose threshold is set high, a participant may not be determined to be the speaker even when speaking louder than a participant at a terminal whose threshold is set low. The speaker determination is therefore biased.
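The bias described above can be illustrated with a small sketch. It assumes, as in the conventional example, that a terminal's audio counts as voiced when its level is at or above that terminal's threshold and that the loudest voiced terminal becomes the speaker. The function name `pick_speaker` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def pick_speaker(levels: Dict[str, float],
                 thresholds: Dict[str, float]) -> Optional[str]:
    """Return the loudest terminal among those judged voiced, or None."""
    # A terminal is voiced when its level reaches its own threshold.
    voiced = {t: lv for t, lv in levels.items() if lv >= thresholds[t]}
    if not voiced:
        return None
    return max(voiced, key=voiced.get)


# Hypothetical average-power readings (arbitrary units):
# the participant at terminal 12-2 speaks louder than the one at 12-1.
levels = {"12-1": 0.30, "12-2": 0.55}

# Terminal 12-1 set a low threshold, terminal 12-2 a high one.
per_terminal = {"12-1": 0.20, "12-2": 0.60}
biased = pick_speaker(levels, per_terminal)

# With one common threshold, the louder terminal is selected.
common = {"12-1": 0.20, "12-2": 0.20}
balanced = pick_speaker(levels, common)

print(biased, balanced)
```

With per-terminal thresholds the quieter terminal 12-1 is chosen because the louder terminal 12-2 never reaches its own higher threshold; with a common threshold the louder terminal is chosen.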

[0006] In addition, because the threshold is set at each conference terminal, it is difficult to set thresholds that are balanced across the system as a whole.

SUMMARY OF THE INVENTION

[0007] Accordingly, an object of the present invention is to provide a video conference system in which the levels of the audio transmitted from the conference terminals can be adjusted collectively from a conference terminal.

[0008] Another object of the present invention is to provide a video conference system that can determine the speaker and perform voice-activated video switching using audio signals whose levels have been adjusted.

[0009]

MEANS FOR SOLVING THE PROBLEMS

According to the invention of claim 1, a video conference system comprises: (a) a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of each terminal participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instructing means for instructing adjustment of the output level of the audio from any terminal; (b) audio supply means for summing the audio sent from the plurality of conference terminals and supplying it to each conference terminal; (c) video supply means for selecting a desired one of the videos sent from the plurality of conference terminals and supplying it to each conference terminal; and (d) audio level adjusting means for adjusting, in accordance with the instruction, the audio level of the conference terminal designated by the audio level adjustment instructing means of any of the plurality of conference terminals.

[0010] That is, in the invention of claim 1, the output level of the audio sent from the audio transmitting means of each conference terminal is adjusted by the audio level adjusting means. Every conference terminal has audio level adjustment instructing means, which can instruct the audio level adjusting means corresponding to any conference terminal to adjust the audio level.

[0011] According to the invention of claim 2, a video conference system comprises: (a) one or more first conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of each terminal participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instructing means for instructing adjustment of the output level of the audio from any terminal; (b) one or more second conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, and audio/video output means for receiving the audio and video of each terminal participating in the conference and outputting the state of the conference as audio and video; (c) audio supply means for summing the audio sent from the first and second conference terminals and supplying it to each of the first and second conference terminals; (d) video supply means for selecting a desired one of the videos sent from the first and second conference terminals and supplying it to each of the first and second conference terminals; and (e) audio level adjusting means for adjusting the audio level in accordance with the instruction of the audio level adjustment instructing means of a first conference terminal.

[0012] That is, in the invention of claim 2, the output level of the audio sent from the audio transmitting means of each conference terminal is adjusted by the audio level adjusting means. There are first conference terminals, which have audio level adjustment instructing means capable of instructing the audio level adjusting means corresponding to any conference terminal to adjust the audio level, and second conference terminals, which have no audio level adjustment instructing means.

[0013] According to the invention of claim 3, a video conference system comprises: (a) a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of each terminal participating in the conference and outputting the state of the conference as audio and video, and voiced/unvoiced threshold setting means for setting a threshold for deciding whether the output level of the audio from any terminal is regarded as voiced or unvoiced; (b) audio supply means for summing the audio sent from the plurality of conference terminals and supplying it to each conference terminal; and (c) voiced/unvoiced determination means for judging the audio from a conference terminal to be voiced if it is at or above the threshold set by the voiced/unvoiced threshold setting means of any of the plurality of conference terminals.

[0014] That is, in the invention of claim 3, every conference terminal has voiced/unvoiced threshold setting means. The voiced/unvoiced threshold setting means sets the threshold against which the output level of the audio sent from the audio transmitting means of any conference terminal is compared to decide whether it is voiced or unvoiced. The voiced/unvoiced determination means compares the threshold set by the voiced/unvoiced threshold setting means with the output level of the audio from the conference terminal, and judges the audio to be voiced if its output level is at or above the threshold. If the audio judged voiced by this means is used to select the video sent back to each conference terminal, the video selection can be controlled by changing the threshold from a conference terminal. Likewise, if the audio judged voiced is used as the audio summed for return to each conference terminal, the audio sent back to the conference terminals can be selected by changing the threshold.

[0015] According to the invention of claim 4, a video conference system comprises: (a) a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of each terminal participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instructing means for instructing adjustment of the output level of the audio from any terminal; (b) adjustment permitting means for selecting one conference terminal when adjustment instructions conflict; (c) audio level adjusting means for performing the adjustment as instructed by the selected conference terminal; (d) audio supply means for summing the audio sent from the plurality of conference terminals and supplying it to each conference terminal; and (e) video supply means for selecting a desired one of the videos sent from the plurality of conference terminals and supplying it to each conference terminal.

[0016] That is, in the invention of claim 4, the output level of the audio sent from the audio transmitting means of each conference terminal is adjusted by the audio level adjusting means. Every conference terminal has audio level adjustment instructing means, which can instruct the audio level adjusting means corresponding to any conference terminal to adjust the audio level. When a plurality of conference terminals instruct audio level adjustment and the instructions conflict, the adjustment permitting means selects one conference terminal.

[0017] According to the invention of claim 5, the video conference system comprises: (a) speaker determination means for determining as the speaker the conference terminal whose audio level is highest among the audio adjusted by the audio level adjusting means; and (b) video supply means for supplying to each conference terminal the video transmitted from the conference terminal determined to be the speaker by the speaker determination means.

[0018] That is, in the invention of claim 5, the speaker determination means determines as the speaker the conference terminal whose audio level is highest among the audio adjusted by the audio level adjusting means described above. The video supply means supplies the video transmitted from the conference terminal determined to be the speaker to each conference terminal. The selection of the video sent by the conference terminals can thus be controlled by adjusting audio levels from a conference terminal.
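As a rough illustration of how adjusting audio levels steers the speaker determination of claim 5, the sketch below models the level adjustment as a multiplicative gain per terminal. That gain model, the function name `determine_speaker`, and the numbers are assumptions made for illustration; the text does not specify how the adjustment is represented.

```python
from typing import Dict


def determine_speaker(raw_levels: Dict[str, float],
                      gains: Dict[str, float]) -> str:
    """Pick the terminal whose adjusted level is highest.

    Terminals without an entry in `gains` are left unadjusted (gain 1.0).
    """
    adjusted = {t: raw_levels[t] * gains.get(t, 1.0) for t in raw_levels}
    return max(adjusted, key=adjusted.get)


# Hypothetical levels: terminal 32-1 has a hot microphone.
raw = {"32-1": 0.8, "32-2": 0.5}

before = determine_speaker(raw, {})            # no adjustment applied
after = determine_speaker(raw, {"32-1": 0.5})  # 32-1 attenuated by half

print(before, after)
```

Attenuating the level of terminal 32-1 changes which terminal's video would be distributed, which is the control path the paragraph describes.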

[0019]

BEST MODE FOR CARRYING OUT THE INVENTION

[0020]

The present invention will be described in detail below with reference to an embodiment.

[0021] FIG. 1 shows an outline of a video conference system according to an embodiment of the present invention. In this system, conference terminals 32-1 to 32-n are connected to a multipoint control device 31. The multipoint control device 31 includes a line interface 41; a control unit 42; audio processing units 43-1 to 43-n, which process the audio signals of the corresponding conference terminals 32-1 to 32-n; an audio adding unit 44, which sums the audio signals from the audio processing units 43-1 to 43-n; a speaker determination unit 45, which determines as the speaker the conference terminal with the highest audio level among the audio level values output from the audio processing units 43; and a video processing unit 46, which transmits the video of the conference terminal determined to be the speaker by the speaker determination unit 45 to the conference terminals. The control unit 42 includes a control right permission unit 48. FIG. 2 shows the configuration of the audio processing unit in more detail. The audio processing units 43-1 to 43-n are provided in one-to-one correspondence with the conference terminals 32-1 to 32-n. Each audio processing unit 43-i includes an audio decoding unit 61-i, an audio level adjustment unit 62-i, a level detection unit 63-i, and an audio encoding unit 64-i. In FIG. 1, the conference terminals 32-1 to 32-n include control request units 35-1 to 35-n and control information setting units 36-1 to 36-n, respectively. Any control request unit 35-i can issue a request to adjust the audio levels of the audio level adjustment units 62-1 to 62-n (hereinafter called an audio level control right request). On receiving an audio level control right request, the control right permission unit 48 of the control unit 42 decides whether to permit it and sends the result to the conference terminal that issued the request. The control information setting unit 36-i of the conference terminal whose request has been permitted instructs adjustment of whichever of the audio level adjustment units 62-1 to 62-n appear to need audio level adjustment.

[0022] Each of the conference terminals 32-1 to 32-n exchanges with the multipoint control device 31 a signal in which video, audio, and data are multiplexed. Here, the data includes data used in the conference and control signals. One example of a control signal is a control signal for adjusting the volume level of the audio input from the conference terminals 32-1 to 32-n. The line interface 41 demultiplexes the multiplexed signals transmitted from the conference terminals 32-1 to 32-n and sends the video signals to the video processing unit 46, the audio signals to the audio processing units 43-1 to 43-n, and the data and control signals to the control unit 42. The line interface 41 also multiplexes the signals output from the control unit 42, the audio processing units 43-1 to 43-n, and the video processing unit 46 and transmits them to the conference terminals 32-1 to 32-n. The control unit 42 receives the control signals output from the conference terminals 32-1 to 32-n and the speaker determination unit 45, and controls the audio processing units 43-1 to 43-n and the video processing unit 46 accordingly. The control unit 42 and the control right permission unit 48 are implemented as a control program stored in a storage medium and executed by a CPU.

[0023] The audio decoding unit 61-i decodes the encoded audio signal transmitted from the corresponding conference terminal 32-i and sends it to the audio level adjustment unit 62-i. The audio level adjustment unit 62-i adjusts the volume level of the decoded audio signal output from the audio decoding unit 61-i and sends the result to the level detection unit 63-i and the audio adding unit 44. The value used for this adjustment can be a value set in advance from the multipoint control device 31, or it can be changed from the control information setting unit 36-i of any conference terminal. The level detection unit 63-i computes the average power of the audio signal whose level has been adjusted by the audio level adjustment unit 62-i, and outputs to the speaker determination unit 45 any audio level value greater than a preset voiced/unvoiced threshold. The audio encoding unit 64-i encodes the audio signal summed by the audio adding unit 44 and outputs it to the conference terminal 32-i via the line interface 41.
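The signal path through one audio processing unit can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the audio has already been decoded into floating-point samples, models the level adjustment as a multiplicative gain, and takes the average power as the mean of squared samples. The class and function names and the default values are hypothetical, and encoding and decoding are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AudioProcessingUnit:
    """Sketch of unit 43-i: level adjustment (62-i) and level detection (63-i)."""
    gain: float = 1.0        # adjustment value, preset or changed from a terminal
    threshold: float = 0.01  # voiced/unvoiced threshold (hypothetical default)

    def adjust(self, samples: Sequence[float]) -> List[float]:
        # Level adjustment: scale every decoded sample by the gain.
        return [s * self.gain for s in samples]

    def detect_level(self, adjusted: Sequence[float]) -> Optional[float]:
        # Level detection: mean of squared samples as average power.
        if not adjusted:
            return None
        power = sum(s * s for s in adjusted) / len(adjusted)
        # Only levels greater than the threshold reach the speaker determination.
        return power if power > self.threshold else None


def add_audio(frames: Sequence[Sequence[float]]) -> List[float]:
    """Audio adding unit 44: sample-wise sum of equally long frames."""
    return [sum(column) for column in zip(*frames)]


unit = AudioProcessingUnit(gain=2.0, threshold=0.5)
frame = unit.adjust([0.5, -0.5, 0.5, -0.5])
level = unit.detect_level(frame)
mixed = add_audio([frame, [0.25, 0.25, 0.25, 0.25]])
print(frame, level, mixed)
```

Note that the adjusted frame feeds both the level detector and the adder, matching the two outputs of the audio level adjustment unit described above.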

[0024] The operation of the video conference system of this embodiment, configured as above, is described next with reference to FIGS. 1 and 2.

[0025] To adjust audio levels from conference terminal 32-1, the control request unit 35-1 transmits an audio level control right request to the multipoint control device 31. The request signal is separated by the line interface 41 in the multipoint control device 31, input to the control unit 42, and passed to the control right permission unit 48. The control right permission unit 48 decides whether to permit the audio level control right request from conference terminal 32-1. The control right permission unit 48 grants the audio level control right to whichever conference terminal 32-i issued its audio level control right request first.

[0026] FIG. 3 shows the flow of processing for deciding whether to permit an audio level control right request from conference terminal 32-i. The control right permission unit 48 checks whether any conference terminal is currently adjusting audio levels by examining the value of an adjustment flag F. Initially no conference terminal is adjusting audio levels, so the adjustment flag F is set to an initial value of 0 (step S101). The unit then waits until a conference terminal issues an audio level control right request (step S102). When a request is issued (step S102; Y), the value of F is checked (step S103). If F is 0 (step S103; Y), no other conference terminal is adjusting audio levels, so the request is permitted: an instruction permitting audio level adjustment (hereinafter called audio level control right permission) is transmitted to the requesting conference terminal via the line interface 41 (step S104), and F is set to 1 (step S105). If F is not 0 (step S103; N), the request is not permitted, and an instruction rejecting the audio level control right request is transmitted to the requesting conference terminal (step S106). By this procedure, only the conference terminal 32 that has received audio level control right permission can perform the control described next. Because the audio levels of all conference terminals participating in the conference are adjusted from a single conference terminal, control of the audio level adjustment units 62-i does not conflict, and the audio levels can be adjusted in a coordinated manner.
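The flow of steps S101 to S106 can be sketched as a small class. The class and method names are hypothetical, and sending the permission or rejection over the line interface is represented simply by the return value; how and when the flag is cleared again is not covered by this passage and is therefore left out.

```python
class ControlRightPermission:
    """Sketch of control right permission unit 48 (steps S101 to S106)."""

    def __init__(self) -> None:
        # S101: no terminal is adjusting audio levels yet.
        self.flag_f = 0

    def request(self, terminal_id: str) -> bool:
        """Handle one audio level control right request (S102; Y).

        Returns True for permission (S104) and False for rejection (S106).
        `terminal_id` only identifies the requester in this sketch.
        """
        # S103: is any terminal already adjusting audio levels?
        if self.flag_f == 0:
            # S104: permit the request, then S105: mark adjustment in progress.
            self.flag_f = 1
            return True
        # S106: another terminal holds the control right, so reject.
        return False


permission = ControlRightPermission()
first = permission.request("32-1")   # granted: F was 0
second = permission.request("32-2")  # rejected: F is now 1
print(first, second)
```

Because only the first requester is granted the control right, the audio level adjustment units never receive competing instructions.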

【００２７】会議端末３２−１が、音声レベル制御権許
可を制御権許可部４８から受信すると、入力された音声
レベルを調整するための制御情報が制御情報設定部３６
−１により多地点制御装置３１に送信される。制御情報
は、音声レベルを調整する対象の会議端末を指定する情
報と、音声レベル調整情報からなる。ここで、調整の対
象となる会議端末を会議端末３２−２とする。この制御
情報は多地点制御装置３１の制御部４２に渡され、制御
部４２は制御対象となる会議端末３２−２に対応する音
声処理部４３−２に線５１を介して、この音声レベル調
整情報を出力する。この音声レベル調整情報は、音声処
理部４３−２の音声レベル調整部６２−２に供給され
る。音声レベル調整部６２−２では、この音声レベル調
整情報に基づき、音声復号化部６１−１で復号化された
音声信号のボリュームレベルを調整する。音声レベルを
調整した後の調整値は、音声処理部４３−２の音声レベ
ル調整部６２−２で記憶する。会議端末３２−１の制御
情報設定部３６−１から、制御対象の会議端末３１−１
〜３２−ｎを次々に変更して、各会議端末の間で音声レ
ベルがそろうよう調整する。When the conference terminal 32-1 receives the audio level control right permission from the control right permission unit 48, its control information setting unit 36-1 transmits control information for adjusting the input audio level to the multipoint control device 31. The control information consists of information specifying the conference terminal whose audio level is to be adjusted and audio level adjustment information. Here, assume that the conference terminal to be adjusted is the conference terminal 32-2. The control information is passed to the control unit 42 of the multipoint control device 31, and the control unit 42 outputs the audio level adjustment information via the line 51 to the audio processing unit 43-2 corresponding to the target conference terminal 32-2. The audio level adjustment information is supplied to the audio level adjustment unit 62-2 of the audio processing unit 43-2, which adjusts the volume level of the audio signal decoded by the audio decoding unit 61-1 based on this information. The resulting adjustment value is stored in the audio level adjustment unit 62-2 of the audio processing unit 43-2. By changing the target conference terminal among 32-1 to 32-n one after another from the control information setting unit 36-1 of the conference terminal 32-1, the audio levels are adjusted so that they are uniform across the conference terminals.
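The routing of control information described above might look like the following minimal sketch. The names `ControlInfo`, `AudioLevelAdjuster` and `MultipointController` are hypothetical, and representing the audio level adjustment information as a linear gain with a default of 1.0 is an assumption; the specification does not state how the adjustment value is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInfo:
    target: str   # conference terminal whose audio level is to be adjusted
    gain: float   # audio level adjustment information (assumed linear gain)


class AudioLevelAdjuster:
    """Hypothetical stand-in for an audio level adjustment unit 62-i."""

    def __init__(self) -> None:
        self.gain = 1.0   # stored adjustment value; unity is an assumed default

    def set_gain(self, gain: float) -> None:
        self.gain = gain  # the value stays stored until it is changed again

    def apply(self, samples: list[float]) -> list[float]:
        return [s * self.gain for s in samples]


class MultipointController:
    """Hypothetical stand-in for the control unit 42."""

    def __init__(self, terminal_ids: list[str]) -> None:
        self.adjusters = {tid: AudioLevelAdjuster() for tid in terminal_ids}

    def handle(self, info: ControlInfo) -> None:
        # Forward the adjustment value to the unit of the targeted terminal.
        self.adjusters[info.target].set_gain(info.gain)


mcu = MultipointController(["32-1", "32-2", "32-3"])
# The terminal holding the control right adjusts the targets one after another.
for info in (ControlInfo("32-2", 2.0), ControlInfo("32-3", 0.5)):
    mcu.handle(info)
adjusted = mcu.adjusters["32-2"].apply([0.1, -0.2])
```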

【００２８】音声レベル調整部６２−２で音声レベルを
調整された音声信号は線５５を介して、音声加算部４４
に出力される。さらに、音声レベル調整部６２−２の出
力は、レベル検出部６３−２に送られる。レベル検出部
６３−２は音声レベル調整部６２−２から出力された音
声信号の平均電力値を計算し、あらかじめ設定されたし
きい値と比較し、この平均電力値がしきい値以上であれ
ば有声と判定する。有声と判定された音声レベル値は、
線５４を介し話者判定部４５に送られる。このため、話
者判定部４５で話者を判定するために使用する音声レベ
ル値も、音声レベルを調整された音声信号に基づく音声
レベル値となる。そのため、有声と判定された会議端末
が複数ある場合、会議端末間でボリュームレベルをそろ
えれば、音声レベルのそろった音声信号を比較すること
になり、話者の判定がより正確なものとなる。これによ
り、話者判定に基づく画像の切り替えがより的確なもの
となる。音声加算部４４で加算される音声信号は、音声
レベル調整部６２−１〜６２−ｎで音声レベルを調整さ
れた信号であるため、各会議端末間でボリュームレベル
の調整された音声信号である。そのため、加算信号につ
いても各会議端末間でボリュームレベルの調整された加
算信号を得ることができる。The audio signal whose level has been adjusted by the audio level adjustment unit 62-2 is output to the audio addition unit 44 via the line 55. The output of the audio level adjustment unit 62-2 is also sent to the level detection unit 63-2. The level detection unit 63-2 calculates the average power of the audio signal output from the audio level adjustment unit 62-2, compares it with a preset threshold, and judges the signal to be voiced if the average power is equal to or greater than the threshold. The audio level value of a signal judged to be voiced is sent to the speaker determination unit 45 via the line 54. The audio level values that the speaker determination unit 45 uses to determine the speaker are therefore also based on level-adjusted audio signals. Consequently, when several conference terminals are judged to be voiced and the volume levels have been equalized among the conference terminals, audio signals with matched levels are compared, which makes the speaker determination more accurate. This in turn makes the video switching based on the speaker determination more appropriate. The audio signals added by the audio addition unit 44 are signals whose levels have been adjusted by the audio level adjustment units 62-1 to 62-n, so the summed signal also has volume levels balanced among the conference terminals.
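As a rough illustration of the level detection, speaker determination and addition steps, the sketch below works on already level-adjusted sample lists. The function names, the use of mean squared amplitude as the average power, and the sample values are assumptions made for the example. Choosing the voiced terminal with the highest level as the speaker follows claim 5.

```python
def average_power(samples):
    """Mean of squared samples, one common definition of average power."""
    return sum(s * s for s in samples) / len(samples)


def determine_speaker(adjusted, threshold):
    """Return the voiced terminal with the highest level, or None if none is voiced."""
    levels = {tid: average_power(sig) for tid, sig in adjusted.items()}
    # Level detection: a signal is voiced when its power reaches the threshold.
    voiced = {tid: lvl for tid, lvl in levels.items() if lvl >= threshold}
    if not voiced:
        return None
    return max(voiced, key=voiced.get)


def mix(adjusted):
    """Sample-wise sum of all level-adjusted signals (audio addition)."""
    return [sum(frame) for frame in zip(*adjusted.values())]


adjusted_signals = {
    "32-1": [0.5, -0.5, 0.5, -0.5],      # average power 0.25
    "32-2": [0.25, -0.25, 0.25, -0.25],  # average power 0.0625
    "32-3": [0.0, 0.0, 0.0, 0.0],        # silence
}
speaker = determine_speaker(adjusted_signals, threshold=0.01)
mixed = mix(adjusted_signals)
```

Because the comparison runs on adjusted signals, a quiet but active site is not systematically beaten by a louder one once the gains have been balanced.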

【００２９】また、音声レベル制御権は調整操作が完了
した場合、不要となるので、会議端末３２−１から制御
権許可部４８に対し、音声レベル制御権を放棄する制御
情報を送信し、制御権許可部４８は調整フラッグＦ＝０
とすることで音声レベル制御権を解放する。ただし、こ
の場合も、調整された音声レベルは、音声レベル調整部
６２−ｉで保持され、会議中最適な状態を維持できる。
また、音声レベル制御権が解放されているので、他の会
議端末が音声レベルの調整をすることも可能となる。The audio level control right is no longer needed once the adjustment operation is complete, so the conference terminal 32-1 transmits control information relinquishing the audio level control right to the control right permission unit 48, and the control right permission unit 48 releases the right by setting the adjustment flag F to 0. Even then, the adjusted audio level remains held in the audio level adjustment unit 62-i, so the optimal state is maintained throughout the conference. Because the audio level control right has been released, other conference terminals can then adjust the audio level as well.

【００３０】このように本発明の実施例では、任意の会
議端末から任意の会議端末の音声信号のボリュームレベ
ルを調整できるので、それぞれの会議端末で調整する場
合と比べて容易に統制のとれた音声レベルの調整が可能
となる。また、会議端末から入力された音声信号の音声
ボリュームレベルを調整した信号を用いて、有声とする
か無声とするかの検出を行い、有声と判定された会議端
末の音声レベル値を比較し話者を判定する。そのため、
ボリューム調整された音声信号の音声レベル値を比較す
るので、話者の判定がより的確となる。また、ボリュー
ム調整された音声信号を加算するため、各会議端末間で
声の大きさの差が少なくなり、聞きやすい音声を各会議
端末で得ることができる。As described above, in this embodiment the volume level of the audio signal of any conference terminal can be adjusted from any conference terminal, so coordinated audio level adjustment is easier than when each conference terminal adjusts its own level. In addition, voiced/unvoiced detection is performed on the signal obtained by adjusting the volume level of the audio signal input from each conference terminal, and the speaker is determined by comparing the audio level values of the conference terminals judged to be voiced. Because the audio level values of volume-adjusted audio signals are compared, the speaker determination becomes more accurate. Furthermore, because volume-adjusted audio signals are added, differences in loudness among the conference terminals are reduced, and each conference terminal receives audio that is easy to listen to.

【００３１】また、図１の制御権許可部４８が、音声レ
ベル制御権要求を許可するか否か判断する処理は、次の
ように行っても良い。音声レベル制御権を持つ会議端末
を制御権許可部４８に登録し、ある特定の会議端末に音
声レベル制御権を与え、権限のない会議端末からの音声
レベルの制御を防ぐようにする。音声レベル制御権を持
つ会議端末を複数登録した場合は、先に音声レベル制御
権要求を出した会議端末に制御権を与えるようにしても
良いし、あらかじめ優先順位を与えておき、この優先順
位に従って制御権を与えても良い。The process by which the control right permission unit 48 of FIG. 1 decides whether to grant an audio level control right request may also be performed as follows. Conference terminals entitled to hold the audio level control right are registered in the control right permission unit 48, so that the right is given only to specific conference terminals and audio level control from unauthorized conference terminals is prevented. When several entitled conference terminals are registered, the control right may be given to the conference terminal that issued the audio level control right request first, or priorities may be assigned in advance and the control right given according to that priority order.
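The two selection policies mentioned here (first request wins, or a pre-assigned priority order) can be sketched as below. The function name `select_holder`, its parameters, and the convention that a smaller rank means higher priority are all assumptions for illustration; the specification does not define a data format for the registration.

```python
def select_holder(pending, registered, priority=None):
    """Pick the terminal that receives the control right, or None.

    pending    -- requesting terminal ids in arrival order
    registered -- ids entitled to hold the audio level control right
    priority   -- optional dict id -> rank (smaller rank wins; assumed convention)
    """
    # Requests from terminals that are not registered are ignored.
    eligible = [tid for tid in pending if tid in registered]
    if not eligible:
        return None
    if priority is None:
        return eligible[0]  # first come, first served
    return min(eligible, key=lambda tid: priority[tid])


registered = {"32-1", "32-3"}
requests = ["32-2", "32-3", "32-1"]   # 32-2 is not entitled to the right
by_arrival = select_holder(requests, registered)
by_priority = select_holder(requests, registered, {"32-1": 0, "32-3": 1})
```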

【００３２】また、図１で示したテレビ会議システム
で、音声レベルの調整を指示する会議端末を特定のもの
に限定すれば、制御要求部３５−ｉと制御権許可部４８
はなくてもよい。なお、映像処理部４６は、会議端末か
らの映像を合成し、それぞれの会議端末に送信しても良
い。会議端末からの映像を合成する際に、話者判定部４
５で話者と判定された会議端末の映像を大きく、あるい
は、話者とわかるように印を付け合成しても良い。In the video conference system shown in FIG. 1, if the conference terminal that instructs audio level adjustment is limited to a specific one, the control request unit 35-i and the control right permission unit 48 may be omitted. The video processing unit 46 may also combine the video from the conference terminals and transmit the combined video to each conference terminal. When combining the video, the video of the conference terminal determined to be the speaker by the speaker determination unit 45 may be displayed larger, or may be marked so that the speaker can be identified.

【００３３】変形例図４は、本発明の変形例におけるテレビ会議システムの
概要を表したものである。図４で先の実施例の図１およ
び図２と同一部分には同一の符号を付しており、これら
の説明を適宜省略する。この変形例の音声処理部８８−
１〜８８−ｎは、制御部８２の出力とレベル検出部９１
−１〜９１−ｎが線８５で接続され、レベル検出部９１
−１〜９１−ｎの出力と話者判定部４５が線９３により
接続されている。任意の会議端末７２−ｉからレベル検
出部９１−１〜９１−ｎで用いる有声とするか無声とす
るかを判定するためのしきい値を設定し、レベル検出部
９１−１〜９１−ｎに記憶する。レベル検出部９１−ｉ
で有声とするか無声とするかを検出する時、このしきい
値を用いる。また、会議端末がしきい値を設定する要求
を出すしきい値設定要求部７５−１〜７５−ｎとこの要
求に対し、許可するか否か判定するしきい値設定許可部
８４がある。これにより任意の１台の会議端末から任意
のレベル検出部９１−ｉのしきい値を設定できる。な
お、しきい値を設定する会議端末を特定のものに限定す
れば、しきい値設定要求部７５−ｉと制御権許可部８４
はなくてもよい。Modification. FIG. 4 shows an outline of a video conference system according to a modification of the present invention. In FIG. 4, parts identical to those in FIGS. 1 and 2 of the preceding embodiment are denoted by the same reference numerals, and their description is omitted as appropriate. In the audio processing units 88-1 to 88-n of this modification, the output of the control unit 82 is connected to the level detection units 91-1 to 91-n by a line 85, and the outputs of the level detection units 91-1 to 91-n are connected to the speaker determination unit 45 by a line 93. From any conference terminal 72-i, a threshold for judging whether a signal is voiced or unvoiced is set and stored in the level detection units 91-1 to 91-n. The level detection unit 91-i uses this threshold when detecting whether a signal is voiced or unvoiced. The system also includes threshold setting request units 75-1 to 75-n, with which a conference terminal issues a request to set the threshold, and a threshold setting permission unit 84, which decides whether to grant the request. Any single conference terminal can thereby set the threshold of any level detection unit 91-i. If the conference terminal that sets the threshold is limited to a specific one, the threshold setting request unit 75-i and the permission unit 84 may be omitted.

【００３４】図５は、本発明の他の変形例における音声
処理部の概要を表したものである。この変形例では、あ
る会議端末から任意の会議端末の音声レベル調整部６２
−ｉにおける音声レベルを調整する。また、会議端末か
らしきい値を入力し、入力されたしきい値は、線１０２
を介し対応するレベル検出部１０３−ｉで用いられる。
レベル検出部１０３−ｉの出力は、話者判定部４５に線
１０４を介し入力される。これにより、ある会議端末か
ら任意の会議端末の音声信号のボリュームレベルを設定
するとともに、各会議端末からそれぞれ自端末に対応す
るレベル検出部１０３のしきい値を設定するものであ
る。したがって、各会議端末がしきい値を設定すること
により独自に有声の判定を制御でき、話者判定を制御す
ることができる。これにより、各会議端末からの話者判
定に関する制御要求をも満たすことができる。FIG. 5 shows an outline of an audio processing unit according to another modification of the present invention. In this modification, the audio level in the audio level adjustment unit 62-i of any conference terminal is adjusted from a given conference terminal. In addition, a threshold is input from a conference terminal, and the input threshold is used by the corresponding level detection unit 103-i via a line 102. The output of the level detection unit 103-i is input to the speaker determination unit 45 via a line 104. Thus a given conference terminal sets the volume level of the audio signal of any conference terminal, while each conference terminal sets the threshold of the level detection unit 103 corresponding to itself. Each conference terminal can therefore control the voiced judgment independently by setting its threshold, and can thereby control the speaker determination. This also satisfies each conference terminal's own control requirements regarding speaker determination.

【００３５】また、音声加算部４４にレベル検出部１０
３−ｉで有声と判定された音声を入力することにより、
ノイズを除去することができる。加算非加算切換スイッ
チ１０５−ｉに音声レベル調整部６２−ｉの出力を線５
５を介し入力し、レベル検出部１０３−ｉから出力され
た音声レベル値により加算非加算切換スイッチ１０５−
ｉのスイッチを切り替える。これにより、レベル検出部
１０３−ｉで有声と判定された音声信号のみ線１０６を
介し音声加算部４４に出力される。例えば、ある会議端
末３２−ｉの入力音声を音声レベル調整部６２−ｉで増
幅すると、入力信号がノイズの場合ノイズも増幅され
る。そこで、レベル検出部１０３−ｉで有声と判定する
しきい値をノイズの場合は無声とするよう変更すること
により、ノイズの場合は加算非加算切換スイッチ１０５
−ｉで音声レベル調整部６２−ｉからの信号がカットさ
れる。これにより、音声レベルを調整した音声信号のう
ち、ノイズを除去した音声信号を音声加算部４４で加算
したものを、それぞれの会議端末に出力するテレビ会議
システムを提供することができる。なお、ある会議端末
から任意の会議端末の音声信号のボリュームレベルを一
括して設定するとともに、ある会議端末から任意のレベ
ル検出部１０３−ｉのしきい値を一括して設定するよう
にしても良い。Noise can also be removed by feeding the audio addition unit 44 only the audio judged to be voiced by the level detection unit 103-i. The output of the audio level adjustment unit 62-i is input to an addition/non-addition changeover switch 105-i via the line 55, and the switch 105-i is switched according to the audio level value output from the level detection unit 103-i. As a result, only audio signals judged to be voiced by the level detection unit 103-i are output to the audio addition unit 44 via a line 106. For example, when the input audio of a conference terminal 32-i is amplified by the audio level adjustment unit 62-i, the noise is amplified as well if the input signal is noise. By changing the threshold used by the level detection unit 103-i so that noise is judged to be unvoiced, the signal from the audio level adjustment unit 62-i is cut by the switch 105-i when it is noise. This provides a video conference system in which, among the level-adjusted audio signals, those with noise removed are added by the audio addition unit 44 and output to each conference terminal. The volume levels of the audio signals of any conference terminals may also be set collectively from one conference terminal, and the thresholds of any level detection units 103-i may likewise be set collectively from one conference terminal.
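The effect of the addition/non-addition switch can be illustrated with a small sketch in which only signals that reach their own per-terminal threshold are summed. The function names, the mean squared amplitude as the power measure, the sample values, and returning an empty list when every signal is cut are assumptions for the example.

```python
def average_power(samples):
    """Mean of squared samples, used here as the detected audio level."""
    return sum(s * s for s in samples) / len(samples)


def gated_mix(adjusted, thresholds):
    """Sum only the signals whose average power reaches their own threshold."""
    passed = [
        sig for tid, sig in adjusted.items()
        if average_power(sig) >= thresholds[tid]  # switch 105-i set to "add"
    ]
    if not passed:
        return []  # assumed behaviour when every signal is cut
    return [sum(frame) for frame in zip(*passed)]


adjusted = {
    "32-1": [0.5, -0.5, 0.5, -0.5],          # speech, average power 0.25
    "32-2": [0.125, -0.125, 0.125, -0.125],  # amplified noise, power 0.015625
}
# With a low threshold the amplified noise is still added to the mix.
low = gated_mix(adjusted, {"32-1": 0.01, "32-2": 0.01})
# Raising the threshold for 32-2 makes its noise unvoiced, so it is cut.
raised = gated_mix(adjusted, {"32-1": 0.01, "32-2": 0.05})
```

Raising a threshold after boosting a gain is the pairing the text describes: the gain compensates for a quiet site, and the threshold keeps the boosted background noise out of the summed signal.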

【００３６】[0036]

【発明の効果】以上説明したように請求項１記載の発明
によれば、会議端末から送られてくる音声の出力レベル
を音声レベル調整手段により調整する。任意の会議端末
に対応する音声レベル調整手段の調整を会議端末の音声
レベル調整指示手段から指示する。このため、音声の出
力レベルの調整を一括して行うことができる。この音声
レベル調整指示手段を全ての会議端末が備えるため、ど
の会議端末からでも音声の出力レベルを調整できる。[Effects of the Invention] As described above, according to the invention of claim 1, the output level of the audio sent from a conference terminal is adjusted by the audio level adjustment means, and adjustment of the audio level adjustment means corresponding to any conference terminal is instructed from the audio level adjustment instruction means of a conference terminal. The audio output levels can therefore be adjusted collectively. Because every conference terminal has this audio level adjustment instruction means, the audio output level can be adjusted from any conference terminal.

【００３７】また、請求項２記載の発明では、音声の出
力レベルの調整を音声レベル調整手段に指示できる第１
の会議端末と、指示できない第２の会議端末とがある。
調整権の付与を第１の会議端末を設置するか第２の会議
端末を設置するかであらかじめ設定できる。According to the invention of claim 2, there are first conference terminals, which can instruct the audio level adjustment means to adjust the audio output level, and second conference terminals, which cannot. Whether the adjustment right is granted can thus be set in advance by choosing to install either a first conference terminal or a second conference terminal.

【００３８】また、請求項３記載の発明では、任意の会
議端末からの音声を有声とするか無声とするか判定する
しきい値を有声無声しきい値設定手段により設定する。
そのため、しきい値の変更を一括して行うことができ
る。また、全ての会議端末が有声無声しきい値設定手段
を備えるため、どの会議端末からでも有声の判定を制御
できる。According to the invention of claim 3, the threshold for judging whether the audio from any conference terminal is voiced or unvoiced is set by the voiced/unvoiced threshold setting means, so the thresholds can be changed collectively. Moreover, because every conference terminal has the voiced/unvoiced threshold setting means, the voiced judgment can be controlled from any conference terminal.

【００３９】また、請求項４の発明によれば、音声の出
力レベルの調整指示が競合した場合、１台の会議端末を
選択することにより競合を回避できる。According to the invention of claim 4, when instructions to adjust the audio output level conflict, the conflict is avoided by selecting one conference terminal.

[Brief description of the drawings]

【図１】本発明の一実施例におけるテレビ会議システム
のシステム構成の概要を示すブロック図である。FIG. 1 is a block diagram illustrating an outline of a system configuration of a video conference system according to an embodiment of the present invention.

【図２】本実施例における音声処理部のブロック図であ
る。FIG. 2 is a block diagram of an audio processing unit according to the embodiment.

【図３】会議端末から出された音声レベル制御権要求を
許可するか否か判定する処理を表した流れ図である。FIG. 3 is a flowchart illustrating a process of determining whether to permit a voice level control right request issued from a conference terminal.

【図４】本発明の他の一実施例におけるテレビ会議シス
テムのシステム構成の概要を示すブロック図である。FIG. 4 is a block diagram showing an outline of a system configuration of a video conference system according to another embodiment of the present invention.

【図５】本実施例における他の音声処理部のブロック図
である。FIG. 5 is a block diagram of another audio processing unit in the embodiment.

【図６】従来提案された音声切替を用いた多地点テレビ
会議システムのシステム構成の概要を示すブロック図で
ある。FIG. 6 is a block diagram showing an outline of a system configuration of a conventionally proposed multipoint video conference system using voice switching.

[Explanation of symbols]

３２，７２会議端末３５制御要求部３６制御情報設定部（音声レベル調整指示手段）４４音声加算部（音声供給手段）４５話者判定部４６映像処理部４８制御権許可部（調整許可手段）６２音声レベル調整部６３，９１，１０３レベル検出部（有声無声判定手段）７６しきい値設定部
32, 72: Conference terminal
35: Control request unit
36: Control information setting unit (audio level adjustment instruction means)
44: Audio addition unit (audio supply means)
45: Speaker determination unit
46: Video processing unit
48: Control right permission unit (adjustment permission means)
62: Audio level adjustment unit
63, 91, 103: Level detection unit (voiced/unvoiced determination means)
76: Threshold setting unit

Claims

1. A video conference system comprising: a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of the terminals participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instruction means for instructing adjustment of the output level of the audio from any terminal; audio supply means for adding the audio sent from the plurality of conference terminals and supplying the result to each conference terminal; video supply means for selecting a desired video from the video sent from the plurality of conference terminals and supplying it to each conference terminal; and audio level adjustment means for adjusting, in accordance with the instruction, the audio level of the conference terminal designated by the audio level adjustment instruction means of any of the plurality of conference terminals.

2. A video conference system comprising: one or more first conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of the terminals participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instruction means for instructing adjustment of the output level of the audio from any terminal; one or more second conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among the plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, and audio/video output means for receiving the audio and video of the terminals participating in the conference and outputting the state of the conference as audio and video; audio supply means for adding the audio sent from the first conference terminals and the second conference terminals and supplying the result to the first conference terminals and the second conference terminals; video supply means for selecting a desired video from the video sent from the first conference terminals and the second conference terminals and supplying it to the first conference terminals and the second conference terminals; and audio level adjustment means for adjusting the audio level in accordance with an instruction from the audio level adjustment instruction means of a first conference terminal.

3. A video conference system comprising: a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of the terminals participating in the conference and outputting the state of the conference as audio and video, and voiced/unvoiced threshold setting means for setting a threshold for judging whether the output level of the audio from any terminal is regarded as voiced or unvoiced; audio supply means for adding the audio sent from the plurality of conference terminals and supplying the result to each conference terminal; and voiced/unvoiced determination means for judging, in accordance with the threshold set by the voiced/unvoiced threshold setting means of any of the plurality of conference terminals, that the audio from a conference terminal is voiced if it is equal to or greater than the threshold.

4. A video conference system comprising: a plurality of conference terminals, each having audio transmitting means for transmitting the audio of its own terminal side when a conference is held among a plurality of persons using audio and video, video transmitting means for transmitting the video of its own terminal side, audio/video output means for receiving the audio and video of the terminals participating in the conference and outputting the state of the conference as audio and video, and audio level adjustment instruction means for instructing adjustment of the output level of the audio from any terminal; adjustment permission means for selecting one conference terminal when adjustment instructions conflict; audio level adjustment means for performing the adjustment based on the content instructed by the selected conference terminal; audio supply means for adding the audio sent from the plurality of conference terminals and supplying the result to each conference terminal; and video supply means for selecting a desired video from the video sent from the plurality of conference terminals and supplying it to each conference terminal.

5. The video conference system according to claim 1, further comprising: speaker determination means for determining, as the speaker, the conference terminal with the highest audio level among the audio adjusted by the audio level adjustment means; and video supply means for supplying the video sent from the conference terminal of the speaker determined by the speaker determination means to each conference terminal.