JP2002223422A

JP2002223422A - Multi-point video conference controller and video packet transmission method

Info

Publication number: JP2002223422A
Application number: JP2001020242A
Authority: JP
Inventors: Tatsuya Kato; 達也加藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-01-29
Filing date: 2001-01-29
Publication date: 2002-08-09

Abstract

PROBLEM TO BE SOLVED: To provide a multi-point video conference system that switches images effectively at low cost. SOLUTION: A talker terminal is detected by evaluating the number, per unit time, of valid (non-silent) voice packets received from each video conference terminal connected to a multi-point video conference controller, and selected-image video packets are delivered to all video conference terminals with the detected talker terminal as the source videophone terminal.

Description

DETAILED DESCRIPTION OF THE INVENTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a video conference system, and more particularly to a method and an apparatus for transmitting video packets to each video conference terminal from a multipoint video conference controller to which video conference terminals that do not transmit audio packets during silence are connected.

2. Description of the Related Art

[0002] Conventionally, a multipoint video conference controller of this type is used to show the speaker's video in the received image of each video conference terminal, by selecting the image of the video conference terminal of the current speaker and transmitting it to all participating video conference terminals.

[0003] For example, Japanese Patent Application Laid-Open No. 2000-134596 describes a technique that detects the audio level (acoustic power) received from each video conference terminal and uses the image of the video conference terminal with the highest audio level as the switched image.

[0004] FIG. 11 is a block diagram showing an example of a conventional multipoint videophone control device. In conference room 1, speech uttered by a conference participant is sent through a microphone 510 and an amplifier 511 to a video conference terminal 512, where it is compression-encoded, multiplexed with an image signal obtained from an imaging device (not shown), and transmitted to the public line. The multiplexed signal is received by the line interface unit 514-1 of the multipoint video conference controller 513, and the MUX/DEMUX 515-1 separates it into an audio signal and an image signal. The separated audio signal is decoded by the audio decoding unit 517-1 and distributed to the control unit 516 and the audio signal attenuation unit 520-1. Speech uttered in conference rooms 2 and 3 is likewise distributed to the control unit 516 and the audio signal attenuation units 520-2 and 520-3 through the line interface units 514-2 and 514-3, the MUX/DEMUXes 515-2 and 515-3, and the audio decoding units 517-2 and 517-3.

[0005] FIG. 12 shows the detailed configuration of the control unit 516. For each audio signal input to the control unit 516, the audio level detection units 530-1, 530-2, and 530-3 measure the level only during a predetermined fixed guard time and determine the maximum audio level. A signal representing the measured maximum audio level is sent to the level maximum value adjustment unit 531, which calculates the difference from the preset image switching threshold. The level maximum value adjustment unit 531 then calculates audio level attenuation amounts such that the differences for conference rooms 1, 2, and 3 become equal. The calculated attenuation amounts and the expected difference after attenuation adjustment are reported to the control circuit 532, which controls the attenuation of the audio signal attenuation unit 520-1 through the attenuation control unit 533.

[0006] In FIG. 11, the audio signals smoothed by the audio signal attenuation units 520-1, 520-2, and 520-3 are sent to the audio addition unit 522 and added there. During the addition, it is measured whether any of the audio signals exceeds the image switching threshold, and this information is sent to the control circuit 532 of the control unit 516. Based on this information, the control circuit 532 determines the source of the image signal to be distributed and controls the image switching unit 519. The image switching unit 519 selects the image signal to be supplied to each video conference terminal 512 and supplies it to the MUX/DEMUXes 515-1, 515-2, and 515-3.

PROBLEMS TO BE SOLVED BY THE INVENTION

[0007] The conventional technique described above has the following problem.

[0008] The problem is that the audio levels from conference rooms 1, 2, and 3 are first smoothed and then compared, and the conference room with the highest level is used as the basis for image switching.

[0009] This is a problem because smoothing the audio levels and detecting the maximum audio level require computationally heavy arithmetic processing, which brings drawbacks such as more expensive hardware.

[0010] An object of the present invention is to solve the above problem and to provide a multipoint video conference controller and a video packet transmission method that switch images effectively at low cost.

MEANS FOR SOLVING THE PROBLEMS

[0011] The video packet transmission method of the multipoint video conference controller according to the present invention is a method of transmitting video packets to each video conference terminal from a multipoint video conference controller to which video conference terminals that do not transmit audio packets during silence are connected. The method comprises a step of detecting a speaker terminal by evaluating the number, per unit time, of valid (non-silent) audio packets received from each video conference terminal, and a step of delivering selected-image video packets to all video conference terminals connected to the multipoint video conference controller, with the detected speaker terminal as the source videophone terminal.

[0012] With this processing, a multipoint video conference device to which video conference terminals that do not transmit audio packets during silence are connected can effectively identify the speaker terminal and switch the selected image packets delivered to all video conference terminals to the image packets of the speaker terminal.

[0013] The step of detecting the speaker terminal may include receiving as input the audio packets that were received from the plurality of video conference terminals via the Internet and decompressed, counting the number of valid (non-silent) audio packets per unit time for each terminal, determining the terminal from which the most packets were received, and outputting an identification signal designating that terminal as the speaker terminal. The step of delivering the selected-image video packets may include receiving as input the image packets received from the plurality of video conference terminals via the Internet, switching the selected image packets to those of the speaker terminal according to the identification signal input at the same time, and transmitting them to each video conference terminal.

[0014] The step of detecting the speaker terminal may comprise a step of receiving as input the audio packets that were received from the plurality of video conference terminals via the Internet and decompressed, and counting and outputting the number of valid (non-silent) audio packets per unit time for each terminal; and a step of receiving and comparing these per-unit-time counts and outputting a speaker terminal selection signal that selects, as the speaker terminal, the video conference terminal that sent the largest number of packets per unit time.

[0015] The multipoint video conference controller of the present invention is a multipoint video conference controller for holding a video conference, to which a plurality of video conference terminals that do not transmit audio packets during silence are connected via the Internet. It comprises a speaker determination unit that detects a speaker terminal by evaluating the number, per unit time, of valid (non-silent) audio packets received from each video conference terminal, and an image switching unit that delivers selected-image video packets to all video conference terminals connected to the multipoint video conference controller, with the detected speaker terminal as the source video conference terminal.

[0016] The speaker determination unit may receive as input the audio packets that were received from the plurality of video conference terminals via the Internet and decompressed, count the number of valid (non-silent) audio packets per unit time for each terminal, determine the terminal from which the most packets were received, and output an identification signal designating that terminal as the speaker terminal. The image switching unit may receive as input the image packets received from the plurality of video conference terminals via the Internet, switch the selected image packets to those of the speaker terminal according to the identification signal input at the same time, and transmit them to each video conference terminal.

[0017] The speaker determination unit may comprise packet counting units, one provided for each of the plurality of video conference terminals, each of which receives the audio packets transmitted from its corresponding video conference terminal and counts and outputs the number of packets per unit time; and a packet count comparison unit that receives and compares the per-unit-time packet counts output by the packet counting units and outputs a speaker terminal selection signal that selects, as the speaker terminal, the video conference terminal that sent the largest number of packets per unit time.

EMBODIMENTS OF THE INVENTION

[0018] Next, embodiments of the present invention will be described with reference to the drawings.

[0019] FIG. 1 is a flowchart of one embodiment of the video packet transmission method of the multipoint video conference controller according to the present invention.

[0020] As shown in FIG. 1, the video packet transmission method of the multipoint video conference controller according to this embodiment first detects the speaker terminal by evaluating the number, per unit time, of valid (non-silent) audio packets received from a plurality of video conference terminals that do not transmit audio packets during silence (step S1). In this step, the audio packets received from each video conference terminal via the Internet and decompressed are input, and the number of valid (non-silent) audio packets per unit time is counted for each terminal. The counts are compared to determine the terminal from which the most packets were received, and a speaker terminal identification signal identifying that terminal as the speaker terminal is output. Next, selected-image video packets are delivered to all video conference terminals connected to the multipoint video conference controller, with the speaker terminal detected in step S1 as the source video conference terminal (step S2). In this step, the image packets received from the plurality of video conference terminals via the Internet are input, the selected image packets are switched according to the speaker terminal identification signal input at the same time, and they are transmitted to each video conference terminal.

[0021] With this processing, a multipoint video conference device to which video conference terminals that do not transmit audio packets during silence are connected can effectively identify the speaker terminal and switch the selected image packets delivered to all video conference terminals to the image packets of the speaker terminal.

[0022] FIG. 2 is a block diagram of a multipoint video conference system including one embodiment of the multipoint video conference device of the present invention, and FIG. 3 is a block diagram of the multipoint video conference device MCU of FIG. 2.

[0023] Referring to FIG. 2, the multipoint video conference controller MCU is connected to each of the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10.

[0024] Referring to FIG. 3, the multipoint video conference controller MCU comprises a network interface unit 1; first, second, third, and fourth audio compression/decompression units 2-1, 2-2, 2-3, and 2-4; an audio addition unit 3; a speaker determination unit 4; an image switching unit 5; and a call control unit 6.

[0025] The speaker determination unit 4 receives as input the decompressed audio packets ADi-1, ADi-2, ADi-3, and ADi-4 that were received from the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10, counts the number of valid (non-silent) audio packets per unit time for each terminal, determines the terminal from which the most packets were received, and outputs a speaker terminal identification signal SEL to the image switching unit 5. The image switching unit 5 receives as input the image packets Vi-1, Vi-2, Vi-3, and Vi-4 received from the video conference terminals EP-1 to EP-4 via the Internet 10, and switches the selected image packets according to the speaker terminal identification signal SEL input at the same time. The selected image packets are transmitted to the video conference terminals EP-1, EP-2, EP-3, and EP-4 as Vo-1, Vo-2, Vo-3, and Vo-4.

[0026] With this processing, a multipoint video conference device to which video conference terminals that do not transmit audio packets during silence are connected can effectively identify the speaker terminal and switch the selected image packets delivered to all video conference terminals to the image packets of the speaker terminal.

[0027] The configuration of the multipoint video conference device of this embodiment will now be described in more detail.

[0028] The network interface unit 1 transmits and receives network data ND-1, ND-2, ND-3, and ND-4 to and from the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10. The network interface unit 1 outputs the compressed audio packets AEi-1, AEi-2, AEi-3, and AEi-4 received from the video conference terminals EP-1 to EP-4 to the first, second, third, and fourth audio compression/decompression units 2-1, 2-2, 2-3, and 2-4, respectively, and receives from those units the compressed audio packets AEo-1, AEo-2, AEo-3, and AEo-4 to be transmitted to the video conference terminals EP-1 to EP-4. The network interface unit 1 also outputs the compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 received from the video conference terminals EP-1 to EP-4 to the image switching unit 5, and receives from the image switching unit 5 the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4 to be transmitted to the video conference terminals EP-1 to EP-4. In addition, the network interface unit 1 exchanges a call control signal CI with the call control unit 6. The first, second, third, and fourth audio compression/decompression units 2-1, 2-2, 2-3, and 2-4 receive the compressed audio packets AEi-1, AEi-2, AEi-3, and AEi-4, respectively, from the network interface unit 1, decompress them, and output the decompressed audio packets ADi-1, ADi-2, ADi-3, and ADi-4 to both the audio addition unit 3 and the speaker determination unit 4. The audio addition unit 3 receives the decompressed audio packets ADi-1 to ADi-4 from the audio compression/decompression units 2-1 to 2-4, performs audio addition, and outputs the resulting decompressed audio packets ADo-1, ADo-2, ADo-3, and ADo-4 to the audio compression/decompression units 2-1, 2-2, 2-3, and 2-4. The speaker determination unit 4 receives the decompressed audio packets ADi-1 to ADi-4 from the audio compression/decompression units 2-1 to 2-4, determines the current speaker terminal, and outputs to the image switching unit 5 a speaker terminal selection signal SEL identifying the speaker. The image switching unit 5 receives the compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 from the network interface unit 1 and outputs the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4; at the same time, it receives the speaker terminal selection signal SEL from the speaker determination unit 4.

[0029] FIG. 4 is a block diagram of the speaker determination unit 4 of FIG. 3.

[0030] The decompressed audio packets ADi-1, ADi-2, ADi-3, and ADi-4 input from the first, second, third, and fourth audio compression/decompression units 2-1, 2-2, 2-3, and 2-4 are fed to the first, second, third, and fourth packet counting units 40-1, 40-2, 40-3, and 40-4, respectively. Each packet counting unit counts the number of packets per unit time in its input stream and outputs the packet count NUM-1, NUM-2, NUM-3, or NUM-4 to the packet count comparison unit 41. The packet count comparison unit 41 compares the input packet counts NUM-1, NUM-2, NUM-3, and NUM-4 and outputs the speaker terminal selection signal SEL.
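
The per-terminal counting performed by the packet counting units 40-1 to 40-4 could be sketched roughly as below. This is only an illustration under stated assumptions, not the patented implementation: the class name `PacketCounter`, the use of arrival timestamps, and the one-second window are hypothetical choices, since the text says only that packets are counted per unit time.

```python
from collections import deque


class PacketCounter:
    """Counts packets that arrived within the last `window` seconds (hypothetical helper)."""

    def __init__(self, window: float = 1.0):
        self.window = window          # length of the "unit time" (assumed value)
        self.arrivals = deque()       # arrival timestamps, oldest first

    def on_packet(self, now: float) -> None:
        """Record one decompressed audio packet arriving at time `now`."""
        self.arrivals.append(now)

    def count(self, now: float) -> int:
        """Return NUM: the number of packets in the window (now - window, now]."""
        while self.arrivals and self.arrivals[0] <= now - self.window:
            self.arrivals.popleft()
        return len(self.arrivals)


# One counter per terminal EP-1..EP-4; a silent terminal simply sends nothing.
counters = {n: PacketCounter(window=1.0) for n in (1, 2, 3, 4)}
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    counters[1].on_packet(t)          # EP-1 is talking continuously
for t in (0.2, 0.6):
    counters[3].on_packet(t)          # EP-3 speaks briefly

nums = {n: c.count(now=1.0) for n, c in counters.items()}
```

Because silent terminals transmit no audio packets, the counter needs no level measurement or smoothing; it only records arrivals, which is the cost advantage the document claims over the level-based prior art.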

[0031] FIG. 5 is a block diagram of the audio addition unit 3 of FIG. 3.

[0032] The adder ADD-1 adds the input decompressed audio packets ADi-2, ADi-3, and ADi-4 and outputs the decompressed audio packet ADo-1. The adder ADD-2 adds the input decompressed audio packets ADi-1, ADi-3, and ADi-4 and outputs the decompressed audio packet ADo-2. Likewise, the adder ADD-3 adds the input decompressed audio packets ADi-1, ADi-2, and ADi-4 and outputs the decompressed audio packet ADo-3. Finally, the adder ADD-4 adds the input decompressed audio packets ADi-1, ADi-2, and ADi-3 and outputs the decompressed audio packet ADo-4.
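
The adder arrangement described here, in which each output ADo-n is the sum of every input except ADi-n, can be sketched as follows. The function name `mix_minus`, the representation of a decompressed packet as a list of integer samples, the treatment of a silent terminal as a frame of zeros, and the absence of clipping are all assumptions made for illustration; the text does not specify sample formats.

```python
def mix_minus(adi: dict[int, list[int]]) -> dict[int, list[int]]:
    """Return ADo-n = sum of ADi-m over all m != n, sample by sample."""
    n_samples = len(next(iter(adi.values())))
    total = [sum(frame[i] for frame in adi.values()) for i in range(n_samples)]
    # Subtracting a terminal's own samples from the total equals summing the others.
    return {n: [total[i] - frame[i] for i in range(n_samples)]
            for n, frame in adi.items()}


adi = {1: [10, 20], 2: [1, 2], 3: [100, 200], 4: [0, 0]}
ado = mix_minus(adi)
```

Computing one total and subtracting each terminal's own contribution gives the same result as the four separate three-input adders ADD-1 to ADD-4, and it means each terminal hears the other participants but not its own audio.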

[0033] FIG. 6 is a block diagram of the image switching unit 5 of FIG. 3.

[0034] In the image switching unit 5, the switch SW selects one of the input compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 according to the input speaker terminal selection signal SEL, and outputs the selected compressed image packet as the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4.
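
The behavior of the switch SW amounts to fanning one selected input out to every output. A minimal sketch, assuming compressed image packets are opaque byte strings keyed by terminal number (the function name `switch_video` and the sample payloads are hypothetical):

```python
def switch_video(vi: dict[int, bytes], sel: int) -> dict[int, bytes]:
    """Return Vo-1..Vo-n, each carrying the compressed image packet Vi-sel."""
    selected = vi[sel]                      # packet from the speaker terminal
    return {n: selected for n in vi}        # same packet delivered to every terminal


vi = {1: b"frame-from-EP1", 2: b"frame-from-EP2",
      3: b"frame-from-EP3", 4: b"frame-from-EP4"}
vo = switch_video(vi, sel=1)
```

Note that the packets stay compressed on both sides of the switch, so in this sketch, as in the description, the image path involves no decoding or re-encoding.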

[0035] FIG. 7 is a block diagram of each of the video conference terminals EP-1, EP-2, EP-3, and EP-4 of FIG. 2.

[0036] The microphone 100 outputs the picked-up audio signal to the audio compression unit 101. The audio compression unit 101 compresses the input audio signal and outputs compressed audio packets to the network interface unit 108. The video camera 102 outputs the captured image signal to the image compression unit 103. The image compression unit 103 compresses the input image signal and outputs compressed image packets to the network interface unit 108. The network interface unit 108 transmits the input compressed audio packets and compressed image packets as network data ND-n to the connected MCU via the Internet 10. At the same time, network data ND-n received from the MCU is input to the network interface unit 108, which outputs the compressed image packets to the image decompression unit 104 and the compressed audio packets to the audio decompression unit 106. The image decompression unit 104 decompresses the compressed image packets received from the multipoint video conference device MCU via the Internet 10 and outputs an image signal. The display 105 renders the image signal input from the image decompression unit 104. The audio decompression unit 106 decompresses the compressed audio packets received from the multipoint video conference device MCU via the Internet 10 and outputs an audio signal. The loudspeaker 107 plays back the audio signal input from the audio decompression unit 106.

[0037] (Description of Operation) Next, the operation of this embodiment will be described.

[0038] In FIG. 2, the multipoint video conference controller MCU is connected to the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10, and realizes a multipoint video conference by transmitting and receiving the network data ND-1, ND-2, ND-3, and ND-4, which contain compressed audio packets and compressed image packets.

[0039] Referring to FIG. 3, the decompressed audio packets ADi-1, ADi-2, ADi-3, and ADi-4, which are received from the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10 and decompressed by the audio compression/decompression units 2-1, 2-2, 2-3, and 2-4, respectively, are input to the speaker determination unit 4. From the number of decompressed audio packets per unit time in ADi-1, ADi-2, ADi-3, and ADi-4, the speaker determination unit 4 determines which of the video conference terminals EP-1, EP-2, EP-3, and EP-4 is the current speaker of the video conference, and outputs the speaker terminal selection signal SEL. At the same time, the compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 received from the video conference terminals EP-1 to EP-4 via the Internet 10 are input to the image switching unit 5. According to the speaker terminal selection signal SEL output by the speaker determination unit 4, the image switching unit 5 outputs one of the compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 as the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4 to be sent back to the video conference terminals EP-1, EP-2, EP-3, and EP-4. In other words, the image of the current speaker terminal is transmitted to all video conference terminals as the selected image.

[0040] Referring to FIG. 4, the decompressed audio packets ADi-1, ADi-2, ADi-3, and ADi-4, which are received from the video conference terminals EP-1, EP-2, EP-3, and EP-4 via the Internet 10 and decompressed by the audio compression/decompression units 2-1, 2-2, 2-3, and 2-4, are input to the packet counting units 40-1, 40-2, 40-3, and 40-4, respectively. Each of the packet counting units 40-1, 40-2, 40-3, and 40-4 counts the number of input decompressed audio packets per unit time and outputs the packet count NUM-1, NUM-2, NUM-3, or NUM-4. The packet count comparison unit 41 receives the packet counts NUM-1, NUM-2, NUM-3, and NUM-4 from the packet counting units 40-1 to 40-4, judges the input with the largest packet count to be the speaker of the video conference, and outputs the speaker terminal selection signal SEL.

[0041] FIG. 8 is a diagram illustrating the operation of the packet counting units 40-1, 40-2, 40-3, and 40-4 of FIG. 4.

[0042] Suppose that the decompressed voice packets ADi-1, ADi-2, ADi-3, and ADi-4, received from the video conference terminals EP-1, EP-2, EP-3, and EP-4 and subjected to voice decompression, are input at time timings T1, T2, ..., T5. The decompressed voice packets received from each video conference terminal are counted to obtain the packet counts NUM-1, NUM-2, NUM-3, and NUM-4. A timing at which no decompressed voice packet is received from a video conference terminal is not counted. In the example of FIG. 8, the packet counts are NUM-1 = 5, NUM-2 = 3, NUM-3 = 4, and NUM-4 = 2.
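The counting described for FIG. 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_packets`, the representation of arrivals as (terminal, timing) pairs, and the particular timings at which each terminal's packets arrive are hypothetical and were chosen only so that the totals match the counts given in the text.

```python
from collections import Counter

def count_packets(arrivals, window):
    """Count received voice packets per terminal within one unit-time window.

    arrivals: iterable of (terminal_id, timing) pairs, one per packet received.
    window:   the timings that make up the unit time (here T1..T5).
    A timing at which a terminal sent nothing simply has no entry, so it
    contributes nothing to that terminal's count.
    """
    window = set(window)
    counts = Counter()
    for terminal_id, timing in arrivals:
        if timing in window:
            counts[terminal_id] += 1
    return counts

# Hypothetical arrival pattern; only the per-terminal totals come from the text.
arrivals = (
    [(1, t) for t in (1, 2, 3, 4, 5)]   # EP-1: packet at every timing
    + [(2, t) for t in (1, 3, 5)]       # EP-2: three packets
    + [(3, t) for t in (1, 2, 4, 5)]    # EP-3: four packets
    + [(4, t) for t in (2, 4)]          # EP-4: two packets
)
num = count_packets(arrivals, window=range(1, 6))
```

Because silent terminals send nothing, the count needs no inspection of the audio content, which is the point the paragraph makes about uncounted timings.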

[0043] FIG. 9 is a diagram illustrating the operation of the packet number comparison unit 41 of FIG. 4.

[0044] When the packet counts NUM-1, NUM-2, NUM-3, and NUM-4 are input from the packet counting units 40-1, 40-2, 40-3, and 40-4 corresponding to the respective video conference terminals, the packet number comparison unit 41 compares the packet counts, judges the video conference terminal corresponding to the largest packet count to be the current speaker video conference terminal, and outputs the speaker terminal selection signal SEL. In the example of FIG. 9, NUM-1 = 5, NUM-2 = 3, NUM-3 = 4, and NUM-4 = 2 are compared; because NUM-1 is the largest packet count, the speaker terminal selection signal SEL = 1 is output.
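As a rough sketch, the comparison of FIG. 9 amounts to taking the index of the maximum count. The function name `select_speaker` and the dictionary representation are assumptions made for illustration. The text does not say what happens when two terminals have the same count, so the tie behaviour below (first terminal wins) is also an assumption.

```python
def select_speaker(num):
    """Return the terminal number with the largest packet count (signal SEL).

    num: mapping of terminal number -> packets counted per unit time.
    """
    if not num:
        raise ValueError("no packet counts supplied")
    # max() returns the first maximal key in iteration order, so a tie goes
    # to the terminal listed first. Tie handling is not specified in the text.
    return max(num, key=num.get)

# Counts from the example of FIG. 9.
sel = select_speaker({1: 5, 2: 3, 3: 4, 4: 2})
```

With the counts from the example, `sel` is 1, matching SEL = 1 in the text.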

[0045] In FIG. 6, the switch SW selects, in accordance with the input speaker terminal selection signal SEL, one of the compressed image packets Vi-1, Vi-2, Vi-3, and Vi-4 received from the video conference terminals EP-1, EP-2, EP-3, and EP-4, and outputs the selected compressed image packet as the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4. The compressed image packet from the speaker terminal is therefore output to each of the video conference terminals EP-1, EP-2, EP-3, and EP-4, so that the image of the speaker terminal is displayed identically on all of the video conference terminals. In FIG. 6, the speaker terminal selection signal is SEL = 1, so the compressed image packets Vo-1, Vo-2, Vo-3, and Vo-4 are all the compressed image packet Vi-1, and every video conference terminal displays the image of the video conference terminal EP-1 as output by the multipoint conference control unit MCU.
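The behaviour of the switch SW can be illustrated as a one-to-all copy of the selected input. This is a sketch only: the function name `switch_video` and the byte strings standing in for compressed image packets are hypothetical.

```python
def switch_video(vi, sel):
    """Copy the speaker's compressed image packet to every output Vo-n.

    vi:  mapping of terminal number -> compressed image packet Vi-n.
    sel: speaker terminal selection signal SEL (a terminal number).
    The packet is forwarded as received; nothing is decoded or re-encoded.
    """
    selected = vi[sel]
    return {terminal_id: selected for terminal_id in vi}

# Placeholder payloads standing in for compressed image packets.
vi = {1: b"frame-EP1", 2: b"frame-EP2", 3: b"frame-EP3", 4: b"frame-EP4"}
vo = switch_video(vi, sel=1)
```

Note that, as in the text, the speaker terminal EP-1 also receives its own image back.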

[0046] Next, the operation of the voice adding unit 3 will be described with reference to FIG. 5.

[0047] The voice adding unit 3 receives as input the decompressed voice packets ADi-1, ADi-2, ADi-3, and ADi-4, which are received from the video conference terminals EP-1, EP-2, EP-3, and EP-4 and decompressed by the voice compression/decompression units 2-1, 2-2, 2-3, and 2-4, respectively. Its outputs are the decompressed voice packet ADo-1, which is ultimately transmitted to the video conference terminal EP-1; the decompressed voice packet ADo-2, which is transmitted to the video conference terminal EP-2; the decompressed voice packet ADo-3, which is transmitted to the video conference terminal EP-3; and the decompressed voice packet ADo-4, which is transmitted to the video conference terminal EP-4. Taking ADo-1 as an example, the decompressed voice packet ADo-1 is the sum, computed by the adder ADD-1, of the decompressed voice packets ADi-2, ADi-3, and ADi-4 received from the video conference terminals EP-2, EP-3, and EP-4. Similarly, ADo-2 is the sum, computed by the adder ADD-2, of the decompressed voice packets ADi-1, ADi-3, and ADi-4 received from EP-1, EP-3, and EP-4; ADo-3 is the sum, computed by the adder ADD-3, of the decompressed voice packets ADi-1, ADi-2, and ADi-4 received from EP-1, EP-2, and EP-4; and ADo-4 is the sum, computed by the adder ADD-4, of the decompressed voice packets ADi-1, ADi-2, and ADi-3 received from EP-1, EP-2, and EP-3. Through these operations, the voice adding unit 3 adds together and outputs, for each destination, the voices other than the voice input from the destination video conference terminal itself.
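The adders ADD-1 through ADD-4 can be sketched as follows. The sketch assumes, for illustration only, that each decompressed voice packet is an equal-length list of integer samples and that every terminal contributes a frame; the function name `add_voices` is hypothetical, and overflow or clipping of the summed samples is not handled.

```python
def add_voices(adi):
    """For each destination terminal, sum the decoded audio of all other terminals.

    adi: mapping of terminal number -> list of samples (ADi-n), equal lengths.
    Returns a mapping of terminal number -> summed samples (ADo-n).
    """
    ado = {}
    for dst in adi:
        # Inputs to adder ADD-dst: every frame except the destination's own.
        others = [frame for src, frame in adi.items() if src != dst]
        ado[dst] = [sum(samples) for samples in zip(*others)]
    return ado

# Sample values chosen so that each sum shows which inputs were added.
adi = {1: [1, 1], 2: [10, 20], 3: [100, 200], 4: [1000, 2000]}
ado = add_voices(adi)
```

Excluding the destination's own input means a participant does not hear an echo of their own voice in the mix.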

[0048] Next, the operation of the video conference terminals EP-1, EP-2, EP-3, and EP-4 will be described with reference to FIG. 7.

[0049] The audio compression unit 101 applies audio compression to the audio signal input from the microphone 100 and transmits the resulting compressed voice packets to the multipoint video conference control unit MCU via the network interface unit 108 and the Internet 10. Similarly, the image compression unit 103 applies image compression to the image signal captured by the video camera 102 and transmits the resulting compressed image packets to the multipoint video conference control unit MCU via the network interface unit 108 and the Internet 10. The image decompression unit 104 receives compressed image packets from the multipoint video conference control unit MCU via the Internet 10 and the network interface unit 108, applies image decompression to them, and outputs the resulting video signal to the display, on which the image is displayed. Because the compressed image packets output by the multipoint video conference control unit MCU are the compressed image packets output by the video conference terminal of the current speaker, the display shows the video captured by the camera of the speaker's video conference terminal.

[0050] FIG. 10 is a diagram illustrating the operation of the audio compression unit 101 of FIG. 7.

[0051] A silence threshold TH is set, against which the audio compression unit 101 judges silence when it performs audio compression. FIG. 10(b) shows that, in the audio compression performed by the audio compression unit 101, an audio signal that falls below the silence threshold TH shown in FIG. 10(a) is compressed as a silent packet, and an audio signal that exceeds the silence threshold TH is compressed as a normal voice packet. FIG. 10(c) shows that, of the compressed voice packets, the audio compression unit 101 outputs only the valid voice packets, with the silent packets excluded. This also means that the voice packets that the video conference terminal EP-n transmits to the multipoint video conference control unit MCU are only valid compressed voice packets.
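The terminal-side silence suppression of FIG. 10 can be illustrated with a simple threshold test. The text does not say how the signal level is measured against TH, so this sketch assumes the peak absolute sample value of a frame. The function names, the frame representation, and the treatment of a level exactly equal to TH (sent as a normal packet) are likewise assumptions.

```python
def is_silent(frame, th):
    """Judge a frame silent when its peak absolute amplitude is below TH.

    Using the peak amplitude as the level is an assumption; a level exactly
    equal to TH is treated as not silent, which the text leaves unspecified.
    """
    return max((abs(s) for s in frame), default=0) < th

def packets_to_send(frames, th):
    """Keep only the frames that would become valid (non-silent) voice packets."""
    return [frame for frame in frames if not is_silent(frame, th)]

# Four hypothetical frames: quiet, loud, quiet, loud.
frames = [[0, 2, -1], [40, -55, 30], [1, 0, 0], [-70, 65, 10]]
sent = packets_to_send(frames, th=10)
```

Only two of the four frames are transmitted here, which is why the controller can infer speech activity from the packet count alone.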

[0052] Through these operations, the multipoint video conference control device according to the present invention determines the current speaker video conference terminal from the number per unit time of valid (non-silent) compressed voice packets input from each video conference terminal, and the image of the speaker video conference terminal is transmitted to all of the video conference terminals.

【００５３】[0053]

[Effect of the Invention] As described above, the present invention effectively detects the speaker terminal by determining the number per unit time of valid, non-silent voice packets received from each video conference terminal. By making the speaker terminal the source videophone terminal of the video packets delivered to all video conference terminals connected to the multipoint video conference control device, the invention has the effect of realizing a multipoint video conference control device that switches images effectively and at low cost.

[Brief description of the drawings]

FIG. 1 is a flowchart of an embodiment of the video packet transmission method of the multipoint video conference control device according to the present invention.

FIG. 2 is a block diagram of a multipoint video conference system including an embodiment of the multipoint video conference device of the present invention.

FIG. 3 is a block diagram of the multipoint video conference device MCU of FIG. 2.

FIG. 4 is a block diagram of the speaker discriminating unit 4 of FIG. 3.

FIG. 5 is a block diagram of the voice adding unit 3 of FIG. 3.

FIG. 6 is a block diagram of the image switching unit 5 of FIG. 3.

FIG. 7 is a block diagram of the video conference terminal EP-n of FIG. 2.

FIG. 8 is a diagram illustrating the operation of the packet counting unit 40-n of FIG. 4.

FIG. 9 is a diagram illustrating the operation of the packet number comparison unit 41 of FIG. 4.

FIG. 10 is a diagram illustrating the operation of the audio compression unit 101 of FIG. 7, in which (a) shows the audio signal from the microphone 100 and the silence threshold, (b) shows the compressed data before the audio compression unit 101 performs audio compression processing, and (c) shows the compressed data after the audio compression processing.

FIG. 11 is a block diagram showing a conventional example of a multipoint videophone control device.

FIG. 12 is a block diagram showing the detailed configuration of the control unit 516 of FIG. 11.

[Explanation of symbols]

1, 108: network interface unit
2-1, 2-2, 2-3, 2-4: voice compression/decompression unit
3: voice adding unit
4: speaker discriminating unit
5: image switching unit
6: call control unit
10: Internet
40-1, 40-2, 40-3, 40-4: packet counting unit
41: packet number comparison unit
100: microphone
101: audio compression unit
102: video camera
103: image compression unit
104: image decompression unit
105: display
106: audio decompression unit
107: speaker
ADi-1, ADi-2, ADi-3, ADi-4: decompressed voice packets
ADo-1, ADo-2, ADo-3, ADo-4: decompressed voice packets subjected to voice addition processing
AEi-1, AEi-2, AEi-3, AEi-4: compressed voice packets
AEo-1, AEo-2, AEo-3, AEo-4: compressed voice packets
EP-1, EP-2, EP-3, EP-4: video conference terminals
MCU: multipoint video conference device
ND-1, ND-2, ND-3, ND-4, ND-n: network data
NUM-1, NUM-2, NUM-3, NUM-4: packet counts
SEL: speaker terminal identification signal
SW: switch
T1, T2, ..., T5: time timings
Vi-1, Vi-2, Vi-3, Vi-4: image packets, compressed image packets
Vo-1, Vo-2, Vo-3, Vo-4: selected image packets, compressed image packets

Continued from the front page. (51) Int.Cl.7, identification symbol, FI, theme code (reference): H04M 11/00 302; G10L 9/00 N. F-terms (reference): 5C064 AA02 AC04 AC06 AC12 AC17 AD01 AD08 AD13 AD14; 5K015 AA10 AB01 AB02 AF05 JA10 JA11; 5K030 HA08 HB01 HB02 HC01 JT01 JT04 LD08; 5K101 KK04 LL02 MM04 NN06 NN07 NN18 NN25 SS07 TT03 UU19 UU20

Claims

[Claims]

1. A video packet transmission method of a multipoint video conference control device, for transmitting video packets to each video conference terminal from a multipoint video conference control device to which video conference terminals that do not transmit voice packets during silence are connected, the method comprising: a step of detecting a speaker terminal by determining the number per unit time of valid, non-silent voice packets received from each video conference terminal; and a step of delivering selected image video packets, with the detected speaker terminal as the source videophone terminal, to all of the video conference terminals connected to the multipoint video conference control device.

2. The video packet transmission method of a multipoint video conference control device according to claim 1, wherein the step of detecting the speaker terminal receives as input voice packets that have been received from a plurality of video conference terminals via the Internet and subjected to decompression processing, counts the number of packets, determines the terminal from which the most packets have been received, and outputs an identification signal identifying that terminal as the speaker terminal; and the step of delivering the selected image video packets receives as input image packets received from the terminals, switches the selected image packet to that of the speaker terminal in accordance with the identification signal input at the same time, and transmits it to each video conference terminal.

3. The video packet transmission method of a multipoint video conference control device according to claim 2, wherein the step of detecting the speaker terminal comprises: a step of receiving as input the voice packets that have been received from the plurality of video conference terminals via the Internet and subjected to decompression processing, counting the valid, non-silent voice packets of each terminal per unit time, and outputting the number of valid voice packets per unit time; and a step of comparing the counts and outputting a speaker terminal selection signal that selects, as the speaker terminal, the video conference terminal that has transmitted the largest number of packets per unit time.

4. A multipoint video conference control device that is connected via the Internet to a plurality of video conference terminals that do not transmit voice packets during silence and that holds a video conference, the device comprising: a speaker discriminating unit that detects a speaker terminal by determining the number per unit time of valid, non-silent voice packets received from each video conference terminal; and an image switching unit that delivers selected image video packets, with the detected speaker terminal as the source video conference terminal, to all of the video conference terminals connected to the multipoint video conference control device.

5. The multipoint video conference control device according to claim 4, wherein the speaker discriminating unit receives as input voice packets that have been received from the plurality of video conference terminals via the Internet and subjected to decompression processing, counts the number of valid, non-silent voice packets per unit time, determines the terminal from which the most packets have been received, and outputs an identification signal identifying that terminal as the speaker terminal; and the image switching unit receives as input image packets received from the plurality of video conference terminals via the Internet, switches the selected image packet to that of the speaker terminal in accordance with the identification signal input at the same time, and transmits it to each video conference terminal.

6. The multipoint video conference control device according to claim 5, wherein the speaker discriminating unit comprises: packet counting units, provided in correspondence with the plurality of video conference terminals, each of which receives as input the voice packets transmitted from the corresponding video conference terminal, counts the number of packets per unit time, and outputs the count; and a packet number comparison unit that receives as input the packet counts per unit time output by the packet counting units and outputs a speaker terminal selection signal that selects, as the speaker terminal, the video conference terminal that has transmitted the largest number of packets.