JP6179834B1 - Video conferencing equipment

Info

Publication number: JP6179834B1
Application number: JP2016188603A
Authority: JP
Inventors: 本田　義雅; 義雅本田; 剛志滝田
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-09-27
Filing date: 2016-09-27
Publication date: 2017-08-16
Anticipated expiration: 2036-09-27
Also published as: JP2018056719A

Abstract

[Problem] To display video data from a speaking site in a way that is easy for viewers to identify, even when a video conference involves a large number of sites. [Solution] A communication control unit 103 receives video data and audio data from each of the conference terminal devices 10-2 to 10-24 at a plurality of other sites. A video/audio synthesis unit 105 determines a screen layout according to the number of sites participating in the video conference and generates composite video data by combining the video data of the sites according to that layout. In doing so, the video/audio synthesis unit 105 generates the composite video data so that the video data of each site whose audio data level is at or above a threshold is displayed with more emphasis than the video data of the other sites. A video/audio output control unit 106 displays the composite video data on the screen of a display device 500. [Selected drawing] FIG. 1

Description

The present invention relates to a video conference apparatus that can be connected simultaneously to apparatuses installed at each of a plurality of sites.

Video conference systems that connect a plurality of sites for remote meetings have become widespread. Patent Document 1 describes a video conference apparatus that displays video data from three sites simultaneously on a monitor screen.

JP 2012-231428 A

In recent years, video conference apparatuses that can connect to many sites simultaneously (for example, 24 sites) have also been developed.

If the video data from every site is displayed in regions of equal area, the display area of each video shrinks as the number of sites grows. As a result, viewers find it difficult to pick out the video data of the site where the participant who spoke is located (hereinafter, the "speaking site").

An object of the present invention is to provide a video conference apparatus that can display video data from a speaking site in a way that is easy for viewers to identify, even when the number of sites is large.

The video conference apparatus of the present invention is provided at a host site and can be connected simultaneously to video conference apparatuses at a plurality of other sites. It comprises: a video input unit that captures the host site to acquire video data; an audio input unit that picks up sound at the host site to acquire audio data; a communication control unit that receives video data and audio data from each of the conference terminal devices at the plurality of other sites; and a display control unit that determines a screen layout according to the number of sites participating in the video conference, generates composite video data by combining the video data of the sites according to the screen layout, and displays it on a screen. The apparatus further has a level detection unit that detects the level of the audio data. When speaking sites, where participants who mainly speak are located, and listening sites, where participants who basically only listen without speaking are located, are determined in advance, the display control unit generates the composite video data so that the display area of the video data of the speaking sites is larger than that of the listening sites. For a listening site whose audio data level has reached or exceeded a threshold, the display control unit generates the composite video data so that the display area of that site's video data is larger than that of the other listening sites and smaller than that of the speaking sites.
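The three-level size ordering described above can be sketched as follows. This is only an illustration: the names `Tier` and `display_tier` are hypothetical, and the text specifies only the relative ordering of display areas, not concrete sizes.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Relative display size; a larger value means a larger display area."""
    OTHER_LISTENER = 0   # listening site below the threshold (smallest)
    ACTIVE_LISTENER = 1  # listening site whose audio level reached the threshold
    SPEAKER = 2          # predetermined speaking site (largest)


def display_tier(is_speaking_site: bool, level: float, threshold: float) -> Tier:
    """Pick the size tier for one site from its role and current audio level."""
    if is_speaking_site:
        return Tier.SPEAKER
    if level >= threshold:
        return Tier.ACTIVE_LISTENER
    return Tier.OTHER_LISTENER
```

Using an ordered enumeration keeps the constraint "larger than other listening sites, smaller than speaking sites" checkable with plain comparisons.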

According to the present invention, video data from a speaking site can be displayed in a way that is easy for viewers to identify, even when the number of sites is large.

FIG. 1 is a block diagram showing the configuration of the video conference apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a state transition diagram showing the state transitions of the video conference apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a flowchart showing the operation of the video conference apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a flowchart showing the screen layout control operation of the video conference apparatus according to Embodiment 1 of the present invention.
FIG. 5 shows an example of the screen layout of the composite video data displayed on the screen of the video conference apparatus according to Embodiment 1 of the present invention.
FIG. 6 shows an example of the screen layout of the composite video data displayed on the screen of the video conference apparatus according to Embodiment 2 of the present invention.
FIG. 7 shows an example of the screen layout of the composite video data displayed on the screen of the video conference apparatus according to Embodiment 3 of the present invention.
FIG. 8 shows an example of the screen layout of the composite video data displayed on the screen of the video conference apparatus according to a variation of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings as appropriate. The description uses, as an example, a video conference system in which up to 24 sites can be connected simultaneously.

(Embodiment 1)
<Configuration and connections of the video conference apparatus>
First, the configuration and connections of the video conference apparatus 10 according to Embodiment 1 of the present invention are described in detail with reference to FIG. 1.

A video conference apparatus 10 is installed at each site. The video conference apparatus 10-1 at the host site (the multi-site connection device (MCU), site number 1) connects through a network 20 to the video conference apparatuses 10-2 to 10-24 installed at the other sites (site numbers 2 to 24), and transmits and receives audio data and video data. The network 20 is typically the Internet.

The video conference apparatus 10-1 comprises a main device 100, a user operation input device 200, a video input device 300, an audio input device 400, and a display device 500.

The main device 100 is connected to the user operation input device 200, the video input device 300, the audio input device 400, and the display device 500.

The user operation input device 200 detects a site-selection operation by the user and transmits a signal containing selected-site information, which indicates the selected sites, to the main device 100 by wire or wirelessly. The user operation input device 200 is typically a remote control or a touch panel.

The video input device 300 outputs to the main device 100 video data obtained by capturing the site where the main device 100 is installed. Besides video data captured by a camera, the video data output by the video input device 300 may include video data stored on a PC, video data played back by a DVD player, and the like.

The audio input device 400 outputs to the main device 100 audio data and the like obtained by picking up sound at the site where the main device 100 is installed. Besides audio data picked up by a microphone, the audio data output by the audio input device 400 may include audio data that accompanies video data stored on a PC, audio data that accompanies video data played back by a DVD player, and the like.

The display device 500 displays the video data output from the main device 100 on its screen and outputs the audio data output from the main device 100 through a speaker (not shown).

The main device 100 mainly comprises a user instruction receiving unit 101, a conference control unit 102, a communication control unit 103, a video/audio encoding/decoding unit 104, a video/audio synthesis unit 105, a video/audio output control unit 106, a still image holding unit 107, a video input control unit 108, and an audio input control unit 109. The video/audio synthesis unit 105 and the video/audio output control unit 106 together constitute the display control unit.

The user instruction receiving unit 101 receives the signal transmitted from the user operation input device 200, extracts the selected-site information contained in it, and outputs that information to the conference control unit 102. The selected-site information includes the destination information (IP address or ISDN number) of each site participating in the video conference.

Based on the selected-site information input from the user instruction receiving unit 101, the conference control unit 102 controls the timing of data input and output in the communication control unit 103, the video/audio encoding/decoding unit 104, and the video/audio synthesis unit 105. The conference control unit 102 also outputs the selected-site information to the video/audio synthesis unit 105. In addition, based on the selected-site information, it controls the call origination and call establishment processing in the communication control unit 103 and monitors whether video data has been received.

The communication control unit 103 operates at timings controlled by the conference control unit 102. It establishes calls with the other video conference apparatuses 10-2 to 10-24. Once a call is established, it receives, through the network 20, the video data and audio data transmitted by the other video conference apparatuses 10-2 to 10-24 and outputs them to the video/audio encoding/decoding unit 104. It also transmits the video data and audio data input from the video/audio encoding/decoding unit 104 to the other video conference apparatuses 10-2 to 10-24 through the network 20. The communication control unit 103 operates according to a predetermined communication protocol, typically SIP or H.323.

The video/audio encoding/decoding unit 104 operates at timings controlled by the conference control unit 102. It encodes the video data input from the video/audio synthesis unit 105 and the audio data input from the audio input control unit 109 and outputs the result to the communication control unit 103. It also decodes the video data and audio data from the other video conference apparatuses 10-2 to 10-24, input from the communication control unit 103, and outputs the result to the video/audio synthesis unit 105.

The video/audio encoding/decoding unit 104 also detects the level of the audio data of each of the video conference apparatuses 10-1 to 10-24 and outputs the detection results to the video/audio synthesis unit 105. In this role it serves as the level detection unit.
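As a rough illustration of level detection, the sketch below computes a normalized RMS level per site from decoded audio frames. The text does not say how the level is measured, so the RMS measure, the 16-bit PCM input, and the names `rms_level` and `detect_levels` are all assumptions.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[int], full_scale: int = 32768) -> float:
    """Return the RMS of signed 16-bit PCM samples, normalized to 0.0..1.0.

    Assumption: RMS is used as the "level"; the source does not specify a measure.
    """
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / full_scale


def detect_levels(frames: dict[int, Sequence[int]]) -> dict[int, float]:
    """Map each site number to the level of its current audio frame."""
    return {site: rms_level(pcm) for site, pcm in frames.items()}
```

The resulting per-site dictionary is the kind of detection result that could be handed to the synthesis stage for comparison against a threshold.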

The video/audio synthesis unit 105 operates at timings controlled by the conference control unit 102. From the video data of the other video conference apparatuses 10-2 to 10-24, input from the video/audio encoding/decoding unit 104, and the video data input from the video input control unit 108, it generates composite video data that combines the multiple video streams according to the number of sites in the selected-site information input from the conference control unit 102, and outputs it to the video/audio output control unit 106. Until video data is received from a given video conference apparatus 10-2 to 10-24, the video/audio synthesis unit 105 generates the composite video data so that the still image held in the still image holding unit 107 is displayed in its place.

The video/audio synthesis unit 105 also generates composite audio data by combining the audio data of the other video conference apparatuses 10-2 to 10-24, input from the video/audio encoding/decoding unit 104, with the audio data input from the audio input control unit 109, and outputs it to the video/audio output control unit 106. It also outputs the video data input from the video input control unit 108 to the video/audio encoding/decoding unit 104.

In addition, the video/audio synthesis unit 105 identifies speaking sites from the audio level detection results and changes the composite video data so that the video data of each speaking site is highlighted.

The video/audio output control unit 106 displays the composite video data input from the video/audio synthesis unit 105 on the screen of the display device 500 and outputs the composite audio data as sound from the speaker of the display device 500.

The still image holding unit 107 holds, in advance, still image data for displaying a predetermined still image on the screen of the display device 500 during the period until video data is received from the other video conference apparatuses 10-2 to 10-24.

The video input control unit 108 outputs the video data input from the video input device 300 to the video/audio synthesis unit 105.

The audio input control unit 109 outputs the audio data input from the audio input device 400 to the video/audio encoding/decoding unit 104 and the video/audio synthesis unit 105.

<Transitions of the connection state of the video conference apparatus>
Next, the transitions of the connection state of the video conference apparatus 10-1 are described in detail with reference to FIG. 2.

The video conference apparatus 10-1 starts operating when it is powered on. Immediately after power-on, it is in the non-communication state (S1).

In the non-communication state (S1), when the video conference apparatus 10-1 connects to one other video conference apparatus 10-i (i is an integer from 2 to 24), it enters a one-to-one communication state with the video conference apparatus 10-i (the 1:1 communication state) (S2). In the 1:1 communication state (S2), when it disconnects from the video conference apparatus 10-i, it returns to the non-communication state (S1). The video conference apparatus 10-1 ends operation when it is powered off in the non-communication state (S1).

From the 1:1 communication state (S2), when the video conference apparatus 10-1 connects to a further video conference apparatus 10-j (j is an integer from 2 to 24 other than i), it enters a communication state with the video conference apparatuses 10-i and 10-j at a plurality of sites (the MCU communication state) (S3).

When the video conference apparatus 10-1 then disconnects from the video conference apparatus 10-j, it returns to the 1:1 communication state (S2); when it further disconnects from the video conference apparatus 10-i, it returns to the non-communication state (S1).

In the non-communication state (S1), the video conference apparatus 10-1 can also call all of the other video conference apparatuses 10-2 to 10-24 at once, which places it in a communication state with all of them (the MCU communication state) (S3). When it disconnects from all of the other video conference apparatuses 10-2 to 10-24 at once, it returns to the non-communication state (S1).
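The transitions among S1, S2, and S3 can be sketched as a small state machine keyed on the set of connected sites. This is a minimal sketch: the class and method names are hypothetical, and deriving the state purely from the number of connected sites is an assumption that generalizes the two-site example in the text.

```python
from enum import Enum


class State(Enum):
    NON_COMMUNICATION = "S1"
    ONE_TO_ONE = "S2"
    MCU = "S3"


class HostConnection:
    """Tracks which remote sites the host apparatus is connected to."""

    def __init__(self) -> None:
        self.peers: set[int] = set()  # site numbers of connected remote sites

    @property
    def state(self) -> State:
        # Assumption: the state follows from how many remote sites are connected.
        if not self.peers:
            return State.NON_COMMUNICATION
        if len(self.peers) == 1:
            return State.ONE_TO_ONE
        return State.MCU

    def connect(self, *sites: int) -> State:
        """Connect one site, or several at once (simultaneous call)."""
        self.peers.update(sites)
        return self.state

    def disconnect(self, *sites: int) -> State:
        """Disconnect one site, or several at once."""
        self.peers.difference_update(sites)
        return self.state
```

For example, `connect(*range(2, 25))` from the idle state models the simultaneous call that jumps directly from S1 to S3.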

Possible ways of calling all sites at once include having the user enter the destinations manually into the video conference apparatus 10-1 at call time, or having the user select a list, stored in advance in the video conference apparatus 10-1, in which a plurality of destinations are registered. Destinations may be specified by IP address, telephone number, identification code, or the like.

<Operation of the video conference apparatus>
Next, the operation of the video conference apparatus 10-1 is described in detail with reference to FIG. 3. FIG. 3 shows the flow for the case in which the video conference apparatus 10-1 calls all of the other video conference apparatuses 10-2 to 10-24 at once.

First, the user instruction receiving unit 101 receives a signal from the user operation input device 200, extracts the selected-site information contained in it, and outputs that information to the conference control unit 102.

The conference control unit 102 controls the communication control unit 103 so that it calls, all at once, the video conference apparatuses 10-2 to 10-24 installed at the N sites given in the selected-site information (N is an integer of 1 or more; N = 23 in FIG. 3). The communication control unit 103 thereby calls the video conference apparatuses 10-2 to 10-24 at the selected sites simultaneously (S11). Specifically, when the selected-site information is a single list recording a plurality of destinations, the conference control unit 102 can determine the number of sites N from the number of destinations registered in that list.

The video/audio synthesis unit 105 determines the screen layout based on the selected-site information input from the conference control unit 102 (S12).

The communication control unit 103 establishes calls with the video conference apparatuses 10-2 to 10-24 at the other sites (S13).

Next, the video/audio synthesis unit 105 generates composite video data so that the video data of the video conference apparatus 10-1, input from the video input control unit 108, and the video data of each of the other video conference apparatuses 10-2 to 10-24 are displayed in their corresponding regions of the screen layout. The video/audio output control unit 106 then displays this initial composite video data, input from the video/audio synthesis unit 105, on the screen of the display device 500 (S14) (see FIG. 5(a) and FIG. 6(a)). Until video data is received from a given video conference apparatus 10-2 to 10-24, still image data is displayed in its corresponding region.

Next, the video/audio synthesis unit 105 performs a highlight control operation (S15) to emphasize the video data of speaking sites.

The highlight control operation (S15) in the video conference apparatus 10-1 of this embodiment is described in detail below with reference to FIG. 4.

First, the video/audio encoding/decoding unit 104 detects the level of the audio data of each of the video conference apparatuses 10-1 to 10-24 and outputs the detection results to the video/audio synthesis unit 105 (S21).

The video/audio synthesis unit 105 treats a site as having spoken when the level of its audio data is at or above a threshold. It then determines whether there is a new speaking site (S22).

If there is a new speaking site (S22: YES), the video/audio synthesis unit 105 starts an individual timer for that site (S23). It also generates composite video data in which the video data of that site is highlighted. The video/audio output control unit 106 then displays the new composite video data, input from the video/audio synthesis unit 105, on the screen of the display device 500 (S24). The flow then returns to S21.

If there is no new speaking site (S22: NO), the video/audio synthesis unit 105 determines whether an existing speaking site has spoken again (S25).

If an existing speaking site has spoken again (S25: YES), the video/audio synthesis unit 105 restarts that site's individual timer (S26). The flow then returns to S21.

If no existing speaking site has spoken again (S25: NO), the video/audio synthesis unit 105 determines whether an individual timer has expired, that is, whether it has counted a predetermined time T (S27).

If an individual timer has expired (S27: YES), the video/audio synthesis unit 105 generates composite video data in which the highlighting of the corresponding site is removed. The video/audio output control unit 106 then displays the updated composite video data, input from the video/audio synthesis unit 105, on the screen of the display device 500 (S28). The flow then returns to S21.

If no individual timer has expired (S27: NO), the flow returns to S21.
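The loop of steps S21 to S28 can be sketched with one expiry deadline per highlighted site. This is an illustrative sketch only: `HighlightController`, its parameters, and the choice to pass the current time explicitly (for example from `time.monotonic()`) are assumptions, and all sites are handled in a single pass rather than by the branch-per-iteration flow of FIG. 4.

```python
class HighlightController:
    """Keeps the set of highlighted sites, each with its own hold timer."""

    def __init__(self, threshold: float, hold_seconds: float) -> None:
        self.threshold = threshold        # level at or above which a site "speaks"
        self.hold_seconds = hold_seconds  # the predetermined time T
        self._deadline: dict[int, float] = {}  # site -> time its highlight expires

    def update(self, levels: dict[int, float], now: float) -> set[int]:
        """Process one round of level detection (S21) and return highlighted sites."""
        for site, level in levels.items():
            if level >= self.threshold:
                # New speaking site: start its timer (S23).
                # Existing speaking site: restart its timer (S26).
                self._deadline[site] = now + self.hold_seconds
        # Remove highlighting for sites whose timer has expired (S27, S28).
        for site in [s for s, d in self._deadline.items() if now >= d]:
            del self._deadline[site]
        return set(self._deadline)
```

A caller would regenerate the composite video whenever the returned set differs from the previous one, which corresponds to the redraws in S24 and S28.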

<Specific example of the screen layout>
Next, a specific example of the screen layout of the composite video data produced by the video/audio synthesis unit 105 is described in detail with reference to FIG. 5. In FIG. 5, the number shown in each region of the screen corresponds to a site number. For example, the region marked "1" displays the video data captured by the video conference apparatus 10-1 at site number 1.

As shown in FIG. 5(a), the composite video data is laid out so that the screen is divided into (N + 1) or more regions (24 in FIG. 5(a)), each displaying the video data (or still image data) of its corresponding site.
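One way to obtain at least (N + 1) equal regions is a near-square grid, sketched below. The text does not say how the grid dimensions are chosen, so this rule and the name `grid_for` are assumptions, and the arrangement in FIG. 5(a) may differ.

```python
import math


def grid_for(remote_sites: int) -> tuple[int, int]:
    """Return (columns, rows) of a grid with at least remote_sites + 1 cells.

    Assumption: a near-square grid is used; the source only requires
    (N + 1) or more regions.
    """
    if remote_sites < 1:
        raise ValueError("N must be an integer of 1 or more")
    regions = remote_sites + 1          # N remote sites plus the host site
    cols = math.isqrt(regions - 1) + 1  # ceiling of the square root of regions
    rows = -(-regions // cols)          # ceiling of regions / cols
    return cols, rows
```

With N = 23 this yields a 5 x 5 grid, which leaves one cell unused; a layout that fills the screen exactly would need a different rule.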

When a participant at site number 19 newly speaks while the composite video data shown in FIG. 5(a) is displayed, the video conference apparatus 10-1 changes the composite video data so that the display region of the video data of that speaking site (site number 19) is enlarged, as shown in FIG. 5(b). Enlarging the display region of the speaking site's video data emphasizes the speaking site.

When a participant at site number 12 then also newly speaks, the video conference apparatus 10-1 changes the composite video data so that the display regions of the video data of all speaking sites (site numbers 12 and 19) are enlarged, as shown in FIG. 5(c).

<Effect>
As described above, in this embodiment the composite video data is generated so that the display area of the video data of a speaking site is larger than the display area of the video data of the other sites. As a result, video data from a speaking site can be displayed in a way that is easy for viewers to identify, even when the number of sites is large.

(Embodiment 2)
Embodiment 1 described highlight control that enlarges the display region of the video data of a speaking site. Embodiment 2 describes highlight control that changes how the video data of a speaking site is displayed.

The configuration and connections of the video conference apparatus 10 of this embodiment are the same as those shown in FIG. 1 and described in Embodiment 1, so their description is omitted. The operation of the video conference apparatus 10-1 of this embodiment is likewise the same as that shown in FIGS. 3 and 4 and described in Embodiment 1, so its description is omitted.

＜画面レイアウトの具体例＞
次に、映像・音声合成部１０５による合成映像データの画面レイアウトの具体例について、図６を用いて詳細に説明する。なお、図６において、画面内の各領域に記された数字は拠点番号に対応する。例えば、「１」と記された領域には、拠点番号１のテレビ会議装置１０−１で撮像された映像データが表示される。 <Specific examples of screen layout>
Next, a specific example of the screen layout of the synthesized video data by the video / sound synthesizer 105 will be described in detail with reference to FIG. In FIG. 6, the numbers written in each area on the screen correspond to the base numbers. For example, video data captured by the video conference apparatus 10-1 with the site number 1 is displayed in the area marked “1”.

As shown in FIG. 6(a), the composite video data is laid out so that the video data (or still image data) of the corresponding site is displayed in each of (N+1) or more regions into which the screen is divided (24 regions in FIG. 6(a)). Site information, such as the IP address or site name of the corresponding site, is displayed at the upper left of each site's video data. For simplicity, FIG. 6 shows every item of site information as "site". In the following, "site" in the video data denotes the site information.
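
A layout with at least (N+1) regions can be computed in many ways, and the specification does not fix the grid shape. The following Python sketch assumes that N is the number of other sites, that site number 1 is the host, and that the regions form a near-square grid filled in row-major order; the names grid_for and layout_regions are hypothetical and do not appear in the specification.

```python
import math


def grid_for(region_count):
    """Return (rows, cols) of a near-square grid with at least region_count cells."""
    if region_count < 1:
        raise ValueError("region_count must be positive")
    # Smallest integer cols such that cols * cols >= region_count.
    cols = math.isqrt(region_count - 1) + 1
    # Fewest rows that still provide enough cells.
    rows = math.ceil(region_count / cols)
    return rows, cols


def layout_regions(n_other_sites, width, height):
    """Map each site number (1 = host, assumed) to an (x, y, w, h) rectangle."""
    total = n_other_sites + 1  # (N+1) regions: the host plus N other sites
    rows, cols = grid_for(total)
    cell_w, cell_h = width // cols, height // rows
    regions = {}
    for index in range(total):
        row, col = divmod(index, cols)  # row-major placement
        regions[index + 1] = (col * cell_w, row * cell_h, cell_w, cell_h)
    return regions


# 23 other sites plus the host on a 1920x1080 screen.
regions = layout_regions(23, 1920, 1080)
print(len(regions), regions[1])
```

For 24 regions this sketch picks a 5 x 5 grid and leaves one cell empty, which still satisfies the "(N+1) or more" condition; the arrangement actually drawn in FIG. 6(a) may differ.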

When the participant at site number 19 newly speaks while the composite video data shown in FIG. 6(a) is displayed, the video conference apparatus 10-1 generates composite video data in which the display method of the "site" label in the video data of that speaking site (site number 19) is changed, as shown in FIG. 6(b). Changing the display method of part of the speaking site's video data emphasizes the speaking site. Possible changes to the display method include inverting the label, as in FIG. 6(b), and changing its color.

If the participant at site number 12 then also speaks, the video conference apparatus 10-1 generates composite video data in which the display of the "site" label is changed for the video data of all speaking sites (site numbers 12 and 19), as shown in FIG. 6(c).
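
The label change described above could be driven by the audio level of each site. The Python sketch below assumes, as stated in the abstract, that a site counts as speaking when its audio level is at or above a threshold. The 0.0 to 1.0 level scale, the threshold value, the function names, and the style names "inverted" and "normal" are illustrative assumptions.

```python
def speaking_sites(audio_levels, threshold):
    """Return the set of site numbers whose audio level is at or above threshold."""
    return {site for site, level in audio_levels.items() if level >= threshold}


def label_styles(audio_levels, threshold):
    """Map each site number to the style used to draw its "site" label."""
    active = speaking_sites(audio_levels, threshold)
    return {
        site: "inverted" if site in active else "normal"
        for site in audio_levels
    }


# Hypothetical levels: sites 12 and 19 are speaking, the rest are quiet.
levels = {site: 0.05 for site in range(1, 25)}
levels[12] = 0.7
levels[19] = 0.6
styles = label_styles(levels, threshold=0.5)
print(sorted(site for site, style in styles.items() if style == "inverted"))
# prints [12, 19]
```

Because only the label style changes, the grid itself stays fixed, so the other sites do not move when a site starts or stops speaking.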

<Effect>
As described above, in the present embodiment, the composite video data is generated so that the display method of the video data of a speaking site differs from the display method of the video data of the other sites. As a result, even when the number of sites is large, the video data from a speaking site can be displayed in a way that is easy for the viewer to identify.

(Embodiment 3)
Embodiment 1 described highlighting control in which the display area of the video data of a speaking site is enlarged. Embodiment 3 describes highlighting control in which the display area of the video data of a speaking site is enlarged and its display position is also changed.

<Specific example of screen layout>
Next, a specific example of the screen layout of the composite video data generated by the video/audio synthesis unit 105 is described in detail with reference to FIG. 7. In FIG. 7, the number written in each region of the screen corresponds to a site number. For example, the region marked "1" displays the video data captured by the video conference apparatus 10-1 at site number 1.

When the participant at site number 1 newly speaks while the composite video data shown in FIG. 5(a) is displayed, the video conference apparatus 10-1 changes the composite video data so that the display area of the video data of that speaking site (site number 1) is enlarged and its display position is changed, as shown in FIG. 7(a).

If the participant at site number 19 then also speaks, the video conference apparatus 10-1 changes the composite video data so that the display areas of the video data of all speaking sites (site numbers 1 and 19) are enlarged and their display positions are changed, as shown in FIG. 7(b).

If the participant at site number 20 then also speaks, the video conference apparatus 10-1 changes the composite video data so that the display areas of the video data of all speaking sites (site numbers 1, 19, and 20) are enlarged and their display positions are changed, as shown in FIG. 7(c).

If the participant at site number 3 then also speaks, the video conference apparatus 10-1 changes the composite video data so that the display areas of the video data of all speaking sites (site numbers 1, 19, 20, and 3) are enlarged and their display positions are changed, as shown in FIG. 7(d).
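
The sequence above can be sketched as a reordering step that runs before the regions are drawn. The Python sketch below assumes that speaking sites are moved to the front of the display order in the order in which they began speaking, and that the remaining sites follow in ascending site number. The exact positions used in FIG. 7 are not reproduced here, and the function name arrange is hypothetical.

```python
def arrange(all_sites, speakers_in_order):
    """Return (site, enlarged) pairs in display order.

    Speaking sites come first, in the order they began speaking, and are
    flagged for an enlarged region; all other sites follow in ascending
    site number at normal size.
    """
    valid = set(all_sites)
    front = []
    for site in speakers_in_order:
        # Ignore unknown sites and repeated entries.
        if site in valid and site not in front:
            front.append(site)
    rest = [site for site in sorted(valid) if site not in front]
    return [(site, True) for site in front] + [(site, False) for site in rest]


# Sites 1, 19, 20, and 3 began speaking in that order.
order = arrange(range(1, 25), [1, 19, 20, 3])
print(order[:5])
# prints [(1, True), (19, True), (20, True), (3, True), (2, False)]
```

Keeping the speakers in a separate ordered list means a new speaker only appends to that list, so sites that were already enlarged keep their relative order.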

<Effect>
As described above, in the present embodiment, the composite video data is generated so that the display area of the video data of a speaking site is larger than the display area of the video data of the other sites and so that the display position of the video data of the speaking site is changed. As a result, even when the number of sites is large, the video data from a speaking site can be displayed in a way that is easy for the viewer to identify.

(Variations)
In the present invention, when speaking sites, where participants who mainly speak are located, and listening sites, where participants who basically only listen without speaking are located, are determined in advance, the video data of the speaking sites may always be highlighted, for example by making the display area of the video data of the speaking sites (site numbers 1, 2, 3, and 4 in FIG. 8(a)) larger than the display area of the video data of the listening sites, as shown in FIG. 8(a).

Furthermore, as shown in FIG. 8(b), when a participant at a listening site speaks, for example to ask a question, the video data of that listening site (site number 17 in FIG. 8(b)) may be highlighted, for example by making its display area larger than the display area of the video data of the other listening sites.
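
This variation amounts to sorting sites into three size tiers. The Python sketch below follows the ordering stated in the claim: predetermined speaking sites are largest, a listening site whose audio level is at or above the threshold is larger than the other listening sites but smaller than the speaking sites, and the remaining listening sites are smallest. The tier names, the area weights, the level scale, and the function name display_tier are illustrative assumptions.

```python
# Relative area weights per tier (illustrative values only).
AREA_WEIGHT = {"speaking": 4, "active_listening": 2, "listening": 1}


def display_tier(site, designated_speakers, audio_levels, threshold):
    """Classify a site into one of three display-size tiers."""
    if site in designated_speakers:
        return "speaking"  # predetermined speaking site: always largest
    if audio_levels.get(site, 0.0) >= threshold:
        return "active_listening"  # listening site that is currently speaking
    return "listening"


# Sites 1-4 are predetermined speaking sites; listening site 17 asks a question.
designated = {1, 2, 3, 4}
levels = {17: 0.8, 18: 0.1}
tiers = {s: display_tier(s, designated, levels, 0.5) for s in range(1, 25)}
print(tiers[1], tiers[17], tiers[18])
# prints speaking active_listening listening
```

Checking the predetermined speaking sites first means their size does not depend on the detected audio level, which matches the statement that they are always highlighted.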

Note that the present invention is not limited to the embodiments described above with respect to the type, arrangement, number, and the like of its members, and may be modified as appropriate without departing from the gist of the invention, for example by replacing its constituent elements with others that provide equivalent operation and effects.

Specifically, the above embodiments describe displaying a still image until video is displayed, but the present invention is not limited to this. Text information such as a message may be displayed instead of a still image, or the screen may be left black.

The above embodiments describe, as an example, a video conference system in which the number of sites that can be connected simultaneously is 24, but the present invention places no limit on the number of sites that can be connected simultaneously.

In the present invention, video data for display and video data for transmission can also be generated separately according to a user setting. For example, the video data output from the video input control unit 108 can be used as the video data for display, and the video data combined by the video/audio synthesis unit 105 can be used as the video data for transmission.

The present invention is suitable for use in a video conference apparatus that is provided at a host site and can be connected simultaneously to counterpart apparatuses at a plurality of sites different from the host site.

DESCRIPTION OF SYMBOLS
10 Video conference apparatus
100 Main unit
101 User instruction receiving unit
102 Conference control unit
103 Communication control unit
104 Video/audio encoding/decoding unit
105 Video/audio synthesis unit
106 Video/audio output control unit
107 Still image holding unit
108 Video input control unit
109 Audio input control unit
200 User operation input device
300 Video input device
400 Audio input device
500 Display device

Claims

A video conference apparatus provided at a host site and simultaneously connectable to video conference apparatuses at a plurality of other sites, comprising:
a video input unit that captures video of the host site and acquires video data;
an audio input unit that picks up audio at the host site and acquires audio data;
a communication control unit that receives video data and audio data from each of the conference terminal devices at the plurality of other sites;
a display control unit that determines a screen layout according to the number of sites participating in a video conference, generates composite video data by combining the video data of the sites according to the screen layout, and displays the composite video data on a screen; and
a level detection unit that detects the level of the audio data,
wherein, when speaking sites, where participants who mainly speak are located, and listening sites, where participants who basically only listen without speaking are located, are determined in advance, the display control unit generates the composite video data so that the display area of the video data of the speaking sites is larger than the display area of the video data of the listening sites, and
so that the display area of the video data of a listening site whose audio data level is equal to or higher than a threshold is larger than the display area of the video data of the other listening sites and smaller than the display area of the video data of the speaking sites.