JP4768578B2

JP4768578B2 - Video conference system and control method in video conference system

Info

Publication number: JP4768578B2
Application number: JP2006297326A
Authority: JP
Inventors: 加寿子十文字
Original assignee: NEC AccessTechnica Ltd
Current assignee: NEC AccessTechnica Ltd
Priority date: 2006-11-01
Filing date: 2006-11-01
Publication date: 2011-09-07
Anticipated expiration: 2026-11-01
Also published as: JP2008118235A

Description

The present invention relates to a video conference system that realizes multipoint video conferencing, and to a control method in the video conference system.

Conventionally, a video conference terminal device used in a multipoint video conference system divides the screen to display the other parties participating in the conference, for example according to a specified number of party images to display. Regarding this display method, the technique described in Japanese Patent Laid-Open No. 9-70031 (Patent Document 1) is known, for example. In this technique, all attendees and the speaker among them are captured with separate cameras, and the speaker's video is superimposed on the video of all attendees so that viewers can see who is speaking. A technique for displaying a speaker with emphasis is also disclosed in Japanese Patent Laid-Open No. 2002-218424 (Patent Document 2).

Japanese Patent Laid-Open No. 2004-343761 (Patent Document 3) proposes a technique that determines the conference media provided to passive participants based on the activity and selections of an active participant. In this technique, feedback from the speaker is provided via a back channel, and the server distributes content in real time to non-speaking participants based on the speaker's activity.

Patent Document 1: Japanese Patent Laid-Open No. 9-70031
Patent Document 2: Japanese Patent Laid-Open No. 2002-218424
Patent Document 3: Japanese Patent Laid-Open No. 2004-343761

In the speaker highlighting of Patent Document 1, the speaker is determined at the discretion of the moderator or the video conference system operator. In Patent Document 2, the speaker is determined automatically by the video conference system. In neither case is the speaker's own intention reflected.

In Patent Document 3, the only way to provide information is to speak in the conference, in which case the information reaches all attendees. Information therefore cannot be conveyed individually to specific conference attendees only. In other words, no means is provided for a conference attendee other than the speaker to send information to another specific attendee.

An object of the present invention is to solve these conventional problems and to provide a video conference system, and a control method in the video conference system, that let a speaker reflect on the screen the intention to emphasize a statement, and that further enable attendees other than the speaker to exchange information with one another.

The present invention is a video conference system including a plurality of video conference terminal devices connected to a video conference server via a network, characterized in that the video conference terminal device comprises: a microphone used when speaking; a display unit that displays video of the conference attendees; means for switching, by the speaker's operation, among a first speech mode in which the speech is audible to all conference attendees, a second speech mode in which the speech is audible to all conference attendees and the speaker is additionally highlighted on the screen, and a third speech mode in which the speech is audible only to specific conference attendees; and means for creating metadata indicating highlighting and transmitting it to the video conference server when the second speech mode is selected.

In the video conference system of the present invention, the video conference server may, upon receiving the metadata, create video conference video that highlights the speaker who transmitted the metadata and distribute it to the video conference terminal devices of all conference attendees.

In the video conference system of the present invention, the video conference terminal device may further comprise means for managing the addresses of the other conference attendees, and means for transmitting audio data to the address of the selected specific conference attendee when the third speech mode is selected.

In the video conference system of the present invention, the video conference terminal device may further comprise means for reproducing audio so that audio data received in the third speech mode is distinguished from audio data received in the other speech modes.

In the video conference system of the present invention, the microphone may consist of a microphone used in the third speech mode and a microphone used in the other speech modes.

In the video conference system of the present invention, the video conference terminal device may further comprise means for displaying, when audio data is received in the third speech mode, the video of the conference attendee who transmitted that audio data so that it can be distinguished from the other conference attendees.

The present invention is also a control method in a video conference system including a plurality of video conference terminal devices connected to a video conference server via a network, characterized in that, in the video conference terminal device, a first speech mode in which the speech is audible to all conference attendees, a second speech mode in which the speech is audible to all conference attendees and the speaker is additionally highlighted on the screen, and a third speech mode in which the speech is audible only to specific conference attendees are switched by the speaker's operation, and in that the video conference terminal device, when the second speech mode is selected, creates metadata indicating highlighting and transmits it to the video conference server.

In the control method of the present invention, the video conference server may, upon receiving the metadata, create video conference video that highlights the speaker who transmitted the metadata and distribute it to the video conference terminal devices of all conference attendees.

In the control method of the present invention, the video conference terminal device may manage the addresses of the other conference attendees and, when the third speech mode is selected, transmit audio data to the address of the selected specific conference attendee.

In the control method of the present invention, the video conference terminal device may reproduce audio so that audio data received in the third speech mode is distinguished from audio data received in the other speech modes.

In the control method of the present invention, the video conference terminal device may, when audio data is received in the third speech mode, display the video of the conference attendee who transmitted that audio data so that it can be distinguished from the other conference attendees.

According to the present invention, a speaker who wants to emphasize a statement can trigger highlighting on the screen by his or her own operation, so the speaker's intention is accurately reflected. Furthermore, because means is provided for selecting a specific party and sending a statement without letting all conference attendees hear it, attendees other than the speaker can exchange information with one another.

Next, the best mode for carrying out the present invention is described in detail with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of a video conference terminal device according to an embodiment of the present invention. FIG. 2 is a functional block diagram showing the entire video conference system according to the embodiment.

In FIG. 1, the video conference terminal device 100 includes a first microphone 101, a second microphone 102, an instruction input unit 103, a camera 104, a display unit 105, an audio reception control unit 106, an audio transmission control unit 107, a metadata creation unit 108, a video control unit 109, an attendee address management unit 110, and a speaker 111.

As shown in FIG. 2, the video conference system connects a video conference server 201 and a plurality of video conference terminal devices 100a to 100c via a network 202. The video conference terminal devices 100a to 100c have the same configuration as the video conference terminal device 100 of FIG. 1.

The video conference terminal device 100 of this embodiment offers three modes for transmitting speech. The first mode makes the speech audible to all attendees of the video conference without highlighting the speaker; it is called the "conference speech mode" here. The second mode makes the speech audible to all attendees and additionally highlights the speaker; it is called the "emphasized speech mode". The third mode lets the speaker select specific conference attendees so that only those attendees hear the speech; it is called the "individual speech mode".
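The three modes can be summarized in a minimal Python sketch. The names `SpeechMode` and `ModeTraits` are illustrative assumptions and do not appear in the specification; only the two properties of each mode (who hears the speech, and whether the speaker is highlighted) are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ModeTraits:
    """Properties that distinguish the three speech modes (hypothetical type)."""
    audible_to_all: bool  # True if every attendee hears the speech
    highlighted: bool     # True if the speaker is highlighted on screen


class SpeechMode(Enum):
    # First mode: heard by everyone, no highlighting.
    CONFERENCE = ModeTraits(audible_to_all=True, highlighted=False)
    # Second mode: heard by everyone, speaker highlighted.
    EMPHASIZED = ModeTraits(audible_to_all=True, highlighted=True)
    # Third mode: heard only by the selected attendee(s).
    INDIVIDUAL = ModeTraits(audible_to_all=False, highlighted=False)


if __name__ == "__main__":
    for mode in SpeechMode:
        print(mode.name, mode.value)
```

Modeling the modes as an enumeration keeps the later routing decisions (destination, metadata, microphone) keyed on a single value chosen by the speaker.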

Returning to FIG. 1, the video conference terminal device 100 is described in detail. The first microphone 101 is used when speaking so that all attendees of the video conference can hear; that is, it is used in the conference speech mode and the emphasized speech mode. The second microphone 102, in contrast, is used when speaking so that only designated attendees of the video conference can hear (individual speech mode).

The instruction input unit 103 is the functional block through which the speaker enters instructions. When speaking to all attendees through the first microphone 101 and wishing to emphasize the statement, the speaker operates the instruction input unit 103 to enter an emphasis instruction. When the speaker wants to address only specific attendees through the second microphone 102, the speaker uses the instruction input unit 103 to enter that instruction together with the necessary information. The instruction input unit 103 can be realized with an input device such as a keyboard, a mouse, or a dedicated button.

The camera 104 captures the video that the attendee wants to show, such as the attendee's face, and sends it to the video conference server 201. The display unit 105 displays the video distributed from the video conference server 201 via the network 202 while the terminal is participating in a video conference.

The audio reception control unit 106 is a functional block that identifies and processes audio data received from the network 202 and reproduces it through the speaker 111. The audio data it receives is of two kinds: audio addressed to this terminal from the video conference terminal device of a specific attendee, and audio distributed to all conference attendees by the video conference server 201. The audio transmission control unit 107 controls the destination of audio data according to the operation of the instruction input unit 103 and to which of the first microphone 101 and the second microphone 102 is used.

The metadata creation unit 108 is a functional block that creates the necessary metadata in response to operation of the instruction input unit 103. Metadata is management data that describes information about data; in this embodiment it signifies emphasis. The created metadata is transmitted to the video conference server 201 in association with the audio data.
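The specification does not define a format for the metadata or for how it is associated with the audio data. The following Python sketch shows one possible arrangement under stated assumptions: a JSON header carrying an `emphasis` flag, prefixed with its length and followed by the raw audio bytes. The function names and the wire layout are hypothetical.

```python
import json
from typing import Optional, Tuple


def create_metadata(emphasis_requested: bool) -> Optional[dict]:
    """Return emphasis metadata, or None when no emphasis was requested."""
    return {"emphasis": True} if emphasis_requested else None


def package_audio(speaker_id: str, audio: bytes, metadata: Optional[dict]) -> bytes:
    """Associate metadata with audio: 4-byte header length, JSON header, audio."""
    header = {"speaker": speaker_id}
    if metadata is not None:
        header["metadata"] = metadata
    head = json.dumps(header).encode("utf-8")
    return len(head).to_bytes(4, "big") + head + audio


def unpack_audio(message: bytes) -> Tuple[dict, bytes]:
    """Inverse of package_audio: recover the header and the audio payload."""
    head_len = int.from_bytes(message[:4], "big")
    header = json.loads(message[4:4 + head_len].decode("utf-8"))
    return header, message[4 + head_len:]


if __name__ == "__main__":
    msg = package_audio("attendee-1", b"\x00\x01", create_metadata(True))
    print(unpack_audio(msg))
```

Omitting the `metadata` key entirely for unemphasized speech mirrors the description, in which metadata is created only when the emphasis instruction is entered.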

The video control unit 109 is a functional block that processes the video shown on the display unit 105. It controls the display of the attendee video distributed from the video conference server 201 via the network 202 and of the attendee list managed by the attendee address management unit 110.

The attendee address management unit 110 receives the attendee list from the video conference server 201 when the video conference starts and stores the addresses of the other attendees. To speak through the second microphone 102 to specific conference attendees only, the speaker operates the instruction input unit 103 to display the attendee list and choose the other party, that is, the destination to which the audio data is sent.
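The role of the attendee address management unit 110 can be sketched in Python as a small lookup table. The class name `AttendeeDirectory`, the use of attendee numbers as keys, and the string addresses are assumptions made for illustration; the specification states only that the list contains attendee names and addresses.

```python
from typing import Dict, List, Tuple


class AttendeeDirectory:
    """Stores the attendee list received from the server (hypothetical class)."""

    def __init__(self) -> None:
        # attendee number -> (name, network address)
        self._entries: Dict[int, Tuple[str, str]] = {}

    def load(self, attendee_list: List[Tuple[int, str, str]]) -> None:
        """Replace the stored list with the one distributed at conference start."""
        self._entries = {num: (name, addr) for num, name, addr in attendee_list}

    def listing(self) -> List[str]:
        """Lines suitable for showing the attendee list on the display unit."""
        return [f"{num}: {name}" for num, (name, _) in sorted(self._entries.items())]

    def address_of(self, number: int) -> str:
        """Resolve the destination address for the selected attendee number."""
        return self._entries[number][1]


if __name__ == "__main__":
    directory = AttendeeDirectory()
    directory.load([(1, "Attendee A", "10.0.0.1"), (2, "Attendee B", "10.0.0.2")])
    print(directory.listing())
    print(directory.address_of(2))
```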

Under the control of the audio reception control unit 106, the speaker 111 reproduces audio sent from the video conference server 201 and audio sent directly from other attendees so that the two can be distinguished.

Next, the operation of the best mode for carrying out the present invention is described with reference to the drawings. FIG. 3 is a flowchart showing the operation of the video conference terminal device 100 when an attendee speaks in an embodiment of the present invention. FIG. 4 is a flowchart showing the operation of the video conference server 201 in the embodiment. FIG. 5 is a flowchart showing the audio reception and reproduction operations of the video conference terminal device 100 in the embodiment.

First, a video conference terminal device 100 that is to join the video conference accesses the video conference server 201 via the network 202 and registers for participation (step S301 in FIG. 3). When the video conference starts, the terminal transmits the video captured by the camera 104 to the video conference server 201, and receives the conference video distributed by the video conference server 201 and shows it on the display unit 105. The video conference server 201 also distributes an attendee list containing attendee names and addresses, which is stored in the attendee address management unit 110. This attendee list can be shown in part of the display unit 105 (step S302). Processing of received audio is described later.

When a conference attendee wishes to speak (YES in step S303), intends the statement for the conference, that is, for all attendees to hear (YES in step S304), and wants to emphasize it (YES in step S305), the fact that the attendee is speaking can be highlighted on the screens of all attendees' video conference terminal devices 100. Specifically, the attendee enters an emphasis instruction through the instruction input unit 103 and speaks using the microphone 101 (emphasized speech mode) (step S306).

The metadata creation unit 108 detects the emphasis instruction entered at the instruction input unit 103 and creates metadata indicating that the audio is emphasized. The audio transmission control unit 107 transmits this metadata to the video conference server 201 in association with the speech audio.

When the attendee speaks in the conference but does not want highlighting (NO in step S305), the attendee speaks using the microphone 101 without entering an emphasis instruction (conference speech mode) (step S307). In this case the audio transmission control unit 107 transmits the audio data to the video conference server 201 without attaching metadata signifying emphasis.

When, on the other hand, the attendee wants to convey a statement only to specific attendees rather than to the conference (NO in step S304), the attendee operates the instruction input unit 103 to select the destination from the attendee list and speaks using the microphone 102 (individual speech mode) (step S308). In this case the audio transmission control unit 107 transmits the audio data directly to the address of the selected party. In this way, the video conference terminal device 100 repeats the processing appropriate to each speech situation until the conference ends (YES in step S309).
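The branching of steps S304 to S308 can be sketched in Python as follows. The function `route_speech`, the `Outgoing` tuple, and the `SERVER_ADDRESS` constant are hypothetical names introduced for illustration; the microphone numbers refer to the reference numerals 101 and 102 of FIG. 1.

```python
from typing import NamedTuple, Optional

SERVER_ADDRESS = "server-201"  # placeholder for the video conference server address


class Outgoing(NamedTuple):
    destination: str         # where the audio data is sent
    microphone: int          # reference numeral of the microphone in use
    emphasis_metadata: bool  # whether emphasis metadata is attached


def route_speech(to_all: bool, emphasize: bool = False,
                 recipient: Optional[str] = None) -> Outgoing:
    """Decide destination, microphone, and metadata for one utterance."""
    if to_all:
        # S304 YES: send to the server; S305 decides whether metadata is added
        # (S306 emphasized speech mode, S307 conference speech mode).
        return Outgoing(SERVER_ADDRESS, 101, emphasize)
    # S304 NO: individual speech mode (S308) needs a selected destination.
    if recipient is None:
        raise ValueError("individual speech mode requires a selected recipient")
    return Outgoing(recipient, 102, False)


if __name__ == "__main__":
    print(route_speech(to_all=True, emphasize=True))
    print(route_speech(to_all=True))
    print(route_speech(to_all=False, recipient="10.0.0.2"))
```

In this sketch the individual speech mode bypasses the server entirely, which matches the statement that the audio data is transmitted directly to the address of the selected party.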

In the video conference server 201, when the attendees have gathered and the video conference starts (step S401 in FIG. 4), the server distributes the attendee list to the attendees' video conference terminal devices 100 (step S402). When video and audio data arrive from each attendee's video conference terminal device 100, the server processes the data for video conference use (a display of multiple attendees) (step S403). If emphasis metadata is attached to the received audio data (YES in step S404), the server creates video that emphasizes the attendee whose terminal transmitted that data (step S405). The resulting video and audio data are distributed via the network 202 to the attendees' video conference terminal devices 100 (step S406). Steps S403 to S406 are repeated until the conference ends (YES in step S407).
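One pass through steps S403 to S405 on the server side might look like the following Python sketch. The `Upload` type and `compose_conference_frame` function are hypothetical, the video is reduced to a list of per-attendee tiles, and the audio "mix" is a simple concatenation used as a placeholder rather than real audio mixing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Upload:
    """Data received from one terminal in one cycle (hypothetical type)."""
    attendee: str
    audio: bytes
    emphasis: bool = False  # True if emphasis metadata accompanied the audio


def compose_conference_frame(uploads: List[Upload]) -> Dict[str, object]:
    """Build the multi-attendee layout, highlighting senders of emphasis metadata."""
    # S403: one tile per attendee; S404/S405: mark tiles whose audio carried metadata.
    tiles = [{"attendee": u.attendee, "highlight": u.emphasis} for u in uploads]
    # Placeholder only: concatenation stands in for mixing the audio streams.
    audio = b"".join(u.audio for u in uploads)
    return {"tiles": tiles, "audio": audio}


if __name__ == "__main__":
    frame = compose_conference_frame([
        Upload("Attendee A", b"\x01", emphasis=True),
        Upload("Attendee B", b"\x02"),
    ])
    print(frame)
```

The composed result would then be distributed to every terminal (step S406), so the highlighting is decided once on the server rather than separately on each terminal.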

Next, regarding audio reception in the video conference terminal device 100, the audio reception control unit 106 refers to the source address of the received audio data and determines whether the audio data was distributed by the video conference server 201 (step S501 in FIG. 5). Audio data received from the video conference server 201 is reproduced as main audio and output to the speaker 111 (step S502). Audio data that did not come from the video conference server 201 is judged to be speech from another attendee using the microphone 102, and is reproduced as sub audio and output to the speaker 111 (step S503).
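The decision of step S501 reduces to a comparison of the source address with the server address, as in this minimal Python sketch. The function name `select_channel` and the string labels `"main"` and `"sub"` are illustrative assumptions.

```python
def select_channel(source_address: str, server_address: str) -> str:
    """Classify received audio by its source address (steps S501 to S503)."""
    if source_address == server_address:
        return "main"  # S502: conference audio distributed by the server
    return "sub"       # S503: individual speech sent directly by another attendee


if __name__ == "__main__":
    print(select_channel("server-201", "server-201"))
    print(select_channel("10.0.0.2", "server-201"))
```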

FIGS. 6(a) and 6(b) show examples of video displayed on the display unit 105 of the video conference terminal device 100. FIG. 6(a) is an example of a screen 601 in which the video control unit 109 outputs video conference video to the display unit 105 based on the attendee list managed by the attendee address management unit 110. In this example, the screen is divided into four, and each part shows an attendee's video together with the assigned attendee number, the attendee name, and so on (603a to 603d). Here the attendee Jhon has entered an emphasis instruction at the instruction input unit 103 of the video conference terminal device 100 and is speaking (step S306 in FIG. 3), so a highlight 602 is displayed (emphasized speech mode). The highlight should be conspicuous, for example a red frame.

FIG. 6(b) is an example of the screen for selecting the destination when speaking through the second microphone 102 in the individual speech mode (step S308 in FIG. 3). When the speaker uses the instruction input unit 103 to request the individual speech mode, in which speech goes through the second microphone 102, the attendee list 610 managed by the attendee address management unit 110 is displayed, and the speaker selects the destination through the instruction input unit 103, for example by entering an attendee number. The microphone 102 then becomes active and picks up the sound, and the audio transmission control unit 107 generates audio data and transmits it to the selected destination.

Although not illustrated, the screen of the display unit 105 of a video conference terminal device 100 that is the destination of individual speech may additionally show an indication that identifies the attendee who sent the audio data. A terminal that receives audio data in the individual speech mode can identify the sender from the source address of the audio data, so it can mark that sender on the screen in a way that is distinguishable from the highlight 602. For example, if the highlight 602 of the emphasized speech mode is a red frame, framing the sender of individual speech in another color such as blue or yellow shows at a glance who sent the individual speech.
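The color choice described above can be sketched in Python as follows. The function `frame_color` is hypothetical, and the rule that an individual-speech marker takes precedence over the emphasis highlight when both apply is an assumption; the specification does not address that case.

```python
from typing import Optional, Set

EMPHASIS_COLOR = "red"     # example color given for the highlight 602
INDIVIDUAL_COLOR = "blue"  # one of the example alternatives (blue, yellow)


def frame_color(attendee: str, emphasized: Set[str],
                individual_senders: Set[str]) -> Optional[str]:
    """Pick the frame color for one attendee's tile on the receiving terminal."""
    if attendee in individual_senders:
        # Assumed precedence: individual speech addressed to this terminal wins.
        return INDIVIDUAL_COLOR
    if attendee in emphasized:
        return EMPHASIS_COLOR
    return None  # no frame for ordinary conference speech


if __name__ == "__main__":
    print(frame_color("A", emphasized={"A"}, individual_senders=set()))
    print(frame_color("B", emphasized={"A"}, individual_senders={"B"}))
    print(frame_color("C", emphasized={"A"}, individual_senders={"B"}))
```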

The three speech modes of the above embodiment may be switched by operating the instruction input unit 103 to select a mode each time the attendee is about to speak. Alternatively, the conference speech mode with the first microphone 101 may be the default when no operation is made, with input from the instruction input unit 103 required only to use the emphasized speech mode or the individual speech mode.

Next, another embodiment of the present invention is described with reference to the drawings. FIG. 7 is a functional block diagram showing the configuration of a video conference terminal device according to another embodiment of the present invention. In FIG. 7, the video conference terminal device 700 includes a microphone 701, a microphone switching control unit 702, an instruction input unit 703, a camera 704, a display unit 705, an audio reception control unit 706, an audio transmission control unit 707, a metadata creation unit 708, a video control unit 709, an attendee address management unit 710, and a speaker 711.

The difference from the embodiment of FIG. 1 is that the second microphone 102 (FIG. 1) is omitted and a microphone switching control unit 702 is provided instead. A single microphone 701 is shared between speech to the conference (conference speech mode and emphasized speech mode) and speech to a specific party (individual speech mode), and the microphone switching control unit 702 switches the control operation accordingly. For example, if no particular instruction is entered at the instruction input unit 703, processing proceeds in the conference speech mode (corresponding to step S307 in FIG. 3); if an emphasis instruction is entered, processing proceeds in the emphasized speech mode (corresponding to step S306 in FIG. 3). If an individual speech mode instruction is entered at the instruction input unit 703, the other party is selected and the audio data is transmitted to that party (corresponding to step S308 in FIG. 3). The configuration and operation of the other functional blocks are the same as those of the corresponding functional blocks in the embodiment of FIG. 1, so redundant description is omitted.

The present invention is not limited to the embodiments described above and can be implemented with various modifications without departing from the gist of the invention.

FIG. 1 is a functional block diagram showing the configuration of a video conference terminal device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the entire video conference system according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the video conference terminal device when an attendee speaks in an embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the video conference server in an embodiment of the present invention.
FIG. 5 is a flowchart showing the audio reception and reproduction operations of the video conference terminal device in an embodiment of the present invention.
FIGS. 6(a) and 6(b) show examples of video displayed on the display unit of the video conference terminal device in an embodiment of the present invention.
FIG. 7 is a functional block diagram showing the configuration of a video conference terminal device according to another embodiment of the present invention.

Explanation of symbols

100 Video conference terminal device
101 Microphone
102 Microphone
103 Instruction input unit
104 Camera
105 Display unit
106 Audio reception control unit
107 Audio transmission control unit
108 Metadata creation unit
109 Video control unit
110 Attendee address management unit
111 Speaker
201 Video conference server
202 Network
700 Video conference terminal device
701 Microphone
702 Microphone switching control unit
703 Instruction input unit
704 Camera
705 Display unit
706 Audio reception control unit
707 Audio transmission control unit
708 Metadata creation unit
709 Video control unit
710 Attendee address management unit
711 Speaker

Claims

A video conference system including a plurality of video conference terminal devices connected to a video conference server via a network, wherein
each video conference terminal device comprises:
a microphone used when speaking;
a display unit for displaying video of the attendees;
means for switching among and selecting a first speech mode in which a speaker speaks so as to be heard by all conference attendees, a second speech mode in which the speech is audible to all conference attendees and the speaker is highlighted on the screen, and a third speech mode in which the speaker speaks so as to be heard only by a specific conference attendee; and
means for creating metadata indicating highlighting and transmitting the metadata to the video conference server when the second speech mode is selected.

The video conference system according to claim 1, wherein the video conference server, upon receiving the metadata, creates a video conference video that highlights the speaker who transmitted the metadata, and distributes the video conference video to the video conference terminal devices of all conference attendees.

The video conference system according to claim 1 or 2, wherein the video conference terminal device further comprises:
means for managing the addresses of other conference attendees; and
means for transmitting audio data to the address of the selected specific conference attendee when the third speech mode is selected.

The video conference system according to any one of claims 1 to 3, wherein the video conference terminal device further comprises means for reproducing audio so as to distinguish between audio data received in the third speech mode and audio data received in another speech mode.

The video conference system according to any one of claims 1 to 4, wherein the microphone comprises a microphone used in the third speech mode and a microphone used in the other speech modes.

The video conference system according to any one of claims 1 to 5, wherein the video conference terminal device further comprises means for displaying, when audio data is received in the third speech mode, the video of the conference attendee who transmitted the audio data so as to be distinguishable from the other conference attendees.

A control method in a video conference system including a plurality of video conference terminal devices connected to a video conference server via a network, wherein,
in the video conference terminal device,
a first speech mode in which a speaker speaks so as to be heard by all conference attendees,
a second speech mode in which the speech is audible to all conference attendees and the speaker is highlighted on the screen, and
a third speech mode in which the speaker speaks so as to be heard only by a specific conference attendee
are switched among and selected by the speaker, and
the video conference terminal device creates metadata indicating highlighting when the second speech mode is selected, and transmits the metadata to the video conference server.

The control method in the video conference system according to claim 7, wherein the video conference server, upon receiving the metadata, creates a video conference video that highlights the speaker who transmitted the metadata, and distributes the video conference video to the video conference terminal devices of all conference attendees.

The control method in the video conference system according to claim 7 or 8, wherein the video conference terminal device manages the addresses of other conference attendees and, when the third speech mode is selected, transmits audio data to the address of the selected specific conference attendee.

The control method in the video conference system according to any one of claims 7 to 9, wherein the video conference terminal device reproduces audio so as to distinguish between audio data received in the third speech mode and audio data received in another speech mode.

The control method in the video conference system according to any one of claims 7 to 10, wherein, when the video conference terminal device receives audio data in the third speech mode, it displays the video of the conference attendee who transmitted the audio data so as to be distinguishable from the other conference attendees.