JP2017152952A

JP2017152952A - Communication control apparatus, conference system and program

Info

Publication number: JP2017152952A
Application number: JP2016034038A
Authority: JP
Inventors: 知紀梅沢; Tomoki Umezawa
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-02-25
Filing date: 2016-02-25
Publication date: 2017-08-31

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus, a system, a method, and a program capable of improving conference efficiency by transmitting important speech to a desired terminal with low delay.

SOLUTION: An apparatus for controlling communication among a plurality of conference terminals includes: a receiving unit that receives voice information of a voice uttered by a user of a first conference terminal among the plurality of conference terminals, and attention levels each indicating the degree to which the user of each conference terminal pays attention to each user displayed on that conference terminal; a determination unit 61 that determines, on the basis of the received attention levels, a priority indicating which communication between conference terminals is to be prioritized; and a control unit that performs control for transmitting the received voice information to the conference terminals other than the first conference terminal.

SELECTED DRAWING: Figure 5

Description

The present invention relates to a communication control device that controls communication among a plurality of conference terminals, a conference system including the plurality of conference terminals and the communication control device, and a program for causing a computer to execute that control.

With advances in network technology and the demand to reduce business travel costs and time, conference systems are used to hold conferences between distant sites over a communication network such as the Internet.

For such conference systems, a system has been proposed that shortens the delay time for sites where a dialogue is detected and lengthens the delay time for the other sites (see Patent Document 1).

However, when a conference is held among multiple sites, a conventional conference system transmits data without determining how the sites are connected to one another or to whom an utterance is addressed. As a result, important utterances are delayed and conference efficiency decreases.

The present invention has been made in view of the above problem, and its object is to provide an apparatus, a system, a method, and a program capable of transmitting important utterances to a desired site with low delay and thereby improving conference efficiency.

To solve the above problem and achieve the object, the present invention provides a communication control device that controls communication among a plurality of conference terminals, the device including: a receiving unit that receives voice information of a voice uttered by a user of a first conference terminal among the plurality of conference terminals, and attention levels each indicating the degree to which the user of each conference terminal pays attention to each user displayed on that conference terminal; a determination unit that determines, based on the attention levels received by the receiving unit, a priority indicating which communication between conference terminals is to be prioritized; and a control unit that performs control to transmit the voice information received by the receiving unit to the conference terminals other than the first conference terminal in accordance with the priority determined by the determination unit.

According to the present invention, important utterances can be transmitted to a desired site with low delay, and conference efficiency can be improved.

FIG. 1 is a diagram showing a configuration example of the conference system.
FIG. 2 is a diagram showing the hardware configuration of the camera and the processing device that constitute a conference terminal.
FIG. 3 is a functional block diagram of the conference terminal.
FIG. 4 is a diagram showing the hardware configuration of the server.
FIG. 5 is a functional block diagram of the server.
FIG. 6 is a diagram explaining a first example of the priority determination method.
FIG. 7 is a diagram illustrating the attention levels used in the determination method shown in FIG. 6.
FIG. 8 is a diagram explaining a second example of the priority determination method.
FIG. 9 is a diagram illustrating the attention levels used in the determination method shown in FIG. 8.
FIG. 10 is a diagram explaining a third example of the priority determination method.
FIG. 11 is a diagram illustrating the attention levels used in the determination method shown in FIG. 10.
FIG. 12 is a flowchart showing the flow of processing executed by the conference terminal.
FIG. 13 is a flowchart showing the flow of processing executed by the server.

FIG. 1 is a diagram showing a configuration example of a conference system used to hold a conference among a plurality of sites. Because the conference spans a plurality of sites, the number of sites is three or more. The sites are, for example, a conference room at the head office, a conference room at branch A, and a conference room at branch B. The example in FIG. 1 shows three sites A, B, and C. The conference system includes conference terminals A, B, and C installed at sites A, B, and C, respectively, and a server 11 that serves as the communication control device and is connected to the conference terminals A, B, and C via a network 10.

The network 10 may be a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. The network 10 may be a wired network or a wireless network.

A conference terminal may consist of a single device or of a plurality of devices. When it consists of a plurality of devices, the devices may be physically separate from one another or may be housed together in a single housing.

A conference terminal includes a camera 12 as an imaging device, a display 13 as a display device, a microphone 14 as an audio input device, a speaker 15 as an audio output device, and a processing device 16. The processing device 16 processes the information input from the camera 12 and the microphone 14 and the information output to the display 13 and the speaker 15. The conference terminal also includes a line-of-sight detection device 17 that continuously detects, as line-of-sight information, the position of the user's line of sight within the image displayed on the display 13. FIG. 1 shows the camera 12 and the other components only for conference terminal A, but conference terminals B and C have the same configuration and likewise include the camera 12 and the other components.

The camera 12 is a device that captures images such as still images and moving images. It captures the users who take part in the conference at the local site and use the conference terminal, and outputs the image information as image data. Where the user of conference terminal A is user A and the user of conference terminal B is user B, site A, where user A is located, is the local site for user A, whereas site B, where user B is located, is another site for user A. Each site may have one user or a plurality of users.

The display 13 receives the image data of the images captured by the cameras at the respective sites and displays those images. The display 13 may be a cathode ray tube (CRT), a liquid crystal display, a plasma display, or an organic EL (electroluminescence) display.

The microphone 14 receives the voice uttered by a user at the local site and outputs the voice information as audio data. The speaker 15 outputs the audio data transmitted from other sites. The microphone 14 and the speaker 15 may each include, without being limited to this structure, a diaphragm and a coil. In the microphone 14, the diaphragm vibrates in response to sound waves and the coil moves within a magnetic field, generating an audio signal that is output as audio data. The speaker 15 works in reverse: the audio signal moves the coil, which vibrates the diaphragm to generate sound waves, and the sound is output by those sound waves.

The processing device 16 transmits the image data output from the camera 12 and the audio data output from the microphone 14 to the server 11. The processing device 16 also receives image data and audio data transmitted from the server 11, sends the image data to the display 13 for display, and sends the audio data to the speaker 15 for output. The processing device 16 encodes image data and audio data before transmitting them, and decodes encoded image data and audio data upon receiving them.

When displaying an image from image data on the display 13, the processing device 16 can display it at a specified size and at a specified position. When outputting audio data, the processing device 16 can output the audio at a specified volume.

The line-of-sight detection device 17 can include, for example, a camera and an image processing device that analyzes the images captured by that camera. The line-of-sight detection device 17 can, for example, use the inner corner of the eye as a reference point and detect the direction of the line of sight from the positional relationship between the inner corner and the iris. The detection method is not limited to this: using an irradiation device that emits infrared light, a camera capable of capturing infrared light, and an image processing device, the position of the corneal reflection of the emitted infrared light can be used as the reference point, and the direction of the line of sight can be detected from the positional relationship between that position and the pupil. To detect the direction of the line of sight from the images captured by the camera, the image processing device can include a CPU, a memory, an image I/F, and an output I/F.

The hardware configuration of the camera 12 and the processing device 16 of the conference terminal is described in detail with reference to FIG. 2. The camera 12 includes a lens 20, an image sensor 21, an image processing circuit 22, and an image I/F 23. Light reflected from the subject enters the lens 20, which forms an image of the subject. The image sensor 21 is an imaging element such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and converts that image into an analog signal.

The image processing circuit 22 includes an A/D converter, converts the analog signal into a digital signal, and outputs it as image data. The image I/F 23 transmits the image data processed by the image processing circuit 22 to the processing device 16.

The processing device 16 includes a CPU 30, a memory 31, a communication I/F 32, a control circuit 33, an image I/F 34, and an input/output I/F 35. The CPU 30 executes various kinds of processing. The memory 31 stores the programs required for the processing executed by the CPU 30, as well as image data, audio data, and the like. The memory 31 includes, for example, a RAM (Random Access Memory), which is volatile memory, and a ROM (Read Only Memory), which is nonvolatile memory.

The communication I/F 32 transmits image data, audio data, control data, and the like to a communication partner connected to the network, and receives such data from the communication partner. The control circuit 33 controls the processing device 16 as a whole. The image I/F 34 receives image data from the camera 12. The input/output I/F 35 controls the input of audio data, the output to the display 13 of image data received from a communication partner, and the output of audio data.

FIG. 3 is a functional block diagram of the conference terminal. The conference terminal includes an imaging unit 40, a display unit 41, a line-of-sight detection unit 42, an audio input unit 43, an audio output unit 44, an attention level calculation unit 45, and a communication unit 46. In addition to these functional units, the conference terminal can include an encoding unit 47 and a decoding unit 48.

The imaging unit 40 is implemented by the camera 12, the display unit 41 by the display 13, the line-of-sight detection unit 42 by the line-of-sight detection device 17, the audio input unit 43 by the microphone 14, and the audio output unit 44 by the speaker 15. The attention level calculation unit 45, the encoding unit 47, and the decoding unit 48 are implemented by the CPU 30 executing a program stored in the memory 31. The communication unit 46 is implemented by the communication I/F 32.

The imaging unit 40 captures the users of the conference terminal at the local site and outputs the image data to the encoding unit 47. The encoding unit 47 encodes the image data and passes the encoded image data to the communication unit 46. The audio input unit 43 receives the voice uttered by a user at the local site and outputs the audio data to the encoding unit 47.

For image data, the encoding unit 47 is implemented by an encoder such as H.264/AVC or H.264/SVC; for audio data, it is implemented by an encoder such as Speex. An encoder converts a digital signal such as image data or audio data into a desired code according to fixed rules and can, for example, compress the data. The encoding unit 47 passes the encoded data to the communication unit 46.

The communication unit 46 functions as a transmitting unit and transmits the encoded data, via the server 11, to the conference terminals at the sites other than the local site. The communication unit 46 also functions as a receiving unit and receives encoded data transmitted from the conference terminals at the other sites. The communication unit 46 passes the received encoded data to the decoding unit 48.

The decoding unit 48 decodes the encoded data back into the original image data and audio data. For encoded image data, the decoding unit 48 is implemented by a decoder such as H.264/AVC or H.264/SVC; for encoded audio data, it is implemented by a decoder such as Speex. A decoder restores encoded data to the original data and can, for example, decompress compressed data. The decoding unit 48 passes the decoded image data to the display unit 41 and the decoded audio data to the audio output unit 44.

The display unit 41 displays the received image data as video or the like, and the audio output unit 44 outputs the received audio data as sound.

The line-of-sight detection unit 42 detects which site's video, within the screen displayed on the display unit 41, the user's line of sight is on. For example, the line-of-sight detection unit 42 detects the gaze position as two-dimensional coordinates (x, y), determines which site's video those coordinates fall on, and outputs information identifying that site, such as the site name, as line-of-sight information. The origin (0, 0) of the two-dimensional coordinates may be at any position; for example, it can be the pixel displayed at the upper-left corner of the screen. When line-of-sight detection is performed, calibration can be carried out to determine the initial state so that the coordinates of the expected position are obtained from the line of sight.
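The mapping from a gaze coordinate to a site can be sketched as a point-in-rectangle lookup. The following Python sketch is illustrative only: the `Tile` class, the `site_at` function, and the 1920x1080 layout are assumptions made for this example and do not appear in the publication. The origin is taken to be the upper-left pixel, as in the example above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tile:
    """Screen rectangle showing one site's video (hypothetical helper)."""
    site: str
    x: int       # left edge in pixels, origin at the upper-left corner
    y: int       # top edge in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def site_at(tiles: Sequence[Tile], px: int, py: int) -> Optional[str]:
    """Return the site whose video contains the gaze point, or None."""
    for tile in tiles:
        if tile.contains(px, py):
            return tile.site
    return None


# Assumed layout: site B large on the left, sites C and A small on the right.
LAYOUT = [
    Tile("B", 0, 0, 1280, 720),
    Tile("C", 1280, 0, 640, 360),
    Tile("A", 1280, 360, 640, 360),
]

print(site_at(LAYOUT, 100, 100))   # gaze inside the large tile
print(site_at(LAYOUT, 1300, 400))  # gaze inside the lower-right tile
```

Returning `None` for a gaze point outside every tile lets the caller decide whether such samples count toward any site.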

The attention level calculation unit 45 uses the line-of-sight information output by the line-of-sight detection unit 42 to calculate attention levels, each indicating the degree to which the user pays attention to the users at the local site and the other sites shown on the screen displayed on the display unit 41. The attention levels can be calculated, for example, as percentages for each site's video on the screen from the integrated value of the line-of-sight information acquired over the past 10 seconds. Each site's video is the video capturing the users at that site.

Specifically, if during those 10 seconds the video of site B was watched for 8 seconds, the video of site C for 2 seconds, and the video of site A for 0 seconds, the attention levels can be calculated as 0% for site A, 80% for site B, and 20% for site C. The period is not limited to the past 10 seconds; line-of-sight information from another period, or from a period in which audio is being detected, may be used instead.
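The percentage calculation above can be reproduced with a short Python sketch. It assumes, purely for illustration, that the line-of-sight information arrives as one site name per sample at a fixed sampling rate (10 samples per second here); the function name `attention_levels` and the choice to normalize by the number of on-screen samples are assumptions of this sketch, not details from the publication.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def attention_levels(samples: Iterable[str],
                     sites: Sequence[str]) -> Dict[str, float]:
    """Percentage of gaze samples that fell on each site's video.

    Samples naming a site that is not in `sites` are ignored.
    """
    site_set = set(sites)
    counts = Counter(s for s in samples if s in site_set)
    total = sum(counts.values())
    if total == 0:
        # No usable gaze samples in the window.
        return {site: 0.0 for site in sites}
    return {site: 100.0 * counts[site] / total for site in sites}


# 10-second window at an assumed 10 samples per second:
# 8 s on site B, 2 s on site C, 0 s on site A.
window = ["B"] * 80 + ["C"] * 20
levels = attention_levels(window, ["A", "B", "C"])
print(levels)
```

With these inputs the result matches the example in the text: 0% for site A, 80% for site B, and 20% for site C.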

The attention level calculation unit 45 passes the calculated attention levels for the respective sites to the communication unit 46 as attention level data, and the communication unit 46 transmits that data to the server 11.

Next, the hardware configuration of the server 11 is described with reference to FIG. 4. The server 11 includes a CPU 50, a ROM 51, a RAM 52, an HDD (Hard Disk Drive) 53, and a communication I/F 54. The CPU 50 reads a program stored in the ROM 51 or the HDD 53 into the RAM 52 and executes it. By executing the program, the CPU 50 controls the server 11 as a whole and executes predetermined processing. The ROM 51 stores a boot program for starting the server 11 and firmware for controlling the HDD 53 and the like. The RAM 52 provides a work area for the CPU 50. The HDD 53 stores the programs that implement the functional units described later, as well as various data. The communication I/F 54 connects to the network 10 and enables communication with the conference terminals at the respective sites via the network 10.

FIG. 5 is a functional block diagram of the server 11. The server 11 includes a communication unit 60, a determination unit 61, and, as control units, a video control unit 62 and an audio control unit 63. The communication unit 60 functions as a transmitting unit and a receiving unit. It receives the audio data of a voice uttered by a user of the conference terminal at one site, and receives image data and attention levels from the conference terminals at the respective sites. The video control unit 62 instructs the communication unit 60 to transmit the image data that the communication unit 60 received from the conference terminal at each site to the conference terminals at all sites taking part in the conference. Following the instruction from the video control unit 62, the communication unit 60 transmits the image data to the indicated conference terminals so that the video of the users at each site is displayed. This makes it possible to hold a face-to-face conference.

The audio control unit 63 instructs the communication unit 60 to transmit the audio data to the conference terminals at all sites taking part in the conference other than the one site mentioned above. Following the instruction from the audio control unit 63, the communication unit 60 transmits the audio data to the indicated conference terminals so that the voice uttered by the user is output at all of those other sites. The users at all the other sites can thus hear the voice and carry on a conversation in response to it.

The determination unit 61 determines, based on the attention levels received by the communication unit 60, a priority indicating which communication between conference terminals, that is, between which sites, is to be prioritized. The method of determining the priority is described later. The audio control unit 63 instructs the communication unit 60 to transmit audio data and controls that transmission in accordance with the priority determined by the determination unit 61.

Specifically, a priority is assigned to each audio packet carrying the audio data. When the packets are transmitted, audio packets with a high priority are allocated more network resources, and audio packets with a low priority are allocated fewer. The network resources are, for example, the communication I/F 54, which connects to the network 10 and controls communication over the network 10. As a result, more high-priority audio packets are transmitted per unit time, which reduces the delay of important utterances and improves conference efficiency.
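One simple way to picture "more resources for higher priority" is a proportional split of a fixed bandwidth budget. The Python sketch below is only an illustration under that assumption: the publication does not specify an allocation formula, and the function `allocate_bandwidth`, the numeric priority weights, and the 600 kbps budget are all hypothetical.

```python
from typing import Dict


def allocate_bandwidth(total_kbps: float,
                       priority_by_pair: Dict[str, float]) -> Dict[str, float]:
    """Split a bandwidth budget among site pairs in proportion to priority.

    The proportional rule is an assumption of this sketch; any scheme that
    gives higher-priority pairs a larger share would fit the description.
    """
    total_weight = sum(priority_by_pair.values())
    if total_weight <= 0:
        raise ValueError("at least one pair needs a positive priority")
    return {pair: total_kbps * weight / total_weight
            for pair, weight in priority_by_pair.items()}


# Hypothetical priority weights: A-B highest, A-C lowest.
priorities = {"A-B": 3, "A-C": 1, "B-C": 2}
shares = allocate_bandwidth(600, priorities)
print(shares)
```

A pair with a larger share can send more audio packets per unit time, which is the effect the text attributes to high-priority packets.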

The audio control unit 63 can also control the transmission of audio data to conference terminals between sites whose priority, as determined by the determination unit 61, is lower than a specified priority. Under this control, reducing the network resources allocated to low-priority audio packets saves network resources.

The network resources saved in this way can be allocated to the transmission of audio data to conference terminals between sites whose priority is equal to or higher than the specified priority. Because this increases the number of high-priority audio packets transmitted per unit time, the delay of important utterances is reduced further and conference efficiency improves further.

The video control unit 62 can instruct the communication unit 60 to transmit the image data acquired from each site to each site. In doing so, the video control unit 62 can process each image so that the attention level is included in the image, pass the image data of the processed image to the communication unit 60, and instruct the communication unit 60 to transmit it.

As with audio data, the transmission of image data to conference terminals between sites whose priority is lower than the specified priority can be controlled to save network resources. Furthermore, by allocating the saved network resources to the transmission of image data to conference terminals between sites whose priority is equal to or higher than the specified priority, audio data can be exchanged smoothly between those sites, which improves conference efficiency.

In this description, each site also receives its own image data, but it is also possible to transmit only the image data of the other sites to each site. At each site, the processing device 16 can decide which image data to display at which position and size on the screen, and can cause the display 13 to display each image at that position and size.

An example of a method for determining the priority is described with reference to FIG. 6. The method described here determines the priority according to the attention levels during an ordinary conference. The conference is held among three sites A to C, and the screen displayed by the display unit 41 of the conference terminal at each site shows the video of the users captured at each site. The videos may all be displayed at the same size, or may be displayed at different sizes according to user settings, attention levels, and the like. For example, a user who is currently speaking or has recently spoken may be displayed large, and the other users may be displayed small.

In this example, the screen at site A displays the video of the user at site B, which has the highest attention level, at a large size. To its right, the video of the user at site C, which has the next highest attention level, and the video of the user at site A, which has the lowest attention level, are displayed at a small size. Similarly, the screen at site B displays the video of the user at site A, which has the highest attention level, at a large size and the videos of the users at sites B and C at a small size. The screen at site C displays the video of the user at site B, which has the highest attention level, at a large size and the videos of the users at sites A and C at a small size. The attention level is also displayed on each site's video.

FIG. 7 illustrates the attention levels received from the conference terminals at the respective sites. The attention level for site A indicates how much attention each of the users at sites A to C is paying to the video of site A. Similarly, the attention level for site B indicates how much attention they are paying to the video of site B, and the attention level for site C indicates how much attention they are paying to the video of site C.

In the example shown in FIG. 7, the degree of attention to site A is 0% for the user at site A, 70% for the user at site B, and 30% for the user at site C. By looking at the displayed degree of attention, a user can tell which site's video he or she is watching.

Referring again to FIG. 6, the users at each site speak during the conference, and each utterance is transmitted as voice data to the sites other than the speaker's own site and output there. The voice data is transmitted to those other sites according to the priority. Specifically, utterance A of the user at site A is transmitted to sites B and C according to the priority.

Before determining the priority, the determination unit 61 averages the degrees of attention between each pair of sites to calculate a degree of association, which indicates how strongly the two sites are related. Specifically, the degree of attention of site B to site A is 80%, and the degree of attention of site A to site B is 70%. The degree of association between sites A and B is the average of these, 75%. In the same way, the degree of association between sites A and C is calculated as 25%, and that between sites B and C as 40%.
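The averaging step can be sketched as follows. This is a minimal illustration rather than the implementation described in the patent: the function name `degree_of_association`, the `(viewer, target)` dictionary layout, and the individual A-C and B-C attention values are assumptions, chosen only so that the pair averages match the 25% and 40% stated in the text.

```python
from itertools import combinations


def degree_of_association(attention, sites):
    """Average the two directed attention values for every unordered pair of sites.

    attention maps (viewer_site, target_site) to a percentage.
    """
    result = {}
    for a, b in combinations(sites, 2):
        result[(a, b)] = (attention[(a, b)] + attention[(b, a)]) / 2
    return result


# Directed attention values in percent. Only the A-B values (70 and 80)
# come from the text; the A-C and B-C values are hypothetical.
attention = {
    ("A", "B"): 70, ("B", "A"): 80,
    ("A", "C"): 30, ("C", "A"): 20,
    ("B", "C"): 20, ("C", "B"): 60,
}

association = degree_of_association(attention, ["A", "B", "C"])
strongest_pair = max(association, key=association.get)
```

With these inputs the pair A-B has the largest average and would be the candidate for high priority.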

The determination unit 61 judges that sites A and B, which have the highest degree of association, are the most strongly connected and that communication between them should be given top priority, and it sets a high priority value for the voice data exchanged between sites A and B. When voice data is transmitted as voice packets, the priority can be set in the ToS (Type of Service) field, the DSCP (Differentiated Services Code Point) field, or the like of the IP header of each voice packet. The voice control unit 63 can refer to this field to perform priority control (QoS control).

Note that the ToS field expresses the priority in 3 bits, so 8 priority levels can be defined. The DSCP field, by contrast, expresses the priority in 6 bits, so 64 levels can be defined.
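As a rough illustration of the two encodings, the following sketch packs a priority into the 8-bit ToS/DS byte of the IPv4 header. It relies on the standard layout (IP precedence in the upper 3 bits, DSCP in the upper 6 bits). The function names and the choice of DSCP 46 (Expedited Forwarding) for high-priority voice are assumptions, not values taken from this document.

```python
def tos_from_precedence(precedence: int) -> int:
    """Place a 3-bit IP precedence value (0-7) in the top bits of the ToS byte."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in 3 bits (0-7)")
    return precedence << 5


def tos_from_dscp(dscp: int) -> int:
    """Place a 6-bit DSCP value (0-63) in the top bits of the DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2


# Hypothetical mapping from the two-level priority to DSCP values.
HIGH_PRIORITY_TOS = tos_from_dscp(46)  # Expedited Forwarding
LOW_PRIORITY_TOS = tos_from_dscp(0)    # default, best effort
```

On platforms that expose the option, such a byte could be applied to a UDP socket with `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)`. Whether routers along the path honor the marking depends on the network configuration.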

In this example, because voice data between sites A and B has top priority, the determination unit 61 sets a high priority for voice data received from site A and transmitted to site B, and a low priority for voice data transmitted to site C. Likewise, the determination unit 61 sets a high priority for voice data received from site B and transmitted to site A, and a low priority for voice data transmitted to site C.

For voice data received from sites A and B and transmitted to site C, the determination unit 61 may set a priority according to the degree of association. However, because both degrees of association are low, at 50% or less, it may also leave the priority unset. In that case, the voice control unit 63 can instruct the communication unit 60 to perform normal processing (best-effort processing), which transmits each stream as efficiently as possible regardless of priority.

In the example shown in FIG. 6, the degree of association differed for every pair of sites, and there was one pair whose communication should be given top priority. The degrees of association do not always differ, however. FIG. 8 illustrates another example of a method for determining the priority. The conference is held among three sites A to C, and the screen shown by the display unit 41 of the conference terminal at each site displays the video of the users captured at each site. For example, the user who is currently speaking or has recently spoken is displayed large, and the other users are displayed small.

FIG. 9 illustrates the degrees of attention received from the conference terminals at the sites. The degree of attention to site A indicates how much each of the users at sites A to C is paying attention to the video of site A; the degrees of attention to sites B and C indicate the same for the videos of sites B and C, respectively.

As above, before determining the priority, the determination unit 61 averages the degrees of attention between each pair of sites to calculate the degree of association for that pair. In this example the degree of association is 50% for every pair, so no connection between sites is stronger than another. The determination unit 61 therefore sets no priority, and the voice control unit 63 instructs the communication unit 60 to perform normal processing.

The conference described above is a normal conference, in which the users at each site speak freely and exchange utterances, but there are also conferences of other formats. One example is a presentation-type conference, in which the user who speaks during a question-and-answer session is a user at a single site. The method of determining the priority in a presentation-type conference will be described with reference to FIG. 10.

In the example shown in FIG. 10 as well, the conference is held among three sites A to C, and the screen shown by the display unit 41 of the conference terminal at each site displays the video of the users captured at each site. The videos may all be displayed at the same size, or at sizes that vary according to user settings, the degree of attention, and the like. Alternatively, a user who is currently speaking or has recently spoken may be displayed large, and the other users small.

In this example, the screen at site A shows the video of the user at site B, which has the highest degree of attention, in a large size. To its right, the video of the user at site C, which has the next highest degree of attention, and the video of the user at site A, which has the lowest, are shown in a small size. Similarly, at site B the video of the user at site A, which has the highest degree of attention, is shown large and the videos of the users at sites B and C are shown small. At site C, the video of the user at site B, which has the highest degree of attention, is shown large and the videos of the users at sites A and C are shown small.

FIG. 11 illustrates the degrees of attention received from the conference terminals at the sites. The degree of attention to site A indicates how much each of the users at sites A to C is paying attention to the video of site A; the degrees of attention to sites B and C indicate the same for the videos of sites B and C, respectively.

As above, before determining the priority, the determination unit 61 averages the degrees of attention between each pair of sites to calculate the degree of association for that pair. In this example, the connection between sites B and C would be judged the strongest, but the question-and-answer exchange is actually taking place between sites A and B, so a judgment based on line-of-sight information alone would be wrong. The judgment can therefore use voice data in addition to line-of-sight information. Because voice data includes ambient noise and the like, the judgment can also take into account the volume data contained in the voice data of each site in order to exclude that noise.

During this question-and-answer session, loud voice is received from sites A and B, so the determination unit 61 can judge that the question-and-answer exchange is taking place between sites A and B and set a high priority for communication between them. Controlling the transmission of voice data with the volume information taken into account prevents misjudgment and reduces the delay of utterances between the sites that are actually most closely related.
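One way to combine the two signals is sketched below. It is an assumption-laden illustration, not the method fixed by the patent: the function name `pick_priority_pair`, the normalized 0-1 volume scale, the `loud_threshold` of 0.5, and all numeric values are hypothetical. The rule used here is that when exactly two sites are loud, they are treated as the question-and-answer pair; otherwise the pair with the uniquely highest degree of association is chosen.

```python
def pick_priority_pair(association, volumes, loud_threshold=0.5):
    """Return the pair of sites to prioritize, or None for normal processing."""
    # Sites whose received voice is loud enough to count as active speech.
    loud = sorted(site for site, level in volumes.items() if level >= loud_threshold)
    if len(loud) == 2:
        return tuple(loud)  # volume overrides the gaze-based judgment

    # Fall back to gaze: use the pair with the uniquely highest association.
    best = max(association.values())
    top = [pair for pair, value in association.items() if value == best]
    return top[0] if len(top) == 1 else None


# Hypothetical values: gaze alone favors B-C, but A and B are the ones talking.
association = {("A", "B"): 30, ("A", "C"): 20, ("B", "C"): 60}
qa_pair = pick_priority_pair(association, {"A": 0.8, "B": 0.7, "C": 0.1})
quiet_pair = pick_priority_pair(association, {"A": 0.1, "B": 0.1, "C": 0.1})
```

A fixed threshold is the simplest way to discard low-level ambient noise; a real system might instead compare each site's level with its own background level.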

So far, the processing performed by the conference terminal and by the server 11 has been described using several examples. FIG. 12 summarizes the flow of processing performed by the conference terminal, and FIG. 13 summarizes that performed by the server 11. The conference terminal starts processing at step 1200. In step 1205, the line-of-sight detection unit 42 of the conference terminal installed at each site detects the user's line of sight and acquires line-of-sight information. The attention level calculation unit 45 then calculates, based on the line-of-sight information, the degree of attention, which indicates how much attention the user pays to the video of each site displayed on the display unit 41. When videos of the users at the three sites A to C are displayed, the degree of attention is calculated for each site.
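This passage does not spell out how the degree of attention is derived from the line-of-sight information. One plausible sketch, offered purely as an assumption, is to take the share of gaze samples that land inside each site's video region on the screen. The function name `attention_from_gaze`, the rectangle layout, and the sample coordinates are all hypothetical.

```python
def attention_from_gaze(samples, regions):
    """Return, per site, the percentage of gaze samples inside that site's video.

    samples is a list of (x, y) screen positions.
    regions maps a site name to a rectangle (left, top, width, height).
    """
    counts = {site: 0 for site in regions}
    for x, y in samples:
        for site, (left, top, width, height) in regions.items():
            if left <= x < left + width and top <= y < top + height:
                counts[site] += 1
                break  # regions are assumed not to overlap
    total = sum(counts.values())
    if total == 0:
        return {site: 0.0 for site in regions}
    return {site: 100.0 * n / total for site, n in counts.items()}


# Hypothetical layout: one large video on the left, two small ones stacked on the right.
regions = {
    "B": (0, 0, 800, 600),
    "C": (800, 0, 400, 300),
    "A": (800, 300, 400, 300),
}
samples = [(100, 100)] * 7 + [(900, 100)] * 3
attention = attention_from_gaze(samples, regions)
```

Samples that fall outside every region are ignored, so the percentages describe only the time spent looking at some video.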

In step 1210, it is determined whether the conference has ended. If it has not, the process proceeds to step 1215, where the communication unit 46 transmits the degree of attention calculated by the attention level calculation unit 45 to the server 11, and then returns to step 1205.

If the conference has ended, the process proceeds to step 1220 and the processing by the conference terminal ends. A conference can be started by logging in to a conference with a previously registered conference name, and ended by logging off. For example, a user can log in by pressing a login button displayed on the screen with an input device such as a mouse and entering a user name, password, and the like, and can log off by pressing a logoff button displayed on the screen with that input device. The conference can be judged to have ended when all conference terminals used in it have logged off.
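The terminal-side loop of steps 1200 to 1220 can be sketched as below. The four callables are hypothetical stand-ins for the line-of-sight detection unit 42, the attention level calculation unit 45, the communication unit 46, and the logoff check; their names and signatures are assumptions made for illustration.

```python
def run_terminal(detect_gaze, compute_attention, send_to_server, conference_ended):
    """Run the terminal loop and return how many attention reports were sent."""
    reports = 0
    while True:
        gaze = detect_gaze()                 # step 1205: acquire line-of-sight information
        attention = compute_attention(gaze)  # step 1205: degree of attention per site
        if conference_ended():               # step 1210
            return reports                   # step 1220: end of terminal processing
        send_to_server(attention)            # step 1215
        reports += 1
```

In this ordering the value computed in the final iteration is not sent, which matches the flow as described: the end check in step 1210 comes before the transmission in step 1215.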

The server 11 starts processing at step 1300. In step 1305, the communication unit 60 receives, as attention data, the degrees of attention calculated by the conference terminals installed at the sites. In step 1310, the determination unit 61 calculates the degrees of association between sites. In step 1315, the determination unit 61 compares the calculated degrees of association and determines whether (sites A-B) > (sites A-C) ≥ (sites B-C) holds. If the condition is satisfied, the process proceeds to step 1320; otherwise it proceeds to step 1335.

In step 1320, the determination unit 61 sets the voice data between sites A and B to high priority. In step 1325 it sets the voice data between sites A and C to low priority, and in step 1330 it also sets the voice data between sites B and C to low priority. Here the priority has two levels, high and low, and each pair is assigned one of them; three or more levels, for example high, medium, and low, may be used instead.

In step 1335, the determination unit 61 determines whether (sites A-C) > (sites B-C) ≥ (sites A-B) holds. If the condition is satisfied, the process proceeds to step 1340; otherwise it proceeds to step 1355. In step 1340, the determination unit 61 sets the voice data between sites A and C to high priority. In step 1345 it sets the voice data between sites B and C to low priority, and in step 1350 it also sets the voice data between sites A and B to low priority.

In step 1355, the determination unit 61 determines whether (sites B-C) > (sites A-C) ≥ (sites A-B) holds. If the condition is satisfied, the process proceeds to step 1360; otherwise it proceeds to step 1380. In step 1360, the determination unit 61 sets the voice data between sites B and C to high priority. In step 1365 it sets the voice data between sites A and C to low priority, and in step 1370 it also sets the voice data between sites A and B to low priority.

In step 1375, the voice control unit 63 controls the transmission of voice data according to the determined priority, and under that control the communication unit 60 transmits the voice data to each conference terminal. In step 1380, no priority has been determined, so normal processing is executed. That is, instead of transmitting voice data preferentially to a particular conference terminal, control is performed so that voice data is transmitted as efficiently as possible to all conference terminals.

In step 1385, it is determined whether the conference has ended. If it has not, the process returns to step 1305; if it has, the process proceeds to step 1390 and the processing by the server 11 ends.
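The comparison chain of steps 1315 to 1380 can be sketched as a single function. This is an illustration under assumptions: the function name `decide_priorities` and the string labels are hypothetical, and the three conditions are transcribed literally from the steps above.

```python
def decide_priorities(ab, ac, bc):
    """Map the three degrees of association to per-pair priorities.

    Returns a dict such as {"A-B": "high", "A-C": "low", "B-C": "low"},
    or None when no priority is set and normal processing applies.
    """
    if ab > ac >= bc:        # step 1315
        high = "A-B"         # steps 1320 to 1330
    elif ac > bc >= ab:      # step 1335
        high = "A-C"         # steps 1340 to 1350
    elif bc > ac >= ab:      # step 1355
        high = "B-C"         # steps 1360 to 1370
    else:
        return None          # step 1380: normal (best-effort) processing
    return {pair: ("high" if pair == high else "low") for pair in ("A-B", "A-C", "B-C")}
```

Because the comparisons are taken literally, an ordering they do not list, such as A-B > B-C > A-C, falls through to normal processing in this sketch. An implementation that simply selected the pair with the uniquely highest degree of association would treat such cases as prioritized instead.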

In this way, the priority of voice data is determined according to the degree of association between sites, which is derived from the line-of-sight information of each site, and the transmission of voice data is controlled according to that priority. This reduces the delay of important utterances and improves conference efficiency.

The present invention has been described above as a communication control device, a conference system, and a program using the embodiments described above, but it is not limited to those embodiments. The present invention can be modified within the range conceivable by those skilled in the art, through other embodiments, additions, changes, deletions, and the like, and any such aspect is included in the scope of the present invention as long as it provides the operation and effects of the present invention.

Accordingly, the program described above can be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a floppy (registered trademark) disk, CD (Compact Disc), CD-R (Compact Disc Recordable), DVD (Digital Versatile Disk), SD memory card, or USB (Universal Serial Bus) memory. The program may also be provided or distributed via a network such as the Internet. Furthermore, the program may be provided by being incorporated in advance in a ROM or the like together with other software.

In the above embodiment, the functional units described above are generated by the CPU 30 reading and executing a program stored in the memory 31, and by the CPU 50 reading and executing a program stored in the HDD 53 or the like. However, the present invention is not limited to this; for example, at least some of the functional units may be realized by a dedicated hardware circuit (for example, a semiconductor integrated circuit).

DESCRIPTION OF SYMBOLS 10: Network, 11: Server, 12: Camera, 13: Display, 14: Microphone, 15: Speaker, 16: Processing device, 17: Line-of-sight detection device, 20: Lens, 21: Image sensor, 22: Image processing circuit, 23: Image I/F, 30: CPU, 31: Memory, 32: Communication I/F, 33: Control circuit, 34: Image I/F, 35: Input/output I/F, 40: Imaging unit, 41: Display unit, 42: Line-of-sight detection unit, 43: Voice input unit, 44: Voice output unit, 45: Attention level calculation unit, 46: Communication unit, 47: Encoding unit, 48: Decoding unit, 50: CPU, 51: ROM, 52: RAM, 53: HDD, 54: Communication I/F, 60: Communication unit, 61: Determination unit, 62: Video control unit, 63: Voice control unit

JP 2011-66480 A

Claims

A communication control device for controlling communication between a plurality of conference terminals, comprising:
a receiving unit that receives voice information of a voice uttered by a user using a first conference terminal among the plurality of conference terminals, and a degree of attention indicating the degree to which each user using each conference terminal pays attention to each user displayed on that conference terminal;
a determination unit that determines, based on the degree of attention received by the receiving unit, a priority indicating which communication between conference terminals is to be prioritized; and
a control unit that controls transmission of the voice information to the conference terminals other than the first conference terminal according to the priority determined by the determination unit.

The communication control device according to claim 1, wherein the receiving unit receives, from each conference terminal, image information of an image capturing each user who uses that conference terminal, and
the control unit controls transmission of the plurality of pieces of image information received by the receiving unit to each conference terminal according to the priority.

The communication control device according to claim 1, wherein the voice information includes volume information regarding volume, and
the determination unit determines the priority based on the degree of attention and the volume information.

The control unit controls transmission of the voice information to a conference terminal belonging to a pair of conference terminals for which the priority determined by the determination unit is lower than a designated priority. The communication control device according to 1.

The communication control device according to any one of claims 1 to 4, wherein the control unit controls transmission of the plurality of pieces of image information to a conference terminal belonging to a pair of conference terminals for which the priority determined by the determination unit is lower than a designated priority.

The communication control device according to claim 4, wherein the control unit performs control so as to reduce the amount of communication per unit time transmitted to a conference terminal belonging to a pair of conference terminals whose priority is lower than the designated priority, and to increase the amount of communication per unit time transmitted to a conference terminal belonging to a pair of conference terminals whose priority is equal to or higher than the designated priority.

A conference system including a plurality of conference terminals and a communication control device that controls communication between the plurality of conference terminals,
wherein the communication control device includes:
a receiving unit that receives voice information of a voice uttered by a user using a first conference terminal among the plurality of conference terminals, and a degree of attention indicating the degree to which each user using each conference terminal pays attention to each user displayed on that conference terminal;
a determination unit that determines, based on the degree of attention received by the receiving unit, a priority indicating which communication between conference terminals is to be prioritized; and
a control unit that controls transmission of the voice information to the conference terminals other than the first conference terminal according to the priority determined by the determination unit.

The conference system according to claim 7, wherein each conference terminal includes: an imaging unit that captures an image of the user of the conference terminal; a voice input unit that receives input of a voice uttered by the user; a voice output unit that outputs voice based on voice information transmitted from another conference terminal; a display unit that displays an image based on image information transmitted from another conference terminal; a line-of-sight detection unit that detects, as line-of-sight information, the position of the user's line of sight within the image displayed on the display unit; an attention level calculation unit that calculates the degree of attention based on the line-of-sight information detected by the line-of-sight detection unit; and a transmission unit that transmits, to the communication control device, the image information of the image captured by the imaging unit, the voice information of the voice input to the voice input unit, and the degree of attention calculated by the attention level calculation unit.

The display unit receives, from the communication control device, the degree of attention transmitted from each of the other conference terminals, and displays that degree of attention together with the image captured by the imaging unit of the corresponding other conference terminal. The conference system described in 1.

A program for causing a computer to execute processing for controlling communication between a plurality of conference terminals, the processing comprising:
a step of receiving voice information of a voice uttered by a user using a first conference terminal among the plurality of conference terminals, and a degree of attention indicating the degree to which each user using each conference terminal pays attention to each user displayed on that conference terminal;
a step of determining, based on the received degree of attention, a priority indicating which communication between conference terminals is to be prioritized; and
a step of controlling transmission of the voice information to the conference terminals other than the first conference terminal according to the determined priority.