JP2020088618A

JP2020088618A - Video conference system, communication terminal, and method for controlling microphone of communication terminal

Info

Publication number: JP2020088618A
Application number: JP2018220885A
Authority: JP
Inventors: 怜士川▲崎▼; Reiji Kawasaki; 龍彦長野; Tatsuhiko Nagano
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-06-04
Anticipated expiration: 2038-11-27
Also published as: JP7225735B2

Abstract

To appropriately determine the sound-collection directivity of a microphone. SOLUTION: A video conference system according to one aspect of the disclosed technique includes a plurality of communication terminals, a camera that outputs an image to the communication terminals, and a microphone that outputs audio to the communication terminals, and performs a video conference. The communication terminal includes a line-of-sight information storage unit that stores line-of-sight information indicating the lines of sight of video conference participants detected from the image, and a directivity determination unit that determines the sound-collection directivity of the microphone based on the stored line-of-sight information. SELECTED DRAWING: Figure 3

Description

The present application relates to a video conference system, a communication terminal, and a method for controlling a microphone of a communication terminal.

Video conference systems that connect multiple remote locations for two-way video and audio conferences have become widespread. Beamforming technology is also known: the output audio of multiple microphones is signal-processed to raise the sound-collection sensitivity in a predetermined direction (that is, to control the sound-collection directivity), so that the speaker's voice is picked up with high sensitivity while unwanted ambient sound is suppressed.

Meanwhile, for video conference systems, a technique has been disclosed in which any one of the communication terminals placed at the sites holding the video conference determines the sound-collection directivity of its microphone based on the area (direction) in which a sound occurred (see, for example, Patent Document 1).

However, with the technique of Patent Document 1, when a loud sound occurs near the video conference participants, the sound-collection directivity of the microphone may be erroneously set toward the source of that sound.

The present invention has been made in view of the above, and its object is to appropriately determine the sound-collection directivity of a microphone.

A video conference system according to one aspect of the disclosed technique is a video conference system that includes a plurality of communication terminals, a camera that outputs an image to the communication terminals, and a microphone that outputs audio to the communication terminals, and that performs a video conference. The communication terminal includes a line-of-sight information storage unit that stores line-of-sight information indicating the lines of sight of video conference participants detected from the image, and a directivity determination unit that determines the sound-collection directivity of the microphone based on the stored line-of-sight information.

According to one embodiment of the present invention, the sound-collection directivity of a microphone can be appropriately determined.

A diagram illustrating an example of the configuration of the video conference system according to the embodiment.
A block diagram showing an example of the hardware configuration of the communication terminal according to the embodiment.
A block diagram showing an example of the functional configuration of the communication terminal according to the first embodiment.
A diagram illustrating an example of a participant's line of sight.
A diagram illustrating an example of the line-of-sight information according to the first embodiment.
A diagram illustrating an example of the change in line of sight over time according to the first embodiment.
A diagram illustrating an example of the attention area according to the first embodiment.
A diagram illustrating an example of the information contained in a subpacket according to the first embodiment.
A diagram illustrating the sound-collection directivity of the microphone, in which (a) illustrates the sound-collection direction of beamforming and (b) illustrates the correspondence between the attention area information and the sound-collection direction.
A flowchart showing an example of the attention area detection process according to the first embodiment.
A flowchart showing an example of the directivity control process according to the first embodiment.
A sequence diagram showing an example of the operation of the video conference system according to the first embodiment.
A diagram illustrating an example of the effect of the video conference system according to the first embodiment, in which (a) illustrates a video conference using a communication terminal according to a comparative example and (b) illustrates a video conference using the communication terminal according to the first embodiment.
A block diagram showing an example of the functional configuration of the communication terminal according to the second embodiment.
A sequence diagram showing an example of the operation of the video conference system according to the second embodiment.
A block diagram showing an example of the functional configuration of the communication terminal according to the third embodiment.
A diagram illustrating an example of the input screen according to the third embodiment.

Embodiments for carrying out the invention are described below with reference to the drawings. In the drawings, the same components are given the same reference numerals, and duplicate descriptions may be omitted.

In the embodiments, the description may use the communication terminal 2A installed at site A, or the communication terminal 2B, as an example; however, the communication terminal 2 installed at any site has the same functions and can perform the same operations as the communication terminal 2A or 2B described.

<Configuration of Video Conference System According to Embodiment>
FIG. 1 is a diagram illustrating an example of the configuration of the video conference system according to the embodiment. As shown in FIG. 1, a communication terminal 2A is installed at site A and a communication terminal 2B is installed at site B. The communication terminals 2A and 2B (hereinafter referred to as the communication terminal 2 when they need not be distinguished) are connected to a server 4 via a network 3 such as the Internet or a LAN (Local Area Network). The configuration is not limited to this, however, and the number of communication terminals 2 (the number of sites) included in the video conference system 1 can be changed as desired.

The server 4 monitors whether each communication terminal 2 is connected to the server 4, and performs the control needed for a video conference, such as calling the communication terminals 2A and 2B when the video conference starts.

During a video conference, when the communication terminals 2A and 2B transmit data, they send at least one of image data and audio data (hereinafter, image/audio data) to the server 4, and the server 4 forwards the image/audio data to the other communication terminal 2 on the partner side.

When receiving data, a terminal receives the image/audio data of the partner communication terminal 2 via the server 4. For example, when a video conference is held between site A and site B, the data transmitted by the communication terminal 2A at site A is sent via the server 4 to the communication terminal 2B at site B, and is not sent to other communication terminals 2 (those not participating in the video conference).

Similarly, the image/audio data transmitted by the communication terminal 2B at site B is sent via the server 4 to the communication terminal 2A at site A, which is participating in the video conference, and is not sent to other communication terminals 2 that are not participating. This control makes it possible to hold a video conference among a plurality of communication terminals 2 (among a plurality of sites).

The configuration of the video conference system 1 shown in FIG. 1 is an example, and other configurations may be used.

The communication terminal 2 may be any device with a communication function, such as a PJ (projector), an image forming apparatus, an IWB (Interactive White Board: a whiteboard with electronic blackboard functions capable of intercommunication), an output device such as digital signage, a HUD (Head Up Display) device, an industrial machine, an imaging device, a sound collecting device, a medical device, a networked home appliance, a notebook PC (Personal Computer), a mobile phone, a smartphone, a tablet terminal, a game console, a PDA (Personal Digital Assistant), a digital camera, a wearable PC, or a desktop PC.

<Hardware Configuration of Communication Terminal According to Embodiment>
Next, the hardware configuration of the communication terminal according to the embodiment is described. FIG. 2 is a diagram illustrating an example of the hardware configuration of the communication terminal 2. As an example, the communication terminal 2 is an IWB.

The communication terminal 2 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an SSD (Solid State Drive) 204, a network controller 205, a sensor controller 206, and a capture device 207. The communication terminal 2 also includes an electronic pen controller 208, an external storage controller 209, a GPU 210, a display controller 211, a camera controller 212, a microphone controller 213, and a speaker controller 214. These are connected to one another via a bus B.

The CPU 201 is an arithmetic unit that implements the overall control and functions of the communication terminal 2 by reading programs and data from a storage device such as the ROM 202 or the SSD 204 into the RAM 203 and executing processing.

The ROM 202 is a non-volatile semiconductor memory (storage device) that retains programs and data even when the power is turned off. The ROM 202 stores programs and data such as the BIOS (Basic Input/Output System) executed when the communication terminal 2 starts up, OS settings, and network settings. The RAM 203 is a volatile semiconductor memory (storage device) that temporarily holds programs and data.

The SSD 204 is a non-volatile memory that stores an application for the IWB coordinate detection system and various data. The application for the coordinate detection system may be obtained stored in the external memory 216, or may be downloaded from a server or the like via the network controller 205. The network controller 205 can execute processing based on a communication protocol when communicating with a server or the like via the network 3 (see FIG. 1). The SSD 204 may instead be an HDD (Hard Disk Drive).

The sensor controller 206 can execute coordinate detection processing for the contact position when the electronic pen 215 or a finger touches the display 217 of the IWB. The capture device 207 is electrically connected to the PC 221 and can capture the image or video shown on the display device of the PC 221. The electronic pen controller 208 is electrically connected to the electronic pen 215, which is an input device, and can control the generation of ultrasonic waves, infrared rays, and the like by the electronic pen 215.

The external storage controller 209 can write to and read from the removable external memory 216. The external memory 216 is a USB (Universal Serial Bus) memory, an SD card, or the like.

The GPU (Graphics Processing Unit) 210 is a dedicated rendering processor that computes the pixel value of each pixel of the display 217. The display controller 211 is electrically connected to the display 217 and can output the image generated by the GPU 210 to the display 217. The display 217 can display video of video conference participants transmitted over the network 3 from another communication terminal 2 taking part in the video conference.

The camera controller 212 is electrically connected to the camera 218 and can control imaging by the camera 218. As described later, the camera 218 is used to capture images of participants during a video conference, and the participants' lines of sight are detected from the images captured by the camera 218. The camera controller 212 can output the image data captured by the camera 218 to the CPU 201. An independent line-of-sight sensor module may be used instead of the camera to detect the participants' lines of sight.

The microphone controller 213 is electrically connected to the microphone 219 and can control the sound-collection sensitivity level, directivity, and the like of the microphone 219. The directivity of the microphone 219 is described in detail separately.

The speaker controller 214 is electrically connected to the speaker 220 and can control the volume and the like of the sound produced by the speaker 220.

<Functional Configuration of Communication Terminal According to First Embodiment>
Next, the video conference system according to the first embodiment is described, beginning with the functional configuration of the communication terminal according to the first embodiment. FIG. 3 is a block diagram illustrating an example of the functional configuration of the communication terminal according to this embodiment.

The communication terminal 2A includes an imaging unit 21, a line-of-sight detection unit 22, a line-of-sight information storage unit 23, an attention area detection unit 24, a subpacket generation unit 25, a transmission unit 26, a reception unit 27, a directivity determination unit 28, and a directivity control unit 29.

The imaging unit 21 has a function of capturing images of the participants at site A who are taking part in the video conference and outputting the captured image information to the line-of-sight detection unit 22. So that the participants' lines of sight can be detected, each participant image includes at least the participant's eyes.

The line-of-sight detection unit 22 has a function of detecting the participants' lines of sight from the image information input from the imaging unit 21 and outputting the result to the line-of-sight information storage unit 23. The line-of-sight information storage unit 23 has a function of storing the line-of-sight information input from the line-of-sight detection unit 22.

FIG. 4 is a diagram illustrating an example of a participant's line of sight. It shows a participant 101 at site A looking at the display 217, on which video of participants 102a to 102d at site B is displayed.

For example, when the participant 102a speaks at site B, the participant 101 looks at the area of the display 217 in which the participant 102a is shown, and when the participant 102d speaks, the participant 101 looks at the area of the display 217 in which the participant 102d is shown. The participant's line-of-sight information is thus information indicating the area of the display 217 at which the participant 101 is looking.

The line-of-sight detection unit 22 can detect the line-of-sight information by applying image processing to the participant images captured by the imaging unit 21 at site A. As the image processing, the line-of-sight detection unit 22 can, for example, extract from the image the region corresponding to the dark part of a person's eye (the iris and pupil) and calculate the coordinates of its centroid as the line-of-sight information. Known techniques can be applied to the image processing for line-of-sight detection, so a detailed description is omitted here.
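The centroid calculation described above can be sketched as follows. This is only an illustrative sketch, not the detection method of the embodiment: the function name `pupil_centroid`, the grayscale list-of-rows image format, and the fixed darkness threshold of 50 are all assumptions introduced for the example.

```python
def pupil_centroid(gray, dark_threshold=50):
    """Return the (x, y) centroid of pixels darker than dark_threshold, or None.

    gray is a list of rows of 0-255 intensities (hypothetical input format).
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if value < dark_threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no dark region found in this image
    return (sum_x / count, sum_y / count)


# A 5x5 eye patch with a 3x3 dark region in the middle.
eye = [
    [200, 200, 200, 200, 200],
    [200,  20,  20,  20, 200],
    [200,  20,  20,  20, 200],
    [200,  20,  20,  20, 200],
    [200, 200, 200, 200, 200],
]
print(pupil_centroid(eye))  # (2.0, 2.0)
```

A practical detector would first locate the eye region and cope with lighting and eyelid occlusion; the sketch covers only the centroid step.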

FIG. 5 is a diagram illustrating an example of the line-of-sight information detected by the line-of-sight detection unit 22 and stored by the line-of-sight information storage unit 23. The line-of-sight detection unit 22 can detect the participant's line of sight as it changes over time and output it sequentially to the line-of-sight information storage unit 23. As shown in FIG. 5, the line-of-sight information storage unit 23 can store the centroid coordinates representing the line-of-sight information for each point in time.
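One possible way to hold such time-stamped samples is sketched below. The class name `GazeLog`, the millisecond timestamps, and the participant identifiers are assumptions made for illustration; the text specifies only that centroid coordinates are stored per point in time.

```python
from collections import defaultdict


class GazeLog:
    """Time-ordered centroid samples, kept separately for each participant."""

    def __init__(self):
        self._samples = defaultdict(list)

    def add(self, participant_id, t_ms, x, y):
        # Append one detection result: timestamp plus centroid coordinates.
        self._samples[participant_id].append((t_ms, x, y))

    def samples(self, participant_id):
        # Return a copy so callers cannot modify the stored history.
        return list(self._samples.get(participant_id, []))

    def participants(self):
        return sorted(self._samples)


log = GazeLog()
log.add("p1", 0, 412, 230)
log.add("p1", 10, 415, 229)
log.add("p2", 0, 120, 300)
print(log.samples("p1"))
```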

FIG. 6 is a diagram illustrating an example of the change in line of sight over time. It shows a line-of-sight information distribution 61 representing the trajectory of the line of sight on the display 217.

When there are multiple video conference participants at site A, the line-of-sight detection unit 22 detects the line-of-sight information of each person, and the line-of-sight information storage unit 23 can store line-of-sight information for all of them.

Returning to FIG. 3, the description of the functional units continues.

The attention area detection unit 24 has a function of detecting, based on the line-of-sight information stored by the line-of-sight information storage unit 23, the attention area on which the participants at site A are focusing, and outputting the detected attention area information to the subpacket generation unit 25.

For example, the attention area detection unit 24 can use the time during which the gaze stays concentrated as the criterion, and detect as an attention area an area on which the gaze has been concentrated for at least a predetermined time threshold. Specifically, with a time threshold of 2 seconds and a gaze detection interval of 10 milliseconds, if the gaze is found to be directed at the same area in 20 or more detections, the attention area detection unit 24 can detect that area as an attention area.

This "area" is determined in advance as the unit area used for detecting attention areas. The unit area can be set as desired, for example to a 1×1 pixel area or a 10×10 pixel area. Using time as the threshold makes it possible to detect the area at which a participant is staring as an attention area.
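The time-based criterion can be sketched as a run-length check over unit areas. This is a hedged illustration: the names `cell_of` and `dwell_regions` are hypothetical, and treating "concentrated" as consecutive samples in the same unit area is an assumption. The required number of detections is passed in as `min_consecutive` rather than derived, because the text gives 20 detections as its example while 2 seconds at a 10 millisecond interval would correspond to 200 samples.

```python
def cell_of(x, y, cell_size=10):
    """Map a pixel coordinate to the index of its unit area (cell_size x cell_size pixels)."""
    return (int(x) // cell_size, int(y) // cell_size)


def dwell_regions(samples, min_consecutive, cell_size=10):
    """Return the unit areas in which the gaze stayed for min_consecutive samples in a row.

    samples is a list of (x, y) points taken at a fixed detection interval.
    """
    regions = set()
    run_cell, run_length = None, 0
    for x, y in samples:
        cell = cell_of(x, y, cell_size)
        if cell == run_cell:
            run_length += 1
        else:
            run_cell, run_length = cell, 1  # gaze moved: start a new run
        if run_length >= min_consecutive:
            regions.add(cell)
    return regions


gaze = [(5, 5)] * 3 + [(25, 5)] * 2 + [(5, 5)] * 2
print(dwell_regions(gaze, min_consecutive=3))  # {(0, 0)}
```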

The attention area detection unit 24 can also use the number of times the gaze is concentrated as the criterion, and detect as an attention area an area on which the gaze has been concentrated at least a predetermined threshold number of times. The difference from using time as the threshold is that when the same area is obtained in several consecutive detections, it is counted as one occurrence regardless of how long the gaze stayed there.

Specifically, with a count threshold of 10 and a gaze detection interval of 10 milliseconds, if the same area is detected 10 times non-consecutively, the attention area detection unit 24 can detect that area as an attention area. Using a count as the threshold makes it possible to detect the attention area appropriately even if the participant occasionally looks away while paying attention.
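The counting rule, in which consecutive detections of the same area count once, can be sketched as follows. The function names `visit_counts` and `revisited_regions` and the use of string labels for unit areas are assumptions for the example.

```python
from collections import Counter


def visit_counts(cells):
    """Count visits per unit area; a run of identical consecutive cells is one visit."""
    counts = Counter()
    previous = None
    for cell in cells:
        if cell != previous:
            counts[cell] += 1  # the gaze has just arrived at this area
        previous = cell
    return counts


def revisited_regions(cells, min_visits):
    """Return the unit areas visited at least min_visits separate times."""
    return {cell for cell, n in visit_counts(cells).items() if n >= min_visits}


sequence = ["A", "A", "B", "A", "C", "A"]
print(visit_counts(sequence)["A"])      # 3
print(revisited_regions(sequence, 3))   # {'A'}
```

In the example, the gaze leaves area A twice and returns each time, so A is counted three times even though it appears in four samples.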

The attention area detection unit 24 can also use the number of participants whose gaze is concentrated as the criterion, and detect as an attention area an area on which at least a predetermined threshold number of participants have concentrated their gaze. When there are multiple video conference participants at site A, a line-of-sight information distribution 61 (FIG. 6) is obtained for each person, and the attention area detection unit 24 can detect as an attention area an area on which the gazes of at least the threshold number of participants are concentrated.

Using the number of people as the threshold makes it possible to detect the attention area appropriately from the line-of-sight information of the other participants even if some participants are looking elsewhere.
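The people-based criterion can be sketched as a vote across participants. The function name `shared_regions` and the dictionary input format are hypothetical; each participant's set of candidate areas is assumed to have been produced beforehand, for instance by a time or count criterion.

```python
from collections import Counter


def shared_regions(regions_by_participant, min_people):
    """Return the unit areas that at least min_people participants focused on."""
    votes = Counter()
    for regions in regions_by_participant.values():
        votes.update(set(regions))  # each participant votes once per area
    return {region for region, n in votes.items() if n >= min_people}


candidates = {
    "p1": {(0, 0), (1, 0)},
    "p2": {(0, 0)},
    "p3": {(5, 5)},  # this participant is looking elsewhere
}
print(shared_regions(candidates, min_people=2))  # {(0, 0)}
```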

The thresholds for detecting the attention area described above (time, count, and number of people) may be used in combination. Combining them can further improve the accuracy of attention area detection.

FIG. 7 is a diagram illustrating an example of the attention area. In the line-of-sight information distribution 61, the areas 62a to 62d enclosed by broken lines are detected as attention areas.

Returning to FIG. 3, the description of the functional units continues.

The subpacket generation unit 25 has a function of receiving the attention area information detected by the attention area detection unit 24, generating a subpacket for transmission to another site such as site B, and outputting it to the transmission unit 26. The transmission unit 26 has a function of including the input subpacket in the conference data, such as participant video and audio and synchronization data, and transmitting it to the other site. The communication terminal 2B installed at the other site, to which the transmission unit 26 transmits the subpacket, is an example of "another first communication terminal". The attention area information detected by the attention area detection unit 24 is an example of "predetermined coordinate information".

FIG. 8 is a diagram illustrating an example of the information contained in a subpacket. In FIG. 8, the X and Y coordinates of the attention area are shown as a list. When the unit area consists of multiple pixels (for example, 10×10 pixels), the center coordinates or the like of the unit area serve as the coordinate information of the attention area.

The subpacket generation unit 25 may generate the subpacket from part of the coordinate information of the attention area rather than all of it. Examples of such partial coordinates include at least one of the maximum X coordinate, the minimum X coordinate, the maximum Y coordinate, and the minimum Y coordinate.

The parts of the attention area that most affect the determination of the directivity of the microphone 219 (described in detail later) are its edge regions, so the subpacket may be generated from edge data, namely the maximum and minimum values. Also, since the X coordinate affects the determination of directivity more than the Y coordinate does, the subpacket may be generated from the X coordinates. Generating the subpacket from part of the coordinate information of the attention area in this way reduces the amount of data to be transmitted, lowers the communication load, and increases the communication speed.
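Reducing the attention area to its edge values might look like the sketch below. The function names `edge_summary` and `x_extent` and the dictionary payload layout are assumptions for illustration; the actual subpacket format is the one shown in FIG. 8.

```python
def edge_summary(coords):
    """Reduce a list of (x, y) attention-area coordinates to their extreme values."""
    if not coords:
        return None  # nothing detected, so there is nothing to send
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"x_min": min(xs), "x_max": max(xs), "y_min": min(ys), "y_max": max(ys)}


def x_extent(coords):
    """Keep only the X range, which the text says matters most for directivity."""
    summary = edge_summary(coords)
    if summary is None:
        return None
    return (summary["x_min"], summary["x_max"])


area = [(120, 40), (300, 80), (210, 60)]
print(edge_summary(area))  # {'x_min': 120, 'x_max': 300, 'y_min': 40, 'y_max': 80}
print(x_extent(area))      # (120, 300)
```

However many coordinates the attention area contains, the summary carries at most four numbers, which is where the reduction in transmitted data comes from.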

On the other hand, the other sites in the video conference do not necessarily use displays with the same pixel count and/or screen size. Because of such differences, it may not be possible to determine the directivity of the microphone 219 appropriately based on the attention area information.

Therefore, the subpacket generation unit 25 may include the product's model identification number in the subpacket together with the attention area information. By having the receiving side correct for differences in display pixel count and/or screen size, the directivity of the microphone 219 can be determined appropriately.
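One way the receiving side could use the model identification number is to look up the sender's display resolution and convert the coordinates to resolution-independent fractions. This is an assumption about how the correction might be done, not the method of the embodiment; the table `DISPLAY_SPECS`, the model names in it, and the function `normalize` are all hypothetical.

```python
# Hypothetical lookup table: model identification number -> (width, height) in pixels.
DISPLAY_SPECS = {
    "model-a": (1920, 1080),
    "model-b": (3840, 2160),
}


def normalize(x, y, sender_model):
    """Convert sender pixel coordinates to fractions of the sender's screen size."""
    width, height = DISPLAY_SPECS[sender_model]
    return (x / width, y / height)


# The same pixel coordinate means different screen positions on different models.
print(normalize(960, 540, "model-a"))  # (0.5, 0.5)
print(normalize(960, 540, "model-b"))  # (0.25, 0.25)
```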

ここで、マイクロホン２１９の集音の指向性について説明する。本実施形態では、マイクロホン２１９の集音の指向性の制御のために、ビームフォーミング技術を用いることができる。マイクロホンのビームフォーミング（以降では、単にビームフォーミングという）とは、複数のマイクロホンを用い、所定の方向に音波の指向性を高める技術である。 Here, the directivity of the sound collection of the microphone 219 will be described. In the present embodiment, the beamforming technique can be used to control the directivity of the sound collection of the microphone 219. The beam forming of a microphone (hereinafter, simply referred to as beam forming) is a technique of using a plurality of microphones and increasing the directivity of sound waves in a predetermined direction.

具体的には、複数のマイクロホンが出力する音声を信号処理することで、集音の感度を所定の方向では高く、それ以外の方向では低くするようにする。これにより、ビデオ会議において、発言者の音は集音感度を上げて聞こえやすくし、周囲の不要な音は集音感度を下げて聞こえ難くすることができる。 Specifically, the sound output from a plurality of microphones is subjected to signal processing so that the sensitivity of sound collection is high in a predetermined direction and low in other directions. Thus, in the video conference, the sound of the speaker can be made easy to hear by increasing the sound collection sensitivity, and unnecessary sounds around can be made hard to hear by lowering the sound collection sensitivity.
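The present embodiment relies on known beamforming techniques and does not prescribe one. As an illustration only, the following sketch computes the per-microphone delays used by a delay-and-sum beamformer, assuming a uniform linear microphone array, a plane-wave source, and a speed of sound of about 343 m/s. The function name, the array geometry, and the sign convention are assumptions.

```python
import math
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at about 20 degrees C


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float) -> List[float]:
    """Delays (in seconds) that align a plane wave arriving from angle_deg.

    Assumes a uniform linear array with 0 degrees perpendicular to the
    array. The delays are shifted so that the smallest one is zero.
    """
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    delays = [i * step for i in range(num_mics)]
    offset = min(delays)
    return [d - offset for d in delays]
```

Delaying each microphone signal by the corresponding value and summing the results reinforces sound arriving from the chosen angle, while sound from other directions adds less coherently.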

図９は、マイクロホンの集音の指向性について説明する図である。（ａ）はビームフォーミングの集音方向を説明する図であり、（ｂ）は注目領域情報と集音方向との対応関係を説明する図である。 FIG. 9 is a diagram illustrating the directivity of the sound collection of the microphone. (A) is a figure explaining the sound collection direction of beamforming, (b) is a figure explaining the correspondence of attention area information and a sound collection direction.

In the present embodiment, as shown in FIG. 9(a), the direction perpendicular to the display surface of the display 217 is defined as a sound collection direction of 0 degrees, rotation toward the left side in the figure is the positive sound collection direction, and rotation toward the right side is the negative sound collection direction. Further, when the direction corresponding to the ±90 degree direction (horizontal direction) in FIG. 9(a) is taken as the X direction of the image captured by the imaging unit 21, the attention area information and the sound collection direction have the correspondence shown in FIG. 9(b).

With reference to the correspondence in FIG. 9(b), the angles indicating the maximum and minimum values of the sound collection direction of the microphone, that is, the directivity, can be set based on the X coordinate of the attention area information. For example, when the maximum value of the X coordinate of the attention area information is 280 pixels and the minimum value is 240 pixels, the directivity is set to a range from a minimum of 0 degrees to a maximum of 25 degrees. The sound collection sensitivity for sounds uttered by the participants 101a and 101b, who are located in this direction, increases, while the sound collection sensitivity for sounds uttered by the participants 101c and 101d decreases. In this way, the directivity is determined according to the attention area. The data indicating the correspondence in FIG. 9(b) is determined in advance and stored in a memory such as the SSD 204.
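The lookup described above can be sketched as a table with linear interpolation. Only the two pairs 240 pixels to 0 degrees and 280 pixels to 25 degrees come from the example in the text; the other table entries, the interpolation between entries, and the clamping outside the table are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Tuple

# (x_pixel, angle_deg) pairs. Only 240 -> 0 and 280 -> 25 are taken from
# the example in the text; the first and last entries are hypothetical.
X_TO_ANGLE = [(160, -50.0), (240, 0.0), (280, 25.0), (320, 50.0)]


def x_to_angle(x: float) -> float:
    """Map an X coordinate to a sound collection angle by linear interpolation."""
    xs = [p[0] for p in X_TO_ANGLE]
    if x <= xs[0]:
        return X_TO_ANGLE[0][1]
    if x >= xs[-1]:
        return X_TO_ANGLE[-1][1]
    i = bisect_right(xs, x)
    (x0, a0), (x1, a1) = X_TO_ANGLE[i - 1], X_TO_ANGLE[i]
    return a0 + (a1 - a0) * (x - x0) / (x1 - x0)


def directivity_range(x_min: float, x_max: float) -> Tuple[float, float]:
    """Return (minimum angle, maximum angle) for an attention area."""
    return (x_to_angle(x_min), x_to_angle(x_max))
```

Under these assumptions, `directivity_range(240, 280)` reproduces the range of 0 degrees to 25 degrees given in the example.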

Note that the Y coordinate of the attention area information corresponds to the distance from the display to the participant. The sound collection sensitivity of the microphone may therefore be determined according to the value of the Y coordinate, for example by raising the sensitivity when the distance to the participant is long.

図３に戻り、各機能部の説明を続けると、受信部２７は、他の拠点から注目領域情報を受信し、指向性決定部２８に出力する機能を有する。なお、受信部２７が受信する注目領域情報を送信する、他の拠点に設置された通信端末２Ｂは、「他の第２通信端末」の一例である。なお、上述の「他の第１の通信端末」と「他の第２の通信端末」は、同じ通信端末であってもよいし、異なる通信端末であってもよい。 Returning to FIG. 3, continuing description of each functional unit, the receiving unit 27 has a function of receiving attention area information from another base and outputting it to the directivity determining unit 28. In addition, the communication terminal 2B installed in another site, which transmits the attention area information received by the receiving unit 27, is an example of “another second communication terminal”. The above-mentioned “other first communication terminal” and “other second communication terminal” may be the same communication terminal or different communication terminals.

The directivity determination unit 28 has a function of determining the directivity of the sound collection of the microphone 219 based on the attention area information received by the receiving unit 27, with reference to the data indicating the correspondence in FIG. 9(b) stored in the SSD 204 or the like, and of outputting the determined directivity information to the directivity control unit 29.

指向性制御部２９は、入力した指向性情報に基づき、マイクロホン２１９の集音の指向性を制御することができる。 The directivity control unit 29 can control the directivity of the sound collection of the microphone 219 based on the input directivity information.

なお、本実施形態では、公知のビームフォーミング技術を適用することができるため、ビームフォーミング制御技術等の詳細な説明は省略する。 In this embodiment, since a known beamforming technique can be applied, detailed description of the beamforming control technique and the like will be omitted.

<Operation of the video conference system according to the first embodiment>
FIG. 10 is a flowchart showing an example of attention area detection processing by the communication terminal 2A according to the present embodiment.

先ず、ステップＳ１０１において、撮像部２１は、拠点Ａでのビデオ会議への参加者の画像を撮像し、撮像した画像情報を視線検出部２２に出力する。 First, in step S101, the image capturing unit 21 captures an image of a participant in a video conference at the site A, and outputs the captured image information to the line-of-sight detecting unit 22.

続いて、ステップＳ１０２において、視線検出部２２は、入力した画像情報から参加者の視線を検出し、検出した視線情報を視線情報蓄積部２３に出力する。なお、参加者が複数いる場合は、各参加者の視線を検出し、各参加者の視線情報を視線情報蓄積部２３に出力する。 Subsequently, in step S102, the line-of-sight detection unit 22 detects the line of sight of the participant from the input image information, and outputs the detected line-of-sight information to the line-of-sight information storage unit 23. When there are a plurality of participants, the line of sight of each participant is detected and the line-of-sight information of each participant is output to the line-of-sight information storage unit 23.

続いて、ステップＳ１０３において、視線情報蓄積部２３は、入力した視線情報を蓄積する。なお、参加者が複数いる場合は、各参加者の視線情報を蓄積する。 Subsequently, in step S103, the line-of-sight information storage unit 23 stores the input line-of-sight information. If there are multiple participants, the line-of-sight information of each participant is stored.

続いて、ステップＳ１０４において、視線検出部２２は、所定の時間を経過したか否かを判定する。この「所定の時間」は、視線情報を蓄積するために予め定められた時間である。 Subsequently, in step S104, the line-of-sight detection unit 22 determines whether a predetermined time has passed. This "predetermined time" is a predetermined time for accumulating the line-of-sight information.

When it is determined that the predetermined time has not elapsed (step S104, No), the process returns to step S101. On the other hand, when it is determined that the predetermined time has elapsed (step S104, Yes), in step S105 the attention area detection unit 24 detects the attention area that the participants at the site A are paying attention to, based on the line-of-sight information accumulated by the line-of-sight information storage unit 23. The attention area detection unit 24 then outputs the detected attention area information to the subpacket generation unit 25.

続いて、ステップＳ１０６において、サブパケット生成部２５は、入力した注目領域情報から拠点Ｂ等の他拠点に送信するためのサブパケットを生成し、送信部２６に出力する。 Subsequently, in step S106, the subpacket generation unit 25 generates a subpacket to be transmitted to another base such as the base B from the input attention area information, and outputs the subpacket to the transmission unit 26.

続いて、ステップＳ１０７において、送信部２６は、入力したサブパケットを、参加者の映像及び音声や、同期データ等の会議データに含ませて、拠点Ｂ等の他の拠点に送信する。 Subsequently, in step S107, the transmission unit 26 includes the input subpacket in the conference data such as the video and audio of the participant and the synchronization data, and transmits the subpacket to another base such as the base B.

このようにして、通信端末２Ａは拠点Ａでのビデオ会議への参加者の注目領域情報を、拠点Ｂ等の他拠点に送信することができる。 In this way, the communication terminal 2A can transmit the attention area information of the participants in the video conference at the base A to another base such as the base B.
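The loop of steps S101 to S105 can be sketched as follows. The callable `capture_and_detect` stands in for the imaging unit 21 and the line-of-sight detection unit 22, and the attention area is simplified to the bounding box of all accumulated gaze points. Both are assumptions for illustration; the threshold-based detection of the embodiment is not reproduced here.

```python
import time
from typing import Callable, List, Tuple

Gaze = Tuple[int, int]  # (x, y) gaze point on the display, in pixels


def collect_gaze(capture_and_detect: Callable[[], List[Gaze]],
                 duration_s: float,
                 clock: Callable[[], float] = time.monotonic) -> List[Gaze]:
    """Steps S101 to S104: accumulate gaze points until duration_s has elapsed."""
    start = clock()
    accumulated: List[Gaze] = []
    while True:
        # S101 to S103: capture an image, detect gaze, accumulate the result.
        accumulated.extend(capture_and_detect())
        # S104: stop once the predetermined time has elapsed.
        if clock() - start >= duration_s:
            return accumulated


def detect_attention_area(gaze: List[Gaze]) -> Tuple[int, int, int, int]:
    """Step S105, simplified: bounding box (x_min, x_max, y_min, y_max)."""
    if not gaze:
        raise ValueError("no line-of-sight information accumulated")
    xs = [x for x, _ in gaze]
    ys = [y for _, y in gaze]
    return (min(xs), max(xs), min(ys), max(ys))
```

Passing the clock as a parameter keeps the loop testable without waiting for real time to pass.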

次に、図１１は、本実施形態に係る通信端末２Ａによる指向性の制御処理の一例を示すフローチャートである。 Next, FIG. 11 is a flowchart showing an example of directivity control processing by the communication terminal 2A according to the present embodiment.

First, in step S111, the receiving unit 27 determines whether a subpacket including the attention area information has been received. When no subpacket including the attention area information has been received (step S111, No), the processing of step S111 is executed again. On the other hand, when the attention area information has been received (step S111, Yes), in step S112 the receiving unit 27 outputs the received attention area information to the directivity determination unit 28. Based on the input attention area information, the directivity determination unit 28 refers to the data, stored in the SSD 204 or the like, indicating the correspondence between the attention area information and the sound collection direction, and determines the directivity of the sound collection of the microphone 219. The directivity determination unit 28 then outputs the determined directivity information to the directivity control unit 29.

続いて、ステップＳ１１３において、指向性制御部２９は、入力した指向性情報に基づき、マイクロホン２１９の集音の指向性を制御する。 Subsequently, in step S113, the directivity control unit 29 controls the directivity of the sound collection of the microphone 219 based on the input directivity information.

このようにして、通信端末２Ａは、拠点Ｂ等の他拠点から受信した注目領域情報に基づき、通信端末２Ａの備えるマイクロホン２１９の指向性を決定し、制御することができる。 In this way, the communication terminal 2A can determine and control the directivity of the microphone 219 included in the communication terminal 2A based on the attention area information received from another site such as the site B.
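One pass through steps S111 to S113 can be sketched as follows, assuming that received subpackets arrive on an in-memory queue as dictionaries with a hypothetical `"attention_area"` key. The callables `determine` and `apply` stand in for the directivity determination unit 28 and the directivity control unit 29.

```python
import queue
from typing import Callable, Tuple

Directivity = Tuple[float, float]  # (minimum angle, maximum angle) in degrees


def control_once(inbox: "queue.Queue[dict]",
                 determine: Callable[[tuple], Directivity],
                 apply: Callable[[Directivity], None],
                 timeout_s: float = 0.1) -> bool:
    """Run steps S111 to S113 once; return True if the directivity was updated."""
    try:
        subpacket = inbox.get(timeout=timeout_s)  # S111: wait for a subpacket
    except queue.Empty:
        return False  # S111, No: the caller simply tries again
    area = subpacket.get("attention_area")
    if area is None:
        return False  # subpacket without attention area information
    directivity = determine(area)  # S112: determine the directivity
    apply(directivity)             # S113: control the microphone
    return True
```

Calling `control_once` repeatedly in a loop corresponds to re-executing step S111 until attention area information arrives.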

次に、図１２は、本実施形態に係るビデオ会議システムの動作の一例を示すシーケンス図である。 Next, FIG. 12 is a sequence diagram showing an example of the operation of the video conference system according to the present embodiment.

先ず、ステップＳ１２１において、通信端末２Ａは、通信端末２Ｂに対してビデオ会議の開始を要求する信号を送信する。 First, in step S121, the communication terminal 2A transmits a signal requesting the start of a video conference to the communication terminal 2B.

続いて、ステップＳ１２２において、通信端末２Ｂは、ビデオ会議を開始可能である場合は、その旨を示す要求応答信号を通信端末２Ａに送信する。 Subsequently, in step S122, when the video conference can be started, the communication terminal 2B transmits a request response signal indicating that to the communication terminal 2A.

Subsequently, in step S123, the communication terminal 2A executes the attention area detection processing described with reference to FIG. 10.

Subsequently, in step S124, the communication terminal 2A includes the subpacket containing the attention area information in the conference data and transmits the conference data to the communication terminal 2B.

続いて、ステップＳ１２５において、通信端末２Ｂは、受信したサブパケットに含まれる注目領域情報に基づいて、図１１で説明した指向性の制御処理を実行する。 Subsequently, in step S125, the communication terminal 2B executes the directivity control process described in FIG. 11 based on the attention area information included in the received subpacket.

続いて、ステップＳ１２６において、ビデオ会議を終了する場合は、通信端末２Ａは、通信端末２Ｂに対してビデオ会議の終了を要求する信号を送信する。 Subsequently, in step S126, when ending the video conference, the communication terminal 2A transmits a signal requesting the end of the video conference to the communication terminal 2B.

続いて、ステップＳ１２７において、通信端末２Ｂは、ビデオ会議を終了可能である場合は、その旨を示す要求応答信号を通信端末２Ａに送信する。 Then, in step S127, when the video conference can be ended, the communication terminal 2B transmits a request response signal indicating that to the communication terminal 2A.

このようにして、ビデオ会議システム１はビデオ会議を実行することができる。 In this way, the video conference system 1 can conduct a video conference.

<Effects of the video conference system according to the first embodiment>
Next, the effects of the video conference system according to the present embodiment will be described.

図１３は、本実施形態に係るビデオ会議システムの効果の一例を説明する図である。（ａ）は比較例に係る通信端末を用いるビデオ会議を説明する図であり、（ｂ）は本実施形態に係る通信端末を用いるビデオ会議を説明する図である。 FIG. 13 is a diagram illustrating an example of effects of the video conference system according to the present embodiment. (A) is a figure explaining the video conference using the communication terminal which concerns on a comparative example, (b) is a figure explaining a video conference using the communication terminal which concerns on this embodiment.

図１３（ａ）において、比較例に係る通信端末５Ａは、通信端末５Ａｏｔｈに隣接して設置されている。通信端末５Ａが使用されるビデオ会議１０と、通信端末５Ａｏｔｈが使用されるビデオ会議１０ｏｔｈは別のビデオ会議である。また、通信端末５Ａは、音の発生した方向（領域）の音声を集音するように、マイクロホンの指向性を制御する機能を備えている。 In FIG. 13A, the communication terminal 5A according to the comparative example is installed adjacent to the communication terminal 5Aoth. The video conference 10 in which the communication terminal 5A is used and the video conference 10oth in which the communication terminal 5Aoth is used are different video conferences. Further, the communication terminal 5A has a function of controlling the directivity of the microphone so as to collect the voice in the direction (area) in which the sound is generated.

In the case of FIG. 13(a), when the communication terminal 5A and the communication terminal 5Aoth are close to each other, the communication terminal 5A may react to an utterance by the participant 501a of the video conference 10oth and erroneously determine the directivity of the microphone so as to collect sound from the direction of the participant 501a (arrow 131).

一方、図１３（ｂ）において、本実施形態に係る通信端末２Ａは、図１３（ａ）の場合と同様に、通信端末５Ａｏｔｈに隣接して設置されている。また、通信端末２Ａが使用されるビデオ会議１０と、通信端末５Ａｏｔｈが使用されるビデオ会議１０ｏｔｈは別のビデオ会議である。 On the other hand, in FIG. 13B, the communication terminal 2A according to the present embodiment is installed adjacent to the communication terminal 5Aoth, as in the case of FIG. 13A. Further, the video conference 10 in which the communication terminal 2A is used and the video conference 10oth in which the communication terminal 5Aoth is used are different video conferences.

In the present embodiment, as described above, the directivity of the microphone 219 is controlled based on the attention area information transmitted from the communication terminal 2B at another site such as the site B. The participant 501a, who is participating in a different video conference, is never included in the attention area of the participants at the site B. Therefore, even if the participant 501a speaks, the communication terminal 2A does not make the erroneous determination of directing the directivity of the microphone 219 toward the participant 501a.

このようにして、本実施形態では、マイクロホン２１９の集音の指向性を、ビデオ会議１０の参加者１０１ａの方向に、適切に決定することができる。 In this way, in the present embodiment, the sound collection directivity of the microphone 219 can be appropriately determined in the direction of the participant 101a of the video conference 10.

Note that the imaging field of view of the imaging unit 21 included in the communication terminal 2A may be set in advance so that participants in another video conference are not included in the captured image. This ensures that a participant in another video conference is not included in the attention area of the participants at the site B, so that the erroneous determination of directing the directivity of the microphone toward the participant 501a can be reliably prevented.

また、本実施形態では、通信端末２Ａは、検出した注目領域情報を他の第１通信端末に送信し、他の第２通信端末から受信した注目領域情報に基づいて指向性を決定する例を説明したが、これに限定されるものではない。 Further, in the present embodiment, an example in which the communication terminal 2A transmits the detected attention area information to another first communication terminal and determines the directivity based on the attention area information received from the other second communication terminal. Although described, the present invention is not limited to this.

例えば、通信端末２Ａは、検出した視線情報を他の第１通信端末に送信し、他の第２通信端末から受信した視線情報を蓄積し、蓄積した視線情報に基づき、注目領域を検出し、検出した注目領域情報に基づいて指向性を決定してもよい。この場合に送信される視線情報は、「所定の座標情報」の一例である。 For example, the communication terminal 2A transmits the detected line-of-sight information to the other first communication terminal, accumulates the line-of-sight information received from the other second communication terminal, detects the attention area based on the accumulated line-of-sight information, The directivity may be determined based on the detected attention area information. The line-of-sight information transmitted in this case is an example of “predetermined coordinate information”.

Alternatively, the communication terminal 2A may transmit the directivity information determined based on the detected attention area to another first communication terminal, and may control the directivity based on directivity information received from another second communication terminal.

Here, when the communication terminal 2A transmits the attention area information or the directivity information to another first communication terminal, the amount of data to be transmitted can be reduced compared with the case where the line-of-sight information is transmitted to the other first communication terminal.

[Second Embodiment]
Next, a video conference system according to the second embodiment will be described. Note that description of the same components as those of the embodiment already described is omitted.

Here, when the communication terminal 2 is fixedly installed in a predetermined place such as a conference room, the places where the participants are seated hardly change from one video conference to the next. The directivity information determined in the previous video conference can therefore sometimes be applied as it is to the current video conference.

そこで、本実施形態では、前回行ったビデオ会議における指向性情報を指向性情報記憶部３０に記憶しておき、指向性決定部２８ａは、今回行うビデオ会議において、指向性情報記憶部３０を参照して取得した指向性情報に基づき、指向性を決定する。 Therefore, in the present embodiment, the directivity information in the previous video conference is stored in the directivity information storage unit 30, and the directivity determination unit 28a refers to the directivity information storage unit 30 in the current video conference. Then, the directivity is determined based on the acquired directivity information.

図１４は、本実施形態に係るビデオ会議システム１ａの備える通信端末２Ａａの機能構成の一例を説明するブロック図である。 FIG. 14 is a block diagram illustrating an example of a functional configuration of the communication terminal 2Aa included in the video conference system 1a according to this embodiment.

通信端末２Ａａは、指向性決定部２８ａと、指向性情報記憶部３０と、指向性情報更新部３１とを有している。 The communication terminal 2Aa includes a directivity determining unit 28a, a directivity information storage unit 30, and a directivity information updating unit 31.

指向性情報記憶部３０は、指向性決定部２８ａが決定した指向性情報を、指向性情報更新部３１を介して入力し、入力した指向性情報を記憶する機能を有する。 The directivity information storage unit 30 has a function of inputting the directivity information determined by the directivity determining unit 28a via the directivity information updating unit 31 and storing the input directivity information.

The directivity determination unit 28a has a function of determining the directivity of the sound collection of the microphone 219 based on the attention area information, and also a function of determining the directivity of the sound collection of the microphone 219 based on the directivity information acquired from the directivity information storage unit 30.

The directivity information updating unit 31 has a function of updating the directivity information stored in the directivity information storage unit 30.

図１５は、本実施形態に係るビデオ会議システムの動作の一例を示すシーケンス図である。 FIG. 15 is a sequence diagram showing an example of the operation of the video conference system according to this embodiment.

先ず、ステップＳ１５１において、通信端末２Ａａは、通信端末２Ｂａに対してビデオ会議の開始を要求する信号を送信する。 First, in step S151, the communication terminal 2Aa transmits a signal requesting the start of a video conference to the communication terminal 2Ba.

続いて、ステップＳ１５２において、通信端末２Ｂａは、ビデオ会議を開始可能である場合は、その旨を示す要求応答信号を通信端末２Ａａに送信する。 Subsequently, in step S152, if the video conference can be started, the communication terminal 2Ba transmits a request response signal indicating that to the communication terminal 2Aa.

続いて、ステップＳ１５３において、通信端末２Ｂａは、通信端末２Ｂａの指向性情報記憶部３０から取得した指向性情報に基づき、マイクロホン２１９の集音の指向性を決定する。 Subsequently, in step S153, the communication terminal 2Ba determines the sound collection directivity of the microphone 219 based on the directivity information acquired from the directivity information storage unit 30 of the communication terminal 2Ba.

続いて、ステップＳ１５４において、通信端末２Ｂａは、指向性情報に基づき、マイクロホン２１９の集音の指向性を制御する。 Subsequently, in step S154, the communication terminal 2Ba controls the directivity of the sound collection of the microphone 219 based on the directivity information.

Subsequently, in step S155, the communication terminal 2Aa executes the attention area detection processing described with reference to FIG. 10.

Subsequently, in step S156, the communication terminal 2Aa includes the subpacket containing the attention area information in the conference data and transmits the conference data to the communication terminal 2Ba.

続いて、ステップＳ１５７において、通信端末２Ｂａは、受信したサブパケットに含まれる注目領域情報に基づいて、図１１で説明した指向性の制御処理を実行する。 Subsequently, in step S157, the communication terminal 2Ba executes the directivity control process described in FIG. 11 based on the attention area information included in the received subpacket.

続いて、ステップＳ１５８において、通信端末２Ｂａは、今回決定した指向性情報により、通信端末２Ｂａの指向性情報記憶部３０に記憶された指向性情報を更新する。 Subsequently, in step S158, the communication terminal 2Ba updates the directivity information stored in the directivity information storage unit 30 of the communication terminal 2Ba with the directivity information determined this time.

続いて、ステップＳ１５９において、ビデオ会議を終了する場合は、通信端末２Ａａは、通信端末２Ｂａに対してビデオ会議の終了を要求する信号を送信する。 Then, in step S159, when ending the video conference, the communication terminal 2Aa transmits a signal requesting the end of the video conference to the communication terminal 2Ba.

続いて、ステップＳ１６０において、通信端末２Ｂａは、ビデオ会議を終了可能である場合は、その旨を示す要求応答信号を通信端末２Ａａに送信する。 Subsequently, in step S160, if the video conference can be ended, the communication terminal 2Ba transmits a request response signal indicating that to the communication terminal 2Aa.

As described above, in the present embodiment, the directivity information from the previous video conference is stored in the directivity information storage unit 30, and in the current video conference the directivity determination unit 28a determines the directivity based on the directivity information acquired by referring to the directivity information storage unit 30.

ビデオ会議の開始直後は、視線検出や視線情報の蓄積等で、指向性を決定するまでに一定の時間がかかるが、本実施形態によれば、ビデオ会議の開始の直後に、前回行ったビデオ会議における指向性情報に基づき指向性を決定するため、ビデオ会議の開始の直後から指向性を適切に設定することができ、ビデオ会議を開始直後から円滑に実行することができる。 Immediately after the start of the video conference, it takes a certain amount of time to determine the directivity due to line-of-sight detection, accumulation of line-of-sight information, etc. According to the present embodiment, immediately after the start of the video conference, the previous video Since the directivity is determined based on the directivity information in the conference, the directivity can be set appropriately immediately after the start of the video conference, and the video conference can be smoothly executed immediately after the start.
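The store-and-reuse behaviour of this embodiment can be sketched as follows, assuming that the directivity information is a pair of angles persisted as a small JSON file. The class name `DirectivityStore`, the file format, and the key names are hypothetical; the specification only states that the information is stored and updated.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

Directivity = Tuple[float, float]  # (minimum angle, maximum angle) in degrees


class DirectivityStore:
    """Hypothetical stand-in for the directivity information storage unit 30."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def load(self) -> Optional[Directivity]:
        """Return the directivity from the previous conference, if any."""
        if not self._path.exists():
            return None
        data = json.loads(self._path.read_text(encoding="utf-8"))
        return (float(data["min_deg"]), float(data["max_deg"]))

    def update(self, directivity: Directivity) -> None:
        """Overwrite the stored directivity (role of the updating unit 31)."""
        payload = {"min_deg": directivity[0], "max_deg": directivity[1]}
        self._path.write_text(json.dumps(payload), encoding="utf-8")
```

At the start of a conference the terminal would call `load()` and apply the result immediately (steps S153 and S154), and would call `update()` once a new directivity has been determined from attention area information (step S158).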

なお、上述したもの以外の効果は、第１の実施形態で説明したものと同様である。 The effects other than those described above are the same as those described in the first embodiment.

[Third Embodiment]
Next, a video conference system according to the third embodiment will be described.

図１６は、本実施形態に係るビデオ会議システム１ｂの備える通信端末２Ａｂの機能構成の一例を説明するブロック図である。 FIG. 16 is a block diagram illustrating an example of the functional configuration of the communication terminal 2Ab included in the video conference system 1b according to this embodiment.

通信端末２Ａｂは、入力画面表示部３２と、設定入力部３３とを有している。 The communication terminal 2Ab has an input screen display unit 32 and a setting input unit 33.

The input screen display unit 32 has a function of displaying an input screen on which a user of the video conference system 1b enters at least one of a time threshold, a count threshold, and a number-of-people threshold for detecting the attention area, together with the conditions under which the directivity determination unit 28 determines the directivity.

FIG. 17 is a diagram illustrating an example of such an input screen. As shown in FIG. 17, the input screen 321 includes, as input items, "microphone directivity control" for specifying whether to execute the directivity control processing, "line-of-sight information acquisition time" for entering the "predetermined time" during which line-of-sight information is acquired (see FIG. 10), and "attention area detection condition" for entering the type and value of the threshold used in detecting the attention area.

このような入力画面３２１はディスプレイ２１７に表示され、ビデオ会議システム１ｂのユーザは、入力画面３２１を通じて注目領域を検出するための閾値や、指向性を決定するための条件を入力することができる。なお、ビデオ会議システム１ｂの管理者がこのような入力を行うようにしてもよい。また、図１７に示した入力項目は一例であって、他の入力項目を追加し、或いは図１７に示した入力項目と置き換えてもよい。 Such an input screen 321 is displayed on the display 217, and the user of the video conference system 1b can input a threshold value for detecting the attention area and a condition for determining the directivity through the input screen 321. The administrator of the video conference system 1b may make such an input. The input items shown in FIG. 17 are merely examples, and other input items may be added or replaced with the input items shown in FIG.

図１６に戻り、説明を続ける。設定入力部３３は、ディスプレイ２１７に表示された入力画面３２１を通じてユーザが入力した設定情報を入力し、視線検出部２２及び注目領域検出部２４に出力する。なお、この出力先は、入力項目に応じて決定することができる。 Returning to FIG. 16, the description will be continued. The setting input unit 33 inputs the setting information input by the user through the input screen 321 displayed on the display 217, and outputs the setting information to the line-of-sight detection unit 22 and the attention area detection unit 24. The output destination can be determined according to the input item.

As described above, in the present embodiment, an input screen is displayed on which the user of the video conference system 1b enters setting information such as the various thresholds and the conditions for determining the directivity, and the communication terminal is configured according to the entered setting information.

The appropriate conditions for determining the directivity may differ depending on the environment in which the video conference is held, such as the loudness of unwanted ambient sound. According to the present embodiment, the directivity can be determined and controlled appropriately for the conference environment.
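The setting information entered on the input screen 321 could be held and checked as sketched below. The field names, the default acquisition time, and the validation rules are assumptions for illustration; the embodiment only specifies that at least one of the three thresholds is entered along with the other items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectivitySettings:
    """Hypothetical container for the items on the input screen 321."""
    directivity_control_enabled: bool = True
    gaze_acquisition_time_s: float = 5.0      # hypothetical default
    time_threshold_s: Optional[float] = None
    count_threshold: Optional[int] = None
    people_threshold: Optional[int] = None

    def validate(self) -> None:
        """Raise ValueError if the entered settings are not usable."""
        if self.gaze_acquisition_time_s <= 0:
            raise ValueError("acquisition time must be positive")
        thresholds = (self.time_threshold_s, self.count_threshold,
                      self.people_threshold)
        if any(t is not None and t <= 0 for t in thresholds):
            raise ValueError("thresholds must be positive")
        # At least one threshold is needed to detect the attention area.
        if self.directivity_control_enabled and all(t is None for t in thresholds):
            raise ValueError("at least one threshold is required")
```

A setting input unit could call `validate()` before passing the values on to the line-of-sight detection unit and the attention area detection unit.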

尚、本発明は、具体的に開示された上記の実施形態に限定されるものではなく、特許請求の範囲から逸脱することなく、種々の変形や変更が可能である。 The present invention is not limited to the above specifically disclosed embodiments, and various modifications and changes can be made without departing from the scope of the claims.

The embodiments also include a method of controlling a microphone of a communication terminal. For example, the method is a method of controlling a microphone of a communication terminal that receives image input from a camera and audio input from the microphone and is used in a video conference, the method including: a step of capturing an image of participants in the video conference with the camera; a step of accumulating line-of-sight information indicating the lines of sight of the participants detected from the image; and a step of determining the directivity of sound collection by the microphone based on the accumulated line-of-sight information. Such a method of controlling the microphone of the communication terminal provides the same effects as the video conference system described above.

Each function of the embodiments described above can be realized by one or more processing circuits. Here, the "processing circuit" in this specification includes a processor programmed to execute each function by software, such as a processor implemented by an electronic circuit, and devices designed to execute each function described above, such as an ASIC (application specific integrated circuit), a DSP (digital signal processor), an FPGA (field programmable gate array), and conventional circuit modules.

1, 1a, 1b Video conference system
2, 2A Communication terminal
2B Communication terminal (an example of another first communication terminal and of another second communication terminal)
3 Network
4 Server
21 Imaging unit
22 Line-of-sight detection unit
23 Line-of-sight information storage unit
24 Attention area detection unit
25 Sub-packet generation unit
26 Transmission unit
27 Reception unit
28 Directivity determination unit
29 Directivity control unit
30 Directivity information storage unit
31 Directivity information updating unit
32 Input screen display unit
33 Setting input unit
201 CPU
202 ROM
203 RAM
204 SSD
205 Network controller
206 Sensor controller
207 Capture device
208 Electronic pen controller
209 External storage controller
210 GPU
211 Display controller
212 Camera controller
213 Microphone controller
214 Speaker controller
215 Electronic pen
216 External memory
217 Display
218 Camera
219 Microphone
220 Speaker

Japanese Unexamined Patent Application Publication No. 2017-034502

Claims

A video conference system comprising a plurality of communication terminals, a camera that outputs images to the communication terminals, and a microphone that outputs audio to the communication terminals, the video conference system performing a video conference, wherein
each communication terminal includes:
a line-of-sight information storage unit that accumulates line-of-sight information indicating the lines of sight of participants in the video conference detected from the images; and
a directivity determination unit that determines the directivity of sound collection of the microphone based on the accumulated line-of-sight information.

The video conference system according to claim 1, further comprising:
an attention area detection unit that detects, based on the accumulated line-of-sight information, an attention area to which the participants pay attention; and
a transmission unit that transmits the detected attention area to another first communication terminal,
wherein the directivity determination unit determines the directivity based on the attention area received from another second communication terminal.

The video conference system according to claim 2, further comprising a sub-packet generation unit that generates a sub-packet including predetermined coordinate information extracted from the line-of-sight information,
wherein the transmission unit transmits the sub-packet.

The video conference system according to claim 1, wherein the directivity determination unit determines the directivity based on at least one of: a region to which the participants' lines of sight are directed for a time equal to or greater than a predetermined time threshold; a region to which the participants' lines of sight are directed a number of times equal to or greater than a predetermined number-of-times threshold; and a region to which the lines of sight of a number of participants equal to or greater than a predetermined number-of-people threshold are directed.

The video conference system according to claim 1, further comprising:
a directivity information storage unit that stores the determined directivity information; and
a directivity information updating unit that updates the stored directivity information,
wherein the directivity determination unit determines the directivity based on the directivity information acquired by referring to the directivity information storage unit.

The video conference system according to claim 4, further comprising an input screen display unit that displays an input screen on which a user of the video conference system inputs at least one of the time threshold, the number-of-times threshold, the number-of-people threshold, and a condition under which the directivity determination unit determines the directivity.

A communication terminal used in a video conference, the communication terminal receiving image input from a camera and audio input from a microphone, and comprising:
a line-of-sight information storage unit that accumulates line-of-sight information indicating the lines of sight of participants in the video conference detected from the image; and
a directivity determination unit that determines the directivity of sound collection of the microphone based on the accumulated line-of-sight information.

A method of controlling a microphone of a communication terminal that is used in a video conference and that receives image input from a camera and audio input from the microphone, the method comprising:
capturing an image of participants in the video conference with the camera;
accumulating line-of-sight information indicating the participants' lines of sight detected from the image; and
determining the directivity of sound collection by the microphone based on the accumulated line-of-sight information.