JP7306765B2

JP7306765B2 - Communication device, communication program and storage medium

Info

Publication number: JP7306765B2
Application number: JP2022528302A
Authority: JP
Inventors: 克治金井
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-01-18
Filing date: 2022-01-18
Publication date: 2023-07-11
Anticipated expiration: 2042-01-18
Also published as: JPWO2022154128A1; WO2022154128A1

Description

The present invention relates to a communication device and the like for displaying participants at other locations on a screen in order to hold video conferences or conversations.

One communication device of this type is a video conference device in which a plurality of terminal devices are connected via a network and a conference is held by displaying, on the display screen of each terminal device, video of the participants captured by imaging devices installed at the terminal devices. As shown, for example, in FIG. 1 of Patent Document 1, a conventional video conference device displays the frontal faces of a plurality of participants arranged in a grid on the display screen of the terminal device.

Patent Document 1: Japanese Patent Application Laid-Open No. 2017-005616
Patent Document 2: Japanese Patent Application Laid-Open No. 2001-136501

However, a device that simply lists and displays frontal faces, as in Patent Document 1, does not let a user move their face to look only at what they want to see, as they could in an actual conference room or venue, so it is difficult to obtain a sense of being there. Using a plurality of imaging devices, as disclosed for example in Patent Document 2, makes it possible to capture and display a participant's profile. However, the techniques of Patent Documents 1 and 2 are aimed solely at estimating the conversation partner and matching lines of sight, so they do not contemplate movements that do not necessarily require eye contact, such as moving one's face to peer at another person's profile, talking to the person next to you while watching a presentation, or looking around at the other participants. It is therefore difficult for them to convey the feeling of being in an actual conference room or venue.

In view of these circumstances, an object of the present invention is to provide a communication device and the like that give a sense of presence, as if one were actually there, by changing how other participants and their surroundings appear in accordance with the movement of one's own face.

To solve the above problems, the device of the present invention is a communication device that displays video of a plurality of participants on a terminal device, and includes: an acquisition unit that acquires, for each of the plurality of participants, a plurality of videos including videos of the face in different orientations; a motion detection unit that detects the facial movement of at least one of the participants from the face videos acquired by the acquisition unit; a video selection unit that selects, in accordance with the participant's facial movement detected by the motion detection unit, a video of another participant's face in a corresponding orientation; and a video generation unit that generates, from the video selected by the video selection unit, a video for displaying at least the other participant's face on the terminal device. With the communication device of this aspect, the way other participants appear changes with the movement of one's own face, giving a sense of presence as if one were actually there.
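The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each participant is assumed to have several camera feeds tagged with a horizontal angle (`yaw_deg`), and the hypothetical helpers `select_feed` and `select_views` pick, for every other participant, the feed whose angle is nearest to the viewer's detected head yaw. How the patent actually maps head movement to a view angle is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceFeed:
    """One camera view of a participant's face (hypothetical structure)."""
    participant: str
    yaw_deg: float  # assumed camera angle relative to the participant's front; negative = their left
    stream_id: str


def select_feed(feeds: list[FaceFeed], target_yaw_deg: float) -> FaceFeed:
    """Return the feed whose camera angle is closest to the requested angle."""
    if not feeds:
        raise ValueError("no feeds available")
    return min(feeds, key=lambda f: abs(f.yaw_deg - target_yaw_deg))


def select_views(all_feeds: dict[str, list[FaceFeed]], viewer: str,
                 viewer_head_yaw_deg: float) -> dict[str, FaceFeed]:
    """For every participant except the viewer, pick one face view.

    Hypothetical mapping: the requested angle equals the viewer's detected
    head yaw, so turning the head changes which view of the others is shown.
    """
    return {
        name: select_feed(feeds, viewer_head_yaw_deg)
        for name, feeds in all_feeds.items()
        if name != viewer
    }
```

With feeds for participant B at -45, 0 and +45 degrees, a viewer head yaw of 30 degrees selects the +45-degree feed. Nearest-angle matching is used here only because it is the simplest rule that still works when few camera angles are available.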

In a preferred aspect of the present invention, the device includes a virtual conference room storage unit that stores a virtual conference room in which the plurality of participants are placed and the positions of the plurality of participants in the virtual conference room, and the video generation unit generates a video in which the position at which each participant is displayed is determined by that participant's position in the virtual conference room. According to this aspect, because each participant's display position follows their position in the virtual conference room, video conferences and conversations can be held with the other participants appearing as if they were at their positions in an actual conference room.
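A minimal sketch of how a stored seat position could determine where a participant is drawn, assuming (hypothetically) that the virtual conference room is a grid of `rows` x `cols` seats mapped to equal-size tiles on the terminal screen. `Rect`, `seat_to_rect`, and `layout` are illustrative names; the layout in the patent's figures (participants framed around a table) is richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen rectangle in pixels."""
    x: int
    y: int
    w: int
    h: int


def seat_to_rect(row: int, col: int, rows: int, cols: int,
                 screen_w: int, screen_h: int) -> Rect:
    """Map seat (row, col) of a rows x cols virtual room to a screen tile.

    Hypothetical uniform layout: every seat gets an equal-size tile.
    """
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("seat is outside the virtual conference room")
    tile_w, tile_h = screen_w // cols, screen_h // rows
    return Rect(col * tile_w, row * tile_h, tile_w, tile_h)


def layout(seats: dict[str, tuple[int, int]], rows: int, cols: int,
           screen_w: int, screen_h: int) -> dict[str, Rect]:
    """Display position of every participant, derived from their stored seat."""
    return {name: seat_to_rect(r, c, rows, cols, screen_w, screen_h)
            for name, (r, c) in seats.items()}
```

For example, in a 2 x 2 room on a 1280 x 720 screen, a participant stored at seat (1, 0) is drawn in the lower-left 640 x 360 tile.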

In a preferred aspect of the present invention, the video selection unit selects the video of the face orientation that corresponds to the participant's position in the virtual conference room. According to this aspect, for example, a participant seated next to oneself can be displayed in profile, so video conferences and conversations can be held with the other participants appearing as they would in an actual conference room.
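To illustrate choosing a face orientation from seat positions, the sketch below assumes 2-D seat coordinates in the virtual room, a viewer facing the +y axis, and camera angles of -90, -45, 0, 45 and 90 degrees. The bearing from the viewer to the other seat is computed with `math.atan2`, and the nearest available camera angle is chosen, so a neighbour seated directly to the side is shown in profile. All names and angle values are assumptions.

```python
import math


def bearing_deg(viewer_xy: tuple[float, float], other_xy: tuple[float, float]) -> float:
    """Horizontal angle from the viewer's seat to another seat, in degrees.

    Assumes the viewer faces the +y axis; positive angles lie to the viewer's right (+x).
    """
    dx = other_xy[0] - viewer_xy[0]
    dy = other_xy[1] - viewer_xy[1]
    return math.degrees(math.atan2(dx, dy))


def view_for_seat(viewer_xy: tuple[float, float], other_xy: tuple[float, float],
                  camera_angles: tuple[float, ...] = (-90.0, -45.0, 0.0, 45.0, 90.0)) -> float:
    """Camera angle of the other participant to display: the one nearest the bearing."""
    b = bearing_deg(viewer_xy, other_xy)
    return min(camera_angles, key=lambda a: abs(a - b))
```

A participant directly across the table (bearing 0) is shown from the front camera; one seated at the viewer's side (bearing 90) is shown from the 90-degree camera, that is, in profile.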

In a preferred aspect of the present invention, the video selection unit selects a display range of the virtual conference room in accordance with the participant's facial movement detected by the motion detection unit, and the video generation unit generates a video for displaying, on the terminal device, those of the plurality of participants who are included in the display range. According to this aspect, a user can display the part of the virtual conference room they want to see by moving their face. For example, to see only the participants on the right side of the table, the user moves their face to the right so that only the participants on their right are visible. Likewise, just as one would walk up to a participant in an actual conference room to talk, moving one's face toward a participant displays only that participant, allowing a conversation with that participant alone. This gives a sense of presence as if one were talking in an actual conference room.

In a preferred aspect of the present invention, the motion detection unit detects, from the face video acquired by the acquisition unit, whether the face of at least one of the participants is facing front, right, or left. The video selection unit then sets the display range of the virtual conference room as follows: when the detected face orientation is front, the display range excludes the videos of the other participants located to the right and left of that participant; when it is right, the display range includes the videos of the other participants located to that participant's right; and when it is left, the display range includes the videos of the other participants located to that participant's left. According to this aspect, a user simply turns right to talk with a participant on their right and turns left to talk with a participant on their left, giving the same sense of presence as talking with a neighbor in a conference room. Similarly, when several participants sit to the user's right, the user can turn right to see them, and when several sit to the left, the user can turn left, giving the sense of presence of turning one's head in an actual conference room.
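The front/right/left rule can be written as a small decision function. In this sketch the yaw sign convention (positive means turned right), the 15-degree threshold, and the per-participant side labels (`"ahead"`, `"left"`, `"right"`) are assumptions; only the inclusion rule itself follows the text above.

```python
from enum import Enum


class Facing(Enum):
    FRONT = "front"
    RIGHT = "right"
    LEFT = "left"


def classify_yaw(yaw_deg: float, threshold_deg: float = 15.0) -> Facing:
    """Classify head yaw (positive = turned right) into front, right or left.

    The threshold is an assumed value, not taken from the source.
    """
    if yaw_deg > threshold_deg:
        return Facing.RIGHT
    if yaw_deg < -threshold_deg:
        return Facing.LEFT
    return Facing.FRONT


def display_range(facing: Facing, others: dict[str, str]) -> list[str]:
    """Participants to show, given where each sits relative to the viewer.

    others maps name -> "ahead", "left" or "right" (hypothetical labels).
    FRONT excludes side neighbours; RIGHT/LEFT include only that side.
    """
    if facing is Facing.FRONT:
        wanted = {"ahead"}
    elif facing is Facing.RIGHT:
        wanted = {"right"}
    else:
        wanted = {"left"}
    return sorted(name for name, side in others.items() if side in wanted)
```

With B ahead, C and E on the right, and D on the left, facing front shows only B, turning right shows C and E, and turning left shows D.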

In a preferred aspect of the present invention, when a participant's position in the virtual conference room changes, the video selection unit selects the video of the face orientation corresponding to the new position, and the video generation unit generates a video in which the participant's display position is determined by the new position. According to this aspect, changing seats in the virtual conference room also produces, in a video conference or conversation, the same change in appearance as in an actual conference room, which further enhances the sense of presence.

In a preferred aspect of the present invention, the device includes an audio selection unit that selects the audio of other participants in accordance with the participant's facial movement detected by the motion detection unit, and an audio generation unit that generates, based on the audio selected by the audio selection unit, audio for outputting the other participants' voices from the terminal device. According to this aspect, audio as well as video can be selected in accordance with the participant's facial movement. For example, only the voices of the participants whose video is displayed as a result of the facial movement can be selected and output.
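One way to realize the audio selection is to mix only the streams of participants who are currently visible. The sketch assumes equal-length frames of floating-point PCM samples in [-1, 1] and simple summation with clipping; a production mixer would more likely apply gain normalization, and `mix_visible_audio` is a hypothetical name.

```python
def mix_visible_audio(frames: dict[str, list[float]], visible: set[str]) -> list[float]:
    """Mix one audio frame from only the participants currently displayed.

    frames: participant -> PCM samples as floats in [-1.0, 1.0] (assumed format).
    visible: participants inside the current display range.
    Samples are summed and clipped to [-1.0, 1.0].
    """
    selected = [frames[name] for name in sorted(visible) if name in frames]
    if not selected:
        # Nobody visible: return silence with the common frame length.
        n = len(next(iter(frames.values()))) if frames else 0
        return [0.0] * n
    n = len(selected[0])
    if any(len(s) != n for s in selected):
        raise ValueError("all frames must have the same length")
    return [max(-1.0, min(1.0, sum(s[i] for s in selected))) for i in range(n)]
```

If only B and C are displayed, D's frame is simply ignored, so turning away from D also silences D.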

In a preferred aspect of the present invention, the acquisition unit acquires, for each of the plurality of participants, videos of the face in different orientations and videos of the surroundings in different orientations; the video selection unit selects, in accordance with the participant's facial movement detected by the motion detection unit, a video of the other participant's face orientation and a video of the surroundings; and the video generation unit generates, from the videos selected by the video selection unit, a video for displaying at least the other participant's face and the surroundings on the terminal device. According to this aspect, because the other participant's face video and the surrounding video are selected from the differently oriented videos in accordance with the participant's facial movement, moving one's face moves the displayed surroundings along with it. Simply by moving one's face, the surrounding video can thus be shifted to reveal parts that were previously hidden, giving a sense of presence as if one were actually there.

In a preferred aspect of the present invention, the motion detection unit detects, as the participant's facial movement, both the movement (change of position) of the face and the orientation of the face; the video selection unit selects the surrounding video in accordance with the movement of the face and selects the display range of the surrounding video in accordance with the orientation of the face; and the video generation unit generates a video for displaying the surrounding video on the terminal device within the display range selected by the video selection unit. According to this aspect, the surrounding video is selected according to the movement of the participant's face and its display range according to the face orientation, so moving one's face also changes the displayed range of the surroundings in step. Simply by moving one's face, the display range can thus be changed to reveal hidden parts, giving a sense of presence as if one were actually there.
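A sketch of this two-part selection, assuming the surrounding cameras sit at known lateral offsets (hypothetical values in metres, negative meaning left) and that each delivers a 360-degree equirectangular panorama whose columns map linearly to yaw. Face movement picks the camera whose offset is nearest to the viewer's head position; face orientation picks a horizontal crop window that wraps around the panorama seam. The labels `24e`, `24f` and `24g` reuse reference numerals from the figure list, loosely following its order (front view 24g, face moved left 24f, face moved right 24e), but the offsets are invented for illustration.

```python
def choose_camera(head_x: float, camera_offsets: dict[str, float]) -> str:
    """Select the surrounding camera whose lateral offset is closest to the
    viewer's lateral head position (same hypothetical unit, negative = left)."""
    if not camera_offsets:
        raise ValueError("no cameras")
    return min(camera_offsets, key=lambda cam: abs(camera_offsets[cam] - head_x))


def yaw_window(pano_width: int, yaw_deg: float, fov_deg: float = 90.0) -> list[int]:
    """Column indices of a horizontal crop from a 360-degree equirectangular
    panorama, centred on yaw_deg (0 = panorama centre, positive = right),
    wrapping around the left/right seam."""
    px_per_deg = pano_width / 360.0
    centre = pano_width / 2 + yaw_deg * px_per_deg
    half = fov_deg * px_per_deg / 2
    start = int(round(centre - half))
    width = int(round(2 * half))
    return [(start + i) % pano_width for i in range(width)]
```

Separating the two inputs mirrors the text: where the head is decides which camera's panorama is used, and where the head points decides which slice of that panorama is shown.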

In a preferred aspect of the present invention, the device includes a virtual conference room storage unit that stores a virtual conference room in which the plurality of participants are placed, together with the positions of the participants and the positions of the surrounding videos in the virtual conference room, and the video selection unit changes a participant's display position to match the direction of the surrounding video selected in accordance with the participant's facial movement. According to this aspect, changing the participant's display position to match the direction of the selected surrounding video lets the surroundings be displayed as if viewed from that display position, enabling an experience with a sense of presence as if one were actually there.

In a preferred aspect of the present invention, the storage medium of the present invention is a computer-readable storage medium storing a communication program that causes a computer to execute video processing performed by a communication device, the video processing including the steps of: acquiring, for each of a plurality of participants, a plurality of videos including videos of the face in different orientations; detecting the facial movement of at least one of the participants from the acquired face videos; selecting, in accordance with the detected facial movement, a video of another participant's face in a corresponding orientation; and generating, from the selected video, a video for displaying at least the other participant's face on the terminal device. By having a computer read and execute the program stored in the storage medium of this aspect, the video processing can be executed and the computer can function as the communication device.

To solve the above problems, the program of the present invention is a communication program that causes a computer to execute video processing performed by a communication device, the video processing including the steps of: acquiring, for each of a plurality of participants, a plurality of videos including videos of the face in different orientations; detecting the facial movement of at least one of the participants from the acquired face videos; selecting, in accordance with the detected facial movement, a video of another participant's face in a corresponding orientation; and generating, from the selected video, a video for displaying at least the other participant's face on the terminal device. By executing the program of this aspect, the video processing can be executed and the computer can function as the communication device.

According to the present invention, changing how other participants and their surroundings appear in accordance with the movement of one's own face makes it possible to hold video conferences and conversations with a sense of presence, as if one were actually there.

A diagram showing the configuration of a video conference system according to a first embodiment of the present invention.
A block diagram of the video conference device (communication device) and the terminal device.
A diagram showing a specific example of the virtual conference room configuration information in FIG. 2.
A diagram showing a specific example of the participant information in FIG. 2.
A diagram showing a specific example of the virtual conference room display information in FIG. 2.
A diagram showing the schematic configuration of the video conference system (two participants).
A schematic front view of the terminal device of FIG. 6.
A top view of the terminal device of FIG. 6, showing an example of video from the imaging devices.
A left side view of the terminal device of FIG. 6, showing an example of video from the imaging devices.
A flowchart showing a specific example of the video conference processing.
A flowchart showing a specific example of the video processing shown in FIG. 10.
A diagram explaining the operation when no facial movement is detected.
A diagram explaining the operation when facial movement is detected.
A diagram showing the schematic configuration of a video conference system (four participants) according to a second embodiment.
A diagram showing a specific example of the virtual conference room display information of the second embodiment.
A diagram showing the relationship between the view from participant A and the imaging devices of the other participants.
A diagram showing the relationship between the view from participant B and the imaging devices of the other participants.
A flowchart showing a specific example of the video processing of the second embodiment.
A diagram explaining the operation when facial movement is detected in the second embodiment.
A diagram showing a specific example of the virtual conference room display information in FIG. 19.
A flowchart showing a specific example of the video processing of a third embodiment.
A diagram showing a specific example of the virtual conference room display information in the third embodiment.
A diagram showing the view from participant A based on the virtual conference room display information of FIG. 22.
A diagram showing the virtual conference room display information when the face turns to the left.
A diagram showing the view from participant A based on the virtual conference room display information of FIG. 24.
A diagram showing the virtual conference room display information when the face turns to the right.
A diagram showing the view from participant A based on the virtual conference room display information of FIG. 26.
A diagram showing a specific example of the virtual conference room display information according to a first modification.
A diagram showing the view from participant A after participant D changes seats.
A diagram showing the view from participant D before participant D changes seats.
A diagram showing the view from participant D after participant D changes seats.
A diagram showing a specific example of the virtual conference room display information according to a second modification.
A diagram showing the view from participant A1 based on the virtual conference room display information of FIG. 32.
A diagram showing the schematic configuration of a communication system according to a fourth embodiment.
A schematic front view of the terminal device of FIG. 34.
A top view of the terminal device of FIG. 34, showing specific examples of the surrounding video from each imaging device.
A block diagram of the communication device and the terminal device of the fourth embodiment.
A diagram showing a specific example of the virtual conference room configuration information of FIG. 36.
A diagram showing a specific example of the participant information of FIG. 36.
A diagram showing a specific example of the virtual conference room display information of FIG. 36.
A diagram showing an example of video from imaging device 24g of participant B: (a) an omnidirectional video and (b) the output video generated from the omnidirectional video.
A diagram showing the display image participant A sees when facing front without moving the face.
A diagram showing an example of video from imaging device 24f of participant B: (a) an omnidirectional video and (b) the output video generated from the omnidirectional video.
A diagram showing the display image participant A sees after moving the face to the left.
A diagram showing an example of video from imaging device 24e of participant B: (a) an omnidirectional video and (b) the output video generated from the omnidirectional video.
A diagram showing the display image participant A sees after moving the face to the right.
A diagram showing the change in display range when participant A turns the face to the left.
A diagram showing the display image participant A sees with the face turned to the left.
A diagram showing the change in display range when participant A turns the face to the right.
A diagram showing the display image participant A sees with the face turned to the right.

<First embodiment>
A first embodiment of the present invention will be described below with reference to the drawings. The first embodiment takes as an example a video conference system 100 (communication system) including a video conference device 10, which exemplifies the communication device of the present invention. FIG. 1 is a diagram showing the configuration of the video conference system 100 according to the first embodiment. The video conference system 100 in FIG. 1 includes the video conference device 10 and terminal devices 20.

The video conference system 100 displays a plurality of participants on the screens of the terminal devices 20 to hold a video conference (web conference, teleconference). The video conference device 10 of this embodiment changes how the other participants appear on the screen in accordance with (in conjunction with) the movement of one's own face. Because the other participants' appearance changes along with one's facial movement (movement of the face, face orientation, and so on), the conference gains a sense of presence, as if one were actually there.

The video conference device 10 of the first embodiment is illustrated as a server computer that serves the terminal devices 20 as clients. The video conference device 10 may be configured as multiple machines performing distributed processing, or as multiple virtual machines on a single server. It may also be configured as a personal computer or a cloud server. The video conference device 10 and the terminal devices 20 can communicate with each other via a network N such as the Internet.

A terminal device 20 is an information processing device used by a user, for example a mobile terminal such as a smartphone, tablet, or PDA (Personal Digital Assistant), or a desktop or notebook personal computer. Although the illustrated example has two terminal devices 20 connected to the network N, three or more terminal devices 20 may be connected. Each terminal device 20 is used by a participant at a separate location, although the terminal devices may include ones used by other participants at the same location.

FIG. 2 is a block diagram showing a specific configuration example of the video conference device 10 (communication device) and the terminal device 20 in FIG. 1. As shown in FIG. 2, the video conference device 10 includes a communication unit 11, a control unit 12, and a storage unit 14. The communication unit 11, the control unit 12, and the storage unit 14 are each connected to a bus line 10L and can exchange information (data) with one another.

The communication unit 11 is connected to the network N by wire or wirelessly and transmits and receives information (data) to and from the terminal devices 20. The communication unit 11 functions as a communication interface for the Internet or an intranet and can communicate using, for example, TCP/IP.

The control unit 12 performs overall control of the video conference device 10. It is implemented by an integrated circuit such as an MPU (Micro Processing Unit) and includes a CPU (Central Processing Unit), RAM (Random Access Memory), and ROM (Read Only Memory). The control unit 12 performs various kinds of processing (such as video conference processing) by loading the necessary programs and executing them with the RAM as a work area.

The storage unit 14 is a computer-readable storage medium that stores the various programs executed by the control unit 12 and the data used by those programs. It is configured by a storage device such as a hard disk, an optical disk, or a magnetic disk, but is not limited to these; it may instead be a semiconductor memory such as RAM or flash memory, for example an SSD (Solid State Drive).

The storage unit 14 includes a program storage unit 15, a data storage unit 16, a virtual conference room storage unit 18, and the like. The program storage unit 15 stores the various programs executed by the control unit 12, and the control unit 12 reads the necessary programs from the program storage unit 15 to execute various kinds of processing.

The data storage unit 16 stores, for example, user information 161, which includes pre-registered user IDs, company names, names, and the like. The user information 161 is not limited to these items.

The virtual conference room storage unit 18 stores preset virtual conference room configurations, position information of the participants in the virtual conference room, and the like. Specifically, it stores virtual conference room configuration information 181, participant information 182, virtual conference room display information 183, and so on. It also stores image data for the conference room elements displayed in the virtual conference room, such as a table 183c and a whiteboard 183d. The whiteboard 183d can also display images and videos for presentations. In this specification, the virtual conference room may also be a virtual conversation room.

The virtual conference room configuration information 181 describes the virtual conference rooms in which the participant images displayed on each terminal device 20 are arranged. It consists, for example, of a data table such as the one in FIG. 3. Its fields include:

- the type of virtual conference room;
- the capacity (number of people accommodated);
- whether a table 183c is present;
- whether a whiteboard 183d is present;
- the number of columns and rows in which participants can be displayed.

In this embodiment, configuration information is stored for several virtual conference rooms. These rooms differ in capacity, in the number of columns in which participants are arranged, and in whether the table 183c is displayed.

FIG. 3 shows four example rooms: a small conference room R1, a medium conference room R2, a large conference room R3, and a face-to-face conference room R4. Rooms R1, R2, and R3 differ in capacity. The face-to-face room R4 differs from the others in how participants appear.

In rooms R1, R2, and R3, for example, the viewing participant's image and the other participants' images are set into frames arranged around the table 183c, as shown in FIG. 6 (described later). The table 183c can also be hidden. In the two-person face-to-face room R4, neither the table 183c nor the viewing participant is displayed; only the other party's image is shown.

The virtual conference rooms are not limited to those illustrated. Beyond the preset rooms, the user may also be allowed to change or set the room type, the capacity, whether the table 183c and whiteboard 183d are present, and the number of columns and rows in which participants can be displayed.
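The configuration table of FIG. 3 can be pictured as a list of records. The Python sketch below is a hypothetical illustration, not the patent's implementation. The class and field names, the `show_self` flag, and the helper `frames_to_render` are assumptions. Only the small conference room R1 values (capacity 2, a table, two rows, two columns) come from the text; the whiteboard flag for R1 is assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RoomConfig:
    """Hypothetical record mirroring one row of the FIG. 3 configuration table."""
    room_type: str
    capacity: int           # number of people accommodated
    has_table: bool         # whether table 183c is drawn
    has_whiteboard: bool    # whether whiteboard 183d is available
    rows: int               # rows in which participants can be displayed
    columns: int            # columns in which participants can be displayed
    show_self: bool = True  # False for a face-to-face room such as R4 (assumed flag)

    def fits(self, n_participants: int) -> bool:
        """True if the room can accommodate n_participants (at least one)."""
        return 0 < n_participants <= self.capacity


def frames_to_render(config: RoomConfig, viewer: str, participants: list[str]) -> list[str]:
    """Participants whose frames appear on the viewer's own screen."""
    if config.show_self:
        return list(participants)
    return [p for p in participants if p != viewer]


# R1 values follow the text; has_whiteboard=True is an assumption.
R1 = RoomConfig("small", capacity=2, has_table=True, has_whiteboard=True,
                rows=2, columns=2)

# A user-customized variant of the same room with the table hidden.
R1_NO_TABLE = replace(R1, has_table=False)
```

With `show_self=False`, `frames_to_render` reproduces the R4 behavior described above, in which each viewer sees only the other party. Frozen records keep presets immutable, so user customizations produce new records through `dataclasses.replace`.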

The participant information 182 stores information about the participants in a conference. It consists, for example, of a data table such as the one in FIG. 4. Its fields include:

- the ID of the conference room the participant joins;
- the user ID;
- the participant's location;
- whether the participant has entered the room;
- whether the participant has left it;
- the participant's position in the virtual conference room (row y01, column t01, etc.);
- the participant's facial movement;
- the number of video streams;
- the type of audio data.

The video and audio data are received from the terminal device 20. For example, there are four video streams (front, right, left, and lower views), and the audio received from the terminal device 20 is stereo. The participant information 182 is not limited to these fields and may also store, for example, IP addresses and screenshot images.
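One row of the FIG. 4 participant table could be held in memory as a simple record. In the minimal sketch below, the class name, field names, and the `record_movement` helper are hypothetical. The default of four views mirrors the front/right/left/lower example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParticipantRecord:
    """Hypothetical record mirroring one row of the FIG. 4 participant table."""
    room_id: str
    user_id: str
    location: str
    entered: bool = False
    exited: bool = False
    row: Optional[str] = None          # e.g. "y01"
    column: Optional[str] = None       # e.g. "t01"
    face_movement: str = "none"        # latest detected movement, e.g. "left"
    video_views: tuple[str, ...] = ("front", "right", "left", "lower")
    audio_type: str = "stereo"
    ip_address: Optional[str] = None   # optional extra field

    @property
    def video_count(self) -> int:
        """Number of video streams received from the terminal."""
        return len(self.video_views)

    def record_movement(self, movement: str) -> None:
        """Overwrite the stored face movement with the latest detection."""
        self.face_movement = movement
```

Deriving `video_count` from the list of views, rather than storing it separately, keeps the two from drifting apart.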

The virtual conference room display information 183 is the display information for the virtual conference room that the participants enter. As shown in FIG. 5, for example, it includes:

- the rows and columns in which participants are placed;
- each participant's position (the position of the frame into which that participant's video is set);
- the positions of conference elements such as the table 183c and the whiteboard 183d;
- the display range 183b of the virtual conference room.

Display information 183 is provided for each room type in the virtual conference room configuration information 181 of FIG. 3. The host participant can choose in advance which virtual conference room to use.

FIG. 5 shows the case in which two participants, A and B, use the small conference room R1 of FIG. 3. The outer frame 183a represents the entire virtual conference room, and the thick frame inside it represents the display range 183b. The small conference room R1 of FIG. 3 accommodates two people and has a table, two rows, and two columns.

In FIG. 5 the table 183c is placed in the center. The row below the table 183c is row y01, and the row above it is row y02. The column to the left of the table 183c is column t01, and the column to its right is column t02. The positions of the table 183c, the whiteboard 183d, the columns, and the rows shown in FIG. 5 may be used as preset, or the host participant may rearrange them.

Each participant's video can be set into frame A in row y01 and frame B in row y02 of FIG. 5. FIG. 6 (described later), for example, shows the small conference room R1 with participant A's video in frame A and participant B's video in frame B.

For display on the terminal device 20, the user can choose in advance whether the virtual conference room is rendered in 2D or in 3D. With 2D display, the display range 183b of FIG. 5 appears as-is on the display screen 252. With 3D display, the table 183c and other elements are rendered three-dimensionally, as in FIG. 6.

In FIG. 5, the display range 183b covers the rows in which the participants are placed and the table 183c. The whiteboard 183d lies outside the display range 183b, so it is not displayed. To show it, the display range 183b can be adjusted to include the whiteboard 183d, either from the start or during the meeting.
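The relationship between the display range 183b and the room elements in FIG. 5 amounts to rectangle containment. The Python sketch below assumes axis-aligned rectangles in arbitrary room coordinates, with y increasing downward. The class, the helper `visible_elements`, and all coordinates are hypothetical. The example shows the whiteboard staying hidden until the display range is enlarged to cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in room coordinates (x right, y down)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, other: "Rect") -> bool:
        """True if other lies entirely inside this rectangle."""
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)

    def union(self, other: "Rect") -> "Rect":
        """Smallest rectangle covering both rectangles."""
        x0, y0 = min(self.x, other.x), min(self.y, other.y)
        x1 = max(self.x + self.w, other.x + other.w)
        y1 = max(self.y + self.h, other.y + other.h)
        return Rect(x0, y0, x1 - x0, y1 - y0)


def visible_elements(display_range: Rect, elements: dict[str, Rect]) -> list[str]:
    """Names of elements fully inside the display range, in insertion order."""
    return [name for name, r in elements.items() if display_range.contains(r)]


# Hypothetical FIG. 5-like layout: row y02 (frame B) above the table,
# row y01 (frame A) below it, whiteboard off to the right.
ELEMENTS = {
    "frame_B": Rect(3, 0.5, 2, 1),
    "table": Rect(2, 2, 4, 2),
    "frame_A": Rect(3, 4.5, 2, 1),
    "whiteboard": Rect(7, 1, 2, 3),
}
DISPLAY_RANGE = Rect(1, 0, 6, 6)
```

Here `visible_elements(DISPLAY_RANGE, ELEMENTS)` excludes the whiteboard. Calling `DISPLAY_RANGE.union(ELEMENTS["whiteboard"])` gives a range that includes it, which corresponds to adjusting the display range during the meeting.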

The control unit 12 in FIG. 2 includes an acquisition unit 121, a motion detection unit 122, an image selection unit 123, an image generation unit 124, an audio selection unit 125, an audio generation unit 126, and an output unit 127. Each of these components may be implemented as a physical circuit or as a program executable by the CPU. The configuration of the control unit 12 is not limited to the one shown in FIG. 2.

The acquisition unit 121 receives, via the communication unit 11, the multiple video streams and the audio data of each participant captured by the imaging devices 24 of each terminal device 20. Specifically, it receives video data containing images of each participant's face from different directions (for example front, right, left, upper, and lower), together with stereo audio data (left and right channels).

It does not matter which imaging device 24 captured which video. Besides the frontal face image, however, the acquisition unit 121 also receives the views of the participant's own face that another participant would see on moving their face, if that other participant were actually present (for example any of the right, left, upper, or lower views). For example, when left-right face movement is used to change how other participants appear, the video of the right side of the face (right-face video) and the video of the left side (left-face video) are input.

The motion detection unit 122 detects movements of the participant's face from the video data acquired by the acquisition unit 121, including whether the face has moved at all. Here, "face movement" covers both translation of the face (a change in its position) and rotation of the face (a change in its orientation).

The motion detection unit 122 can detect translation in, for example, the horizontal (left-right), vertical (up-down), and front-back directions. It may also be configured to detect rotation, such as turning right, left, up, or down. When the motion detection unit 122 of the first embodiment detects a face movement, it also determines the direction of the translation: horizontal (right or left) or vertical (up or down).

The motion detection unit 122 may detect face movement from the frontal face video or from video taken from another direction (a right-facing or left-facing face video). For example, it may:

1. recognize the face region in the frontal video;
2. represent the face orientation (or the nose orientation) as a vector;
3. detect translation of the face from changes in the vector's position;
4. detect rotation of the face from changes in the vector's angle.

In this case, face movement may be detected from the video of the imaging devices 24 using a trained model. This can be a model obtained by machine learning (AI) on face movements in such video, or an existing trained model. Face-movement detection is not limited to these methods; any method that can detect the movement from video of the face may be used.
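The vector-based approach just described can be sketched as follows. This is a hypothetical sketch, not the patent's detector. It assumes that a face tracker (not shown) already supplies, for each frame, a 2D position of the face vector in pixels and a face or nose direction vector. It further assumes the image is mirrored so that +x is the participant's right and +y is downward. The thresholds `move_px` and `turn_deg` are illustrative values.

```python
import math
from typing import Optional, Sequence


def _angle_between_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp floating-point rounding
    return math.degrees(math.acos(cos))


def classify_face_motion(prev_pos, curr_pos, prev_dir, curr_dir,
                         move_px: float = 10.0, turn_deg: float = 10.0) -> dict:
    """Classify translation (from the position change) and rotation
    (from the angle change) of the face vector between two frames."""
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    direction: Optional[str] = None
    if math.hypot(dx, dy) >= move_px:
        # Report the dominant axis, as in the first embodiment.
        if abs(dx) >= abs(dy):
            direction = "right" if dx > 0 else "left"
        else:
            direction = "down" if dy > 0 else "up"
    rotation = _angle_between_deg(prev_dir, curr_dir)
    return {
        "moved": direction is not None,
        "direction": direction,
        "rotated": rotation >= turn_deg,
        "rotation_deg": rotation,
    }
```

The thresholds suppress jitter from small tracking noise. A trained model, as the text allows, could replace the tracker that feeds these vectors without changing this classification step.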

The image selection unit 123 selects which images of the other participants to display on the terminal device 20. It chooses them from the face images in different directions that the acquisition unit 121 acquired for each participant, according to the face movement detected by the motion detection unit 122.

In a real meeting, for example, if you move your face to the left while someone sits facing you, you should see the right side of their face. Accordingly, when the motion detection unit 122 detects that a participant's face has moved to the left (left movement), the image selection unit 123 selects the right-side face image for the other participant displayed opposite. It picks this image from among those acquired by the acquisition unit 121 (front, right, left, upper, lower, etc.).

The image generation unit 124 then sets each participant's selected face image into its predetermined position in the virtual conference room (the position of the face image frame). It composites these with images such as the table to produce the output video. As a result, participants see each other just as they would when moving their faces in a real meeting.
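A minimal sketch of the view selection follows. Only the "left movement selects the right-side view" rule is stated in the text. The other mappings are assumptions by symmetry or analogy, and the table and function names are hypothetical. The sketch also simplifies by applying one rule to every other participant, whereas the text discusses the participant displayed opposite. It assumes a frontal stream is always available as the fallback.

```python
# Viewer's face movement -> which view of the other participant to show.
VIEW_FOR_MOVEMENT = {
    "none": "front",
    "left": "right",   # stated in the text
    "right": "left",   # mirror of the stated case (assumption)
    "up": "upper",     # by analogy (assumption)
    "down": "lower",   # by analogy (assumption)
}


def select_view(movement: str, available_views: set[str]) -> str:
    """Pick the view to display, falling back to the frontal view when the
    preferred view was not captured (e.g. no upper camera) or the movement
    label is unknown."""
    wanted = VIEW_FOR_MOVEMENT.get(movement, "front")
    return wanted if wanted in available_views else "front"


def compose_frames(movement: str, others: dict[str, dict[str, str]]) -> dict[str, str]:
    """For each other participant, return the stream handle of the selected view.

    `others` maps participant ID -> {view name: stream handle}; every participant
    is assumed to provide a "front" stream.
    """
    return {pid: streams[select_view(movement, set(streams))]
            for pid, streams in others.items()}
```

The selected handles would then be placed into the face image frames by the image generation step. That compositing is not modeled here.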

The audio selection unit 125 selects the other participants' audio according to the face movement detected by the motion detection unit 122. In a real meeting, for example, when you turn toward a participant on your left to speak to them, sound from the left should be louder.

Accordingly, when the motion detection unit 122 detects that a participant's face has moved to the left, the audio selection unit 125 selects the left channel of the stereo audio acquired by the acquisition unit 121. This channel is used as the audio of the other participant displayed on that participant's left.

The audio generation unit 126 generates output audio with adjusted left-right balance and overall volume. In the case above, for the other participant displayed on the left, it produces output audio in which the left channel is louder than the right.
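The left/right emphasis could be realized as a per-participant gain pair applied to stereo samples. The sketch below is a hypothetical illustration. The gain values `emphasis` and `base`, the function names, and the representation of audio as float sample pairs in [-1.0, 1.0] are all assumptions. The text only says that the channel on the side the face moved toward is made louder.

```python
def pan_gains(movement: str, speaker_side: str,
              emphasis: float = 1.5, base: float = 1.0) -> tuple[float, float]:
    """Left/right gains for one other participant's stereo audio.

    When the viewer's face moves toward the side on which that participant
    is displayed, that side's channel is made louder than the other.
    """
    if movement == speaker_side == "left":
        return (emphasis, base)
    if movement == speaker_side == "right":
        return (base, emphasis)
    return (base, base)


def apply_gains(frames: list[tuple[float, float]],
                gains: tuple[float, float]) -> list[tuple[float, float]]:
    """Scale (left, right) sample pairs and clip the result to [-1.0, 1.0]."""
    gain_left, gain_right = gains

    def clip(sample: float) -> float:
        return max(-1.0, min(1.0, sample))

    return [(clip(left * gain_left), clip(right * gain_right))
            for left, right in frames]
```

Clipping is the simplest way to keep boosted samples in range. A production mixer would more likely normalize or use a limiter so that the boost does not distort.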

The output unit 127 outputs two streams to the communication unit 11 as moving image (video) data: the output video generated by the image generation unit 124 and the output audio generated by the audio generation unit 126. The control unit 12 also records the face movements detected by the motion detection unit 122 in the participant information 182 of FIG. 4 and keeps them up to date. The image selection unit 123 and the audio selection unit 125 may select video and audio based on the face movements recorded in the participant information 182.

The output video and output audio from the output unit 127 are transmitted as moving image data to the terminal device 20 via the communication unit 11. The terminal device 20 displays the received output video on the display screen 252 of the display device 25 and plays the output audio through the speaker 27.

The output video and audio may also be web video data. In that case, the video conference device 10 publishes them as video on the web, and the terminal device 20 receives the video from the web and displays it in a browser.

Next, a configuration example of the terminal device 20 is described with reference to FIG. 2. The terminal device 20 shown in FIG. 2 includes a communication unit 21, a control unit 22, a storage unit 23, an imaging device 24, a display device 25, a microphone 26, a speaker 27, and an input device 28. Each of these is connected to a bus line 20L, so they can exchange information (data) with one another.

The communication unit 21 is connected to the network N by wire or wirelessly and exchanges information (data) with the video conference device 10. It functions as a communication interface for the Internet or an intranet and can communicate using, for example, TCP/IP.

The control unit 22 controls the terminal device 20 as a whole. It is built from an integrated circuit such as an MPU (Micro Processing Unit) and includes a CPU (Central Processing Unit), RAM (Random Access Memory), and ROM (Read Only Memory). The control unit 22 performs various processes by loading the necessary programs from the ROM and executing them using the RAM as a work area.

The storage unit 23 is an example of a storage medium that stores the various programs executed by the control unit 22 and the data used by those programs. It is a storage device such as a hard disk, an optical disk, or a magnetic disk, but is not limited to these. The storage unit 23 may instead be a semiconductor memory such as RAM or flash memory, for example an SSD (Solid State Drive).

The imaging device 24 consists of multiple cameras that capture the participant's face from different directions. There may be two, three, or four or more of them. In this embodiment, four imaging devices 24 are used: two on the left and right of the terminal device 20 and two above and below it. Details are given later.

The imaging devices 24 may be cameras built into the terminal device 20 or external cameras. Any camera that can output video may be used, such as a CCD (Charge Coupled Device) camera, a web camera, or an IoT camera.

The display device 25 is a liquid crystal display, an organic EL display, or the like, and displays various information according to instructions from the control unit 22. The control unit 22 displays the moving image data received from the video conference device 10 via the communication unit 21 on the display screen 252 of the display device 25.

The microphone 26 is a stereo microphone that captures the participant's voice. It may be built into the terminal device 20 or external. The speaker 27 is a stereo speaker that outputs various sounds and music according to instructions from the control unit 22. It too may be built into the terminal device 20 or external.

The input device 28 is a keyboard, a mouse, or the like. It accepts operation input from the user and sends a control signal corresponding to the operation to the control unit 22. The input device 28 may also be a touch panel provided on the display device 25. In the present invention, however, the way other participants appear on the display screen 252 can be changed by face movement alone, without using a mouse or keyboard.

FIG. 6 shows the schematic configuration of the video conference system 100 with two participants. In the video conference system 100 of FIG. 6, participant A's terminal device 20A and participant B's terminal device 20B are connected via the network N to a server constituting the video conference device 10. The terminal devices 20A and 20B have the same configuration, so elements with the same functions are given the same reference signs.

The terminal devices 20A and 20B in FIG. 6 are personal computers in which a main body 20a and a display device 25 are integrated. At the bottom of the front of the display device 25, a microphone 26 is provided in the center and speakers 27 are provided on the left and right. The main body 20a is mounted on the back of the display device 25 and contains the communication unit 21, the control unit 22, the storage unit 23, and so on.

Four imaging devices 24a, 24b, 24c, and 24d are arranged around the edge of the display device 25. Each captures video containing the face from a different direction and is connected to the main body 20a, so that its video can be transmitted from the communication unit 21.

Because the four imaging devices 24a, 24b, 24c, and 24d face in different directions, the background looks different in each video, not just the face. The background therefore changes depending on which video is selected for display, which further heightens the sense of presence.

The imaging devices 24 (24a, 24b, 24c, 24d) need not be mounted on the display device 25. They may be installed anywhere, such as on a wall or a desk, as long as they can capture video data containing the face from different directions. Nor do they need to be electrically connected to the terminal devices 20A and 20B.

Any imaging device 24, such as a CCD camera, a web camera, or an IoT camera, may be used, provided the video data containing the face from different directions is sent to the video conference device 10 via the network N. The number of imaging devices 24 is also not limited to four.

The installation positions of the imaging devices 24a, 24b, 24c, and 24d and example videos are described with reference to FIGS. 7 to 9.

- FIG. 7 is a front view of the terminal device 20B of FIG. 6, showing the positions of the imaging devices.
- FIG. 8 is a top view of the terminal device 20B of FIG. 7, showing example videos from the imaging devices 24.
- FIG. 9 is a left-side view of the terminal device 20B of FIG. 7, showing example videos from the imaging devices 24.

The terminal device 20A has the same configuration, so the terminal device 20B is described as representative.

As shown in FIG. 7, the imaging device 24a is a right imaging device (first imaging device) arranged on the right side of the display device 25. As shown in FIG. 8, it captures the right side of participant A's face. The imaging device 24a is held by a support 242a so that it can pivot left and right about a rotation axis 244a, which runs along the vertical direction of the display device 25. Pivoting it adjusts the imaging angle for the right side of the face.

As shown in FIG. 7, the imaging device 24b is a left imaging device (second imaging device) arranged on the left side of the display device 25. As shown in FIG. 8, it captures the left side of participant A's face. It is held by a support 242b so that it can pivot left and right about a rotation axis 244b, which runs along the vertical direction of the display device 25. Pivoting it adjusts the imaging angle for the left side of the face.

As shown in FIG. 7, the imaging device 24c is a central imaging device (third imaging device) arranged on the top edge of the display device 25. As shown in FIGS. 8 and 9, it captures the front of participant A's face. It is held by a support 242c so that it can tilt up and down about a rotation axis 244c, which runs along the horizontal direction of the display device 25. Tilting it adjusts the imaging angle for the front of the face. In this example the imaging device 24c captures the front of the face, but its angle may be changed so that it captures the upper side of the face instead.

As shown in FIG. 7, the imaging device 24d is a lower imaging device (fourth imaging device) arranged on the bottom edge of the display device 25. As shown in FIG. 9, it captures the lower side of participant A's face. It is held by a support 242d so that it can tilt up and down about a rotation axis 244d, which runs along the horizontal direction of the display device 25. Tilting it adjusts the imaging angle for the lower side of the face.
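The four-camera arrangement of FIG. 7 can be captured as a small configuration table. This shows which face views a terminal can supply, and how re-aiming the top camera changes that set. In the sketch below, the record type and field names are hypothetical; the mounting edges, rotation axes, and captured views follow the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CameraMount:
    """Hypothetical description of one imaging device 24a-24d (FIG. 7)."""
    camera_id: str
    edge: str            # edge of the display device 25 it is mounted on
    rotation_axis: str   # "vertical" = pans left/right, "horizontal" = tilts up/down
    captures: str        # which view of the face it captures


CAMERAS = (
    CameraMount("24a", "right", "vertical", "right"),
    CameraMount("24b", "left", "vertical", "left"),
    CameraMount("24c", "top", "horizontal", "front"),
    CameraMount("24d", "bottom", "horizontal", "lower"),
)


def available_views(cameras: tuple[CameraMount, ...]) -> set[str]:
    """Set of face views the given cameras provide."""
    return {c.captures for c in cameras}


# Tilting 24c to capture the upper side of the face instead of the front.
CAMERAS_UPPER = tuple(replace(c, captures="upper") if c.camera_id == "24c" else c
                      for c in CAMERAS)
```

With four cameras, re-aiming 24c trades the frontal view for an upper view. A system that needs both would require a fifth camera or a fallback view.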

A communication device such as the video conference device 10 lets participants hold a meeting or conversation by showing their video on each terminal device 20. The video comes from the imaging devices 24 of the terminal devices 20 and appears on the display screen 252 of each terminal's display device 25. In that case, the frontal faces of multiple participants could, for example, simply be arranged in a grid on the display screen 252.

However, merely listing frontal faces does not let participants move their faces to look only where they want, as they could in a real conference room or venue. This makes it hard to feel present there.

Side views of faces can be captured and displayed by using multiple imaging devices. However, merely estimating the conversation partner and matching lines of sight cannot accommodate movements such as:

- moving one's face to peer at another person's profile;
- talking to a neighbor while watching a presentation;
- looking around at the other participants.

It is therefore difficult to convey the feeling of actually being in a conference room or venue.

A VR (Virtual Reality) conference could provide a sense of presence, because participants can look wherever they want. However, a VR conference requires a head-mounted display (VR goggles or the like), and wearing one throughout a meeting is cumbersome. VR also requires a large volume of communication data. Unless every participant has a high-spec terminal device 20 and a good network connection, audio may cut out or the screen may freeze, preventing a smooth meeting or conversation.

The video conference device 10 of this embodiment changes how the other participants appear according to the movement of a participant's own face. This gives a sense of being there without using VR.

The video conference processing performed by the video conference device 10 is described below with reference to the drawings. FIG. 10 is a flowchart showing a specific example of this processing. The processing in FIG. 10 is carried out by the control unit 12 (the acquisition unit 121, the motion detection unit 122, the image selection unit 123, the image generation unit 124, the audio selection unit 125, the audio generation unit 126, the output unit 127, and so on). The control unit 12 reads the necessary program from the program storage unit 15 and executes it.

As an example, this video conference processing is used to hold a Web conference among the terminal devices 20 of multiple participants, each connected to the video conference device 10 over a network. The video conference device 10 enables the Web conference once it receives, from each participant's terminal device 20, a request to enter a preset virtual conference room.

The video conference processing is described below using the example of two participants holding a video conference or conversation, as shown in FIG. 6. FIG. 12 illustrates the operation when no face movement is detected, and FIG. 13 illustrates the operation when face movement is detected. FIG. 6 shows the small conference room R1 (see FIG. 3) as the virtual conference room, which displays the table 183c and the participant themselves; FIGS. 12 and 13 instead show the face-to-face conference room R4 (see FIG. 3), which displays only the other party and neither the table 183c nor the participant themselves. In FIGS. 12 and 13, only participant B is displayed on participant A's display screen 252, and only participant A is displayed on participant B's display screen 252. The display of elements such as the table 183c and the whiteboard 183d can also be switched during the conference.

First, in step S110 of FIG. 10, the control unit 12 determines whether an entry request has been received from a terminal device 20; if so, it performs entry processing in step S120. Specifically, it accepts the user's login to the video conference system 100 and the request to enter the virtual conference room. By entering the virtual conference room provided on the server of the video conference device 10, the user can hold a two-way Web conference with the other terminal devices 20 already in the room. The virtual conference room here is one selected in advance from those in FIG. 3 by the participant acting as host.

On accepting entry into the virtual conference room, the control unit 12 registers the user as a participant, stores the participant information shown in FIG. 4, starts acquiring video and audio in step S130, and executes the video processing of step S140. The video processing detects the movement of a participant's face and, according to that movement, selects video and audio from the other participants' imaging devices to generate moving image data. The video processing is described in detail later.

Next, in step S150, the control unit 12 determines whether an exit request has been received from a terminal device 20. Until an exit request is received, steps S130 and S140 are repeated. When the control unit 12 determines that an exit request has been received, in step S160 it stops acquiring video and audio from the requesting terminal device 20 and logs that participant out. Steps S130 and S140 continue for the other terminal devices 20. When every terminal device 20 has requested exit and acquisition of video and audio has ended, the video conference processing ends.
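The loop of steps S110 to S160 can be sketched as a simple event loop. This is a minimal illustration, assuming that entry and exit requests arrive as (event, participant) pairs in each processing cycle; the function name `run_conference` and the event names `"enter"` and `"leave"` are hypothetical, and the actual acquisition (S130) and video processing (S140) are reduced to log entries.

```python
def run_conference(cycles: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Simulate the FIG. 10 loop over a sequence of processing cycles.

    Each cycle is a list of (event, participant) pairs, where event is the
    hypothetical "enter" or "leave". Returns a log of (step, participant).
    """
    present: set[str] = set()
    log: list[tuple[str, str]] = []
    for events in cycles:
        for event, who in events:
            if event == "enter" and who not in present:
                # S110/S120: accept login and entry into the virtual room
                present.add(who)
                log.append(("S120", who))
            elif event == "leave" and who in present:
                # S150/S160: stop acquiring this participant's video/audio, log out
                present.discard(who)
                log.append(("S160", who))
        if log and not present:
            # Everyone who joined has left: the conference processing ends
            break
        for who in sorted(present):
            # S130/S140: acquire video/audio and run the video processing
            log.append(("S140", who))
    return log
```

Note that a participant who leaves is dropped while the others keep cycling through S130/S140, and that a late "enter" after everyone has left is never processed, because the loop has already ended.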

Next, the video processing of step S140 in FIG. 10 is described in detail with reference to FIG. 11, a flowchart showing a specific example of that processing. This video processing is an example of the communication program of the present invention. First, in step S142 of FIG. 11, the control unit 12 detects the movement of a participant's face from at least one of the videos acquired by the acquisition unit 121.

Specifically, the acquisition unit 121 acquires, for each of participants A and B, four videos (images of the face from different directions) from all imaging devices 24a, 24b, 24c, and 24d of the respective terminal devices 20A and 20B. For each participant, the motion detection unit 122 detects the participant's face movement from at least one of the four acquired videos, for example from the frontal face video captured by imaging device 24c.

The face movement detected by the motion detection unit 122 in this embodiment is not a movement of gaze or a change in face orientation, but a displacement of the face horizontally (rightward or leftward) or vertically (upward or downward). For example, FIG. 13 shows the case where the motion detection unit 122 detects movement of participant A's face (a displacement to the left) and no movement of participant B's face.

When face movement is detected in step S142, the control unit 12 selects, in step S144, one of the other participant's four videos and the corresponding audio according to that movement; in step S146 it generates output video and output audio from the selected video and audio; and in step S148 it outputs them as moving image data.

The motion detection unit 122 may report face movement only when the face has moved at least a predetermined distance, and treat any smaller movement as no movement. The video then does not change when the face moves only slightly. Adjusting this predetermined distance either keeps subtle head movements from changing the video too often or makes the video change only when the participant moves their face deliberately.
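The predetermined-distance rule can be sketched as a dead-zone classifier on the face position. This is a minimal sketch, assuming that some face detector (not specified in the text) supplies a face-centre coordinate per frame, expressed in the participant's own frame (+x toward the participant's right, +y downward); the function name and the use of the Euclidean distance are assumptions.

```python
import math
from typing import Optional


def detect_face_movement(prev: tuple[float, float],
                         curr: tuple[float, float],
                         threshold: float) -> Optional[str]:
    """Classify the face displacement between two frames.

    prev/curr are face-centre coordinates (e.g. in pixels) in the
    participant's own frame: +x = participant's right, +y = downward.
    Returns None when the face moved less than `threshold` (treated as
    "no movement"), otherwise the dominant direction of the displacement.
    """
    dx = curr[0] - prev[0]
    dy = curr[1] - prev[1]
    if math.hypot(dx, dy) < threshold:
        return None  # below the predetermined distance: no movement
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

Raising `threshold` makes the view more stable against small head sway; lowering it makes the view react to smaller movements.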

Specifically, as shown in FIG. 12, while no face movement is detected, in step S144 the video selection unit 123 selects, for both participants A and B, the frontal face video from imaging device 24c, and the audio selection unit 125 selects, for both, the left and right audio from the microphone 26. In step S146, the video generation unit 124 generates output video from the selected video, and the audio generation unit 126 generates output audio from the selected audio. In step S148, the output unit 127 outputs the generated output video and output audio as moving image data.

As a result, participant A's display screen 252 shows the frontal image of participant B's face (from imaging device 24c) with the left and right audio output unchanged, and participant B's display screen 252 shows the frontal image of participant A's face (from imaging device 24c), likewise with the left and right audio output unchanged.

When participant A then moves their face to the left during the conference, as shown in FIG. 13, the leftward movement is detected in step S142, and in step S144 the video selection unit 123 selects, for the other participant B, the video of the right side of B's face (from imaging device 24a), while the audio selection unit 125 selects the right audio from the microphone 26. In step S146, the video generation unit 124 generates output video of the right side of the face from the selected video, and the audio generation unit 126 generates, from the selected audio, the output audio sent from participant B to participant A's terminal device 20A with the right channel stronger than the left. At this time, as shown in FIG. 13, participant A appears to participant B to have moved to the right, so the audio from participant A sent to participant B's terminal device 20B is likewise generated with the right channel stronger than the left. In step S148, the output unit 127 outputs the generated output video and output audio as moving image data.
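The selection described above can be sketched as a lookup table plus a simple stereo emphasis. This is a minimal sketch, not the patented implementation: the camera mapping for "no movement" and "left" follows the text, "right" → 24b follows the next paragraph, and treating vertical movement as frontal is an assumption. Emphasizing a channel by attenuating the opposite one (factor `cut`, hypothetical) is a swapped-in technique chosen to avoid clipping; the text only says one side is made stronger.

```python
from typing import Optional

# Camera used for the *other* participant, keyed by the viewer's detected
# face movement (first embodiment, FIGS. 12 and 13).
CAMERA_FOR_MOVEMENT = {
    None: "24c",     # no movement: frontal view
    "left": "24a",   # viewer moved left: right side of the other's face
    "right": "24b",  # viewer moved right: left side of the other's face
}


def select_camera(movement: Optional[str]) -> str:
    # Vertical movement is not described for this embodiment; falling back
    # to the frontal camera here is an assumption.
    return CAMERA_FOR_MOVEMENT.get(movement, "24c")


def emphasize_channel(left: list[float], right: list[float],
                      side: Optional[str], cut: float = 0.5
                      ) -> tuple[list[float], list[float]]:
    """Make `side` ("left"/"right") louder by attenuating the other channel.

    `cut` is a hypothetical attenuation factor; side=None returns both
    channels unchanged (the no-movement case).
    """
    if side == "right":
        return [s * cut for s in left], list(right)
    if side == "left":
        return list(left), [s * cut for s in right]
    return list(left), list(right)
```

For the leftward case described above, participant A's terminal would receive B's video from `select_camera("left")` (camera 24a) and B's audio passed through `emphasize_channel(b_left, b_right, "right")`.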

As a result, participant A's display screen 252 shows the right side of participant B's face (from imaging device 24a), and audio is output with the right channel stronger than the left. The display screen 252 of participant B's terminal device 20B shows the frontal image of participant A's face (from imaging device 24c), with the left and right audio output unchanged. When a rightward movement of participant A's face is detected, participant A's display screen 252 shows the left side of participant B's face (from imaging device 24b), and audio is output with the right channel stronger than the left. Participant B's display screen 252, by contrast, shows the frontal image of participant A's face (from imaging device 24c) shifted to the right, with the right channel stronger than the left.

According to this embodiment, switching participant B's video, in response to participant A's face movement, to the video from the imaging device on the side toward which participant A moved lets participant A see participant B's profile just as if A had moved in that direction in a real conference room. Because the appearance of the other participants changes with the movement of one's own face, participants get the sense of actually being there. Depending on the face movement and the angle of the imaging device 24, gazes can also be aligned, which makes conversation easier.

<Second embodiment>
A second embodiment of the present invention is described next. In each embodiment below, components having substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted. The first embodiment switched another participant's video according to the "movement of the face" relative to the display screen 252; the second embodiment illustrates switching the other participants' videos according to the "position of the face" in the virtual conference room, and changing the display range 183b of the virtual conference room according to the "movement of the face". The video conference device 10 of the second embodiment has the same configuration as that of the first embodiment, so its detailed description is omitted.

FIG. 14 shows the schematic configuration of the video conference system 100 of the second embodiment with four participants. In the video conference system 100 of FIG. 14, participant A's terminal device 20A, participant B's terminal device 20B, participant C's terminal device 20C, and participant D's terminal device 20D are connected over a network N to the server constituting the video conference device 10. Terminal devices 20A, 20B, 20C, and 20D all have the same configuration as in FIG. 6, so their detailed description is omitted.

FIG. 15 shows a specific example of the virtual conference room display information 183 of the second embodiment, using the medium conference room R2 of FIG. 3 as the virtual conference room. Each participant's video can be inserted into frames A to D in FIG. 15. In the configuration of FIG. 14, participant A's video is inserted into frame A in row y01, participant B's video into frame B in row y02, participant C's video into frame C in column t01, and participant D's video into frame D in column t02. FIG. 14 shows the display range 183b of FIG. 15 rendered three-dimensionally. In FIG. 5, the display range 183b covers the rows and columns in which participants A to D are placed, together with the table 183c. As in FIG. 5, the whiteboard 183d in FIG. 15 lies outside the display range 183b and is therefore not displayed.

The position and face orientation with which each of participants A to D appears on each display screen 252 depend on the participants' positions in the virtual conference room of FIG. 15. Specifically, each participant's display screen 252 shows the others with roughly the same face positions and orientations as in a real conference room. FIG. 16 shows how the others appear on participant A's display screen 252 and which imaging device of each other participant is used; FIG. 17 shows the same for participant B's display screen 252.

As for the positions of participants A to D, each viewer is placed at the near side of the table 183c in the virtual conference room of FIG. 15, and the other participants are placed where they would appear from there: at the far side, on the right, or on the left. For example, as shown in FIG. 16, on participant A's display screen 252, participant A (the viewer) is at the near side of the table 183c, participant B at the far side, participant C on the left, and participant D on the right. On participant B's display screen 252, as shown in FIG. 17, participant B (the viewer) is at the near side of the table 183c, participant A at the far side, participant D on the left, and participant C on the right.

As for face orientation, each other participant is displayed using the video whose face orientation matches how that participant would look from the near side of the table 183c in the virtual conference room of FIG. 15, whether they sit at the far side, on the right, or on the left. For example, as shown in FIG. 16, participant A's display screen 252 shows frontal face images from imaging device 24c for participant A (the viewer) and participant B, the right side of participant C's face from imaging device 24a, and the left side of participant D's face from imaging device 24b. On participant B's display screen 252, as shown in FIG. 17, participant B (the viewer) and participant A appear in frontal images from imaging device 24c, whereas participant C is shown by the left side of the face from imaging device 24b and participant D by the right side of the face from imaging device 24a.
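The seating and camera rules of FIGS. 16 and 17 amount to rotating the table so that the viewer sits at the near side, then picking a camera by relative position. This is a minimal sketch, assuming a hypothetical encoding of FIG. 15 in which the four table sides are named by compass direction (A south, B north, C west, D east); the names `SIDES`, `SEATS`, and `layout_for` are illustrative only.

```python
# Hypothetical encoding of the FIG. 15 seating: table sides seen from above,
# listed counter-clockwise (treating the top of FIG. 15 as north is an assumption).
SIDES = ["south", "east", "north", "west"]
SEATS = {"A": "south", "B": "north", "C": "west", "D": "east"}

# Camera giving the face orientation a viewer would see from the near side.
CAMERA_FOR_POSITION = {
    "self": "24c",      # the viewer's own image is frontal
    "opposite": "24c",  # far side of the table: frontal view
    "left": "24a",      # on the viewer's left: right side of the face
    "right": "24b",     # on the viewer's right: left side of the face
}


def relative_position(viewer_side: str, other_side: str) -> str:
    """Where `other_side` appears once the viewer is rotated to the near side."""
    quarter_turns = (SIDES.index(other_side) - SIDES.index(viewer_side)) % 4
    return ["self", "right", "opposite", "left"][quarter_turns]


def layout_for(viewer: str) -> dict[str, tuple[str, str]]:
    """Map each participant to (position on the viewer's screen, camera used)."""
    result = {}
    for name, side in SEATS.items():
        pos = relative_position(SEATS[viewer], side)
        result[name] = (pos, CAMERA_FOR_POSITION[pos])
    return result
```

`layout_for("A")` places C on the left via camera 24a and D on the right via 24b, while `layout_for("B")` mirrors them, matching the left-right reversal between FIGS. 16 and 17.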

Comparing FIGS. 16 and 17 shows that the positions and face orientations of participants C and D are mirrored left to right. In the second embodiment, the imaging-device video is thus switched so that, on the display screen 252 of each of participants A to D, the other participants appear at the positions and with the face orientations they would have if the viewer sat at the near side of the virtual conference room. Each participant's display screen 252 therefore shows the others in the same positions and orientations as in a real conference room, so the Web conference can be held with the feeling of actually being in that room.

FIG. 18 is a flowchart showing a specific example of the video processing of the second embodiment, a modification of the video processing shown in FIG. 11. In the video processing of FIG. 18, the display range 183b can be changed according to the movement (displacement) of a participant's face. FIG. 19 illustrates the operation when face movement is detected in the second embodiment, and FIG. 20 shows a specific example of the virtual conference room display information 183 at that time.

In the video processing of the second embodiment, first, in step S242 of FIG. 18, the control unit 12 detects the face movement of at least one participant from the video acquired by the acquisition unit 121. Specifically, the motion detection unit 122 detects a participant's face movement from at least one of the four videos acquired for that participant by the acquisition unit 121 (for example, the frontal face video). Step S242 is the same as step S142 in FIG. 11, so its detailed description is omitted. When face movement is detected in step S242, the control unit 12 selects the video and audio of the other participants according to the face movement in step S244.

In step S244 of the second embodiment, the video selection unit 123 selects the display range 183b according to the face movement. For example, when participant A moves their face to the left as shown in FIG. 19, the video selection unit 123 selects a display range 183b' shifted in the same direction as the face movement (to the left), as shown in FIG. 20. If participant A moves their face to the right, a display range shifted to the right (not shown) is selected; likewise, moving the face up or down selects a display range shifted up or down (not shown). For audio, the audio selection unit 125 selects the audio of participants B and C, who are within the display range 183b', and does not select the audio of participant D, who is outside it.
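Step S244 can be sketched as shifting a rectangle in virtual-room coordinates and then filtering participants by containment. This is a minimal sketch, assuming the display range is an axis-aligned rectangle (+x right, +y up) and that each participant's frame has a single reference point; the coordinates used in the usage note are hypothetical values chosen to reproduce the FIG. 20 outcome, not values from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned display range in virtual-room coordinates (+x right, +y up)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def shifted(self, dx: float, dy: float) -> "Rect":
        return Rect(self.x0 + dx, self.y0 + dy, self.x1 + dx, self.y1 + dy)

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


# Unit shift per detected direction: the range moves the same way as the face.
DIRECTION = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


def select_range(current: Rect, movement: Optional[str], step: float) -> Rect:
    dx, dy = DIRECTION.get(movement, (0, 0))
    return current.shifted(dx * step, dy * step)


def visible_and_audible(rng: Rect, positions: dict[str, tuple[float, float]],
                        viewer: str) -> tuple[list[str], list[str]]:
    """Participants inside the range are drawn; their audio (not the viewer's own) is mixed."""
    visible = sorted(p for p, (x, y) in positions.items() if rng.contains(x, y))
    audible = [p for p in visible if p != viewer]
    return visible, audible
```

With hypothetical positions A (5, 1), B (5, 9), C (1, 5), D (9, 5) and an initial range `Rect(0, 0, 10, 10)`, a leftward movement with `step=2` yields `Rect(-2, 0, 8, 10)`: D drops out of both the picture and the audio mix, while B and C remain audible to A.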

In step S246, the control unit 12 generates output video and output audio from the video and audio within the selected display range 183b, and in step S248 it outputs them as moving image data. As shown in FIG. 19, participant A's display screen 252 then reflects the display range 183b' of FIG. 20 according to participant A's face movement.

In FIG. 20, the display range 183b before the face movement was detected is shown with a dotted line. Compared with display range 183b, display range 183b' contains participants A to C shifted toward its right side. Moreover, participant D lies inside display range 183b but outside display range 183b'. Participant A can therefore hide a particular participant, here D, from the display screen 252 simply by moving their face. In addition, because the audio selection unit 125 does not select the audio of participant D, who is outside display range 183b', participant D can no longer be heard on participant A's terminal device 20.

In the second embodiment, the display range 183b can thus be changed according to face movement relative to the display screen 252, so a participant can talk only with the participants they wish to talk to by moving their face. FIGS. 19 and 20 illustrate shifting the display range 183b to the left by moving the face to the left relative to the display screen 252, but the invention is not limited to this: moving the face horizontally or vertically relative to the display screen 252 shifts the display range 183b horizontally or vertically as well. Specifically, moving the face to the right shifts the display range 183b to the right, moving it up shifts the range up, and moving it down shifts the range down.

The shift amount of the display range 183b can also be varied according to the distance the face moves; for example, the larger the face movement, the larger the shift. If the display range 183b is regarded as the participant's field of view, the view then follows the participant's face movement, giving the sense of being in an actual conference room.
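A proportional shift is one simple way to make the shift grow with the distance moved. This is a minimal sketch, assuming a linear relation between face displacement and range offset; the `gain` factor and the `max_offset` clamp, which keeps the range from leaving the virtual room, are hypothetical parameters not given in the source.

```python
def range_offset(face_dx: float, face_dy: float, gain: float,
                 max_offset: float) -> tuple[float, float]:
    """Offset of the display range proportional to the face displacement.

    A positive face_dx (face moved right) shifts the range right, as in the
    text. `gain` (range units per unit of face movement) and the symmetric
    clamp `max_offset` are hypothetical tuning parameters.
    """
    def clamp(v: float) -> float:
        return max(-max_offset, min(max_offset, v))

    return clamp(gain * face_dx), clamp(gain * face_dy)
```

A nonlinear curve (for example, one that stays flat near zero) would be an equally valid design choice; the linear form is only the simplest reading of "the larger the movement, the larger the shift".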

According to the second embodiment, video is generated in which each participant's display position is determined by that participant's position in the virtual conference room, so video conferences and conversations can be held with the other participants positioned as if in a real conference room. Because each participant's video is also selected according to that participant's position in the virtual conference room, a participant sitting beside the viewer, for example, can be shown in profile. The other participants therefore appear as they would in an actual conference room.

In the second embodiment, the display range 183b of the virtual conference room can also be selected according to a participant's face movement, and video can be generated that displays, on the terminal device 20, only those participants who fall within the display range 183b. By moving their own face, a participant can thus display the part of the virtual conference room they want to see. For example, to see only the participants on the right side of the table 183c, the participant moves their face to the right so that only those participants are visible. And just as one would walk up to someone in a real conference room to talk, moving one's face toward a participant displays only that participant and allows a conversation with them alone. This gives the sense of conversing in an actual conference room.

<Third embodiment>
A third embodiment of the present invention is described next. In the first and second embodiments, the motion detection unit 122 detects a "face displacement" (horizontal or vertical movement relative to the display screen 252) as the "face movement", but the invention is not limited to this. In the third embodiment, the motion detection unit 122 detects the "face orientation" as the "face movement", and the appearance of the participants in the virtual conference room is changed according to the face orientation. The video conference device 10 of the third embodiment has the same configuration as that of the first embodiment, so its detailed description is omitted. The video conference system 100 of the third embodiment is described using the four-participant example (FIG. 14), as in the second embodiment.

FIG. 21 is a flowchart showing a specific example of the video processing of the third embodiment, another modification of the video processing shown in FIG. 11. First, in step S342 of FIG. 21, the control unit 12 detects the orientation of a participant's face from the acquired video. When the face orientation is detected in step S342, the control unit 12 selects the video and audio of the other participants according to that orientation in step S344. In step S346, the control unit 12 generates output video and output audio from the video and audio within the selected display range 183b, and in step S348 it outputs them as moving image data.

In the video processing of FIG. 21, changing the display range 183b according to the participant's face movement (face orientation) changes how participants in the "same row" of the virtual conference room display information 183 appear. This is described concretely with reference to FIGS. 22 to 27. FIG. 22 shows a specific example of the virtual conference room display information 183 in the third embodiment, and FIG. 23 shows how it appears to participant A. FIG. 24 shows the virtual conference room display information 183 when the face turns from the front to the left, and FIG. 25 shows how it appears to participant A. FIG. 26 shows the virtual conference room display information 183 when the face turns from the front to the right, and FIG. 27 shows how it appears to participant A.

図２２によれば、参加者Ｃと参加者Ｄは参加者Ａと同列（横列ｙ０１）であり、参加者Ａの左側に参加者Ｃが配置され、参加者Ａの右側に参加者Ｄが配置されている。第３実施形態では、顔の向きを変えたときだけ同列の参加者を表示することができる。すなわち、図２２に示すように参加者Ａの顔の向きが正面を向いているときには、図２２の表示範囲１８３ｂとなり、同列の参加者Ｃ、Ｄは表示されず音声も出ない。図２５に示すように動き検知部１２２が参加者Ａの左向きへの顔の向きを検知すると、図２４の仮想会議室表示情報１８３により同列の左側の参加者Ｃのみが表示され、参加者Ｃの音声が出力される。他方、図２７に示すように動き検知部１２２が参加者Ａの右向きへの顔の向きを検知すると、図２６の表示範囲１８３ｂにより同列の右側の参加者Ｄのみが表示され、参加者Ｄの音声が出力される。 According to FIG. 22, participants C and D are in the same row as participant A (row y01). Participant C is seated to the left of participant A, and participant D to the right. In the third embodiment, participants in the same row are displayed only when the face orientation changes. That is, when participant A faces the front as shown in FIG. 22, the display range 183b of FIG. 22 applies, and participants C and D in the same row are neither displayed nor heard. As shown in FIG. 25, when the motion detection unit 122 detects that participant A's face has turned to the left, only participant C, on the left in the same row, is displayed according to the virtual conference room display information 183 of FIG. 24, and participant C's voice is output. Conversely, as shown in FIG. 27, when the motion detection unit 122 detects that participant A's face has turned to the right, only participant D, on the right in the same row, is displayed within the display range 183b of FIG. 26, and participant D's voice is output.

このように第３実施形態では、参加者Ａが正面を向いているとき、顔の動きが検知されなければ、参加者Ｃ，Ｄは表示されず、参加者Ｂのみとの会話が可能である。これに対して、参加者Ａが左を向くと参加者Ｃと会話できるようになり、参加者Ａが右を向くと参加者Ｄと会話できるようになる。これにより、実施の会議室において必要なときだけ同列の参加者に話しかける場合をＷｅｂ会議上で実現できる。 Thus, in the third embodiment, while participant A faces the front and no face movement is detected, participants C and D are not displayed, and participant A can converse only with participant B. When participant A turns to the left, participant A can converse with participant C, and when participant A turns to the right, with participant D. This reproduces in a web conference a familiar situation from real conference rooms: speaking to participants in the same row only when necessary.

例えば実際の会議室では、自社の参加者と他社の参加者で交渉会議を行う場合、テーブルを挟んで手前側の列に自社の参加者が並び、テーブルの奥側の列に他社の参加者が並ぶことがある。この場合、他社の参加者との交渉をしている間は、自社の参加者と話をする必要がない。自社の参加者は必要なときだけ話しかけたいことがある。第３実施形態では、このようなシチュエーションも実現できる。すなわち、顔の向きを変えることで自社の参加者を表示させて会話することができるようになる。このように、第３実施形態においてもまるで実際の会議室のような臨場感を体験できる。 For example, when participants from one's own company and from another company hold a negotiation meeting in a real conference room, one's own colleagues may sit in the row on the near side of the table and the other company's participants in the row on the far side. While negotiating with the other company's participants, there is no need to talk to one's colleagues, and one may want to speak to them only when necessary. The third embodiment can reproduce this situation as well: by turning one's face, one can bring a colleague into view and talk to them. Thus, the third embodiment also provides a sense of presence as if one were in an actual conference room.

以上のように第３実施形態によれば、動き検知部１２２は、取得部１２１にて取得された４つの映像のうち少なくとも１つの映像から参加者の顔の向きが正面、右向き、左向きのいずれかであるかを検知できる。映像選定部１２３は、検知された顔の向きが正面の場合は、右側と左側に位置する参加者の映像を含まない仮想会議室の表示範囲１８３ｂを選定し、検知された顔の向きが右向きの場合は、右側に位置する参加者の映像を含む仮想会議室の表示範囲１８３ｂを選定し、検知された顔の向きが左向きの場合は、左側に位置する参加者の映像を含む仮想会議室の表示範囲１８３ｂを選定することができる。 As described above, according to the third embodiment, the motion detection unit 122 can detect whether the participant's face is oriented to the front, to the right, or to the left. It makes this determination from at least one of the four videos acquired by the acquisition unit 121. The video selection unit 123 then selects the display range 183b of the virtual conference room as follows:
- when the detected orientation is the front, a display range 183b that excludes the videos of the participants on the right and left;
- when it is to the right, a display range 183b that includes the video of the participant on the right;
- when it is to the left, a display range 183b that includes the video of the participant on the left.

これにより、右側の参加者と会話する場合は右側を向き、左側の参加者と会話する場合は左側を向けばよいので、まるで会議室で隣の参加者と会話する場合のような臨場感を得ることができる。また自分の右側に複数の参加者がいればそれを見たいときには右を向けばよく、自分の左側に複数の参加者がいればそれを見たいときには左を向けばよいので、まるで実際の会議室で顔の向きを変えた場合のような臨場感を得られる。なお、第３実施形態では、参加者Ａ（自分）の左側と右側に参加者が一人ずつの場合を例示したが、これに限られない。例えば自分の左側と右側に参加者が複数の場合にも適用できる。この場合、右側を向けば自分の右側にいるすべての参加者が見えるようになり、左側を向けば自分の左側にいるすべての参加者が見えるようになる。 Accordingly, a participant simply turns right to talk with the participant on the right and turns left to talk with the participant on the left. This gives a sense of presence as if talking with the person in the next seat in a conference room. Likewise, one turns right to look at several participants seated on one's right, and left to look at several seated on one's left, just as one would turn one's head in a real conference room. The third embodiment illustrates one participant on each side of participant A (oneself), but the present invention is not limited to this. It also applies when several participants sit on each side. In that case, turning right brings all participants on one's right into view, and turning left brings all participants on one's left into view.
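The rule above extends naturally to several participants on each side. The minimal sketch below shows one way to compute which same-row participants enter the display range. It assumes a row is stored as a list ordered from the viewer's left to the viewer's right when facing the table. That ordering and the function names are hypothetical, not taken from the patent.

```python
def split_row(row, me):
    """Return (participants to my left, participants to my right) in one row.

    Assumption: `row` is ordered from the viewer's left to the viewer's
    right when facing the table.
    """
    i = row.index(me)
    return row[:i], row[i + 1:]

def same_row_in_range(row, me, orientation):
    """Same-row participants included in display range 183b for an orientation."""
    left, right = split_row(row, me)
    if orientation == "left":
        return left    # turning left reveals everyone seated on the left
    if orientation == "right":
        return right   # turning right reveals everyone seated on the right
    return []          # facing front: no same-row neighbours shown or heard
```

With a hypothetical row `["C2", "C1", "A", "D1", "D2"]`, turning right yields `["D1", "D2"]` and facing front yields an empty list.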

＜第１実施形態乃至第３実施形態の変形例＞ <Modified Examples of First to Third Embodiments>
（１）第１実施形態乃至第３実施形態では、仮想会議室表示情報１８３において参加者の位置（席）が予め設定される場合を例示したが、これに限られない。会議中に仮想会議室表示情報１８３の参加者の位置（席）を変えられるようにしてもよい。例えば図２８は第１変形例に係る仮想会議室表示情報１８３の具体例を示す図であり、図１５の縦列ｔ０２の参加者Ｄを横列ｙ０２に位置（席）を移動した場合である。図２９は参加者Ｄの席移動後の参加者Ａからの見え方を示す図である。図３０は参加者Ｄの席移動前の参加者Ｄからの見え方を示す図であり、図３１は参加者Ｄの席移動後の参加者Ｄからの見え方を示す図である。
(1) The first to third embodiments illustrate the case where the positions (seats) of the participants are set in advance in the virtual conference room display information 183, but the present invention is not limited to this. The positions (seats) of the participants in the virtual conference room display information 183 may be changed during the conference. For example, FIG. 28 is a diagram showing a specific example of the virtual conference room display information 183 according to the first modification, in which participant D is moved from column t02 in FIG. 15 to a position (seat) in row y02. FIG. 29 is a diagram showing the view from participant A after participant D changes seats. FIG. 30 is a diagram showing the view from participant D before the seat change, and FIG. 31 is a diagram showing the view from participant D after the seat change.

図２９に示すように参加者Ｄの席の移動は参加者Ａから見れば、テーブル１８３ｃの右側から奥側の参加者Ｃの隣に移って見えるので、移動したことが一目で分かる。参加者Ｄから見れば席移動前は図３０に示すようにテーブル１８３ｃの奥側に参加者Ｃが見えたのに対して、席を移動することで図３１に示すようにテーブル１８３ｃの奥側には参加者Ａが見えるようになる。このような第１変形例によれば、仮想会議室表示情報１８３において席を移動することでも、実際の会議室と同じような見え方をビデオ会議でも実現できるので、臨場感を高めることができる。 As shown in FIG. 29, from participant A's viewpoint, participant D appears to move from the right side of the table 183c to the far side next to participant C, so the seat change is obvious at a glance. From participant D's viewpoint, participant C was visible on the far side of the table 183c before the seat change, as shown in FIG. 30. After the change, participant A becomes visible on the far side of the table 183c, as shown in FIG. 31. According to the first modification, even when seats are moved in the virtual conference room display information 183, the video conference reproduces the same views as an actual conference room, which enhances the sense of presence.
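A seat change of this kind amounts to editing the seating data and then re-deriving each participant's view. The minimal sketch below illustrates this with a few assumptions:

- The layout is a dict from line name to an ordered list of occupants.
- Row y01 faces row y02, and column t01 faces column t02.
- The participant names P1 to P4 are neutral placeholders, not the actual seating of FIG. 15.
- The function names are hypothetical.

```python
from copy import deepcopy

# Assumed pairing of facing lines around table 183c.
ACROSS = {"y01": "y02", "y02": "y01", "t01": "t02", "t02": "t01"}

def move_seat(layout, who, to_line, slot=None):
    """Return a new layout with `who` moved to `to_line` (appended by default)."""
    new = deepcopy(layout)  # leave the caller's layout untouched
    for members in new.values():
        if who in members:
            members.remove(who)
            break
    target = new.setdefault(to_line, [])
    target.insert(len(target) if slot is None else slot, who)
    return new

def seen_across_table(layout, who):
    """Participants in the line facing `who` across the table."""
    line = next(name for name, members in layout.items() if who in members)
    return layout.get(ACROSS[line], [])
```

Moving a participant from column t02 to row y02 changes who they see across the table, from the occupant of t01 to the occupant of y01. This mirrors the before and after views described for FIGS. 30 and 31.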

（２）上記第１実施形態では参加者２人の場合、第２実施形態及び第３実施形態では参加者４人の場合を例示したが、これに限られない。多数の参加者を表示して会議を行う場合であってもよい。例えば図３２は第２変形例に係る仮想会議室表示情報１８３の具体例を示す図であり、図３の大会議室Ｒ３の表示範囲（参加者１６人の場合）を例示する。図３３は図３２の仮想会議室表示情報１８３による参加者Ａ１からの見え方を示す図である。図３２では、横列ｙ０１に参加者３人（Ａ２、Ａ１、Ａ３）、横列ｙ０２に参加者４人（Ｂ１、Ｂ２、Ｂ３、Ｂ４）、縦列ｔ０１に参加者４人（Ｃ１、Ｃ２、Ｃ３、Ｃ４）、縦列ｔ０２に参加者４人（Ｄ１、Ｄ２、Ｄ３、Ｄ４）、プレゼンターとしての参加者１人（Ａ４）が配置されている。図３３に示すように参加者Ａ１の表示画面２５２には、図３２の仮想会議室表示情報１８３の配置の通り、実際の会議室のようにテーブル１８３ｃの手前には参加者３人（Ａ２、Ａ１、Ａ３）の正面顔の映像が並び、テーブル１８３ｃの奥側には参加者４人（Ｂ１、Ｂ２、Ｂ３、Ｂ４）の正面顔の映像が並んで見える。またテーブル１８３ｃの左側には参加者４人（Ｃ１、Ｃ２、Ｃ３、Ｃ４）の右顔の映像が並び、テーブル１８３ｃの右側には参加者４人（Ｄ１、Ｄ２、Ｄ３、Ｄ４）の左顔の映像が並んで見える。さらに表示画面２５２の右上にはプレゼンターとしての参加者Ａ４の映像が見えるので、この参加者Ａ４がプレゼンテーションを行っていることが一目でわかる。なお、図３２及び図３３に示すようにホワイトボード１８３ｄはテーブル１８３ｃ上に表示してもよい。この場合、ホワイトボード１８３ｄの向きは、どの参加者からも同じように見えるように調整可能である。 (2) The first embodiment illustrates the case of two participants, and the second and third embodiments the case of four participants, but the present invention is not limited to this. A conference may also be held with a large number of participants displayed. For example, FIG. 32 is a diagram showing a specific example of the virtual conference room display information 183 according to the second modification. It illustrates the display range of the large conference room R3 in FIG. 3 with 16 participants. FIG. 33 is a diagram showing the view from participant A1 based on the virtual conference room display information 183 of FIG. 32. In FIG. 32, participants are arranged as follows: three participants (A2, A1, A3) in row y01, four participants (B1, B2, B3, B4) in row y02, four participants (C1, C2, C3, C4) in column t01, four participants (D1, D2, D3, D4) in column t02, and one participant (A4) as the presenter. As shown in FIG. 33, the display screen 252 of participant A1 follows the arrangement in the virtual conference room display information 183 of FIG. 32, as in an actual conference room:
- the frontal face images of three participants (A2, A1, A3) are lined up on the near side of the table 183c;
- the frontal face images of four participants (B1, B2, B3, B4) are lined up on the far side;
- the right-face images of four participants (C1, C2, C3, C4) are lined up on the left side;
- the left-face images of four participants (D1, D2, D3, D4) are lined up on the right side.
In addition, the image of participant A4, the presenter, appears at the upper right of the display screen 252, so it is clear at a glance that participant A4 is giving a presentation. As shown in FIGS. 32 and 33, a whiteboard 183d may be displayed on the table 183c. In that case, the orientation of the whiteboard 183d can be adjusted so that it looks the same to every participant.
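The arrangement described for FIG. 33 pairs each line of seats with one captured face view. The sketch below encodes that pairing as a lookup table and lists what to render for the viewer A1. It makes the following assumptions:

- The view labels (`"front"`, `"right_face"`, `"left_face"`) and the function name are hypothetical.
- The text does not state which face view is used for the presenter column, so this sketch assumes a frontal view for it.

```python
# Hypothetical mapping from line to the face view shown to a viewer in row y01,
# following the arrangement described for FIG. 33.
VIEW_FOR_LINE = {
    "y01": "front",       # own row, near side of table 183c
    "y02": "front",       # far side of table 183c
    "t01": "right_face",  # left side of the table shows right faces
    "t02": "left_face",   # right side of the table shows left faces
    "p01": "front",       # presenter column (assumed frontal)
}

def compose(layout):
    """List (participant, line, face view) tuples to render, line by line."""
    return [(who, line, VIEW_FOR_LINE[line])
            for line, members in layout.items()
            for who in members]
```

Applied to the 16-participant arrangement of FIG. 32, this yields, for example, `("C1", "t01", "right_face")` and `("D4", "t02", "left_face")`.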

図３２ではプレゼンターの列ｐ０１（発言領域の列の例示）を他の横列ｙ０１、ｙ０２や縦列ｔ０１、ｔ０２とは別に設けている。このような発言領域に参加者を表示する列ｐ０１を設け、この列ｐ０１に参加者が移動して発言することで、誰が発言しているかが分かりやすくなる。図３２では参加者Ａ４がプレゼンターであり、参加者Ａ４は横列ｙ０１からプレゼンターの列ｐ０１に席を移動した場合を例示している。図３２ではプレゼンターの列ｐ０１（発言領域の列）は１人分であるが、２人分以上あってもよい。２人以上でプレゼンテーションを行う場合もあるからである。このように、参加者が多数であっても、プレゼンターの列ｐ０１（発言領域の列）に移動してプレゼンテーションや発言を行うことで、誰がプレゼンテーションを行っているのか、誰が発言しているのかなどが一目で分かり、実際の会議室で参加者の前に出てプレゼンテーションをしたり発言したりする場合に近い見え方にすることができるので、より臨場感を高めることができる。 In FIG. 32, a presenter column p01 (an example of a speaking-area column) is provided separately from the rows y01 and y02 and the columns t01 and t02. This column p01 displays participants in a speaking area, and having a participant move to it to speak makes it easy to tell who is speaking. FIG. 32 illustrates the case where participant A4 is the presenter and has moved from row y01 to the presenter column p01. The presenter column p01 (speaking-area column) in FIG. 32 holds one person, but it may hold two or more, since a presentation may be given by two or more people. In this way, even with many participants, moving to the presenter column p01 (speaking-area column) to present or speak makes it clear at a glance who is presenting or speaking. The view also resembles stepping in front of the other participants to present or speak in an actual conference room, which further enhances the sense of presence.
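A presenter column with a configurable capacity can be modeled as a small object that moves a participant out of their seat and remembers where to put them back. The minimal sketch below follows the text in two respects: it defaults to one slot and allows more. The class name, its methods, and the return-to-seat behavior are assumptions added for illustration only.

```python
class SpeakingArea:
    """Hypothetical model of the presenter column p01 (speaking area)."""

    def __init__(self, capacity=1):
        self.capacity = capacity  # one slot by default; two or more also allowed
        self.speakers = []        # participants currently shown in p01
        self.home = {}            # speaker -> (line, index) to return to later

    def enter(self, layout, who):
        """Move `who` from their seat in `layout` into the speaking area."""
        if len(self.speakers) >= self.capacity:
            raise RuntimeError("speaking area p01 is full")
        for line, members in layout.items():
            if who in members:
                self.home[who] = (line, members.index(who))
                members.remove(who)
                break
        else:
            raise KeyError(who)  # `who` is not seated anywhere
        self.speakers.append(who)

    def leave(self, layout, who):
        """Return `who` from the speaking area to their original seat."""
        self.speakers.remove(who)
        line, index = self.home.pop(who)
        layout[line].insert(index, who)
```

For FIG. 32, `SpeakingArea().enter(layout, "A4")` removes A4 from row y01 and shows A4 in p01. A second presenter would need `SpeakingArea(capacity=2)`.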

＜第４実施形態＞ <Fourth Embodiment>
本発明の第４実施形態について説明する。第１実施形態乃至第３実施形態とその変形例では、撮像装置２４から参加者の異なる向きの顔の映像を取得することで、参加者の顔の動きに応じて他の参加者の「顔の映像」を変えられる場合を例示した。第４実施形態では、撮像装置２４として全方位（３６０度）カメラを用いて異なる向きの全方位映像を取得することで、参加者の顔の動きに応じて他の参加者の「周囲の映像」も変えられる場合を例示する。ここでの「周囲の映像」としては、参加者の背景映像だけでなく、前方映像、右側映像、左側映像などの映像も含まれる。第４実施形態では、そのような周囲の映像の見え方も参加者の顔の動きに応じて変えられることで、より臨場感のある体験が可能となる。
A fourth embodiment of the present invention will be described. In the first to third embodiments and their modifications, images of a participant's face in different orientations are acquired from the imaging device 24, so that the "face images" of other participants can be changed according to the participant's face movement. In the fourth embodiment, an omnidirectional (360-degree) camera serves as the imaging device 24 and acquires omnidirectional video in different orientations. This allows the "surrounding images" of other participants to change according to the participant's face movement as well. The "surrounding images" here include not only the participant's background image but also front, right-side, and left-side images. Because the appearance of these surrounding images also changes with the participant's face movement, the fourth embodiment enables a more realistic experience.

従来は、例えばビデオ会議で商品開発会議などを行う場合、テーブルに置かれた試作品の映像を別の方向から見たいとき、その場にいる参加者にカメラに映るようにその試作品を動かしてもらったり、カメラの方を動かしてもらったりしないと見ることができなかった。ところが、見たい方向に試作品を向けてもらったり、見たい方向にカメラを動かしてもらったりすることは意外と難しい。従来のビデオ会議では、映像を見ている参加者がその場にいる他の参加者に、そこではなくてもっと右とか、もっと左とか伝えても上手く伝わらず、なかなか見たい映像が見られないというもどかしさがあった。 Conventionally, consider a product development meeting held by video conference. A remote participant who wanted to view a prototype on the table from a different direction could do so only by asking a participant on site to move the prototype into the camera's view, or to move the camera. However, it is surprisingly difficult to get the prototype turned, or the camera moved, to exactly the desired direction. In conventional video conferencing, the viewing participant may tell the people on site "not there, further right" or "further left", but such instructions often fail to get across. The result is the frustration of not being able to see the desired view.

この点、第４実施形態によれば、その場にいる参加者に試作品やカメラを動かしてもらわなくても、映像を見ている参加者が見たい方向に顔を動かすだけで、試作品の映像を含む周囲の映像を見たい方向に切り替えることができ、その周囲の映像を動かしたりすることもできる。これによれば、商品開発会議だけでなく、建設現場で建設物の映像を見ながら現場監督と遠隔で会話したい場合、ショールームで商品を見ながら店員と会話したい場合など、様々な場面で利用できる。しかも、映像を見ている参加者が見たい方向に顔を動かすだけで、建設物の映像や商品の映像を含む周囲の映像の見え方を変えられるので、まるでその場にいるような臨場感でコミュニケーションを円滑に行うことができる。 In this respect, the fourth embodiment removes the need to ask the participants on site to move the prototype or the camera. The participant watching the video simply moves their face in the direction they want to look. This switches the surrounding images, including the image of the prototype, to that direction, and can also move those surrounding images. The same approach can be used in many situations besides product development meetings. Examples include talking remotely with a site supervisor while viewing a building at a construction site, or talking with a salesclerk while viewing a product in a showroom. Moreover, since the participant can change the view of the surrounding images, including the building or the product, just by moving their face, communication proceeds smoothly with a sense of being on the spot.

このような第４実施形態のコミュニケーションシステム１００について図３４及び図３５を参照しながら詳細に説明する。図３４は、第４実施形態のコミュニケーションシステム１０１の概略構成を示す図である。図３５は、図３４の端末装置２０Ｂを正面から見た概略図であり、撮像装置の設置位置の具体例を示す図である。図３４のコミュニケーションシステム１０１は、コミュニケーション装置１１を構成するサーバに、参加者Ａの端末装置２０Ａと参加者Ｂの端末装置２０ＢとがネットワークＮで接続されて構成される。端末装置２０Ａ、２０Ｂは図６とほぼ同様の構成であるため、同様の機能を有する要素は同様の符号を付して説明する。 The communication system 101 of the fourth embodiment will be described in detail with reference to FIGS. 34 and 35. FIG. 34 is a diagram showing a schematic configuration of the communication system 101 of the fourth embodiment. FIG. 35 is a schematic front view of the terminal device 20B of FIG. 34, showing a specific example of the installation positions of the imaging devices. In the communication system 101 of FIG. 34, the terminal device 20A of participant A and the terminal device 20B of participant B are connected via a network N to a server constituting the communication device 11. Since the terminal devices 20A and 20B have substantially the same configuration as in FIG. 6, elements having similar functions are denoted by the same reference numerals.

図３４は、撮像装置２４の前方の参加者Ａ、Ｂの顔の映像だけでなく、撮像装置２４の後方にある構造物３０の映像まで端末装置２０Ａ、２０Ｂに表示させる場合の例示である。第１実施形態乃至第３実施形態の撮像装置２４は、端末装置２０を見ている参加者の方に向いている内側カメラなので、その撮像装置２４の後方にあるものまでは撮影できない。そこで、図３４では、撮像装置２４に全方位カメラを適用することで、参加者の顔の映像だけでなく、構造体Ｔの映像も含む周囲の映像も端末装置２０Ａ、２０Ｂで見られるようにしたものである。以下、このような撮像装置２４について具体的に説明する。 FIG. 34 illustrates a case where the terminal devices 20A and 20B display two kinds of images: the faces of participants A and B in front of the imaging device 24, and the structure 30 behind the imaging device 24. The imaging device 24 of the first to third embodiments is an inner camera facing the participant who is looking at the terminal device 20, so it cannot capture anything behind it. In FIG. 34, therefore, an omnidirectional camera is used as the imaging device 24. This lets the terminal devices 20A and 20B display not only the participants' face images but also surrounding images, including the image of the structure T. Such an imaging device 24 is described in detail below.

図３４の端末装置２０Ａ、２０Ｂにはそれぞれ、全方位カメラで構成される撮像装置２４（撮像装置２４ｅ、２４ｆ、２４ｇ）が設けられている。撮像装置２４ｅ、２４ｆ、２４ｇは、後述する図４１などに示すような参加者の異なる向きの顔の映像と異なる向きの周囲の映像を含む全方位映像を撮像する。ここでの「周囲の映像」は参加者の顔の映像を含む水平３６０度の全方位映像であり、例えば参加者の背景の映像、正面の映像、右側の映像、左側の映像なども含まれる。 The terminal devices 20A and 20B in FIG. 34 are respectively provided with imaging devices 24 (imaging devices 24e, 24f, and 24g) configured by omnidirectional cameras. The imaging devices 24e, 24f, and 24g capture omnidirectional images including images of the faces of the participants in different orientations and images of the surroundings in different orientations, as shown in FIG. 41, which will be described later. The "surrounding image" here is a horizontal 360-degree omnidirectional image including the image of the participant's face. For example, the participant's background image, front image, right image, left image, etc. are included. .

撮像装置２４（２４ｅ、２４ｆ、２４ｇ）は必ずしも表示装置２５に設置されていなくてもよく、異なる向きの顔の映像と異なる向きの周囲の映像を含む映像データを取得できれば壁や机などどのような場所に設置されていてもよい。また撮像装置２４は必ずしも端末装置２０Ａ、２０Ｂに電気的に接続されていなくてもよい。撮像装置２４は、異なる向きの顔の映像と異なる向きの周囲の映像を含む映像データがネットワークＮを介してコミュニケーション装置１１に送信される構成であれば、どのような構成であってもよい。 The imaging devices 24 (24e, 24f, 24g) need not be mounted on the display device 25. They may be installed anywhere, such as on a wall or a desk, as long as they can acquire video data containing face images and surrounding images in different orientations. Nor do the imaging devices 24 need to be electrically connected to the terminal devices 20A and 20B. Any configuration may be used as long as that video data is transmitted to the communication device 11 via the network N.

なお、撮像装置２４は、必ずしも全方位（３６０度）の映像を撮像できるカメラでなくてもよい。例えば所定の角度以上の映像を撮像できる広角カメラでもよい。撮像装置２４は２つの広角カメラを内側カメラと外側カメラに組み合わせて全方位映像を撮像できるようにした構成でもよく、スマートフォンなどの内側カメラと外側カメラを撮像装置２４として用いてもよい。第４実施形態では、例えば図３６に示すように外側カメラ２４６と内側カメラ２４７とで全方位映像を撮像できる撮像装置２４を例示している。これによれば、内側カメラ２４７から参加者の顔の映像や背景の映像を取得でき、外側カメラ２４６から参加者の前方の映像を取得できる。また、撮像装置２４は、１つの魚眼レンズや凸面鏡を備えるカメラであってもよい。撮像装置２４の向きも図示したものに限られない。参加者の顔の映像と周囲の映像を撮像できれば、レンズを上向きに設定しても、下向きに設置してもよい。例えば全方位レンズをマウントした撮像装置であれば、レンズが上向きになるように配置してもよい。また、図３４の撮像装置２４の数も３つに限られない。 The imaging device 24 need not be a camera capable of capturing omnidirectional (360-degree) video. For example, it may be a wide-angle camera that covers at least a predetermined angle of view. The imaging device 24 may also pair two wide-angle cameras, one as an inner camera and one as an outer camera, to capture omnidirectional video. Alternatively, the inner and outer cameras of a smartphone or similar device may serve as the imaging device 24. As shown in FIG. 36, the fourth embodiment illustrates an imaging device 24 that captures omnidirectional video with an outer camera 246 and an inner camera 247. The inner camera 247 acquires images of the participant's face and background, and the outer camera 246 acquires an image of the scene in front of the participant. The imaging device 24 may also be a camera with a single fisheye lens or a convex mirror. The orientation of the imaging device 24 is not limited to that illustrated either. The lens may face upward or downward as long as images of the participant's face and the surroundings can be captured. For example, an imaging device fitted with an omnidirectional lens may be arranged with the lens facing upward. The number of imaging devices 24 in FIG. 34 is also not limited to three.

ここで撮像装置２４ｅ、２４ｆ、２４ｇの設置位置と映像例を図３５及び図３６を参照しながら説明する。図３５は、図３４の端末装置を正面から見た概略図であり、撮像装置の設置位置の具体例を示す図である。図３６は、図３４の端末装置２０Ｂを上から見た図であり、各撮像装置２４ｅ、２４ｆ、２４ｇからの周囲映像の具体例を示す図である。端末装置２０Ａも同様の構成であるため、ここでは端末装置２０Ｂを代表して説明する。 The installation positions of the imaging devices 24e, 24f, and 24g and example images are described here with reference to FIGS. 35 and 36. FIG. 35 is a schematic front view of the terminal device of FIG. 34, showing a specific example of the installation positions of the imaging devices. FIG. 36 is a top view of the terminal device 20B of FIG. 34, showing specific examples of the surrounding images from the imaging devices 24e, 24f, and 24g. Since the terminal device 20A has the same configuration, the terminal device 20B is described as representative.

図３５に示すように撮像装置２４ｅ、２４ｆ、２４ｇは表示装置２５の上部に並べて配置される。図３６に示すように撮像装置２４ｇは表示装置２５の上部中央に配置される中央撮像装置（第３撮像装置）である。撮像装置２４ｇは後述の図４１（ａ）のような水平３６０度の全方位映像４１ｇを撮像できる。全方位映像４１ｇには、図８に示すような参加者Ｂの顔の正面の映像と、図３６に示すような構造物３０の正面の映像４０ｇとが含まれる。 As shown in FIG. 35, the imaging devices 24e, 24f, and 24g are arranged side by side on top of the display device 25. As shown in FIG. 36, the imaging device 24g is a central imaging device (third imaging device) arranged at the top center of the display device 25. The imaging device 24g can capture a horizontal 360-degree omnidirectional video 41g as shown in FIG. 41(a), described later. The omnidirectional video 41g includes a frontal image of participant B's face as shown in FIG. 8 and a frontal image 40g of the structure 30 as shown in FIG. 36.

図３６に示すように撮像装置２４ｅは、表示装置２５の上部に撮像装置２４ｇよりも右寄りに離間して配置される右側撮像装置（第１撮像装置）である。撮像装置２４ｅは後述する図４５（ａ）に示すような水平３６０度の全方位映像４１ｅを撮像できる。全方位映像４１ｅには、図８に示すような参加者Ｂの顔の右側の映像と、図３６に示すような構造物３０の正面右寄りからの映像４０ｅとが含まれる。 As shown in FIG. 36, the imaging device 24e is a right imaging device (first imaging device) that is arranged above the display device 25 and spaced to the right of the imaging device 24g. The imaging device 24e can capture a horizontal 360-degree omnidirectional image 41e as shown in FIG. 45(a), which will be described later. The omnidirectional image 41e includes an image of the right side of the face of the participant B as shown in FIG. 8 and an image 40e from the front right side of the structure 30 as shown in FIG.

図３６に示すように撮像装置２４ｆは、表示装置２５の上部に撮像装置２４ｇよりも左寄りに離間して配置される左側撮像装置（第２撮像装置）である。撮像装置２４ｆは後述する図４３（ａ）に示すような水平３６０度の全方位映像４１ｆを撮像できる。全方位映像４１ｆには、図８に示すような参加者Ｂの顔の左側の映像と、図３６に示すような構造物３０の正面左寄りからの映像４０ｆとが含まれる。このように、撮像装置２４ｅ、２４ｆ、２４ｇによれば、向きの異なる周囲の映像（正面、右寄り、左寄りなど）を撮像できる。 As shown in FIG. 36, the imaging device 24f is a left imaging device (second imaging device) arranged above the display device 25 and spaced leftward from the imaging device 24g. The imaging device 24f can capture a horizontal 360-degree omnidirectional video 41f as shown in FIG. 43(a), which will be described later. The omnidirectional image 41f includes an image of the left side of the face of the participant B as shown in FIG. 8 and an image 40f of the front left side of the structure 30 as shown in FIG. In this manner, the imaging devices 24e, 24f, and 24g can capture surrounding images in different directions (front, rightward, leftward, etc.).

図３６に示すように構造物３０は、例えば基板３１上に複数の物体３２、３４、３６、３８を配置してなる。物体３２は直方体、物体３４は四角錐、物体３６は六角柱、物体３８は四角柱である。物体３２と物体３４は基板３１の略正面中央に配置され、物体３４は物体３２の後ろ側に配置されている。物体３２を正面に見て、正面右側（向かって右側）に物体３６が配置され、正面左側（向かって左側）に物体３８が配置されている。 As shown in FIG. 36, the structure 30 consists, for example, of a plurality of objects 32, 34, 36, and 38 arranged on a base plate 31. Object 32 is a rectangular parallelepiped, object 34 a square pyramid, object 36 a hexagonal prism, and object 38 a square prism. Objects 32 and 34 are placed at approximately the front center of the base plate 31, with object 34 behind object 32. Viewing object 32 from the front, object 36 is on the right (as seen by the viewer) and object 38 is on the left.

ところで、図３６の正面の映像４０ｇによれば、物体３４は物体３２に隠れて先端の三角形の部分しか見えないことが分かる。これでは、物体３４の形状も分からない。正面の映像４０ｇだけを見れば、物体３４は三角板状で物体３２の上に配置されているようにも見える。 In the front image 40g of FIG. 36, object 34 is hidden behind object 32 and only its triangular tip is visible, so the shape of object 34 cannot be determined. Judging from the front image 40g alone, object 34 could even be mistaken for a triangular plate placed on top of object 32.

これに対して、正面右寄りにずれた位置からの映像４０ｅでは物体３４の右側まで見えるようになり、正面左寄りにずれた位置からの映像４０ｆでは物体３４の左側まで見えるようになる。これによれば、物体３４は物体３２とは別体で後ろに配置されていること、三角板状ではなく、角錐であることも分かる。このように、本実施形態によれば、正面とは異なる向きの映像を含む全方位映像から、正面右寄りの映像や正面左寄りの映像も取得できるので、正面の映像だけからでは分からなかったことも分かるようになる。 By contrast, the image 40e, taken from a position shifted to the right of the front, shows the right side of object 34. The image 40f, taken from a position shifted to the left, shows its left side. These views reveal that object 34 is a separate object placed behind object 32 and that it is a pyramid rather than a triangular plate. Thus, according to this embodiment, images from right of front and left of front can be obtained from omnidirectional video that includes views in directions other than the front. These images reveal details that cannot be discerned from the front image alone.

図３４の映像４０ｅ、４０ｆ、４０ｇはそれぞれ、撮像装置２４ｅ、２４ｆ、２４ｇの全方位映像から構造物３０を含む表示範囲の映像を切り出して展開した展開映像である。第４実施形態では、全方位映像から切り出す表示範囲を参加者の「顔の動き」に応じて変えることができるところに大きな特徴がある。 Images 40e, 40f, and 40g in FIG. 34 are unfolded images obtained by extracting images of display ranges including the structure 30 from the omnidirectional images of the imaging devices 24e, 24f, and 24g, respectively. A major feature of the fourth embodiment is that the display range cut out from the omnidirectional video can be changed according to the "face movement" of the participant.
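Cutting a display range out of a horizontal 360-degree frame is essentially a column slice that wraps around the 0/360-degree seam. The minimal sketch below shows this and couples the slice's center to the viewer's head yaw. It rests on several assumptions that the text does not specify:

- The frame is equirectangular, with column 0 at 0 degrees and angles increasing to the right.
- The viewing direction simply follows the head yaw with a gain.
- Frames are plain lists of pixel rows.
- All function names are hypothetical.

```python
import math

def view_center(face_yaw_deg, front_deg=0.0, gain=1.0):
    """Hypothetical mapping: the viewing direction follows the viewer's head yaw."""
    return (front_deg + gain * face_yaw_deg) % 360.0

def column_indices(width, center_deg, fov_deg):
    """Columns of a 360-degree equirectangular frame spanning center +/- fov/2.

    Assumes column 0 sits at 0 degrees and angles increase to the right;
    indices wrap around the 0/360-degree seam.
    """
    deg_per_col = 360.0 / width
    n = max(1, round(fov_deg / deg_per_col))
    start = math.floor(((center_deg - fov_deg / 2.0) % 360.0) / deg_per_col)
    return [(start + k) % width for k in range(n)]

def crop(frame, center_deg, fov_deg):
    """Cut the display range out of a frame given as a list of pixel rows."""
    cols = column_indices(len(frame[0]), center_deg, fov_deg)
    return [[row[c] for c in cols] for row in frame]
```

For example, with a 360-column frame, a 90-degree range centered on 0 degrees covers columns 315 to 359 followed by 0 to 44. A production version would more likely use an image library and interpolate, but the wrap-around index arithmetic is the same.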

以下、このような第４実施形態に係るコミュニケーションシステム１０１の構成について図３７乃至図４０を参照しながら説明する。図３７は、第４実施形態のコミュニケーションシステム１０１のブロック図であり、コミュニケーション装置１１と端末装置２０の具体的構成例を示す。図３７の端末装置２０の構成は図２とほぼ同様であるため、その詳細な説明を省略する。図３７のコミュニケーション装置１１の構成は図２のビデオ会議装置１０と同様の構成については同様の符号を付して詳細な説明を省略する。 The configuration of the communication system 101 according to the fourth embodiment is described below with reference to FIGS. 37 to 40. FIG. 37 is a block diagram of the communication system 101 of the fourth embodiment, showing a specific configuration example of the communication device 11 and the terminal device 20. Since the configuration of the terminal device 20 in FIG. 37 is substantially the same as in FIG. 2, its detailed description is omitted. In the communication device 11 of FIG. 37, components similar to those of the video conference device 10 of FIG. 2 are given the same reference numerals, and their detailed description is omitted.

図３７のコミュニケーション装置１１は、映像選定部１２３Ａ、映像生成部１２４Ａ、音選定部１２５Ａ、音生成部１２６Ａを備える。映像選定部１２３Ａは、顔映像選定部１２３ａと周囲映像選定部１２３ｂを備える。顔映像選定部１２３ａは、動き検知部１２２で検知された参加者の「顔の動き」に応じて他の参加者の「顔の映像」を選定する。周囲映像選定部１２３ｂは、動き検知部１２２で検知された参加者の「顔の動き」に応じて「周囲の映像」を選定する。 The communication device 11 of FIG. 37 includes a video selection unit 123A, a video generation unit 124A, a sound selection unit 125A, and a sound generation unit 126A. The image selection unit 123A includes a face image selection unit 123a and a surrounding image selection unit 123b. The face image selection unit 123 a selects “face images” of other participants according to the “face movements” of the participants detected by the motion detection unit 122 . The surrounding image selection unit 123 b selects a “surrounding image” according to the “face movement” of the participant detected by the motion detection unit 122 .

映像生成部１２４Ａは、顔映像生成部１２４ａと周囲映像生成部１２４ｂを備える。顔映像生成部１２４ａは、顔映像選定部１２３ａで選定された顔の映像から端末装置２０に表示する参加者の映像を生成する。周囲映像生成部１２４ｂは、周囲映像選定部１２３ｂで選定された周囲の映像から端末装置２０に表示する周囲の映像を生成する。このように、映像選定部１２３Ａと映像生成部１２４Ａは「顔の映像」の選定と生成だけでなく、「周囲の映像」の選定と生成もできる。 The image generation unit 124A includes a face image generation unit 124a and a surrounding image generation unit 124b. The face image generation unit 124a generates a participant image to be displayed on the terminal device 20 from the face image selected by the face image selection unit 123a. The surrounding image generation unit 124b generates a surrounding image to be displayed on the terminal device 20 from the surrounding image selected by the surrounding image selection unit 123b. Thus, the image selection unit 123A and the image generation unit 124A can select and generate not only the "face image" but also the "surrounding image".

音選定部１２５Ａは、音声選定部１２５ａと周囲音選定部１２５ｂとを備える。音声選定部１２５ａは、動き検知部１２２で検知された参加者の「顔の動き」に応じて他の参加者の音声を選定する。周囲音生成部１２６ｂは、動き検知部１２２で検知された参加者の「顔の動き」に応じて周囲の音を選定する。 The sound selection unit 125A includes a voice selection unit 125a and an ambient sound selection unit 125b. The voice selection unit 125a selects the voices of other participants according to the participant's "face movement" detected by the motion detection unit 122. The ambient sound selection unit 125b selects ambient sounds according to the participant's "face movement" detected by the motion detection unit 122.

音生成部１２６Ａは、音声生成部１２６ａと周囲音生成部１２６ｂを備える。音声生成部１２６ａは、音声選定部１２５ａで選定された音声から端末装置２０で出力する音声を生成する。周囲音生成部１２６ｂは、周囲音選定部１２５ｂで選定された周囲の音から端末装置２０で出力する周囲の音を生成する。このように、音選定部１２５Ａと音生成部１２６Ａは「音声」の選定と生成だけでなく、「周囲の音」の選定と生成もできる。なお、この場合、取得部１２１は、別々のマイク２６から音声と周囲の音を取得してもよく、また１つのマイク２６の入力音から音声と周囲の音を切り離して取得するようにしてもよい。また、周囲音選定部１２５ｂと周囲音生成部１２６ｂは必ずしも設けなくてもよい。その場合、端末装置２０から音声だけを出力してもよく、周囲の音が入ったままの音声を端末装置２０から出力してもよい。 The sound generation unit 126A includes a voice generation unit 126a and an ambient sound generation unit 126b. The voice generation unit 126a generates the voice to be output by the terminal device 20 from the voice selected by the voice selection unit 125a. The ambient sound generation unit 126b generates the ambient sound to be output by the terminal device 20 from the ambient sound selected by the ambient sound selection unit 125b. In this way, the sound selection unit 125A and the sound generation unit 126A can select and generate not only "voice" but also "ambient sound". In this case, the acquisition unit 121 may acquire the voice and the ambient sound from separate microphones 26. Alternatively, it may separate the voice and the ambient sound from the input of a single microphone 26. The ambient sound selection unit 125b and the ambient sound generation unit 126b are optional. Without them, the terminal device 20 may output only the voice, or may output the voice with the ambient sound still mixed in.

図３８のデータテーブルは、参加者の映像と周囲の映像とを仮想会議室を利用して端末装置２０に表示させるための仮想会議室構成情報１８１であり、図３に対応する。図３８の仮想会議室構成情報１８１が図３と異なるのは、テーブルやボードの代わりに顔映像、周囲映像の項目を入れたことである。なお、図３８の項目は図示したものに限られず、テーブルやボードの項目を入れるようにしてもよい。図３８によれば、例えば小会議室Ｒ１では、参加者の顔映像あり、周囲映像ありなので、顔映像と周囲映像を含む表示範囲を設定する。図３９のデータテーブルは、参加者情報１８２であり、図４に対応する。本実施形で受信する映像データは、撮像装置２４ｅ、２４ｆ、２４ｇからの３つの全方位映像である。 The data table in FIG. 38 is the virtual conference room configuration information 181 for displaying images of the participants and surrounding images on the terminal device 20 using a virtual conference room, and corresponds to FIG. 3. The virtual conference room configuration information 181 in FIG. 38 differs from that in FIG. 3 in that it contains face image and surrounding image items instead of table and board items. The items in FIG. 38 are not limited to those shown, and table and board items may also be included. According to FIG. 38, for example, the small conference room R1 has both participants' face images and a surrounding image, so a display range including the face images and the surrounding image is set. The data table in FIG. 39 is the participant information 182 and corresponds to FIG. 4. The image data received in this embodiment are three omnidirectional images from the imaging devices 24e, 24f, and 24g.

図４０の仮想会議室表示情報１８３は、各参加者が入室する仮想会議室の表示情報である。例えば図４０の仮想会議室表示情報１８３には、各参加者を配置する列や各参加者の位置（参加者の映像をはめ込む顔映像枠の位置）の他、周囲の映像をはめ込む周囲映像枠ｒ０１の位置、仮想会議室の表示範囲１８３ｂなどが含まれる。仮想会議室表示情報１８３は、図３８の仮想会議室構成情報１８１の仮想会議室の種類毎に設けられる。どの仮想会議室を利用するかは、ホストとなる参加者が予め設定できるようになっている。 The virtual conference room display information 183 of FIG. 40 is the display information of the virtual conference room that the participants enter. For example, the virtual conference room display information 183 in FIG. 40 includes the row in which each participant is placed and the position of each participant (the position of the face image frame into which the participant's image is inserted), as well as the position of the surrounding image frame r01 into which the surrounding image is inserted, the display range 183b of the virtual conference room, and the like. The virtual conference room display information 183 is provided for each type of virtual conference room in the virtual conference room configuration information 181 of FIG. 38. The host participant can set in advance which virtual conference room to use.

図４０は、図３８の小会議室Ｒ１の仮想会議室を参加者Ａ、Ｂの２人が利用する場合を例示する。図４０に示すように外枠１８３ａが仮想会議室全体を示す。その内側の太枠が表示範囲１８３ｂを示す。図３８の小会議室Ｒ１は収容人数が２人、顔映像あり、周囲映像あり、横列数１つである。図４０では、ほぼ全面に周囲映像枠ｒ０１を配置し、その下辺りに参加者Ａの顔映像枠ｙＡと参加者Ｂの顔映像枠ｙＢを含む横列ｙ０１を配置する。なお、仮想会議室表示情報１８３の構成は図示したものに限られない。 FIG. 40 illustrates a case where two participants, A and B, use the small conference room R1 of FIG. 38 as the virtual conference room. As shown in FIG. 40, the outer frame 183a indicates the entire virtual conference room, and the thick frame inside it indicates the display range 183b. The small conference room R1 in FIG. 38 has a capacity of two people, face images, a surrounding image, and one row. In FIG. 40, the surrounding image frame r01 occupies almost the entire area, and below it is the row y01 containing the face image frame yA of participant A and the face image frame yB of participant B. The configuration of the virtual conference room display information 183 is not limited to that illustrated.
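As a rough illustration of how the display information 183 for room R1 could be held in memory, the sketch below models each frame as a rectangle. The class names, the `contains` helper, and all pixel coordinates are hypothetical; the description names only the elements (outer frame 183a, display range 183b, surrounding image frame r01, and row y01 with face frames yA and yB), not their sizes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


@dataclass
class RoomDisplayInfo:
    outer_frame: Rect      # 183a: the whole virtual conference room
    display_range: Rect    # 183b: the part shown on the terminal device
    surround_frame: Rect   # r01: frame for the surrounding image
    rows: Dict[str, Dict[str, Rect]] = field(default_factory=dict)  # row id -> face frames


# Hypothetical coordinates for room R1: r01 fills most of the display range,
# and row y01 with face frames yA and yB sits just below it.
R1 = RoomDisplayInfo(
    outer_frame=Rect(0, 0, 1920, 1080),
    display_range=Rect(160, 90, 1600, 900),
    surround_frame=Rect(160, 90, 1600, 700),
    rows={"y01": {"yA": Rect(560, 790, 320, 200), "yB": Rect(1040, 790, 320, 200)}},
)
```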

第４実施形態のコミュニケーション装置１１も図１０と同様のビデオ会議処理を行う。図１０のステップＳ１４０のビデオ処理について、図１１を参照しながら説明する。図１１は図１０に示すビデオ処理の具体例を示すフローチャートである。このビデオ処理は、本発明のコミュニケーションプログラムの例示である。第４実施形態のビデオ処理でも、ステップＳ１４２～ステップＳ１４８まで同様の処理が行われる。第４実施形態のビデオ処理が第１実施形態と異なるのは、ステップＳ１４４とステップＳ１４６にて検知した参加者の顔の動きに応じて、他の参加者の顔の映像と音声を選定するだけでなく、周囲の映像と周囲の音も選定して出力映像と出力音声を生成する点である。 The communication device 11 of the fourth embodiment also performs the same videoconference processing as in FIG. 10. The video processing of step S140 in FIG. 10 will be described with reference to FIG. 11, which is a flowchart showing a specific example of that video processing. This video processing is an example of the communication program of the present invention. In the video processing of the fourth embodiment, steps S142 to S148 are likewise performed. The video processing of the fourth embodiment differs from that of the first embodiment in that, in steps S144 and S146, it selects according to the detected face movement of the participant not only the face images and voices of the other participants but also the surrounding images and ambient sounds, and generates the output image and output audio from them.

以下、第４実施形態の図１１のビデオ処理を図４１乃至図４６の具体例を挙げながら詳細に説明する。ここでは、参加者Ａ、Ｂの顔の映像と参加者Ｂの前方にある構造物３０の映像を端末装置２０Ａ、２０Ｂで共有して表示させる場合を例示する。図４１乃至図４６は、参加者Ａの顔の動きに応じて参加者Ｂの映像と構造物３０の映像が変わる様子を説明するための図である。図４１及び図４２は参加者Ａの顔の動きなしを検知した場合であり、図４３及び図４４は参加者Ａが左方向への顔の移動を検知した場合であり、図４５及び図４６は参加者Ａが右方向への顔の移動を検知した場合である。図４１（ａ）、図４３（ａ）、図４５（ａ）は参加者Ｂの端末装置２０Ｂからの全方位映像である。図４１（ｂ）、図４３（ｂ）、図４５（ｂ）は全方位映像から生成した出力映像である。図４２は、参加者Ａが正面から顔を動かさずに見える表示画像を示す図である。図４４は、参加者Ａが顔を左に動かして見える表示画像を示す図である。図４６は、参加者Ａが顔を右に動かして見える表示画像を示す図である。 The video processing of FIG. 11 in the fourth embodiment will now be described in detail with reference to the specific examples of FIGS. 41 to 46. Here, a case is exemplified in which the face images of participants A and B and the image of a structure 30 in front of participant B are shared and displayed on the terminal devices 20A and 20B. FIGS. 41 to 46 are diagrams for explaining how the image of participant B and the image of the structure 30 change according to the movement of participant A's face. FIGS. 41 and 42 show the case where no movement of participant A's face is detected, FIGS. 43 and 44 show the case where a movement of participant A's face to the left is detected, and FIGS. 45 and 46 show the case where a movement of participant A's face to the right is detected. FIGS. 41(a), 43(a), and 45(a) are omnidirectional images from the terminal device 20B of participant B. FIGS. 41(b), 43(b), and 45(b) are output images generated from these omnidirectional images. FIG. 42 shows the display image seen by participant A looking straight ahead without moving his/her face. FIG. 44 shows the display image seen when participant A moves his/her face to the left. FIG. 46 shows the display image seen when participant A moves his/her face to the right.

本実施形態のビデオ処理は、先ず図１１に示すステップＳ１４２にて制御部１２は取得部１２１にて取得した複数の映像のうち少なくとも１つの映像から参加者の顔の動きを検知する。具体的には取得部１２１が撮像装置２４による参加者Ａ、Ｂのそれぞれの端末装置２０Ａ、２０Ｂからすべての撮像装置２４ｅ、２４ｆ、２４ｇからの３つの全方位映像（異なる向きの周囲の映像）を参加者Ａ、Ｂ毎に取得する。 In the video processing of this embodiment, first, in step S142 shown in FIG. 11, the control unit 12 detects the movement of a participant's face from at least one of the plurality of images acquired by the acquisition unit 121. Specifically, for each of participants A and B, the acquisition unit 121 acquires three omnidirectional images (surrounding images in different orientations), one from each of the imaging devices 24e, 24f, and 24g of the respective terminal devices 20A and 20B.

例えば参加者Ｂの撮像装置２４ｇからは図４１（ａ）に示すような構造物３０の正面映像と参加者Ｂの正面顔映像を含む全方位映像を取得する。参加者Ｂの撮像装置２４ｆからは図４３（ａ）に示すような構造物３０の正面左寄り映像と参加者Ｂの左顔映像を含む全方位映像を取得する。参加者Ｂの撮像装置２４ｅからは図４５（ａ）に示すような構造物３０の正面右寄り映像と参加者Ｂの右顔映像を含む全方位映像を取得する。参加者Ａの撮像装置２４ｅ、２４ｆ、２４ｇからも顔の映像と周囲の映像を含む全方位映像を取得する。 For example, from the imaging device 24g of participant B, an omnidirectional image including a front image of the structure 30 and a front face image of participant B, as shown in FIG. 41(a), is acquired. From the imaging device 24f of participant B, an omnidirectional image including a front-left image of the structure 30 and a left-face image of participant B, as shown in FIG. 43(a), is acquired. From the imaging device 24e of participant B, an omnidirectional image including a front-right image of the structure 30 and a right-face image of participant B, as shown in FIG. 45(a), is acquired. Omnidirectional images including face images and surrounding images are likewise acquired from the imaging devices 24e, 24f, and 24g of participant A.

参加者Ａ、Ｂ毎に、取得した３つの全方位映像のうち少なくとも１つから参加者の顔映像を認識し、その顔の映像から動き検知部１２２が参加者Ａ、Ｂの顔の動きを検知する。例えば図４１（ａ）の全方位映像に含まれる参加者Ｂの顔の映像から顔の動きを検知する。この場合、顔の動きの検知には、必ずしも全方位映像を利用しなくてもよく、その参加者の顔の映像を展開した映像を利用してもよい。 For each of participants A and B, the participant's face image is recognized in at least one of the three acquired omnidirectional images, and the motion detection unit 122 detects the face movements of participants A and B from those face images. For example, the movement of participant B's face is detected from the face image contained in the omnidirectional image of FIG. 41(a). The omnidirectional image itself need not be used to detect face movement; an image obtained by developing (unwrapping) the participant's face region may be used instead.
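The description does not say how a movement is distinguished from a change in orientation. One simple way, sketched below under stated assumptions, is to compare the horizontal centre of the face box (translation) and the estimated head yaw (turning) between two observations. The `FaceObservation` fields, the sign conventions (centre measured in the participant's own left-to-right frame; positive yaw meaning a turn to the participant's left), and both thresholds are hypothetical; upstream face and yaw estimation is assumed to exist.

```python
from dataclasses import dataclass


@dataclass
class FaceObservation:
    center_x: float  # face-box centre, normalised to [0, 1], participant's left -> right (assumed)
    yaw_deg: float   # head yaw in degrees; positive = turned to the participant's left (assumed)


def classify_face_motion(prev: FaceObservation, curr: FaceObservation,
                         move_thresh: float = 0.05,
                         turn_thresh: float = 15.0) -> str:
    """Classify the change between two observations as a movement
    ("move_left"/"move_right"), a turn ("turn_left"/"turn_right"), or "none".
    Translation is checked first, so moving while turning counts as a move."""
    dx = curr.center_x - prev.center_x
    dyaw = curr.yaw_deg - prev.yaw_deg
    if dx <= -move_thresh:
        return "move_left"
    if dx >= move_thresh:
        return "move_right"
    if dyaw >= turn_thresh:
        return "turn_left"
    if dyaw <= -turn_thresh:
        return "turn_right"
    return "none"
```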

本実施形態の動き検知部１２２が検知する顔の動きは、右方向や左方向への顔の移動と、左向きや右向きなど顔の向きの変化である。制御部１２は、ステップＳ１４２にて顔の動きが検知されると、ステップＳ１４４にて顔の動きに応じて他の参加者の３つの全方位映像から１つの映像と音声を選定し、ステップＳ１４６にて選定した全方位映像と音声から出力映像と出力音声を生成し、ステップＳ１４８にて出力映像と出力音声を動画データとして出力する。 The face movements detected by the motion detection unit 122 of this embodiment are movements of the face to the right or left and changes in face orientation, such as turning left or right. When a face movement is detected in step S142, the control unit 12 selects, in step S144, one of the other participant's three omnidirectional images and the corresponding audio according to the face movement; in step S146, it generates the output image and output audio from the selected omnidirectional image and audio; and in step S148, it outputs the output image and output audio as moving image data.
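Step S144 can be pictured as a lookup from the detected movement to one of the other participant's three cameras. The mapping below follows the examples in this embodiment (no movement uses the centre camera 24g, a leftward move uses the left camera 24f, a rightward move uses the right camera 24e) and assumes, as in the turning examples later in this section, that turning the head keeps the current camera and only pans the display range. The motion labels and function name are hypothetical.

```python
from typing import Dict, Tuple

# Movement label -> imaging device whose omnidirectional image is used (per this embodiment's examples).
CAMERA_FOR_MOVE: Dict[str, str] = {
    "none": "24g",        # centre (third) imaging device
    "move_left": "24f",   # left (second) imaging device
    "move_right": "24e",  # right (first) imaging device
}


def select_source(motion: str, current: str = "24g") -> str:
    """Sketch of step S144: pick the other participant's camera for this motion.
    Turns ("turn_left"/"turn_right") keep the current camera."""
    return CAMERA_FOR_MOVE.get(motion, current)


def select_image(motion: str, omni_images: Dict[str, object],
                 current: str = "24g") -> Tuple[str, object]:
    """Return the chosen camera id and its omnidirectional image."""
    cam = select_source(motion, current)
    return cam, omni_images[cam]
```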

具体的には動き検知部１２２が図４２に示すように参加者Ａの顔の動きなしを検知した場合、映像選定部１２３Ａが図４１（ａ）に示す参加者Ｂの撮像装置２４ｇからの全方位映像４１ｇを選定する。具体的には顔の映像については、顔映像選定部１２３ａが図４１（ａ）に示す全方位映像４１ｇから参加者Ｂの顔映像の表示範囲４２ｇを選定し、顔映像生成部１２４ａが表示範囲４２ｇを切り出して図４１（ｂ）に示すように参加者Ｂの顔映像を展開する。周囲の映像については、周囲映像選定部１２３ｂが図４１（ａ）に示す全方位映像４１ｇから構造物３０の映像を含む周囲映像の表示範囲４３ｇを選定し、周囲映像生成部１２４ｂがその表示範囲４３ｇを切り出して図４１（ｂ）に示すように構造物３０の映像を含む周囲映像を矩形に展開する。参加者Ａの顔映像も同様に参加者Ａの撮像装置２４ｇからの全方位映像４１ｇから切り出されて矩形に展開される。 Specifically, when the motion detection unit 122 detects no movement of participant A's face as shown in FIG. 42, the image selection unit 123A selects the omnidirectional image 41g from the imaging device 24g of participant B shown in FIG. 41(a). For the face image, the face image selection unit 123a selects the display range 42g of participant B's face image from the omnidirectional image 41g shown in FIG. 41(a), and the face image generation unit 124a cuts out the display range 42g and develops participant B's face image as shown in FIG. 41(b). For the surrounding image, the surrounding image selection unit 123b selects, from the omnidirectional image 41g shown in FIG. 41(a), the display range 43g of the surrounding image including the structure 30, and the surrounding image generation unit 124b cuts out the display range 43g and develops the surrounding image including the structure 30 into a rectangle as shown in FIG. 41(b). Participant A's face image is likewise cut out from the omnidirectional image 41g from participant A's imaging device 24g and developed into a rectangle.
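"Developing" a display range into a rectangle is not specified further. If the omnidirectional image is assumed to be a ring-shaped (annular) image around a centre point, as produced by some fisheye or mirror cameras, one common approach is a polar-to-rectangular unwrap of the selected azimuth sector. The sketch below does this with nearest-neighbour sampling on a plain list-of-rows image; the ring geometry, angle convention, and function name are all assumptions.

```python
import math
from typing import List, Sequence, TypeVar

P = TypeVar("P")


def unwrap_range(img: Sequence[Sequence[P]], cx: float, cy: float,
                 r_in: float, r_out: float,
                 theta0_deg: float, theta1_deg: float,
                 out_w: int, out_h: int) -> List[List[P]]:
    """Unwrap the azimuth sector [theta0, theta1] of a ring-shaped image
    into an out_h x out_w rectangle using nearest-neighbour sampling.

    Angles are measured from +x toward +y (image rows grow downward).
    A range crossing 0 degrees can be given as e.g. 350..370.
    The top output row samples the outer radius, the bottom row the inner one.
    """
    h, w = len(img), len(img[0])
    out: List[List[P]] = []
    for j in range(out_h):
        r = r_out - (r_out - r_in) * j / max(out_h - 1, 1)
        row: List[P] = []
        for i in range(out_w):
            deg = theta0_deg + (theta1_deg - theta0_deg) * i / max(out_w - 1, 1)
            t = math.radians(deg)
            x = min(max(int(round(cx + r * math.cos(t))), 0), w - 1)  # clamp to image
            y = min(max(int(round(cy + r * math.sin(t))), 0), h - 1)
            row.append(img[y][x])
        out.append(row)
    return out
```

A production version would typically use bilinear interpolation and a precomputed lookup map rather than per-pixel trigonometry.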

映像生成部１２４Ａは、図４０の仮想会議室表示情報１８３に基づいて、上述のように展開された参加者Ａの顔映像と参加者Ｂの顔映像と構造物３０の映像を含む周囲映像とから出力映像を生成する。具体的には映像生成部１２４Ａは、構造物３０の映像を含む周囲映像を図４０の周囲映像枠ｒ０１にはめ込み、参加者Ａの顔映像を横列ｙ０１の参加者Ａの顔映像枠にはめ込み、参加者Ｂの顔映像を横列ｙ０１の参加者Ｂの顔映像枠にはめ込む。こうして、映像生成部１２４Ａは、図４１（ａ）の全方位映像４１ｇから図４１（ｂ）の出力映像を生成する。参加者Ａ、Ｂの音声と周囲音はともに音声選定部１２５ａと周囲音選定部１２５ｂでマイク２６から左右の音声と周囲音が選定され、音声生成部１２６ａと周囲音生成部１２６ｂは選定された音声と周囲音から出力音声を生成する。出力部１２７は、図４１（ｂ）の出力映像と出力音声を画像データとして出力する。 Based on the virtual conference room display information 183 of FIG. 40, the image generation unit 124A generates the output image from participant A's face image, participant B's face image, and the surrounding image including the structure 30, all developed as described above. Specifically, the image generation unit 124A inserts the surrounding image including the structure 30 into the surrounding image frame r01 of FIG. 40, inserts participant A's face image into participant A's face image frame in row y01, and inserts participant B's face image into participant B's face image frame in row y01. In this way, the image generation unit 124A generates the output image of FIG. 41(b) from the omnidirectional image 41g of FIG. 41(a). For the voices and ambient sounds of participants A and B, the voice selection unit 125a and the ambient sound selection unit 125b select the left and right voices and the ambient sounds from the microphones 26, and the voice generation unit 126a and the ambient sound generation unit 126b generate the output audio from the selected voices and ambient sounds. The output unit 127 outputs the output image of FIG. 41(b) and the output audio as image data.
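The frame-filling step can be sketched as pasting tiles into a blank canvas: the surrounding image goes into frame r01 first, and the face images are then pasted on top at their frame positions in row y01. This assumes each tile has already been resized to its frame; images are plain lists of rows, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def paste(canvas: Image, tile: Sequence[Sequence[int]], x: int, y: int) -> Image:
    """Copy `tile` into `canvas` with its top-left at (x, y), clipping at the edges."""
    for dy, row in enumerate(tile):
        cy = y + dy
        if not 0 <= cy < len(canvas):
            continue
        for dx, px in enumerate(row):
            cx = x + dx
            if 0 <= cx < len(canvas[cy]):
                canvas[cy][cx] = px
    return canvas


def compose_output(width: int, height: int,
                   surround: Sequence[Sequence[int]], surround_pos: Tuple[int, int],
                   faces: Sequence[Tuple[Sequence[Sequence[int]], Tuple[int, int]]],
                   background: int = 0) -> Image:
    """Build the output frame: surrounding image first (in frame r01),
    then each face image at its face-frame position in the row."""
    canvas = [[background] * width for _ in range(height)]
    paste(canvas, surround, *surround_pos)
    for tile, pos in faces:
        paste(canvas, tile, *pos)
    return canvas
```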

すると、図４２に示すように参加者Ａの表示画面２５２には、参加者Ｂの顔の正面の映像と構造物３０の正面の映像が表示され、左右の音声がそのまま出力される。参加者Ｂの表示画面２５２にも同様の映像が表示される。このように第４実施形態では、参加者の顔の映像だけでなく、その周囲の映像も表示させることができる。これにより、参加者は、周囲の映像を見ながら会話をすることができる。 Then, as shown in FIG. 42, the front image of the participant B's face and the front image of the structure 30 are displayed on the display screen 252 of the participant A, and the left and right sounds are output as they are. A similar image is displayed on the display screen 252 of the participant B as well. As described above, in the fourth embodiment, not only the image of the participant's face but also the surrounding image can be displayed. This allows the participants to have a conversation while watching the surrounding images.

ところが、図４２の構造物３０の正面の映像では、物体３４が物体３２に隠れて見えない。そこで、図４４に示すように参加者Ａが左に顔を動かすと、その顔の動きが動き検知部１２２で検知され、上記と同様の処理で図４３（ａ）の全方位映像４１ｆから参加者Ｂの顔映像と周囲映像の表示範囲４３ｆが選定され矩形に展開されて、図４３（ｂ）の出力画像と出力音声が生成され、画像データとして出力される。そうすると、図４４に示すように参加者Ａの表示画面２５２には、構造物３０の正面左寄りの映像が表示される。これにより、正面の映像では物体３２に隠れて見えなかった物体３４が見えるようになる。すなわち、顔を左に動かせば正面左寄りから構造物３０の映像が見えるので、まるでその場にいるような臨場感のある体験が可能となる。 However, in the front image of the structure 30 in FIG. 42, the object 34 is hidden behind the object 32 and cannot be seen. When participant A moves his/her face to the left as shown in FIG. 44, the face movement is detected by the motion detection unit 122, and by the same processing as above, participant B's face image and the display range 43f of the surrounding image are selected from the omnidirectional image 41f of FIG. 43(a) and developed into rectangles, and the output image and output audio of FIG. 43(b) are generated and output as image data. As shown in FIG. 44, an image of the structure 30 seen from the front left is then displayed on participant A's display screen 252, so the object 34, which was hidden behind the object 32 in the front image, becomes visible. That is, moving the face to the left shows the structure 30 from the front left, providing a realistic experience as if the participant were actually there.

この場合、参加者Ａの左への顔の動きに連動して、図４３（ｂ）に示すように参加者Ａの表示位置を左に動かすようにしてもよい。これにより、例えば図４４のように参加者Ａが正面左寄りから構造物３０を見ているような映像にできる。また図４３（ｂ）に示すように参加者Ｂは顔の左側の映像になるので、まるで参加者Ａの方を向いて会話しているように見える。このように参加者の顔の動きに応じて選定した周囲の映像の向きに合わせてその参加者の表示位置を変えることで、その表示位置から周囲の映像を見ているように表示できる。これにより、まるでその場にいるような臨場感のある体験が可能となる。 In this case, the display position of participant A may be moved to the left in conjunction with the leftward movement of participant A's face, as shown in FIG. 43(b). This produces an image in which participant A appears to be looking at the structure 30 from the front left, as in FIG. 44. Also, as shown in FIG. 43(b), participant B is shown from the left side of the face, so participant B appears to be turned toward participant A while talking. By changing a participant's display position to match the orientation of the surrounding image selected according to that participant's face movement, the display can show the surroundings as if viewed from that display position, giving a realistic experience as if the participant were actually there.

これに対して、図４６に示すように参加者Ａが右に顔を動かすと、その顔の動きが動き検知部１２２で検知され、上記と同様の処理で図４５（ａ）の全方位映像４１ｅから参加者Ｂの顔映像と周囲映像の表示範囲４３ｅが選定され矩形に展開されて、図４５（ｂ）の出力画像と出力音声が生成され、画像データとして出力される。そうすると、図４６に示すように参加者Ａの表示画面２５２には、構造物３０の正面右寄りの映像が表示される。これによっても、正面の映像では物体３２に隠れて見えなかった物体３４が見えるようになる。すなわち、顔を右に動かせば正面右寄りから構造物３０の映像が見えるので、まるでその場にいるような臨場感のある体験が可能となる。 Conversely, when participant A moves his/her face to the right as shown in FIG. 46, the face movement is detected by the motion detection unit 122, and by the same processing as above, participant B's face image and the display range 43e of the surrounding image are selected from the omnidirectional image 41e of FIG. 45(a) and developed into rectangles, and the output image and output audio of FIG. 45(b) are generated and output as image data. As shown in FIG. 46, an image of the structure 30 seen from the front right is then displayed on participant A's display screen 252. This also makes the object 34, which was hidden behind the object 32 in the front image, visible. That is, moving the face to the right shows the structure 30 from the front right, providing a realistic experience as if the participant were actually there.

この場合、参加者Ａの右への顔の動きに連動して、図４５（ｂ）に示すように参加者Ａの表示位置を右に配置するようにしてもよい。これにより、例えば図４６のように参加者Ａが正面右寄りから構造物３０を見ているような映像にできる。また図４５（ｂ）に示すように参加者Ｂは顔の右側の映像になるので、まるで参加者Ａの方を向いて会話しているように見える。このように参加者の顔の動きに応じて選定した周囲の映像の向きに合わせてその参加者の表示位置を変えることで、その表示位置から周囲の映像を見ているように表示できる。これにより、まるでその場にいるような臨場感のある体験が可能となる。 In this case, the display position of participant A may be moved to the right in conjunction with the rightward movement of participant A's face, as shown in FIG. 45(b). This produces an image in which participant A appears to be looking at the structure 30 from the front right, as in FIG. 46. Also, as shown in FIG. 45(b), participant B is shown from the right side of the face, so participant B appears to be turned toward participant A while talking. By changing a participant's display position to match the orientation of the surrounding image selected according to that participant's face movement, the display can show the surroundings as if viewed from that display position, giving a realistic experience as if the participant were actually there.
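Moving participant A's display position together with the face movement, in either direction, can be sketched as shifting that participant's face frame horizontally by a fixed step and clamping it to the canvas. Frames are `(x, y, w, h)` tuples; the step size, canvas limits, and function name are hypothetical, and the description does not say how far the display position moves.

```python
from typing import Dict, Tuple

Frame = Tuple[int, int, int, int]  # (x, y, w, h)


def reposition(frames: Dict[str, Frame], who: str, motion: str,
               step: int = 160, left_limit: int = 0,
               right_limit: int = 1920) -> Dict[str, Frame]:
    """Return a copy of `frames` with `who`'s face frame shifted left or right
    to follow a "move_left"/"move_right" face movement; other motions leave it."""
    x, y, w, h = frames[who]
    offset = {"move_left": -step, "move_right": step}.get(motion, 0)
    new_x = min(max(x + offset, left_limit), right_limit - w)  # keep frame on screen
    updated = dict(frames)
    updated[who] = (new_x, y, w, h)
    return updated
```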

ところで、図４２に示す構造物３０の正面の映像では、物体３８が一部しか見えないので、どのような形状かよく分からない。そこで、物体３８は正面右寄りにあるので、図４５に示すように参加者Ａは顔を移動させずに左に顔を向ける。すると、その顔の動きが動き検知部１２２で検知され、上記と同様の処理で図４７（ａ）の全方位映像４１ｇから図４７（ｂ）の出力画像と出力音声が生成され、画像データとして出力される。このとき、参加者Ａの顔の向きに応じて図４７（ａ）の表示範囲４３ｇが左に動く。そうすると、図４８に示すように参加者Ａの表示画面２５２には、構造物３０の全体が左に移動した映像が表示される。これにより、正面の映像では一部しか見えなった物体３８の全体が見えるようになる。すなわち、顔を左に向けば構造物３０の左側の映像が見えるようになるので、まるでその場にいるような臨場感のある体験が可能となる。 In the front image of the structure 30 shown in FIG. 42, only part of the object 38 is visible, so its shape is unclear. Since the object 38 is toward the front left, participant A turns his/her face to the left without shifting its position, as shown in FIG. 48. This face movement is detected by the motion detection unit 122, and by the same processing as above, the output image and output audio of FIG. 47(b) are generated from the omnidirectional image 41g of FIG. 47(a) and output as image data. At this time, the display range 43g in FIG. 47(a) moves to the left according to the orientation of participant A's face. As shown in FIG. 48, an image in which the whole structure 30 has shifted to the left is then displayed on participant A's display screen 252, so the entire object 38, which was only partially visible in the front image, can be seen. That is, turning the face to the left reveals the left side of the structure 30, providing a realistic experience as if the participant were actually there.

また、図４２に示す構造物３０の正面の映像では、物体３６が一部しか見えないので、どのような形状かよく分からない。そこで、物体３６は正面右寄りにあるので、図５０に示すように参加者Ａは顔を移動させずに右に顔を向ける。すると、その顔の動きが動き検知部１２２で検知され、上記と同様の処理で図４９（ａ）の全方位映像４１ｇから図４９（ｂ）の出力画像と出力音声が生成され、画像データとして出力される。このとき、参加者Ａの顔の向きに応じて図４９（ａ）の表示範囲４３ｇが右に動く。そうすると、図５０に示すように参加者Ａの表示画面２５２には、構造物３０の全体が右に移動した映像が表示される。これにより、正面の映像では一部しか見えなった物体３６の全体が見えるようになる。すなわち、顔を右に向けば構造物３０の右側の映像が見えるようになるので、まるでその場にいるような臨場感のある体験が可能となる。 Likewise, in the front image of the structure 30 shown in FIG. 42, only part of the object 36 is visible, so its shape is unclear. Since the object 36 is toward the front right, participant A turns his/her face to the right without shifting its position, as shown in FIG. 50. This face movement is detected by the motion detection unit 122, and by the same processing as above, the output image and output audio of FIG. 49(b) are generated from the omnidirectional image 41g of FIG. 49(a) and output as image data. At this time, the display range 43g in FIG. 49(a) moves to the right according to the orientation of participant A's face. As shown in FIG. 50, an image in which the whole structure 30 has shifted to the right is then displayed on participant A's display screen 252, so the entire object 36, which was only partially visible in the front image, can be seen. That is, turning the face to the right reveals the right side of the structure 30, providing a realistic experience as if the participant were actually there.
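Panning the display range 43g within the same omnidirectional image 41g when the face turns can be sketched as offsetting the range's centre azimuth by the head yaw. The sketch assumes azimuth increases toward the viewer's right, that positive yaw means a turn to the left, a 1:1 gain, and a clamp on the maximum pan; all of these and the function name are hypothetical.

```python
from typing import Tuple


def pan_display_range(center_deg: float, width_deg: float, yaw_deg: float,
                      gain: float = 1.0,
                      max_pan_deg: float = 60.0) -> Tuple[float, float]:
    """Return the (start, end) azimuths, in [0, 360), of a display range
    panned by the head yaw. Positive yaw (turning left) lowers the azimuth.
    When end < start, the range wraps past 0 degrees."""
    pan = max(-max_pan_deg, min(max_pan_deg, gain * yaw_deg))
    c = (center_deg - pan) % 360.0
    half = width_deg / 2.0
    return ((c - half) % 360.0, (c + half) % 360.0)
```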

＜その他の変形例＞ <Other Modifications>
本発明は、上述した各実施形態に限定されず、例えば以降に説明する各種の応用・変形が可能である。また、これらの変形の態様および上述した各実施形態及びその変形例は、任意に選択された一または複数を適宜組み合わせることも可能である。また当業者であれば、請求の範囲に記載された範疇内において、各種の別の変更例または修正例に想到し得ることは明らかであり、それらについても当然に本発明の技術的範囲に属するものと了解される。 The present invention is not limited to the embodiments described above; various applications and modifications, such as those described below, are possible. One or more arbitrarily selected aspects of these modifications, the embodiments described above, and their variations may also be combined as appropriate. It is obvious that those skilled in the art can conceive of various other changes or modifications within the scope of the claims, and these are naturally understood to fall within the technical scope of the present invention.

（１）上記第１実施形態乃至第４実施形態とその変形例において、取得部１２１で取得した撮像装置２４からの映像（顔映像や周囲映像を含む）や、音声（周囲音は、記憶部１４に記憶しておくことができる。取得部１２１から取得する撮像装置２４からの映像は、リアルタイムで取得する場合に限られず、上記記憶部１４に記憶しておいた映像を取得するようにしてもよい。これによれば、例えば当日ビデオ会議に出席できなくなった参加者が記憶部１４に記憶しておいた映像や音声などを利用して、当日と同じ顔映像と周囲映像でビデオ会議を体験できる。この場合もリアルタイム映像の場合と同様に、参加者の顔の動きに応じて顔の映像や周囲の映像を変えることができるので、見たい時間にまるでその場にいるような体験が可能となる。 (1) In the first to fourth embodiments and their modifications, the images from the imaging devices 24 acquired by the acquisition unit 121 (including face images and surrounding images) and the audio (including ambient sounds) can be stored in the storage unit 14. The images from the imaging devices 24 acquired by the acquisition unit 121 are not limited to those acquired in real time; images stored in the storage unit 14 may be acquired instead. This allows, for example, a participant who could not attend the video conference on the day to experience it later with the same face images and surrounding images, using the video and audio stored in the storage unit 14. As with real-time video, the face images and surrounding images change according to the participant's face movements, so the participant can experience the conference as if present, at whatever time he/she chooses.
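For replay from the storage unit 14, the stored multi-camera video must stay addressable by time and camera so that a later viewer's face movements can still choose among the 24e, 24f, and 24g images. A minimal in-memory sketch, assuming time-stamped frame sets recorded in order; the class and method names are hypothetical.

```python
import bisect
from typing import Any, Dict, List


class RecordedSession:
    """Time-stamped frames from every camera, replayable by time and camera id."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._frames: List[Dict[str, Any]] = []  # each entry: {camera_id: frame}

    def record(self, t: float, frames_by_camera: Dict[str, Any]) -> None:
        if self._times and t < self._times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._times.append(t)
        self._frames.append(dict(frames_by_camera))

    def frame_at(self, t: float, camera_id: str) -> Any:
        """Return the latest frame from `camera_id` recorded at or before time t."""
        i = bisect.bisect_right(self._times, t) - 1
        if i < 0:
            raise LookupError("no frame recorded at or before t")
        return self._frames[i][camera_id]
```

During replay, the camera id passed to `frame_at` would be chosen from the viewer's current face movement exactly as in the live case.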

（２）上記第１実施形態乃至第４実施形態とその変形例では、動き検知部１２２が参加者の顔の動きとして、左右方向や上下方向の顔の動きを検知する場合を例示したが、前後方向の顔の動きも検知できるようにしてもよい。例えば顔のベクトルの位置と撮像装置２４との距離の変化に基づいて前後方向の顔の動きを検知してもよい。これによれば、例えば参加者が端末装置２０の表示画面２５に近づいて、動き検知部１２２が前方向の顔の動きを検知すると。顔の映像や周囲の映像を拡大するようにすることもできるようになる。例えば周囲の映像に映ってる物体に小さくて読めない文字があるときに、表示画面２５に顔を近づければ映像が拡大され、その文字が大きく見えて読めるようになる。逆に表示画面２５から顔を遠ざければ、表示画面２５の映像が縮小されるようにしてもよい。例えば映像が大きくて物体の一部しか見えない場合に、表示画面２５から顔を遠ざければ映像が縮小され、その物体の全体が見えるようになる。これにより、まるで実物を見ているかのような臨場感を体験できる。 (2) In the first to fourth embodiments and their modifications, the motion detection unit 122 detects left-right and up-down movements of the participant's face, but it may also detect forward-backward movements. For example, forward-backward movement may be detected based on changes in the distance between the position of the face vector and the imaging device 24. In that case, when the participant approaches the display screen 25 of the terminal device 20 and the motion detection unit 122 detects a forward movement of the face, the face images and surrounding images can be enlarged. For example, when an object in the surrounding image carries characters too small to read, bringing the face closer to the display screen 25 enlarges the image so that the characters become large enough to read. Conversely, moving the face away from the display screen 25 may reduce the image on the display screen 25. For example, when the image is so large that only part of an object can be seen, moving the face away reduces the image so that the whole object becomes visible. This gives a realistic sense of looking at the real thing.
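One way to turn forward-backward movement into enlargement or reduction is sketched below. As a stand-in for the distance measurement described above, it assumes that the apparent face width in pixels grows as the face approaches the camera, and uses the ratio to a reference width as the zoom factor. It then shrinks or grows the display range around its centre within the source image. This proxy, the zoom limits, and the function names are hypothetical.

```python
from typing import Tuple


def zoom_factor(face_width_px: float, ref_face_width_px: float,
                min_zoom: float = 0.5, max_zoom: float = 3.0) -> float:
    """Zoom > 1 when the face is closer than at the reference pose (larger face)."""
    z = face_width_px / ref_face_width_px
    return max(min_zoom, min(max_zoom, z))


def zoomed_range(center_x: float, center_y: float, base_w: int, base_h: int,
                 zoom: float, src_w: int, src_h: int) -> Tuple[int, int, int, int]:
    """Display range (x, y, w, h) in the source image: a zoom of 2 halves the
    range (enlarging what is shown); a zoom of 0.5 doubles it (reducing it),
    clamped to the source image bounds."""
    w = min(src_w, round(base_w / zoom))
    h = min(src_h, round(base_h / zoom))
    x = min(max(round(center_x - w / 2), 0), src_w - w)
    y = min(max(round(center_y - h / 2), 0), src_h - h)
    return (x, y, w, h)
```

The returned rectangle would then be scaled to the screen size, so a smaller range appears enlarged and a larger one appears reduced.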

（３）上記第１実施形態乃至第３実施形態とその変形例では、内容を分かりやすくするため、顔の映像のみを表示した図面を用いて説明したが、これに限られない。例えば第４実施形態のように顔だけでなく上半身も含めた映像であってもよい。ビデオ会議装置１０が上半身と顔を含む映像を受信する場合には、その映像から顔の部分を認識して顔の動きを検知するようにしてもよい。例えばＡＩ（人工知能）などで機械学習させた学習済モデルや既存の学習済モデルを用いて、映像から顔の部分の認識や顔の動きを検知する。 (3) In the first to third embodiments and their modifications, drawings showing only face images were used to make the description easier to follow, but the invention is not limited to this. For example, the images may include the upper body as well as the face, as in the fourth embodiment. When the video conference device 10 receives images including the upper body and the face, it may recognize the face region in the image and detect its movement. For example, a model trained by machine learning (AI) or an existing trained model may be used to recognize the face region in the image and detect the face movement.

（４）上記実施形態及び上記変形例では、本発明のコミュニケーション装置をビデオ会議装置１０に適用した場合を例示したが、これに限られない。例えばＷｅｂ会議装置、テレビ会議装置、テレビ電話装置、オンライン会議装置、テレビ通話装置、ビデオ通話装置など様々なコミュニケーション装置に適用可能である。また、本発明の用途についても会議用に限られず、展示会会場、介護施設、病院施設、実家などと自宅の間や、介護施設同士、病院同士などにおける会話や対話など、様々なシチュエーションでのコミュニケーションに利用できる。 (4) In the embodiments and modifications above, the communication device of the present invention is applied to the video conference device 10, but it is not limited to this. It is applicable to various communication devices such as web conferencing devices, teleconferencing devices, videophones, online meeting devices, TV call devices, and video call devices. The use of the present invention is also not limited to conferences; it can support conversation and dialogue in many situations, such as between one's home and an exhibition venue, a nursing care facility, a hospital, or a parent's home, as well as between nursing care facilities or between hospitals.

１００…ビデオ会議システム（コミュニケーションシステム）、１０…ビデオ会議装置（コミュニケーション装置）、１１…通信部、１２…制御部、１２１…取得部、１２２…検知部、１２３（１２３Ａ）…映像選定部、１２３ａ…顔映像選定部、１２３ｂ…周囲映像選定部、１２４（１２４Ａ）…映像生成部、１２４ａ…顔映生成定部、１２４ｂ…周囲映像生成部、１２５Ａ…音選定部、１２５（１２５ａ）…音声選定部、１２５ｂ…周囲音選定部、１２６Ａ…音生成部、１２６（１２６ａ）…音声生成部、１２６ｂ…周囲音生成部、１２７…出力部、１４…記憶部、１５…プログラム記憶部、１６…データ記憶部、１６１…ユーザ情報、１８…仮想会議室記憶部、１８１…仮想会議室構成情報、１８２…参加者情報、１８３…仮想会議室表示情報、１８３ａ…外枠、１８３ｂ…表示範囲、１８３ｃ…テーブル、１８３ｄ…ホワイトボード、２０（２０Ａ、２０Ｂ、２０Ｃ、２０Ｄ）…端末装置、２０ａ…本体、２０Ｌ…バスライン、２１…通信部、２２…制御部、２３…記憶部、２４（２４ａ、２４ｂ、２４ｃ、２４ｄ）…撮像装置、２４ａ、２４ｅ…第１撮像装置（右側撮像装置）、２４ｂ、２４ｆ…第２撮像装置（左側撮像装置）、２４ｃ、２４ｇ…第３撮像装置（中央撮像装置）、２４ｄ…第４撮像装置（下側撮像装置）、２４２ａ、２４２ｂ、２４２ｃ、２４２ｄ…支持部、２４４ａ、２４４ｂ、２４４ｃ、２４４ｄ…回転軸、２４６…外側カメラ、２４７…内側カメラ、２５…表示装置、２５２…表示画面、２６…マイク、２７…スピーカ、２８…入力装置、４０ｅ、４０ｆ、４０ｇ…映像、４１ｅ、４１ｆ、４１ｇ…全方位映像、４２ｅ、４２ｆ、４２ｇ…顔映像の表示範囲、４３ｅ、４３ｆ、４３ｇ…周囲映像の表示範囲、ｔ０１…縦列、ｔ０２…縦列、ｙ０１…横列、ｙ０２…横列、Ａ～Ｄ…参加者、Ｎ…ネットワーク、Ｒ１…小会議室、Ｒ２…中会議室、Ｒ３…大会議室、Ｒ４…対面会議室。
DESCRIPTION OF SYMBOLS 100... Video conference system (communication system), 10... Video conference device (communication device), 11... Communication unit, 12... Control unit, 121... Acquisition unit, 122... Detection unit, 123 (123A)... Image selection unit, 123a... Face image selection unit, 123b... Surrounding image selection unit, 124 (124A)... Image generation unit, 124a... Face image generation unit, 124b... Surrounding image generation unit, 125A... Sound selection unit, 125 (125a)... Voice selection unit, 125b... Ambient sound selection unit, 126A... Sound generation unit, 126 (126a)... Voice generation unit, 126b... Ambient sound generation unit, 127... Output unit, 14... Storage unit, 15... Program storage unit, 16... Data storage unit, 161... User information, 18... Virtual conference room storage unit, 181... Virtual conference room configuration information, 182... Participant information, 183... Virtual conference room display information, 183a... Outer frame, 183b... Display range, 183c... Table, 183d... Whiteboard, 20 (20A, 20B, 20C, 20D)... Terminal device, 20a... Main body, 20L... Bus line, 21... Communication unit, 22... Control unit, 23... Storage unit, 24 (24a, 24b, 24c, 24d)... Imaging device, 24a, 24e... First imaging device (right imaging device), 24b, 24f... Second imaging device (left imaging device), 24c, 24g... Third imaging device (central imaging device), 24d... Fourth imaging device (lower imaging device), 242a, 242b, 242c, 242d... Support portion, 244a, 244b, 244c, 244d... Rotating shaft, 246... Outer camera, 247... Inner camera, 25... Display device, 252... Display screen, 26... Microphone, 27... Speaker, 28... Input device, 40e, 40f, 40g... Image, 41e, 41f, 41g... Omnidirectional image, 42e, 42f, 42g... Display range of face image, 43e, 43f, 43g... Display range of surrounding image, t01... Column, t02... Column, y01... Row, y02... Row, A to D... Participants, N... Network, R1... Small conference room, R2... Medium conference room, R3... Large conference room, R4... Face-to-face conference room.

Claims

1. A communication device for displaying images of a plurality of participants on a terminal device, comprising:
an acquisition unit that acquires, for each of the plurality of participants, a plurality of images including face images in different orientations;
a motion detection unit that detects movement of the face of at least one of the participants from the face images acquired by the acquisition unit;
an image selection unit that selects a face-orientation image of another participant according to the movement of the participant's face detected by the motion detection unit;
an image generation unit that generates, from the images selected by the image selection unit, an image for displaying at least the face image of the other participant on the terminal device; and
a virtual conference room storage unit that stores a virtual conference room in which the plurality of participants are arranged and the positions of the plurality of participants in the virtual conference room, wherein
the image selection unit selects a display range of the virtual conference room according to the movement of the participant's face detected by the motion detection unit, and
the image generation unit generates an image in which the display position of each participant is specified according to that participant's position in the virtual conference room, the image causing the terminal device to display those of the plurality of participants who are included in the display range.

2. The communication device according to claim 1, wherein the image selection unit selects a face-orientation image according to the position of the participant in the virtual conference room.

3. The communication device according to claim 2, wherein
the motion detection unit detects, from the face images acquired by the acquisition unit, whether the face of at least one of the participants is oriented to the front, to the right, or to the left, and the image selection unit selects the display range of the virtual conference room so that it includes the images of the other participants located to the right and left of the participant when the detected face orientation is to the front, the image of the other participant located to the right of the participant when the detected face orientation is to the right, and the image of the other participant located to the left of the participant when the detected face orientation is to the left.

4. The communication device according to any one of claims 1 to 3, wherein, when the participant's position in the virtual conference room changes,
the image selection unit selects a face-orientation image corresponding to the changed position, and
the image generation unit generates an image in which the display position of the participant is specified according to the changed position.

5. The communication device according to any one of claims 1 to 4, further comprising:
a voice selection unit that selects the voice of the other participant according to the face movement of the participant detected by the motion detection unit; and
a voice generation unit that generates, from the voice selected by the voice selection unit, audio for outputting the other participant's voice from the terminal device.
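Claim 5 only states that the other participants' voices are selected according to the local participant's face movement. One possible reading, sketched here as an assumption, is to play voices inside the current view at full level and the rest attenuated, and to pan each voice by its seat angle relative to the face yaw using a constant-power pan law. The 90-degree field of view, the 0.3 attenuation factor and all names are hypothetical.

```python
import math


def voice_gains(seats: dict[str, float], face_yaw_deg: float,
                fov_deg: float = 90.0,
                outside_gain: float = 0.3) -> dict[str, tuple[float, float]]:
    """Per-participant (left, right) gains; seat angles in degrees, positive = to the right."""
    gains = {}
    for name, angle in seats.items():
        rel = (angle - face_yaw_deg + 180.0) % 360.0 - 180.0
        # Hypothetical policy: full level inside the view, attenuated outside it.
        level = 1.0 if abs(rel) <= fov_deg / 2 else outside_gain
        pan = max(-1.0, min(1.0, rel / 90.0))   # -1 hard left, +1 hard right
        theta = (pan + 1.0) * math.pi / 4       # constant-power pan law
        gains[name] = (level * math.cos(theta), level * math.sin(theta))
    return gains


def mix_stereo(frames: dict[str, list[float]],
               gains: dict[str, tuple[float, float]]) -> tuple[list[float], list[float]]:
    """Mix equal-length mono sample lists into left/right channels."""
    n = len(next(iter(frames.values())))
    left, right = [0.0] * n, [0.0] * n
    for name, samples in frames.items():
        gl, gr = gains[name]
        for i, s in enumerate(samples):
            left[i] += gl * s
            right[i] += gr * s
    return left, right


seats = {"B": 0.0, "C": 90.0}
g = voice_gains(seats, face_yaw_deg=0.0)
left, right = mix_stereo({"B": [1.0, 0.5], "C": [1.0, 0.0]}, g)
```

With the face pointing straight ahead, B (in view, centred) is heard equally in both ears, while C (outside the view, hard right) is quieter and only on the right; turning to face C reverses the emphasis.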

6. The communication device according to any one of claims 1 to 5, wherein
the acquisition unit acquires, for each of the plurality of participants, face images in different orientations and surrounding images in different orientations,
the image selection unit selects the other participant's face-orientation image and the surrounding image according to the movement of the participant's face detected by the movement detection unit, and
the image generation unit generates, from the images selected by the image selection unit, an image for displaying at least the other participant's face image and the surrounding image on the terminal device.

7. The communication device according to claim 6, wherein
the movement detection unit detects, as the movement of the participant's face, both the positional movement and the orientation of the face,
the image selection unit selects the surrounding image according to the positional movement of the face and selects a display range of the surrounding image according to the orientation of the face, and
the image generation unit generates an image for displaying the surrounding image on the terminal device within the display range selected by the image selection unit.
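Claim 7 separates two cues: positional movement of the head picks which surrounding image to use, and face orientation picks the visible part of it. A minimal sketch follows, assuming (hypothetically) that the surrounding images are 360-degree equirectangular panoramas keyed by a lateral viewpoint offset in metres and that column 0 of each panorama corresponds to -180 degrees. The keys, the field of view and the function names are illustrative only.

```python
def select_surrounding_image(images: dict[float, str], head_offset_m: float) -> str:
    """Choose the panorama captured closest to the participant's lateral head position."""
    return images[min(images, key=lambda x: abs(x - head_offset_m))]


def crop_spans(pano_width: int, yaw_deg: float,
               fov_deg: float = 90.0) -> list[tuple[int, int]]:
    """Column spans [start, end) of a 360-degree panorama covering fov_deg centred on yaw_deg.

    A window that crosses the panorama seam is returned as two spans.
    """
    width = round(fov_deg / 360.0 * pano_width)
    centre = (yaw_deg + 180.0) / 360.0 * pano_width
    start = round(centre - width / 2) % pano_width
    end = start + width
    if end <= pano_width:
        return [(start, end)]
    return [(start, pano_width), (0, end - pano_width)]


# Hypothetical panoramas captured at three lateral viewpoints.
images = {-0.2: "pano_left", 0.0: "pano_centre", 0.2: "pano_right"}
print(select_surrounding_image(images, 0.15))   # pano_right
print(crop_spans(3600, 0.0))                    # [(1350, 2250)]
print(crop_spans(3600, 180.0))                  # [(3150, 3600), (0, 450)]
```

Returning two spans at the seam lets the renderer concatenate the right end and the left start of the panorama instead of clamping the view.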

8. The communication device according to claim 7, comprising a virtual conference room storage unit that stores
a virtual conference room in which the plurality of participants are arranged, and
the positions of the plurality of participants and the positions of the surrounding images in the virtual conference room,
wherein the image selection unit changes the display position of the participant according to the orientation of the surrounding image selected according to the movement of the participant's face.

A computer-readable storage medium storing a communication program that causes a computer to execute video processing performed by a communication device,
wherein the communication device comprises
a virtual conference room storage unit that stores a virtual conference room in which a plurality of participants are arranged and the positions of the plurality of participants in the virtual conference room, and
the video processing comprises:
obtaining a plurality of videos, including videos of different face orientations for each of a plurality of participants;
detecting movement of the face of at least one of the participants from the captured video of the face;
selecting a video of the other participant's face orientation and a display range of the virtual conference room according to the detected movement of the participant's face; and
generating, from the selected video, a video for displaying at least the other participant's face on the terminal device,
wherein the video to be displayed on the terminal device includes an image specifying the position at which each participant is displayed according to that participant's position in the virtual conference room, and a video for displaying on the terminal device those of the plurality of participants who are included in the display range.
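The step of detecting face movement from the captured video is not tied to a particular method in the claims. As one illustrative option, a crude yaw angle can be estimated from three 2D facial landmarks (the two eye centres and the nose tip) produced by any landmark detector. The linear scaling, the 60-degree limit and the assumption of an unmirrored camera image are simplifications chosen for this sketch; a production system would more likely fit a 3D head model.

```python
def estimate_yaw_deg(eye_img_left: tuple[float, float],
                     eye_img_right: tuple[float, float],
                     nose_tip: tuple[float, float],
                     max_yaw_deg: float = 60.0) -> float:
    """Rough head yaw from 2D landmarks in an unmirrored camera image.

    eye_img_left / eye_img_right are the eye centres that appear on the left / right of
    the image. Returns a positive angle when the participant has turned to their own right.
    """
    eye_mid_x = (eye_img_left[0] + eye_img_right[0]) / 2
    eye_dist = eye_img_right[0] - eye_img_left[0]
    if eye_dist <= 0:
        raise ValueError("eye landmarks must be ordered left-to-right in the image")
    # Nose offset from the eye midpoint, normalised by eye spacing (0 when facing the camera).
    offset = (nose_tip[0] - eye_mid_x) / eye_dist
    # Hypothetical linear scaling, clamped to [-1, 1].
    offset = max(-1.0, min(1.0, 2.0 * offset))
    # Turning to one's own right moves the nose towards the image left, hence the minus sign.
    return -offset * max_yaw_deg


def face_movement(yaw_history: list[float]) -> float:
    """Change in yaw between the first and last frame of a short window."""
    return yaw_history[-1] - yaw_history[0]


print(estimate_yaw_deg((100, 100), (200, 100), (125, 130)))  # 30.0
```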

A communication program that causes a computer to execute video processing performed by a communication device,
wherein the communication device comprises
a virtual conference room storage unit that stores a virtual conference room in which a plurality of participants are arranged and the positions of the plurality of participants in the virtual conference room, and
the video processing comprises:
obtaining a plurality of videos, including videos of different face orientations for each of a plurality of participants;
detecting movement of the face of at least one of the participants from the captured video of the face;
selecting a video of the other participant's face orientation and a display range of the virtual conference room according to the detected movement of the participant's face; and
generating, from the selected video, a video for displaying at least the other participant's face on the terminal device,
wherein the video to be displayed on the terminal device includes an image specifying the position at which each participant is displayed according to that participant's position in the virtual conference room, and a video for displaying on the terminal device those of the plurality of participants who are included in the display range.