JP2011097447A

JP2011097447A - Communication system

Info

Publication number: JP2011097447A
Application number: JP2009250862A
Authority: JP
Inventors: Keisuke Omori; 圭祐大森
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2009-10-30
Filing date: 2009-10-30
Publication date: 2011-05-12

Abstract

PROBLEM TO BE SOLVED: To provide a communication system capable of displaying an image in which the lines of sight of the talkers coincide even when a talker does not directly face the display surface. SOLUTION: The communication system includes: an image output unit that outputs an image of a first viewer, who is a talker, viewed from approximately the front; a display unit including a display surface that displays the image of the first viewer; and an image expansion/contraction unit that calculates an image obtained by applying, to the image output by the image output unit, a perspective projection transformation, centered on a second viewer reference position, from a virtual display surface onto the display surface. The second viewer reference position is the position of the viewpoint of a second viewer conversing with the first viewer, and the virtual display surface directly faces the straight line connecting the second viewer reference position and the center of the display surface. The display unit displays the image calculated by the image expansion/contraction unit. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a communication system.

In communication tools that use video, such as a video conference system that displays each other's video to conference participants located at different sites, a talker converses while looking at the people shown on the display device, in particular at the face of the other talker. Here, "talkers" means the person who is speaking and the person being spoken to. In a video conference system, for example, the term refers to the conference participant who is speaking (hereinafter also called the "speaker") or, when the speaker addresses a single conference participant, to the participant being addressed. In the following, a person who views an image displayed by the communication system, for example a participant in a video conference, is also called a "viewer". It is important for mutual understanding that talkers converse while looking at each other's faces, and it is even more important for deepening that understanding that their lines of sight coincide.
Patent Document 1 proposes a method for matching the lines of sight of a first user and a second user. A plurality of imaging devices are provided around the display surface of the display device viewed by the first user (a talker) to capture a frontal image of the first user. The frontal image of the first user is transmitted to a selected second user, and a profile image of the first user is transmitted to the other users. As a result, a user who directly faces the display device can be shown an image in which the line of sight coincides with that of the other user.

Patent Document 1: JP 2001-136501 A

With the above method, however, the lines of sight of the talkers do not coincide when a talker does not directly face the display surface of the display device. For example, suppose several viewers are in one conference room and each looks at a single display device on which a frontal image of a talker is displayed. A viewer in a seat directly facing the display surface sees an image in which the line of sight coincides with that of the talker, whereas a viewer in a seat not directly facing the display surface sees an image in which it does not. Therefore, when a viewer in a seat that does not directly face the display device converses, that viewer does so while looking at an image in which the other talker faces a different direction. Face-to-face conversation, which is important in non-verbal communication, and in particular communication through eye contact, therefore cannot be adequately achieved.

The present invention has been made in view of these circumstances, and its object is to provide a communication system capable of displaying an image in which the line of sight coincides with that of the other talker even when a talker does not directly face the display surface.

[1] The present invention has been made to solve the problem described above. A communication system according to one aspect of the present invention includes: an image output unit that images a first viewer and outputs an image of the first viewer viewed from substantially the front; a reference position output unit that detects a second viewer reference position, which is the position of the viewpoint of a second viewer conversing with the first viewer; a display unit including a display surface that displays the image of the first viewer; and an image expansion/contraction unit that calculates an image obtained by applying, to the image output from the image output unit, a perspective projection transformation, centered on the second viewer reference position, from a virtual display surface that directly faces the straight line connecting the second viewer reference position and the center of the display surface onto the display surface. The display unit displays the image calculated by the image expansion/contraction unit.
In this communication system, the image expansion/contraction unit generates the image based on the second viewer reference position, which is the position of the talker's viewpoint. A talker who views the display unit from the second viewer reference position can therefore be shown an image in which the line of sight coincides with that of the other talker, even when this talker does not directly face the display surface. In addition, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, a natural image that looks like the other talker viewed from the front can be displayed.
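The perspective projection described in aspect [1] can be sketched numerically. The following is a minimal illustration, not the patent's implementation: it assumes a coordinate system in which the display surface is the plane z = 0 with its center at the origin, x pointing right and y pointing up, and the viewer reference position lies at z > 0. The function name `virtual_to_display` and the choice of in-plane axes for the virtual display surface are assumptions made for this example.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_to_display(u, v, viewer):
    """Map a point (u, v) on the virtual display surface to (x, y) on the
    real display surface (plane z = 0, centered at the origin).

    The virtual surface passes through the display center and is
    perpendicular to the line from the viewer to that center. The
    projection is centered on the viewer position (assumed ez > 0).
    """
    ex, ey, ez = viewer
    if ez <= 0:
        raise ValueError("viewer must be in front of the display (ez > 0)")
    n = _normalize(viewer)                          # normal of the virtual surface
    right = _normalize(_cross((0.0, 1.0, 0.0), n))  # horizontal axis on the virtual surface
    up = _cross(n, right)                           # vertical axis on the virtual surface
    q = tuple(u * right[i] + v * up[i] for i in range(3))  # 3D point on the virtual surface
    if q[2] >= ez:
        raise ValueError("ray from the viewer does not reach the display plane")
    t = ez / (ez - q[2])                            # ray parameter where z becomes 0
    return (ex + t * (q[0] - ex), ey + t * (q[1] - ey))
```

Under these assumptions, a viewer directly in front of the display gets the identity mapping, while for an oblique viewer the image is stretched on the display surface so that it appears undistorted from that viewer's position. Warping a full image would apply this mapping to the four corners of the image and resample accordingly.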

[2] A communication system according to one aspect of the present invention is the communication system described above, wherein the reference position output unit includes: a reference position detection unit that detects a reference position, which is the position of a viewer's viewpoint, for each of one or more viewers including the second viewer; and a reference position selection unit that selects the second viewer reference position from among the reference positions detected by the reference position detection unit.
This communication system selects the second viewer reference position, which is the talker's reference position, from among the reference positions of a plurality of viewers, and the image expansion/contraction unit generates the image based on the second viewer reference position. Therefore, even when two or more viewers are viewing the display unit, a talker who views the display unit from the second viewer reference position can be shown an image in which the line of sight coincides with that of the other talker, even when this talker does not directly face the display surface. In addition, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, a natural image that looks like the other talker viewed from the front can be displayed.

[3] A communication system according to one aspect of the present invention is the communication system described above and includes a first terminal device and a second terminal device connected via a communication path, wherein the first terminal device includes the image output unit and the reference position selection unit, and the second terminal device includes the reference position detection unit, the display unit, and the image expansion/contraction unit.
As in the communication system described above, this communication system selects the second viewer reference position, which is the talker's reference position, from among the reference positions of a plurality of viewers, and the image expansion/contraction unit generates the image based on the second viewer reference position. Therefore, even when two or more viewers are viewing the display unit, a talker who views the display unit from the second viewer reference position can be shown an image in which the line of sight coincides with that of the other talker, even when this talker does not directly face the display surface. In addition, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, a natural image that looks like the other talker viewed from the front can be displayed.

[4] A communication system according to one aspect of the present invention is the communication system described above and further includes a second display unit having a second display surface that displays images of one or more viewers including the second viewer. The image output unit includes: a first imaging unit that captures a first stereo image with the first viewer as the subject; and a free viewpoint image generation unit that detects an image including the first viewer's face from the first stereo image, detects the first viewer's line-of-sight direction, generates a three-dimensional model including the first viewer's face based on the first stereo image, and generates and outputs an image including the first viewer's face as viewed from the line-of-sight direction. The reference position detection unit includes: a second imaging unit that captures a second stereo image with the one or more viewers as subjects; and a reference position calculation unit that detects an image of the face, or of a part of the face, of each of the one or more viewers from the second stereo image and calculates the reference position of each of the one or more viewers. The reference position selection unit detects the first viewer's line-of-sight direction from the first stereo image, selects as the second viewer the viewer displayed at the intersection of the line-of-sight direction and the second display surface, and selects that viewer's reference position, from among the reference positions calculated by the reference position calculation unit, as the second viewer reference position.
In this communication system, the first imaging unit and the free viewpoint image generation unit output an image of the first viewer viewed from the front (the line-of-sight direction), the second imaging unit and the reference position calculation unit detect the reference positions, and the reference position selection unit selects and outputs the second viewer reference position from among them. Therefore, as in the communication system described above, even when two or more viewers are viewing the display unit, a talker who views the display unit from the second viewer reference position can be shown an image in which the line of sight coincides with that of the other talker, even when this talker does not directly face the display surface. In addition, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, a natural image that looks like the other talker viewed from the front can be displayed.
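The selection step in aspect [4], in which the second viewer is the person displayed where the first viewer's line of sight meets the second display surface, can be illustrated as a ray-plane intersection followed by a region lookup. This is only a sketch under assumptions not stated in the source: the second display surface is taken to be the plane z = 0, the first viewer's eye is at z > 0, and each displayed viewer occupies an axis-aligned rectangle on that surface. The names `gaze_hit_on_display`, `select_second_viewer`, and `displayed_regions` are hypothetical.

```python
def gaze_hit_on_display(eye, gaze_dir):
    """Return the (x, y) point where the gaze ray meets the display plane
    z = 0, or None if the ray does not travel toward the display."""
    ex, ey, ez = eye
    dx, dy, dz = gaze_dir
    if ez <= 0 or dz >= 0:
        return None
    t = -ez / dz                      # ray parameter at which z becomes 0
    return (ex + t * dx, ey + t * dy)


def select_second_viewer(eye, gaze_dir, displayed_regions, reference_positions):
    """Pick the viewer whose displayed region contains the gaze hit point and
    return (viewer_id, reference_position), or None if nobody is hit.

    displayed_regions: viewer_id -> (x_min, y_min, x_max, y_max) on the display
    reference_positions: viewer_id -> viewpoint position reported for that viewer
    """
    hit = gaze_hit_on_display(eye, gaze_dir)
    if hit is None:
        return None
    for viewer_id, (x_min, y_min, x_max, y_max) in displayed_regions.items():
        if x_min <= hit[0] <= x_max and y_min <= hit[1] <= y_max:
            return viewer_id, reference_positions[viewer_id]
    return None
```

For instance, with an eye at (0, 0, 2) looking along (0.25, 0, -1), the gaze meets the display at (0.5, 0), so a viewer displayed in a region covering that point would be chosen as the second viewer.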

[5] A communication system according to one aspect of the present invention is the communication system described above, wherein the display surface displays different images in at least two directions; the reference position selection unit selects the second viewer reference position and associates with it one of the display directions, which are the directions in which the display surface displays the different images; and the display unit displays the image input from the image expansion/contraction unit in the direction associated with the second viewer reference position.
As in the communication system described above, this communication system can display, toward the direction of the selected second viewer reference position, an image in which the line of sight coincides with that of the talker, and can display a natural image.

[6] A communication system according to one aspect of the present invention is the communication system described above and further includes a second display unit having a second display surface that displays images of one or more viewers including the second viewer. The image output unit includes: a first imaging unit that captures a first stereo image with the first viewer as the subject; and a free viewpoint image generation unit that detects an image including the first viewer's face from the first stereo image, detects the first viewer's line-of-sight direction, generates a three-dimensional model including the first viewer's face based on the first stereo image, and generates and outputs an image including the first viewer's face as viewed from the line-of-sight direction. The reference position detection unit includes: a second imaging unit that captures a second stereo image with the one or more viewers as subjects; and a reference position calculation unit that detects an image of the face, or of a part of the face, of each of the one or more viewers from the second stereo image and calculates the reference position of each of the one or more viewers. The reference position selection unit detects the first viewer's line-of-sight direction from the first stereo image, selects as the second viewer the viewer displayed at the intersection of the line-of-sight direction and the second display surface, selects that viewer's reference position, from among the reference positions calculated by the reference position calculation unit, as the second viewer reference position, and associates one of the display directions with the second viewer reference position based on the positional relationship between the center of the display surface and the second viewer reference position.
In this communication system, the first imaging unit and the free viewpoint image generation unit output an image of the first viewer viewed from the front (the line-of-sight direction), the second imaging unit and the reference position calculation unit detect the reference positions, and the reference position selection unit selects and outputs the second viewer reference position from among them. Therefore, as in the communication system described above, an image in which the line of sight coincides with that of the talker can be displayed toward the direction of the selected second viewer reference position, and a natural image can be displayed.
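Aspect [6] associates a display direction with the second viewer reference position based on where that position lies relative to the center of the display surface. The source does not specify the rule, so the sketch below assumes one plausible choice: each display direction is described by a nominal horizontal angle measured from the display normal, and the direction whose angle is closest to the viewer's horizontal angle is chosen. The function names and the angle convention (display center at the origin, normal along +z, x to the right) are assumptions.

```python
import math


def horizontal_angle_deg(reference_position):
    """Horizontal angle of a viewpoint, seen from the display center and
    measured from the display normal (+z), in degrees."""
    x, _y, z = reference_position
    return math.degrees(math.atan2(x, z))


def assign_display_direction(reference_position, display_directions_deg):
    """Return the index of the display direction whose nominal angle is
    closest to the horizontal angle of the reference position."""
    angle = horizontal_angle_deg(reference_position)
    return min(range(len(display_directions_deg)),
               key=lambda i: abs(display_directions_deg[i] - angle))
```

With a hypothetical dual-view display whose two directions are nominally at -30 and +30 degrees, a viewer seated to the left of the display center would be associated with the first direction and a viewer to the right with the second.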

[7] A communication system according to one aspect of the present invention is the communication system described above, wherein the display surface displays different images in at least two directions, and the reference position selection unit associates one of the reference positions with each of the display directions, which are the directions in which the display surface displays the different images. The image output unit generates, for each direction with which the second viewer reference position is not associated, an image of the first viewer viewed from a position determined based on the reference position associated with that direction, and outputs the direction and the image in association with each other to the image expansion/contraction unit. For each of the input directions, the image expansion/contraction unit calculates an image obtained by applying, to the image associated with that direction, a perspective projection transformation, centered on the reference position associated with that direction, from a virtual display surface that directly faces the straight line connecting that reference position and the center of the display surface onto the display surface, and outputs the calculated image in association with the direction. The display unit displays each image input from the image expansion/contraction unit in the direction associated with that image.
For a reference position not selected by the reference position selection unit, this communication system outputs an image viewed from a position determined based on that reference position. By displaying, as this image, an image of the talker facing the direction of the second viewer reference position, a more natural image in which the talkers face each other can be shown to viewers who are not talkers.

[8] A communication system according to one aspect of the present invention is the communication system described above and further includes a second display unit having a second display surface that displays images of one or more viewers including the second viewer. The image output unit includes: a first imaging unit that captures a first stereo image with the first viewer as the subject; and a free viewpoint image generation unit. The free viewpoint image generation unit detects an image including the first viewer's face from the first stereo image, detects the first viewer's line-of-sight direction, and generates a three-dimensional model including the first viewer's face based on the first stereo image. For each of the viewpoint positions that the reference position selection unit has associated with the display directions, other than the second viewer reference position, it detects the direction, about the center of the display surface, from that viewpoint position to the second viewer reference position, generates an image viewed from a position moved from the first viewer's line-of-sight direction in the detected direction, and outputs the generated image to the image expansion/contraction unit in association with the display direction with which that viewpoint position is associated. The reference position detection unit includes: a second imaging unit that captures a second stereo image with the one or more viewers as subjects; and a reference position calculation unit that detects an image of the face, or of a part of the face, of the one or more viewers from the second stereo image and calculates the reference position of each of the one or more viewers. The reference position selection unit detects the first viewer's line-of-sight direction from the first stereo image; selects, from among the reference positions calculated by the reference position calculation unit, the reference position of the viewer displayed at the intersection of the line-of-sight direction and the second display surface as the second viewer reference position; associates one of the display directions with the second viewer reference position based on the positional relationship between the center of the display surface and the second viewer reference position; and associates, with each display direction other than the one associated with the second viewer reference position, one of the reference positions other than the second viewer reference position, based on the positional relationship between the display surface and each reference position.
For each of the viewpoint positions other than the second viewer reference position, this communication system displays an image of the talker facing the second viewer reference position. A more natural image in which the talkers face each other can therefore be shown to viewers who are not talkers.
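The direction detection in aspect [8] can be read as measuring a rotation about the display center from a non-talker's viewpoint to the second viewer reference position, and then offsetting the viewing direction for the free viewpoint image by that rotation. The sketch below illustrates only this geometric idea in the horizontal plane. The coordinate convention (display center at the origin, normal along +z, x to the right), the sign of the rotation, and both function names are assumptions, and the rendering of the three-dimensional model itself is out of scope.

```python
import math


def rotation_about_display_center_deg(from_position, to_position):
    """Signed horizontal angle, about the display center, that turns the
    direction toward from_position into the direction toward to_position."""
    angle_from = math.atan2(from_position[0], from_position[2])
    angle_to = math.atan2(to_position[0], to_position[2])
    return math.degrees(angle_to - angle_from)


def rotate_about_vertical(direction, angle_deg):
    """Rotate a 3D direction about the vertical (y) axis by angle_deg, using
    the same sign convention as rotation_about_display_center_deg."""
    x, y, z = direction
    a = math.radians(angle_deg)
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))
```

For example, with a non-talker at (1, 0, 1) and the second viewer reference position at (-1, 0, 1), the detected rotation is -90 degrees, and a viewing direction derived from the first viewer's line of sight would be offset by that amount before the free viewpoint image is generated.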

[9] A communication system according to one aspect of the present invention is the communication system described above and includes a first terminal device used by the first viewer, a second terminal device used by the second viewer, and a third terminal device used by a third viewer, which are connected to one another via a communication path. The system further includes a talker selection unit that selects any two of the first viewer, the second viewer, and the third viewer as talkers. When the talker selection unit selects the first viewer and the second viewer, the image output unit generates an image of the first viewer viewed from substantially the front and outputs it to the image expansion/contraction unit. When the talker selection unit selects the first viewer and the third viewer, the image output unit generates an image of the first viewer viewed from other than the front and outputs it to the image expansion/contraction unit.
When the first viewer and the second viewer are the talkers, this communication system can, as in the communication system described above, display to the second viewer an image in which the line of sight coincides with that of the first viewer, and can display a natural image. The second viewer can thus converse while looking at a natural image in which the line of sight coincides with that of the other talker. When the first viewer and the third viewer are the talkers and the second viewer is not, an image of the first viewer viewed from other than the front is displayed to the second viewer. By displaying an image of the first viewer facing the direction of the display unit on which the third viewer is displayed, the second viewer, who is not a talker, can see a more natural image in which the first viewer, who is a talker, faces the other talker.

[10] A communication system according to one aspect of the present invention is the communication system described above and includes a talker selection device connected to the first terminal device, the second terminal device, and the third terminal device via a communication path, wherein the first terminal device includes the image output unit; the second terminal device includes the reference position output unit, the display unit, and the image expansion/contraction unit; and the talker selection device includes the talker selection unit.
As in the communication system described above, when the first viewer and the second viewer are the talkers, this communication system can display to the second viewer an image in which the line of sight coincides with that of the first viewer, and can display a natural image. The second viewer can thus converse while looking at a natural image in which the line of sight coincides with that of the other talker. When the first viewer and the third viewer are the talkers and the second viewer is not, an image of the first viewer viewed from other than the front is displayed to the second viewer. By displaying an image of the first viewer facing the direction of the display unit on which the third viewer is displayed, the second viewer, who is not a talker, can see a more natural image in which the first viewer, who is a talker, faces the other talker.

[11] A communication system according to one aspect of the present invention is the communication system described above and further includes a third viewer imaging unit that images the third viewer. The image output unit includes: a first viewer imaging unit that captures a first stereo image with the first viewer as the subject; and a free viewpoint image generation unit. The free viewpoint image generation unit detects an image including the first viewer's face from the first stereo image, detects the first viewer's line-of-sight direction, and generates a three-dimensional model including the first viewer's face based on the first stereo image. When the talker selection unit selects the first viewer and the second viewer, it generates and outputs an image of the first viewer viewed from substantially the front. When the talker selection unit selects the first viewer and the third viewer, it detects the direction, about the center of the display surface, from the center of the second display surface to the reference position output by the reference position output unit, generates an image viewed from a position moved from the first viewer's line-of-sight direction in the detected direction, and outputs it as the image of the first viewer viewed from other than the front. The reference position output unit includes: a second viewer imaging unit that captures a second stereo image with the second viewer as the subject; and a reference position calculation unit that detects an image of the second viewer's face, or of a part of the face, from the second stereo image and calculates the second viewer reference position. The talker selection unit detects the frequency of mouth movement of the first viewer, of the second viewer, and of the third viewer from the images captured by the first viewer imaging unit, the second viewer imaging unit, and the third viewer imaging unit, respectively, selects a talker based on the detected frequencies, detects the line-of-sight direction from the image of the selected talker, and selects the other talker based on the detected line-of-sight direction.
As in the communication system described above, when the first viewer and the second viewer are the talkers, this communication system can display to the second viewer an image in which the line of sight coincides with that of the first viewer, and can display a natural image. The second viewer can thus converse while looking at a natural image in which the line of sight coincides with that of the other talker. When the first viewer and the third viewer are the talkers and the second viewer is not, an image of the first viewer facing the direction of the display unit on which the third viewer is displayed is displayed, so the second viewer, who is not a talker, can see a more natural image in which the first viewer, who is a talker, faces the other talker.

According to the present invention, an image in which the line of sight matches that of the other talker can be displayed even when a talker is not directly facing the display surface.

A system configuration diagram showing the schematic configuration of the video conference system 1 in the first embodiment of the present invention.
A plan view of the conference room R1 in which the video conference terminal device 11 in the same embodiment is installed.
A plan view of the conference room R2 in which the video conference terminal device 12 in the same embodiment is installed.
A configuration diagram showing the schematic configuration of the video conference terminal devices 11 and 12 in the same embodiment.
A front view of the display unit 116 in the same embodiment, viewed from in front of the display surface in the direction perpendicular to it.
A front view of the display unit 127 in the same embodiment, viewed from in front of the display surface in the direction perpendicular to it.
A diagram showing the relative coordinates of the reference position calculated by the reference position calculation unit 122 in the same embodiment.
A data configuration diagram showing the structure of the data that the reference position calculation unit 122 inputs to the reference position selection unit 112 in the same embodiment.
A diagram showing the line of sight along which the viewer P11 looks at the image of the viewer P24 on the display unit 116 in the same embodiment.
A data configuration diagram showing the structure of the reference position information that the reference position selection unit 112 inputs to the image expansion/contraction unit 126 in the same embodiment.
A flowchart showing the procedure by which the free viewpoint image generation unit 113 generates the image that would be captured from directly in front of the center of the display surface of the display unit 116 in the same embodiment.
A diagram showing the image expansion and contraction performed by the image expansion/contraction unit 126 in the same embodiment.
A flowchart showing the procedure by which the image expansion/contraction unit 126 expands and contracts an image in the same embodiment.
A diagram showing examples of an image output by the free viewpoint image generation unit 113 and of that image after expansion and contraction by the image expansion/contraction unit 126 in the same embodiment.
A system configuration diagram showing the schematic configuration of the video conference system 2 in the second embodiment of the present invention.
A diagram showing examples of the screens displayed by the display units in the same embodiment.
A flowchart showing the procedure by which the talker selection unit 241 selects a talker in the same embodiment.
A data configuration diagram of the data that the talker selection unit 241 inputs to the free viewpoint image generation units 213, 223, and 233 in the same embodiment.
A diagram showing the angle by which the free viewpoint image generation units 223 and 233 rotate the imaging position when the viewer P31 is determined not to be a talker in the same embodiment.
A diagram showing the imaging positions of the images generated by the free viewpoint image generation units 223 and 233 in the same embodiment.
A diagram showing the image expansion and contraction performed by the image expansion/contraction units 214 and 215 in the same embodiment.
A system configuration diagram showing the schematic configuration of the video conference system 3 in the third embodiment of the present invention.
A diagram showing an example of an image displayed by the display unit 328 in the same embodiment.
A diagram showing an example of an image displayed by the display unit 316 in the same embodiment.
A data configuration diagram showing the structure of the data that the reference position selection unit 312 inputs to the free viewpoint image generation unit 313 in the same embodiment.
A diagram showing the angle by which the free viewpoint image generation unit 313 rotates the imaging position when the viewer P51 is determined to be a talker in the same embodiment.

<First Embodiment>
Embodiments of the present invention are described below with reference to the drawings. The description takes a video conference system as an example of a communication system to which the present invention is applied, but the scope of the invention is not limited to video conference systems. A communication system here means a system that displays an image of one party to a communication to the other party. Examples include a video conference system, an information system that provides information by displaying an image of a guide on an information display, and an education system in which a lesson is given by displaying an image of a teacher on a display.

FIG. 1 is a system configuration diagram showing the schematic configuration of the video conference system 1 according to the first embodiment of the present invention. In the figure, the video conference system 1 includes a video conference terminal device (first terminal device) 11 and a video conference terminal device (second terminal device) 12 connected to each other by a communication network 13. Images captured and sound picked up by the video conference terminal device 11 are sent by a transmission device over the communication network 13 to the video conference terminal device 12, where they are received by a reception device and reproduced by the display unit. Likewise, images captured and sound picked up by the video conference terminal device 12 are sent by a transmission device over the communication network 13 to the video conference terminal device 11, where they are received by a reception device and reproduced by the display unit.

FIG. 2 is a plan view of the conference room R1 in which the video conference terminal device 11 is installed. As described later, the video conference terminal device 11 includes imaging devices 111-1 and 111-2 and a display unit (second display unit) 116. In the figure, the imaging devices 111-1 and 111-2, the display unit 116, and a desk T1 are arranged in the conference room R1, and a viewer (first viewer) P11 is present. The arrangement of the other parts of the video conference terminal device 11 is not shown. They may be placed inside or outside the conference room R1, or they may be built into the housing of the display unit 116.

FIG. 3 is a plan view of the conference room R2 in which the video conference terminal device 12 is installed. As described later, the video conference terminal device 12 includes imaging devices 121-1 and 121-2 and a display unit 127. In the figure, the imaging devices 121-1 and 121-2, the display unit 127, and a desk T2 are arranged in the conference room R2, and viewers P21, P22, P23, P24, and P25 are present. The arrangement of the other parts of the video conference terminal device 12 is not shown. They may be placed inside or outside the conference room R2, or they may be built into the housing of the display unit 127.

FIG. 4 is a configuration diagram showing the schematic configuration of the video conference terminal devices 11 and 12 and the connection between them via the communication network (FIG. 1). In the figure, the video conference system 1 includes the video conference terminal device 11 and the video conference terminal device 12. The video conference terminal device 11 includes an imaging unit (first imaging unit) 111, a reference position selection unit 112, a free viewpoint image generation unit 113, and the display unit 116. The imaging unit 111 includes the imaging devices 111-1 and 111-2. The video conference terminal device 12 includes an imaging unit (second imaging unit) 121, a reference position calculation unit 122, an image expansion/contraction unit 126, and the display unit 127. The imaging unit 121 includes the imaging devices 121-1 and 121-2. The imaging unit 111 may include three or more imaging devices, and so may the imaging unit 121. Increasing the number of imaging devices in the imaging unit 111 or the imaging unit 121, so that a wider range of the viewer who is the subject is captured, allows images of various orientations to be generated when the viewer images described later are generated. In addition, using an image captured from a direction close to the orientation of the image to be generated allows more accurate three-dimensional coordinate data to be calculated, and thus a more accurate image to be generated.

The imaging unit 121 and the reference position calculation unit 122 correspond to the reference position detection unit 123 of the present invention, which detects the position of the viewpoint of a viewer (hereinafter also called the reference position) for one or more viewers including the second viewer, who is a talker. In the present embodiment, the reference position detection unit 123 detects the reference position by having the reference position calculation unit 122 calculate it from the images captured by the imaging unit 121. The way the reference position detection unit detects the reference position is not limited to this method. For example, the reference position detection unit may consist of a reference position calculation unit that obtains the reference position using a position sensor, as described later, or another method may be used.

The reference position detection unit 123 and the reference position selection unit 112 correspond to the reference position output unit of the present invention, which outputs the reference position of the second viewer, who is a talker (hereinafter also called the second viewer reference position). In the present embodiment, the reference position output unit detects the second viewer reference position by having the reference position selection unit 112 select the reference position of the second viewer from among the reference positions detected by the reference position detection unit 123.
The imaging unit 111 and the free viewpoint image generation unit 113 correspond to the image output unit 114 of the present invention, which images the first viewer and outputs an image of the first viewer viewed from the front (hereinafter also called the front image). In the present embodiment, the image output unit 114 outputs the front image by having the free viewpoint image generation unit 113 generate the front image of the viewer P11 (first viewer) from the images captured by the imaging unit 111 and output it. The way the image output unit outputs the front image is not limited to this method. For example, the image output unit may consist of an imaging device that, as described later, captures an image from directly in front of the center of the display surface of the display unit 116 by an existing method using a half mirror, or another method may be used.
The image output by the image output unit 114 need not show the first viewer from exactly the front. It may be an image viewed from a position offset from the front, as long as the viewer seeing it does not find it unnatural.

The imaging devices 111-1, 111-2, 121-1, and 121-2 are cameras that include a CCD (Charge Coupled Device) and a lens and capture moving images. Any of the imaging devices 111-1, 111-2, 121-1, and 121-2 may instead use another solid-state imaging device, such as a CMOS (Complementary Metal Oxide Semiconductor) sensor, as its imaging element. The display unit 116 may incorporate a camera module serving as the imaging device 111-1 or 111-2, and the display unit 127 may incorporate a camera module serving as the imaging device 121-1 or 121-2.

The display units 116 and 127 each include the display surface of a liquid crystal panel and display images such as moving images on that surface. The display units 116 and 127 may instead include a display surface other than a liquid crystal panel, such as a plasma display panel.
The free viewpoint image generation unit 113 generates the front image based on the images captured by the imaging unit 111.
The reference position calculation unit 122 calculates, based on the images captured by the imaging unit 121, the midpoint of the line connecting the centers of the viewer's two eyes as the reference position. The reference position selection unit 112 selects the reference position of the talker from among the reference positions calculated by the reference position calculation unit 122. Hereinafter, the selected talker is also called the second viewer, and the reference position of the second viewer is also called the second viewer reference position.
When the talker (second viewer) is positioned obliquely to the display unit 127, the image expansion/contraction unit 126 expands and contracts the image so that, when it is displayed on the display unit 127, the viewer in the image appears from the talker's position as if seen from the front.

FIG. 5 is a front view of the display unit 116 viewed from in front of the display surface in the direction perpendicular to it. The display unit 116 displays an image including the viewers P21 to P25. The imaging devices 111-1 and 111-2 are installed at the upper left and upper right ends of the display unit 116.

FIG. 6 is a front view of the display unit 127 viewed from in front of the display surface in the direction perpendicular to it. The display unit 127 displays an image including the viewer P11. The imaging devices 121-1 and 121-2 are installed at the upper left and upper right ends of the display unit 127.

Next, the operation of the video conference system 1 is described.
The imaging unit 121 captures a stereo image (second stereo image) including the viewers P21 to P25. A stereo image here means a plurality of images of a subject captured simultaneously from different angles. The imaging unit 121 captures the stereo image by imaging the viewers P21 to P25 simultaneously with the imaging device 121-1 and the imaging device 121-2. The imaging unit 121 inputs the captured images to the reference position calculation unit 122 and also, via the communication network 13 (FIG. 1), to the display unit 116. Of the images received from the imaging unit 121, the display unit 116 displays the one captured by the imaging device 121-1. Alternatively, a free viewpoint image generation unit may be provided between the imaging unit 121 and the display unit 116 to generate the image that would be captured from the center of the display surface of the display unit 127. In that case, the display unit 116 displays that image, so that when a talker among the viewers P21 to P25 gazes at the display unit 127, the viewer P11 can converse while looking at an image in which the line of sight matches that of the talker. Like the free viewpoint image generation unit 113 described later, this free viewpoint image generation unit generates the image that would be captured from the center of the display surface of the display unit 127 by building a three-dimensional model using the stereo matching method.
The reference position calculation unit 122 calculates the reference positions of the viewers P21 to P25 based on the images received from the imaging unit 121. It calculates the coordinates of each reference position relative to the center of the display surface of the display unit 127, and outputs the calculated reference positions to the reference position selection unit 112 via the communication network 13 (FIG. 1).

FIG. 7 is a diagram showing the relative coordinates of the reference position calculated by the reference position calculation unit 122. The figure shows the reference position P of the viewer P24. As shown in FIG. 7(a), the reference position calculation unit 122 takes the center of the display surface of the display unit 127 as the origin O (0, 0, 0). It takes the straight line extending from the origin O perpendicular to the display surface as the z axis, with the direction out of the front of the display surface as the positive z direction. The display surface of the display unit 127 is a horizontally long rectangle. The reference position calculation unit 122 takes the straight line through the origin O parallel to the long side of the display surface as the x axis, with rightward as seen from the front as the positive x direction, and the straight line through the origin O parallel to the short side as the y axis, with upward as the positive y direction. The reference position calculation unit 122 calculates the coordinates (xp, yp, zp) of the reference position P of the viewer P24, and calculates the reference positions of the other viewers in the same way.

The reference position calculation unit 122 stores, in an internal storage unit (not shown), the angles of view of the imaging devices 121-1 and 121-2 and their positions and orientations relative to the display unit 127. Using the images captured by the imaging devices 121-1 and 121-2, it calculates the coordinates of each viewer's reference position by the stereo matching method.
Specifically, the reference position calculation unit 122 first performs face detection on the image captured by the imaging device 121-1 and the image captured by the imaging device 121-2, by pattern matching on skin color and face shape. When several face images are detected, it treats the pair of face images with the smallest displacement between the two images as images of the same face. The same approach is used below when matching several eyes and the like. Next, the reference position calculation unit 122 detects both eyes within each detected face, based on the color and shape of the dark part of the eye (iris and pupil) and the color and shape of the eyebrows. For the center of each detected eye, it calculates three-dimensional coordinates by triangulation, based on the displacement of the point of interest between the two images and the positional relationship of the cameras. Finally, it calculates the three-dimensional coordinates of the midpoint of the line segment connecting the two eyes and uses them as the coordinates of the reference position.
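The triangulation and midpoint steps above can be illustrated with a minimal sketch. It assumes an idealized, rectified stereo pair (parallel optical axes, the same focal length in pixels, a purely horizontal baseline), which the publication does not specify. The names `triangulate` and `reference_position` and all parameters are hypothetical. The result is expressed in the left camera's frame (image y pointing down); converting it to display-centered coordinates would additionally need the stored camera position and orientation.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def triangulate(u_left: float, u_right: float, v: float,
                focal_px: float, baseline_mm: float,
                cx: float, cy: float) -> Point3:
    """Recover a 3D point (mm, left-camera frame) from one matched pixel pair.

    Assumes rectified cameras, so the match lies on the same image row v.
    (cx, cy) is the principal point; all names here are illustrative.
    """
    disparity = u_left - u_right          # horizontal shift between the two images
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_mm / disparity  # depth from similar triangles
    x = (u_left - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


def reference_position(left_eye: Point3, right_eye: Point3) -> Point3:
    """Midpoint of the segment connecting the two eye centers."""
    return tuple((a + b) / 2.0 for a, b in zip(left_eye, right_eye))
```

A real system would obtain the focal length, baseline, and principal point from camera calibration rather than from hand-picked values.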

The reference position calculation unit 122 may calculate the reference position from any detected image of the face or a part of it. For example, it may detect the inner corners of the eyes by pattern matching based on their shape and take the midpoint of the straight line connecting the two inner corners as the reference position.
The way the reference position calculation unit 122 calculates the reference position is not limited to the stereo matching method described above. For example, a position detection sensor may be used to detect the positions of the viewers P21 to P25, and the center of each detected position may be taken as an approximate reference position, or another method may be used.
The reference position calculation unit 122 may also calculate the coordinates of the reference position P in a format other than the above, such as the polar coordinate format shown in FIG. 7(b). For example, as shown in FIG. 7(b), the reference position calculation unit 122 uses the origin O of FIG. 7(a) as the origin and takes the distance from the origin O to the reference position P as r. It takes the angle about the origin O from the z axis to the point (xp, 0, zp) as φ, with the counterclockwise direction as seen from the positive side of the y axis as positive. It takes the angle about the origin O from the point (xp, 0, zp) to the point P as θ, with rotation from the point (xp, 0, zp) toward the positive side of the y axis as positive. The reference position calculation unit 122 then calculates the coordinates (r, φ, θ) of the reference position P.
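As an illustration of the relationship between the two coordinate formats, the following sketch converts between (x, y, z) and (r, φ, θ). It assumes the axes of FIG. 7(a) form a right-handed frame, so that a counterclockwise rotation seen from the positive y side turns the z axis toward the positive x axis. That handedness is an assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def to_polar(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Convert display-centered Cartesian coordinates to (r, phi, theta).

    phi:   angle in the x-z plane measured from the z axis toward +x (radians).
    theta: elevation of P above the x-z plane, positive toward +y (radians).
    """
    r = math.sqrt(x * x + y * y + z * z)
    phi = math.atan2(x, z)
    theta = math.atan2(y, math.hypot(x, z))
    return r, phi, theta


def to_cartesian(r: float, phi: float, theta: float) -> Tuple[float, float, float]:
    """Inverse of to_polar under the same conventions."""
    horizontal = r * math.cos(theta)      # length of the projection onto the x-z plane
    return (horizontal * math.sin(phi), r * math.sin(theta), horizontal * math.cos(phi))
```

Under these conventions a viewer directly in front of the display center has φ = 0 and θ = 0, and a viewer to the right of center (as seen from the front) has φ > 0.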

基準位置算出部１２２は、上記の顔検出において検出したそれぞれの顔が画像中に占める領域を算出し、各顔検出に基づいて得られる基準位置と対応付けて基準位置選択部１１２に出力する。
図８は基準位置算出部１２２が基準位置選択部１１２に出力するデータのデータ構成を示すデータ構成図である。
同図において、基準位置算出部１２２が基準位置選択部１１２に出力するデータは視聴者の人数分の基準位置の情報と各視聴者の顔が画像中に占める領域を表す情報とを含んで構成される。
基準位置の情報は視聴者の基準位置の座標を示す情報である。表示部上の座標の情報は、視聴者の顔が画像中に占める領域の座標を示す情報である。基準位置算出部１２２は、視聴者の顔が画像中に占める領域のｘ座標の最小値および最大値とｙ座標の最小値および最大値とを出力する。なお、基準位置算出部１２２が、視聴者の顔が画像中に占める領域を表す情報は、ｘ座標の最小値および最大値とｙ座標の最小値および最大値に限らない。例えば、基準位置算出部１２２が視聴者の顔を楕円で近似して中心点の座標と長軸及び短軸の長さを、顔が画像中に占める領域を表す情報としてもよい。あるいは、基準位置算出部１２２が視聴者の顔を四角形より角数の多い多角形の位置で近似した各頂点の座標を、顔が画像中に占める領域を表す情報としてもよい。
The reference position calculation unit 122 calculates the area that each face found by face detection occupies in the image, and outputs it to the reference position selection unit 112 in association with the reference position obtained from that face.
FIG. 8 is a data configuration diagram illustrating the data output from the reference position calculation unit 122 to the reference position selection unit 112.
In the figure, the data output from the reference position calculation unit 122 to the reference position selection unit 112 includes, for each viewer, reference position information and information representing the area occupied by that viewer's face in the image.
The reference position information indicates the coordinates of the viewer's reference position. The coordinate information on the display unit indicates the coordinates of the area occupied by the viewer's face in the image. The reference position calculation unit 122 outputs the minimum and maximum x coordinates and the minimum and maximum y coordinates of that area. The information that the reference position calculation unit 122 uses to represent the area occupied by the viewer's face is not limited to these four values. For example, the reference position calculation unit 122 may approximate the viewer's face with an ellipse and use the coordinates of its center point and the lengths of its major and minor axes. Alternatively, the reference position calculation unit 122 may approximate the viewer's face with a polygon having more vertices than a quadrilateral and use the coordinates of each vertex.
The unit of the reference position data in the figure is millimeters, and the unit of coordinates on the display unit is pixels. The unit of the reference position data is not limited to millimeters; another unit of length, such as inches, may be used. Likewise, the unit of coordinates on the display unit is not limited to pixels. For example, the length of the horizontal side of the display surface may be taken as 1, and the horizontal coordinate may be expressed as the relative distance from the left edge of the display surface.
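The per-viewer data described for FIG. 8 can be pictured as a small record type. The following is only a minimal sketch: the names `FaceRegion` and `ViewerRecord`, the field names, and the sample values are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceRegion:
    """Area occupied by a face in the image, in pixels (hypothetical layout)."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    def contains(self, x: float, y: float) -> bool:
        # True when the point lies inside the bounding rectangle (edges included).
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class ViewerRecord:
    """One viewer's entry: a reference position paired with a face region."""
    reference_position_mm: Tuple[float, float, float]  # (x, y, z) in millimeters
    face_region: FaceRegion


# Illustrative values only; they are not taken from FIG. 8.
record = ViewerRecord((300.0, 0.0, 1500.0), FaceRegion(100, 180, 40, 140))
```

Keeping the reference position and the face region in one record mirrors the association between the two that the reference position calculation unit 122 is described as outputting.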

The imaging unit 111 captures an image of the viewer P11 and inputs it to the reference position selection unit 112 and the free viewpoint image generation unit 113. The imaging unit 111 captures a stereo image (first stereo image) of the viewer P11 using the imaging devices 111-1 and 111-2. From the reference positions received from the reference position calculation unit 122 via the communication network 13 (FIG. 1), the reference position selection unit 112 selects the one reference position corresponding to the talker and outputs it to the image expansion/contraction unit 126. The reference position selection unit 112 detects the line-of-sight direction of the viewer P11 from the image received from the imaging unit 111, determines that the viewer at whom the viewer P11 is looking is the talker, and selects the reference position corresponding to this talker.
Specifically, the reference position selection unit 112 first detects the face of the viewer P11 in the image captured by the imaging unit 111, and then detects the eyes within the detected face. The reference position selection unit 112 detects the line-of-sight direction of the viewer P11 from the detected face direction and eye positions. The face direction is determined, for example, from the relative sizes of the areas of the left and right eyes. The reference position selection unit 112 detects the intersection of the detected line of sight with the display surface of the display unit 116 as the point on the display unit 116 at which the viewer P11 is looking. Then, from the information received from the reference position calculation unit 122 on the areas that the faces of the viewers P21 to P25 occupy in the image, the reference position selection unit 112 selects the area that contains this point, thereby selecting the viewer at whom the viewer P11 is looking as the talker. The reference position selection unit 112 outputs the reference position associated with the selected area to the image expansion/contraction unit 126 via the communication network 13 as the reference position corresponding to the talker.
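The selection step described above amounts to intersecting the line of sight with the display plane and testing which face area contains the resulting point. The sketch below illustrates this under stated assumptions: the display surface is taken as the plane z = 0, the gaze point and the face areas are expressed in the same coordinate units, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Region = Tuple[float, float, float, float]   # (x_min, x_max, y_min, y_max)
Position = Tuple[float, float, float]        # reference position (x, y, z)


def gaze_point_on_display(eye: Position, direction: Position) -> Optional[Tuple[float, float]]:
    """Intersect the line of sight with the display plane z = 0 (assumed frame)."""
    ex, ey, ez = eye
    dx, dy, dz = direction
    if ez <= 0 or dz >= 0:
        return None  # the viewer is not in front of, or not looking toward, the display
    t = -ez / dz     # parameter at which eye + t * direction reaches z = 0
    return (ex + t * dx, ey + t * dy)


def select_reference_position(gaze: Tuple[float, float],
                              regions: Sequence[Region],
                              positions: Sequence[Position]) -> Optional[Position]:
    """Return the reference position whose face area contains the gaze point."""
    gx, gy = gaze
    for (x_min, x_max, y_min, y_max), position in zip(regions, positions):
        if x_min <= gx <= x_max and y_min <= gy <= y_max:
            return position
    return None  # the viewer is not looking at any displayed face


# Illustrative values only.
regions = [(-300.0, -100.0, -50.0, 50.0), (100.0, 300.0, -50.0, 50.0)]
positions = [(-400.0, 0.0, 1500.0), (400.0, 0.0, 1500.0)]
gaze = gaze_point_on_display((0.0, 0.0, 1000.0), (0.2, 0.0, -1.0))
selected = select_reference_position(gaze, regions, positions)
```

Returning `None` when no area contains the gaze point is a design choice of this sketch; the specification does not state what happens in that case.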

FIG. 9 is a diagram illustrating the line of sight of the viewer P11 looking at the image of the viewer P24 on the display unit 116. In the figure, images of the viewers P21 to P25 are displayed on the display unit 116, and the viewer P11 is looking at the viewer P24. The viewer P11 is imaged by the imaging devices 111-1 and 111-2 installed on the display unit 116. Using the images captured by the imaging devices 111-1 and 111-2, the reference position selection unit 112 detects the line-of-sight direction of the viewer P11 as described above and selects the reference position.

FIG. 10 is a data configuration diagram showing the data configuration of the reference position information output from the reference position selection unit 112 to the image expansion/contraction unit 126. In the figure, the reference position information output from the reference position selection unit 112 to the image expansion/contraction unit 126 consists of the coordinates of one reference position. As will be described later, the image expansion/contraction unit 126 expands or contracts the image received from the imaging unit 111 based on this reference position.
The method by which the reference position selection unit 112 selects the one reference position corresponding to the talker is not limited to the line-of-sight detection described above. For example, the display unit 116 may be a touch panel, and the viewer P11 may select the other talker by touching the position on the display unit 116 where that viewer is displayed. The reference position selection unit 112 then selects the reference position corresponding to the selected viewer. Alternatively, the reference position selection unit 112 may display a cursor on the display unit 116; when the viewer P11 moves the cursor with a remote controller or the like to select the other talker, the reference position selection unit 112 selects the reference position corresponding to the selected talker. Further, when any of the viewers P21 to P25 speaks to the viewer P11, the reference position selection unit 112 may identify the viewer who is speaking by voice recognition or by detecting mouth movement, and select the identified viewer as the talker. Alternatively, a microphone may be provided for each of the viewers P21 to P25, and the reference position selection unit 112 may select the talker by detecting that a viewer has switched on a microphone in order to speak.

Based on the image captured by the imaging unit 111, the free viewpoint image generation unit 113 generates the image that would be obtained by capturing from directly in front of the center of the display surface of the display unit 116. The free viewpoint image generation unit 113 generates this image using an existing method for generating an arbitrary-viewpoint image from a plurality of images.
Specifically, the free viewpoint image generation unit 113 uses the image captured by the imaging device 111-1 and the image captured by the imaging device 111-2 to calculate, by the stereo matching method, the three-dimensional coordinates of each point on the viewer P11 (hereinafter also referred to as a point of interest). In doing so, the free viewpoint image generation unit 113 sets a point of interest on the image captured by the imaging device 111-1 and determines the point corresponding to it on the image captured by the imaging device 111-2 (hereinafter also referred to as a corresponding point) by the following method. First, based on the positional relationship between the imaging devices 111-1 and 111-2, the free viewpoint image generation unit 113 sets a search range for the corresponding point on the image captured by the imaging device 111-2. For each pixel in the search range, the free viewpoint image generation unit 113 pairs that pixel and its surrounding pixels with the point of interest and its surrounding pixels, and calculates the sum of the brightness differences between the paired pixels. The free viewpoint image generation unit 113 takes as the corresponding point the pixel in the search range for which this sum of brightness differences is smallest. Note that the free viewpoint image generation unit 113 may determine the point of interest and the corresponding point by another method, such as extracting features like intensity edges from the images.
The free viewpoint image generation unit 113 calculates the three-dimensional coordinates of the point of interest by the stereo matching method.
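The corresponding-point search described above, which picks the pixel with the smallest sum of brightness differences over a neighborhood, can be sketched as follows. This is an illustration under assumptions not stated in the specification: the two images are grayscale, the search range lies on the same image row (as for rectified parallel cameras), and the window is a square of half-width `half`. The function names are hypothetical.

```python
from typing import Iterable, Sequence

Image = Sequence[Sequence[int]]  # rows of brightness values


def sum_abs_diff(img1: Image, img2: Image, x1: int, y1: int,
                 x2: int, y2: int, half: int) -> int:
    """Sum of absolute brightness differences between two square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(img1[y1 + dy][x1 + dx] - img2[y2 + dy][x2 + dx])
    return total


def find_corresponding_x(img1: Image, img2: Image, x: int, y: int,
                         search_xs: Iterable[int], half: int = 1) -> int:
    """Return the column in img2 (same row y) that best matches (x, y) in img1."""
    return min(search_xs,
               key=lambda cx: sum_abs_diff(img1, img2, x, y, cx, y, half))


# Toy data: the pattern 10, 50, 90 appears two columns further left in img2.
row1 = [0, 0, 0, 0, 10, 50, 90, 0, 0, 0]
row2 = [0, 0, 10, 50, 90, 0, 0, 0, 0, 0]
img1 = [row1, row1, row1]
img2 = [row2, row2, row2]
match_x = find_corresponding_x(img1, img2, x=5, y=1, search_xs=range(1, 9))
disparity = 5 - match_x
```

The callers must keep the window inside both images; here the search range `range(1, 9)` does so for a window of half-width 1.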

Next, the free viewpoint image generation unit 113 constructs a three-dimensional model of the viewer P11 based on the calculated three-dimensional coordinates. Like the reference position selection unit 112, the free viewpoint image generation unit 113 also detects the line-of-sight direction of the viewer P11. Based on the constructed three-dimensional model, the free viewpoint image generation unit 113 generates an image of the viewer P11 as seen from the line-of-sight direction of the viewer P11. In doing so, the free viewpoint image generation unit 113 generates the image so that the reference position of the viewer P11 is at the center of the image. The free viewpoint image generation unit 113 inputs the generated image to the image expansion/contraction unit 126 via the communication network 13 (FIG. 1).

FIG. 11 is a flowchart illustrating the processing procedure by which the free viewpoint image generation unit 113 generates an image of the viewer P11 as seen from the line-of-sight direction of the viewer P11. When the video conference terminal device 11 starts up, the free viewpoint image generation unit 113 begins this processing.
In step S1, the free viewpoint image generation unit 113 determines whether an image has been input from the imaging unit 111. For example, the imaging unit 111 inputs the image to the free viewpoint image generation unit 113 as frame data with a predetermined header, and the free viewpoint image generation unit 113 determines that an image has been input when it detects this header. If it determines that an image has been input (step S1: YES), the process proceeds to step S2. If it determines that no image has been input (step S1: NO), step S1 is repeated.
In steps S2 to S4, the free viewpoint image generation unit 113 calculates the position of each part of the viewer P11 by the stereo matching method. In step S2, the free viewpoint image generation unit 113 extracts points of interest common to the images, received from the imaging unit 111, that the imaging devices 111-1 and 111-2 each captured. The extracted points of interest include points on the image of the viewer P11. In step S3, for each point of interest, the free viewpoint image generation unit 113 calculates the parallax between the image captured by the imaging device 111-1 and the image captured by the imaging device 111-2. In step S4, the free viewpoint image generation unit 113 calculates the three-dimensional coordinates of each point of interest by triangulation based on the calculated parallax.
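Step S4 can be illustrated with the standard triangulation relation for two parallel, rectified cameras, in which depth equals focal length times baseline divided by parallax. The specification does not state the camera geometry, so the parallel-camera model, the parameter names, and the sample numbers below are all assumptions of this sketch.

```python
from typing import Tuple


def triangulate(x_px: float, y_px: float, disparity_px: float,
                focal_px: float, baseline_mm: float,
                cx: float, cy: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in millimeters from a pixel and its disparity.

    Assumes two parallel, rectified pinhole cameras; (cx, cy) is the
    principal point and focal_px the focal length in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_mm / disparity_px   # depth from similar triangles
    x = (x_px - cx) * z / focal_px              # back-project the pixel offset
    y = (y_px - cy) * z / focal_px
    return (x, y, z)


# Illustrative numbers only.
point = triangulate(x_px=328.0, y_px=240.0, disparity_px=40.0,
                    focal_px=800.0, baseline_mm=100.0, cx=320.0, cy=240.0)
```

With these sample values the point lies 2000 mm from the cameras and 20 mm to the side of the optical axis.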

In step S5, the free viewpoint image generation unit 113 generates a three-dimensional model of the viewer P11 based on the calculated three-dimensional coordinates. In step S6, the free viewpoint image generation unit 113 applies known texture mapping to give texture to the surface of the generated three-dimensional model. In step S7, the free viewpoint image generation unit 113 detects the line-of-sight direction of the viewer P11. In step S8, based on the texture-mapped three-dimensional model, the free viewpoint image generation unit 113 generates the image that would be obtained by capturing from the line-of-sight direction of the viewer P11. In step S9, the free viewpoint image generation unit 113 inputs the generated image to the image expansion/contraction unit 126. The process then returns to step S1.

Note that the reference position selection unit 112 may input to the free viewpoint image generation unit 113 the coordinates of the point on the display unit 116 at which the viewer P11 is looking, and the free viewpoint image generation unit 113 may generate the image of the viewer P11 as captured from that point. This eliminates the processing in which the free viewpoint image generation unit 113 detects the line-of-sight direction of the viewer P11. Alternatively, the free viewpoint image generation unit 113 may generate the image as seen from the center of the display surface of the display unit 116. In that case, when the viewer P11 is looking near the center of the display surface, the free viewpoint image generation unit 113 can generate an image seen from the line-of-sight direction of the viewer P11.
An imaging device may also capture the image from directly in front of the center of the display surface by an existing method that uses a half mirror. For example, the display unit 116 includes a half mirror on its display surface. This half mirror reflects, toward the front of the display surface, the image projected by a projection unit installed below the display surface. The display unit 116 thereby displays the image on the display surface. The display unit 116 also includes an imaging device at the center of the display surface, behind the half mirror, and this imaging device captures the image from directly in front of the center of the display surface. Thus, when the viewer P11 is looking at the display surface, this imaging device captures the viewer's image from the front. When this existing half-mirror method is used, the video conference system 1 need not include the free viewpoint image generation unit 113.

The image expansion/contraction unit 126 expands or contracts the image received from the free viewpoint image generation unit 113 so that, from the position of the viewer who is the talker, the image appears as it would when facing the display surface of the display unit 127 directly. First, based on the reference position (xp, yp, zp) received from the reference position selection unit 112, the image expansion/contraction unit 126 calculates the distance r between the origin O and the reference position P shown in FIG. 7(b), the angle θ about the x axis, and the angle φ about the y axis. The image expansion/contraction unit 126 calculates the angle θ taking clockwise rotation, as seen from the positive side of the x axis, as positive, and calculates the angle φ taking counterclockwise rotation, as seen from the positive side of the y axis, as positive.
The image expansion/contraction unit 126 calculates r, θ, and φ based on Expression (1). The reference position P lies at distance r from the origin O, rotated by θ in the vertical direction and by φ in the horizontal direction.
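Expression (1) is not reproduced in this text. For the planar case with no offset in the y direction, the later description places P at (r sin φ, r cos φ) in the xz plane, which fixes r and φ as shown in the sketch below. The function name is hypothetical, and the angle θ is deliberately left out because the three-dimensional convention of Expression (1) cannot be confirmed from this text.

```python
import math
from typing import Tuple


def polar_from_reference_2d(xp: float, zp: float) -> Tuple[float, float]:
    """Return (r, phi) such that xp = r*sin(phi) and zp = r*cos(phi).

    Planar case only (yp = 0); phi is measured from the z axis toward
    the positive x axis, matching P = (r sin(phi), r cos(phi)).
    """
    r = math.hypot(xp, zp)      # distance from the origin O to P
    phi = math.atan2(xp, zp)    # horizontal angle of P as seen from O
    return r, phi


# Illustrative values: a viewer 1 m to the side and 1 m in front of the display center.
r, phi = polar_from_reference_2d(1000.0, 1000.0)
```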

Next, the image expansion/contraction unit 126 expands or contracts the image received from the free viewpoint image generation unit 113.
FIG. 12 is a diagram illustrating the image expansion/contraction performed by the image expansion/contraction unit 126. The figure is a plan view of the display unit 127 seen from above; the origin O, the x axis, the z axis, the angle φ, and the reference position P are the same as in FIG. 7. The virtual display surface 127′ is a plane perpendicular to the straight line OP. FIG. 12 shows the case where the reference position P is not offset from the origin O in the y direction, so the virtual display surface 127′ appears as a straight line. The point Q is a point on the virtual display surface 127′, and the straight line l is the line passing through the point Q and the point P. The point R is the intersection of the straight line l with the display surface of the display unit 127.
Based on the distance r between the center of the display surface of the display unit 127 and the reference position P, the angle φ, and the size of the display surface of the display unit 127, the image expansion/contraction unit 126 calculates the viewing angle α of the display unit 127 as seen from the reference position P.
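For the planar case of FIG. 12, the viewing angle α can be computed as the angle at P between the directions to the two vertical edges of the display surface. The sketch below assumes that the display surface lies on z = 0, is centered on the origin O, and has horizontal width `width`; the function name and the sample values are illustrative.

```python
import math


def viewing_angle(width: float, r: float, phi: float) -> float:
    """Angle subtended at P = (r sin(phi), r cos(phi)) by a display of the
    given width lying on z = 0 and centered at the origin (planar case)."""
    px, pz = r * math.sin(phi), r * math.cos(phi)
    # Angles of the two display edges, measured at P from the direction
    # that points straight at the display plane.
    angle_right = math.atan2(width / 2 - px, pz)
    angle_left = math.atan2(-width / 2 - px, pz)
    return angle_right - angle_left


alpha_front = viewing_angle(width=2.0, r=2.0, phi=0.0)
alpha_oblique = viewing_angle(width=2.0, r=2.0, phi=math.pi / 3)
```

As expected, the display subtends a smaller angle when seen obliquely than when seen from directly in front at the same distance.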

Next, the image expansion/contraction unit 126 calculates the shape of the virtual display surface 127′, which contains the origin O and is perpendicular to the straight line OP. The virtual display surface 127′ is a display surface whose viewing angle, as seen from the reference position P, coincides with that of the display surface of the display unit 127.
The image expansion/contraction unit 126 compares the length of each side of the display surface of the display unit 127 with the length of the corresponding side of the virtual display surface 127′ and determines which side of the virtual display surface 127′ is most enlarged relative to the display surface of the display unit 127. In the case of FIG. 12, the image expansion/contraction unit 126 determines that, of the sides parallel to the y axis, the side on the positive x side is the most enlarged.
The image expansion/contraction unit 126 enlarges the image received from the free viewpoint image generation unit 113, preserving its aspect ratio, by the enlargement ratio of the side determined to be most enlarged. The image expansion/contraction unit 126 then associates the enlarged image with the virtual display surface 127′ by aligning the center of the enlarged image with the origin O and aligning the most enlarged side of the virtual display surface 127′ with the corresponding side of the enlarged image. From this associated image, it generates an image by applying a perspective projection transformation, centered on the reference position P, from the virtual display surface 127′ to the display surface of the display unit 127. Specifically, for each pixel of the image received from the free viewpoint image generation unit 113, the image expansion/contraction unit 126 calculates the position associated with that pixel on the virtual display surface 127′ by the above association. Further, for each such pixel, the image expansion/contraction unit 126 calculates the position associated with it on the display surface of the display unit 127 by the mapping described below. Based on these positions, the image expansion/contraction unit 126 associates the pixels of the image from the free viewpoint image generation unit 113 with the pixels of the display unit 127. The image expansion/contraction unit 126 generates the image to be displayed by the display unit 127 from this pixel association and inputs it to the display unit 127.
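The comparison of side lengths can be illustrated for the planar case of FIG. 12. A vertical side of the display surface and its counterpart on the virtual display surface 127′ are parallel and lie on the same rays from P, so their length ratio equals the ratio of their distances from P. The sketch below computes that ratio for the two vertical sides. It assumes a display of width `width` centered on the origin O on z = 0 and a viewer in front of the display; the function names and the side labels are hypothetical.

```python
import math
from typing import Tuple


def vertical_side_ratio(x_edge: float, r: float, phi: float) -> float:
    """Ratio |PQ| / |PR| for the display edge R = (x_edge, 0), where Q is the
    point at which the ray from P through R meets the virtual display surface
    x*sin(phi) + z*cos(phi) = 0 (the line through O perpendicular to OP)."""
    px, pz = r * math.sin(phi), r * math.cos(phi)
    dx, dz = x_edge - px, -pz                      # direction from P to R
    denom = dx * math.sin(phi) + dz * math.cos(phi)
    # Q = P + t * (R - P); t is the distance ratio |PQ| / |PR|.
    return -(px * math.sin(phi) + pz * math.cos(phi)) / denom


def most_enlarged_side(width: float, r: float, phi: float) -> Tuple[str, float]:
    """Return which vertical side is most enlarged, with its ratio."""
    ratios = {
        "x_positive": vertical_side_ratio(width / 2, r, phi),
        "x_negative": vertical_side_ratio(-width / 2, r, phi),
    }
    side = max(ratios, key=ratios.get)
    return side, ratios[side]


# Illustrative values: display 400 mm wide, viewer 1000 mm away at phi = 30 degrees.
side, ratio = most_enlarged_side(width=400.0, r=1000.0, phi=math.pi / 6)
```

For a viewer on the positive x side, the sketch selects the side on the positive x side, in agreement with the determination described for FIG. 12.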

In the case of the figure, the virtual display surface 127′ passes through the origin O and is inclined at the angle φ with respect to the display surface of the display unit 127. Therefore, in the xz plane of the figure, the virtual display surface 127′ is the straight line represented by Expression (2).

The straight line connecting the point Q(q, −q tan φ) on the virtual display surface and the point P(r sin φ, r cos φ) is represented by Expression (3).
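Expressions (2) and (3) are not reproduced in this text. The following is a reconstruction from the stated coordinates of P and Q, offered as an aid to reading; the published expressions may be written in a different but equivalent form. The first line is the line through the origin perpendicular to OP, and the second is the line through Q and P.

```latex
\begin{align}
  z &= -x \tan\varphi
    && \text{(virtual display surface; satisfied by } Q = (q,\, -q\tan\varphi)\text{)} \\
  z &= -q\tan\varphi + \frac{r\cos\varphi + q\tan\varphi}{r\sin\varphi - q}\,(x - q)
    && \text{(line through } Q \text{ and } P = (r\sin\varphi,\, r\cos\varphi)\text{)}
\end{align}
```

Substituting x = q into the second line gives z = −q tan φ, and substituting x = r sin φ gives z = r cos φ, so the line passes through both Q and P as required.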

The intersection of this straight line with the display surface of the display unit 127 is the point to be projected. Since it is the intersection of Expression (3) with z = 0, the projection point R is R(rq / (r cos²φ + q sin φ), 0).
The image expansion/contraction unit 126 expands or contracts the image received from the free viewpoint image generation unit 113 by projecting the image from the point Q(q, −q tan φ) to the point R(rq / (r cos²φ + q sin φ), 0). Specifically, the pixel value of the pixel in the image received from the free viewpoint image generation unit 113 that is associated with the point Q as described above is input to the display unit 127 as the value of the pixel on the display unit 127 associated with the point R.
The above is the processing for the two-dimensional case (y = 0). In the three-dimensional case, the image expansion/contraction unit 126 likewise generates the image obtained by projecting the image from the virtual display surface 127′ onto the display surface of the display unit 127.
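The two-dimensional mapping from Q to R can be checked numerically. The sketch below intersects the line through P and Q with the display surface z = 0 and compares the result with the closed form rq / (r cos²φ + q sin φ) given in the text. The function names and the sample values are illustrative.

```python
import math


def project_to_display(q: float, r: float, phi: float) -> float:
    """x coordinate of R, the point where the line through
    P = (r sin(phi), r cos(phi)) and Q = (q, -q tan(phi)) meets z = 0."""
    px, pz = r * math.sin(phi), r * math.cos(phi)
    qx, qz = q, -q * math.tan(phi)
    if math.isclose(pz, qz):
        raise ValueError("line PQ is parallel to the display surface")
    t = pz / (pz - qz)            # parameter where P + t*(Q - P) reaches z = 0
    return px + t * (qx - px)


def closed_form(q: float, r: float, phi: float) -> float:
    """The projection point's x coordinate as stated in the text."""
    return r * q / (r * math.cos(phi) ** 2 + q * math.sin(phi))


# Illustrative values: viewer 1000 mm away at phi = 30 degrees, q = 100 mm.
x_r = project_to_display(100.0, 1000.0, math.pi / 6)
```

For these values both computations give approximately 125 mm, and for φ = 0 the mapping reduces to the identity, since the virtual display surface then coincides with the display surface.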

FIG. 13 is a flowchart illustrating the processing procedure by which the image expansion/contraction unit 126 expands or contracts an image. When the video conference terminal device 12 starts up, the image expansion/contraction unit 126 begins this processing.
In step S21, the image expansion/contraction unit 126 determines whether an image has been input from the free viewpoint image generation unit 113. If it determines that an image has been input (step S21: YES), the process proceeds to step S22. If it determines that no image has been input (step S21: NO), step S21 is repeated.
In step S22, the image expansion/contraction unit 126 calculates the viewing angle of the display surface of the display unit 127 as seen from the reference position P. In step S23, the image expansion/contraction unit 126 calculates the outline of the virtual display surface 127′. In step S24, the image expansion/contraction unit 126 enlarges the image received from the free viewpoint image generation unit 113 based on the calculated outline of the virtual display surface 127′.
In step S25, the image expansion/contraction unit 126 generates the image obtained by projecting from the virtual display surface 127′ onto the display unit 127 with the reference position P as the center of projection. In step S26, the image expansion/contraction unit 126 inputs the generated image to the display unit 127.

FIG. 14 is a diagram illustrating an example of the image output by the free viewpoint image generation unit 113 and of the image after expansion/contraction by the image expansion/contraction unit 126.
The free viewpoint image generation unit 113 inputs to the image expansion/contraction unit 126 an image of the viewer P11 seen from the front, as in FIG. 14(a). When the image that the image expansion/contraction unit 126 has expanded or contracted and the display unit 127 displays is viewed from directly in front of the display unit 127, the frontal image of the viewer P11 appears deformed, as in FIG. 14(b). Viewed from the position of the viewer who is the talker, it appears as an image of the viewer P11 seen from the front, as in FIG. 14(c). This figure shows the image that the image expansion/contraction unit 126 calculated based on the talker's reference position, as described with reference to FIG. 12, seen from that reference position, which is on the right when facing the screen. For this reason, the right side of the display surface appears longer than the left side.
As in FIG. 14(c), an image of the viewer P11 seen from the front is visible from the position of the viewer who is the talker, so that viewer can converse while looking at an image in which his or her line of sight meets that of the viewer P11, the other talker.

As described above, in the video conference system 1, the free viewpoint image generation unit 113 generates a frontal image, and the image expansion/contraction unit 126 converts this frontal image so that, when viewed from the second viewer reference position, it appears as an image of the viewer P11 seen from the front. Therefore, a talker who views the display unit 127 from the second viewer reference position can converse while looking at an image in which his or her line of sight meets that of the viewer P11, the other talker. In addition, the talker converses while seeing, not the vertically elongated image of the viewer P11 that would otherwise appear when the display unit 127 is viewed obliquely, but the natural image that appears when the display unit 127 is viewed from the front.
The arrangement of the units in the video conference system 1 is not limited to that in FIG. 4. The reference position selection unit 112, the free viewpoint image generation unit 113, the reference position calculation unit 122, and the image expansion/contraction unit 126 may each be included in either the video conference terminal device 11 or the video conference terminal device 12, or in a device separate from both.
For example, the reference position calculation unit 122 may be included in the video conference terminal device 11 and receive images from the imaging unit 121 via the communication network.

<Second Embodiment>
FIG. 15 is a system configuration diagram showing a schematic configuration of the video conference system 2 in the second embodiment of the present invention. In the figure, the video conference system 2 includes a video conference terminal device (first terminal device) 21, a video conference terminal device (second terminal device) 22, a video conference terminal device (third terminal device) 23, and a talker selection device 24. The video conference terminal device 21, the video conference terminal device 22, the video conference terminal device 23, and the talker selection device 24 are connected to one another by a communication network. The video conference terminal device 21 includes an imaging unit (first viewer imaging unit) 211, a reference position calculation unit 212, a free viewpoint image generation unit 213, image expansion/contraction units 214 and 215, and display units 216 and 217. The video conference terminal device 22 includes an imaging unit (second viewer imaging unit) 221, a reference position calculation unit 222, a free viewpoint image generation unit 223, image expansion/contraction units 224 and 225, and display units 226 and 227. The video conference terminal device 23 includes an imaging unit (third viewer imaging unit) 231, a reference position calculation unit 232, a free viewpoint image generation unit 233, an image expansion/contraction unit (first image expansion/contraction unit) 234, an image expansion/contraction unit (second image expansion/contraction unit) 235, a display unit (first display unit) 236, and a display unit (second display unit) 237. The talker selection device 24 includes a talker selection unit 241. The imaging unit 211 includes imaging devices 211-1 and 211-2. The imaging unit 221 includes imaging devices 221-1 and 221-2. The imaging unit 231 includes imaging devices 231-1 and 231-2.

The imaging unit 221 and the reference position calculation unit 222 correspond to the reference position output unit of the present invention, which outputs the second viewer reference position, that is, the reference position of a talker. In this embodiment, the imaging unit 221 images the second viewer, who is a talker, and the reference position calculation unit 222 calculates the second viewer's reference position from that image; in this way the reference position output unit detects and outputs the second viewer reference position.
The imaging unit 211 and the free viewpoint image generation unit 213 correspond to the image output unit of the present invention, which images the first viewer (described later) and outputs a front image. In this embodiment, the free viewpoint image generation unit 213 generates a front image from the images captured by the imaging unit 211 and outputs it; in this way the image output unit outputs the front image. The way in which the image output unit outputs the front image is not limited to this method; other methods, such as the one used in the video conference system 1, may be used.

The imaging devices 211-1, 211-2, 221-1, 221-2, 231-1, and 231-2 are the same as the imaging device 111-1 and the other imaging devices in FIG. 1.
The reference position calculation units 212, 222, and 232 calculate reference positions in the same manner as the reference position calculation unit 122 in FIG. 1. In the video conference system 2, however, when each video conference terminal device is used by a single viewer, each of the reference position calculation units 212, 222, and 232 calculates one reference position. The case in which one video conference terminal device is used by a plurality of viewers is described later.
The free viewpoint image generation unit 213 generates, based on the images captured by the imaging unit 211, an image of the viewer as seen from the viewer's line-of-sight direction. It also generates an image of the viewer as seen from a position moved away from the viewer's line-of-sight direction. Details are described later. The free viewpoint image generation units 223 and 233 are the same as the free viewpoint image generation unit 213.
The image expansion/contraction unit 214 expands or contracts the image to be displayed on the display unit 216 in the same manner as the image expansion/contraction unit 126 in FIG. 1. The image expansion/contraction units 215, 224, 225, 234, and 235 are the same as the image expansion/contraction unit 214.
The display units 216, 217, 226, 227, 236, and 237 are the same as the display unit 127 in FIG. 1.
The talker selection unit 241 selects talkers based on the images input from the imaging units 211, 221, and 231. Details are described later.

FIG. 16 is a diagram showing examples of the screens displayed by the display units.
In FIG. 16(a), the viewer P31 (first viewer) is looking at the display units 216 and 217 installed in the conference room R21. The display unit 216 shows the viewer P32 (second viewer), and the display unit 217 shows the viewer P33 (third viewer). In FIG. 16(b), the viewer P32 is looking at the display units 226 and 227 installed in the conference room R22. The display unit 226 shows the viewer P33, and the display unit 227 shows the viewer P31. In FIG. 16(c), the viewer P33 is looking at the display units 236 and 237 installed in the conference room R23. The display unit 236 shows the viewer P31, and the display unit 237 shows the viewer P32.
FIG. 16 shows an example in which the viewers P32 and P33 are the talkers. In FIG. 16(a), the display units 216 and 217 show the viewers P32 and P33, respectively, so that they face each other. On the display unit 226 in FIG. 16(b), on the other hand, the image of the talker P33 is displayed so that its line of sight meets that of the talker P32, as in the first embodiment. Similarly, on the display unit 237 in FIG. 16(c), the image of the talker P32 is displayed so that its line of sight meets that of the talker P33.
The two display units of each of the video conference terminal devices 21, 22, and 23 (216 and 217, 226 and 227, 236 and 237) may be implemented as separate liquid crystal display devices, or as divided regions of the screen of a single liquid crystal display device.

Next, the operation of the video conference system 2 is described. The following describes the case in which the viewers P32 and P33 are the talkers; the other cases are analogous.
The imaging unit 211 captures a stereo image of the viewer P31 (first stereo image) and inputs it to the reference position calculation unit 212, the free viewpoint image generation unit 213, and the talker selection unit 241. The imaging units 221 and 231 operate likewise. Below, the stereo image of the viewer P32 captured by the imaging unit 221 is also called the second stereo image.
When the image of the viewer P31 is input from the imaging unit 211, the reference position calculation unit 212 calculates the reference position of the viewer P31 and inputs it to the image expansion/contraction units 214 and 215. It also inputs the calculated reference position to the free viewpoint image generation unit 213. The reference position calculation units 222 and 232 operate likewise.
The talker selection unit 241 detects the mouth movements of the viewers P31, P32, and P33 from the images it receives from the imaging units 211, 221, and 231 via the communication network, and selects a talker based on, for example, the frequency of mouth movement (opening and closing of the mouth). It then selects the partner talker by detecting the line of sight of the selected talker. For example, the talker selection unit 241 detects that the viewer P32 is opening and closing the mouth frequently and selects the viewer P32 as a talker. It then detects the line of sight of the viewer P32, finds that it is directed toward the display unit 226, and selects the viewer P33 as the partner talker. The talker selection unit 241 inputs a signal indicating the selected talkers to the free viewpoint image generation units 213, 223, and 233 via the communication network.
The talker selection device 24 may be attached to the video conference terminal device 21 and connected to the other video conference terminal devices 22 and 23 via a communication line.
This embodiment can readily be extended to four or more video conference terminal devices. For example, the video conference system has four video conference terminal devices, each with three display units. The talker selection unit selects, in the same way as above, the viewers of two of the four video conference terminal devices as the talkers. The video conference terminal devices of the viewers not selected as talkers display, in the same way as above, images in which the talkers face each other.
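The selection just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the talker selection unit 241: the `ViewerObservation` record, the reduction of mouth movement to a single rate, and the function name `select_talkers` are hypothetical. The display-to-terminal mapping follows FIG. 16, and the threshold test follows the stored threshold that the description attributes to the talker selection unit 241.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class ViewerObservation:
    mouth_rate: float              # hypothetical measure: mouth open/close events per second
    gazed_display: Optional[int]   # display unit the viewer is looking at, if detected

# Which terminal device's viewer each display unit shows (per FIG. 16).
DISPLAY_SHOWS_TERMINAL = {216: 22, 217: 23, 226: 23, 227: 21, 236: 21, 237: 22}

def select_talkers(observations: Dict[int, ViewerObservation],
                   threshold: float) -> Optional[Tuple[int, int]]:
    """Return (talker terminal, partner terminal), or None if no talker is selected."""
    # Pick the viewer with the most frequent mouth movement.
    talker = max(observations, key=lambda t: observations[t].mouth_rate)
    if observations[talker].mouth_rate < threshold:
        return None  # nobody moves the mouth at or above the threshold
    # The partner is the viewer shown on the display the talker looks at.
    partner = DISPLAY_SHOWS_TERMINAL.get(observations[talker].gazed_display)
    if partner is None:
        return None  # assumption: no selection when the gaze target is unknown
    return (talker, partner)

observations = {
    21: ViewerObservation(mouth_rate=0.1, gazed_display=216),
    22: ViewerObservation(mouth_rate=1.5, gazed_display=226),
    23: ViewerObservation(mouth_rate=0.2, gazed_display=237),
}
print(select_talkers(observations, threshold=1.0))  # (22, 23)
```

With the sample observations, the viewer at terminal device 22 speaks and looks at the display unit 226, which shows the viewer of terminal device 23, matching the P32/P33 example in the text.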

FIG. 17 is a flowchart showing the procedure by which the talker selection unit 241 selects talkers.
The talker selection unit 241 starts the talker selection process when the talker selection device 24 starts up.
In step S41, the talker selection unit 241 selects a talker based on the images received from the imaging units 211, 221, and 231. In step S42, the talker selection unit 241 detects the line of sight of the selected talker. In step S43, the talker selection unit 241 selects the partner talker based on the detected line of sight. In step S44, the talker selection unit 241 inputs a signal indicating the selected talkers to the free viewpoint image generation units 213, 223, and 233. The process then returns to step S41 and repeats.

FIG. 18 is a data configuration diagram of the data that the talker selection unit 241 inputs to the free viewpoint image generation units 213, 223, and 233.
In FIG. 18(a), the two talkers selected by the talker selection unit 241 are indicated by terminal numbers. A terminal number is the identification number of the video conference terminal device that the talker uses. Terminal number 1 indicates the video conference terminal device 21, terminal number 2 indicates the video conference terminal device 22, and terminal number 3 indicates the video conference terminal device 23. Terminal number 0 indicates that no talker is selected.
FIG. 18(b) shows the data that the talker selection unit 241 inputs to the free viewpoint image generation units 213, 223, and 233 when it has not selected any talker.
The talker selection unit 241 stores a threshold for the frequency of mouth movement in an internal storage unit (not shown). If it determines that no viewer moves the mouth at a frequency equal to or higher than the threshold, it does not select a talker. In this case, the talker selection unit 241 inputs the data of FIG. 18(b) to the free viewpoint image generation units 213, 223, and 233.
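As a rough illustration of this data, the signal can be modeled in Python as a pair of terminal numbers. The class name `TalkerSignal`, its field names, and the helper methods are hypothetical, and representing the no-talker case of FIG. 18(b) as two zeros is an assumption; the text only states that terminal number 0 means that no talker is selected.

```python
from typing import NamedTuple

NO_TALKER = 0  # terminal number 0: no talker selected

class TalkerSignal(NamedTuple):
    first: int    # terminal number of one talker (hypothetical field name)
    second: int   # terminal number of the other talker (hypothetical field name)

    @classmethod
    def none(cls) -> "TalkerSignal":
        """Signal sent when no viewer reaches the mouth-movement threshold."""
        return cls(NO_TALKER, NO_TALKER)

    def includes(self, terminal_number: int) -> bool:
        """True if the given terminal's viewer is one of the selected talkers."""
        return terminal_number != NO_TALKER and terminal_number in (self.first, self.second)

signal = TalkerSignal(2, 3)                     # viewers of terminals 2 and 3 are talking
print(signal.includes(2), signal.includes(1))   # True False
print(TalkerSignal.none())                      # TalkerSignal(first=0, second=0)
```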

The talker selection unit 241 may select a talker by a method other than detecting mouth movement, such as voice recognition. The talker and the partner talker may also be selected in still other ways: for example, the display units 216, 217, 226, 227, 236, and 237 may be touch panels, and when the talker selection unit 241 detects that a viewer has touched a position on one of the display units, it selects the talkers according to which display unit was touched.

When the signal indicating the selected talkers is input from the talker selection unit 241, the free viewpoint image generation unit 213 generates images of the viewer P31 based on that signal and inputs them to the image expansion/contraction units 225 and 234.
If the signal received from the talker selection unit 241 contains terminal number 2, the viewer P32 of the video conference terminal device 22 has been selected as a talker. With respect to the video conference terminal device 22, the signal therefore indicates that its viewer is a talker. In this case, the free viewpoint image generation unit 213, like the free viewpoint image generation unit 113 of the first embodiment, generates an image of the viewer P31 captured from the line-of-sight direction of the viewer P31 and inputs it to the image expansion/contraction unit 225.
If, on the other hand, the signal received from the talker selection unit 241 does not contain terminal number 2, the viewer P32 of the video conference terminal device 22 has not been selected as a talker. With respect to the video conference terminal device 22, the signal therefore indicates that its viewer is not a talker. In this case, the free viewpoint image generation unit 213 generates, as described later, an image of the viewer P31 as seen from a position moved away from the line-of-sight direction of the viewer P31, and inputs it to the image expansion/contraction unit 225 via the communication network.
Similarly, if the signal received from the talker selection unit 241 contains terminal number 3, the free viewpoint image generation unit 213 generates an image of the viewer P31 as seen from the line-of-sight direction of the viewer P31 and inputs it to the image expansion/contraction unit 234. If the signal does not contain terminal number 3, the free viewpoint image generation unit 213 generates an image of the viewer P31 as seen from a position moved away from the line-of-sight direction of the viewer P31 and inputs it to the image expansion/contraction unit 234. The free viewpoint image generation units 223 and 233 operate likewise.
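The rule in the preceding paragraphs amounts to a per-destination choice between two kinds of view. A minimal Python sketch, assuming the signal is a pair of terminal numbers as in FIG. 18; the function name `view_modes` and the mode labels `"line_of_sight"` and `"offset"` are hypothetical names introduced for illustration:

```python
from typing import Dict, Tuple

def view_modes(signal: Tuple[int, int], source_terminal: int,
               terminals: Tuple[int, ...] = (1, 2, 3)) -> Dict[int, str]:
    """For the viewer at source_terminal, choose the view sent to each other terminal."""
    modes = {}
    for destination in terminals:
        if destination == source_terminal:
            continue  # a terminal does not display its own viewer
        if destination in signal:
            modes[destination] = "line_of_sight"   # destination's viewer is a talker
        else:
            modes[destination] = "offset"          # destination's viewer is not a talker
    return modes

print(view_modes((2, 3), source_terminal=2))  # {1: 'offset', 3: 'line_of_sight'}
print(view_modes((2, 3), source_terminal=1))  # {2: 'line_of_sight', 3: 'line_of_sight'}
```

The first call corresponds to the free viewpoint image generation unit 223 when the viewers P32 and P33 are the talkers: the image sent to terminal 1 is the offset view, and the image sent to terminal 3 is the line-of-sight view.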

FIG. 19 is a diagram showing the angle by which the viewpoint position of the images generated by the free viewpoint image generation units 223 and 233 is offset from the viewer's line-of-sight direction when the viewer P31 is not a talker, that is, when the signal received from the talker selection unit 241 does not contain terminal number 1.
In the figure, point P is the reference position of the viewer P31, point Q is the reference position of the viewer P32 in the image on the display unit 216, and point R is the reference position of the viewer P33 in the image on the display unit 217. The free viewpoint image generation unit 223 generates its image with the reference position of the viewer P32 at the center of the image, so point Q is the center of the display surface of the display unit 216. Similarly, point R is the center of the display surface of the display unit 217. In the triangle PQR, the angle at vertex Q is α, the angle at vertex R is β, and the angle at vertex P is π - α - β, where π is the ratio of a circle's circumference to its diameter.
The free viewpoint image generation unit 223 stores the position of the center of the display surface of the display unit 216 as the position of point Q, and the position of the center of the display surface of the display unit 217 as the position of point R. The position of point P is the reference position received from the reference position calculation unit 212.
The free viewpoint image generation unit 223 determines the rotation about point Q that takes the direction of point R to the direction of point P, and generates an image of the viewer P32 as seen from a position moved by that rotation from the line-of-sight direction of the viewer P32.
Similarly, the free viewpoint image generation unit 233 determines the rotation about point R that takes the direction of point Q to the direction of point P, and generates an image of the viewer P33 as seen from a position moved by that rotation from the line-of-sight direction of the viewer P33.
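The two rotations can be computed from the positions of P, Q, and R. The Python sketch below works in a top view with two-dimensional coordinates; the coordinate values, the sign convention (counter-clockwise positive), and the function name `rotation_about` are assumptions made for illustration and are not taken from the description.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def rotation_about(center: Point, start: Point, end: Point) -> float:
    """Signed angle in radians (counter-clockwise positive), normalized to [-pi, pi),
    that turns the direction center->start onto the direction center->end."""
    a_start = math.atan2(start[1] - center[1], start[0] - center[0])
    a_end = math.atan2(end[1] - center[1], end[0] - center[0])
    return (a_end - a_start + math.pi) % (2 * math.pi) - math.pi

# Top view, hypothetical coordinates in metres: Q and R are display centers, P is the viewer.
Q, R, P = (-1.0, 0.0), (1.0, 0.0), (0.0, -1.0)
alpha = rotation_about(Q, R, P)   # rotation used for the view of the viewer P32
beta = rotation_about(R, Q, P)    # rotation used for the view of the viewer P33
angle_at_P = math.pi - abs(alpha) - abs(beta)
print(math.degrees(alpha), math.degrees(beta), math.degrees(angle_at_P))
```

In this symmetric layout both angles have a magnitude of 45 degrees and opposite signs, which reflects that the two displayed talkers are turned in opposite directions so that they face each other.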

FIG. 20 is a diagram showing the imaging positions of the images generated by the free viewpoint image generation units 223 and 233. As shown in FIG. 20(a), the free viewpoint image generation unit 223 generates an image of the viewer P32 as seen from a position moved from the line-of-sight direction of the viewer P32 by the angle α toward the left in the figure, that is, from the direction of point R toward the direction of point P in FIG. 19. This angle α is the angle of rotation about point Q from point R to point P. As a result, the image of the viewer P32, who faces point P, is rotated so that the viewer P32 faces point R. Similarly, as shown in FIG. 20(b), the free viewpoint image generation unit 233 generates an image of the viewer P33 as seen from a position obtained by moving the imaging position from the line-of-sight direction L33 of the viewer P33 by the angle β, from the direction of point Q toward the direction of point P.
The free viewpoint image generation units 223 and 233 detect the line-of-sight direction L32 of the viewer P32 and the line-of-sight direction L33 of the viewer P33, respectively, in the same manner as the reference position selection unit 112 in FIG. 1. The free viewpoint image generation unit 223 then synthesizes the above image from the images captured by the imaging devices 221-1 and 221-2 of the imaging unit 221, in the same manner as the free viewpoint image generation unit 113 in FIG. 1. In doing so, the free viewpoint image generation unit 223, like the free viewpoint image generation unit 113, generates the image with the reference position of the viewer P32 aligned with the center of the image.

Depending on the angles from which the imaging units 211 to 231 capture images, it may not be possible to generate an image for an imaging position rotated as described above. For example, if the imaging devices 221-1 and 221-2 of the imaging unit 221 capture the viewer P32 from nearly in front, and the angle α is so large that the imaging position is close to the viewer's side, the image data needed to build the three-dimensional model cannot be obtained and the image cannot be generated. In this case, the imaging unit 221 is additionally provided with imaging devices near the direction from which the image is to be generated. The free viewpoint image generation unit 223 selects two images according to the angle α from among the images received from the imaging unit 221, builds a three-dimensional model from the selected images, and uses this model to generate the above image with the imaging position rotated by the angle α.
Alternatively, a camera may be placed at the above imaging position and used to capture the image. For example, a movable camera that images the viewer P32 is provided. Once the free viewpoint image generation unit 223 has calculated the imaging position, it controls the movable camera so that the camera moves to the calculated imaging position and captures the image. The free viewpoint image generation unit 223 then no longer needs to synthesize an image, which reduces the amount of computation.
The same applies to the free viewpoint image generation units 213 and 233.
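The description says only that two images are selected "according to the angle α". One plausible reading, sketched below in Python, is to pick the two imaging devices whose mounting angles are closest to the requested imaging direction. The nearest-angle criterion, the per-device angle list, and the function name `pick_camera_pair` are assumptions for illustration.

```python
from typing import Sequence, Tuple

def pick_camera_pair(camera_angles_deg: Sequence[float], target_deg: float) -> Tuple[int, int]:
    """Return indices of the two cameras whose angles are closest to target_deg."""
    if len(camera_angles_deg) < 2:
        raise ValueError("a stereo pair needs at least two cameras")
    # Rank camera indices by angular distance from the requested imaging direction.
    ranked = sorted(range(len(camera_angles_deg)),
                    key=lambda i: abs(camera_angles_deg[i] - target_deg))
    first, second = sorted(ranked[:2])
    return (first, second)

# Hypothetical mounting angles, measured from the viewer's frontal direction.
cameras = [-10.0, 10.0, 40.0, 60.0]
print(pick_camera_pair(cameras, 45.0))  # (2, 3)
print(pick_camera_pair(cameras, 0.0))   # (0, 1)
```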

When an image is input from the free viewpoint image generation unit 223, the image expansion/contraction unit 214 expands or contracts the image in the same manner as the image expansion/contraction unit 126 in FIG. 1. Similarly, when an image is input from the free viewpoint image generation unit 233, the image expansion/contraction unit 215 expands or contracts the image.
FIG. 21 is a diagram showing the image expansion and contraction performed by the image expansion/contraction units 214 and 215. As shown in FIG. 21(a), the reference position P of the viewer P31 is rotated by an angle θ from the direction directly facing the display unit 216. As in the first embodiment, the image expansion/contraction unit 214 therefore expands or contracts the image received from the free viewpoint image generation unit 223 by calculating the image obtained by applying to it a perspective projection transformation, centered on the reference position P, from the virtual display surface 216' (which passes through point Q and is perpendicular to the straight line PQ) onto the display unit 216. The image expansion/contraction unit 214 inputs the calculated image to the display unit 216.
Similarly, as shown in FIG. 21(b), the image expansion/contraction unit 215 calculates the image obtained by applying to the image received from the free viewpoint image generation unit 233 a perspective projection transformation, centered on the reference position P, from the virtual display surface 217' onto the display unit 217. The image expansion/contraction unit 215 inputs the calculated image to the display unit 217.
The display unit 216 displays the image received from the image expansion/contraction unit 214, and the display unit 217 displays the image received from the image expansion/contraction unit 215.
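The projection from the virtual display surface onto the real display surface can be illustrated for a single point. The Python sketch below assumes a coordinate system that is not given in the description: the display lies in the plane z = 0 with its center Q at the origin, the viewer reference position P has z > 0 and is not directly above or below Q, and (u, v) are horizontal and vertical coordinates on the virtual surface. The function name `virtual_to_display` is hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _unit(a: Vec3) -> Vec3:
    length = math.sqrt(sum(x * x for x in a))
    return (a[0] / length, a[1] / length, a[2] / length)

def virtual_to_display(u: float, v: float, viewer: Vec3) -> Tuple[float, float]:
    """Project point (u, v) of the virtual surface through the viewer onto the plane z = 0."""
    n = _unit(viewer)                           # normal of the virtual surface (direction Q -> P)
    e_u = _unit(_cross((0.0, 1.0, 0.0), n))     # horizontal axis of the virtual surface
    e_v = _cross(n, e_u)                        # vertical axis of the virtual surface
    x3 = tuple(u * a + v * b for a, b in zip(e_u, e_v))   # 3-D position of the virtual point
    t = viewer[2] / (viewer[2] - x3[2])         # the ray P + t (X - P) meets z = 0 at this t
    return (viewer[0] + t * (x3[0] - viewer[0]), viewer[1] + t * (x3[1] - viewer[1]))

# A viewer directly in front of the display leaves points unchanged.
print(virtual_to_display(0.1, 0.2, (0.0, 0.0, 1.0)))
# A viewer 45 degrees to the side: horizontal offsets are stretched on the real display.
print(virtual_to_display(0.1, 0.0, (1.0, 0.0, 1.0)))
```

Applying this mapping to the corners of the image gives the quadrilateral that the image has to occupy on the real display so that it looks undistorted from P; a full implementation would warp every pixel accordingly.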

As a result, the display unit 216 displays an image in which the viewer P32 appears to be looking at the viewer P33 displayed on the display unit 217. Similarly, the display unit 217 displays an image in which the viewer P33 appears to be looking at the viewer P32 displayed on the display unit 216. The viewer P31 can therefore see a more natural image in which the lines of sight of the two conversing viewers P32 and P33 appear to meet. Moreover, when the talkers change, the talker selection unit 241 selects the new talkers, so the video conference system 2 can display images in which the lines of sight of the conversing viewers meet.
In addition, because the video conference system 2 displays to each talker an image seen from the line-of-sight direction, a talker can converse while viewing an image in which the line of sight matches that of the partner talker, as in the video conference system 1.
The arrangement of the units of the video conference system 2 is not limited to that shown in FIG. 15. The reference position calculation units 212, 222, and 232, the free viewpoint image generation units 213, 223, and 233, and the image expansion/contraction units 214, 215, 224, 225, 234, and 235 may be included in any of the video conference terminal devices 21, 22, and 23 and the talker selection device 24, or in a device separate from these. For example, the talker selection unit 241 may be included in the video conference terminal device 21, in which case the video conference system 2 does not need the talker selection device 24.

One video conference terminal device may also be used by a plurality of viewers. For example, suppose that viewers A and B use the video conference terminal device 21, viewers C and D use the video conference terminal device 22, and viewers E and F use the video conference terminal device 23. When the viewer C and the viewer E are the talkers, the display unit 216 displays the image of the viewer C and the display unit 217 displays the image of the viewer E.
Specifically, the imaging unit 211 captures an image that includes all viewers using the video conference terminal device 21. The imaging units 221 and 231 operate likewise. The reference position calculation unit 212 calculates the reference position of each viewer included in the image input from the imaging unit 211 and, as described with reference to FIG. 8, associates each calculated reference position with coordinates on the display unit. It inputs this information to the talker selection unit 241 as well as to the free viewpoint image generation unit 213 and the image expansion/contraction units 214 and 215. The reference position calculation units 222 and 232 operate likewise.

From among all viewers included in the images input from the imaging units 211, 221, and 231, the talker selection unit 241 selects a talker based on the movement of the viewers' mouths, as in the case described above in which one viewer uses one video conference terminal device. The talker selection unit 241 then detects the talker's line-of-sight direction and calculates the coordinates on the screen at which the talker is looking. Based on the information input from the reference position calculation units 212, 222, and 232, which associates reference positions with coordinates on the display unit, the talker selection unit 241 selects the viewer displayed at that screen position as the other talker. The talker selection unit 241 further associates the selected talkers with the information, input from the reference position calculation units, that associates reference positions with coordinates on the display unit. This association is made, for example, by adding to each viewer's entry in that information a flag indicating whether or not the viewer is a talker, as shown in FIG. 25.
The talker selection unit 241 inputs the generated information, covering all the video conference terminal devices, to the free viewpoint image generation units 213, 223, and 233.
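The partner selection and flagging step might be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function name `flag_partner`, and the nearest-point rule used to match the gaze coordinate to a displayed viewer are hypothetical and are not taken from the patent.

```python
from math import hypot


def flag_partner(viewers, gaze_xy):
    """Return copies of the viewer records with an 'is_talker' flag.

    viewers: list of dicts with 'name' and 'display_xy' (screen coordinates
    at which the viewer is drawn). gaze_xy: screen point the speaker looks at.
    The viewer drawn nearest to gaze_xy is flagged as the partner (assumed rule).
    """
    nearest = min(
        viewers,
        key=lambda v: hypot(v["display_xy"][0] - gaze_xy[0],
                            v["display_xy"][1] - gaze_xy[1]),
    )
    # Copy each record so the input list is left unchanged.
    return [dict(v, is_talker=(v is nearest)) for v in viewers]


# Placeholder screen coordinates for two displayed viewers.
viewers = [
    {"name": "A", "display_xy": (100.0, 200.0)},
    {"name": "B", "display_xy": (400.0, 210.0)},
]
flagged = flag_partner(viewers, gaze_xy=(390.0, 205.0))
```

Here the gaze point lies closest to viewer B, so only B's record carries a true flag.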

The free viewpoint image generation unit 213 generates a three-dimensional model of every viewer included in the stereo image input from the imaging unit 211.
If the information input from the talker selection unit 241 indicates that no talker is among the viewers at the video conference terminal device 21, the free viewpoint image generation unit 213 generates and outputs an image in which all the viewers for whom three-dimensional models were generated face the front. In doing so, it arranges the viewers on the screen according to their actual positions, for example based on their reference positions. The same applies to the free viewpoint image generation units 223 and 233.
On the other hand, when a talker is among the viewers using the video conference terminal device that contains a given free viewpoint image generation unit (for example, the video conference terminal device 21 in the case of the free viewpoint image generation unit 213), that free viewpoint image generation unit 213, 223, or 233 inputs an image of the talker viewed from the front to the video conference terminal device used by the other talker. To a video conference terminal device that has no talker among its viewers, it inputs an image of the talker viewed from a position shifted away from the talker's line-of-sight direction, so that the talker appears to be facing the other talker.

For example, when viewer C, who uses the video conference terminal device 22, and viewer E, who uses the video conference terminal device 23, are the talkers, the free viewpoint image generation unit 223 inputs an image of viewer C viewed from the front to the video conference terminal device 23. For the video conference terminal device 21, on the other hand, it generates an image of viewer C viewed from a position shifted away from viewer C's line-of-sight direction, as for viewer P32 in the case described above in which one viewer uses one video conference terminal device.
Similarly, the free viewpoint image generation unit 233 generates an image of viewer E viewed from a position shifted away from viewer E's line-of-sight direction, as for viewer P33 in the case described above in which one viewer uses one video conference terminal device.
In this way, the free viewpoint image generation units 223 and 233 generate and output images of viewer C and viewer E, respectively, with the imaging positions shifted so that viewer C and viewer E appear to viewer A and viewer B to be facing each other. Here, the reference position calculation unit 212 additionally calculates, as a reference position, the midpoint between the viewpoint position of viewer A and the viewpoint position of viewer B, and inputs it to the free viewpoint image generation units 223 and 233. The free viewpoint image generation units 223 and 233 generate the images of viewer C and viewer E based on the reference position received from the reference position calculation unit 212.
As a result, an image is displayed in which viewer C and viewer E appear to face each other when viewed from the midpoint between the reference position of viewer A and the reference position of viewer B. Because viewer A and viewer B can be assumed to be viewing the display units 216 and 217 from positions relatively close to this midpoint, both see a natural image in which viewer C and viewer E face each other. The same applies when other viewers are the talkers.
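As a small illustration of the shared reference position, the midpoint of two viewpoint positions can be computed as below. The function name, the coordinate convention, and the sample coordinates are placeholders chosen for this sketch, not values from the patent.

```python
def midpoint(p, q):
    """Return the point halfway between two 3D viewpoint positions."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


# Placeholder viewpoint positions of viewer A and viewer B (x, y, z).
viewer_a = (-0.5, 1.25, 2.0)
viewer_b = (0.5, 1.25, 2.0)
shared_reference = midpoint(viewer_a, viewer_b)
```

The result would then be passed to both free viewpoint image generation units as the single reference position for the room.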

Alternatively, in the above case in which a plurality of viewers use one video conference terminal device, the display units 216 and 217 may each include a liquid crystal display that displays different images depending on whether the screen is viewed from the right or from the left. Such liquid crystal displays are in practical use, for example, in car navigation systems, where they show different video to the driver's seat and to the passenger seat.
In this case, the reference position calculation unit 212 calculates the reference position of viewer A and the reference position of viewer B and outputs them to the free viewpoint image generation units 223 and 233. The free viewpoint image generation units 223 and 233 generate an image in which viewer C and viewer E appear to face each other when viewed from the reference position of viewer A, and an image in which they appear to face each other when viewed from the reference position of viewer B, and output these images to the display units 216 and 217, respectively. Based on the reference positions of viewer A and viewer B, the display units 216 and 217 display to viewer A the image in which viewer C and viewer E appear to face each other when viewed from viewer A's reference position, and display to viewer B the image in which they appear to face each other when viewed from viewer B's reference position. As a result, viewer A and viewer B can each see a natural image in which viewer C and viewer E face each other. The same applies when other viewers are the talkers.
The display units 216, 217, 226, 227, 236, and 237 may also include liquid crystal displays that display different images in three or more directions. This makes it possible to display natural images to more viewers than in the two-direction case described above.

<Third Embodiment>
FIG. 22 is a system configuration diagram showing the schematic configuration of the video conference system 3 in the third embodiment of the present invention. In the figure, the video conference system 3 includes a video conference terminal device (first terminal device) 31 and a video conference terminal device (second terminal device) 32. The video conference terminal device 31 and the video conference terminal device 32 are connected to each other by a communication network. The video conference terminal device 31 includes an imaging unit (first imaging unit) 311, a reference position selection unit 312, a free viewpoint image generation unit 313, and a display unit (second display unit) 316. The video conference terminal device 32 includes an imaging unit (second imaging unit) 321, a reference position calculation unit 322, image expansion/contraction units 326 and 327, and a display unit 328. The imaging unit 311 includes imaging devices 311-1 and 311-2. The imaging unit 321 includes imaging devices 321-1 and 321-2.

The imaging unit 321 and the reference position calculation unit 322 correspond to the reference position detection unit of the present invention and detect reference positions for one or more viewers, including the second viewer, who is a talker. In this embodiment, the reference position detection unit detects the reference positions by having the reference position calculation unit 322 calculate them from the image captured by the imaging unit 321. The method by which the reference position detection unit detects reference positions is not limited to this one; other methods, such as the method used in the video conference system 1, may be used.
The reference position detection unit and the reference position selection unit 312 correspond to the reference position output unit of the present invention and output the second viewer reference position, which is the reference position of the second viewer. In this embodiment, the second viewer reference position is obtained by the reference position selection unit 312 selecting the reference position of the second viewer, who is a talker, from among the reference positions detected by the reference position detection unit.
The imaging unit 311 and the free viewpoint image generation unit 313 correspond to the image output unit of the present invention; they image the first viewer and output a front image. In this embodiment, the image output unit outputs the front image by having the free viewpoint image generation unit 313 generate it from the image captured by the imaging unit 311 and output it. The method by which the image output unit outputs the front image is not limited to this one; other methods, such as the method used in the video conference system 1, may be used.
The image expansion/contraction units 326 and 327 correspond to the image expansion/contraction unit of the present invention. They calculate the image obtained by applying, to the front image or to the image with a rotated imaging position described later, a perspective projection transformation from a virtual display surface directly facing the second viewer reference position onto the display surface, centered on the second viewer reference position. In this embodiment, when the image expansion/contraction unit 326 or 327 receives a front image or an image with a rotated imaging position together with the second viewer reference position, it calculates the image obtained by applying this transformation to the received image.
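The geometry of a projection centered on the viewer can be sketched for a single point. This is a minimal illustration, not the patented implementation: it assumes a coordinate frame (introduced here) in which the display surface is the plane z = 0 with its center at the origin, the viewer reference position is in front of it at z > 0, and the virtual display surface passes through the origin perpendicular to the line from the viewer to the origin. The function name `project_to_display` is hypothetical.

```python
def project_to_display(eye, point):
    """Map a 3D point to the display plane z = 0 along the ray from `eye`.

    `eye` stands for the second viewer reference position; `point` lies on
    the virtual display surface that directly faces the eye (assumed frame).
    Returns the (x, y) position on the display plane.
    """
    ex, ey, ez = eye
    px, py, pz = point
    if ez == pz:
        raise ValueError("ray is parallel to the display plane")
    t = ez / (ez - pz)  # ray parameter at which z reaches 0
    return (ex + t * (px - ex), ey + t * (py - ey))


# Viewer sits to the right of the display center, 2 units in front of it.
eye = (2.0, 0.0, 2.0)
# Two points on the virtual surface (z = -x here), equally far from the center.
near_side = project_to_display(eye, (0.5, 0.0, -0.5))
far_side = project_to_display(eye, (-0.5, 0.0, 0.5))
```

With these inputs, the point on the side nearer the viewer lands about 0.8 units from the center, while the point on the far side lands about 1.33 units from it, so the far side of the image is stretched relative to the near side. A full implementation would apply the same mapping to every pixel, typically as a single homography.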

The display unit 316 is the same as the display unit 116 in FIG. 1.
The display unit 328 includes a liquid crystal display that displays different video depending on whether the screen is viewed from the right or from the left. The imaging unit 311 captures a stereo image (first stereo image) of viewer P41 (the first viewer), and the imaging unit 321 captures a stereo image (second stereo image) of viewer P51 (the second viewer) and viewer P52. The imaging devices 311-1, 311-2, 321-1, and 321-2 are the same as the imaging device 111-1 and the others in FIG. 1.
The free viewpoint image generation unit 313 generates, based on the image captured by the imaging unit 311, an image of the viewer as seen from the viewer's line-of-sight direction. In addition, the free viewpoint image generation unit 313 generates an image of the viewer as seen from a position shifted away from the viewer's line-of-sight direction. Details are described later.
The reference position calculation unit 322 calculates reference positions in the same manner as the reference position calculation unit 122 in FIG. 1.
The image expansion/contraction unit 326 is associated with the left-hand direction, as one faces the display unit 328, among the directions in which the display unit 328 displays images, and expands or contracts the image seen when the display unit 328 is viewed from the left, in the same manner as the image expansion/contraction unit 126 in FIG. 1. The image expansion/contraction unit 327 is associated with the right-hand direction, as one faces the display unit 328, and expands or contracts the image seen when the display unit 328 is viewed from the right, in the same manner as the image expansion/contraction unit 126 in FIG. 1.

FIG. 23 is a diagram showing an example of the images displayed by the display unit 328. In the figure, viewers P51 and P52 are viewing the display unit 328 installed in the conference room R32, and viewers P41 and P51 are the talkers. Part (a) of the figure shows the image displayed when the display unit 328 is viewed from the left. Viewer P51 is a talker, and the display unit 328 displays viewer P41 so that viewer P41's line of sight meets that of viewer P51. Part (b) of the figure shows the image displayed when the display unit is viewed from the right. Here the display unit 328 displays an image in which viewer P41 is looking toward viewer P51.
FIG. 24 is a diagram showing an example of an image displayed by the display unit 316. In the figure, viewer P41 is viewing the display unit 316 installed in the conference room R31.

Next, the operation of the video conference system 3 is described. The following describes the case in which viewer P41 and viewer P51 are the talkers. The same applies when viewer P41 and viewer P52 are the talkers.
The imaging unit 321 captures an image of viewers P51 and P52 and inputs it to the reference position calculation unit 322 and, via the communication network, to the display unit 316. The display unit 316 displays the image received from the imaging unit 321.
The reference position calculation unit 322 calculates the reference positions of viewers P51 and P52 based on the image received from the imaging unit 321, in the same manner as the reference position calculation unit 122 in FIG. 1. The reference position calculation unit 322 inputs the calculated reference positions to the reference position selection unit 312.
The imaging unit 311 captures an image of viewer P41 and inputs it to the reference position selection unit 312 and the free viewpoint image generation unit 313.
The reference position selection unit 312 selects, from the reference positions received from the reference position calculation unit 322 via the communication network, the one reference position that corresponds to the talker. Like the reference position selection unit 112 in FIG. 1, the reference position selection unit 312 detects the line of sight of viewer P41 to select the reference position corresponding to the talker. The reference position selection unit 312 attaches a flag indicating the selected reference position to the reference positions received from the reference position calculation unit 322. Furthermore, the reference position selection unit 312 stores in advance the directions in which the display unit 328 displays images, and associates each of these directions with a reference position. The reference position selection unit 312 inputs the reference positions, associated with the display directions, to the free viewpoint image generation unit 313.
When more than one reference position can be associated with a single direction, the reference position selection unit 312 inputs only one of them to the free viewpoint image generation unit 313 in association with that direction. If these reference positions include that of the talker, only the talker's reference position is input in association with the direction. If they do not, the reference position selection unit 312 inputs, for example, only the reference position received first from the reference position calculation unit 322 among those associated with the same direction.
The reference position selection unit 312 also inputs the reference positions that it input to the free viewpoint image generation unit 313 to the image expansion/contraction units 326 and 327. According to the directions in which the display unit 328 displays images, it inputs the reference position associated with the left-hand direction, as one faces the display unit 328, to the image expansion/contraction unit 326, and the reference position associated with the right-hand direction to the image expansion/contraction unit 327.
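The rule for keeping one reference position per display direction (prefer the talker, otherwise the first one received) could be sketched as follows. The tuple layout, the function name `pick_per_direction`, and the viewer identifiers are hypothetical and serve only to illustrate the rule.

```python
def pick_per_direction(candidates):
    """Keep one viewer per display direction.

    candidates: list of (direction, viewer_id, is_talker) in arrival order.
    Returns {direction: viewer_id}. A talker takes priority; otherwise the
    first viewer received for that direction is kept.
    """
    chosen = {}
    for direction, viewer_id, is_talker in candidates:
        if direction not in chosen:
            chosen[direction] = (viewer_id, is_talker)  # first arrival is the default
        elif is_talker and not chosen[direction][1]:
            chosen[direction] = (viewer_id, True)  # a talker overrides a non-talker
    return {d: v for d, (v, _) in chosen.items()}


# Hypothetical input: two viewers per direction, one of them the talker.
assignment = pick_per_direction([
    ("left", "v1", False),
    ("left", "v2", True),
    ("right", "v3", False),
    ("right", "v4", False),
])
```

In this example the left direction is assigned to the talker `v2` even though `v1` arrived first, and the right direction falls back to the first arrival `v3`.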

FIG. 25 is a data configuration diagram showing the structure of the data that the reference position selection unit 312 inputs to the free viewpoint image generation unit 313.
In the figure, the data that the reference position selection unit 312 inputs to the free viewpoint image generation unit 313 consists of the reference positions of two viewers, their coordinates on the display unit, and a flag indicating the selected talker. The reference positions and the coordinates on the display unit are the same as those in FIG. 8. The talker flag takes the value “1” for the talker selected by the reference position selection unit 312 and the value “0” for any viewer other than the selected talker.
Also in the figure, “direction 1” denotes the left-hand direction, as one faces the display unit 328, among the directions in which the display unit 328 displays images, and “direction 2” denotes the right-hand direction. “Viewer 1” is associated with “direction 1”, and “viewer 2” is associated with “direction 2”.
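One possible in-memory form of the FIG. 25 data is sketched below. The class name, field names, and all numeric values are placeholders introduced for illustration; the patent specifies only which items the data contains (direction, viewer, reference position, display coordinates, talker flag).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ViewerEntry:
    direction: int  # 1 = left of the display, 2 = right of the display
    viewer: str
    reference_position: Tuple[float, float, float]  # placeholder room coordinates
    display_xy: Tuple[float, float]  # placeholder coordinates on the display unit
    talker_flag: int  # 1 = selected talker, 0 = otherwise


entries = [
    ViewerEntry(1, "viewer 1", (-0.5, 1.2, 2.0), (320.0, 240.0), 1),
    ViewerEntry(2, "viewer 2", (0.5, 1.2, 2.0), (960.0, 240.0), 0),
]

# The free viewpoint image generation step can look up the talker by flag.
talker = next(e for e in entries if e.talker_flag == 1)
```

Keeping the direction in each entry lets a consumer route the matching image to the expansion/contraction unit for that direction without a separate lookup table.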

Based on the data received from the reference position selection unit 312, the free viewpoint image generation unit 313 distinguishes between the reference position of the viewer located to the left of the front direction of the display surface of the display unit 328 and the reference position of the viewer located to the right.
For the talker's reference position, the free viewpoint image generation unit 313 generates an image of viewer P41 as seen from viewer P41's line-of-sight direction. For the reference position of a viewer other than the talker, the free viewpoint image generation unit 313 generates an image of viewer P41 as seen from a position shifted away from viewer P41's line-of-sight direction. Details are described later. Of the generated images, the free viewpoint image generation unit 313 inputs, via the communication network, the image corresponding to the reference position of the viewer located to the left of the front direction of the display surface of the display unit 328 to the image expansion/contraction unit 326, and the image corresponding to the reference position of the viewer located to the right to the image expansion/contraction unit 327. By inputting each image to the image expansion/contraction unit for the direction in which it is to be displayed, the free viewpoint image generation unit 313 associates images with display directions.

FIG. 26 is a diagram showing the angle by which the free viewpoint image generation unit 313 rotates the imaging position when the reference position selection unit 312 has selected viewer P51 as the talker.
In the figure, point S indicates the reference position of viewer P52, point T indicates the reference position of viewer P41 in the image on the display unit 328, and point U indicates the reference position of viewer P51. The free viewpoint image generation unit 313 generates the image that would be obtained if the imaging position were rotated by the angle γ shown in the figure, from the direction of point S toward the direction of point U.
As a result, viewer P52 sees an image of viewer P41 making eye contact with viewer P51.
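A rotation angle of this kind can be computed from the three points. The sketch below assumes that γ is the angle at T between the directions toward S and toward U; FIG. 26 is not reproduced here, so that reading is an assumption, as are the function name and the sample coordinates.

```python
from math import acos, degrees, sqrt


def angle_at(t, s, u):
    """Angle in degrees at vertex t between the rays t->s and t->u."""
    a = [si - ti for si, ti in zip(s, t)]
    b = [ui - ti for ui, ti in zip(u, t)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    cos_g = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return degrees(acos(cos_g))


# Placeholder layout: T on the display, S and U in front of it on either side.
T = (0.0, 0.0, 0.0)
S = (1.0, 0.0, 1.0)
U = (-1.0, 0.0, 1.0)
gamma = angle_at(T, S, U)
```

For this symmetric placeholder layout the two directions are perpendicular, so `gamma` comes out at 90 degrees. Real reference positions would come from the reference position calculation step.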

In this way, the video conference system 3 displays to viewer P52, who is not a talker, an image of viewer P41 making eye contact with talker P51, so viewer P52 can take part in the conference while viewing a more natural image in which the talkers' lines of sight meet. To viewer P51, who is a talker, it displays an image in which talker P41's line of sight meets that of P51, so talker P51 can converse more naturally while viewing that image.
The liquid crystal display included in the display unit 328 is not limited to one that displays different video in two directions; it may display different video in three or more directions. In that case, the video conference terminal device 32 includes as many image expansion/contraction units as there are directions in which the liquid crystal display shows different video, and the free viewpoint image generation unit 313 generates an image of viewer P41 for each image expansion/contraction unit and inputs it to that unit. Displaying different video in more directions in this way makes it possible to show more natural video to more viewers.
The arrangement of the units in the video conference system 3 is not limited to that shown in FIG. 22. The reference position selection unit 312, the free viewpoint image generation unit 313, the reference position calculation unit 322, and the image expansion/contraction units 326 and 327 may each be included in either the video conference terminal device 31 or the video conference terminal device 32, or in a device separate from both.
For example, the reference position calculation unit 322 may be included in the video conference terminal device 31 and receive images from the imaging unit 321 via the communication network.

The present invention can also be implemented in the following aspects.
(1) A communication system comprising a first terminal device and a second terminal device for video conferencing that are connected to each other via a communication network, the communication system displaying video of each other to a first viewer in a first conference room in which the first terminal device is placed and to a second viewer in a second conference room in which the second terminal device is placed, wherein the first terminal device comprises a first imaging unit that images the first viewer, a free viewpoint image generation unit that generates an image of the first viewer based on the image captured by the first imaging unit, and a first display unit that displays the second viewer; the second terminal device comprises a second imaging unit that images the second viewer, an image expansion/contraction unit that receives the image generated by the free viewpoint image generation unit and received via the communication network and expands or contracts that image to generate an image of the first viewer facing the second viewer's line-of-sight direction, and a second display unit that displays the image of the first viewer facing the second viewer's line-of-sight direction generated by the image expansion/contraction unit; and the image captured by the second imaging device is transmitted to the first terminal device via the communication network.
In this communication system, the second display unit displays an image of the first viewer facing the second viewer's line-of-sight direction, so the second viewer can converse with the first viewer while viewing an image in which the first viewer's line of sight meets his or her own.

(2) The communication system according to (1) above, wherein there are a plurality of viewers in the second conference room, and the second viewer, who is one of them, speaks while looking at the first viewer displayed on the second display unit.
In this communication system, the second display unit displays an image of the first viewer facing the second viewer's line-of-sight direction, so the second viewer can converse with, and speak to, the first viewer while looking at an image in which the first viewer's line of sight meets his or her own.

(3) The communication system according to (1) or (2) above, wherein the first imaging unit comprises a plurality of imaging devices, and the free viewpoint image generation unit generates a three-dimensional image of the first viewer based on the images captured by the first imaging unit.
In this communication system, the free viewpoint image generation unit generates a three-dimensional image of the first viewer based on the images captured by the plurality of imaging devices. As a result, as described above, the second viewer can converse with the first viewer while viewing an image in which the first viewer's line of sight meets his or her own.

(4) A communication system comprising a first terminal device and a second terminal device for video conferencing that are connected to each other via a communication network, the system displaying each other's images to a first viewer in a first conference room in which the first terminal device is placed and to a second viewer in a second conference room in which the second terminal device is placed, wherein the first terminal device includes a first imaging unit that images the first viewer, a free viewpoint image generation unit that generates an image viewed from the line-of-sight direction of the first viewer based on the image captured by the first imaging unit, and a first display unit that displays the second viewer; the second terminal device includes a second imaging unit that images the second viewer, and a second display unit whose display surface, which displays the image generated by the free viewpoint image generation unit and received via the communication network, displays different images when viewed from the right and when viewed from the left; the image captured by the second imaging device is transmitted to the first terminal device via the communication network; and different images corresponding to the respective viewers are displayed on the second display unit.
In this communication system, different images are displayed on the second display unit for different viewers. For a viewer who is a talker, an image of the first viewer facing that viewer's line-of-sight direction is displayed, so that viewer can converse with the first viewer while viewing an image in which their lines of sight meet. For viewers who are not talkers, a more natural image can be displayed in which the talkers' lines of sight meet each other.

(5) A communication system comprising a first terminal device, a second terminal device, and a third terminal device for video conferencing that are connected to each other via a communication network, the system displaying each other's video to the viewers in the conference rooms in which the respective terminal devices are placed, the system further comprising a talker selection device that is connected to each conference room via the communication network and selects talkers from among the viewers in the conference rooms, wherein each terminal device includes an imaging unit that images the viewer in its conference room, a free viewpoint image generation unit that generates an image from the images of the imaging unit, and a display unit that displays images of the viewers in the other conference rooms; and the free viewpoint image generation unit of each conference room generates a rotated image of the viewer as the image to be sent to the display unit in the conference room of a viewer who is not a talker.
In this communication system, a rotated image of a viewer is generated for a viewer who is not a talker, so that a more natural image can be displayed in which the viewers who are talkers face each other.

(6) A viewer display method for a communication system including a display unit that has a display surface for displaying an image, the method comprising: an image output step of imaging a first viewer and outputting an image of the first viewer viewed from substantially the front; a reference position output step of detecting a second viewer reference position, which is the position of the viewpoint of a second viewer who converses with the first viewer; an image expansion/contraction step of calculating an image obtained by applying, to the image output in the image output step, a perspective projection transformation centered on the second viewer reference position, from a virtual display surface that squarely faces the straight line connecting the second viewer reference position and the center of the display surface, onto the display surface; and an image display step in which the display unit displays the image calculated by the image expansion/contraction unit.
In this viewer display method, the image expansion/contraction step generates the image based on the second viewer reference position, which is the position of the talker's viewpoint. Therefore, even when the talker viewing the display unit from the second viewer reference position does not squarely face the display surface, an image whose line of sight meets that of the other talker can be displayed to this talker. Moreover, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, the natural image seen when the other talker is viewed from the front can be displayed.
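As a rough illustration of the perspective projection transformation in (6), the following minimal sketch maps a point on the virtual display surface to the real display surface. It is not taken from the embodiments: the coordinate system (display surface in the plane z = 0 with its center at the origin, x to the right, y up, viewer at z > 0) and the function name `virtual_to_display` are assumptions made only for this example.

```python
import math


def _normalize(v):
    """Return the unit vector pointing in the direction of v."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def virtual_to_display(u, v, eye):
    """Map point (u, v) on the virtual display surface to (x, y) on the display.

    Assumed setup (hypothetical): the display surface is the plane z = 0 with
    its center at the origin; `eye` is the second viewer reference position
    with eye[2] > 0. The virtual display surface passes through the display
    center and is perpendicular to the line from `eye` to that center.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("the viewer must be in front of the display (z > 0)")
    normal = _normalize(eye)                         # faces the viewer
    right = _normalize(_cross((0.0, 1.0, 0.0), normal))
    up = _cross(normal, right)
    # 3D position of the point on the virtual display surface.
    p = tuple(u * r + v * w for r, w in zip(right, up))
    denom = ez - p[2]
    if denom <= 0:
        raise ValueError("the point does not lie in front of the viewer")
    # Follow the ray from the eye through p until it meets the plane z = 0.
    t = ez / denom
    return (ex + t * (p[0] - ex), ey + t * (p[1] - ey))
```

With the viewer directly in front of the display the mapping is the identity. With the viewer off to one side, the image is stretched horizontally on the display, which cancels the horizontal compression the viewer would otherwise perceive. A real image warp would apply this mapping (or its inverse) to every pixel.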

(7) A viewer display program for causing a computer to execute: an image output step of imaging a first viewer and outputting an image of the first viewer viewed from substantially the front; and a reference position selection step of selecting a second viewer reference position, which is the reference position of a second viewer who converses with the first viewer, from among reference positions that are the positions of the viewpoints of one or more viewers including the second viewer.
This program generates an image of the first viewer viewed from substantially the front and selects the second viewer reference position, which is the position of the viewer's viewpoint. The image of the first viewer viewed from substantially the front, as it would appear when seen from directly in front of the display surface, can therefore be transformed so that it appears that way from the second viewer reference position and then displayed. As a result, even when the second viewer, who is a talker, does not squarely face the display surface, an image whose line of sight meets that of the other talker can be displayed to this talker. Moreover, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, the natural image seen when the other talker is viewed from the front can be displayed.

(8) A viewer display program for causing a computer provided with a display unit having a display surface for displaying an image to execute: a reference position detection step of detecting a reference position, which is the position of a viewer's viewpoint, for each of one or more viewers including a second viewer who converses with a first viewer; an image expansion/contraction step of calculating an image obtained by applying, to an image of the first viewer viewed from substantially the front, a perspective projection transformation centered on a second viewer reference position, which is the reference position of the second viewer, from a virtual display surface that squarely faces the straight line connecting the second viewer reference position and the center of the display surface, onto the display surface; and a display step of displaying the image calculated in the image expansion/contraction step on the display unit.
This program applies, to the image of the first viewer viewed from substantially the front, a transformation that projects from the virtual display surface squarely facing the second viewer reference position onto the display surface with the second viewer reference position as the viewpoint, and displays the result. Therefore, even when the second viewer, who is a talker, does not squarely face the display surface, an image whose line of sight meets that of the other talker can be displayed to this talker. Moreover, instead of the vertically elongated image of the other talker that is seen when the display surface is viewed obliquely, the natural image seen when the other talker is viewed from the front can be displayed.

A program for realizing all or some of the functions of the video conference systems 1 to 3 may be recorded on a computer-readable recording medium, and the processing of each unit may be performed by causing a computer system to read and execute the program recorded on the recording medium. The "computer system" here includes an OS and hardware such as peripheral devices.
When a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).
The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, and to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" further includes a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as a volatile memory inside a computer system serving as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments and includes design changes and the like within a scope that does not depart from the gist of the present invention.

The present invention is suitable for use in a communication system.

1-3 Video conference system
11, 12, 21-23, 31, 32 Video conference terminal device
24 Talker selection device
111, 121, 211-231, 311, 321 Imaging unit
122, 212, 222, 232, 322 Reference position calculation unit
112, 312 Reference position selection unit
113, 213, 223, 233, 313 Free viewpoint image generation unit
116, 127, 216, 217, 226, 227, 236, 237, 316, 328 Display unit
126, 214, 215, 224, 225, 234, 235, 326, 327 Image expansion/contraction unit
241 Talker selection unit

Claims

An image output unit that captures an image of the first viewer and outputs an image of the first viewer viewed from substantially the front;
A reference position output unit for detecting a second viewer reference position that is a position of a viewpoint of a second viewer who talks with the first viewer;
A display unit including a display surface for displaying the image of the first viewer;
An image expansion/contraction unit that calculates an image obtained by applying, to the image output by the image output unit, a perspective projection transformation centered on the second viewer reference position, from a virtual display surface that squarely faces the straight line connecting the second viewer reference position and the center of the display surface, onto the display surface;
Comprising
The display unit displays the image calculated by the image expansion/contraction unit.
A communication system characterized by this.

The reference position output unit
A reference position detector that detects a reference position that is the position of the viewer's viewpoint for one or more viewers including the second viewer;
A reference position selection unit that selects the second viewer reference position from the reference positions detected by the reference position detection unit;
The communication system according to claim 1, further comprising:

A communication system having a first terminal device and a second terminal device connected via a communication path,
The first terminal device includes the image output unit and the reference position selection unit,
The communication system according to claim 2, wherein the second terminal device includes the reference position detection unit, the display unit, and the image expansion/contraction unit.

A second display unit having a second display surface for displaying an image of one or more viewers including the second viewer;
The image output unit includes:
A first imaging unit that images a first stereo image of the first viewer as a subject;
A free viewpoint image generation unit that detects an image including the face of the first viewer from the first stereo image, detects the line-of-sight direction of the first viewer, and, based on the first stereo image, generates and outputs an image including the face of the first viewer as viewed from the line-of-sight direction;
Comprising
The reference position detector is
A second imaging unit that captures a second stereo image having the one or more viewers as subjects;
A reference position calculation unit for detecting a face of each of the one or more viewers or an image of a part thereof from the second stereo image and calculating a reference position of each of the one or more viewers;
Comprising
The reference position selection unit detects the line-of-sight direction of the first viewer from the first stereo image and, treating the viewer displayed at the intersection of the line-of-sight direction and the second display surface as the second viewer, selects that viewer's reference position as the second viewer reference position from among the reference positions calculated by the reference position calculation unit;
The communication system according to claim 2 or claim 3, wherein
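The selection of the second viewer from the first viewer's line of sight can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the second display surface is taken to be the plane z = 0, the first viewer's eye position and gaze vector are given in that coordinate system, and the names `gaze_hit_on_display` and `select_second_viewer` are hypothetical.

```python
import math


def gaze_hit_on_display(eye, gaze):
    """Intersect the gaze ray with the display plane z = 0.

    `eye` is the first viewer's eye position (z > 0 in front of the display)
    and `gaze` is the line-of-sight direction vector. Returns the (x, y) hit
    point, or None when the viewer is not looking toward the display.
    """
    ex, ey, ez = eye
    gx, gy, gz = gaze
    if gz == 0:
        return None              # gaze is parallel to the display surface
    t = -ez / gz
    if t <= 0:
        return None              # display lies behind the gaze direction
    return (ex + t * gx, ey + t * gy)


def select_second_viewer(hit, displayed_positions):
    """Pick the displayed viewer whose on-screen position is nearest the hit.

    `displayed_positions` maps a viewer name to the (x, y) position at which
    that viewer is drawn on the second display surface (assumed known).
    """
    return min(displayed_positions,
               key=lambda name: math.dist(hit, displayed_positions[name]))
```

The reference position of the viewer returned by `select_second_viewer` would then be looked up among the reference positions calculated by the reference position calculation unit. Choosing the nearest displayed viewer is one simple way to resolve a hit point that does not fall exactly on a face.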

The display surface displays different images in at least two directions;
The reference position selection unit selects the second viewer reference position and associates the second viewer reference position with any one of display directions in which the display surface displays the different images.
The display unit displays an image input from the image expansion/contraction unit in a direction associated with the second viewer reference position;
The communication system according to claim 2 or claim 3, wherein

A second display unit having a second display surface for displaying an image of one or more viewers including the second viewer;
The image output unit includes:
A first imaging unit that images a first stereo image of the first viewer as a subject;
A free viewpoint image generation unit that detects an image including the face of the first viewer from the first stereo image, detects the line-of-sight direction of the first viewer, and, based on the first stereo image, generates and outputs an image including the face of the first viewer as viewed from the line-of-sight direction;
Comprising
The reference position detector is
A second imaging unit that captures a second stereo image having the one or more viewers as subjects;
A reference position calculation unit for detecting a face of each of the one or more viewers or an image of a part thereof from the second stereo image and calculating a reference position of each of the one or more viewers;
Comprising
The reference position selection unit detects the line-of-sight direction of the first viewer from the first stereo image; treating the viewer displayed at the intersection of the line-of-sight direction and the second display surface as the second viewer, selects that viewer's reference position as the second viewer reference position from among the reference positions calculated by the reference position calculation unit; and associates one of the display directions with the second viewer reference position based on the positional relationship between the center of the display surface and the second viewer reference position,
The communication system according to claim 5.

The display surface displays different images in at least two directions;
The reference position selection unit associates any of the reference positions with each of the display directions in which the display surface displays the different images.
The image output unit generates an image of the first viewer viewed from a position determined based on the reference position associated with a direction that is not associated with the second viewer reference position, associates that direction with the image, and outputs the image to the image expansion/contraction unit,
For each of the input directions, the image expansion/contraction unit calculates an image obtained by applying, to the image associated with that direction, a perspective projection transformation centered on the reference position associated with that direction, from a virtual display surface that squarely faces the straight line connecting that reference position and the center of the display surface, onto the display surface, and outputs the calculated image in association with that direction,
The display unit displays each image input from the image expansion/contraction unit in the direction associated with that image;
The communication system according to claim 2 or claim 3, wherein

A second display unit having a second display surface for displaying an image of one or more viewers including the second viewer;
The image output unit includes:
A first imaging unit that images a first stereo image of the first viewer as a subject;
A free viewpoint image generation unit that detects an image including the face of the first viewer from the first stereo image, detects the line-of-sight direction of the first viewer, and generates, based on the first stereo image, an image including the face of the first viewer as viewed from the line-of-sight direction, and that, for each of the viewpoint positions that the reference position selection unit has associated with a display direction, excluding the second viewer reference position, detects the direction, about the center of the display surface, from that viewpoint position toward the second viewer reference position, generates an image viewed from a position moved in the detected direction from the line-of-sight direction of the first viewer, associates the generated image with the display direction associated with that viewpoint position, and outputs the image to the image expansion/contraction unit;
Comprising
The reference position detector is
A second imaging unit that captures a second stereo image having the one or more viewers as subjects;
A reference position calculation unit that detects a face of the one or more viewers or a part of the image from the second stereo image, and calculates a reference position of each of the one or more viewers;
Comprising
The reference position selection unit detects the line-of-sight direction of the first viewer from the first stereo image; selects, as the second viewer reference position, the reference position of the viewer displayed at the intersection of the line-of-sight direction and the second display surface from among the reference positions calculated by the reference position calculation unit; associates one of the display directions with the second viewer reference position based on the positional relationship between the center of the display surface and the second viewer reference position; and associates each display direction other than the one associated with the second viewer reference position with one of the reference positions other than the second viewer reference position, based on the positional relationship between the display surface and each reference position,
The communication system according to claim 7.
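The association of display directions with reference positions can be sketched as follows for a display surface that shows different images to the left and to the right. This is only an illustration: the two-direction layout, the rule of deciding the side from the x coordinate relative to the display center, and the names `side_of_display` and `associate_directions` are assumptions, not details given in the claims.

```python
def side_of_display(reference_position, display_center_x=0.0):
    """Return 'right' or 'left' depending on which side of the display
    center the viewpoint lies (x axis assumed to run along the display)."""
    return "right" if reference_position[0] >= display_center_x else "left"


def associate_directions(second_viewer, reference_positions):
    """Associate each display direction with one viewer.

    `reference_positions` maps a viewer name to an (x, y, z) viewpoint.
    The second viewer (the talker) is assigned first; each remaining
    direction is given to the first other viewer found on that side.
    """
    directions = {
        side_of_display(reference_positions[second_viewer]): second_viewer
    }
    for name, position in reference_positions.items():
        side = side_of_display(position)
        if name != second_viewer and side not in directions:
            directions[side] = name
    return directions
```

For example, with viewers A at x = -0.5 and B and C at x = 0.8 and x = 1.2, selecting B as the second viewer associates the right direction with B and the left direction with A. Viewer C shares B's direction and so sees the image prepared for B.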

A communication system having a first terminal device used by a first viewer, a second terminal device used by a second viewer, and a third terminal device used by a third viewer, which are connected to each other via a communication channel. And
Further comprising a talker selection unit for selecting any two of the first viewer, the second viewer, and the third viewer as a talker,
When the talker selection unit selects the first viewer and the second viewer, the image output unit generates an image of the first viewer viewed from substantially the front and outputs it to the image expansion/contraction unit; when the talker selection unit selects the first viewer and the third viewer, the image output unit generates an image of the first viewer viewed from other than the front and outputs it to the image expansion/contraction unit.
The communication system according to claim 1.

A communication system having a talker selection device connected to the first terminal device, the second terminal device, and the third terminal device via a communication path,
The first terminal device includes the image output unit,
The second terminal device includes the reference position output unit, the display unit, and the image expansion/contraction unit,
The talker selection device includes the talker selection unit.
The communication system according to claim 9.

Further comprising a third viewer imaging unit that images the third viewer;
The image output unit includes:
A first viewer imaging unit that captures a first stereo image with the first viewer as a subject;
A free viewpoint image generation unit that detects an image including the face of the first viewer from the first stereo image, detects the line-of-sight direction of the first viewer, and generates, based on the first stereo image, a three-dimensional model including the face of the first viewer; that, when the talker selection unit selects the first viewer and the second viewer, outputs an image of the first viewer viewed from substantially the front; and that, when the talker selection unit selects the first viewer and the third viewer, detects the direction, about the center position of the display surface, from the center position of the second display surface toward the reference position output by the reference position output unit, generates an image viewed from a position moved in the detected direction from the line-of-sight direction of the first viewer, and outputs it as the image of the first viewer viewed from other than the front;
Comprising
The reference position output unit
A second viewer imaging unit that captures a second stereo image with the second viewer as a subject;
A reference position calculation unit for detecting the second viewer's face or a part of the image from the second stereo image and calculating the second viewer reference position;
Comprising
The talker selection unit detects the frequency of mouth movement of the first viewer, of the second viewer, and of the third viewer from the image captured by the first viewer imaging unit, the image captured by the second viewer imaging unit, and the image captured by the third viewer imaging unit, respectively; selects a talker based on the detected frequencies; detects the line-of-sight direction of the selected talker from that talker's image; and selects the other talker based on the detected line-of-sight direction,
The communication system according to claim 10.
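Selecting a talker from the frequency of mouth movement can be sketched as below. The claims do not say how mouth movement is measured, so this example assumes a per-frame mouth-opening value for each viewer over a fixed window and counts frame-to-frame changes above a threshold; `mouth_movement_count`, `select_talker`, and the threshold value are hypothetical.

```python
def mouth_movement_count(openings, threshold=0.1):
    """Count frame-to-frame changes in mouth opening larger than threshold.

    `openings` is a sequence of per-frame mouth-opening measurements for one
    viewer over a fixed time window (the measurement itself is assumed).
    """
    return sum(1 for previous, current in zip(openings, openings[1:])
               if abs(current - previous) > threshold)


def select_talker(mouth_openings, threshold=0.1):
    """Return the viewer whose mouth moved most often in the window.

    `mouth_openings` maps a viewer name to that viewer's measurement sequence.
    """
    counts = {name: mouth_movement_count(sequence, threshold)
              for name, sequence in mouth_openings.items()}
    return max(counts, key=counts.get)
```

Once a talker has been chosen this way, the other talker would be chosen from the selected talker's detected line-of-sight direction, which is outside the scope of this sketch.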