JP2017022600A

JP2017022600A - Image communication device

Info

Publication number: JP2017022600A
Application number: JP2015139768A
Authority: JP
Inventors: Shoji Kawahito (川人 祥二); Keiichiro Kagawa (香川 景一郎)
Original assignee: Shizuoka University NUC
Current assignee: Shizuoka University NUC
Priority date: 2015-07-13
Filing date: 2015-07-13
Publication date: 2017-01-26
Anticipated expiration: 2035-07-13
Also published as: JP6534120B2

Abstract

PROBLEM TO BE SOLVED: To provide a composite image that looks natural to users communicating with each other.
SOLUTION: An image communication device 1 comprises: a plurality of imaging cameras 9 arranged along a screen 3; a position detection unit 23 that detects position information on the position of a caller relative to the screen 3; a direction detection unit 24 that detects direction information on the direction the caller faces relative to the screen 3; an image synthesizing unit 27 that selects an image of the caller from the images taken by the imaging cameras 9, on the basis of the caller's position information and direction information, and generates a composite image by combining the selected caller image with the plurality of images; an image transmission unit 29 that transmits the composite image to the remote party's side; and an image reception unit 31 that receives a composite image from the remote party's side and displays it on the screen 3.
SELECTED DRAWING: Figure 4

Description

The present invention relates to an image communication apparatus that communicates bidirectionally using images.

In recent years, videophone systems that communicate bidirectionally using images have become common for conferences and personal calls. When several users share a videophone system, images with a wide field of view are required. Systems therefore use multiple cameras arranged around the display screen, and the images these cameras acquire are transmitted to the other party's videophone system.

Techniques for synthesizing a plurality of images into a single composite image are known, for example from Patent Documents 1 and 2 below. With such techniques, multiple images acquired by multiple cameras can be stitched together with high accuracy.

Patent Document 1: JP 2011-124837 A
Patent Document 2: JP 2014-86097 A

However, simply applying these conventional composition techniques to a videophone system makes it difficult to obtain a composite image that looks natural to the parties when several users move around freely. For example, the callers in the composite image end up being captured from various directions, which tends to make communication difficult.

The present invention was made in view of this problem. Its object is to provide an image communication apparatus that can obtain a composite image that looks natural to users communicating with each other.

To solve the above problem, an image communication apparatus according to one aspect of the present invention includes:
- a plurality of imaging cameras arranged along a screen;
- a position detection unit that detects position information on the position of a caller relative to the screen;
- a direction detection unit that detects direction information on the direction the caller faces relative to the screen;
- an image synthesis unit that, based on the position information and the direction information, selects an image of the caller from the plurality of images acquired by the imaging cameras, and combines the selected caller image with the plurality of images to generate a composite image;
- an image transmission unit that transmits either the composite image or the plurality of images acquired by the imaging cameras; and
- an image display unit that displays on the screen a composite image generated from images of the interlocutor, who is the caller's communication partner.

With the image communication apparatus of this aspect, the apparatus detects the caller's position relative to the screen and obtains the direction the caller faces relative to the screen. Based on this information, it selects an image of the caller from the images acquired by the imaging cameras arranged along the screen. It then generates a composite image from that image and the plurality of images acquired by the imaging cameras. Either the images acquired by the imaging cameras or the composite image is transmitted to the interlocutor's side, and a composite image generated from the interlocutor-side images is displayed on the screen. Because the caller's image is chosen from the caller's position and orientation, the composite image shows the caller in a way that looks natural to the interlocutor. As a result, the interlocutor and the caller can communicate smoothly.

Here, the image synthesis unit may select the caller's image from the imaging camera closest to a particular line. That line starts at the caller's position, specified by the position information, and extends in the direction specified by the direction information. This way, the selected image shows the caller facing the camera head-on, so the composite image can include an image of the caller that looks natural to the interlocutor. As a result, the interlocutor and the caller can communicate smoothly and reliably.

The apparatus may further include an information acquisition unit that acquires position information on the interlocutor's position relative to the screen. The image synthesis unit may then correct the composite image based on the interlocutor's position information. In this case, the caller's image in the composite image is chosen from the positional relationship between the interlocutor and the caller, so it looks natural to the interlocutor. As a result, the interlocutor and the caller can communicate smoothly and reliably.

The image synthesis unit may also correct the composite image using a virtual space in which the positions of the interlocutor and the caller are placed on opposite sides of the screen. It then selects, as the caller's image, the image from the imaging camera closest to the line connecting the interlocutor and the caller. In this case, the interlocutor can be shown an image of the caller viewed from the front. This image is close to what the interlocutor would see in an actual face-to-face conversation in the virtual space.

The position detection unit may include a camera, arranged along the screen, that acquires distance images. The direction detection unit may include an array of directional microphones. With these, information on the caller's position or orientation relative to the screen can be acquired efficiently.

ADVANTAGE OF THE INVENTION: According to the present invention, a composite image that looks natural to users communicating with each other can be obtained.

FIG. 1 is a perspective view illustrating the configuration of an image communication apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a plan view of the fields of view of the imaging cameras 9 of FIG. 1, viewed from above along the screen 3.
FIG. 3 is a side view of the fields of view of the imaging cameras 9 and the ranging cameras 11 of FIG. 1, viewed from the side along the screen 3.
FIG. 4 is a block diagram showing the functional configuration of the image processing apparatus 7 of FIG. 1.
FIG. 5 is a conceptual diagram showing the image selection method used by the image selection unit 25 of FIG. 4.
FIG. 6 illustrates how the image composition unit 27 of FIG. 4 generates a composite image.
FIG. 7 is a flowchart showing the operating procedure for image communication by the image communication apparatus 1 of FIG. 1.
FIG. 8 is a side view showing the configuration of the imaging camera 9 and the ranging camera 11 according to a modification of the present invention.
FIG. 9 is a side view showing the configuration of a camera 35 according to a modification of the present invention.
FIG. 10 is a perspective view showing the arrangement of the ranging cameras 11 and the imaging cameras 9 in a modification of the present invention.
FIG. 11 is a perspective view showing the arrangement of the ranging cameras 11 and the imaging cameras 9 in a modification of the present invention.
FIG. 12 is a perspective view showing the arrangement of the ranging cameras 11 in a modification of the present invention.

DESCRIPTION OF EMBODIMENTS: Preferred embodiments of the image communication apparatus according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, identical or corresponding parts are denoted by the same reference numerals, and redundant description is omitted.

An image communication apparatus 1 according to a preferred embodiment of the present invention lets people at remote locations communicate by transmitting and receiving images. A pair of image communication apparatuses 1 forms a videophone system in which multiple users at remote locations communicate while viewing each other's images.

FIG. 1 shows the schematic configuration of the image communication apparatus 1 according to this embodiment. As shown in the figure, the image communication apparatus 1 includes three main components:
- a screen 3 arranged to face a plurality of users U;
- an image display device (projector, image display unit) 5 that displays images by projecting image light onto the front surface of the screen 3; and
- an image processing apparatus 7 that performs image processing.
The image communication apparatus 1 is connected via a communication network to an apparatus of the same configuration (hereinafter, the "opposite apparatus"). It transmits the composite image it generates to the opposite apparatus, receives the composite image generated by the opposite apparatus, and displays that composite image on the screen 3.

The following are attached to the screen 3:
- a plurality of imaging cameras 9 that capture images of the front (image display surface) side of the screen 3;
- a plurality of ranging cameras 11 that acquire distance images of the front side of the screen 3;
- a pulse light source 13 that emits near-infrared pulsed light; and
- a plurality of directional microphones 15 that detect the direction of the caller, that is, the user currently speaking among the users U.
The ranging cameras 11 and the pulse light source 13 serve as a position detection unit, which detects position information on the caller's position relative to the screen 3. The directional microphones 15 serve as a direction detection unit, which detects direction information on the direction the caller faces relative to the screen 3.

The imaging cameras 9 are image sensors arranged two-dimensionally at equal intervals along the front surface of the screen 3. They are fixed to the rear side of the screen 3 and capture the front side through small holes formed in the screen 3. For example, five imaging cameras 9 are arranged at equal intervals along the long side of the screen 3, and two such rows are stacked along the short side.

Two ranging cameras 11 are fixed side by side at the center of the upper edge of the front surface of the screen 3, with the pulse light source 13 placed between them. Each ranging camera 11 is a known TOF (Time of Flight) image sensor. For every pixel, it measures in real time how long a light pulse from the pulse light source 13 takes to hit a target and return, and from this it acquires a distance image containing per-pixel distance information. The image processing apparatus 7 controls the ranging cameras 11 and the pulse light source 13 so that distance images are acquired in real time.
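The TOF principle above reduces to distance = c·t/2, because the pulse covers the camera-to-target distance twice. The following minimal Python sketch applies that conversion to a grid of per-pixel round-trip times. The function name `tof_depth_image` and the sample timings are illustrative assumptions. Real TOF sensors usually infer the delay indirectly (for example from accumulated charge) rather than reporting timestamps, so this only shows the geometry.

```python
# Speed of light in vacuum (m/s); air is close enough for room-scale ranging.
SPEED_OF_LIGHT = 299_792_458.0


def tof_depth_image(round_trip_times):
    """Convert a 2D grid of per-pixel round-trip times (s) into distances (m).

    The pulse travels to the target and back, so distance = c * t / 2.
    None marks pixels where no pulse returned and is passed through.
    """
    return [
        [None if t is None else SPEED_OF_LIGHT * t / 2.0 for t in row]
        for row in round_trip_times
    ]


# A surface about 1.5 m away returns the pulse after roughly 10 ns.
times = [[1.0e-8, 2.0e-8], [None, 1.0e-8]]
depth = tof_depth_image(times)
print([[None if d is None else round(d, 3) for d in row] for row in depth])
# [[1.499, 2.998], [None, 1.499]]
```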

A plurality of directional microphones 15 are fixed side by side along the lower edge of the front surface of the screen 3. They pick up audio signals from sound sources on the front side of the screen 3, including the users U. The image processing apparatus 7 processes these signals to detect, in real time, the direction from which each source's sound arrives at the surface of the screen 3. A known microphone-array method for calculating the direction of arrival is used, for example the correlation function method or the delay-and-sum array method.
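As a rough illustration of the correlation function method, the sketch below handles a single microphone pair. It estimates the time difference of arrival τ as the lag that maximizes the pair's cross-correlation. It then converts τ to an arrival angle θ from the array broadside using sin θ = c·τ/d, which assumes a far-field source. The following are all assumptions for illustration:
- the two-microphone setup and the function names;
- the 48 kHz sampling rate and the 0.1 m spacing; and
- a speed of sound of 343 m/s.
A practical array would combine many microphone pairs or scan with a delay-and-sum beamformer.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air


def best_lag(a, b, max_lag):
    """Lag (samples) by which b trails a, maximizing sum(a[n] * b[n + lag])."""
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(a):
            m = n + lag
            if 0 <= m < len(b):
                score += value * b[m]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(a, b, fs, spacing):
    """Far-field arrival angle (degrees from broadside) for a two-mic pair.

    Positive when the sound reaches microphone a first.
    """
    # Physically possible delays are bounded by spacing / c.
    max_lag = math.ceil(spacing / SPEED_OF_SOUND * fs)
    tau = best_lag(a, b, max_lag) / fs
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / spacing))
    return math.degrees(math.asin(s))


def pulse(n, at):
    """Toy test signal: a short two-sample pulse starting at index `at`."""
    x = [0.0] * n
    x[at], x[at + 1] = 1.0, 0.5
    return x


mic_a, mic_b = pulse(64, 20), pulse(64, 23)  # b hears it 3 samples later
angle = arrival_angle_deg(mic_a, mic_b, fs=48_000, spacing=0.1)
print(round(angle, 1))  # 12.4
```

A positive angle means the sound reached `mic_a` first, and swapping the two inputs flips the sign.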

FIG. 2 is a plan view of the fields of view of the imaging cameras 9, viewed from above along the screen 3. FIG. 3 is a side view of the fields of view of the imaging cameras 9 and the ranging cameras 11, viewed from the side along the screen 3. As shown in these figures, the focal length and angle of view of the imaging cameras 9 are set so that they can image a space S₀, which lies within a predetermined range of distances from the screen 3. Their optical axes and angles of view are also set so that, within the space S₀, the fields of view V₁ of vertically and horizontally adjacent imaging cameras 9 overlap. This reduces discontinuities of objects when the images from the imaging cameras 9 are combined into a composite image. To reduce blind spots within the space S₀, the ranging cameras 11 have a field of view V₂ wider than that of the imaging cameras 9, and their optical axes are directed downward.
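The overlap condition can be checked with simple geometry. Assume parallel optical axes normal to the screen; the text says only that the axes and angles of view are chosen so the fields overlap. A camera with horizontal angle of view θ then covers a width of 2D·tan(θ/2) at distance D. Adjacent cameras at pitch p therefore overlap once D > p / (2·tan(θ/2)). The pitch and angle values below are hypothetical.

```python
import math


def min_overlap_distance(pitch_m, fov_deg):
    """Distance from the screen beyond which adjacent views overlap.

    Assumes parallel optical axes normal to the screen.
    """
    return pitch_m / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


def overlap_fraction(pitch_m, fov_deg, distance_m):
    """Fraction of one camera's view width shared with its neighbour."""
    width = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return max(0.0, (width - pitch_m) / width)


# Hypothetical layout: 0.4 m camera pitch, 60 degree horizontal angle of view.
print(round(min_overlap_distance(0.4, 60.0), 3))   # 0.346
print(round(overlap_fraction(0.4, 60.0, 1.5), 3))  # 0.769
```

Nearer than about 0.35 m the views in this example leave gaps, which is why the usable space S₀ starts some distance from the screen.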

Next, the configuration of the image processing apparatus 7 is described with reference to FIG. 4, a block diagram of its functional configuration. Physically, the image processing apparatus 7 is a data processing apparatus such as a personal computer or an image processing device. It contains an arithmetic processing circuit such as a CPU, memory such as RAM and ROM, a data storage device such as a hard disk drive, and a communication device. As shown in FIG. 4, its functional components are:
- a position information acquisition unit 21;
- a position detection unit 23;
- a direction detection unit 24;
- an image selection unit 25;
- an image composition unit 27;
- an image transmission unit 29; and
- an image receiving unit (image display unit) 31.

The position detection unit 23 of the image processing apparatus 7 detects position information on the positions, relative to the screen 3, of the users U who are callers. It uses the distance images from the ranging cameras 11 and the images from the imaging cameras 9. First, it applies known machine-learning-based face recognition to each image from the imaging cameras 9 to find the face positions of the users U in that image. For each recognized face, it takes the center of the lens of the imaging camera 9 that captured the image. It then determines the straight line in three-dimensional space that starts at that lens center and passes through the user's face position.
Next, the position detection unit 23 combines the distance images into a composite distance image (depth map), which represents the three-dimensional shapes of objects in the space S₀. Using this composite distance image and the straight line determined for each user U's face, it calculates the three-dimensional coordinates of each user's face relative to the screen 3 as the position information. Concretely, it considers the objects in the composite distance image recognized as approximately human in shape, extracts the one judged to intersect the straight line, and takes that object's three-dimensional coordinates as those of the user U's face. The position detection unit 23 outputs these face coordinates to the direction detection unit 24 and the image selection unit 25. Alternatively, it may detect the face coordinates by processing the distance images separately, without combining them.
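The ray-and-depth-map step can be sketched as follows. This minimal version assumes an undistorted pinhole imaging camera whose axes match the screen frame: x to the right, y down, and z out of the screen toward the users. It represents the composite distance image as a list of 3D points in that frame. It also skips the human-shape recognition the text mentions. It simply returns the nearest depth point that the ray, running from the lens center through the face pixel, passes within `tol` metres of. All names and numbers are hypothetical.

```python
import math


def face_ray(cam_center, f_px, cx, cy, u, v):
    """Unit ray from the lens center through pixel (u, v).

    Assumes an undistorted pinhole camera aligned with the screen frame
    (x right, y down, z pointing out of the screen toward the users).
    """
    d = ((u - cx) / f_px, (v - cy) / f_px, 1.0)
    n = math.sqrt(sum(c * c for c in d))
    return cam_center, tuple(c / n for c in d)


def locate_face(ray, points, tol=0.05):
    """Nearest depth point lying within tol metres of the ray, or None."""
    origin, d = ray
    best, best_t = None, math.inf
    for p in points:
        w = tuple(pi - oi for pi, oi in zip(p, origin))
        t = sum(wi * di for wi, di in zip(w, d))  # distance along the ray
        if t <= 0:
            continue  # behind the camera
        perp_sq = sum(wi * wi for wi in w) - t * t  # squared distance off the ray
        if perp_sq <= tol * tol and t < best_t:
            best, best_t = p, t
    return best


cam_center = (0.2, 0.3, 0.0)  # lens center on the screen (m)
ray = face_ray(cam_center, 800.0, 320.0, 240.0, u=400.0, v=240.0)
cloud = [
    (0.0, 0.0, 1.0),    # some other surface, off the ray
    (0.45, 0.3, 2.5),   # wall behind the user, on the same ray
    (0.32, 0.3, 1.2),   # the user's face, first hit along the ray
]
print(locate_face(ray, cloud))  # (0.32, 0.3, 1.2)
```

Taking the nearest hit along the ray prevents the background behind the user from being mistaken for the face.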

The direction detection unit 24 uses the audio signals from the directional microphones 15 to continuously detect, for multiple sound sources, the direction from which sound arrives at the surface of the screen 3. It estimates each user U's face orientation to be equal to the arrival direction of the sound from the corresponding source. It then outputs the three-dimensional face coordinates of the users U, together with direction information on each user's face orientation, to the image selection unit 25. The direction detection unit 24 also transmits the acquired position information (three-dimensional coordinates) of the users U to the opposite apparatus.

The position information acquisition unit 21 receives (acquires) position information from the opposite apparatus. This information gives the position, relative to its screen, of the interlocutor, who is the caller's communication partner. The opposite apparatus obtains it in the same way as the image communication apparatus 1 does, as the face position of its own caller. Specifically, the position information acquisition unit 21 acquires the three-dimensional coordinates of the interlocutor's face relative to the screen of the opposite apparatus.

For each user U detected by the position detection unit 23, the image selection unit 25 selects a source image for composition from the images acquired by the imaging cameras 9. For a user U whose face orientation has not been detected, it selects the image from the imaging camera 9 closest to that user's calculated three-dimensional coordinates. For a user U whose face orientation has been detected, it selects a source image using the user's three-dimensional face coordinates and direction information output by the direction detection unit 24, as follows. FIG. 5 is a conceptual diagram of this selection method. As shown in the figure, the image selection unit 25 sets the coordinates of the screen 3 and the imaging cameras 9 in a virtual space. Taking the screen 3 as reference, it then places in that space coordinates P₂ corresponding to the three-dimensional coordinates of the two users U₁ and U₂ detected by the position detection unit 23.
The virtual space is the real space containing the callers U₁ and U₂, the screen 3, and the imaging cameras 9. In it, the interlocutor is virtually placed on the other side of the screen 3 according to the interlocutor's position information. The image selection unit 25 determines a line L extending from user U₁'s coordinates P₂ in the direction given by U₁'s direction information (for example, angle information α relative to the screen 3). It then selects the imaging camera 9 closest to the line L. In the example of FIG. 5, the third imaging camera 9 from the left is selected. Only imaging cameras 9 whose angle of view contains U₁'s three-dimensional face coordinates are eligible. The number of eligible cameras therefore depends on U₁'s distance from the screen 3.
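A minimal sketch of this selection rule follows, under these assumptions:
- The cameras lie on the screen plane z = 0 and look along +z.
- Only one row of cameras is modelled.
- The facing direction is a horizontal yaw angle measured from the screen normal. This is a hypothetical stand-in for the angle information α, whose exact convention the text does not give.
- The angle-of-view limit is modelled as a cone with an assumed 35° half-angle.
A user with no detected orientation falls back to the nearest camera, as described above.

```python
import math


def point_ray_distance(p, origin, d):
    """Distance from point p to the ray origin + t*d (t >= 0); d is a unit vector."""
    w = [pi - oi for pi, oi in zip(p, origin)]
    t = max(0.0, sum(wi * di for wi, di in zip(w, d)))
    closest = [oi + t * di for oi, di in zip(origin, d)]
    return math.dist(p, closest)


def in_view(cam, face, half_fov_deg):
    """True if the face lies inside the camera's view cone (axis along +z)."""
    dx, dy, dz = (f - c for f, c in zip(face, cam))
    if dz <= 0:
        return False
    return math.degrees(math.atan2(math.hypot(dx, dy), dz)) <= half_fov_deg


def select_camera(cams, face, yaw_deg=None, half_fov_deg=35.0):
    """Index of the camera whose image is used for this user, or None."""
    if yaw_deg is None:
        # No facing direction detected: use the camera nearest the face.
        return min(range(len(cams)), key=lambda i: math.dist(cams[i], face))
    a = math.radians(yaw_deg)  # positive yaw turns toward +x (assumed)
    d = (math.sin(a), 0.0, -math.cos(a))  # horizontal, pointing at the screen
    candidates = [i for i, c in enumerate(cams) if in_view(c, face, half_fov_deg)]
    if not candidates:
        return None
    return min(candidates, key=lambda i: point_ray_distance(cams[i], face, d))


# One row of five cameras, 0.5 m apart, on the screen plane z = 0.
cams = [(0.5 * k, 0.0, 0.0) for k in range(5)]
face = (1.6, 0.0, 1.0)  # 1 m in front of the screen
print(select_camera(cams, face, yaw_deg=-30.0))  # 2
print(select_camera(cams, face))                 # 3
```

Here the face sits 0.1 m to the right of the fourth camera and is turned 30° to the left. The facing ray therefore lands near the middle camera (index 2). Without orientation information, the nearest camera (index 3) is used instead.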

The image selection unit 25 may also correct the camera selection obtained above, based on the interlocutor's position information from the position information acquisition unit 21. It then selects the source image from the plurality of images according to the corrected result. In this case, it additionally places in the virtual space coordinates P₁ corresponding to the interlocutor's three-dimensional coordinates. It determines the line L₁ connecting the interlocutor's coordinates P₁ and user U₁'s coordinates P₂. It then corrects the selection so that the imaging camera 9 closest to L₁ is selected for U₁. In the example of FIG. 5, the fourth imaging camera 9 from the left is selected. Finally, the image selection unit 25 outputs to the image composition unit 27, in real time, the camera selected for each of the users U₁ and U₂ and the images acquired by all the imaging cameras 9.
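The correction can be sketched in a hypothetical screen frame: x to the right, y down, z out of the screen, with the cameras on the plane z = 0. The sketch measures each camera's distance to the segment P₁P₂, which crosses the screen plane, and picks the closest camera. It assumes that the opposite apparatus reports face coordinates in a matching frame, so the interlocutor is placed behind the screen by negating z. The text does not say how the two sites agree on left-right orientation, so the sketch leaves that out. All names and values are illustrative.

```python
import math


def to_virtual(remote_face):
    """Place the interlocutor behind the local screen (z < 0).

    Assumes the opposite apparatus reports coordinates in a matching screen
    frame; left-right mirroring between the sites is not modelled here.
    """
    x, y, z = remote_face
    return (x, y, -z)


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(x * y for x, y in zip(ap, ab)) / denom
    t = max(0.0, min(1.0, t))  # clamp the foot of the perpendicular to the segment
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def select_camera_for_interlocutor(cams, caller, remote_face):
    """Index of the camera closest to the line P1-P2 joining interlocutor and caller."""
    p1 = to_virtual(remote_face)
    return min(range(len(cams)),
               key=lambda i: point_segment_distance(cams[i], p1, caller))


cams = [(0.5 * k, 0.0, 0.0) for k in range(5)]  # one row on the screen plane
caller = (1.6, 0.0, 1.0)   # P2, in front of the local screen
remote = (0.8, 0.0, 1.5)   # interlocutor as reported by the far end
print(select_camera_for_interlocutor(cams, caller, remote))  # 3
```

Here the segment crosses the screen plane at about x = 1.28 m, so the camera at x = 1.5 m (index 3) is chosen.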

Returning to FIG. 4, the image composition unit 27 generates a composite image from the images acquired by the imaging cameras 9, using the camera selected for each user U. For the background, it tiles the images from the imaging cameras 9, which are arranged two-dimensionally on the screen 3, in the same arrangement as the cameras. While doing so, it refers to the composite distance image and adjusts the tile positions so that corresponding points coincide at each boundary between adjacent cameras' images, which corrects parallax. The image composition unit 27 then superimposes the images of the users U on this background composite. For each user U, it cuts out the region containing the user's face from the selected camera's image. It then places the cut-out image at the position in the composite image corresponding to the user's three-dimensional face coordinates.
To locate the region to cut out, the image composition unit 27 uses the three-dimensional face coordinates that the position detection unit 23 found in the composite distance image. It converts these coordinates into coordinates in the image being cut, which gives the position of the user's face in that image.
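The following sketch shows one way to convert the face's 3D coordinates into image coordinates and cut out the patch. It assumes a pinhole model with camera axes aligned to the screen frame and represents images as nested lists of pixel values. Pasting overwrites canvas pixels without any blending. The helpers `project`, `crop`, and `paste` are hypothetical and are not part of the described apparatus.

```python
def project(point, cam_center, f_px, cx, cy):
    """Pinhole projection of a screen-frame 3D point to pixel (u, v).

    Assumes camera axes aligned with the screen frame and z > 0 in front.
    """
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return round(f_px * x / z + cx), round(f_px * y / z + cy)


def crop(image, center, half):
    """Patch of up to (2*half+1)^2 pixels around center (u, v), clipped to the image."""
    u, v = center
    rows = range(max(0, v - half), min(len(image), v + half + 1))
    cols = range(max(0, u - half), min(len(image[0]), u + half + 1))
    return [[image[r][c] for c in cols] for r in rows]


def paste(canvas, patch, top_left):
    """Overwrite canvas pixels with patch, starting at (row, col); no blending."""
    r0, c0 = top_left
    for dr, row in enumerate(patch):
        for dc, value in enumerate(row):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < len(canvas) and 0 <= c < len(canvas[0]):
                canvas[r][c] = value
    return canvas


image = [[10 * r + c for c in range(10)] for r in range(10)]  # toy 10x10 image
uv = project((0.02, -0.01, 1.0), (0.0, 0.0, 0.0), f_px=100.0, cx=5.0, cy=5.0)
patch = crop(image, uv, half=1)
canvas = paste([[0] * 4 for _ in range(4)], patch, top_left=(0, 1))
print(uv)         # (7, 4)
print(canvas[1])  # [0, 46, 47, 48]
```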

The unit of image selection can be chosen as appropriate. Images may be selected per user U as a whole, or selected and cut out per body part of the user U (face, neck, torso, and so on). When images are cut out and superimposed per body part, discontinuities can appear in the composite image. To prevent them, it is preferable to refer to the composite distance image and adjust the superimposition positions so that corresponding points coincide at the boundaries between the superimposed images, which corrects parallax.
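The parallax correction at a seam follows from stereo geometry. Take two rectified cameras with parallel axes, baseline B, and focal length f in pixels. A point at depth Z then shifts by the disparity f·B/Z between their images. Placing the neighbouring tile's origin f·B/Z pixels from the first tile's origin therefore makes that point coincide across the boundary. The sketch below assumes this rectified, parallel-axis setup and uses hypothetical values. In the apparatus, the depth for each seam would come from the composite distance image.

```python
def seam_offset_px(f_px, baseline_m, depth_m):
    """Column offset (pixels) of the right-hand tile's origin relative to the
    left-hand tile's origin, so that a point at depth_m lands on the same
    mosaic column in both tiles.

    Assumes rectified images from parallel-axis cameras: disparity = f * B / Z.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return f_px * baseline_m / depth_m


def seam_offsets(f_px, baseline_m, seam_depths):
    """One offset per seam, from a representative depth sampled along each seam."""
    return [seam_offset_px(f_px, baseline_m, z) for z in seam_depths]


# Hypothetical values: 800 px focal length, 0.4 m camera pitch.
print(seam_offsets(800.0, 0.4, [1.0, 2.0, 4.0]))  # [320.0, 160.0, 80.0]
```

Because the offset depends on depth, a single offset per seam aligns content at only one depth.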

FIG. 6 illustrates how the image composition unit 27 generates a composite image. FIGS. 6(a) to 6(c) correspond to FIG. 5 and show the images acquired by three imaging cameras 9:

* FIG. 6(a): the second camera from the left.
* FIG. 6(b): the middle camera.
* FIG. 6(c): the second camera from the right.

FIG. 6(d) shows the composite image that the image composition unit 27 generates from these images.

As shown in FIG. 5, for user U_1 the selected image comes from the middle imaging camera 9. That camera is closest to the line extending from the user, in the virtual space, in the direction the face is pointing. Its image is the one in FIG. 6(b), and the image G_1 cut out from it is combined into the composite image.

The face orientation of user U_2, on the other hand, was not detected. For this user the selected image comes from the second imaging camera 9 from the right, which is closest to the position of user U_2. Its image is the one in FIG. 6(c), and the image G_2 cut out from it is combined into the composite image.
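The two selection rules in the FIG. 6 example can be sketched as a nearest-camera search. When the face orientation is known, the search measures each camera's distance to a half-line that starts at the face and points in the facing direction. When the orientation is unknown, it falls back to plain distance from the face. This is only an illustration under these assumptions:

* Cameras are given as 3-D points on the screen plane, in the same coordinate system as the face positions.
* The function names are hypothetical.
* Treating the "line" as a half-line (ray) is an interpretive choice.

```python
import math
from typing import Optional, Sequence

Vec3 = tuple[float, float, float]


def _distance_to_ray(p: Vec3, origin: Vec3, direction: Vec3) -> float:
    """Distance from point p to the half-line origin + t * direction, t >= 0."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    w = [pc - oc for pc, oc in zip(p, origin)]
    t = max(0.0, sum(wc * dc for wc, dc in zip(w, d)))  # clamp: ignore points behind the face
    closest = [oc + t * dc for oc, dc in zip(origin, d)]
    return math.dist(p, closest)


def select_camera(cameras: Sequence[Vec3], face_pos: Vec3,
                  face_dir: Optional[Vec3] = None) -> int:
    """Index of the camera whose image should be used for one user."""
    if face_dir is None:
        # Orientation unknown (like user U_2): fall back to the camera nearest the face.
        return min(range(len(cameras)), key=lambda i: math.dist(cameras[i], face_pos))
    # Orientation known (like user U_1): camera nearest the line of the facing direction.
    return min(range(len(cameras)),
               key=lambda i: _distance_to_ray(cameras[i], face_pos, face_dir))
```

Consider five cameras in a row at x = -1, -0.5, 0, 0.5, 1 m:

* A user at (0.5, 0, 2) looking toward the screen centre is assigned the middle camera (index 2).
* A user at (0.55, 0, 2) with no detected orientation is assigned the camera at x = 0.5 (index 3).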

Returning again to FIG. 4, the processing of the following units is repeated:

* the position information acquisition unit 21;
* the position detection unit 23;
* the direction detection unit 24;
* the image selection unit 25;
* the image composition unit 27.

The repetition produces a time series of composite images. The image transmission unit 29 transmits this series to the opposing device on the conversation partner's side as a continuous moving image.

The image reception unit 31 continuously receives from the opposing device a time series of composite images generated from images of the conversation partner. It sends this series to the image display device 5, which displays it on the screen 3 as a moving image.

The procedure for image communication by the image communication apparatus 1 described above will now be explained with reference to FIG. 7, which is a flowchart of that procedure.

First, when the image processing device 7 accepts an instruction from a user U to start communication, the image communication process starts (step S01). The position detection unit 23 of the image processing device 7 then detects the face positions of the users U, the callers in front of the screen 3 (step S02). Next, the direction detection unit 24 of the image processing device 7 estimates direction information on the orientation of each user U's face (step S03). At the same time, the position information acquisition unit 21 of the image processing device 7 acquires, from the opposing device, the three-dimensional coordinates of the conversation partner relative to the conversation partner's screen (step S04).

Next, the image selection unit 25 of the image processing device 7 selects, for each user U, the source image for composition from among the images acquired by the plurality of imaging cameras 9. It bases this selection on the position information on the users' face positions and the direction information on their face orientations (step S05). The image composition unit 27 of the image processing device 7 then generates a composite image by combining the images acquired by the plurality of imaging cameras 9 according to the selection result of the image selection unit 25 (step S06).

Each composite image is transmitted to the opposing device by the image transmission unit 29 of the image processing device 7 as soon as it is generated (step S07). At the same time, the image reception unit 31 of the image processing device 7 receives a composite image from the opposing device, and the received composite image is displayed on the screen 3 (step S08). Steps S02 to S08 are repeated until an instruction from the user U to end communication is accepted (step S09). In this way, real-time image communication between remote locations is carried out.
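The loop of FIG. 7 can be sketched as a driver that calls one function per processing unit. The sketch makes the following assumptions:

* Each unit is passed in as a plain callable. The parameter names are hypothetical stand-ins for units 21 to 31, not an interface defined by the source.
* Steps that FIG. 7 runs concurrently (S03 with S04, S07 with S08) are run one after another here for simplicity.
* The end check sits at the bottom of the loop, matching the flowchart's placement of S09 after S08.

```python
from typing import Any, Callable


def run_image_communication(
    detect_face_positions: Callable[[], Any],          # position detection unit 23 (S02)
    estimate_face_directions: Callable[[Any], Any],    # direction detection unit 24 (S03)
    acquire_partner_position: Callable[[], Any],       # position information acquisition unit 21 (S04)
    select_source_images: Callable[[Any, Any], Any],   # image selection unit 25 (S05)
    compose: Callable[[Any, Any], Any],                # image composition unit 27 (S06)
    transmit: Callable[[Any], None],                   # image transmission unit 29 (S07)
    receive_and_display: Callable[[], None],           # image reception unit 31 (S08)
    end_requested: Callable[[], bool],                 # end-of-communication check (S09)
) -> int:
    """Run steps S02-S08 until an end instruction is accepted; return the frame count."""
    frames = 0
    while True:
        positions = detect_face_positions()                     # S02
        directions = estimate_face_directions(positions)        # S03
        partner = acquire_partner_position()                    # S04 (concurrent with S03 in FIG. 7)
        sources = select_source_images(positions, directions)   # S05
        transmit(compose(sources, partner))                     # S06, S07
        receive_and_display()                                   # S08 (concurrent with S07 in FIG. 7)
        frames += 1
        if end_requested():                                     # S09
            return frames
```

In a real device, each callable would wrap camera capture, network I/O, or rendering. Passing them in keeps the control flow of the flowchart separate from those details.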

With the image communication apparatus 1 described above, two kinds of information about the caller are obtained: position information on the caller's position relative to the screen 3, which is detected, and direction information on the caller's orientation relative to the screen 3, which is acquired. Based on this information, the caller's image is selected from the images acquired by the plurality of imaging cameras 9 arranged along the screen 3. A composite image is then generated using that image and the plurality of images acquired by the imaging cameras 9. The composite image is transmitted to the conversation partner's side, and the composite image received from the conversation partner's side is displayed on the screen 3.

With this configuration, the composite image can include an image of the caller, chosen based on the caller's position and orientation, that does not feel unnatural to the conversation partner during communication. Likewise, the caller can be shown an image of the conversation partner that does not feel unnatural. As a result, smooth communication between the conversation partner and the caller can be achieved.

Specifically, the caller's image in the composite image is chosen from the position of the caller's face relative to the screen and the direction the caller's face is pointing. This lets the composite image show the caller in a way that does not feel unnatural to the conversation partner, so smooth communication between them can be achieved more reliably.

In particular, the caller's position is reflected in a virtual space. The image of the imaging camera 9 closest to the line extending from the caller's position in the caller's facing direction is then selected as the caller's image. As a result, the composite image readily shows the caller from the front, as the conversation partner would see them during the conversation. The conversation partner is shown an image of the caller close to what they would see in an actual conversation in the virtual space.

In addition, the conversation partner's position is also reflected in the virtual space, on the other side of the screen. The selection is then corrected so that the caller's image comes from the imaging camera 9 closest to the line connecting the conversation partner's position and the caller's position in that virtual space. This allows the composite image to show the caller from the front, as seen by the conversation partner. The conversation partner is shown an image of the caller close to what they would see in an actual conversation in the virtual space.
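The corrected selection can be sketched geometrically. Find where the segment from the caller to the conversation partner crosses the screen plane, then pick the camera nearest that crossing point. The sketch makes the following assumptions:

* Both screens coincide at z = 0 in the shared virtual space.
* The caller is in front of the screen (z > 0), and the partner is placed behind it (z < 0).
* The partner's left and right are mirrored because the two parties face each other.

These conventions and all function names are illustrative, not taken from the source.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def partner_to_local(p_remote: Vec3) -> Vec3:
    """Place the partner's position, measured in front of their own screen, into the
    caller's virtual space. Assumption: the two screens coincide at z = 0, the partner
    is mirrored behind it (z < 0), and left/right is flipped because they face each other."""
    x, y, z = p_remote
    return (-x, y, -z)


def screen_crossing(caller: Vec3, partner_local: Vec3) -> tuple[float, float]:
    """Point (x, y) where the caller-partner line crosses the screen plane z = 0."""
    zc, zp = caller[2], partner_local[2]
    if not (zc > 0 > zp):
        raise ValueError("caller must be in front of the screen and partner behind it")
    t = zc / (zc - zp)  # fraction of the way from caller to partner
    return (caller[0] + t * (partner_local[0] - caller[0]),
            caller[1] + t * (partner_local[1] - caller[1]))


def select_camera_for_partner(cameras: Sequence[tuple[float, float]],
                              caller: Vec3, partner_remote: Vec3) -> int:
    """Index of the camera (given as (x, y) on the screen) nearest the crossing point."""
    cross = screen_crossing(caller, partner_to_local(partner_remote))
    return min(range(len(cameras)), key=lambda i: math.dist(cameras[i], cross))
```

The camera on the caller-partner line sees the caller roughly as the partner would see them through a window, which is the effect the correction aims for. For example, a caller at (0.5, 0, 2) and a partner reporting (0.5, 0, 2) relative to their own screen meet the screen plane at its centre.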

Further, the position detection unit includes the distance measuring cameras 11 and the direction detection unit includes the directional microphones 15. Therefore, information on the caller's position or direction relative to the screen 3 can be acquired efficiently.

The present invention is not limited to the embodiment described above.

In the above embodiment, the distance measuring cameras 11 serving as the position detection unit are arranged separately from the imaging cameras 9 that acquire the source images for the composite image. They may instead be placed at the same location on the screen 3. FIG. 8 is a side view showing the configuration of an imaging camera 9 and a distance measuring camera 11 according to such a modification of the present invention.

In this modification, a lens 32 and a dichroic mirror 33 are placed behind a hole 3a in the screen 3 and receive the light that passes through the hole. The dichroic mirror 33 passes the near-infrared light used to generate distance images and reflects the visible light detected by the imaging camera 9.

The two cameras sit on either side of the dichroic mirror 33:

* The imaging camera 9, which includes an imaging lens 9b and an imaging element 9a, is placed in the direction in which visible light from the hole 3a is reflected.
* The distance measuring camera 11, which includes an imaging lens 11b and an imaging element 11a, is placed in the direction in which near-infrared light from the hole 3a is transmitted.

With this modification, the fields of view of the imaging camera 9 and the distance measuring camera 11 reliably coincide. This simplifies parallax correction when generating the composite image.

Alternatively, as shown in FIG. 9, the imaging camera 9 and the distance measuring camera 11 may be replaced by a single integrated camera. In the modification shown in that figure, a camera 35 is placed behind the hole 3a of the screen 3. The camera 35 includes a lens 35a, which receives the light passing through the hole 3a, and an imaging element 35b. It is an integrated image sensor with two functions: generating distance images by the time-of-flight (TOF) method, and generating ordinary images by detecting visible light.
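The source does not say how the TOF distance image is computed. The sketch below illustrates two relations:

* The basic round-trip relation d = c·Δt/2.
* One common pulsed indirect-TOF scheme, assumed here. Each pixel integrates the echo in two back-to-back windows, each as long as the emitted pulse, and the delay follows from the ratio of the two charges.

The function names and the background-subtraction parameter are illustrative assumptions.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def distance_from_delay(delay_s: float) -> float:
    """Round-trip relation: the light travels to the object and back, so halve it."""
    return SPEED_OF_LIGHT * delay_s / 2.0


def distance_two_window(q1: float, q2: float, pulse_width_s: float,
                        background: float = 0.0) -> float:
    """Pulsed indirect TOF with two back-to-back charge windows (assumed scheme).

    q1: signal collected in the window coinciding with the emitted pulse.
    q2: signal collected in the window immediately after it.
    A return delayed by td (0 <= td <= pulse width T0) puts a fraction td / T0 of
    the echo into q2, so td = T0 * q2 / (q1 + q2) after background subtraction.
    """
    s1, s2 = q1 - background, q2 - background
    if s1 < 0 or s2 < 0 or s1 + s2 <= 0:
        raise ValueError("signal must exceed the background level")
    delay = pulse_width_s * s2 / (s1 + s2)
    return distance_from_delay(delay)
```

With a 20 ns pulse, equal charges in the two windows mean a 10 ns delay, which corresponds to about 1.5 m. The measurable range in this scheme is limited to c·T0/2, about 3 m for a 20 ns pulse.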

The number and arrangement of the distance measuring cameras 11 and the imaging cameras 9 in the above embodiment may also be changed in various ways. FIGS. 10 and 11 show arrangements of these cameras in modifications of the present invention. The directional microphones 15 are omitted from these figures.

In the modification shown in FIG. 10, four distance measuring cameras 11 are used, split between the upper and lower edges of the screen 3. In addition, one more row of imaging cameras 9 (five in total) is added in the short-side direction of the screen 3. In the modification shown in FIG. 11, the imaging cameras 9 and the distance measuring cameras 11 alternate on the screen 3 along both the short-side and the long-side directions.

The distance measuring cameras 11 of the above embodiment may also be omitted. FIG. 12 shows the arrangement of the imaging cameras 9 in such a modification of the present invention. The directional microphones 15 are omitted from FIG. 12.

In the modification shown in FIG. 12, only imaging cameras 9 are provided on the screen 3, and the composite image is generated by combining the images they acquire. These imaging cameras 9 also serve as the position detection unit that detects the face positions of the users U. In this modification, the position detection unit 23 of the image processing device 7 works in two stages:

1. It treats the images of two adjacent imaging cameras 9 as a stereo pair and performs matching on them to generate a distance image.
2. It combines the distance images obtained from multiple pairs of imaging cameras 9 into the composite distance image.
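The stereo matching step for one pair of adjacent imaging cameras 9 can be sketched on a single scanline. The sketch uses sum-of-absolute-differences (SAD) block matching followed by the rectified-stereo depth relation Z = f·B/d. The source does not name a matching algorithm; SAD is a stand-in chosen for simplicity. The sketch also assumes the following:

* The two images are rectified.
* The right camera sits to the right of the left one, with baseline B in metres.
* The focal length f is in pixels.

```python
from typing import Sequence


def disparity_at(left: Sequence[float], right: Sequence[float], x: int,
                 max_disparity: int, half_window: int) -> int:
    """Disparity of pixel x on one rectified scanline by sum-of-absolute-differences (SAD).

    Assumes the right camera sits to the right of the left one, so a scene point at
    column x in the left image appears at column x - d in the right image.
    The caller must ensure x + half_window is a valid index in both rows.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # candidate window would leave the right image
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Rectified-stereo depth Z = f * B / d, in metres."""
    if disparity_px <= 0:
        raise ValueError("zero disparity means the point is at infinity")
    return focal_px * baseline_m / disparity_px
```

Depth resolution depends on the baseline. Adjacent cameras spaced farther apart give larger disparities, and therefore finer depth steps, at a given distance, but they make matching harder at the image edges.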

In the above embodiment, the image selection unit 25 of the image processing device 7 selects the image of the imaging camera 9 closest to the position of a user U, but the processing is not limited to this. That is, for a user U, the image selection unit 25 of the embodiment uses the three-dimensional coordinates calculated for that user and selects the image of the imaging camera 9 closest to those coordinates. Various other selection methods may be used as modifications:

* **Face direction.** The direction of the user U's face may be detected using the images obtained by the plurality of imaging cameras 9 or the distance images obtained by the distance measuring cameras 11. The image of the imaging camera 9 located close to that face direction is then selected. The face direction is detected, for example, by locating facial parts such as the user U's mouth and eyes and computing the relationship between their positions.
* **Gaze direction.** The gaze direction of the user U may be detected using the images obtained by the imaging cameras 9 or the distance images obtained by the distance measuring cameras 11. A gaze vector is extended from the position of the user U's face in the detected gaze direction, and the image of the imaging camera 9 closest to that vector is selected. Existing techniques can be used to detect the gaze direction of the user U, for example the configurations described in Japanese Patent No. 4604190 and Japanese Patent No. 4517049.

With such modifications, the conversation partner can be shown an image that makes communication with every user U easy.
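The face direction can be computed from the relationship between facial-part positions in several ways. One simple possibility, assumed here, uses three-dimensional positions of the two eyes and the mouth, such as could be read from a distance image. It takes the normal of the plane they span, oriented with a cross product so that it points the way the face is looking. The sketch also assumes the following:

* The coordinate frame is right-handed, with y pointing up.
* The landmarks are labelled by the user's own left and right.
* The function names are illustrative.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def face_direction(right_eye: Vec3, left_eye: Vec3, mouth: Vec3) -> Vec3:
    """Unit vector the face points toward, from three 3-D facial landmarks.

    right_eye / left_eye are the user's own right and left eyes. In a right-handed
    frame with y up, (right-pointing vector) x (down-pointing vector) gives the
    forward direction.
    """
    across = _sub(right_eye, left_eye)                      # toward the user's right
    eye_mid = tuple((r + l) / 2 for r, l in zip(right_eye, left_eye))
    down = _sub(mouth, eye_mid)                             # from between the eyes to the mouth
    n = _cross(across, down)
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("landmarks are collinear")
    return (n[0] / length, n[1] / length, n[2] / length)
```

For a user 2 m in front of the screen and looking straight at it, the result is (0, 0, -1), pointing into the screen. The resulting vector can serve as the facing direction when choosing the camera nearest the line extended from the face.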

The direction detection unit 24 of the above embodiment estimates the orientation of the user U's face from the audio signals detected by the directional microphones 15. The face orientation may also be detected in the following ways:

* **Facial parts.** The direction detection unit 24 may detect the orientation of the user U's face by image processing, using the images obtained by the plurality of imaging cameras 9 or the distance images obtained by the distance measuring cameras 11. In this case, the face orientation is detected by locating facial parts such as the caller's mouth and eyes in the image and computing the relationship between their positions.
* **Gaze.** The direction detection unit 24 may detect the gaze direction of the user U using the images obtained by the imaging cameras 9 or the distance images obtained by the distance measuring cameras 11, and estimate the face orientation from that gaze direction.

In the above embodiment, the image communication apparatus 1 on the caller's side generates the composite image including the caller's image and transmits it to the opposing device. The function of generating a composite image from the images of the plurality of imaging cameras 9 may instead be provided in the opposing device. In that case, the image transmission unit 29 of the image communication apparatus 1 transmits the images of the plurality of imaging cameras 9 to the opposing device, together with the information detected by the position detection unit 23 and the direction detection unit 24.

The image communication apparatus 1 on the caller's side and the opposing device on the conversation partner's side may differ in the following respects:

* the screen size;
* the resolution of the displayed images;
* the arrangement and number of the imaging cameras 9;
* the arrangement and number of the distance measuring cameras 11.

Consider the virtual space bounded by the screen 3 of the image communication apparatus 1 and the screen of the opposing device. The image of the user U should appear at the position on the opposing device's display that corresponds to the user U's position in that virtual space. In such cases, therefore, the image composition unit 27 of the image communication apparatus 1 preferably performs coordinate conversion, pixel interpolation, image enlargement, or image reduction when generating the composite image, so as to achieve this.
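The coordinate conversion and enlargement or reduction between screens of different size and resolution can be sketched as follows. The sketch assumes the following:

* A position on the screen plane is expressed in metres from the screen centre.
* The cut-out image should keep the same physical size on both displays.

The source only says such processing is desirable; the `Display` class, the physical-size-preserving choice, and the function names are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Display:
    """Physical size and resolution of one side's screen (values are illustrative)."""
    width_m: float
    height_m: float
    width_px: int
    height_px: int

    def pixels_per_metre(self) -> tuple[float, float]:
        return self.width_px / self.width_m, self.height_px / self.height_m


def to_display_pixel(display: Display, x_m: float, y_m: float) -> tuple[float, float]:
    """Map a point on the screen plane (metres from the centre, x right, y up) to a
    pixel (column, row) on this display, with row 0 at the top edge."""
    sx, sy = display.pixels_per_metre()
    return display.width_px / 2 + x_m * sx, display.height_px / 2 - y_m * sy


def rescale_factors(src: Display, dst: Display) -> tuple[float, float]:
    """Horizontal and vertical scale that keeps a cut-out image at the same physical
    size when it is moved from src's pixel grid to dst's (>1 enlarges, <1 reduces)."""
    (sx0, sy0), (sx1, sy1) = src.pixels_per_metre(), dst.pixels_per_metre()
    return sx1 / sx0, sy1 / sy0
```

Consider a 2 m wide, 1920-pixel display and a 1 m wide display at the same resolution. A cut-out prepared for the first would need to be enlarged by a factor of 2 on the second to keep its physical size. Pixel interpolation would then fill in the enlarged image.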

DESCRIPTION OF SYMBOLS

1 ... image communication apparatus
3 ... screen (display screen)
5 ... image display device (image display unit)
7 ... image processing device
9 ... imaging camera (position detection unit)
11 ... distance measuring camera (position detection unit)
13 ... pulse light source
15 ... directional microphone (direction detection unit)
21 ... position information acquisition unit
23 ... position detection unit
24 ... direction detection unit
25 ... image selection unit (image composition unit)
27 ... image composition unit
29 ... image transmission unit
31 ... image reception unit
35 ... camera
U, U_1, U_2 ... user (caller)

Claims

A plurality of imaging cameras arranged along the screen;
A position detection unit for detecting position information on the position of the caller relative to the screen;
A direction detection unit for detecting direction information regarding the direction of the caller with respect to the screen;
An image composition unit that, based on the position information and the direction information, selects the caller's image from a plurality of images acquired by the plurality of imaging cameras and generates a composite image by combining the selected caller's image with the plurality of images;
An image transmission unit that transmits one of the composite image and the plurality of images acquired by the plurality of imaging cameras;
An image display unit for displaying on the screen a composite image generated on the basis of an image of a conversation partner who is the caller's communication partner;
An image communication apparatus comprising:

The image composition unit selects, as the caller's image, the image of the imaging camera that is closest to the line extended in the direction specified by the direction information from the position of the caller specified by the position information.
The image communication apparatus according to claim 1.

An information acquisition unit that acquires position information on the position of the conversation partner relative to the conversation partner's screen,
wherein the image composition unit corrects the composite image based on the position information of the conversation partner.
The image communication apparatus according to claim 1 or 2.

The image composition unit corrects the composite image so that, when the position of the conversation partner and the position of the caller are reflected in a space on both sides of the screen, the image of the imaging camera closest to the line connecting the conversation partner and the caller is selected as the caller's image,
The image communication apparatus according to claim 3.

The position detection unit includes a camera that is arranged along the screen and acquires a distance image,
The image communication apparatus according to claim 1.

The direction detection unit includes a plurality of arrayed directional microphones,
The image communication apparatus according to claim 1.