JP2000287188A

JP2000287188A - System and unit for inter-multi-point video audio communication

Info

Publication number: JP2000287188A
Application number: JP11094445A
Authority: JP
Inventors: Yoshio Nagashima; 美雄永嶋; Hitoshi Aoki; 仁志青木; Tadahiko Komatsu; 忠彦小松; Hiroshi Koyano; 浩小谷野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-04-01
Filing date: 1999-04-01
Publication date: 2000-10-13

Abstract

PROBLEM TO BE SOLVED: To allow the size of the person image of a communication partner to be freely controlled and to improve transmission efficiency. SOLUTION: Each terminal displays a common three-dimensional space, accepts input of position information locating its user on the xy plane of that space, and transmits the user's person image and voice data to a server together with the position information. The server recognizes the positions of the users 41 to 44 of the terminals 31 to 34 on the xy plane from the received position information. When the three-dimensional space is viewed from the terminal 34 (user 44), the server controls the resolution of the video from the terminals 31 to 33 so that the image of the user 41, who is closer to the user 44, is displayed as a larger, higher-resolution picture and the images of the users 42 and 43 are displayed as smaller, lower-resolution pictures, and transmits the result to the terminal 34. In the same way, the server transmits resolution-controlled video from the other terminals to each terminal.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a multipoint video and audio communication system using a three-dimensional space, and to devices for that system, in which video information of users captured by the television cameras of terminals installed at a plurality of points, and audio information input through microphones, are processed according to positional relationships in a three-dimensional space constructed within a network and are output to the display device and speaker of each terminal.

[0002]

[Prior Art] Conventionally, video conference systems have been realized as systems for holding meetings between remote locations using audio information and video information. When a conference is held among a plurality of points using such a system, methods such as displaying each point on its own display device, or dividing the screen of a single display device and displaying a plurality of images on it simultaneously, are used.

[0003]

[Problems to Be Solved by the Invention] The prior art described above has the problem that the number of points that can be connected is limited, and the problem that the size of the person image of a communication partner located at another point cannot be freely controlled. In addition, regardless of the size of the displayed person image, the number of pixels of the video transmitted from another point is constant, that is, the amount of transmitted information is constant, so the transmission path cannot be used effectively.

[0004] An object of the present invention is to provide a multipoint video and audio communication system in which the size of the person image of a communication partner can be freely controlled, a user can approach the partner with whom he or she wishes to talk and converse with that partner, and the transmission path can be used effectively.

[0005]

[Means for Solving the Problems] To achieve the above object, in the multipoint video and audio communication system of the present invention, a plurality of terminal devices and one server device are connected via a communication network. Each terminal device has: means for imaging a user; means for picking up the user's voice; means for inputting position information of the user in a three-dimensional space; means for transmitting the user's video information, audio information, and position information to the server device; means for receiving position information, video information, and audio information of the users of other terminals from the server device; means for generating image information of the three-dimensional space, generating objects in the three-dimensional space of the generated image information based on the received position information of the other terminal users, and mapping onto each object the received video information of the corresponding other terminal user; means for displaying the image information of the three-dimensional space including the mapped video; and means for reproducing the received audio information.

[0006] The server device has: means for receiving video information, audio information, and position information from each terminal; means for extracting a person image from the received video information; means for controlling, for each terminal, the resolution of the extracted person image of each terminal user based on the received position information of each terminal; means for mixing the received audio information from each terminal under predetermined conditions; and means for transmitting to each corresponding terminal the video information and position information of the resolution-controlled person images of the terminal users together with the mixed audio information.

[0007] [Operation] In the present invention configured as described above, the server device grasps the positional relationships among the terminals from the position information in the three-dimensional space sent from the plurality of terminals, and changes the number of pixels of the video transmitted from the server device to each terminal based on those positional relationships. It is therefore possible to transmit video with a number of pixels suited to the size of the person image displayed at the terminal, and the transmission path can be used efficiently. Furthermore, a user can select by his or her own operation the partner with whom he or she wishes to talk, and a call between those users becomes possible.

[0008]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings. This embodiment shows an example with four terminals, but the present invention does not limit the number of terminals. FIG. 1 shows the overall configuration of the multipoint video and audio communication system of the present invention. The server device 1 is accommodated in a communication network 2 such as a LAN or a public telephone network, and terminals 31, 32, 33, and 34 installed at first, second, third, and fourth points are also accommodated in the communication network 2. Each of the terminals 31, 32, 33, and 34 generates a three-dimensional space and, by exchanging video information, audio information, and position information with the server device 1 via the communication network 2, displays person images and outputs conversation audio according to the positions of the users of the terminals in the three-dimensional space.

[0009] For example, when the users 41 to 44 of the terminals 31 to 34 are positioned on the xy plane in the common three-dimensional space as shown in FIG. 5A and the line of sight of the user 44 is directed toward the wall surface 3, a three-dimensional space such as that shown in FIG. 5B is displayed on the display screen of the terminal 34 of the user 44, and the users 41 to 43 are displayed in this space at locations corresponding to their positions on the xy plane.

[0010] FIG. 2 shows the information transmitted between the terminals 31 to 34 and the server device 1. Each of the terminals 31 to 34 transmits to the server device 1 the position information of its user 41 to 44 in the three-dimensional space (including the orientation of the person), the person image of that user captured by its television camera, and the voice of that user picked up by its microphone. The server device 1, in turn, manages all of the position information collected from the terminals, measures the positional relationships among the person images of the terminals based on this position information, and transmits to each of the terminals 31 to 34 the position information of all terminals other than that terminal, together with the processed images and the mixed audio. For example, the position information of the users 41 to 43, their person images, and their audio are transmitted to the terminal 34.
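The exchange in FIG. 2 can be pictured as two message shapes. The following Python sketch is illustrative only: the class names `UplinkPacket`, `PeerView`, and `DownlinkPacket`, their field names, and the helper `build_downlink` are assumptions that do not appear in the specification, and the per-destination image processing and audio mixing are assumed to have been done beforehand.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UplinkPacket:
    """What one terminal sends to the server (field names are hypothetical)."""
    terminal_id: int
    x: float          # position on the xy plane of the shared space
    y: float
    gaze_deg: float   # orientation of the person
    video: bytes      # camera image of the user
    audio: bytes      # microphone signal of the user


@dataclass
class PeerView:
    """One other user as delivered to a destination terminal."""
    terminal_id: int
    x: float
    y: float
    gaze_deg: float
    video: bytes      # person image already processed for this destination


@dataclass
class DownlinkPacket:
    peers: List[PeerView]
    mixed_audio: bytes


def build_downlink(dest_id: int,
                   uplinks: List[UplinkPacket],
                   processed_video: Dict[int, bytes],
                   mixed_audio: bytes) -> DownlinkPacket:
    """Collect every terminal except the destination itself."""
    peers = [
        PeerView(u.terminal_id, u.x, u.y, u.gaze_deg,
                 processed_video[u.terminal_id])
        for u in uplinks
        if u.terminal_id != dest_id
    ]
    return DownlinkPacket(peers, mixed_audio)
```

With uplinks from the terminals 31 to 34, `build_downlink(34, ...)` yields a packet describing only the users of the terminals 31 to 33, which matches the example given for the terminal 34.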

[0011] FIG. 3 shows an embodiment of the server device 1 of the present invention. Signals from the terminals 31, 32, 33, and 34 enter the data multiplexing/demultiplexing units 11, 12, 13, and 14, respectively, where each is separated into position information, video information, and audio information. These are passed over the control information bus 183, the video information bus 181, and the audio information bus 182 to the control unit 15, the video processing unit 16, and the audio processing unit 17, respectively. The control unit 15 manages the position information of each terminal's user in the three-dimensional space, measures the positional relationships among the users based on the position information of each terminal, and outputs instructions to the video processing unit 16 and the audio processing unit 17. That is, for the terminal 34, it determines the distance from the position of the user 44 to the position of each of the users 41, 42, and 43. According to these distances, in the example of FIG. 5A, it outputs an instruction to display the person image of the nearby user 41 large and the person images of the distant users 42 and 43 small, and, for audio, an instruction to output the voice of the nearby user 41 and not to output the voices of the distant users 42 and 43. Such instructions are output for each terminal. The control unit 15 also distributes the received position information of each terminal to all of the data multiplexing/demultiplexing units 11, 12, 13, and 14 via the control information bus 183.
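The role of the control unit 15 described above can be sketched as a per-viewer distance table turned into display and audio instructions. This is a minimal Python sketch under stated assumptions: the coordinates, the single threshold `near_radius`, and the function names `distances_from` and `instructions_for` are hypothetical, and the specification does not say how distance is computed (Euclidean distance on the xy plane is assumed here).

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float]


def distances_from(viewer_id: int,
                   positions: Dict[int, Position]) -> Dict[int, float]:
    """Distance on the xy plane from the viewer to every other user."""
    vx, vy = positions[viewer_id]
    return {
        tid: math.hypot(x - vx, y - vy)
        for tid, (x, y) in positions.items()
        if tid != viewer_id
    }


def instructions_for(viewer_id: int,
                     positions: Dict[int, Position],
                     near_radius: float) -> Dict[int, dict]:
    """Near users: large image and audio on. Far users: small image, no audio."""
    result = {}
    for tid, dist in distances_from(viewer_id, positions).items():
        near = dist <= near_radius
        result[tid] = {
            "display": "large" if near else "small",
            "send_audio": near,
        }
    return result


# Hypothetical layout loosely resembling FIG. 5A (coordinates are invented).
positions = {41: (0.0, 2.0), 42: (-3.0, 6.0), 43: (3.0, 6.0), 44: (0.0, 0.0)}
instructions_44 = instructions_for(44, positions, near_radius=3.0)
```

Running the same function once per terminal gives one instruction set per viewer, which corresponds to the statement that such instructions are output for each terminal.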

[0012] The video processing unit 16 receives the video information sent from each of the terminals 31 to 34, and first extracts only the person image portion by means of the person image extraction unit 161. Various methods can be used for this extraction; for example, if each terminal captures its user against a blue background, the person image portion alone can easily be extracted by chroma key processing. Based on instructions from the control unit 15, the extracted person image is decimated in the resolution control unit 162, for each terminal, to a resolution corresponding to the size at which that person image is displayed at the terminal. That is, when the person is nearby in the three-dimensional space, the resolution is made high because the person image is displayed large. When the person is far away in the three-dimensional space, the resolution is made low because the person image is displayed small. For example, an image of 100 × 100 pixels is converted to an image of 50 × 50 or 25 × 25 pixels. In the processing for the terminal 34, in the example of FIG. 5A, the person image of the nearby user 41 is given a high resolution and a large display size, and the person images of the distant users 42 and 43 are given a low resolution and a small display size. In the processing for the terminal 32, in the example of FIG. 5A, the user 43 of the terminal 33 is very close to the user 42 of the terminal 32, so the person image of the user 43 is given the highest resolution and a large display size; the user 41 of the terminal 31 is at a medium distance from the user 42, so that person image is given a medium resolution and a medium display size; and the person image of the distant user 44 is given a low resolution and a small display size. In this way, for each terminal, the resolution of the person images of the users of the other terminals is controlled. The resolution-controlled images are output from the person image distribution unit 163 to the video information bus 181.
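The decimation step of the resolution control unit 162 can be illustrated with a short Python sketch. It is a sketch under assumptions, not the specification's method: the image is modeled as a list of pixel rows, decimation simply keeps every `step`-th pixel in each direction, and the distance thresholds `near` and `far` and the function names are hypothetical. Only the 100 × 100, 50 × 50, and 25 × 25 sizes come from the text.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values


def decimate(image: Image, step: int) -> Image:
    """Keep every `step`-th pixel horizontally and vertically."""
    return [row[::step] for row in image[::step]]


def step_for_distance(distance: float,
                      near: float = 3.0,
                      far: float = 6.0) -> int:
    """Map distance to a decimation step (thresholds are assumed values)."""
    if distance <= near:
        return 1   # full resolution, e.g. 100 x 100
    if distance <= far:
        return 2   # e.g. 50 x 50
    return 4       # e.g. 25 x 25


source = [[0] * 100 for _ in range(100)]
medium = decimate(source, step_for_distance(5.0))
small = decimate(source, step_for_distance(7.0))
```

Halving the width and height cuts the pixel count to one quarter (2,500 instead of 10,000), and the 25 × 25 image carries one sixteenth (625), which is where the saving on the transmission path comes from.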

[0013] The audio processing unit 17 receives the audio information sent from each of the terminals 31 to 34 and, based on instructions from the control unit 15, mixes the input audio in the audio mixing unit 171 according to the distances between the users of the terminals. For example, the voices of users who are within a predetermined distance are mixed, and the mixed audio is output from the audio distribution unit 172 to the audio information bus 182. In the processing for the terminal 34, in the case of FIG. 5A, the voice of the nearby user 41 is sent to the terminal 34, but the voices of the distant users 42 and 43 are not. If the user 43 moves to the position shown by the broken line (43') in FIG. 5A, the users 41 and 43' are both relatively close to the user 44 and at about the same distance. The voices of the users 41 and 43 are therefore mixed and sent to the terminal 34. In this way, for each terminal, the voices from the users of the other terminals that are to be sent to that terminal are controlled.
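The mixing rule described above (mix only the voices of users within a predetermined distance of the listener) can be sketched as follows. This is an illustrative Python sketch: the specification does not define the sample format or the mixing arithmetic, so 16-bit PCM samples summed with clipping are assumed, and the names `mix` and `mix_for_listener`, the coordinates, and the radius are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def mix(frames: List[List[int]]) -> List[int]:
    """Sum equal-rate sample frames and clip to the 16-bit range."""
    if not frames:
        return []
    length = min(len(f) for f in frames)
    mixed = []
    for i in range(length):
        total = sum(f[i] for f in frames)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def mix_for_listener(listener_id: int,
                     positions: Dict[int, Tuple[float, float]],
                     audio: Dict[int, List[int]],
                     radius: float) -> List[int]:
    """Mix the voices of all other users within `radius` of the listener."""
    lx, ly = positions[listener_id]
    nearby = [
        audio[tid]
        for tid, (x, y) in positions.items()
        if tid != listener_id and math.hypot(x - lx, y - ly) <= radius
    ]
    return mix(nearby)


# Invented coordinates: user 43 starts far away, then moves to 43'.
audio = {41: [100, 200], 42: [1000, 1000], 43: [10, 20], 44: [5, 5]}
before = {41: (0.0, 2.0), 42: (-3.0, 6.0), 43: (3.0, 6.0), 44: (0.0, 0.0)}
after = dict(before)
after[43] = (2.0, 0.0)
```

With these values, the listener at the terminal 34 first receives only the voice of the user 41, and after the move receives the sum of the voices of the users 41 and 43.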

[0014] Each of the data multiplexing/demultiplexing units 11, 12, 13, and 14 multiplexes the position information of all terminals other than its own terminal, the processed video information, and the mixed audio information, and transmits them to the corresponding terminal. FIG. 4 shows an embodiment of the terminals 31, 32, 33, and 34 of the present invention. The signal from the server device 1 enters the data multiplexing/demultiplexing unit 51 and is separated into position information, video information, and audio information. The position information input unit 55 is used to control where the user places himself or herself in the common three-dimensional space; for example, the user enters his or her location as xy coordinates on the xy plane of the three-dimensional space. The three-dimensional space generation unit 54 updates the display screen of the three-dimensional space according to instructions from the position information input unit 55. The input from the position information input unit 55 includes not only the xy coordinates but also the gaze direction. In the example of FIG. 5A, the position and gaze direction of the user 44 are input at the terminal 34, and the three-dimensional space generation unit 54 updates the display screen so that the view of the wall surface 3 of the common three-dimensional space from that position is displayed on the terminal's display screen as shown in FIG. 5B. For example, views of the common three-dimensional space as seen from a plurality of principal positions and gaze directions are stored in advance, and the three-dimensional space display screen is updated by selecting from the stored screen information according to the position information input from the position information input unit 55.
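One way to read the pre-stored view approach is as a nearest-neighbor lookup over stored positions and gaze directions. The Python sketch below is an assumption-laden illustration: the specification does not state how the stored view is chosen, so the cost function, the weight `angle_weight`, the dictionary layout, and the name `select_view` are all hypothetical.

```python
import math
from typing import Dict, List


def angle_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def select_view(stored_views: List[Dict],
                x: float,
                y: float,
                gaze_deg: float,
                angle_weight: float = 0.01) -> Dict:
    """Pick the stored view closest to the entered position and gaze."""
    def cost(view: Dict) -> float:
        position_error = math.hypot(view["x"] - x, view["y"] - y)
        gaze_error = angle_difference(view["gaze_deg"], gaze_deg)
        return position_error + angle_weight * gaze_error

    return min(stored_views, key=cost)


# Hypothetical catalogue of pre-rendered screens.
stored_views = [
    {"name": "origin_facing_north", "x": 0.0, "y": 0.0, "gaze_deg": 90.0},
    {"name": "origin_facing_south", "x": 0.0, "y": 0.0, "gaze_deg": 270.0},
    {"name": "corner_facing_north", "x": 5.0, "y": 5.0, "gaze_deg": 90.0},
]
chosen = select_view(stored_views, 0.2, 0.1, 80.0)
```

The weight trades position error against gaze error; a real terminal could equally render the space on the fly instead of choosing among stored screens.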

[0015] The three-dimensional space generation unit 54 also generates objects for the other users in the three-dimensional space of the image information, using the position information of the other terminal users (the communication partners) separated by the data multiplexing/demultiplexing unit 51. For example, in the example of FIG. 5, the display screen of the common three-dimensional space at the terminal 34 is determined as shown in FIG. 5B. The position of the user 44 is then the point at the center of the bottom edge (the nearest point) of the screen. This screen position is taken as the position information of the user 44, the on-screen positions corresponding to the position information of the users 41 to 43 are determined relative to it, and a user object is generated at each of those positions.

[0016] The video mapping unit 52 maps the person video of the other terminals received from the data multiplexing/demultiplexing unit 51 onto the generated objects of the other users. The people in the three-dimensional space, including the mapped video, are displayed on the display device 53 in perspective. Each person video is sent from the server device 1 as a rectangular image with a number of pixels corresponding to the specified resolution, and the rectangular image is mapped so that, for example, its center coincides with the position of the corresponding user's object in the three-dimensional space screen.
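The placement of a partner's rectangular image on the viewer's screen can be sketched with a simple pinhole-style projection. The specification says only that the scene is displayed in perspective and that the center of the rectangular image is mapped to the object position, so everything else in this Python sketch is an assumption: the projection formula, the parameters `screen_w` and `focal`, and the function names `project` and `place_centered` are hypothetical.

```python
import math
from typing import Optional, Tuple


def project(viewer: Tuple[float, float],
            gaze_deg: float,
            target: Tuple[float, float],
            screen_w: int = 640,
            focal: float = 300.0) -> Optional[Tuple[float, float]]:
    """Return (screen_x, scale) for a target on the xy plane, or None if behind."""
    dx = target[0] - viewer[0]
    dy = target[1] - viewer[1]
    theta = math.radians(gaze_deg)
    # Decompose the offset into "ahead of the viewer" and "to the viewer's right".
    forward = dx * math.cos(theta) + dy * math.sin(theta)
    right = dx * math.sin(theta) - dy * math.cos(theta)
    if forward <= 0:
        return None  # behind the viewer, so not drawn
    scale = focal / forward  # nearer partners get a larger scale
    return screen_w / 2 + right * scale, scale


def place_centered(center_x: float, center_y: float,
                   width: float, height: float) -> Tuple[float, float]:
    """Top-left corner so that the rectangle's center sits on the object position."""
    return center_x - width / 2, center_y - height / 2


near = project((0.0, 0.0), 90.0, (0.0, 2.0))
far = project((0.0, 0.0), 90.0, (3.0, 6.0))
```

The returned scale shrinks with distance, which is consistent with sending smaller, lower-resolution images for distant partners.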

[0017] The user's own position information in the three-dimensional space, input from the position information input unit 55, is transmitted to the server device 1 via the data multiplexing/demultiplexing unit 51. The user's own video captured by the television camera 56 is likewise transmitted to the server device 1 via the data multiplexing/demultiplexing unit 51. The user's own voice picked up by the microphone 58 is also transmitted to the server device 1, and the mixed audio received from the server device 1 and separated by the data multiplexing/demultiplexing unit 51 is reproduced by the speaker 57.

[0018] FIG. 5 is a diagram explaining a method of controlling the display screen based on the positions of the users of the terminals in the three-dimensional space. Here, the persons 41, 42, 43, and 44 are the users of the terminals 31, 32, 33, and 34, respectively. FIG. 5A shows the positions of the persons on the x-y plane in the three-dimensional space. These positional relationships are managed by the control unit 15 of the server device 1. FIGS. 5B and 5C show examples of the screen on the display device of the user 44 of the terminal 34. They show the case in which the user 44 is looking toward the wall surface 1 in the three-dimensional space.

[0019] In FIG. 5A, when the user 43 of the terminal 33 moves to the position 43', the screen on the display device 53 of the user 44 of the terminal 34 changes from FIG. 5B to FIG. 5C. The server device 1 performs control so that, in the case of FIG. 5B, a low-resolution image is transmitted because the displayed image of the user 43 is small, and, in the case of FIG. 5C, a high-resolution image is transmitted because the displayed image of the user 43 is large. The server device 1 also performs control so that the speaker 57 of the terminal 34 outputs only the voice of the person 41 (the user of the terminal 31) in the case of FIG. 5B, and the mixed voices of the person 41 (the user of the terminal 31) and the person 43 (the user of the terminal 33) in the case of FIG. 5C.

[0020] When the displayed image of a person is very small, that is, when the person is very far away in the three-dimensional space, it is also possible to display the person as a small icon (pattern) without transmitting the person video. In the embodiment above, the size of the displayed image is controlled based on the distance from the position of each terminal's user to the positions of the other users. However, it is also possible, for example, for a user to point at a partner on the display screen from the terminal, for instance with a cursor, and for the server device 1 to change the size of the displayed image according to that instruction information and have it displayed on the terminal's display device. It is likewise possible to perform control such that, when a user points at a partner on the display screen from the terminal, for instance with a cursor, the audio from that partner's terminal is supplied to the speaker and a voice channel is set up between the two so that the two users can converse.
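The variations in this paragraph (an icon for very distant users, and a cursor selection that overrides distance) can be combined into one small decision function. The Python sketch below is hypothetical: the thresholds `near` and `icon_distance`, the return labels, and the name `representation` are assumptions, and the specification does not say how a cursor selection and the distance rule interact.

```python
def representation(distance: float,
                   selected: bool = False,
                   near: float = 3.0,
                   icon_distance: float = 10.0) -> str:
    """Decide how one partner is delivered to a viewer.

    A partner picked with the cursor is treated as if nearby (assumption).
    Beyond `icon_distance` no video is sent and an icon is drawn instead.
    """
    if selected:
        return "video_high"
    if distance > icon_distance:
        return "icon"
    if distance <= near:
        return "video_high"
    return "video_low"


choices = [representation(d) for d in (2.0, 5.0, 20.0)]
```

Returning "icon" means the server can skip the video stream for that partner entirely, which saves more transmission capacity than any reduced-resolution image.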

[0021]

[Effects of the Invention] As described above, according to the present invention, the server device grasps the positional relationships among the terminals from the position information in the three-dimensional space sent from the terminals installed at a plurality of points, and generates the video information and audio information transmitted from the server device to each terminal based on those positional relationships. The video information from each terminal can therefore be transmitted as an image with a number of pixels suited to the size at which it is displayed. It is also possible for a user to select a partner with whom he or she wishes to talk and for the two to converse. Furthermore, because only image data at the resolution actually shown on the display device needs to be transmitted, there is the advantage that the transmission path can be used efficiently.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the entire multipoint video and audio communication system of the present invention.

FIG. 2 is a diagram showing the information transmitted between the terminals 31 to 34 and the server device 1.

FIG. 3 is a diagram showing the functional configuration of an embodiment of the server device 1 of the present invention.

FIG. 4 is a diagram showing the functional configuration of an embodiment of the terminals 31 to 34 of the present invention.

FIG. 5 is a diagram explaining a method of controlling the display screen based on the positions of the users of the terminals in the three-dimensional space.

Continuation of front page: (72) Inventor Tadahiko Komatsu, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (72) Inventor Hiroshi Koyano, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-terms (reference): 5C064 AA02 AC01 AC02 AC06 AC09 AC12 AC13 AC16 AC22 AD06; 5C082 AA01 AA21 AA27 AA31 BA27 BA41 BB01 CB01 DA87 MM02

Claims

[Claims]

1. A multipoint video and audio communication system, characterized in that a terminal device and a server device are connected via a communication network, the terminal device having: means for imaging a user; means for picking up the user's voice; means for inputting position information of the user in a three-dimensional space; means for multiplexing the user's video information, audio information, and position information and transmitting them to the server device; means for receiving position information, video information, and audio information of the users of other terminals from the server device; means for generating image information of the three-dimensional space, generating objects in the three-dimensional space of the generated image information based on the received position information of the other terminal users, and mapping onto each object the received video information of the corresponding other terminal user; means for displaying the image information of the three-dimensional space including the mapped video; and means for reproducing the received audio information; and the server device having: means for receiving video information, audio information, and position information from each terminal; means for extracting a person image from the received video information; means for controlling, for each terminal, the resolution of the extracted person image of each terminal user based on the received position information of each terminal; means for mixing, for each terminal, the received audio information from each terminal under predetermined conditions; and means for transmitting to each corresponding terminal the video information and position information of the resolution-controlled person images of the terminal users and the mixed audio information.

2. A terminal device of a multipoint video and audio communication system in which a plurality of terminal devices arranged at multiple points and one server device are connected by a communication network, the terminal device being characterized by comprising: means for imaging a user; means for picking up the user's voice; means for inputting position information of the user in a three-dimensional space; means for multiplexing the user's video information, audio information, and position information and transmitting them to the server device; means for receiving position information, video information, and audio information of the users of other terminals from the server device; means for generating image information of the three-dimensional space, generating objects in the three-dimensional space of the generated image information based on the received position information of the other terminal users, and mapping onto each object the received video information of the corresponding other terminal user; means for displaying the image information of the three-dimensional space including the mapped video; and means for reproducing the received audio information.

3. A server device of a multipoint video and audio communication system in which a plurality of terminals arranged at multiple points and one server device are connected by a communication network, the server device being characterized by comprising: means for receiving video information, audio information, and position information from each terminal; means for extracting a person image from the received video information; means for controlling, for each terminal, the resolution of the extracted person image of each terminal user based on the received position information of each terminal; means for mixing, for each terminal, the received audio information from each terminal under predetermined conditions; and means for transmitting to each corresponding terminal the video information and position information of the resolution-controlled person images of the terminal users and the mixed audio information.