JP5813542B2

JP5813542B2 - Image communication system, AR (Augmented Reality) video generation device, and program

Info

Publication number: JP5813542B2
Application number: JP2012060659A
Authority: JP
Inventors: 紀子水口; 内山　健; 健内山; まり阿久澤; 択磨松村; 大樹清水; 洋志野中
Original assignee: NTT Docomo Inc; Nippon Control System Corp
Current assignee: NTT Docomo Inc; Nippon Control System Corp
Priority date: 2012-03-16
Filing date: 2012-03-16
Publication date: 2015-11-17
Anticipated expiration: 2032-03-16
Also published as: JP2013196154A

Description

The present invention relates to communication using augmented reality (AR).

There is a technique for distributing composite images produced with AR technology over a communication network (see, for example, Patent Document 1). Known methods for obtaining a composite image include shooting against a background of a specific color, as in chroma key compositing (blue-screen compositing), and compositing an image of something that does not actually exist (such as an avatar) obtained by CG (computer graphics).

Patent Document 1: JP 2004-341642 A

Conventional composite images using AR technology have generally been produced by superimposing an image prepared in advance on video shot by a user. They therefore could not be applied as they are to real-time communication that uses the actual video of the users themselves (that is, images that are not prepared in advance), such as a so-called videophone.
The present invention therefore provides a technique that enables real-time communication using video composited from the videos shot by each of the users.

An image communication system according to one aspect of the present invention includes a connection management device that manages a connection between a first terminal and a second terminal, and an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by combining the videos captured at both terminals. The connection management device supplies line information on the communication line connecting the first terminal and the second terminal to the AR video generation device. The AR video generation device includes: a first receiving unit that receives, from one of the first terminal and the second terminal, first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives, from the other of the first terminal and the second terminal, second video data that captures a background including a marker of a predetermined shape; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second video data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to that coordinate system; and a transmission unit that transmits the composite video data generated by the generation unit to the other terminal. The AR video generation device varies the video represented by the composite video data according to the line information supplied by the connection management device.
In a preferred aspect, the AR video generation device includes a determination unit that determines, based on the line information, whether communication using the composite video data is possible for each of the first terminal and the second terminal, and generates and transmits the composite video data for a terminal for which the determination unit has determined that such communication is possible.
An image communication system according to another aspect of the present invention includes a connection management device that manages a connection between a first terminal and a second terminal, and an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by combining the videos captured at both terminals. The connection management device supplies resource information on the hardware or software resources of the first terminal and the second terminal to the AR video generation device. The AR video generation device includes: a first receiving unit that receives, from one of the first terminal and the second terminal, first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives, from the other of the first terminal and the second terminal, second video data that captures a background including a marker of a predetermined shape; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second video data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to that coordinate system; and a transmission unit that transmits the composite video data generated by the generation unit to the other terminal. The AR video generation device varies the video represented by the composite video data according to the resource information supplied by the connection management device.
In a preferred aspect, the AR video generation device includes a determination unit that determines, based on the resource information, whether communication using the composite video data is possible for each of the first terminal and the second terminal, and generates and transmits the composite video data for a terminal for which the determination unit has determined that such communication is possible.
An image communication system according to another aspect of the present invention includes a connection management device that manages a connection between a first terminal and a second terminal, and an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by combining the videos captured at both terminals. The connection management device includes a transmission/reception unit that transmits and receives voice messages between the first terminal and the second terminal, and supplies information representing a change in the volume or sound quality of the voice messages transmitted and received by the transmission/reception unit to the AR video generation device. The AR video generation device includes: a first receiving unit that receives, from one of the first terminal and the second terminal, first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives, from the other of the first terminal and the second terminal, second video data that captures a background including a marker of a predetermined shape; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second video data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to that coordinate system; and a transmission unit that transmits the composite video data generated by the generation unit to the other terminal. The AR video generation device varies the video represented by the composite video data according to the information representing the change in volume or sound quality supplied by the connection management device.
An image communication system according to another aspect of the present invention includes a connection management device that manages a connection between a first terminal and a second terminal, and an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by combining the videos captured at both terminals. The AR video generation device includes: a first receiving unit that receives, from one of the first terminal and the second terminal, first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives, from the other of the first terminal and the second terminal, second video data that captures a background including a marker of a predetermined shape; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second video data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to that coordinate system; a transmission unit that transmits the composite video data generated by the generation unit to the other terminal; and an analysis unit that analyzes a change in the subject in the first video data received by the first receiving unit. The AR video generation device supplies information representing the change in the subject analyzed by the analysis unit to the connection management device. The connection management device includes a transmission/reception unit that transmits and receives voice messages between the first terminal and the second terminal and that varies the volume or sound quality of the voice messages it transmits and receives according to the information representing the change in the subject supplied by the AR video generation device.
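The extraction and generation units described in these aspects can be illustrated with a minimal sketch. It assumes that the predetermined object is the set of pixels whose distance falls within a range `[near, far]`, and that the marker-derived coordinate system has already been reduced to a paste position `origin` in the background frame. The function names, the list-of-rows frame representation, and the thresholding rule are all assumptions made for illustration, not details taken from the specification.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]            # (R, G, B)
Frame = List[List[Pixel]]               # rows of pixels
DepthMap = List[List[float]]            # per-pixel distance, same shape as the frame
Cutout = List[List[Optional[Pixel]]]    # None marks a pixel that was not extracted


def extract_object(frame: Frame, depth: DepthMap,
                   near: float, far: float) -> Cutout:
    """Keep pixels whose distance lies in [near, far]; mark the rest as None.

    The distance range is a hypothetical stand-in for "pixels corresponding
    to the predetermined object".
    """
    return [
        [px if near <= d <= far else None for px, d in zip(row, drow)]
        for row, drow in zip(frame, depth)
    ]


def composite(background: Frame, cutout: Cutout,
              origin: Tuple[int, int]) -> Frame:
    """Paste the extracted pixels onto a copy of the background.

    `origin` is the (row, col) in the background where the cutout's top-left
    corner lands; in the described system it would follow from the marker.
    Pixels that fall outside the background are skipped.
    """
    out = [list(row) for row in background]
    r0, c0 = origin
    for r, row in enumerate(cutout):
        for c, px in enumerate(row):
            rr, cc = r0 + r, c0 + c
            if px is not None and 0 <= rr < len(out) and 0 <= cc < len(out[rr]):
                out[rr][cc] = px
    return out
```

In this sketch the background frame is copied rather than modified in place, so the same received frame could be reused, for example to build a different composite for each terminal.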

In a preferred aspect, the AR video generation device includes a calculation unit that analyzes the second video data received by the second receiving unit and calculates the coordinate system, and the generation unit generates the composite video data based on the coordinate system calculated by the calculation unit.
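As a rough illustration of what such a calculation unit might compute, the sketch below derives a coordinate frame from the four corner positions of a square marker detected in the image. This is a deliberately simplified two-dimensional, in-plane version: recovering a full three-dimensional coordinate system would additionally need camera parameters and a pose-estimation step, which are not shown. The corner ordering, the `side` parameter, and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]                      # (x, y) in image pixels
MarkerFrame = Tuple[Point, Point, Point, float]  # origin, x axis, y axis, scale


def _unit(a: Point, b: Point) -> Tuple[Point, float]:
    """Return the unit vector from a to b and the distance between them."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    n = math.hypot(dx, dy)
    if n == 0:
        raise ValueError("degenerate marker edge")
    return (dx / n, dy / n), n


def marker_frame(corners: Sequence[Point], side: float = 1.0) -> MarkerFrame:
    """Build an in-plane frame from corners given as
    (top-left, top-right, bottom-right, bottom-left).

    origin: centroid of the corners
    x axis: direction of the top edge; y axis: direction of the left edge
    scale:  image pixels per marker unit, from the mean of those two edges
    """
    tl, tr, _br, bl = corners
    origin = (sum(p[0] for p in corners) / 4, sum(p[1] for p in corners) / 4)
    x_axis, wx = _unit(tl, tr)
    y_axis, wy = _unit(tl, bl)
    scale = (wx + wy) / (2 * side)
    return origin, x_axis, y_axis, scale


def to_image(frame: MarkerFrame, u: float, v: float) -> Point:
    """Map marker-frame coordinates (u, v) to image pixel coordinates."""
    origin, x_axis, y_axis, scale = frame
    return (origin[0] + scale * (u * x_axis[0] + v * y_axis[0]),
            origin[1] + scale * (u * x_axis[1] + v * y_axis[1]))
```

A generation step could use `to_image` to decide where, and at what size, the extracted object is drawn relative to the marker.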

An AR video generation device according to another aspect of the present invention includes: a first receiving unit that receives first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second video data received by the second receiving unit; and a transmission unit that transmits the composite video data generated by the generation unit. The device is supplied, from a connection management device that manages the connection between the terminal that transmitted the first video data and the terminal that transmitted the second video data, with line information on the communication line connecting those terminals, and varies the video represented by the composite video data according to the supplied line information.
An AR video generation device according to another aspect of the present invention includes: a first receiving unit that receives first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second video data received by the second receiving unit; and a transmission unit that transmits the composite video data generated by the generation unit. The device is supplied, from a connection management device that connects the terminal that transmitted the first video data and the terminal that transmitted the second video data, with resource information on the hardware or software resources of those terminals, and varies the video represented by the composite video data according to the supplied resource information.
An AR video generation device according to another aspect of the present invention includes: a first receiving unit that receives first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second video data received by the second receiving unit; and a transmission unit that transmits the composite video data generated by the generation unit. The device is supplied with information representing a change in the volume or sound quality of voice messages from a connection management device that includes a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first video data and the terminal that transmitted the second video data, and that supplies information representing a change in the volume or sound quality of the voice messages transmitted and received by the transmission/reception unit. The device varies the video represented by the composite video data according to the supplied information.
An AR video generation device according to another aspect of the present invention includes: a first receiving unit that receives first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second receiving unit that receives second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; an extraction unit that extracts, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received by the first receiving unit; a generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second video data received by the second receiving unit; a transmission unit that transmits the composite video data generated by the generation unit; and an analysis unit that analyzes a change in the subject in the first video data received by the first receiving unit. The device supplies the information representing the change in the subject analyzed by the analysis unit to a connection management device that includes a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first video data and the terminal that transmitted the second video data, and that varies the volume or sound quality of the voice messages it transmits and receives according to the supplied information representing the change in the subject.
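One way to picture "varying the video according to the line information", together with the determination of whether composite-video communication is possible at all, is a lookup from measured line capacity to an output profile. The sketch below is only an illustration: the `bandwidth_kbps` field, the thresholds, and the profile values are hypothetical and do not come from the specification, which does not state what the line information contains.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LineInfo:
    bandwidth_kbps: float   # hypothetical content of the line information


@dataclass(frozen=True)
class VideoProfile:
    width: int
    height: int
    fps: int


# Hypothetical thresholds, ordered from most to least demanding.
PROFILES: List[Tuple[float, VideoProfile]] = [
    (2000.0, VideoProfile(1280, 720, 30)),
    (800.0, VideoProfile(640, 360, 30)),
    (300.0, VideoProfile(320, 180, 15)),
]


def select_profile(line: LineInfo) -> Optional[VideoProfile]:
    """Return the richest profile the line can carry.

    None means the line is judged unable to carry composite video at all,
    which corresponds to a negative result from a determination unit.
    """
    for min_kbps, profile in PROFILES:
        if line.bandwidth_kbps >= min_kbps:
            return profile
    return None
```

The same shape would apply to resource information: replace `LineInfo` with a description of each terminal's hardware or software resources and select the profile, or decline, per terminal.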

A program according to another aspect of the present invention causes a computer to execute: a first step of receiving first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second step of receiving second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; a third step of extracting, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received in the first step; a fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second video data received in the second step; a fifth step of transmitting the composite video data generated in the fourth step; and a sixth step in which line information on the communication line connecting the terminal that transmitted the first video data and the terminal that transmitted the second video data is supplied from a connection management device that manages the connection between those terminals, and the video represented by the composite video data is varied according to the supplied line information.
A program according to another aspect of the present invention causes a computer to execute: a first step of receiving first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second step of receiving second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; a third step of extracting, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received in the first step; a fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second video data received in the second step; a fifth step of transmitting the composite video data generated in the fourth step; and a sixth step in which resource information on the hardware or software resources of the terminals is supplied from a connection management device that connects the terminal that transmitted the first video data and the terminal that transmitted the second video data, and the video represented by the composite video data is varied according to the supplied resource information.
A program according to another aspect of the present invention causes a computer to execute: a first step of receiving first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second step of receiving second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; a third step of extracting, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received in the first step; a fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second video data received in the second step; a fifth step of transmitting the composite video data generated in the fourth step; and a sixth step in which information representing a change in the volume or sound quality of voice messages is supplied from a connection management device that includes a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first video data and the terminal that transmitted the second video data and that supplies information representing a change in the volume or sound quality of the voice messages transmitted and received by the transmission/reception unit, and the video represented by the composite video data is varied according to the supplied information.
A program according to another aspect of the present invention causes a computer to execute: a first step of receiving first video data that captures a subject including a predetermined object and that includes distance information for each pixel of the subject; a second step of receiving second video data that captures a background including a marker of a predetermined shape and in which a three-dimensional coordinate system is defined based on the marker; a third step of extracting, from the subject, the pixels corresponding to the predetermined object based on the distance information of the first video data received in the first step; a fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second video data received in the second step; a fifth step of transmitting the composite video data generated in the fourth step; a sixth step of analyzing a change in the subject in the first video data received in the first step; and a seventh step of supplying the information representing the change in the subject analyzed in the sixth step to a connection management device that includes a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first video data and the terminal that transmitted the second video data and that varies the volume or sound quality of the voice messages it transmits and receives according to the supplied information representing the change in the subject.
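The analysis and supply steps of the last program aspect can be sketched as follows. The sketch assumes that the "change in the subject" is summarized as the mean distance of the subject's pixels, and that the connection management side maps a closer subject to a louder voice message. Both the choice of statistic and the mapping (including the clamping bounds) are assumptions made for illustration; the text above only states that volume or sound quality is varied according to the change in the subject.

```python
from typing import List, Optional

DepthMap = List[List[float]]   # per-pixel distance for one frame


def mean_subject_distance(depth: DepthMap,
                          near: float, far: float) -> Optional[float]:
    """Average distance of pixels inside [near, far]; None if there are none.

    The range is a hypothetical way of selecting the subject's pixels.
    """
    vals = [d for row in depth for d in row if near <= d <= far]
    return sum(vals) / len(vals) if vals else None


def volume_gain(distance: Optional[float], reference: float = 1.0,
                lo: float = 0.25, hi: float = 2.0) -> float:
    """Map subject distance to a playback gain (assumed rule: closer is louder).

    The gain is reference / distance, clamped to [lo, hi]; a missing or
    non-positive distance leaves the volume unchanged (gain 1.0).
    """
    if distance is None or distance <= 0:
        return 1.0
    return max(lo, min(hi, reference / distance))
```

For example, a subject whose mean distance doubles from 1.0 to 2.0 would have its voice played at half the gain under this assumed rule.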

According to the present invention, real-time communication is possible using a moving image composited from videos captured by the respective users.

FIG. 1 is a block diagram showing the overall configuration of the communication system.
FIG. 2 is a block diagram showing the hardware configuration of a user terminal.
FIG. 3 is a block diagram showing the hardware configuration of the AR video generation device and the connection management device.
FIG. 4 is a block diagram showing the functional configuration of the AR video generation device and the connection management device.
FIG. 5 is a schematic diagram for explaining the videos represented by the moving image data.
FIG. 6 is a diagram for explaining the principle of object extraction.
FIG. 7 is a flowchart showing the processing executed by the AR video generation device.
FIG. 8 is a sequence chart showing the processing in each device of the communication system.

[Embodiment]
FIG. 1 is a block diagram showing the overall configuration of a communication system 10 according to an embodiment of the present invention. The communication system 10 is an information processing system that enables users to communicate in real time by voice and AR video, and is an example of the image communication system according to the present invention. Here, an AR video is a moving image in which videos captured by a plurality of users are composited with one another, and it is generated using AR technology. "Real time" here means that the target processing finishes within a predetermined time and that this time is relatively short. The communication system 10 includes an AR video generation device 100, a connection management device 200, user terminals 300, and a network 400.

The AR video generation device 100 is a server device that receives moving image data from a plurality of user terminals 300 and generates and transmits composite moving image data in which they are combined. The connection management device 200 is a server device that manages connections (here, call connections) between the user terminals 300 and controls the start and end of calls. The management performed by the connection management device 200 includes processing for keeping the communication real-time. The AR video generation device 100 and the connection management device 200 may be connected to each other by a communication line other than the network 400.

The user terminal 300 is a communication terminal used by a user who communicates by means of the communication system 10. The user terminal 300 may be a dedicated terminal for the communication system 10, or it may be a smartphone or personal computer to which the peripheral devices needed for the communication of this embodiment have been attached. In practice many user terminals 300 may be connected to the network 400, but FIG. 1 shows only the terminals of the two parties who communicate. Where these terminals need to be distinguished below, they are referred to as "user terminal 300a" and "user terminal 300b".

The network 400 is a network for transmitting and receiving data such as voice and moving images. The network 400 may, but need not, use separate communication lines for voice and for moving images. The network 400 may use either packet-switched or circuit-switched communication lines.

FIG. 2 is a block diagram showing the hardware configuration of the user terminal 300. The user terminal 300 includes a control unit 310, a storage unit 320, a communication unit 330, a first imaging unit 340, a second imaging unit 350, a display unit 360, an audio input/output unit 370, and an operation unit 380. The configuration shown in FIG. 2 is hereinafter referred to as the "standard configuration".

The control unit 310 is means for controlling the operation of the user terminal 300. The control unit 310 includes an arithmetic processing device such as a CPU (Central Processing Unit) and a memory, and controls video capture, data communication, and the like by executing a predetermined program. The control unit 310 also includes a calculation unit 311 that calculates a three-dimensional coordinate system, described later. The calculation unit 311 may calculate the coordinate system by any well-known method. For example, the calculation unit 311 recognizes a marker of a predetermined shape in the captured video, identifies the tilt from the difference (that is, the distortion) between the recognized shape and the actual shape, and calculates tilt information. It then defines a three-dimensional orthogonal coordinate system on the basis of the identified tilt and the position of the marker, and calculates coordinate information representing the coordinates.
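As a rough illustration of identifying tilt from the distortion of a marker, the sketch below handles only the simplest case. It assumes a square marker, an orthographic (parallel) projection, and rotation about a single axis, so that the foreshortened side shrinks by the cosine of the tilt angle. The function name `tilt_from_square_marker` is hypothetical, and this is not the calculation method of the embodiment, which leaves the method open.

```python
import math


def tilt_from_square_marker(apparent_width: float, apparent_height: float) -> float:
    """Estimate the tilt angle in degrees of a square marker.

    Assumption: orthographic projection and rotation about one axis only,
    so the shorter apparent side equals the longer side times cos(tilt).
    """
    if apparent_width <= 0 or apparent_height <= 0:
        raise ValueError("apparent side lengths must be positive")
    shorter = min(apparent_width, apparent_height)
    longer = max(apparent_width, apparent_height)
    return math.degrees(math.acos(shorter / longer))
```

A practical implementation would more likely estimate a full pose from the four marker corners under a perspective camera model; the single-angle version only shows how distortion carries tilt information.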

The storage unit 320 is means for storing data. The storage unit 320 corresponds to an auxiliary storage device and includes, for example, a hard disk or flash memory. The storage unit 320 stores resource information about the resources of the user terminal 300 (the terminal itself). Resource information is data indicating what resources the user terminal 300 has, and the resources may include both hardware and software resources. The resource information indicates, for example, identification information of the user terminal 300 (such as the model name), the processing capability of the CPU, the performance of the first imaging unit 340 and the second imaging unit 350 (such as the number of pixels), the performance of the display unit 360 (such as the display resolution), the supported codecs, and the version of the software required for communication.

The communication unit 330 is means for transmitting and receiving data via the network 400. Communication by the communication unit 330 may be wireless or wired. When part of the user terminal 300 is implemented as an external peripheral device, the communication unit 330 may also include a component for communicating with that peripheral device.

The first imaging unit 340 and the second imaging unit 350 are both means for imaging a subject and outputting moving image data. The moving image data output by the first imaging unit 340 and the second imaging unit 350 includes at least color information representing the color of each pixel of the subject. The first imaging unit 340 is used to image a subject that includes the user, and the second imaging unit 350 is used to image a subject that includes a marker (AR marker). The first imaging unit 340 includes a distance image sensor. The second imaging unit 350, by contrast, may be an ordinary image sensor, that is, one that does not output distance information.

A distance image sensor is an image sensor that can output distance information for each pixel. There are two types of distance image sensor: the "pattern irradiation method", which projects light in a predetermined pattern and measures the distance between the sensor and the subject from the distortion of the reflected light, and the "TOF (Time of Flight) method", which emits light toward the subject and measures the distance from the time the light takes to reflect off the subject and return. Either method may be used for the first imaging unit 340.
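The TOF principle reduces to a one-line relation: the light covers the sensor-to-subject distance twice, so the distance is half the round-trip path. The sketch below shows that arithmetic only; the function name `tof_distance_m` is hypothetical, and real sensors typically measure phase shift or gated intensity rather than a raw time value.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the subject from the round-trip time of the emitted light."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The light travels to the subject and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0
```

For example, a round trip of 10 nanoseconds corresponds to a subject roughly 1.5 m from the sensor.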

The first imaging unit 340 includes such a distance image sensor together with an ordinary image sensor, and outputs both color information and distance information. The distance image sensor and the image sensor image the same direction, and the correspondence between the pixels of the distance image sensor and the pixels of the image sensor is assumed to be known in advance.

The display unit 360 is means for displaying images. The display unit 360 may be a display, such as a liquid crystal display, built into the user terminal 300, or a display separate from the user terminal 300, such as a television. The display unit 360 may also be worn by the user, like an HMD (Head Mounted Display).

The audio input/output unit 370 is means for inputting and outputting audio. The audio input/output unit 370 includes a speaker and a microphone, and may input and output audio in either stereo or monaural.
The operation unit 380 is means for receiving user operations. The operation unit 380 includes a keypad (keyboard) and a touch screen overlaid on the display of the display unit 360.

Although the standard configuration of the user terminal 300 includes all of these components, a terminal that lacks some of them can still take part in the communication of this embodiment, with restrictions, as described later. The individual resources of the user terminals 300, such as display resolution, need not match.

FIG. 3 is a block diagram showing the hardware configuration of the AR video generation device 100 and the connection management device 200. The AR video generation device 100 includes a control unit 110, a storage unit 120, and a communication unit 130. The connection management device 200 includes a control unit 210, a storage unit 220, and a communication unit 230. The control units 110 and 210 are means for controlling the operation of their own device (the AR video generation device 100 or the connection management device 200). The storage units 120 and 220 are means for storing data, and the communication units 130 and 230 are means for transmitting and receiving data. In addition to communicating over the network 400, the communication units 130 and 230 can communicate with each other without going through the network 400.

FIG. 4 is a block diagram showing the functional configuration of the AR video generation device 100 and the connection management device 200. The AR video generation device 100 and the connection management device 200 realize the functions shown in FIG. 4 by executing predetermined programs. The control unit 210 of the connection management device 200 functions as a transmission/reception unit 211 and a data exchange unit 212. The control unit 110 of the AR video generation device 100 functions as a composition control unit 110a and a synchronization control unit 110b.

The transmission/reception unit 211 is means for transmitting and receiving voice messages. The transmission/reception unit 211 forwards voice messages transmitted from the user terminal 300a to the user terminal 300b, and voice messages transmitted from the user terminal 300b to the user terminal 300a. The transmission/reception unit 211 also performs the connection management needed to exchange voice messages and controls volume and sound quality as necessary.

The data exchange unit 212 is means for exchanging data with the AR video generation device 100. The data exchange unit 212 acquires resource information from the AR video generation device 100 and supplies line information to it. The line information concerns the communication line used by each of the user terminals 300a and 300b and indicates the transmission capability of that line (such as its transfer bandwidth). The line information may differ according to the user's contract with the telecommunications carrier, and also according to the actual usage of the line (such as the degree of congestion).

The composition control unit 110a is means for controlling the compositing of video. More specifically, the composition control unit 110a includes first receiving units 111a and 111b, second receiving units 112a and 112b, extraction units 113a and 113b, generation units 114a and 114b, and transmission units 115a and 115b.

The first receiving unit 111a, the second receiving unit 112a, the extraction unit 113a, the generation unit 114a, and the transmission unit 115a are the components for transmitting composite moving image data to the user terminal 300a. The first receiving unit 111b, the second receiving unit 112b, the extraction unit 113b, the generation unit 114b, and the transmission unit 115b are the components for transmitting composite moving image data to the user terminal 300b. For example, the first receiving units 111a and 111b differ only in the terminal from which they receive moving image data; the operation they perform is the same. Likewise, each of the pairs of second receiving units 112a and 112b, extraction units 113a and 113b, generation units 114a and 114b, and transmission units 115a and 115b differs only in the data processed, not in the processing it can execute.

The first receiving units 111a and 111b are means for receiving moving image data captured by the first imaging unit 340. That is, from the data received via the communication unit 130, the first receiving units 111a and 111b selectively acquire the moving image data captured by the first imaging unit 340. The first receiving unit 111a acquires this moving image data from the user terminal 300b, and the first receiving unit 111b acquires it from the user terminal 300a. The moving image data received by the first receiving units 111a and 111b includes distance information.

The second receiving units 112a and 112b are means for receiving moving image data captured by the second imaging unit 350. That is, from the data received via the communication unit 130, the second receiving units 112a and 112b selectively acquire the moving image data captured by the second imaging unit 350. The second receiving unit 112a acquires this moving image data from the user terminal 300a, and the second receiving unit 112b acquires it from the user terminal 300b. The moving image data received by the second receiving units 112a and 112b contains at least a marker in its video and includes information on the three-dimensional coordinate system defined by that marker.

For convenience, the moving image data captured by the first imaging unit 340 is referred to below as "first moving image data", and the moving image data captured by the second imaging unit 350 as "second moving image data". In other words, the first moving image data includes distance information, and the second moving image data includes an image of a marker from which a coordinate system can be identified.

The extraction units 113a and 113b are means for extracting pixels corresponding to a predetermined object from the first moving image data received by the first receiving units 111a and 111b. On the basis of the distance information in the first moving image data, the extraction units 113a and 113b extract the pixels whose distance from the sensor satisfies a predetermined condition, and identify the color information and distance information of those pixels. The predetermined condition is determined by a threshold set for the distance from the sensor. The threshold may set only an upper limit on the distance, or both an upper and a lower limit.
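The distance-threshold extraction described above can be sketched as follows. This is a minimal illustration, assuming a frame is represented as two equally sized nested lists (colors and per-pixel distances in meters); the function name `extract_object` and the parameters `upper_m` and `lower_m` are hypothetical and not taken from the embodiment.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]
Pixel = Optional[Tuple[Color, float]]


def extract_object(
    colors: List[List[Color]],
    depths: List[List[float]],
    upper_m: float,
    lower_m: Optional[float] = None,
) -> List[List[Pixel]]:
    """Keep (color, distance) for pixels inside the distance range, None elsewhere."""
    result: List[List[Pixel]] = []
    for color_row, depth_row in zip(colors, depths):
        out_row: List[Pixel] = []
        for color, depth in zip(color_row, depth_row):
            # The upper limit is always applied; the lower limit is optional.
            keep = depth <= upper_m and (lower_m is None or depth >= lower_m)
            out_row.append((color, depth) if keep else None)
        result.append(out_row)
    return result
```

With only an upper limit, everything nearer than the threshold (typically the user in front of the camera) is kept; adding a lower limit also discards anything closer than the expected subject.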

The generation units 114a and 114b are means for generating composite moving image data on the basis of the first moving image data received by the first receiving units 111a and 111b and the second moving image data received by the second receiving units 112a and 112b. The generation units 114a and 114b detect the marker in the background of the video represented by the second moving image data, identify the three-dimensional coordinate system of the background on the basis of the detected marker, and composite an image corresponding to the pixels extracted by the extraction units 113a and 113b onto the background so that it corresponds to that coordinate system. In this embodiment, the generation units 114a and 114b generate the composite moving image data using the coordinate system calculated by the calculation unit 311 of the user terminal 300.

The generation units 114a and 114b generate the composite moving image data so that the coordinate system of the video represented by the first moving image data corresponds to that of the video represented by the second moving image data. Specifically, for the first moving image data, the generation units 114a and 114b can define a three-dimensional orthogonal coordinate system by defining an x-axis and a y-axis on the basis of the pixel array and a z-axis in the direction of the distance information. The generation units 114a and 114b then composite the two videos so that the coordinate system defined by the marker for the second moving image data is associated with the coordinate system defined for the first moving image data. Once the two coordinate systems have been associated, the generation units 114a and 114b continue compositing so that the association is maintained. In doing so, they align the two videos, for example by compositing so that a given coordinate in one coordinate system coincides with a given coordinate in the other.
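One simple way to picture this association is a fixed translation between the two coordinate systems. The sketch below assumes that both systems share the same scale and axis orientation, which the embodiment does not state, so it only illustrates making one chosen coordinate in each system coincide and then reusing that mapping for later frames. The names `first_video_point` and `make_alignment` are hypothetical.

```python
from typing import Callable, Tuple

Point3 = Tuple[float, float, float]


def first_video_point(col: int, row: int, distance: float) -> Point3:
    """x and y come from the pixel array, z from the per-pixel distance."""
    return (float(col), float(row), float(distance))


def make_alignment(anchor_first: Point3, anchor_marker: Point3) -> Callable[[Point3], Point3]:
    """Build a mapping that sends anchor_first onto anchor_marker.

    Assumption: a pure translation is enough (same scale and orientation).
    """
    offset = tuple(m - f for f, m in zip(anchor_first, anchor_marker))

    def to_marker_system(point: Point3) -> Point3:
        return tuple(c + o for c, o in zip(point, offset))

    return to_marker_system
```

Creating the mapping once and applying it to every subsequent frame mirrors the idea of keeping the association fixed after it has been established.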

When compositing these videos, the generation units 114a and 114b may also adjust timing and apply image processing such as effects. For example, when the first moving image data and the second moving image data are received at different times, the generation units 114a and 114b adjust the compositing timing so as to reduce that time difference. The image processing referred to here may superimpose some image on the video, or it may match the brightness and color tone of the two videos being composited.
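One possible form of the timing adjustment is to pair each background frame with the user frame whose timestamp is closest. The sketch below assumes each stream carries capture timestamps in seconds, which the embodiment does not specify; the function name `pair_frames` is hypothetical.

```python
from typing import List, Tuple


def pair_frames(first_ts: List[float], second_ts: List[float]) -> List[Tuple[int, int]]:
    """Pair each second-video frame with the nearest first-video frame.

    Returns (first_index, second_index) pairs; timestamps are in seconds.
    """
    if not first_ts:
        return []
    pairs = []
    for j, t2 in enumerate(second_ts):
        # Choose the first-video frame that minimizes the time difference.
        i = min(range(len(first_ts)), key=lambda k: abs(first_ts[k] - t2))
        pairs.append((i, j))
    return pairs
```

A streaming implementation would keep a short buffer per stream instead of full lists, but the nearest-timestamp rule would be the same.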

The transmission units 115a and 115b are means for transmitting the composite moving image data generated by the generation units 114a and 114b to the user terminals 300a and 300b. The transmission unit 115a transmits composite moving image data to the user terminal 300a, and the transmission unit 115b transmits composite moving image data to the user terminal 300b. As a result, the user of each of the user terminals 300a and 300b can view a moving image in which the other user's object (face, upper body, and so on) is composited onto the second moving image data (the moving image data of the marker) that the user's own terminal transmitted.

The synchronization control unit 110b is means for executing processing that synchronizes playback of the composite moving image data with playback of the voice messages. For example, the synchronization control unit 110b can change the compression method of the composite moving image data to follow changes in the line information. The synchronization control unit 110b can also supply the connection management device 200 with the data that the transmission/reception unit 211 needs in order to play voice messages in step with the composite moving image data. More specifically, the synchronization control unit 110b includes a data exchange unit 116, a determination unit 117, and an analysis unit 118.
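Following changes in the line information could look like choosing an encoding profile from the reported bandwidth. The sketch below is illustrative only: the profile table, its thresholds, and the function name `select_profile` are all hypothetical values chosen for the example, not parameters from the embodiment.

```python
from typing import Dict, List, Tuple

# (minimum bandwidth in kbps, profile) in descending order; values are hypothetical.
PROFILES: List[Tuple[float, Dict[str, object]]] = [
    (2000.0, {"name": "high", "resolution": (1280, 720), "fps": 30}),
    (800.0, {"name": "medium", "resolution": (640, 360), "fps": 30}),
    (0.0, {"name": "low", "resolution": (320, 180), "fps": 15}),
]


def select_profile(bandwidth_kbps: float) -> Dict[str, object]:
    """Return the richest profile whose minimum bandwidth is satisfied."""
    if bandwidth_kbps < 0:
        raise ValueError("bandwidth must be non-negative")
    for min_kbps, profile in PROFILES:
        if bandwidth_kbps >= min_kbps:
            return profile
    return PROFILES[-1][1]  # unreachable while the last threshold is 0
```

Calling `select_profile` each time new line information arrives would let the encoder step down when the line becomes congested and step back up when it recovers.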

The data exchange unit 116 is means for exchanging data with the connection management device 200. The data exchange unit 116 acquires line information from the connection management device 200 and supplies resource information to it.

The determination unit 117 is means for determining, for each terminal and on the basis of the line information or the resource information, whether the user terminal 300a and the user terminal 300b can communicate using composite moving image data. For example, when the communication line of one of the user terminals 300a and 300b does not meet the quality required for real-time playback of composite moving image data, the composition control unit 110a can, on the basis of the determination by the determination unit 117, skip generating composite moving image data for that terminal and generate and transmit only the composite moving image data for the other terminal. Similarly, on the basis of a determination made from the resource information, the composition control unit 110a can skip generating composite moving image data when a user terminal 300 does not have the first imaging unit 340.
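A per-terminal determination of this kind might be sketched as below. The rule used here is an interpretation rather than something the embodiment spells out: a composite for a receiving terminal is generated only if that terminal's line meets a minimum bandwidth and the other terminal has a first imaging unit to supply the user video. The class `TerminalStatus`, the functions, and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

MIN_BANDWIDTH_KBPS = 500.0  # hypothetical quality threshold


@dataclass
class TerminalStatus:
    terminal_id: str
    bandwidth_kbps: float          # taken from the line information
    has_first_imaging_unit: bool   # taken from the resource information


def can_generate_for(receiver: TerminalStatus, sender: TerminalStatus) -> bool:
    """True if a composite should be generated for the receiver."""
    line_ok = receiver.bandwidth_kbps >= MIN_BANDWIDTH_KBPS
    # The sender must be able to provide first moving image data (user video).
    return line_ok and sender.has_first_imaging_unit


def decide(t1: TerminalStatus, t2: TerminalStatus) -> Dict[str, bool]:
    """Decide independently for each terminal of a two-party call."""
    return {
        t1.terminal_id: can_generate_for(t1, t2),
        t2.terminal_id: can_generate_for(t2, t1),
    }
```

Because the decision is made per terminal, one party can still receive AR video while the other falls back to a restricted mode.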

The analysis unit 118 is means for analyzing changes in the subject in the first moving image data received by the first receiving units 111a and 111b and generating information representing the result of the analysis. The synchronization control unit 110b supplies the information generated by the analysis unit 118 to the connection management device 200. The analysis unit 118 analyzes, for example, the movement of the user who is the subject. Alternatively, the analysis unit 118 may analyze changes in the user's facial expression using well-known face recognition technology (for example, smile detection). The analysis unit 118 is not limited to changes in the user and may also determine changes in the subject as a whole (for example, a change in brightness).
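Of the analyses mentioned, a change in overall brightness is the easiest to sketch. The example below assumes a frame is a nested list of (r, g, b) tuples and uses the plain average of the three channels as brightness, which is a simplification rather than a perceptual luminance formula. The function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[Tuple[int, int, int]]]


def mean_brightness(frame: Frame) -> float:
    """Average of (r + g + b) / 3 over all pixels; 0.0 for an empty frame."""
    total = 0.0
    count = 0
    for row in frame:
        for r, g, b in row:
            total += (r + g + b) / 3.0
            count += 1
    return total / count if count else 0.0


def brightness_change(previous: Frame, current: Frame) -> float:
    """Positive when the scene became brighter, negative when darker."""
    return mean_brightness(current) - mean_brightness(previous)
```

The resulting value is the kind of "information representing a change in the subject" that could be passed on to the connection management device.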

The configuration of the communication system 10 is as described above. With this configuration, a user communicates with another user in real time using the user terminal 300. This communication uses audio and video and resembles, for example, a videophone call. It differs from a conventional videophone, however, in that video composited by AR technology can be used.

To realize this communication, each user terminal 300 captures and transmits two kinds of moving image data with different subjects. One is the first moving image data described above, obtained when the user images himself or herself. The other is the second moving image data described above, obtained when the user images the marker.

The AR video generation device 100 receives these moving image data and generates and transmits composite moving image data. The AR video generation device 100 generates composite moving image data from the second moving image data, which is transmitted from one user terminal 300 and includes the marker as a subject, and the first moving image data, which is transmitted from the other user terminal 300 and includes the user as a subject, and transmits the result to the former user terminal 300. That is, the AR video generation device 100 receives a video showing the marker from one user, uses it as the background, and composites onto it a video showing the other user. In doing so, the AR video generation device 100 extracts the predetermined object corresponding to the user from the video of the first moving image data and superimposes it on the background.

FIG. 5 is a schematic diagram illustrating the video represented by the moving image data transmitted and received in the present embodiment. Here, videos V1a, V2a, and V3a are the videos represented by the moving image data transmitted or received by the user terminal 300a, and videos V1b, V2b, and V3b are the videos represented by the moving image data transmitted or received by the user terminal 300b. Videos V1a and V1b correspond to the first moving image data, videos V2a and V2b to the second moving image data, and videos V3a and V3b to the composite moving image data.

In FIG. 5, Ma and Mb denote markers. The markers Ma and Mb are small pieces on which an image of a predetermined shape is formed by printing or the like, and the user can place them wherever desired. The image formed on the markers Ma and Mb may be of any pattern as long as it is easy to distinguish from the rest of the background and allows the coordinate system and inclination to be identified easily.

As shown in FIG. 5, the video V3a displayed on the user terminal 300a is obtained by combining the first moving image data (V1b) from the user terminal 300b with the second moving image data (V2a) from the user terminal 300a. Likewise, the video V3b displayed on the user terminal 300b is obtained by combining the first moving image data (V1a) from the user terminal 300a with the second moving image data (V2b) from the user terminal 300b.
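
The pairing shown in FIG. 5 can be summarized in a short Python sketch. The function name `composite_sources` and the use of plain strings as terminal labels are illustrative assumptions and do not appear in the embodiment.

```python
from typing import Dict, Tuple


def composite_sources(terminals: Tuple[str, str]) -> Dict[str, Tuple[str, str]]:
    """For each viewing terminal, return a pair:
    (sender of the first moving image data, sender of the second moving image data).

    The viewer always supplies its own background (second data), and the
    other party supplies the video of the user (first data).
    """
    a, b = terminals
    return {a: (b, a), b: (a, b)}


# V3a = V1b + V2a, V3b = V1a + V2b
routing = composite_sources(("300a", "300b"))
```

Each viewer sees the other party placed in the viewer's own surroundings, which is why the background always comes from the terminal that displays the result.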

As also shown in FIG. 5, the object combined into the videos V3a and V3b (here, the user's upper body) need not have the same magnification as the object in the videos V1a and V1b, and may be enlarged or reduced as appropriate. The object combined into the videos V3a and V3b is assumed to be a three-dimensional image (polygons) whose surface relief is expressed based on the distance information and which is colored based on the color information, but it may instead be a flat image.

FIG. 6 illustrates the principle of object extraction. The distance information obtained by the distance image sensor is obtained for each pixel, as indicated by the dashed arrows in the figure. If a threshold, shown as Th in the figure, is set and the pixels whose distance indicated by the distance information is smaller than this threshold are extracted, the pixels of the portion of the subject corresponding to the user are selected, yielding an object from which everything other than the user (such as the wall behind the user) is excluded.

The threshold is set in advance. For example, it may be fixed on the assumption that the user converses within 1 m of the distance image sensor, or the user may be able to select it from among several options. Alternatively, the threshold may be set dynamically by the user terminal 300. For example, the user terminal 300 can estimate the position of the user (such as the face) based on the color information and set the threshold based on the estimation result.
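
As an illustration of the extraction principle in FIG. 6, the following minimal Python sketch keeps only the pixels whose distance is below the threshold Th. The function name, the data layout (nested lists of RGB tuples and of distances in metres), and the use of `None` for excluded pixels are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]


def extract_object(colors: List[List[Color]],
                   distances_m: List[List[float]],
                   threshold_m: float = 1.0) -> List[List[Optional[Color]]]:
    """Keep pixels closer than threshold_m; mark all others as None (excluded)."""
    return [
        [c if d < threshold_m else None for c, d in zip(color_row, dist_row)]
        for color_row, dist_row in zip(colors, distances_m)
    ]


# A 2x3 frame: the user is about 0.6 to 0.8 m away, the wall about 2.5 m away.
colors = [[(200, 150, 120), (10, 10, 10), (10, 10, 10)],
          [(190, 140, 110), (180, 130, 100), (10, 10, 10)]]
distances = [[0.6, 2.5, 2.5],
             [0.7, 0.8, 2.5]]
mask = extract_object(colors, distances, threshold_m=1.0)
```

A dynamically set threshold, as described above, would simply pass a different `threshold_m` for each frame.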

By conversing with the other party while watching the video combined in this way, the user can communicate as if the other party were right beside him or her (at the position being photographed as the background). This allows the user to carry out realistic communication in real time.

In the communication system 10, the AR video generation device 100 generates and transmits the composite moving image data so that such video can be displayed on the user terminal 300. The AR video generation device 100 refers to the resource information and the line information and generates the composite moving image data that is needed. For example, it determines the compression method and transfer rate of the composite moving image data according to the resources of the user terminal 300 and the state of the communication line. The AR video generation device 100 can also decide, based on the resource information and the line information, whether to generate the composite moving image data at all. For example, when sufficient transfer bandwidth for transmitting the composite moving image data is not available, or when the user terminal 300 has no function for playing back the composite moving image data in the first place, the AR video generation device 100 may refrain from transmitting the composite moving image data or may transmit alternative data with a smaller data volume than the composite moving image data.

FIG. 7 is a flowchart showing the processing that the AR video generation device 100 executes when generating composite moving image data. The control unit 110 of the AR video generation device 100 first determines the generation mode of the composite moving image data using at least one of the resource information and the line information (step Sa1). At this time, the control unit 110 determines, for each user terminal 300 to which composite moving image data is to be transmitted, whether composite moving image data can be generated and which compression method, transfer rate, and so on to use when generating it.

The control unit 110 then executes processing according to the determination result of step Sa1. For the user terminal 300a, which is one of the user terminals 300, the control unit 110 judges whether to generate composite moving image data (step Sa2). If the determination was to generate it, the control unit 110 generates the composite moving image data (step Sa3); otherwise it does not. When generating composite moving image data, the control unit 110 varies the compression method, transfer rate, and so on according to the determination result of step Sa1 (that is, according to the line information and the resource information).

Next, the control unit 110 likewise judges whether to generate composite moving image data for the user terminal 300b, which is the other user terminal 300 (step Sa4), and generates it (step Sa5). The control unit 110 then supplies the generated composite moving image data to the communication unit 130, thereby transmitting it to each of the user terminals 300a and 300b (step Sa6).

By executing this processing, the AR video generation device 100 can generate composite moving image data of a quality suited to each of the user terminals 300a and 300b. The AR video generation device 100 can also skip generating composite moving image data that need not be transmitted, and can therefore use its own resources efficiently.
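
The flow of FIG. 7 might be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (`can_play_composite`, `bandwidth_kbps`), the codec labels, and the numeric cut-offs are all hypothetical, since the embodiment does not specify how resource information and line information are encoded or which values trigger which mode.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TerminalInfo:
    can_play_composite: bool  # resource information (hypothetical field)
    bandwidth_kbps: int       # line information (hypothetical field)


@dataclass
class GenerationMode:
    codec: str
    bitrate_kbps: int


MIN_BANDWIDTH_KBPS = 256  # assumed cut-off, not taken from the source


def decide_mode(info: TerminalInfo) -> Optional[GenerationMode]:
    """Step Sa1: return a generation mode, or None if nothing should be generated."""
    if not info.can_play_composite or info.bandwidth_kbps < MIN_BANDWIDTH_KBPS:
        return None
    if info.bandwidth_kbps >= 2000:
        return GenerationMode(codec="high-quality", bitrate_kbps=1500)
    return GenerationMode(codec="low-bitrate", bitrate_kbps=info.bandwidth_kbps // 2)


def plan_session(terminals: Dict[str, TerminalInfo]) -> Dict[str, Optional[GenerationMode]]:
    """Steps Sa2 to Sa5: decide per terminal; None means generation is skipped."""
    return {name: decide_mode(info) for name, info in terminals.items()}


plan = plan_session({
    "300a": TerminalInfo(can_play_composite=True, bandwidth_kbps=3000),
    "300b": TerminalInfo(can_play_composite=True, bandwidth_kbps=100),
})
```

In this sketch the terminal on the narrow line receives no composite data, so the device would spend no effort rendering it, which corresponds to the resource saving described above.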

FIG. 8 is a sequence chart showing the processing in each device of the communication system 10. The example in FIG. 8 is for the case where composite moving image data is transmitted to both user terminals 300a and 300b, that is, where the judgments in steps Sa2 and Sa4 of FIG. 7 are both "YES". For convenience of illustration, FIG. 8 shows the AR video generation device 100 twice, separating the configuration for transmitting composite moving image data to the user terminal 300a from the configuration for transmitting it to the user terminal 300b, but in practice the processing may be performed concurrently in a single device.

First, the user terminal 300a and the user terminal 300b establish a call connection via the connection management device 200 (step Sb1). For example, a user can specify the communication partner using a telephone number or a substitute for one (such as a user account) and start talking.

When the call connection is established, the connection management device 200 transmits connection information to the user terminals 300a and 300b (step Sb2). The connection information is transmitted when the user terminal 300a and the user terminal 300b are connected, and includes, among other things, information about the communication partner. For example, it includes call identification information used to identify the call within the system after the call is connected. The connection information may also include the line information and resource information described above. In that case, the user terminal 300 may judge, based on the connection information, whether to transmit the first moving image data and the second moving image data. This makes it possible for the user terminal 300 to perform part of the determination shown in FIG. 7.

When the call connection is established, the user terminals 300a and 300b also start capturing video (step Sb3). That is, they activate the first imaging unit 340 and the second imaging unit 350 so that moving image data can be transmitted. The user terminals 300a and 300b then each transmit the first moving image data and the second moving image data to the AR video generation device 100 (steps Sb4 and Sb5). On receiving the first moving image data and the second moving image data, the AR video generation device 100 generates composite moving image data (step Sb6) and transmits it to the user terminals 300a and 300b (step Sb7).

As described above, the present embodiment makes it possible to combine the video of one user with the background photographed by the other user, and to exchange voice messages while using the combined video. In addition, because the AR video generation device 100 performs the combining of the moving image data, the user terminal 300 does not need to execute relatively heavy processing such as rendering polygon data.

Furthermore, in the present embodiment, the AR video generation device 100 can vary the generation mode of the composite moving image data according to the resource information and the line information. This allows the AR video generation device 100 to transfer data in a manner less likely to cause delay and to skip unnecessary processing. In addition, when only one user terminal 300 has a distance image sensor and the other user terminal 300 does not, the AR video generation device 100 can generate only the composite moving image data to be transmitted to that other user terminal 300, so communication in this form is also possible.

[Modifications]
The present invention is not limited to the embodiment described above and can be implemented in other forms. The following are examples of such other forms. These modifications may be combined with one another as needed.

(1) The present invention is applicable not only to communication between two parties but also to communication among three or more. In this case, every user's user terminal 300 may have a distance image sensor, but a form in which only one particular user photographs himself or herself with a distance image sensor is also possible. For example, when three users (a first user, a second user, and a third user) communicate, only the first user may photograph himself or herself with the distance image sensor while the second and third users photograph only their backgrounds. In this case, the AR video generation device 100 extracts the image corresponding to the photographed first user and executes both a process of combining it with the second moving image data captured on the second user's side and a process of combining it with the second moving image data captured on the third user's side, thereby generating composite moving image data for each of the second and third users.

In this way, the second user can view video in which the first user's video is superimposed on the background that the second user is photographing, while the third user can view video in which the first user's video is superimposed on the background that the third user is photographing (that is, video that also has the first user superimposed but differs from what the second user is viewing).
In this form, the first user's user terminal 300 need not include the second imaging unit 350, and the user terminals 300 of the second and third users need not include the first imaging unit 340.

(2) As described above, in the communication system 10, the video of the composite moving image data can be changed in response to a change in the volume or sound quality of the voice message, or the volume or sound quality of the voice message can be changed in response to a change in the video of the first moving image data. Specific examples are as follows.

For example, when combining the extracted object into the composite moving image data to be transmitted to the user terminal 300a, the AR video generation device 100 may enlarge the object when the volume of the voice message transmitted from the user terminal 300b increases, and shrink it when the volume decreases. Also, when the sound quality deteriorates, for example because noise occurs in the voice message, the AR video generation device 100 may add (intentional) noise to the video represented by the composite moving image data or degrade its image quality. This gives an intuitive link between the combined video and the audio.
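
One possible way to link object size to volume is a clamped linear mapping, sketched below in Python. The function name, the normalized volume range of 0.0 to 1.0, the reference volume, and the scale limits are all assumptions chosen for illustration; the modification itself does not prescribe a particular mapping.

```python
def scale_for_volume(volume: float,
                     reference_volume: float = 0.5,
                     min_scale: float = 0.5,
                     max_scale: float = 2.0) -> float:
    """Map a normalized volume (0.0 to 1.0) to an object scale factor.

    The scale is 1.0 at reference_volume and grows or shrinks linearly with
    the volume, clamped to the range [min_scale, max_scale].
    """
    if reference_volume <= 0:
        raise ValueError("reference_volume must be positive")
    scale = volume / reference_volume
    return max(min_scale, min(max_scale, scale))
```

Clamping keeps a very quiet or very loud speaker from shrinking the object to nothing or letting it fill the whole frame.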

The connection management device 200 can also recognize, based on information representing changes in the subject supplied from the AR video generation device 100, changes in the size of the extracted object, changes in the distance information, changes in the overall brightness of the subject, and so on. According to the changes in the video recognized in this way, the connection management device 200 may, for example, change the volume to match changes in the size or distance of the object, or vary the sound quality depending on whether the subject is bright or dark. It may also vary the sound quality or volume according to changes in the facial expression of the user who is the subject. Furthermore, if the voice message is multi-channel (for example, stereo) data, the connection management device 200 may control sound image localization according to changes in the position from which the predetermined object was extracted, for example by adjusting volume and delay so as to follow the movement of the user who is the subject.
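
For the stereo case, the volume part of such localization could be approximated with a constant-power pan law, a common audio technique that is used here as a stand-in and is not named in the source. In the sketch below, `x_norm` is an assumed normalized horizontal position of the extracted object (0.0 at the left edge of the frame, 1.0 at the right edge); delay adjustment is not covered.

```python
import math
from typing import Tuple


def pan_gains(x_norm: float) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a constant-power pan.

    x_norm = 0.0 places the sound fully left, 1.0 fully right.
    Values outside the range 0.0 to 1.0 are clamped.
    """
    x = min(1.0, max(0.0, x_norm))
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)
```

Because the squares of the two gains always sum to 1, the perceived loudness stays roughly constant as the user moves across the frame.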

(3) The function corresponding to the calculation unit 311 may be provided in the AR video generation device 100 rather than in the user terminal 300. That is, the AR video generation device 100 may be configured to calculate the coordinate system based on the moving image data transmitted from the user terminal 300. The function corresponding to the calculation unit 311 may be provided, for example, in the second receiving units 112a and 112b.

(4) The AR video generation device 100 can rotate the extracted object before combining it, according to the movement of the user terminal 300 (more precisely, the movement of the second imaging unit 350). However, the distance image sensor can capture only the side that the user turns toward the sensor (the front), not the opposite side (the back). Therefore, for the portions of the extracted object that the distance image sensor cannot capture, either nothing should be displayed or a suitable predetermined substitute image should be displayed.

The object may also be rotated by a user operation. That is, the AR video generation device 100 can accept a user operation, rotate the object according to that operation, and then combine it with the background video to generate the composite moving image data.

(5) The AR video generation device 100 can also add a predetermined image to the extracted object and combine the result with the background. The predetermined image here is, for example, an image imitating an accessory worn on the user's face or body (sunglasses, clothes, and so on). This makes it possible to display video in which part of the user's face or body is hidden. Alternatively, the AR video generation device 100 may replace the extracted object with a predetermined image (such as an avatar) when combining. The AR video generation device 100 may detect that the first moving image data has not been transmitted, or that its transfer is delayed, and perform the replacement in such cases. The image data for displaying these images may be stored in advance in the AR video generation device 100 or the connection management device 200, or the user terminal 300 may transmit it to the connection management device 200 when starting communication.

(6) The connection information may include the image data described in modification (5) above. The connection information may additionally include attribute information set in advance by the user (such as gender and hobbies). In modification (5), when replacing the extracted object with a predetermined image or adding a predetermined image to the object, the AR video generation device 100 may decide which image to display based on the attribute information.

(7) The pixel size of the color information and the pixel size of the distance information, that is, their resolutions, need not match. For example, one pixel of distance information may correspond to four pixels of color information (2 pixels high by 2 pixels wide). In this case, the distance information may take the same value for all four color pixels, or it may be calculated by performing suitable interpolation with reference to the distance information of neighboring pixels.
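
The simpler of the two options, giving all four color pixels the same distance value, amounts to nearest-neighbor upsampling. The Python sketch below assumes the distance map is a nested list of distances in metres; the function name and the `factor` parameter are hypothetical.

```python
from typing import List


def upsample_distance(distance: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Repeat each distance pixel factor x factor times (nearest neighbor)."""
    out: List[List[float]] = []
    for row in distance:
        # Widen the row, then emit it `factor` times as independent copies.
        wide = [d for d in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


coarse = [[0.6, 2.5],
          [0.7, 0.8]]
fine = upsample_distance(coarse)
```

Interpolating between neighboring distance pixels instead would give smoother object edges at the cost of extra computation.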

(8) The object extracted in the present invention may be anything whose distance information satisfies a predetermined condition, and need not be the user. For example, instead of himself or herself, the user can photograph a doll or a pet and use it as the video to be combined.

(9) The AR video generation device 100 and the connection management device 200 may be configured as a single unit rather than as separate devices. The present invention may also exchange text messages instead of voice messages, which allows it to be applied to chat-like communication. In this case, instead of volume or sound quality, the AR video generation device 100 may vary the video according to the character size or text decoration (font, underline, color, and so on), or according to whether a particular character (such as an exclamation mark) is present.

(10) The present invention can be understood not only as an AR video generation device or an image communication system including one, but also as a program executed by the AR video generation device or as a communication method using combined moving images. The program can be provided recorded on a recording medium such as an optical disc, or in a form in which it is downloaded to a computer via a network such as the Internet and installed for use.

Description of symbols: 10 ... communication system, 100 ... AR video generation device, 110 ... control unit, 111a, 111b ... first receiving unit, 112a, 112b ... second receiving unit, 113a, 113b ... extraction unit, 114a, 114b ... generation unit, 115a, 115b ... transmission unit, 116 ... data exchange unit, 117 ... determination unit, 118 ... analysis unit, 120 ... storage unit, 130 ... communication unit, 200 ... connection management device, 210 ... control unit, 211 ... transmission/reception unit, 212 ... data exchange unit, 220 ... storage unit, 130 ... communication unit, 300, 300a, 300b ... user terminal, 310 ... control unit, 320 ... storage unit, 330 ... communication unit, 340 ... first imaging unit, 350 ... second imaging unit, 360 ... display unit, 370 ... audio input/output unit, 380 ... operation unit, 400 ... network

Claims

An image communication system comprising:
a connection management device that manages a connection between a first terminal and a second terminal; and
an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite moving image data obtained by combining videos captured by both terminals,
wherein the connection management device
supplies, to the AR video generation device, line information regarding a communication line connecting the first terminal and the second terminal, and
the AR video generation device comprises:
a first receiving unit that receives, from one of the first terminal and the second terminal, first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
a second receiving unit that receives, from the other of the first terminal and the second terminal, second moving image data obtained by photographing a background including a marker of a predetermined shape;
an extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first receiving unit;
a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second moving image data received by the second receiving unit, and generates composite moving image data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system; and
a transmission unit that transmits the composite moving image data generated by the generation unit to the other terminal,
and varies the video represented by the composite moving image data according to the line information supplied by the connection management device.

An image communication system comprising:
a connection management device that manages a connection between a first terminal and a second terminal; and
an AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite moving image data obtained by combining videos captured by both terminals,
wherein the connection management device
supplies, to the AR video generation device, resource information regarding hardware or software resources of the first terminal and the second terminal, and
the AR video generation device comprises:
a first receiving unit that receives, from one of the first terminal and the second terminal, first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
a second receiving unit that receives, from the other of the first terminal and the second terminal, second moving image data obtained by photographing a background including a marker of a predetermined shape;
an extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first receiving unit;
a generation unit that identifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second moving image data received by the second receiving unit, and generates composite moving image data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system; and
a transmission unit that transmits the composite moving image data generated by the generation unit to the other terminal,
and varies the video represented by the composite moving image data according to the resource information supplied by the connection management device.

The AR video generation device includes a determination unit that determines, based on the line information, whether communication using the composite video data is possible for each of the first terminal and the second terminal,
The image communication system according to claim 1, wherein the composite video data is generated and transmitted to a terminal determined by the determination unit to be capable of communication using the composite video data.
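
The per-terminal determination described in this claim might be sketched as follows. Representing the line information as a bandwidth figure per terminal, and the threshold `MIN_KBPS_FOR_COMPOSITE`, are assumptions made for illustration; the claim only states that the determination is based on the line information.

```python
from typing import Dict, List

# Assumed threshold; the claim does not give a concrete criterion.
MIN_KBPS_FOR_COMPOSITE = 384


def determine_capable(line_info: Dict[str, int]) -> Dict[str, bool]:
    """Decide, per terminal, whether composite-video communication is possible.

    line_info maps a terminal id to its available bandwidth in kbps
    (a hypothetical representation of the line information).
    """
    return {tid: kbps >= MIN_KBPS_FOR_COMPOSITE for tid, kbps in line_info.items()}


def recipients(line_info: Dict[str, int]) -> List[str]:
    """Terminals for which composite video would be generated and transmitted."""
    return [tid for tid, ok in determine_capable(line_info).items() if ok]
```

A terminal that fails the check is simply left out of the recipient list; what it receives instead is outside the scope of this sketch.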

The AR video generation device includes a determination unit that determines, based on the resource information, whether communication using the composite video data is possible for each of the first terminal and the second terminal,
The image communication system according to claim 2, wherein the composite video data is generated and transmitted to a terminal determined by the determination unit to be capable of communication using the composite video data.

A connection management device for managing the connection between the first terminal and the second terminal;
An AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by synthesizing the videos captured by both terminals,
The connection management device includes a transmission/reception unit that transmits and receives voice messages between the first terminal and the second terminal, and supplies information indicating a change in volume or sound quality of the voice messages transmitted and received by the transmission/reception unit to the AR video generation device,
The AR video generation device includes:
A first receiving unit that receives, from one of the first terminal and the second terminal, first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit for receiving second moving image data obtained by capturing a background including a marker having a predetermined shape from the other of the first terminal and the second terminal;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that specifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second moving image data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system; and
A transmission unit that transmits the composite video data generated by the generation unit to the other terminal,
An image communication system in which the video represented by the composite video data is varied according to the information, supplied by the connection management device, indicating the change in volume or sound quality.
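
One conceivable way to vary the video with a change in volume is to scale the overlaid object image. The sketch below assumes this; the function name, the decibel representation, and every constant are hypothetical, since the claim does not say which aspect of the video changes.

```python
def overlay_scale(prev_volume_db: float, curr_volume_db: float,
                  base_scale: float = 1.0, gain_per_db: float = 0.02,
                  min_scale: float = 0.5, max_scale: float = 2.0) -> float:
    """Return a display scale for the extracted object image.

    Assumed behaviour: the object is drawn larger when the voice message
    gets louder and smaller when it gets quieter, clamped to a safe range.
    """
    delta_db = curr_volume_db - prev_volume_db
    scale = base_scale * (1.0 + gain_per_db * delta_db)
    return max(min_scale, min(max_scale, scale))
```

Clamping keeps a sudden spike or drop in volume from making the overlay unusably large or small.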

A connection management device for managing the connection between the first terminal and the second terminal;
An AR (Augmented Reality) video generation device that transmits, to at least one of the first terminal and the second terminal whose connection is managed by the connection management device, composite video data obtained by synthesizing the videos captured by both terminals,
The AR video generation device includes:
A first receiving unit that receives, from one of the first terminal and the second terminal, first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit for receiving second moving image data obtained by capturing a background including a marker having a predetermined shape from the other of the first terminal and the second terminal;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that specifies a three-dimensional coordinate system of the background, defined based on the marker in the background represented by the second moving image data received by the second receiving unit, and generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system;
A transmission unit that transmits the composite video data generated by the generation unit to the other terminal; and
An analysis unit that analyzes changes in the subject in the first moving image data received by the first receiving unit,
The AR video generation device supplies information representing the change in the subject analyzed by the analysis unit to the connection management device,
An image communication system in which the connection management device includes a transmission/reception unit that transmits and receives voice messages between the first terminal and the second terminal and varies the volume or sound quality of the voice messages according to the information representing the change in the subject supplied by the AR video generation device.
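
A minimal sketch of how a change in the subject could drive the voice volume, assuming the "change" is the subject's mean distance from the camera (taken from the per-pixel distance information) and that a closer subject should sound louder. Both assumptions, and all names and limits below, are illustrative and not stated in the claim.

```python
from typing import Sequence


def mean_distance(depths_mm: Sequence[float]) -> float:
    """Average distance of the pixels that belong to the extracted object."""
    if not depths_mm:
        raise ValueError("no object pixels")
    return sum(depths_mm) / len(depths_mm)


def volume_gain(prev_depths_mm: Sequence[float], curr_depths_mm: Sequence[float],
                min_gain: float = 0.25, max_gain: float = 4.0) -> float:
    """Gain factor for the voice message between two frames.

    Assumed rule: gain is the ratio of previous to current mean distance,
    so a subject that moves closer yields a gain above 1.
    """
    curr = mean_distance(curr_depths_mm)
    if curr <= 0:
        raise ValueError("distance must be positive")
    ratio = mean_distance(prev_depths_mm) / curr
    return max(min_gain, min(max_gain, ratio))
```

In this arrangement the AR video generation device would compute the change and the connection management device would apply the resulting gain to the audio it relays.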

The AR video generation device includes a calculation unit that analyzes the second moving image data received by the second receiving unit and calculates the coordinate system,
The image communication system according to any one of claims 1 to 6, wherein the generation unit generates the composite video data based on the coordinate system calculated by the calculation unit.
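
The coordinate-system calculation can be pictured with the sketch below, which builds an orthonormal frame from three marker corners. It assumes the 3D camera-space positions of those corners have already been estimated from the second moving image data; the claim does not say how that estimation is done, and the function names here are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Frame = Tuple[Vec3, Vec3, Vec3, Vec3]  # origin, x axis, y axis, z axis


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0:
        raise ValueError("degenerate marker corners")
    return (v[0] / n, v[1] / n, v[2] / n)


def marker_frame(origin: Vec3, corner_x: Vec3, corner_y: Vec3) -> Frame:
    """Build a right-handed orthonormal frame anchored at one marker corner."""
    x_axis = _normalize(_sub(corner_x, origin))
    z_axis = _normalize(_cross(x_axis, _sub(corner_y, origin)))  # marker normal
    y_axis = _cross(z_axis, x_axis)  # re-orthogonalised in-plane axis
    return origin, x_axis, y_axis, z_axis


def to_camera(frame: Frame, p: Vec3) -> Vec3:
    """Map a point given in marker coordinates into camera coordinates."""
    o, x, y, z = frame
    return tuple(o[i] + p[0] * x[i] + p[1] * y[i] + p[2] * z[i] for i in range(3))
```

With such a frame, the generation unit could place the extracted object at a fixed position in marker coordinates and convert it to camera space before drawing.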

A first receiver that receives first moving image data of a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit that receives second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second moving image data received by the second receiving unit; and
A transmission unit that transmits the composite video data generated by the generation unit,
An AR video generation device in which line information about the communication line connecting the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data is supplied from a connection management device that manages the connection between those terminals, and the video represented by the composite video data is varied according to the supplied line information.
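
The extraction and generation units can be sketched in a few lines, under simplifying assumptions: frames are nested lists of RGB tuples, the object is taken to be every pixel whose distance lies in a fixed range, and the marker's coordinate system is reduced to a 2D paste position `(top, left)`. None of these details come from the claim; they are placeholders for illustration.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def extract_object(color: List[List[Pixel]], depth_mm: List[List[int]],
                   near_mm: int, far_mm: int) -> List[List[Optional[Pixel]]]:
    """Keep pixels whose distance lies in [near_mm, far_mm]; others become None."""
    return [[px if near_mm <= d <= far_mm else None
             for px, d in zip(color_row, depth_row)]
            for color_row, depth_row in zip(color, depth_mm)]


def composite(background: List[List[Pixel]], obj: List[List[Optional[Pixel]]],
              top: int, left: int) -> List[List[Pixel]]:
    """Paste the extracted pixels onto a copy of the background.

    (top, left) stands in for the position derived from the marker;
    None entries are transparent and out-of-frame pixels are clipped.
    """
    out = [row[:] for row in background]
    for r, row in enumerate(obj):
        for c, px in enumerate(row):
            y, x = top + r, left + c
            if px is not None and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = px
    return out
```

Because the distance information travels with each pixel, no colored backdrop is needed to separate the object from its surroundings.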

A first receiver that receives first moving image data of a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit that receives second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second moving image data received by the second receiving unit; and
A transmission unit that transmits the composite video data generated by the generation unit,
An AR video generation device in which resource information related to hardware or software resources in the terminals is supplied from a connection management device that connects the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data, and the video represented by the composite video data is varied according to the supplied resource information.

A first receiver that receives first moving image data of a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit that receives second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second moving image data received by the second receiving unit; and
A transmission unit that transmits the composite video data generated by the generation unit,
An AR video generation device in which information representing a change in volume or sound quality of voice messages is supplied from a connection management device, the connection management device including a transmission/reception unit that transmits and receives the voice messages between the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data, and the video represented by the composite video data is varied according to the supplied information.

A first receiver that receives first moving image data of a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second receiving unit that receives second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
An extraction unit that extracts pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received by the first reception unit;
A generation unit that generates composite video data in which an image corresponding to the pixels extracted by the extraction unit is combined with the background so as to correspond to the coordinate system defined in the second moving image data received by the second receiving unit;
A transmission unit that transmits the composite video data generated by the generation unit; and
An analysis unit that analyzes changes in the subject in the first moving image data received by the first receiving unit,
An AR video generation device that supplies information representing the change in the subject analyzed by the analysis unit to a connection management device, the connection management device including a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data and varies the volume or sound quality of the voice messages according to the supplied information representing the change in the subject.

A program for causing a computer to execute:
A first step of receiving first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second step of receiving second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
A third step of extracting pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received in the first step;
A fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second moving image data received in the second step;
A fifth step of transmitting the composite video data generated in the fourth step; and
A sixth step in which line information about the communication line connecting the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data is supplied from a connection management device that manages the connection between those terminals, and the video represented by the composite video data is varied according to the supplied line information.

A program for causing a computer to execute:
A first step of receiving first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second step of receiving second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
A third step of extracting pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received in the first step;
A fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second moving image data received in the second step;
A fifth step of transmitting the composite video data generated in the fourth step; and
A sixth step in which resource information related to hardware or software resources in the terminals is supplied from a connection management device that connects the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data, and the video represented by the composite video data is varied according to the supplied resource information.

A program for causing a computer to execute:
A first step of receiving first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second step of receiving second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
A third step of extracting pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received in the first step;
A fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second moving image data received in the second step;
A fifth step of transmitting the composite video data generated in the fourth step; and
A sixth step in which information representing a change in volume or sound quality of voice messages is supplied from a connection management device, the connection management device including a transmission/reception unit that transmits and receives the voice messages between the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data, and the video represented by the composite video data is varied according to the supplied information.

A program for causing a computer to execute:
A first step of receiving first moving image data obtained by photographing a subject including a predetermined object, the first moving image data including distance information for each pixel of the subject;
A second step of receiving second moving image data in which a background including a marker having a predetermined shape is captured, wherein the second moving image data defines a three-dimensional coordinate system based on the marker;
A third step of extracting pixels corresponding to the predetermined object from the subject based on the distance information of the first moving image data received in the first step;
A fourth step of generating composite video data in which an image corresponding to the pixels extracted in the third step is combined with the background so as to correspond to the coordinate system defined in the second moving image data received in the second step;
A fifth step of transmitting the composite video data generated in the fourth step;
A sixth step of analyzing a change in the subject in the first moving image data received in the first step; and
A seventh step of supplying information representing the change in the subject analyzed in the sixth step to a connection management device, the connection management device including a transmission/reception unit that transmits and receives voice messages between the terminal that transmitted the first moving image data and the terminal that transmitted the second moving image data and varies the volume or sound quality of the voice messages according to the supplied information representing the change in the subject.