JP2014175837A

JP2014175837A - Apparatus for extracting and synthesizing subject image

Info

Publication number: JP2014175837A
Application number: JP2013046562A
Authority: JP
Inventors: Masaru Sugano; 勝菅野; Hitoshi Naito; 整内藤; Kentaro Yamada; 健太郎山田
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-03-08
Filing date: 2013-03-08
Publication date: 2014-09-22
Anticipated expiration: 2033-03-08
Also published as: JP6090917B2

Abstract

PROBLEM TO BE SOLVED: To properly extract an image of a subject, even under illumination variation, and synthesize it into another spatial video, without preparing and displaying a background image in advance or requiring the user to preset a foreground and a background.

SOLUTION: A depth data acquisition unit 11 acquires depth data within a space including a subject, and a texture data acquisition unit 12 acquires texture data within the same space. A subject data extraction unit 10 identifies a subject region candidate using the difference between the subject and the background in the depth data, evaluates image features in the texture data, identifies the subject region from both results, and extracts its texture data. A spatial video input unit 13 receives an arbitrary spatial video, and a subject data synthesis unit 14 synthesizes the texture data of the subject region into that spatial video.

Description

The present invention relates to a subject image extraction and synthesis device, and more particularly to a device that extracts the image of a subject from an image of a space containing the subject and synthesizes the extracted subject image into another, arbitrary spatial video.

Video conferencing systems, and telepresence systems with higher performance and image quality, are known as remote communication systems. Systems have also been proposed that give conference participants at remote locations the feeling of being in the same conference room.

The feeling that participants share the same space, such as a conference room, can be achieved by building physically identical spaces at each remote site and transmitting the spatial video of each site, including its participants, to the other sites for reproduction. However, this requires physically identical spaces to be built at every site in advance, which increases cost and limits the sites that can be used.

Image processing offers an alternative: only the participants' images are extracted from the spatial image of each site, transmitted to the other sites, and synthesized into the spatial video of those sites. With this approach there is no need to build physically identical spaces in advance, and participants can still be given the feeling of being virtually in the same space. This is advantageous in cost and does not limit the sites.

Patent Document 1 proposes an image capturing and display device that generates and displays a background image in advance, places a subject in front of the displayed background image, extracts the subject's image from an image obtained by photographing the subject together with the background image, and synthesizes the extracted subject image into a different background image.

Patent Document 2 proposes an object recognition device that recognizes an object having a specific color based on a specific-color region in a color image from a first color camera and on a distance image generated from the parallax between the color image from the first color camera and a color image from a second color camera.

Patent Document 3 proposes an image segmentation method that acquires a color image and a distance image and, according to foreground or background settings made by the user, segments an object from the image by jointly using the color information of the color image and the distance information of the distance image.

Patent Document 1: JP 2010-213124 A
Patent Document 2: JP 2011-198270 A
Patent Document 3: JP 2010-39999 A

In the image capturing and display device of Patent Document 1, a background image is generated and displayed in advance, a subject is placed in front of it, and the subject region is determined from the captured image in order to extract the subject's image. A single-color image such as a blue screen, or an image whose color and pattern can be distinguished from the subject, must therefore be created in advance as the background image. An image display device is also needed to show the background image, which complicates the device configuration.

The object recognition device of Patent Document 2 first detects, from a color image, a specific-color region such as a hand having a color specified in advance (for example, skin color). It then recognizes the object having the specific color by excluding, from the specific-color region, areas whose distance in the distance image is equal to or greater than a threshold. It is therefore effective for recognizing objects whose color is known in advance, such as the user's hand. Because the object's color must be specified in advance, however, this technique cannot extract images of objects whose color is unknown or objects containing diverse colors, such as a clothed person. In addition, if the tint of the color image changes because of illumination variation or the like, the object's image may no longer be properly extracted with the pre-specified color.

The image segmentation method of Patent Document 3 requires the user to set part of the foreground or background before segmentation, which impairs convenience.

If illumination varies while the image of the space containing the subject is being acquired, the variation affects discrimination between background and subject, and the subject's image may not be properly extracted. Furthermore, if lighting conditions such as the position and tint of the light source differ between acquisition of the image of the space containing the subject and acquisition of the other spatial video into which the subject's image is synthesized, the resulting conference video becomes unnatural. When an object is present in the other spatial video, the conference video may also become unnatural because of the occlusion and placement relationship between the subject and the object. Patent Documents 1 to 3 do not disclose any technique for solving these problems.

An object of the present invention is to provide a subject image extraction and synthesis device that does not require a background image to be prepared and displayed in advance, or the user to preset a foreground and a background, and that can properly extract the subject's image and synthesize it into another spatial video even under illumination variation.

Another object of the present invention is to provide a subject image extraction and synthesis device in which the conference video produced by synthesizing the subject's image does not appear unnatural even when lighting conditions such as the position and tint of the light source differ between acquisition of the image of the space containing the subject and acquisition of the other spatial video.

Still another object of the present invention is to provide a subject extraction and synthesis device in which the conference video produced by synthesizing the subject's image does not appear unnatural even when an object such as a desk is present in the other spatial video.

To solve the above problems, the present invention comprises: depth data acquisition means for acquiring depth data in a space including a subject; texture data acquisition means for acquiring texture data in the space including the subject; subject data extraction means for identifying a subject region from the depth data acquired by the depth data acquisition means and the texture data acquired by the texture data acquisition means, and extracting the texture data of that subject region; spatial video input means for inputting an arbitrary spatial video; and subject data synthesis means for synthesizing the texture data of the subject region extracted by the subject data extraction means into the spatial video input by the spatial video input means. The first feature of the invention is that the subject data extraction means identifies a subject region candidate using the difference between the subject and the background in the depth data acquired by the depth data acquisition means, further evaluates image features in the texture data acquired by the texture data acquisition means, identifies the subject region from these results, and extracts its texture data.

A second feature of the present invention is that the subject data extraction means evaluates, as the image feature in the texture data acquired by the texture data acquisition means, the spatial continuity of the color, luminance, or texture information of small regions, taking similarity into account.

A third feature of the present invention is that the subject data extraction means evaluates image features in the texture data near the boundary of the subject region candidate and identifies the subject region by complementing the boundary of the subject region candidate with the evaluation result.

A fourth feature of the present invention is that the depth data acquisition means includes accumulation means for continuously acquiring and accumulating depth data in the space including the subject, and the subject data extraction means identifies a region candidate for each component of the subject from the difference between the subject and the background in the depth data accumulated by the accumulation means, and thereby identifies the subject region candidate.

A fifth feature of the present invention is that it comprises object characteristic estimation means for estimating the characteristics of an object in the spatial video using the texture data and depth data of the spatial video input by the spatial video input means, and the subject data synthesis means occludes and places the texture data of the subject region relative to the texture data of the spatial video according to the object characteristics estimated by the object characteristic estimation means.

A sixth feature of the present invention is that the object characteristic estimation means identifies an object region candidate using the depth data of the spatial video input by the spatial video input means, further evaluates image features in the texture data of that spatial video, and from these results estimates the three-dimensional position and shape of the object as the object's characteristics.

A seventh feature of the present invention is that the object characteristic estimation means evaluates, as the image feature in the texture data of the spatial video input by the spatial video input means, the spatial continuity of color, luminance, or texture information, taking similarity into account.

An eighth feature of the present invention is that, for an object region whose depth data in the object characteristics estimated by the object characteristic estimation means is equal to or less than a predetermined value, the subject data synthesis means hides the texture data of the part of the subject region overlapping that object region with the texture data of the object region.

A ninth feature of the present invention is that, for an object region whose depth data in the object characteristics estimated by the object characteristic estimation means is equal to or less than the value of the subject's depth data acquired by the depth data acquisition means, the subject data synthesis means hides the texture data of the part of the subject region overlapping that object region with the texture data of the object region.
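The occlusion rule of the eighth and ninth features amounts to a depth comparison between the object region and the subject. The following is a minimal sketch of the ninth-feature variant, not the device's actual implementation. It assumes a per-pixel comparison (the text works with estimated object regions), a single scalar `subject_depth`, and a subject layer already aligned to the scene; all function and variable names are hypothetical.

```python
def composite_with_occlusion(scene, scene_depth, subject, subject_mask, subject_depth):
    """Draw subject pixels only where the scene is farther away than the subject.

    Where the scene (e.g. a desk) is at a depth less than or equal to the
    subject's depth, the scene texture is kept, so it hides the subject.
    """
    out = [row[:] for row in scene]  # copy so the input scene is not modified
    for y in range(len(scene)):
        for x in range(len(scene[0])):
            if subject_mask[y][x] and scene_depth[y][x] > subject_depth:
                out[y][x] = subject[y][x]
    return out


# Toy 2x3 scene: a wall at 4.0 in the top row, a desk at 1.0 in the bottom row.
scene = [["wall", "wall", "wall"], ["desk", "desk", "desk"]]
scene_depth = [[4.0, 4.0, 4.0], [1.0, 1.0, 1.0]]
subject = [["p", "p", "p"], ["p", "p", "p"]]
subject_mask = [[False, True, False], [False, True, False]]

result = composite_with_occlusion(scene, scene_depth, subject, subject_mask, subject_depth=2.0)
```

In this toy case the person appears in front of the wall but stays hidden behind the desk. Replacing `subject_depth` with a fixed threshold would give a sketch of the eighth-feature variant.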

A tenth feature of the present invention is that it further comprises first light source estimation means for estimating the lighting conditions of the light source in the space including the subject, second light source estimation means for estimating the lighting conditions of the light source in the space where the spatial video input by the spatial video input means was acquired, and subject processing means for processing the texture data of the subject region extracted by the subject data extraction means according to the difference between the lighting conditions estimated by the first and second light source estimation means, and that the texture data of the subject region processed by the subject processing means is synthesized into the spatial video input by the spatial video input means.

An eleventh feature of the present invention is that the first light source estimation means converts the texture data acquired by the texture data acquisition means into the HSV color space and estimates the lighting conditions of the light source in the space including the subject by comparing the Value values of a plurality of minute regions whose Hue values fall within a predetermined range, and the second light source estimation means converts the texture data of the spatial video input by the spatial video input means into the HSV color space and estimates the lighting conditions of the light source in the space where the spatial video was acquired by comparing the Value values of a plurality of minute regions whose Hue values fall within a predetermined range.
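The HSV comparison in the eleventh feature can be illustrated with Python's standard `colorsys` module. This is only a sketch under stated assumptions: each minute region is represented by a mean RGB color in the range 0 to 255, and the "comparison" is reduced to reporting which in-range region is brightest and which is darkest. How the device maps that comparison to a light source position or tint is not specified in this section, so the sketch does not attempt it. The function names and the region dictionary layout are hypothetical.

```python
import colorsys


def value_by_region(regions, hue_min, hue_max):
    """Return {position: V} for regions whose hue lies in [hue_min, hue_max].

    regions maps a position to a mean (r, g, b) colour with components 0..255.
    colorsys works on floats in 0..1 and returns hue, saturation, value in 0..1.
    Note: reds near hue 1.0 wrap around and are not handled by this simple range.
    """
    values = {}
    for pos, (r, g, b) in regions.items():
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if hue_min <= h <= hue_max:
            values[pos] = v
    return values


def brightest_and_darkest(values):
    """Positions of the regions with the highest and lowest Value."""
    return max(values, key=values.get), min(values, key=values.get)


# Three reddish regions that get darker from left to right, plus one blue region.
regions = {
    (0, 0): (200, 100, 100),
    (0, 1): (120, 60, 60),
    (0, 2): (60, 30, 30),
    (1, 0): (0, 0, 255),
}
values = value_by_region(regions, hue_min=0.0, hue_max=0.05)
extremes = brightest_and_darkest(values)
```

Restricting the comparison to regions of similar hue means that differences in Value are more likely to reflect lighting than surface color, which appears to be the motivation for the hue range in the text.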

In the present invention, a subject region candidate in the spatial image is identified from the depth difference between the subject and the background using depth data, image features in the texture data are then evaluated, and the subject region is identified from these results and its image extracted. There is therefore no need to prepare and display a background image in advance or for the user to preset a foreground and a background, and the subject's image can be properly extracted from the texture data (image) of the space containing the subject even under illumination variation.

In addition, by using the color, luminance, or texture information of small regions as the image feature in the texture data, and evaluating its spatial continuity with similarity taken into account, the subject's image can be properly extracted even under illumination variation.

Even when lighting conditions such as the position and tint of the light source differ between the capture of the space containing the subject and the capture of the other spatial video, a natural conference video can be reproduced by processing the subject's image to absorb the difference in lighting conditions before synthesizing it into the other spatial video.

Furthermore, even when an object such as a desk is present in the other spatial video, a natural conference video can be reproduced by estimating characteristics of the object, such as its three-dimensional position and shape, from depth data and texture data, and synthesizing the subject's image into the other spatial video with the occlusion and placement relationship between the subject and the object taken into account according to the estimated characteristics.

As a result, a remote communication system can give participants at remote sites the feeling of being in the same space.

FIG. 1 is a block diagram showing one embodiment of the present invention.
FIG. 2 is a block diagram schematically showing an embodiment in which the present invention is applied to a two-way remote communication system.
FIG. 3 is a block diagram showing in detail the configuration of the space acquisition device and the space reproduction device of FIG. 2.
FIG. 4 is an explanatory diagram schematically showing the operation of the present invention.

The present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention.

The subject image extraction and synthesis device of this embodiment includes a subject data extraction unit 10, a depth data acquisition unit 11, a texture data acquisition unit 12, a spatial video input unit 13, and a subject data synthesis unit 14.

The subject data extraction unit 10 and the subject data synthesis unit 14 can be implemented as software running on a processor or as hardware. A depth camera can be used for the depth data acquisition unit 11, and a color camera can be used for the texture data acquisition unit 12.

The subject data extraction unit 10 includes a depth data difference calculation unit 15, a subject region candidate specifying unit 16, an image feature extraction unit 17, an image feature evaluation unit 18, and a subject region specifying / texture data extraction unit 19.

The following description assumes that a person is present in an arbitrary indoor space, such as a conference room, and that the person is the subject.

The depth camera is installed almost directly in front of the person in the indoor space, and the depth data acquisition unit 11 acquires depth data (depth values) for the entire indoor space including the person.

The depth data difference calculation unit 15 partitions the depth camera's field of view into regions according to the depth data values and then calculates the depth data difference between the regions.
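One simple way to picture this partitioning is to quantize the depth map into fixed-width depth bands and compare the mean depth of neighbouring bands. The sketch below is illustrative only: the text does not say how the regions are formed, so fixed-width banding, the `band_width` parameter, depth values in metres, and all function names are assumptions.

```python
def band_depth(depth, band_width):
    """Label each pixel with the index of the depth band it falls in."""
    return [[int(d // band_width) for d in row] for row in depth]


def band_means(depth, labels):
    """Mean depth of the pixels in each band."""
    sums, counts = {}, {}
    for depth_row, label_row in zip(depth, labels):
        for d, label in zip(depth_row, label_row):
            sums[label] = sums.get(label, 0.0) + d
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def band_differences(means):
    """Depth difference between consecutive bands, ordered from near to far."""
    ordered = sorted(means.items(), key=lambda item: item[1])
    return [(a, b, mean_b - mean_a) for (a, mean_a), (b, mean_b) in zip(ordered, ordered[1:])]


# Toy depth map: a near object (about 1.5) in front of a far wall (about 3.0).
depth = [
    [3.0, 3.1, 3.0, 3.1],
    [3.0, 1.5, 1.6, 3.1],
    [3.0, 1.5, 1.5, 3.0],
]
labels = band_depth(depth, band_width=1.0)
means = band_means(depth, labels)
diffs = band_differences(means)
```

Here the map splits into two bands whose mean depths differ by roughly 1.5, which is the kind of gap the next step relies on to separate the person from the background.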

The subject region candidate specifying unit 16 identifies a person (subject) region candidate based on the depth data differences calculated by the depth data difference calculation unit 15. In an indoor space such as a conference room, the place where the person sits (the seat position) is determined in advance, so the distance from the depth camera's installation position to the person is known. Because the person and the background differ in depth as seen from the depth camera, the person region candidate can be identified by selecting, from the regions partitioned by the depth data difference calculation unit 15, the region containing the depth value from the depth camera to the person's position. Depth data differences are not affected by illumination variation, so the person region candidate can be identified here without being influenced by it.
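The selection step can be sketched as picking the depth band that contains the known camera-to-seat distance. As before, fixed-width banding, metres as the unit, and the names `select_candidate`, `band_width`, and `subject_distance` are assumptions made for illustration.

```python
def select_candidate(depth, band_width, subject_distance):
    """Boolean mask of pixels whose depth band contains the known seat distance."""
    target_band = int(subject_distance // band_width)
    return [[int(d // band_width) == target_band for d in row] for row in depth]


# Same toy layout as a seated person about 1.5 from the camera, wall about 3.0 away.
depth = [
    [3.0, 3.1, 3.0, 3.1],
    [3.0, 1.5, 1.6, 3.1],
    [3.0, 1.5, 1.5, 3.0],
]
mask = select_candidate(depth, band_width=1.0, subject_distance=1.5)
```

The mask uses depth only, so changes in lighting do not alter it. It is still just a candidate: anything else at a similar distance, such as a desk, would also be selected.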

If the depth difference between the person and the background is not pronounced, if other objects are present near the person, or if the depth data at the person's outline contains noise, a region identified from depth data alone is not reliable enough to be taken as the person region. For this reason, the subject region candidate specifying unit 16 first identifies only a person region candidate.

Even when the depth difference between the person and the background is not pronounced, it can be made pronounced by continuously observing and accumulating the depth data. The subject region candidate specifying unit 16 may identify the person region candidate using this approach. With it, even the individual components of the person, such as the face, torso, and arms, can be identified separately. This is useful when the components of the person have diverse colors: the region of each component is identified first, and the person region is then identified as the collection of those components.
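The benefit of accumulation can be seen with a per-pixel average over several frames. The text only says the depth data is "accumulated", so averaging is an assumption here, as are the function name and the toy noise values.

```python
def accumulate(frames):
    """Per-pixel mean over a sequence of depth frames (each a list of rows)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[y][x] for frame in frames) / n for x in range(width)]
        for y in range(height)
    ]


# One row, two pixels: a person at 2.0 next to a background at 2.1, with noise.
# In the first frame alone the noise makes the person look farther than the wall.
frames = [
    [[2.2, 1.9]],
    [[1.8, 2.3]],
    [[2.0, 2.1]],
    [[2.0, 2.1]],
]
accumulated = accumulate(frames)
```

After averaging, the small but consistent gap between person and background is recovered even though a single frame gets the ordering wrong.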

Because a region identified from depth data alone is not sufficiently reliable as the person region, the texture data acquired by the texture data acquisition unit 12 is additionally used in a complementary manner, as described below, to identify the person region from the person region candidate.

The color camera is installed almost directly in front of the person in the indoor space. Its installation position is approximately the same as that of the depth camera, and its field of view is also approximately the same. The texture data acquisition unit 12 acquires texture data (an image) of the entire indoor space including the person.

The image feature extraction unit 17 extracts image features from the texture data acquired by the texture data acquisition unit 12. For example, it divides the texture data into small regions and extracts the color information of each small region as an image feature.
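As an illustration, the color information of a small region can be taken to be the mean RGB value of a square tile. The tile shape, the use of the mean, and the names below are assumptions; the text only says that color information is extracted per small region.

```python
def block_features(image, block):
    """Mean (R, G, B) of each block x block tile.

    image is a list of rows of (r, g, b) tuples; the result maps
    (tile_row, tile_col) to the tile's mean colour.
    """
    height, width = len(image), len(image[0])
    features = {}
    for top in range(0, height, block):
        for left in range(0, width, block):
            pixels = [
                image[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            count = len(pixels)
            features[(top // block, left // block)] = tuple(
                sum(p[channel] for p in pixels) / count for channel in range(3)
            )
    return features


# 2x4 image: a reddish tile on the left, a bluish tile on the right.
red, blue = (200, 50, 50), (20, 20, 220)
image = [
    [red, red, blue, blue],
    [red, red, blue, blue],
]
features = block_features(image, block=2)
```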

The image feature evaluation unit 18 evaluates image features near the boundary of the person region candidate identified by the subject region candidate specifying unit 16. For example, it evaluates the spatial continuity of the color information of the small regions near the boundary. This can be done by extracting color information from a small region just inside the boundary of the person region candidate (a position presumed to belong to the person region) and then checking, one after another, whether the color information of each adjacent small region is the same. If spatial continuity is evaluated with the similarity of color information also taken into account, the continuity can be evaluated while absorbing effects such as how the lighting falls on the subject. Because only the color information of the small regions near the boundary of the person region candidate needs to be evaluated, the processing load is light.

In addition, the color information of an adjacent small region judged to be similar can be fed back and used to evaluate the spatial continuity of the color information of the next adjacent small region. If this process is repeated, the color information of the small regions can be judged spatially continuous even when it changes gradually from one small region to the next. Extending the evaluation to similar colors is effective for absorbing effects such as how the lighting falls on the subject, but the similarity range is set so that it does not take in other color information that was different to begin with.
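The feedback evaluation described above resembles region growing in which each newly accepted small region becomes the reference for its own neighbours. The following sketch shows that idea under stated assumptions: small regions are cells of a grid keyed by `(row, col)`, similarity is the largest per-channel difference, and `tolerance` stands in for the similarity range. None of these names or choices come from the text.

```python
from collections import deque


def color_distance(a, b):
    """Largest per-channel difference between two colours."""
    return max(abs(x - y) for x, y in zip(a, b))


def grow_region(features, seed, tolerance):
    """Grow a region from seed over 4-connected small regions.

    A neighbour is accepted when its colour is within tolerance of the
    small region that reached it (the feedback step), not of the seed.
    """
    accepted = {seed}
    queue = deque([seed])
    while queue:
        row, col = queue.popleft()
        reference = features[(row, col)]
        for neighbour in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if neighbour in features and neighbour not in accepted:
                if color_distance(features[neighbour], reference) <= tolerance:
                    accepted.add(neighbour)
                    queue.append(neighbour)
    return accepted


# One row of small regions: a gradual brightness ramp, then an abrupt change.
features = {
    (0, 0): (100, 100, 100),
    (0, 1): (110, 110, 110),
    (0, 2): (120, 120, 120),
    (0, 3): (130, 130, 130),
    (0, 4): (200, 200, 200),
}
region = grow_region(features, seed=(0, 0), tolerance=12)
```

The fourth small region differs from the seed by 30, well beyond the tolerance, yet it is accepted because each step along the ramp is small. The last small region is rejected because the jump to it is abrupt, which is how a tight tolerance keeps genuinely different colors out.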

Furthermore, when evaluating image features, the influence of illumination variation can be reduced by also observing the temporal change of the color information continuously and ignoring the change when it is at or below a predetermined value. Regions of color information that are no larger than a predetermined size and isolated from their surroundings can be ignored as noise and treated as having the same color information as their surroundings.
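Ignoring small temporal changes can be sketched as keeping a small region's previous color whenever the new color differs by no more than a threshold. The per-channel difference measure, the `threshold` parameter, and the function name are assumptions; the isolated-region noise handling mentioned in the text is not covered by this sketch.

```python
def stabilize(previous, current, threshold):
    """Keep the previous colour of a small region when its change is small.

    previous and current map a position to an (r, g, b) colour. A change is
    ignored when every channel differs by at most threshold.
    """
    stabilized = {}
    for position, colour in current.items():
        old = previous.get(position)
        if old is not None and max(abs(a - b) for a, b in zip(colour, old)) <= threshold:
            stabilized[position] = old
        else:
            stabilized[position] = colour
    return stabilized


previous = {(0, 0): (100, 100, 100), (0, 1): (50, 50, 50)}
current = {(0, 0): (104, 103, 101), (0, 1): (120, 60, 50)}
result = stabilize(previous, current, threshold=5)
```

The slight flicker in the first small region is suppressed, while the large change in the second is kept as a real change.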

The subject region specifying / texture data extraction unit 19 identifies the person region using the person region candidate identified by the subject region candidate specifying unit 16 and the image feature evaluation result from the image feature evaluation unit 18, and extracts the texture data of the person region (the person's image). For example, by determining the boundary of the person region from the spatial continuity of the color information of the small regions near the boundary of the person region candidate, the reliability problem of identifying the person region from depth data alone can be compensated for. Even if the region of an object such as a desk is included in the person region candidate, it can be excluded based on the features of its texture data and depth data. For example, the texture data of an object such as a desk has fixed color information (image features) that normally differs from the person's color information, so the object's region can be excluded based on that difference. Also, if a horizontal straight line of at least a certain length is detected in the depth data, the region containing that straight line can be judged to be a desk, and the desk region can be excluded on that basis. Objects such as a conference laptop immediately in front of the person or a chair immediately behind the person may be included in the person region.

The color information whose spatial continuity the subject region specifying / texture data extraction unit 19 examines in order to specify the person region may be selected as appropriate from the color information of small regions within the person region candidate that are estimated to constitute the person region. When the subject region candidate specifying unit 16 has separately specified region candidates for the individual components of the person, such as the face, torso, and arms, the color information of small regions estimated to constitute each component is selected within that component's region candidate, the region of each component is specified from the spatial continuity of that color information, and the collection of component regions is specified as the person region.

As described above, by specifying a person region candidate using the depth data, evaluating the image features of the texture data, and specifying the person region using both results, the person region can be appropriately specified and its image extracted even when the person's colors are unknown or varied, and even when the color tone of the spatial image changes because of illumination variation or the like.

The spatial video input unit 13 inputs an arbitrary spatial video. The subject data synthesis unit 14 composites the image of the person extracted by the subject region specifying / texture data extraction unit 19 onto the spatial video input by the spatial video input unit 13. The position on the spatial video at which the image of the person is composited depends on the input spatial video, but it may be fixed, for example at the center, or it may be made variably settable to any suitable position. This makes it possible to reproduce a spatial video in which the person appears to be present within the spatial video input by the spatial video input unit 13.
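The compositing step can be illustrated with a short sketch. It assumes images are 2D lists of pixel values and that the extracted person image comes with a binary mask of the same size; `center_offset` and `composite` are hypothetical helper names.

```python
def center_offset(bg_height, bg_width, fg_height, fg_width):
    """Top-left position that centers the foreground on the background."""
    return (bg_height - fg_height) // 2, (bg_width - fg_width) // 2


def composite(background, person, mask, top, left):
    """Paste masked person pixels onto a copy of the background at (top, left)."""
    out = [row[:] for row in background]
    for y, (person_row, mask_row) in enumerate(zip(person, mask)):
        for x, (pixel, inside) in enumerate(zip(person_row, mask_row)):
            by, bx = top + y, left + x
            # Skip pixels outside the person region or outside the frame.
            if inside and 0 <= by < len(out) and 0 <= bx < len(out[0]):
                out[by][bx] = pixel
    return out
```

Passing the result of `center_offset` as `top` and `left` corresponds to the fixed central placement; any other pair gives the variable placement mentioned in the text.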

The present invention can be applied to remote communication systems such as video conferencing and telepresence systems between remote locations.

In a remote communication system between sites 1 and 2, extracting the texture data of the participant (subject) region from the image of site 1 and compositing it onto the spatial video of site 2 makes it possible to reproduce, at site 2, a conference video in which the participant at site 1 appears to be present at site 2. Conversely, extracting the texture data of the participant (subject) region from the image of site 2 and compositing it onto the spatial video of site 1 makes it possible to reproduce, at site 1, a conference video in which the participant at site 2 appears to be present at site 1. Combining the two enables two-way communication.

A remote communication system to which the present invention is applied is described below. The present invention is not, however, limited to this application.

FIG. 2 is a block diagram schematically showing an embodiment in which the present invention is applied to a two-way remote communication system.

In this remote communication system, the two sites for two-way communication are sites 1 and 2. Site 1 includes a space acquisition device 20 and a space reproduction device 21, and site 2 includes a space acquisition device 20′ and a space reproduction device 21′.

Assuming that the spaces at sites 1 and 2 are conference rooms and that one conference participant is present as a subject in each room, the space acquisition device 20 and the space reproduction device 21′ are used to extract the texture data of the conference participant region (the image of participant 1; the participant is hereinafter referred to as participant 1) from the spatial image of site 1, transmit it to site 2, and reproduce a conference video at site 2 by compositing the image of participant 1 onto the spatial video of site 2. Likewise, the space acquisition device 20′ and the space reproduction device 21 are used to extract the texture data of the conference participant region (the image of participant 2; the participant is hereinafter referred to as participant 2) from the spatial image of site 2, transmit it to site 1, and reproduce a conference video at site 1 by compositing the image of participant 2 onto the spatial video of site 1.

FIG. 3 is a block diagram showing in detail the configurations of the space acquisition device 20 and the space reproduction device 21′ of FIG. 2; parts identical or equivalent to those in FIGS. 1 and 2 are given the same reference numerals. Although FIG. 3 shows the space acquisition device 20 and the space reproduction device 21′, the space acquisition device 20′ and the space reproduction device 21 are configured in the same way.

The space acquisition device 20 includes one depth camera that acquires depth data and one color camera that acquires texture data. The depth camera and the color camera are installed almost directly in front of participant 1 at site 1. The space reproduction device 21′ includes one depth camera that acquires depth data and one color camera that acquires texture data, and further includes one display that reproduces, as the conference video, the spatial video of site 2 onto which the image of participant 1 at site 1 has been composited. The number of sites and the numbers of depth cameras, color cameras, and displays in the present invention are arbitrary and are not limited to this example.

The space acquisition device 20 includes a depth data acquisition unit 11, a texture data acquisition unit 12, a subject data extraction unit 10, a light source estimation unit 22, and a subject data transmission unit 23. The depth data acquisition unit 11, the texture data acquisition unit 12, and the subject data extraction unit 10 are the same as those in FIG. 1.

The depth data acquisition unit 11 acquires depth data (depth values) for the entire conference room including participant 1, and the texture data acquisition unit 12 acquires texture data for the entire conference room including participant 1.

The subject data extraction unit 10 specifies a region candidate for participant 1 using the depth data acquired by the depth data acquisition unit 11, evaluates the image features of the texture data acquired by the texture data acquisition unit 12, specifies the region of participant 1 using both results, and extracts the texture data (the image of participant 1) and the depth data of that region. Specifically, it specifies the region candidate for participant 1 using the depth data and the depth data differences, evaluates the spatial continuity of the color information of the small regions in the texture data, specifies as the region of participant 1 the area within the region candidate over which the color information is continuous, and extracts the image and depth data of participant 1. Using the depth data allows the region candidate for participant 1 to be specified without being affected by illumination variation during the conference, and additionally using the image features of the texture data allows the region of participant 1 to be specified appropriately.
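The idea of keeping only the part of the candidate over which color is spatially continuous can be sketched as a simple region-growing pass. This is an illustrative stand-in, not the method the patent specifies: it assumes one scalar color value per small region, a binary candidate mask derived from depth, a seed chosen inside the candidate, and a hypothetical tolerance `tol`.

```python
from collections import deque


def grow_region(color, candidate, seed, tol):
    """Collect candidate cells reachable from `seed` through small color steps.

    `color` and `candidate` are 2D lists of equal size; `seed` is (row, col).
    A neighbor joins the region when it lies in the depth-based candidate
    and its color differs from the current cell by at most `tol`.
    """
    height, width = len(color), len(color[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height and 0 <= nx < width
                    and (ny, nx) not in region
                    and candidate[ny][nx]
                    and abs(color[ny][nx] - color[y][x]) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```

Cells that are inside the depth candidate but separated by a sharp color step stay outside the grown region, which mirrors how color continuity refines the depth-only candidate.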

Even if the depth data difference between participant 1 and the background observed at a given moment is not pronounced, the difference can be made pronounced by observing the depth data continuously and accumulating it. With this technique, region candidates can be specified separately even for the individual components of participant 1, such as the face, torso, and arms. This is effective for specifying the region of a participant 1 whose components have varied colors. That is, the depth data is first used to specify region candidates separately for the components of participant 1, such as the face, torso, and arms; next, the color information within each component is used to specify the region of each component separately; finally, the region of participant 1 is specified as the collection of the component regions.
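The effect of accumulation can be shown with a small sketch. It assumes, as an illustration only, that the depth data difference is taken between horizontally adjacent pixels and summed with its sign over several frames, so that a consistent step grows while zero-mean noise tends to cancel; the patent does not spell out this particular formulation, and `accumulated_edge_strength` is a hypothetical name.

```python
def accumulated_edge_strength(frames):
    """Sum signed adjacent-pixel depth differences over frames, then take magnitudes.

    `frames` is a non-empty list of 2D depth maps of identical size.
    The result has one fewer column than the input maps.
    """
    height, width = len(frames[0]), len(frames[0][0])
    acc = [[0] * (width - 1) for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width - 1):
                acc[y][x] += frame[y][x + 1] - frame[y][x]
    return [[abs(v) for v in row] for row in acc]
```

A depth step of about 3 units per frame stays below a threshold of, say, 5 in any single frame but exceeds it after a few frames of accumulation.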

The light source estimation unit 22 estimates the illumination conditions in the conference room at site 1, such as the position and direction of the light source and its color tone, from the texture data acquired by the texture data acquisition unit 12. The position and direction of the light source can be estimated by converting the texture data into the HSV color space and mapping, for each minute region whose Hue value lies within a predetermined range, the Value component, which represents the intensity of illumination. For example, if the Value of these minute regions is large at the upper right and decreases with distance from there, the light source can be estimated to be at the upper right; if the Value is large at the upper center and decreases with distance from there, the light source can be estimated to be directly above. The color tone of the light source can be estimated by, for example, estimating a human skin color region from the Hue values and evaluating the deviation of the Hue value of that region from a skin color reference. The reliability of the estimate can be increased if the light source estimation unit 22 estimates the illumination conditions separately for each region type, such as the region of participant 1, object regions such as a desk, and the background region, and then integrates the per-type results, for example by adopting their average or the result obtained for the majority of region types.
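The Value-map idea can be sketched with the standard-library `colorsys` module. This is a rough illustration under stated assumptions: pixels are 8-bit RGB tuples, the hue range is supplied by the caller, and the light direction is read off coarsely from where the brightest in-range pixel lies. The function names are hypothetical.

```python
import colorsys


def value_map(rgb_image, hue_min, hue_max):
    """Map each pixel to its HSV Value, or None if its Hue is out of range."""
    vmap = []
    for row in rgb_image:
        vrow = []
        for r, g, b in row:
            h, _s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            vrow.append(v if hue_min <= h <= hue_max else None)
        vmap.append(vrow)
    return vmap


def brightest_position(vmap):
    """Return (x, y) of the largest in-range Value, or None if there is none."""
    best = None
    for y, row in enumerate(vmap):
        for x, v in enumerate(row):
            if v is not None and (best is None or v > best[0]):
                best = (v, x, y)
    return None if best is None else (best[1], best[2])


def horizontal_label(x, width):
    """Coarse left/center/right label for a column index."""
    if x < width / 3:
        return "left"
    if x >= 2 * width / 3:
        return "right"
    return "center"
```

A practical estimator would likely fit the whole falloff pattern rather than a single peak; the peak is used here only to keep the sketch short.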

The subject data transmission unit 23 transmits subject data, which includes the texture data of the region of participant 1 extracted by the subject data extraction unit 10, to site 2 over a network, either continuously or at fixed time intervals. The texture data may be transmitted as is or may be compressed as appropriate before transmission.

Because this embodiment absorbs the difference in light source illumination conditions between sites 1 and 2, the illumination conditions estimated by the light source estimation unit 22 are also transmitted together with the texture data. For the position of the light source, for example, the map of illumination intensity may be transmitted; for the color tone of the light source, the deviation from the reference color may be transmitted. The illumination conditions need not be sent as frequently as the texture data.

The space reproduction device 21′ includes a subject data reception unit 24, a depth data acquisition unit 25, a texture data acquisition unit 26, an object characteristic estimation unit 27, a light source estimation unit 28, a subject data processing unit 29, the subject data synthesis unit 14, and a space display unit 30.

A depth camera can be used for the depth data acquisition unit 25, and a color camera can be used for the texture data acquisition unit 26. Various kinds of display can be used for the space display unit 30.

The color camera is installed so that the texture data acquisition unit 26 acquires the spatial video of site 2. This spatial video need not include an image of participant 2 at site 2. In this embodiment, however, the difference in light source illumination conditions (color tone) between sites 1 and 2 is absorbed using human skin color, as described below, so the spatial video is made to include a skin-colored part of participant 2.

The depth camera is installed at almost the same position as the color camera, and its field of view is almost the same as that of the color camera. The display is installed at a position where participant 2 can view its screen, for example so that the screen faces participant 2.

The subject data reception unit 24 receives the subject data transmitted from the subject data transmission unit 23 at site 1 and, if the texture data is compressed, decompresses it to restore the original texture data. In addition to the texture data of participant 1, the subject data includes the illumination conditions of the light source.
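The send and receive sides can be pictured with a small round-trip sketch. The wire format here is entirely an assumption for illustration (JSON payload, lossless `zlib` compression, a one-byte flag marking whether the body is compressed); the patent does not specify a codec or message layout, and a real system would more likely use a video codec.

```python
import json
import zlib


def pack_subject_data(texture, lighting=None, compress=True):
    """Serialize texture data plus optional illumination conditions."""
    payload = json.dumps({"texture": texture, "lighting": lighting}).encode("utf-8")
    flag = b"\x01" if compress else b"\x00"
    return flag + (zlib.compress(payload) if compress else payload)


def unpack_subject_data(message):
    """Restore the subject data, decompressing only when the flag says so."""
    body = message[1:]
    if message[:1] == b"\x01":
        body = zlib.decompress(body)
    return json.loads(body.decode("utf-8"))
```

Because the illumination conditions change slowly, a sender could pass `lighting=None` for most messages and include them only occasionally.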

The depth data acquisition unit 25 acquires depth data (depth values) for the conference room at site 2. The texture data acquisition unit 26 acquires texture data (the spatial video) for the conference room at site 2.

The object characteristic estimation unit 27 uses the depth data acquired by the depth data acquisition unit 25 and the texture data acquired by the texture data acquisition unit 26 to estimate characteristics, such as three-dimensional position and shape, of objects such as a desk present in the conference room at site 2. As in the processing of the subject region specifying / texture data extraction unit 19 of FIG. 1, the shape of an object can be specified by specifying an object region candidate using the depth data acquired by the depth data acquisition unit 25, evaluating the spatial continuity of the image features in the texture data acquired by the texture data acquisition unit 26, and using both results. Here too, using both depth data and texture data allows the shape of the object to be estimated appropriately.

The light source estimation unit 28 is the same as the light source estimation unit 22 at site 1 and uses the texture data acquired by the texture data acquisition unit 26 to estimate the illumination conditions in the conference room at site 2, such as the position and color tone of the light source.

The subject data processing unit 29 processes the image of participant 1 so that, when it is composited onto the spatial video of site 2, a natural conference video is reproduced regardless of the difference in illumination conditions between sites 1 and 2. For example, suppose the light source positions differ, with the light source at the upper right in the conference room at site 1 and directly above in the conference room at site 2. The Value component of the image of participant 1 in the HSV color space then decreases gradually with distance from the upper right, whereas the Value component of the spatial video of site 2 decreases gradually with distance from the upper center. If the two are composited as they are, the reproduced conference video looks unnatural. The tendency and magnitude of the Value component in the image of participant 1 are therefore made similar to those in the spatial video of site 2. Although the image of participant 1 is processed here, the spatial video of site 2 may be processed instead.

A difference in the color tone of the light sources at sites 1 and 2 also makes the reproduced conference video unnatural. In this case, the image of participant 1 may be corrected with a correction value that makes the Hue value of the skin color region in the image of participant 1 match the Hue value of the skin color region in the image of participant 2. Instead of correcting the image of participant 1, the spatial video of site 2 may be corrected. In this way, even when the illumination conditions at sites 1 and 2 differ, the mismatch between the spatial video of site 2 and the image of participant 1 can be eliminated.
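The hue-matching correction can be sketched with `colorsys`. This is a minimal illustration, assuming 8-bit RGB pixels, skin regions already identified as lists of pixels, and skin hues that lie in a narrow band so that a plain arithmetic mean of hue is meaningful (hue is circular, so this would not hold near the wrap-around point). The function names are hypothetical.

```python
import colorsys


def mean_hue(pixels):
    """Average Hue (0..1) of a list of 8-bit RGB pixels."""
    hues = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] for r, g, b in pixels]
    return sum(hues) / len(hues)


def shift_hue(pixel, offset):
    """Rotate the Hue of one 8-bit RGB pixel by `offset`, keeping S and V."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in pixel))
    shifted = colorsys.hsv_to_rgb((h + offset) % 1.0, s, v)
    return tuple(round(c * 255) for c in shifted)


def match_skin_hue(image, source_skin, target_skin):
    """Shift every pixel so the source skin hue matches the target skin hue."""
    offset = mean_hue(target_skin) - mean_hue(source_skin)
    return [[shift_hue(p, offset) for p in row] for row in image]
```

Here `source_skin` would come from the image of participant 1 and `target_skin` from the image of participant 2; swapping the roles corresponds to correcting the spatial video of site 2 instead.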

The subject data synthesis unit 14 composites the image of participant 1 processed by the subject data processing unit 29 onto the spatial video of site 2 acquired by the texture data acquisition unit 26. The position on the spatial video at which the image of participant 1 is composited can be set, for example, to the center or to any suitable position. In this embodiment, furthermore, when the image of participant 1 at site 1 is composited onto the spatial video of site 2, the characteristics of the objects estimated by the object characteristic estimation unit 27, such as three-dimensional position and shape, are taken into account so that the image of participant 1 is occluded by and positioned relative to the objects. The degree of overlap between the image of participant 1 and an object is known from the compositing position on the spatial video and from the estimated three-dimensional position and shape of the object, and the image of participant 1 can be occluded and positioned accordingly. For example, by determining in advance the depth data values in the spatial video of site 2 that are to be regarded as in front of participant 1, the objects that occlude participant 1 when the image is composited can be determined, and the image of participant 1 can be occluded and positioned using this determination together with the image of participant 1 and the shape of the object. If the image of participant 1 does not have a suitable display size as it is, its size can be determined using the size of a specific region in the spatial video of site 2 (for example, so that the face of participant 1 is the same size as the face region of participant 2), which gives the image of participant 1 a suitable display size.
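Occlusion-aware compositing can be sketched as a per-pixel depth test. The sketch assumes, for illustration, that the person image has already been placed and scaled to the size of the scene, that smaller depth values mean closer to the camera, and that participant 1 is assigned a single predetermined depth `person_depth`; all names are hypothetical.

```python
def composite_with_occlusion(scene, scene_depth, person, person_mask, person_depth):
    """Draw person pixels only where the person is nearer than the scene.

    All image arguments are 2D lists of identical size. Scene pixels whose
    depth is smaller than `person_depth` (for example a desk in front of
    the participant) are left untouched, so they hide the person.
    """
    out = [row[:] for row in scene]
    for y, row in enumerate(out):
        for x in range(len(row)):
            if person_mask[y][x] and person_depth < scene_depth[y][x]:
                out[y][x] = person[y][x]
    return out
```

With a desk at depth 100, a wall at depth 300, and the person placed at depth 150, the person covers the wall but is hidden behind the desk.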

Alternatively, the positions of participant 1 and of the objects in the depth direction relative to the depth cameras at sites 1 and 2 may be determined in advance. For example, the position of participant 1 at site 1 and the position of the desk at site 2 are set in advance so that the depth data value of participant 1 acquired by the depth camera corresponds to the depth data value immediately behind the desk at site 2. Although this approach requires the depth data of both participant 1 and the objects, the occlusion relationship between the image of participant 1 and an object can be determined simply by comparing the depth data, and the image of participant 1 can be composited as is at a suitable display size. In this way, the image of participant 1 can be composited onto the spatial video of site 2 without loss of naturalness.

The space display unit 30 reproduces the conference video generated by the subject data synthesis unit 14. The conference video reproduced by the space display unit 30 need not cover the entire image area of the texture data acquired by the texture data acquisition unit 26; for example, only the area onto which the image of participant 1 at site 1 has been composited may be extracted and reproduced.

As described above, reproducing a conference video by compositing the image of participant 1 at site 1 onto the spatial video of site 2 provides the sensation that participant 1 at site 1 is actually present at site 2.

With reference to FIG. 3, a mechanism has been described in which the image of participant 1 is transmitted from site 1 to site 2 and composited onto the spatial video of site 2 to reproduce a conference video, providing the sensation that participant 1 is actually present at site 2. Conversely, the image of participant 2 can be transmitted from site 2 to site 1 and composited onto the spatial video of site 1 to reproduce a conference video, providing the sensation that participant 2 is actually present at site 1. Providing both mechanisms makes it possible to realize highly immersive two-way communication.

FIG. 4 is an explanatory diagram schematically showing the operation of the present invention. At site 1, the space including participant 1 is captured (S1), the image of participant 1 is extracted from the captured image (S2), and the image of participant 1 is transmitted to site 2 (S3). In S2, the region candidate for participant 1 is first specified using the depth data, the image features in the texture data are extracted and evaluated, and the image of participant 1 is extracted using both results. At site 2, the space of site 2 is captured (S4), and the image of participant 1 transmitted from site 1 is composited onto that spatial video (S5).

At site 2, the space including participant 2 is captured (S6), the image of participant 2 is extracted from the captured image (S7), and the extracted image of participant 2 is transmitted to site 1 (S8). In S7, the region candidate for participant 2 is first specified using the depth data, the image features in the texture data are extracted and evaluated, and the image of participant 2 is extracted using both results. At site 1, the space of site 1 is captured (S9), and the image of participant 2 transmitted from site 2 is composited onto that spatial video (S10).

Steps S9 and S6 can share the capture performed in S1 and S4, respectively, so that S9 and S6 can be omitted. In this case, the area of the image captured in S1 that does not include participant 1 can be used as the spatial video of site 1, and the area of the image captured in S6 that does not include participant 2 can be used as the spatial video of site 2. This allows the depth cameras and color cameras of the space acquisition device 20 and the space reproduction device 21′ of FIG. 3 to serve both purposes.

Although embodiments have been described above, the present invention is not limited to these embodiments and includes various modifications. For example, although the above embodiment uses two sites, the present invention can also be applied to a remote communication system among three or more sites. Also, although the above embodiment uses color information as the image feature and evaluates its spatial continuity, the image feature and what is evaluated may be anything that compensates for the limited reliability of specification based on depth data; for example, the subject region can also be specified by evaluating the spatial continuity of texture or luminance.

In FIG. 1, the image feature extraction unit 17 extracts image features over the entire image area. The processing load of image feature extraction can be reduced if the image feature extraction unit 17 instead uses the subject region candidate and extracts image features only near its boundary. Alternatively, independently of and in parallel with specifying the subject region candidate, the entire texture data area may be segmented according to image features, and the subject region may then be specified from the segmented areas using the image features within the subject region candidate. In this case too, the image features within the subject region candidate can be selected as appropriate and used to specify the subject region, so the subject region can be specified even when its image features are unknown or varied.

In FIG. 3, the illumination conditions of the light source and the depth data are transmitted and received between sites 1 and 2 in addition to the image of participant 1. If the illumination conditions at sites 1 and 2 do not differ, or if the difference does not need to be absorbed, the illumination conditions need not be transmitted and received. Likewise, if occlusion and positioning of participant 1 relative to objects need not be considered, the depth data need not be transmitted and received.

In FIG. 3, the depth data and texture data for the conference room at site 2 are acquired by the depth camera and the color camera during the conference, but they may instead be acquired in advance.

The difference in the color tone of the light sources can also be corrected using a known color other than human skin color, in which case the spatial video of site 2 need not include a skin-colored part of participant 2.

The spatial video of site 2 may also be captured from behind participant 2, and the image of participant 1 may be composited onto the spatial video containing the back view of participant 2 so that participant 1 at site 1 appears to be in front of participant 2.

In a video conference there is usually a desk in front of the participant, and the depth data difference may cause the desk region to be included in the region candidate identified for the participant. However, because the texture data and depth data of the desk differ in their characteristics from those of the participant, the desk region can be excluded on the basis of that difference. There may also be an object in front of the participant that overlaps the participant's image (such as a notebook computer used in the conference). When the region candidate identified for the participant includes the region of such an object, the region including the object's region may be identified as the participant's region candidate and as the participant's region.

Furthermore, when the participant is partly hidden by an object and the region candidate identified for the participant is therefore missing part of the participant, the participant's region may simply be identified as it is. When a chair is located immediately behind the participant and the chair's region is included in the participant's region candidate, the participant's region may be identified as including the chair's region.

10 ... subject data extraction unit, 11 ... depth data acquisition unit, 12 ... texture data acquisition unit, 13 ... spatial video input unit, 14 ... subject data synthesis unit, 15 ... depth data difference calculation unit, 16 ... subject region candidate identification unit, 17 ... image feature extraction unit, 18 ... image feature evaluation unit, 19 ... subject region identification and texture data extraction unit, 20, 20' ... space acquisition device, 21, 21' ... space reproduction device, 22, 28 ... light source estimation unit, 23 ... subject data transmission unit, 24 ... subject data reception unit, 25 ... depth data acquisition unit, 26 ... texture data acquisition unit, 27 ... object characteristic estimation unit, 29 ... subject data processing unit, 30 ... space display unit

Claims

A subject image extraction and synthesis device comprising:
depth data acquisition means for acquiring depth data in a space including a subject;
texture data acquisition means for acquiring texture data in the space including the subject;
subject data extraction means for identifying a subject region from the depth data acquired by the depth data acquisition means and the texture data acquired by the texture data acquisition means, and extracting the texture data of the subject region;
spatial video input means for inputting an arbitrary spatial video; and
subject data synthesis means for synthesizing the texture data of the subject region extracted by the subject data extraction means with the spatial video input by the spatial video input means,
wherein the subject data extraction means identifies a subject region candidate using the difference between the subject and the background in the depth data acquired by the depth data acquisition means, further evaluates image features in the texture data acquired by the texture data acquisition means, identifies the subject region from these results, and extracts the texture data of the subject region.

The subject image extraction and synthesis device according to claim 1, wherein the subject data extraction means evaluates, as the image features in the texture data acquired by the texture data acquisition means, the spatial continuity, including similarity, of the color, luminance, or texture information of small regions.

The subject image extraction and synthesis device according to claim 1 or 2, wherein the subject data extraction means evaluates image features in the texture data near the boundary of the subject region candidate and identifies the subject region by complementing the boundary of the subject region candidate with the evaluation result.

The subject image extraction and synthesis device according to any one of claims 1 to 3, wherein the depth data acquisition means includes accumulation means for continuously acquiring and accumulating depth data in the space including the subject, and
the subject data extraction means identifies the subject region candidate by identifying, from the difference between the subject and the background in the depth data accumulated by the accumulation means, a candidate region for each component constituting the subject.

The subject image extraction and synthesis device according to any one of the preceding claims, further comprising object characteristic estimation means for estimating characteristics of an object in the spatial video using the texture data and depth data of the spatial video input by the spatial video input means,
wherein the subject data synthesis means conceals and places the texture data of the subject region with respect to the texture data of the spatial video according to the object characteristics estimated by the object characteristic estimation means.

The subject image extraction and synthesis device according to claim 5, wherein the object characteristic estimation means identifies an object region candidate using the depth data of the spatial video input by the spatial video input means, further evaluates image features in the texture data of the spatial video input by the spatial video input means, and, based on the evaluation, estimates the three-dimensional position and shape of the object as the characteristics of the object.

The subject image extraction and synthesis device according to claim 6, wherein the object characteristic estimation means evaluates, as the image features in the texture data of the spatial video input by the spatial video input means, the spatial continuity, including similarity, of color, luminance, or texture information.

The subject image extraction and synthesis device according to claim 5, wherein the subject data synthesis means conceals, with the texture data of an object region, the texture data of the part of the subject region that overlaps that object region, the object region being one whose depth data in the object characteristics estimated by the object characteristic estimation means is equal to or less than a predetermined value.

The subject image extraction and synthesis device according to claim 5, wherein the subject data synthesis means conceals, with the texture data of an object region, the texture data of the part of the subject region that overlaps that object region, the object region being one whose depth data in the object characteristics estimated by the object characteristic estimation means is equal to or less than the depth data value of the subject acquired by the depth data acquisition means.
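The two concealment rules recited above (comparison against a predetermined value, or against the subject's own depth) can be sketched per pixel as follows. This is a minimal illustration under stated assumptions: smaller depth is taken to mean closer to the camera, each scene pixel is a hypothetical (color, depth) pair, and pixels outside the subject region are marked `None`. The names `composite_pixel` and `composite_row` are illustrative only.

```python
def composite_pixel(scene_rgb, scene_depth, subject_rgb, subject_depth, threshold=None):
    """Choose the visible color for one pixel where the subject overlaps the scene.

    If `threshold` is given, the scene object hides the subject when the
    object's depth is at most `threshold` (predetermined-value rule).
    Otherwise the object hides the subject when its depth is at most the
    subject's depth. Smaller depth is assumed to mean closer to the camera.
    """
    limit = subject_depth if threshold is None else threshold
    return scene_rgb if scene_depth <= limit else subject_rgb


def composite_row(scene, subject, threshold=None):
    """Composite one row; `subject` entries are None outside the subject region."""
    out = []
    for (scene_rgb, scene_depth), sub in zip(scene, subject):
        if sub is None:
            out.append(scene_rgb)  # no subject here: keep the spatial video
        else:
            subject_rgb, subject_depth = sub
            out.append(
                composite_pixel(scene_rgb, scene_depth, subject_rgb, subject_depth, threshold)
            )
    return out


# Hypothetical colors and depths (in meters) for a three-pixel row.
WALL, DESK, PERSON = (90, 90, 90), (120, 80, 40), (200, 160, 140)
scene = [(WALL, 4.0), (WALL, 4.0), (DESK, 1.0)]
subject = [None, (PERSON, 2.0), (PERSON, 2.0)]

by_subject_depth = composite_row(scene, subject)
by_fixed_threshold = composite_row(scene, subject, threshold=1.5)
```

In both calls the desk at 1.0 m hides the subject at 2.0 m, while the wall at 4.0 m is covered by the subject. The predetermined-value rule avoids needing the subject's depth on the compositing side, at the cost of treating every near object the same way.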

The subject image extraction and synthesis device according to claim 1, further comprising:
first light source estimation means for estimating an illumination condition of a light source in the space including the subject;
second light source estimation means for estimating an illumination condition of a light source in the space in which the spatial video input by the spatial video input means was acquired; and
subject processing means for processing the texture data of the subject region extracted by the subject data extraction means according to the difference between the illumination conditions estimated by the first and second light source estimation means,
wherein the texture data of the subject region processed by the subject processing means is synthesized with the spatial video input by the spatial video input means.

The subject image extraction and synthesis device according to claim 10, wherein the first light source estimation means converts the texture data acquired by the texture data acquisition means into the HSV color space and estimates the illumination condition of the light source in the space including the subject by comparing the Value values in a plurality of minute regions whose Hue values are within a predetermined range, and
the second light source estimation means converts the texture data of the spatial video input by the spatial video input means into the HSV color space and estimates the illumination condition of the light source in the space in which the spatial video was acquired by comparing the Value values in a plurality of minute regions whose Hue values are within a predetermined range.
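The HSV-based comparison recited above can be sketched with Python's standard `colorsys` module. The sketch computes, for each small region, the mean Value of the pixels whose Hue lies in a given range and then compares the regions. The region names, pixel values, and hue range are hypothetical, and reading the brighter region as the side facing the light source is an assumption made for this example. The claim does not state how the comparison is turned into an illumination condition.

```python
import colorsys


def mean_value_in_hue_range(pixels, hue_min, hue_max):
    """Mean HSV Value of the 8-bit RGB pixels whose Hue lies in [hue_min, hue_max].

    Hue uses colorsys's 0..1 scale. Returns None when no pixel qualifies.
    """
    values = []
    for r, g, b in pixels:
        h, _s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if hue_min <= h <= hue_max:
            values.append(v)
    return sum(values) / len(values) if values else None


# Hypothetical minute regions: each holds one skin-toned pixel and one blue
# pixel; the blue pixel falls outside the hue range and is ignored.
regions = {
    "left": [(220, 170, 140), (40, 90, 200)],
    "right": [(110, 85, 70), (40, 90, 200)],
}
HUE_RANGE = (0.0, 50 / 360)  # assumed range of 0 to 50 degrees

brightness = {
    name: mean_value_in_hue_range(pixels, *HUE_RANGE)
    for name, pixels in regions.items()
}

# Assumption: the region with the higher mean Value faces the light source.
brighter_side = max(brightness, key=brightness.get)
```

Restricting the comparison to one hue range means regions of similar surface color are compared, so a difference in Value is more likely to reflect lighting than material. Running the same function on the subject's texture data and on the spatial video would give the two estimates whose difference the subject processing means is said to use.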