JP4745724B2

JP4745724B2 - Image processing method and image processing apparatus

Info

Publication number: JP4745724B2
Application number: JP2005168403A
Authority: JP
Inventors: 谷村 要; 佐藤 清秀; 大島 登志一
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-06-08
Filing date: 2005-06-08
Publication date: 2011-08-10
Anticipated expiration: 2025-06-08
Also published as: JP2006343953A

Description

The present invention relates to a technique for generating a composite image in which an image of a virtual space is superimposed on an image of a real space.

Mixed reality (MR) technology superimposes computer graphics (hereinafter, CG) on a live-action scene used as the background and presents the result to a user (the experiencer), so that the user experiences a virtual object as if it actually existed on the spot.

To deliver a rich, immersive experience with MR technology, it is not enough simply to superimpose CG on the live-action background. Interaction, in which the user actually touches or manipulates a virtual object rendered in CG (hereinafter sometimes simply called a virtual object), or at least feels as though doing so, becomes important. Realizing such interaction requires displaying the user's hands and the like that manipulate the virtual object (hereinafter, the subject) in front of the virtual object, in the foreground. If a subject that should be in front of the virtual object is hidden by it, the sense of distance to the virtual object and the sense of reality break down, spoiling the sense of presence.

To solve this problem, the applicant proposed in Patent Document 1 a technique that keeps the image of the subject that should be in the foreground from being hidden by the CG image. In this technique, the background and the subject are captured as a live-action image. From this image, the subject to be displayed in front of the CG image (a region having the color information registered manually in the system in advance as subject detection information) is extracted as a subject region, and drawing of the CG image is prohibited in that region. With this technique, the subject that should be in the foreground is not hidden by the CG image and is displayed in front of the virtual object, enabling a mixed reality experience with a strong sense of presence.

Another conventional technique for compositing a live-action image with a computer graphics image colors the background region with a specific color and replaces that region with the computer graphics image by chroma-key compositing. In such conventional techniques, the live-action image is limited to subjects other than the background.

Patent Document 1: JP 2003-296759 A

The mixed reality experience system of Patent Document 1 works well when the real scene viewed by the user contains almost no region whose color is similar to that of the subject. However, when other users are present in the real scene viewed by the user, CG drawing is also prohibited in the subject regions (hands or face) of those other users, who should originally appear behind the CG image. Their subject regions are then displayed as if they were in front of the CG image, which can impair the user's sense of reality.

Against this background, it is desirable, in order to extract only the user's hands or a designated subject region, to first extract subject regions from the live-action image, then select the target region (such as the user's own hands) from among the extracted regions, and generate a mixed reality image in which the CG image is not drawn only in the selected region.

The present invention has been made in view of the above problem, and its object is to provide a technique for appropriately identifying, in a real-space image, the region on which a virtual-space image is not to be superimposed.

To achieve this object, an image processing method according to the present invention has, for example, the following configuration.

That is, the method is an image processing method performed by an image processing apparatus, comprising: an acquisition step in which acquisition means of the image processing apparatus acquires a captured image obtained by capturing, from the viewpoint of an observer, a part of the observer and another person in a real space where the observer and the other person, different from the observer, are present; a registration step in which registration means of the image processing apparatus registers a subject color to be judged as the color of the part of the observer in the captured image; an extraction step in which extraction means of the image processing apparatus extracts, from the acquired captured image, subject regions having the registered subject color; a derivation step in which derivation means of the image processing apparatus obtains the size of each extracted subject region; a measurement step in which measurement means of the image processing apparatus measures the distance between the observer and the other person; a determination step in which determination means of the image processing apparatus determines, based on the distance, a subject-region size threshold such that the virtual-space image is masked by the part of the observer but not by the other person; a generation step in which generation means of the image processing apparatus regards each extracted subject region larger than the determined threshold as the part of the observer and generates a mask image that masks the virtual-space image with that subject region; a superimposition step in which superimposition means of the image processing apparatus superimposes the virtual-space image, masked by the mask image, on the captured image; and an output step in which output means of the image processing apparatus outputs the captured image on which the virtual-space image has been superimposed.
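The claim does not state how the size threshold follows from the measured distance. As a minimal sketch, assuming a pinhole camera so that the apparent pixel area of the other person's hand falls roughly with the square of that person's distance, the threshold could be placed a safety margin above that expected area. The function names and the calibration constants `ref_area_px`, `ref_distance_m`, and `margin` are hypothetical, not taken from the patent.

```python
def size_threshold(distance_m: float,
                   ref_area_px: float = 20000.0,
                   ref_distance_m: float = 0.5,
                   margin: float = 1.5) -> float:
    """Hypothetical size threshold (in pixels) for a given observer-to-person distance.

    Assumes the other person's hand covers about `ref_area_px` pixels when seen
    at `ref_distance_m`, and that apparent area scales as 1 / distance**2.
    The threshold sits `margin` times above that expected area.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    expected_other_hand_px = ref_area_px * (ref_distance_m / distance_m) ** 2
    return margin * expected_other_hand_px


def select_observer_regions(region_sizes: dict[int, int], distance_m: float) -> list[int]:
    """Labels of extracted subject regions treated as part of the observer."""
    threshold = size_threshold(distance_m)
    return sorted(label for label, size in region_sizes.items() if size > threshold)


# Usage: another person 2 m away gives a threshold of 1875 px.
print(select_observer_regions({1: 30000, 2: 1200, 3: 5000}, distance_m=2.0))  # [1, 3]
```

A larger `margin` rejects other people's hands more reliably but risks dropping the observer's own hand when the other person stands very close, so a rule this simple is only a starting point.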

To achieve this object, an image processing apparatus according to the present invention has, for example, the following configuration.

That is, the apparatus comprises: acquisition means for acquiring a captured image obtained by capturing, from the viewpoint of an observer, a part of the observer and another person in a real space where the observer and the other person, different from the observer, are present; registration means for registering a subject color to be judged as the color of the part of the observer in the captured image; extraction means for extracting, from the acquired captured image, subject regions having the registered subject color; derivation means for obtaining the size of each extracted subject region; measurement means for measuring the distance between the observer and the other person; determination means for determining, based on the distance, a subject-region size threshold such that the virtual-space image is masked by the part of the observer but not by the other person; generation means for regarding each extracted subject region larger than the determined threshold as the part of the observer and generating a mask image that masks the virtual-space image with that subject region; superimposition means for superimposing the virtual-space image, masked by the mask image, on the captured image; and output means for outputting the captured image on which the virtual-space image has been superimposed.

With the configuration of the present invention, the region of a real-space image on which a virtual-space image is not to be superimposed can be identified appropriately.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the functional configuration of a system according to this embodiment, which lets a user (experiencer) experience a mixed reality space in which the real space and a virtual space are fused. When superimposing a virtual-space image on a real-space image, the system controls the superimposition so that, if the real-space image contains a predetermined subject, that subject's region is always displayed in front of the virtual-space image. In this embodiment, a "hand" is used as an example of the predetermined subject.

As shown in FIG. 1, the system according to this embodiment comprises a video camera 101, an image input unit 102, a subject region extraction unit 103, a camera position and orientation sensor 104, a camera position and orientation measurement unit 105, an image generation unit 106, a virtual space database 107, an image composition unit 108, a head-mounted display (hereinafter, HMD) 109, a subject color information registration unit 110, and a subject region selection unit 111.

FIG. 2 shows the system according to this embodiment in use. The configuration of the system is described below with reference to FIGS. 1 and 2.

The system assumes one or more users. For example, the user 201 in FIG. 2 may experience the mixed reality space while seated on a chair 202, with another user 208 visible to the user 201.

The video camera 101 captures moving images of the real space and, as shown in FIG. 2, is attached to the HMD 109. When the user 201 wears the HMD 109 on the head 203, the video camera 101 sits close to the position of the user 201's eyes and faces a direction that approximately matches the user 201's line of sight. The video camera 101 can therefore capture images of the real space as seen from the position and orientation of the user 201's eyes; in the following description it may be referred to as the "viewpoint". The frame images (real-space images) captured by the video camera 101 are input sequentially to the image input unit 102 shown in FIG. 1, which outputs the received image data to both the image composition unit 108 and the subject region extraction unit 103.

The camera position and orientation sensor 104 measures the position and orientation of the video camera 101 and, as shown in FIG. 2, is attached to the video camera 101. A magnetic, optical, or ultrasonic sensor, for example, can be used as the camera position and orientation sensor 104. Position and orientation measurement with such sensors is well known and is not described here.

A signal indicating the position and orientation of the video camera 101, as measured by the camera position and orientation sensor 104, is input to the camera position and orientation measurement unit 105 shown in FIG. 1, which derives from this signal data representing the position and orientation of the video camera 101. This data is output to both the image generation unit 106 and the subject region selection unit 111.

Thus, the HMD 109 carries the video camera 101 and the camera position and orientation sensor 104 and is worn on the head 203 of the user 201. As shown in FIG. 2, the user 201 is seated on the chair 202; the essence of the following description does not change if the user 201 takes any other posture.

Returning to FIG. 1, the image generation unit 106 uses the data received from the camera position and orientation measurement unit 105 and the data stored in the virtual space database 107 (virtual space data) to generate an image of the virtual space as seen from the video camera 101. The virtual space database 107 stores the data needed to generate an image of each virtual object constituting the virtual space. For example, if a virtual object consists of polygons, the virtual space database 107 stores the normal vector data and color data of each polygon and the coordinate data of each vertex constituting the polygons, plus texture data if the virtual object is texture-mapped.
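As an illustration of the kind of records the virtual space database 107 might hold, here is a minimal sketch using Python dataclasses. The class and field names are hypothetical, and `face_normal` shows one way a per-polygon normal could be derived from vertex coordinates, assuming counter-clockwise winding.

```python
from dataclasses import dataclass, field
from typing import Optional

Vec3 = tuple[float, float, float]


@dataclass
class Polygon:
    vertex_indices: tuple[int, ...]          # indices into VirtualObject.vertices
    normal: Vec3                             # normal vector of the polygon
    color: tuple[float, float, float]        # RGB, each in [0, 1]
    uvs: Optional[list[tuple[float, float]]] = None  # texture coords, if texture-mapped


@dataclass
class VirtualObject:
    name: str
    vertices: list[Vec3]                     # vertex coordinates
    polygons: list[Polygon]
    texture: Optional[bytes] = None          # raw texture data, if texture-mapped


@dataclass
class VirtualSpaceDatabase:
    objects: dict[str, VirtualObject] = field(default_factory=dict)

    def add(self, obj: VirtualObject) -> None:
        self.objects[obj.name] = obj


def face_normal(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Unit normal of triangle (a, b, c), assuming counter-clockwise winding."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / length, ny / length, nz / length)


# Usage: a single red triangle lying in the z = 0 plane.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tri = Polygon((0, 1, 2), face_normal(*verts), (1.0, 0.0, 0.0))
db = VirtualSpaceDatabase()
db.add(VirtualObject("object_207", verts, [tri]))
```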

In FIG. 2, 207 is a virtual object, so the data for generating its image is stored in the virtual space database 107, and the image generation unit 106 generates an image of the virtual object 207 as seen from the video camera 101. Techniques for generating an image of a virtual space as seen from a viewpoint with a given position and orientation are well known and are not described here.
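The core of rendering from a posed viewpoint is projecting virtual-object vertices into the image of the video camera 101. The sketch below assumes an ideal pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a pose given as a world position plus a rotation whose columns are the camera axes in world coordinates, and a camera looking along its +z axis (an OpenCV-style convention chosen for illustration, not stated in the patent).

```python
from typing import Optional, Sequence


def project_point(p_world: Sequence[float],
                  cam_pos: Sequence[float],
                  cam_rot: Sequence[Sequence[float]],
                  fx: float, fy: float, cx: float, cy: float) -> Optional[tuple[float, float]]:
    """Project a world-space point to pixel coordinates with an ideal pinhole camera.

    cam_rot is a 3x3 rotation (list of rows) whose columns are the camera's
    x, y, z axes expressed in world coordinates; the camera looks along +z.
    Returns None for points at or behind the camera plane.
    """
    d = [p_world[k] - cam_pos[k] for k in range(3)]
    # World -> camera: multiply the offset by the transpose of cam_rot.
    xc, yc, zc = (sum(cam_rot[r][k] * d[r] for r in range(3)) for k in range(3))
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(project_point((0.1, 0.0, 1.0), (0.0, 0.0, 0.0), identity, 500, 500, 320, 240))
```

A full renderer would also rasterize the projected polygons with depth testing; this only shows the pose-dependent step.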

The image generation unit 106 outputs the generated virtual-space image data to the image composition unit 108. The image composition unit 108 generates a composite image, i.e., an image of the mixed reality space, by superimposing the virtual-space image received from the image generation unit 106 on the real-space image received from the image input unit 102, and outputs it to the HMD 109. An image of the mixed reality space corresponding to the position and orientation of the viewpoint is thus displayed in front of the eyes of the user 201.

Suppose that, while the user 201 is observing the mixed reality space image through the HMD 109, the user's own hand 205 or the hand 210 of another user 208 enters the field of view of the video camera 101. Naturally, the video camera 101 captures the hands 205 and 210, and they appear in the mixed reality space image displayed on the HMD 109.

In this case, it is preferable for the user 201 that only his or her own hand 205 be in the foreground, so the virtual-space image must not be superimposed on the region of the hand 205 in the real-space image. In FIG. 2, when the user 201 observes the mixed reality space image through the HMD 109, only the user's own hand 205 among the real objects is treated as foreground, while the other real objects 206a and 206b and the hand 210 of the other user 208 are treated as background.

For this purpose, the system according to this embodiment includes the subject region extraction unit 103, the subject color information registration unit 110, and the subject region selection unit 111, each of which is described below.

Color data (color information) representing the color of the hand 205 is registered in advance in the subject color information registration unit 110. This color data can be described as coordinate values in a multidimensional color space. Many well-known color systems exist, such as RGB, YIQ, YCbCr, YUV, HSV, L*u*v*, and L*a*b* (Japanese Standards Association, JIS Color Handbook).

Any color system suited to the color characteristics of the target subject may be used. However, to cancel out changes in the subject's color characteristics caused by differences in lighting conditions, it is desirable to use a color system that separates luminance information from hue information and to use only the hue information. YIQ and YCbCr are typical examples of such color systems; this embodiment uses YCbCr.

FIG. 4 outlines the color distribution on the CbCr plane of the YCbCr color system. The horizontal axis 401 is Cr and the vertical axis 402 is Cb. The center 403 is the white region, and saturation increases from the center toward the periphery.

FIG. 5 shows an example in which the color information of the subject is represented as a color space coordinate distribution 501 on the CbCr plane of the YCbCr color system.

Ideally, the subject's color information is not contained in the background's color information, but under some conditions the subject and background may be hard to distinguish by color. In such cases, it is advisable to give the subject a convenient color, for example by having the user wear colored gloves.

Various methods can be used to register color information in the subject color information registration unit 110. For example, the subject can be captured with a camera or the like and the pixel values of the resulting image described as a distribution on the CbCr plane; the registration can then specify a coordinate range on each axis of the color space, or sample the color space along each axis and register, for each sample point, whether it belongs to the subject (a so-called lookup table).
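Both registration approaches can be sketched in a few lines, assuming the sample pixels of the hand have already been converted to integer (Cb, Cr) pairs in the range 0 to 255. The function names and the `dilate` tolerance are hypothetical.

```python
def register_range(samples):
    """Axis-range registration: ((cb_min, cb_max), (cr_min, cr_max)) of the samples."""
    cbs = [cb for cb, _ in samples]
    crs = [cr for _, cr in samples]
    return (min(cbs), max(cbs)), (min(crs), max(crs))


def register_lut(samples, bins=256, dilate=1):
    """Lookup-table registration on an 8-bit CbCr grid.

    lut[cb][cr] is True when that (Cb, Cr) cell counts as subject color.
    Each observed sample also marks its neighbours within `dilate` cells,
    a hypothetical tolerance for sensor noise.
    """
    lut = [[False] * bins for _ in range(bins)]
    for cb, cr in samples:
        for dcb in range(-dilate, dilate + 1):
            for dcr in range(-dilate, dilate + 1):
                i, j = cb + dcb, cr + dcr
                if 0 <= i < bins and 0 <= j < bins:
                    lut[i][j] = True
    return lut


hand_samples = [(100, 150), (104, 148), (110, 140)]   # (Cb, Cr) of hand pixels
print(register_range(hand_samples))                    # ((100, 110), (140, 150))
lut = register_lut(hand_samples)
```

The range method is compact but accepts every color inside the box; the lookup table follows the actual shape of a distribution such as 501 at the cost of a 256 by 256 table.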

Returning to FIG. 1, the subject region extraction unit 103 examines each pixel value of the image received from the image input unit 102 and judges any pixel whose value represents approximately the same color as the "color of the hand 205" indicated by the data registered in the subject color information registration unit 110 to be a "pixel constituting the region of the hand 205". More precisely, it judges for every pixel of the received image whether that pixel belongs to the region of the hand 205 and assigns the result to the pixel, for example "1" for pixels in the region of the hand 205 and "0" for all other pixels.

FIG. 3 illustrates the process of extracting the region of the hand 205 from a real-space image 301 obtained from the image input unit 102, in which the hand 205 is the only real object whose color is approximately the same as that of the hand 205.

Checking every pixel of the image 301 for membership in the region of the hand 205 assigns each pixel a value of "1" or "0", yielding the image 302. The region shown in white in the image 302 consists of the pixels assigned "1", i.e., the pixels constituting the region of the hand 205; the region shown in black consists of the pixels assigned "0", i.e., the pixels outside the region of the hand 205. The image 302 is a so-called mask image, and generating it extracts the region of the hand 205 from the real-space image.

FIG. 6 is a flowchart of the processing performed by the subject region extraction unit 103. The flowchart covers the pixel at image coordinates (i, j) in the image received from the image input unit 102; in practice, the subject region extraction unit 103 performs the processing of FIG. 6 for every pixel of that image.

First, the RGB value of the pixel at image coordinates (i, j) in the image received from the image input unit 102 is converted into a YCbCr value (step S601). Letting R(i, j), G(i, j), and B(i, j) be the R, G, and B values of that pixel, the function color_conversion(), which converts RGB values into YCbCr values, converts R(i, j), G(i, j), and B(i, j) into a Y value, a Cr value, and a Cb value.
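The patent does not give the coefficients used by color_conversion(). A common choice, assumed here, is the full-range ITU-R BT.601 conversion used by JPEG/JFIF for 8-bit RGB; the sketch returns unclamped floats in (Y, Cb, Cr) order.

```python
def color_conversion(r: float, g: float, b: float) -> tuple[float, float, float]:
    """RGB (0-255) to (Y, Cb, Cr) with full-range BT.601 (JPEG/JFIF) coefficients.

    Results are unclamped floats; Cb and Cr are offset so that neutral grey maps to 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


y, cb, cr = color_conversion(255, 0, 0)   # pure red: high Cr, low Cb
print(round(y, 3), round(cb, 3), round(cr, 3))
```

An 8-bit implementation would round and clamp each channel to 0..255; pure red, for instance, gives Cr = 255.5 before clamping.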

Next, it is judged whether the color expressed by the Y, Cr, and Cb values obtained in step S601 is approximately the same as the color indicated by the color data registered in the subject color information registration unit 110 (step S602). In FIG. 6, this judgment is made by the function mask_func(), and the result is assigned to the array mask(i, j). The array mask(i, j) stores a value indicating whether the pixel at image coordinates (i, j) belongs to the subject region, i.e., the pixel value (1 or 0) of the mask image at image coordinates (i, j).

One way for mask_func() to make this judgment is to determine whether the coordinate value (Cr, Cb) on the CbCr plane lies within the subject color distribution 501 registered in the subject color information registration unit 110. The result may be expressed as a binary value, for example 1 if the coordinate belongs to the subject color distribution 501 and 0 otherwise, or the degree of membership may be expressed as a continuous value from 0 to 1.
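Both forms of mask_func() can be sketched as follows, assuming the registered distribution is approximated either by axis ranges (binary test) or by a center with a spread `sigma` (continuous test). The Gaussian membership and all function names are hypothetical stand-ins, and the image is assumed to be a row-major list of (Cb, Cr) pixels.

```python
import math


def mask_func_binary(cb, cr, cb_range, cr_range):
    """1 if (Cb, Cr) falls inside the registered axis ranges, else 0."""
    inside = cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
    return 1 if inside else 0


def mask_func_soft(cb, cr, center, sigma):
    """Hypothetical continuous membership in (0, 1]: Gaussian falloff with the
    Euclidean distance from the registered (Cb, Cr) center."""
    d2 = (cb - center[0]) ** 2 + (cr - center[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def extract_mask(cbcr_image, cb_range, cr_range):
    """Apply the binary test to every pixel, producing mask[i][j] row by row."""
    return [[mask_func_binary(cb, cr, cb_range, cr_range) for cb, cr in row]
            for row in cbcr_image]


image = [[(102, 145), (128, 128)],
         [(60, 200), (108, 141)]]
print(extract_mask(image, (100, 110), (140, 150)))   # [[1, 0], [0, 1]]
```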

In this embodiment, the pixel values of the image received from the image input unit 102 are assumed to be expressed in RGB, but they may instead be expressed in YIQ or YUV. In that case, the processing in step S601 is omitted, and coordinate values in the IQ space or the UV space, respectively, are used in step S602 instead of (Cb, Cr).

As described above, the subject region extraction unit 103 generates a mask image indicating, for each pixel of the image received from the image input unit 102, whether that pixel belongs to a "hand" region.

Note, however, that the subject region extraction unit 103 sets the mask-image value to "1" for every pixel of the real-space image whose color is approximately the same as the "color of the hand 205". If the real-space image also contains real objects other than the hand 205 with approximately that color, the pixels constituting those objects' regions are also set to "1" in the mask image, and multiple white regions end up scattered across the mask image.

The subject region selection unit 111 therefore identifies which of the white regions in the mask image generated by the subject region extraction unit 103 is the region of the hand 205. This identification is described in detail later.

Having identified the region of the hand 205 in the mask image, the subject region selection unit 111 makes that region white, i.e., sets the pixel values of the pixels constituting the region of the hand 205 to "1", and makes every other region black, i.e., sets the pixel values of the pixels outside the region of the hand 205 to "0". In the resulting mask image, the region of the hand 205 and the other regions can be distinguished by pixel value. The mask image generated in this way is output to the image composition unit 108.
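The claims base the selection on region size. A minimal sketch of that size-based part labels the white regions of the mask and rewrites it so that only regions larger than a given threshold keep the value 1. It assumes 4-connectivity (the text does not specify connectivity) and a plain list-of-rows mask; the function names are hypothetical.

```python
from collections import deque


def label_regions(mask):
    """4-connected labeling of a 0/1 mask given as a list of rows.

    Returns (labels, sizes): labels[i][j] is 0 for background or a region id >= 1,
    and sizes maps each region id to its pixel count.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    sizes = {}
    next_id = 1
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                labels[i][j] = next_id
                queue = deque([(i, j)])
                count = 0
                while queue:                     # breadth-first flood fill
                    y, x = queue.popleft()
                    count += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
                sizes[next_id] = count
                next_id += 1
    return labels, sizes


def keep_large_regions(mask, threshold):
    """New mask: 1 only for pixels of regions larger than `threshold` pixels."""
    labels, sizes = label_regions(mask)
    keep = {rid for rid, size in sizes.items() if size > threshold}
    return [[1 if rid in keep else 0 for rid in row] for row in labels]


mask = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1]]
print(keep_large_regions(mask, threshold=3))   # [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
```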

FIG. 7 illustrates the processing, performed by the image composition unit 108, that uses the mask image generated by the subject region selection unit 111 to keep the virtual-space image from being superimposed on the region of the hand 205 when the virtual-space image is superimposed on the real-space image.

In FIG. 7, reference numeral 301 denotes an image received from the image input unit 102, in which the hand 205 of the user 201 appears, and reference numeral 701 denotes a virtual-space image generated by the image generation unit 106. When superimposing the image 701 on the image 301, the image composition unit 108 refers to the mask image 302: the portion of the image 701 corresponding to the hand region (shown in white) in the mask image 302 is not drawn onto the image 301. As shown in the image 702, the virtual-space image is therefore never drawn over the region of the hand 205, so the hand 205 always stays in the foreground.

The processing performed by the image composition unit 108 is described with reference to FIG. 10, which is a flowchart of that processing. The flowchart in FIG. 10 covers the processing for a single pixel at image coordinates (i, j) in the image received from the image input unit 102 and the image received from the image generation unit 106, so the image composition unit 108 actually executes the processing of FIG. 10 for every possible value of i and j.

In step S100, the pixel data real(i, j) of the pixel at image coordinates (i, j) in the real-space image received from the image input unit 102 is transferred to the frame memory buffer(i, j). Next, in step S200, the pixel data mask(i, j) of the pixel at image coordinates (i, j) in the mask image generated by the subject region selection unit 111 is transferred to a stencil buffer, which is an image memory used for mask processing.

In step S300, the pixel data CGI(i, j) of the pixel at image coordinates (i, j) in the virtual-space image received from the image generation unit 106 is transferred to the frame memory buffer(i, j), subject to the pixel data mask(i, j) held in the stencil buffer.

If mask(i, j) = 0, real(i, j) belongs to a region outside the region of the hand 205, so the virtual-space pixel data CGI(i, j) is transferred to the frame memory buffer(i, j). The frame memory buffer(i, j) then holds the virtual-space pixel data CGI(i, j) superimposed on the real-space pixel data real(i, j).

If, on the other hand, mask(i, j) = 1, real(i, j) belongs to the region of the hand 205, so the virtual-space pixel data CGI(i, j) is not transferred to the frame memory buffer(i, j). The frame memory buffer(i, j) then holds only the real-space pixel data real(i, j), with no virtual-space pixel data CGI(i, j) superimposed.

The image composition unit 108 outputs the image stored in the frame memory in this way to the HMD 109 as a mixed reality space image.
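The per-pixel procedure of FIG. 10 (steps S100 to S300) can be sketched in Python as follows. This is a minimal illustration, not the apparatus's actual implementation: images are assumed to be nested lists indexed as image[i][j], the mask uses 1 for the hand region and 0 elsewhere, and treating a CG pixel of None as "no virtual object rendered here" is an assumption the text does not state.

```python
from typing import List, Optional, TypeVar

P = TypeVar("P")


def composite(real: List[List[P]],
              cgi: List[List[Optional[P]]],
              mask: List[List[int]]) -> List[List[P]]:
    """Superimpose the CG image on the real image except where mask == 1."""
    rows, cols = len(real), len(real[0])
    # S100: copy real(i, j) into the frame buffer.
    buffer = [row[:] for row in real]
    # S200: the mask image plays the role of the stencil buffer.
    stencil = mask
    # S300: transfer CGI(i, j) only where the stencil value is 0
    # (and, by assumption, only where a virtual object was rendered).
    for i in range(rows):
        for j in range(cols):
            if stencil[i][j] == 0 and cgi[i][j] is not None:
                buffer[i][j] = cgi[i][j]
    return buffer
```

On a GPU the same rule is typically enforced by the hardware stencil test, which is presumably what the term "stencil buffer" alludes to; the loop above only makes the per-pixel rule explicit.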

Next, the processing by which the subject region selection unit 111 refers to the mask image generated by the subject region extraction unit 103 and determines which of the white regions in the mask image is the region of the hand 205 is described.

As described above, when the real-space image contains a real object whose color is substantially the same as that of the hand 205, the subject region extraction unit 103 also treats the pixels of that object's region as pixels of the region of the hand 205, so multiple white regions are scattered across the mask image. If a mixed reality space image were generated with such a mask image, the parts of the virtual-space image corresponding to every one of those white regions would be left undrawn on the real-space image. Since the goal is for only the region of the hand 205, as the subject, to appear in the foreground, every region of the mask image other than the region of the hand 205 must be made black.

FIG. 8 illustrates the processing for making every region of the mask image other than the region of the hand 205 black. In the example shown, the mask image 801 contains noise both outside and inside the subject region (the region of the hand 205) 850. The outer noise consists of regions of real space misrecognized as the hand 205, and the inner noise consists of regions that belong to the subject region 850 but were not recognized as such.

In step S1, the outer noise is removed first (white regions outside the subject region 850 are changed to black) to obtain a first noise-removed image 802. For example, the area of each white region (the number of pixels it contains) is obtained by labeling, a technique commonly used to compute the areas of connected regions in an image. If the area of a white region is equal to or less than a predetermined threshold, that region is judged not to be the subject region 850, and the value of every pixel in it is set to "0".

This predetermined threshold is determined in advance, before the mixed reality space image is presented to the user 201, for example as follows.

First, the user 201, wearing the HMD 109 on the head 203, makes a fist with the hand 205 and holds it out in front with the arm extended. The video camera 101 captures this pose, so the captured image shows the fist made with the hand 205 of the user 201 as seen from the video camera 101.

An operator inputs the real-space image captured by the video camera 101 to a computer and displays it on the screen of the computer's display device. The operator then manually designates the fist region by tracing it with a mouse on the displayed image. The CPU of the computer counts the pixels in the designated region, and this count is used as the predetermined threshold, which is stored in memory.

At the same time, the average pixel values (Cb and Cr values) of the pixels in the region selected by the operator are obtained and registered in the subject color information registration unit 110. This allows higher-quality subject color information to be registered.
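The calibration just described reduces to counting the pixels in the operator-designated fist region and averaging their Cb and Cr values. The sketch below assumes the image has already been converted to per-pixel (Cb, Cr) pairs and that the traced region is available as a boolean mask; the function name `calibrate_fist` is hypothetical.

```python
from typing import List, Tuple


def calibrate_fist(cbcr: List[List[Tuple[float, float]]],
                   region: List[List[bool]]) -> Tuple[int, float, float]:
    """Return (pixel count, mean Cb, mean Cr) of the designated region.

    The pixel count serves as the area threshold; the mean (Cb, Cr)
    is the color registered as subject color information.
    """
    count = 0
    sum_cb = sum_cr = 0.0
    for px_row, sel_row in zip(cbcr, region):
        for (cb, cr), selected in zip(px_row, sel_row):
            if selected:
                count += 1
                sum_cb += cb
                sum_cr += cr
    if count == 0:
        raise ValueError("the designated region contains no pixels")
    return count, sum_cb / count, sum_cr / count
```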

Although the operator designates the fist region manually in the description above, the area and color information of the fist region may instead be extracted by background subtraction with the viewpoint position and orientation held fixed.

The predetermined threshold can be obtained as described above. In step S1, white regions smaller than the fist are therefore regarded as not being the region of the hand 205 and are removed.

As shown in the first noise-removed image 802, this removes the white regions scattered outside the subject region 850. If the mask image contains no black regions inside the subject region 850, the processing from step S2 onward is unnecessary.

Next, in step S2, the value of every pixel in the first noise-removed image 802 obtained in step S1 is inverted (a value of "0" becomes "1" and a value of "1" becomes "0"), producing a bit-inverted image 803.

Next, in step S3, labeling is applied again to the bit-inverted image 803 obtained in step S2. This divides the bit-inverted image 803 into the region with the largest area, namely the region outside the subject region 850 (excluding the white regions scattered inside the subject region 850), and all other regions. All regions other than the largest one are therefore regarded as part of the subject region 850, and the value of every pixel in them is set to "0". This yields a second noise-removed image 804, in which, as shown, the white regions scattered inside the subject region 850 have been removed.

In step S4, the value of every pixel in the second noise-removed image 804 obtained in step S3 is inverted (a value of "0" becomes "1" and a value of "1" becomes "0"), producing a noise-removed image 805.

By using the noise-removed image 805 as the mask image, the image composition unit 108 can superimpose the virtual-space image on the real-space image without superimposing it on the region of the hand 205.
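Steps S1 to S4 can be sketched with breadth-first connected-component labeling. The sketch assumes a binary mask stored as nested lists and 4-connectivity (the text does not say which connectivity its labeling uses); the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]


def label_regions(mask: Mask) -> List[List[Tuple[int, int]]]:
    """Return the 4-connected regions of pixels whose value is 1."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def invert(mask: Mask) -> Mask:
    """Flip every pixel: 0 becomes 1 and 1 becomes 0."""
    return [[1 - v for v in row] for row in mask]


def remove_mask_noise(mask: Mask, area_threshold: int) -> Mask:
    # S1: clear white regions whose area is at or below the threshold.
    first = [row[:] for row in mask]
    for region in label_regions(first):
        if len(region) <= area_threshold:
            for y, x in region:
                first[y][x] = 0
    # S2: invert, so holes inside the hand become small white regions.
    inverted = invert(first)
    # S3: keep the largest white region (the background); clear the rest (the holes).
    regions = label_regions(inverted)
    if regions:
        background = max(regions, key=len)
        for region in regions:
            if region is not background:
                for y, x in region:
                    inverted[y][x] = 0
    # S4: invert back to obtain the noise-removed mask.
    return invert(inverted)
```

Like the text, this assumes the background is the single largest region of the inverted image; if the hand splits the background into several pieces (for example, an arm crossing the whole frame), the smaller pieces would be filled as well.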

In the processing example of FIG. 8, the region of the hand 205 is identified by applying a threshold to the area. Alternatively, when the number of subjects is known in advance to be N, the white regions in the mask image may be ranked in descending order of area, the top N judged to be subject regions, and the remaining regions deleted.
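The top-N alternative can be sketched the same way: label the white regions, sort them by area in descending order, and keep only the first N. Nested-list masks and 4-connectivity are again assumptions of this sketch, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]


def white_regions(mask: Mask) -> List[List[Tuple[int, int]]]:
    """Label 4-connected regions of 1-pixels via breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def keep_largest_n(mask: Mask, n: int) -> Mask:
    """Keep the n white regions with the largest areas; clear all others."""
    regions = sorted(white_regions(mask), key=len, reverse=True)
    out = [[0] * len(row) for row in mask]
    for region in regions[:n]:
        for y, x in region:
            out[y][x] = 1
    return out
```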

A separate threshold may also be set for regions that occur inside the skin-colored region, so that any such region whose area is at or below that threshold is regarded as noise arising within the hand region and is removed. In that case, however, the threshold must be set sufficiently small relative to the hand region.

In this embodiment, white regions in the mask image other than the region of the hand 205 are removed with a general labeling algorithm. Instead of labeling, white regions other than the subject region may be removed with other algorithms, such as a median filter or erosion/dilation processing, or by applying convex hull processing to the subject region candidates; any suitable means may be chosen.

In the processing above, the region of the hand 205 in the mask image is identified from the areas of the white regions. Alternatively, using the principle of binocular stereo, depth information may be calculated by matching the centroid positions of the subject regions in the left and right images, and the subject region may be selected by comparing that depth with the information obtained by the camera position/orientation measurement unit 105. Depth estimation by binocular stereo is a known technique, so its details are omitted.
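For a rectified stereo pair, the standard relation between disparity and depth is Z = f * B / (x_L - x_R), where f is the focal length in pixels and B is the baseline between the two cameras. The sketch below applies it to the horizontal centroids of the left and right subject regions. The rectified-camera setup and the parameter names are assumptions, since the text omits the details.

```python
from typing import List, Tuple

Mask = List[List[int]]


def centroid(mask: Mask) -> Tuple[float, float]:
    """Return the (x, y) centroid of the 1-pixels in a binary mask."""
    sx = sy = n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v == 1:
                sx += x
                sy += y
                n += 1
    if n == 0:
        raise ValueError("empty region")
    return sx / n, sy / n


def depth_from_centroids(left: Mask, right: Mask,
                         focal_px: float, baseline: float) -> float:
    """Estimate a subject region's depth from its left/right centroid disparity."""
    xl, _ = centroid(left)
    xr, _ = centroid(right)
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("non-positive disparity; the regions may be mismatched")
    return focal_px * baseline / disparity
```

The depth comes out in the unit of the baseline. The text only says it is compared with the measured camera information; how that comparison is made (for example, against a plausible arm's-reach range) is left open.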

In normal operation, removing noise with a fixed threshold as described above improves the quality of the presented image.

As described above, this embodiment uses a hand as the subject, but other objects may of course be used as the subject.

FIG. 12 is a block diagram showing the hardware configuration of an image processing apparatus having the functions of the image input unit 102, the subject region extraction unit 103, the image generation unit 106, the virtual space database 107, the image composition unit 108, the subject color information registration unit 110, and the subject region selection unit 111. A general-purpose PC (personal computer) or WS (workstation), for example, can be used as this image processing apparatus.

Reference numeral 1201 denotes a CPU, which controls the entire apparatus using programs and data stored in a RAM 1202 and a ROM 1203, and executes the processing according to the flowchart shown in FIG. 11.

Reference numeral 1202 denotes the RAM, which can provide various areas as needed: an area for temporarily storing programs and data loaded from an external storage device 1206, an area for temporarily storing data received from outside via an I/F (interface) 1207, a work area used by the CPU 1201 when executing various processes, and the frame buffer and stencil buffer described above.

Reference numeral 1203 denotes the ROM, which stores setting data, a boot program, and the like for the apparatus.

Reference numeral 1204 denotes an operation unit, composed of a keyboard, a mouse, and the like, through which the operator of the apparatus can input various instructions to the CPU 1201.

Reference numeral 1205 denotes a display unit, composed of a CRT, a liquid crystal screen, or the like, which can display the results of processing by the CPU 1201 as images, text, and so on.

Reference numeral 1206 denotes the external storage device, a large-capacity information storage device typified by a hard disk drive, which stores an OS (operating system) and the programs and data for causing the CPU 1201 to execute the processing according to the flowchart shown in FIG. 11. The virtual space database 107 and the subject color information registration unit 110 are also provided in the external storage device 1206. These are loaded into the RAM 1202 as needed under the control of the CPU 1201.

Reference numeral 1207 denotes the I/F, to which the HMD 109, the video camera 101, the camera position/orientation measurement unit 105, and so on are connected; the apparatus communicates with them via the I/F 1207. Although a single I/F is shown in the figure, a separate I/F may be provided for each of the HMD 109, the video camera 101, and the camera position/orientation measurement unit 105.

Reference numeral 1208 denotes a bus connecting the units described above.

FIG. 11 is a flowchart of the processing described above for generating a mixed reality space image and outputting it to the HMD 109. Since the processing in each step has already been described, it is explained only briefly here.

First, real-space images captured by the video camera 101 are input to the apparatus frame by frame via the I/F 1207, and the CPU 1201 stores the input image data in the RAM 1202 or the external storage device 1206 (step S1101).

Data indicating the position and orientation of the video camera 101, measured by the camera position/orientation sensor 104, is likewise input to the apparatus via the I/F 1207, and the CPU 1201 stores it in the RAM 1202 or the external storage device 1206 (step S1102).

Next, the CPU 1201 uses the virtual space data stored in the virtual space database 107 in the external storage device 1206 to generate and arrange each virtual object constituting the virtual space, and then generates an image of that virtual space as seen from a viewpoint with the position and orientation acquired in step S1102 (step S1103).

Next, the CPU 1201, functioning as the subject region extraction unit 103, generates a mask image indicating the regions of the real-space image whose color is substantially the same as that of the hand 205 (step S1104).

Next, among the regions extracted in the generated mask image as having the same color as the hand 205, the CPU 1201 distinguishes the region of the hand 205 from the regions outside it (step S1105). More specifically, for example, each pixel inside the region of the hand 205 is set to "1" in the mask image, and each pixel outside it is set to "0".

Then, using the mask image generated in step S1105, the CPU 1201 superimposes the virtual-space image generated in step S1103 on the real-space image acquired in step S1101, generating a mixed reality space image in which the region of the hand 205 is in the foreground (step S1106).

The data of the generated mixed reality space image is then output to the HMD 109 via the I/F 1207 (step S1107).
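Step S1104 relies on the registered subject color information, whose details are given earlier in the specification. As one possible sketch, the code below converts RGB pixels to (Cb, Cr) with the full-range ITU-R BT.601 coefficients used by JPEG/JFIF and marks a pixel as subject when its quantized (Cb, Cr) cell is in a registered set. The cell size and the set representation are assumptions, not the method stated in the text.

```python
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]


def rgb_to_cbcr(r: float, g: float, b: float) -> Tuple[float, float]:
    """Full-range BT.601 chroma (as in JPEG/JFIF), each nominally in [0, 255]."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def extract_subject_mask(image: List[List[RGB]],
                         registered: Set[Tuple[int, int]],
                         cell: int = 8) -> List[List[int]]:
    """Return 1 where a pixel's quantized (Cb, Cr) cell is registered, else 0."""
    mask = []
    for row in image:
        out_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            # Quantize chroma into cell x cell bins (hypothetical registration format).
            key = (round(cb) // cell, round(cr) // cell)
            out_row.append(1 if key in registered else 0)
        mask.append(out_row)
    return mask
```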

<Modification>
With the fixed-threshold method described above, when the user's hand is about to leave the field of view, the hand vanishes from the displayed image as soon as its area drops to or below the threshold, which can impair the user's sense of reality. To mitigate this effect of a fixed threshold, the threshold is instead determined dynamically using the position information of the user 201 and another user 208.

First, the head positions of the user 201 and the other user 208 are measured by the camera position/orientation measurement unit 105. The distance between the two is calculated from the measured positions, and the area threshold is changed dynamically according to that distance. The maximum value of the threshold is the size (area in the image) of the fist of the user 201 with the arm extended; the minimum value is chosen by the operator depending on the environment in which the system operates (that is, how much of the background has color information similar to that of the subject region).

The above procedure for determining the maximum and minimum threshold values rests on the following assumption.

Namely, when the user 201 and the other user 208 are sufficiently far apart, no skin-colored region of the other user 208, as seen from the user 201, is larger in area than the fist of the user 201. Thus, when the two are sufficiently far apart, the fist size of the user 201 can be used as the noise threshold (the maximum value), and any smaller skin-colored region can be regarded as noise.

As for controlling the threshold, it is increased gradually as the user 201 and the other user 208 come closer together. The relationship between distance and threshold may be, for example, linear, with the distance multiplied by a coefficient, or nonlinear, with the threshold varying in proportion to the square of the distance.
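A minimal sketch of the distance-dependent threshold, assuming it should fall from the maximum (the extended-fist area) at zero distance to the operator-chosen minimum at a hypothetical cut-off distance `far` and beyond. The parameter `far` and both interpolation formulas are assumptions; the text only says the relationship may be linear in the distance or vary with its square.

```python
def dynamic_threshold(distance: float, t_min: float, t_max: float,
                      far: float, mode: str = "linear") -> float:
    """Area threshold that increases as the other user comes closer.

    Returns t_max at distance 0 and t_min at any distance >= far.
    """
    if far <= 0 or t_min > t_max:
        raise ValueError("require far > 0 and t_min <= t_max")
    ratio = min(max(distance, 0.0), far) / far  # 0 when touching, 1 when far away
    if mode == "linear":
        drop = ratio  # reduction proportional to the distance
    elif mode == "quadratic":
        drop = ratio * ratio  # reduction proportional to the distance squared
    else:
        raise ValueError(f"unknown mode: {mode}")
    return t_max - (t_max - t_min) * drop
```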

In addition, the position information from the camera position/orientation measurement unit 105 may be used to determine whether another user is present in the user's field of view. If no other user is present, a region with a specific area may be assumed to be the user's own subject region, and the threshold set by the operator may be used.
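Whether another user is in view can be approximated from the measured positions alone. This sketch treats the camera's view volume as a cone of half-angle `half_fov_deg` around its forward vector, a simplification of the real rectangular frustum; the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def other_user_in_view(cam_pos: Sequence[float], cam_forward: Sequence[float],
                       other_head: Sequence[float], half_fov_deg: float) -> bool:
    """Return True if the other user's head lies inside a conical field of view."""
    to_other = [o - c for o, c in zip(other_head, cam_pos)]
    dist = math.sqrt(sum(v * v for v in to_other))
    fwd_len = math.sqrt(sum(v * v for v in cam_forward))
    if dist == 0.0:
        return True  # positions coincide; treat as visible
    cos_angle = sum(a * b for a, b in zip(to_other, cam_forward)) / (dist * fwd_len)
    return cos_angle >= math.cos(math.radians(half_fov_deg))
```

When this returns False, the operator-set threshold would be used as described above; otherwise a distance-dependent threshold would apply.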

From image information alone, a subject region whose area is at or below the threshold can be regarded as noise, but it is difficult to determine whether a subject region whose area exceeds the threshold is noise or the user's own subject region.

This happens, for example, when the subject is the user's hand and another user's face appears larger than the user's hand in the image from the video camera 101.

FIG. 9 illustrates, for the case where another user is seen from a user's field of view, the distribution of regions judged to be the subject from color information alone, together with the regions to be displayed and those not to be displayed. The hand 901 on the right side of FIG. 9 is the user's own hand, a region to be displayed. The other user's head 902 and the other user's hands and shoes 903 (or the skin color of the legs, for a woman), shown on the left side of FIG. 9, are regions not to be displayed.

When subject regions are extracted from the image acquired by the video camera 101, the non-display regions cannot easily be selected and removed using image information alone. Position and orientation information is therefore used in addition to image information.

The position and orientation of the other user's head are obtained from the camera position/orientation measurement unit 105, and a spherical or egg-shaped mask object is placed in three-dimensional space so as to cover the other user's head. Any shape suited to the environment in which the system is built may be chosen for the mask object. This mask object masks the part of the subject region that corresponds to the other user's head, so the subject region produced by the other user's face is removed.
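A simplified 2D sketch of the head mask object: project a sphere of radius `radius` centred on the other user's head into the camera image and clear the mask inside the projected disc. It assumes the head position is already expressed in camera coordinates (z pointing forward), a pinhole camera with intrinsics fx, fy, cx, cy, and the common approximation that the sphere projects to an ellipse with semi-axes fx * r / z and fy * r / z around the projected centre. The exact projection of an off-axis sphere is a slightly elongated ellipse, and all names here are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]


def mask_out_head(mask: Mask, head_cam: Tuple[float, float, float],
                  radius: float, fx: float, fy: float,
                  cx: float, cy: float) -> Mask:
    """Clear mask pixels covered by the projected sphere around another user's head."""
    x, y, z = head_cam
    out = [row[:] for row in mask]
    if z <= radius:
        return out  # sphere behind or enclosing the camera: not handled in this sketch
    u0, v0 = fx * x / z + cx, fy * y / z + cy  # projected sphere centre
    ru, rv = fx * radius / z, fy * radius / z  # approximate projected semi-axes
    for v, row in enumerate(out):
        for u in range(len(row)):
            if ((u - u0) / ru) ** 2 + ((v - v0) / rv) ** 2 <= 1.0:
                row[u] = 0
    return out
```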

In this modification, a mask object is placed only on the other user's head. In a system environment where position and orientation sensors can also be attached to the other user's hands and so on, mask objects may likewise be placed to cover them. This allows the other user's hands and so on to be detected robustly and made non-display regions.

<Modification 2>
The object of the present invention can of course also be achieved by supplying a system or apparatus with a recording medium (or storage medium) storing software program code that implements the functions of the embodiment described above, and having a computer (or CPU or MPU) of the system or apparatus read and execute the program code stored on the recording medium. In this case, the program code read from the recording medium itself implements the functions of the embodiment, and the recording medium storing the program code constitutes the present invention.

Needless to say, the invention covers not only the case where the functions of the embodiment are implemented by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the embodiment.

Furthermore, the invention also covers the case where the program code read from the recording medium is written into memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like on the function expansion card or unit performs part or all of the actual processing based on the instructions of the program code, and that processing implements the functions of the embodiment.

When the present invention is applied to such a recording medium, the recording medium stores program code corresponding to the flowcharts described above.

FIG. 1 is a block diagram showing the functional configuration of a system according to this embodiment for letting a user experience a mixed reality space in which real space and virtual space are merged.
FIG. 2 is a diagram showing the system according to the embodiment of the present invention in use.
FIG. 3 is a diagram illustrating the process of extracting the region of the hand 205 from a real-space image 301 obtained from the image input unit 102 when the hand 205 is the only real object in that image with substantially the same color as the hand 205.
FIG. 4 is a diagram outlining the color distribution in the CbCr plane of the YCbCr color system.
FIG. 5 is a diagram showing an example in which the color information of the subject is represented as a color space coordinate distribution 501 in the CbCr plane of the YCbCr color system.
FIG. 6 is a flowchart of the processing performed by the subject region extraction unit 103.
FIG. 7 is a diagram illustrating the processing that uses the mask image generated by the subject region selection unit 111 to keep the virtual-space image from being superimposed on the region of the hand 205 when the virtual-space image is superimposed on the real-space image.
FIG. 8 is a diagram illustrating the processing for making every region of the mask image other than the region of the hand 205 black.
FIG. 9 is a diagram illustrating, for the case where another user is seen from a user's field of view, the distribution of regions judged to be the subject from color information alone, together with the display and non-display regions.
FIG. 10 is a flowchart of the processing by the image composition unit 108.
FIG. 11 is a flowchart of the processing for generating a mixed reality space image and outputting it to the HMD 109.
FIG. 12 is a block diagram showing the hardware configuration of an image processing apparatus having the functions of the image input unit 102, the subject region extraction unit 103, the image generation unit 106, the virtual space database 107, the image composition unit 108, the subject color information registration unit 110, and the subject region selection unit 111.

Claims

An image processing apparatus comprising:
acquisition means for acquiring a captured image, taken from the viewpoint of an observer in a real space in which the observer and another person different from the observer are present, that shows a part of the observer and the other person;
registration means for registering a subject color to be judged as the color of the part of the observer in the captured image;
extraction means for extracting, from the acquired captured image, subject areas having the registered subject color;
derivation means for determining the size of each extracted subject area;
measuring means for measuring the distance between the observer and the other person;
determining means for determining, based on the distance, a size threshold for subject areas such that the image of a virtual space is not masked by the other person but is masked by the part of the observer;
generating means for regarding each extracted subject area whose size exceeds the determined threshold as the part of the observer, and for generating a mask image with which the image of the virtual space is masked by that subject area;
superimposing means for superimposing, on the captured image, the image of the virtual space masked with the mask image; and
output means for outputting the captured image on which the image of the virtual space is superimposed.
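The chain of means above (register a color, extract color-matched areas, measure their sizes, keep only areas above a threshold as the observer's hand, mask, composite) can be sketched in plain Python as follows. This is an illustrative sketch only, not the patented implementation: the BT.601 CbCr conversion, the quantized (Cb, Cr) bin set used as the "registered subject color", the 4-connected labeling, and the helper names (register_colors, extract_subject_regions, make_mask, composite) are all assumptions. Images are lists of rows of RGB tuples, and a virtual pixel of None means "nothing rendered here".

```python
from collections import deque

BIN = 8  # hypothetical bin width for quantizing Cb and Cr (assumption)


def rgb_to_cbcr(r, g, b):
    """Full-range ITU-R BT.601 chroma components, roughly 0..255."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def chroma_bin(pixel):
    """Quantize a pixel's (Cb, Cr) so nearby chroma values share one bin."""
    return tuple(int(c) // BIN for c in rgb_to_cbcr(*pixel))


def register_colors(samples):
    """Register subject colors from sample RGB pixels of the observer's hand."""
    return {chroma_bin(p) for p in samples}


def extract_subject_regions(image, registered):
    """Return 4-connected regions (lists of (y, x)) of registered-color pixels."""
    h, w = len(image), len(image[0])
    hit = [[chroma_bin(image[y][x]) in registered for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not hit[y][x] or seen[y][x]:
                continue
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and hit[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def make_mask(shape, regions, threshold):
    """True where a region larger than the threshold (the observer's hand) lies."""
    h, w = shape
    mask = [[False] * w for _ in range(h)]
    for region in regions:
        if len(region) > threshold:
            for y, x in region:
                mask[y][x] = True
    return mask


def composite(real, virtual, mask):
    """Draw virtual pixels over the real image except where the mask is set."""
    return [[real[y][x] if mask[y][x] or virtual[y][x] is None else virtual[y][x]
             for x in range(len(real[0]))] for y in range(len(real))]
```

In a small test frame where a 6-pixel skin-colored block stands for the observer's hand and an isolated skin-colored pixel stands for the other person's more distant hand, a threshold of 3 keeps the first region in front of the virtual object while the virtual object is drawn over the second, which is the behavior the claim describes.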

The image processing apparatus according to claim 1, wherein the generating means generates a mask image that prohibits superimposition of the image of the virtual space on any subject area whose size exceeds the threshold.

The image processing apparatus according to claim 1, wherein the determining means calculates the threshold such that the threshold increases as the distance decreases.
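One way to obtain a threshold that grows as the other person comes closer is to estimate how many pixels their hand would cover at the measured distance and set the threshold a margin above that. The sketch below assumes a pinhole camera, under which a surface of area A at distance d projects to roughly A·(f/d)² pixels; the focal length, hand area, margin, and clamp limits are hypothetical values, not figures from the patent.

```python
def size_threshold(distance_m, focal_px=800.0, hand_area_m2=0.015,
                   margin=1.5, min_px=50.0, max_px=20000.0):
    """Hypothetical distance-dependent size threshold, in pixels.

    The other person's hand projects to about hand_area_m2 * (focal_px / d)**2
    pixels, so it looks larger as d shrinks; the threshold sits `margin`
    times above that estimate and is clamped to [min_px, max_px].
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    projected = hand_area_m2 * (focal_px / distance_m) ** 2
    return min(max(margin * projected, min_px), max_px)
```

The upper clamp matters in practice: max_px should stay below the apparent size of the observer's own hand at arm's length, otherwise a very close other person would push the threshold high enough to unmask the observer's hand as well.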

The image processing apparatus according to any one of the preceding claims, further comprising:
position measuring means for measuring the position of a part of the other person; and
arrangement means for arranging a mask object at the measured position in the virtual space,
wherein the superimposing means superimposes the image of the virtual space on the captured image while treating the region of the captured image in which the mask object is arranged as a region on which the image of the virtual space may be superimposed.
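A minimal sketch of the mask-object idea, assuming the measured position of the other person's hand is already expressed in the observer's camera frame (z pointing forward) and that the mask object is a sphere of hypothetical radius radius_m. The sphere is projected with a pinhole model, and every mask pixel inside the projected disc is cleared, so the virtual image may be drawn there even if those pixels matched the subject color. The function names and the spherical shape are assumptions for illustration.

```python
def project_point(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a 3-D point given in the camera frame (z forward)."""
    x, y, z = p_cam
    return cx + fx * x / z, cy + fy * y / z


def release_mask_object(mask, hand_pos_cam, radius_m, fx, fy, cx, cy):
    """Return a copy of `mask` with the projected mask-object disc cleared.

    Cleared pixels become regions on which the virtual image may be
    superimposed, even if they were color-matched and large enough.
    """
    z = hand_pos_cam[2]
    if z <= 0:  # behind the camera: the mask object is not visible
        return [row[:] for row in mask]
    u, v = project_point(hand_pos_cam, fx, fy, cx, cy)
    r_px = radius_m * fx / z  # approximate on-screen radius of the sphere
    out = [row[:] for row in mask]
    for py, row in enumerate(out):
        for px in range(len(row)):
            if (px - u) ** 2 + (py - v) ** 2 <= r_px ** 2:
                row[px] = False
    return out
```

Using a sphere keeps the projection to a single point plus a radius; a mesh shaped like a hand would follow the same pattern but require rasterizing the mesh instead of testing a disc.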

A computer program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 4.

An image processing method performed by an image processing apparatus, comprising:
an acquisition step in which acquisition means of the image processing apparatus acquires a captured image, taken from the viewpoint of an observer in a real space in which the observer and another person different from the observer are present, that shows a part of the observer and the other person;
a registration step in which registration means of the image processing apparatus registers a subject color to be judged as the color of the part of the observer in the captured image;
an extraction step in which extraction means of the image processing apparatus extracts, from the acquired captured image, subject areas having the registered subject color;
a derivation step in which derivation means of the image processing apparatus determines the size of each extracted subject area;
a measuring step in which measuring means of the image processing apparatus measures the distance between the observer and the other person;
a determining step in which determining means of the image processing apparatus determines, based on the distance, a size threshold for subject areas such that the image of a virtual space is not masked by the other person but is masked by the part of the observer;
a generating step in which generating means of the image processing apparatus regards each extracted subject area whose size exceeds the determined threshold as the part of the observer and generates a mask image with which the image of the virtual space is masked by that subject area;
a superimposing step in which superimposing means of the image processing apparatus superimposes, on the captured image, the image of the virtual space masked with the mask image; and
an output step in which output means of the image processing apparatus outputs the captured image on which the image of the virtual space is superimposed.