JP5059503B2 - Image composition apparatus, image composition method, and image composition program

Info

Publication number: JP5059503B2
Application number: JP2007184979A
Authority: JP
Inventors: 健一郎吉田; 伸俊小島; 徳道津村; 貴雄牧野; 紘一高瀬; 桂一落合; 洋一三宅
Original assignee: Kao Corp
Current assignee: Kao Corp
Priority date: 2007-07-13
Filing date: 2007-07-13
Publication date: 2012-10-24
Anticipated expiration: 2027-07-13
Also published as: JP2009021964A

Description

The present invention relates to an image composition apparatus, method, and program for compositing a moving image of a person with a background moving image that was captured separately.

With advances in image composition technology, a technique has been put to practical use in television production and similar fields in which a person image and a background image are captured separately, only the person region is extracted from the person image using a specific threshold, and the extracted person region is chroma-key composited with the background image so that the person appears to be at the location where the background was captured. However, this technique has the problem that the shading appearing on the person's surface does not reflect the lighting conditions of the background image, so the resulting image looks unnatural.
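As an illustration only, the conventional chroma-key step described above can be sketched as follows. The names `person_mask` and `chroma_key`, the Euclidean colour distance, and the sample values are assumptions made for this sketch; the description specifies only that a person region is extracted with a threshold and composited over the background.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def person_mask(frame: Image, key: Pixel, threshold: float) -> List[List[bool]]:
    """True where a pixel is far enough from the key colour to count as person."""
    def distance(p: Pixel) -> float:
        # Euclidean distance in RGB (an assumed metric for this sketch).
        return sum((a - b) ** 2 for a, b in zip(p, key)) ** 0.5

    return [[distance(p) > threshold for p in row] for row in frame]


def chroma_key(frame: Image, background: Image, key: Pixel, threshold: float) -> Image:
    """Keep person pixels from `frame`; take every other pixel from `background`."""
    mask = person_mask(frame, key, threshold)
    return [
        [fp if m else bp for fp, bp, m in zip(frow, brow, mrow)]
        for frow, brow, mrow in zip(frame, background, mask)
    ]


GREEN = (0, 255, 0)
frame = [[GREEN, (200, 150, 130)], [(10, 250, 5), (190, 140, 120)]]
background = [[(1, 1, 1), (2, 2, 2)], [(3, 3, 3), (4, 4, 4)]]
result = chroma_key(frame, background, GREEN, threshold=60.0)
```

Note that the person pixels are copied unchanged, which is exactly why the shading on the person cannot follow the lighting of the background in this conventional approach.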

As a technique for addressing this problem, the technique described in Patent Document 1, for example, has been proposed. That technique separates the surface color of the target object into a diffuse reflection component and a specular reflection component, computes the reliability of each, and applies a reflection model to recover the normal direction in each image. It then uses the recovered normal directions, a given illumination direction, and the reflection model to generate an image in which the illumination direction, that is, the shading, has been modified.

Patent Document 1: JP-A-5-233826

However, because the technique described in Patent Document 1 estimates the color of the object by statistical processing over a sizable region of the image, it works well for objects, such as a car, that can be expected to have a uniform color over a fairly wide area, but it does not work well for objects such as human skin, whose color varies at a fine spatial scale in ways that affect the viewer's impression.

The present invention has been made in view of the above conventional problem. Its object is to provide an image composition apparatus, an image composition method, and an image composition program that can composite a person and a background captured separately into a natural moving image, as if they had been captured at the same location.

The present invention achieves the above object by providing an image composition apparatus that composites a composite image of a person's face, synthesized on the basis of an actual moving image of the person, with an actual moving image of a background captured separately from the actual moving image of the person, the apparatus comprising:
person actual moving image input/storage means for inputting or storing the actual moving image of the person;
background actual moving image input/storage means for inputting or storing the actual moving image of the background;
light environment actual moving image input/storage means for inputting or storing a light environment actual moving image of the background captured together with the actual moving image of the background;
surface reflection characteristic information input/storage means for inputting or storing surface reflection characteristic information of the person's face, consisting of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
person image extraction processing means for extracting a person image from each frame image of the actual moving image of the person stored in the person actual moving image input/storage means;
tracking processing means for obtaining, on the basis of the extracted person image and the three-dimensional shape information of the face input or stored by the surface reflection characteristic information input/storage means, a face image of the person and face tracking information consisting of position information of specific parts of the person's face and direction information of the face;
light environment information processing means for obtaining light environment information corresponding to each frame image of the actual moving image of the background, on the basis of each frame image of the light environment actual moving image input or stored by the light environment actual moving image input/storage means;
synthetic face image information processing means for obtaining shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information;
skin information processing means for obtaining skin information, consisting of color component information and shading component information, from each frame image of the face image of the person;
face image synthesis processing means for synthesizing the composite image of the face on the basis of the skin information, the shading information of the face, and the surface reflection information of the face; and
person-background composite moving image synthesis processing means for synthesizing a person-background composite moving image on the basis of the composite image of the face, the person image, and the actual moving image of the background.

The present invention also provides an image composition method comprising:
a data input/storage processing step of inputting or storing an actual moving image of a person, an actual moving image of a background captured separately from the actual moving image of the person, a light environment actual moving image of the background captured together with the actual moving image of the background, and surface reflection characteristic information of the person's face consisting of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
a light environment information processing step of obtaining light environment information corresponding to each frame image of the actual moving image of the background, on the basis of each frame image of the light environment actual moving image;
a person image extraction processing step of extracting a person image from each frame image of the actual moving image of the person;
a tracking processing step of obtaining a face image of the person and tracking information of the person's face on the basis of the person image and the three-dimensional shape information of the face;
a synthetic face image information processing step of obtaining shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information;
a skin information processing step of obtaining skin information, consisting of color component information and shading component information, from the face image of the person;
a face image synthesis processing step of synthesizing a composite image of the face on the basis of the skin information, the shading information of the face, and the surface reflection information of the face; and
a person-background composite moving image synthesis processing step of synthesizing a person-background composite moving image on the basis of the composite image of the face, the person image, and the actual moving image of the background.

The present invention also provides an image composition program that causes a computer to implement:
a data input/storage processing function of inputting, or storing in storage means, an actual moving image of a person, an actual moving image of a background captured separately from the actual moving image of the person, a light environment actual moving image of the background captured together with the actual moving image of the background, and surface reflection characteristic information of the person's face consisting of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
a person image extraction processing function of extracting a person image from each frame image of the actual moving image of the person;
a tracking processing function of obtaining a face image of the person and tracking information of the person's face on the basis of the person image and the three-dimensional shape information of the face;
a light environment information processing function of obtaining light environment information corresponding to each frame image of the stored actual moving image of the background, on the basis of each frame image of the light environment actual moving image;
a synthetic face image information processing function of obtaining shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information;
a skin information processing function of obtaining skin information, consisting of color component information and shading component information, from the face image of the person;
a face image synthesis processing function of synthesizing a composite image of the face on the basis of the skin information, the shading information of the face, and the surface reflection information of the face; and
a person-background composite moving image synthesis processing function of synthesizing a person-background composite moving image on the basis of the composite image of the face, the person image, and the actual moving image of the background.

According to the present invention, a face image of a person and a background image captured at different locations can be composited under substantially the same lighting conditions as the background image. The resulting composite moving image therefore shows no incongruity in, for example, how light is reflected between the person and the background, and looks as if the person and the background had been captured naturally at the same location.

The image composition apparatus of the present invention is described below on the basis of a preferred embodiment.

FIG. 1 shows an embodiment of the image composition apparatus of the present invention. The image composition apparatus of this embodiment composites a person image, which includes a composite image of the person's face synthesized on the basis of an actual moving image of the person (hereinafter also called the "person actual moving image"), with an actual moving image of a background captured separately from the person actual moving image (hereinafter also called the "background actual moving image").

As shown in FIG. 1, the image composition apparatus 1 comprises:
person actual moving image input/storage means 11 for inputting or storing the person actual moving image;
background actual moving image input/storage means 12 for inputting or storing the background actual moving image;
light environment actual moving image input/storage means 13 for inputting or storing a light environment actual moving image of the background captured together with the background actual moving image;
face surface reflection characteristic information input/storage means 14 for inputting or storing surface reflection characteristic information of the person's face, consisting of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
person image extraction processing means 21 for extracting the person image (person region image) from each frame image of the person actual moving image stored in the person actual moving image input/storage means 11;
tracking processing means 22 for obtaining, on the basis of the extracted person image and the three-dimensional shape information of the face input or stored by the surface reflection characteristic information input/storage means 14, a face image (face region image) of the person and face tracking information consisting of position information of specific parts of the person's face and direction information of the face;
light environment information processing means 23 for obtaining, on the basis of each frame image of the light environment actual moving image input or stored by the light environment actual moving image input/storage means 13, light environment information in which the light source is approximated by a point light source, corresponding to each frame image of the background actual moving image;
synthetic face image information processing means 24 for obtaining shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information;
skin information processing means 25 for obtaining skin information, consisting of color component information and shading component information, from the face image of the person;
shading component information processing means 26, which also serves as face image synthesis processing means 27, for obtaining high-frequency shading component information from the shading component information and then synthesizing the composite image of the face on the basis of the skin information, the high-frequency shading component information, the shading information of the face, and the surface reflection information;
person-background composite moving image synthesis processing means 28 for synthesizing a person-background composite moving image on the basis of the composite image of the face, the person image, and the background actual moving image; and
moving image display processing means 29 for displaying the person-background composite moving image.

The image composition apparatus 1 of this embodiment is described in more detail below.
As shown in FIG. 2, the image composition apparatus 1 of this embodiment, which comprises the means 11 to 14 and 21 to 29 described above, is configured so that the central processing unit (hereinafter "CPU") 10A of a computer 10 executes processing in accordance with an image composition program (hereinafter also simply called the "program") 100 that runs together with the basic software (OS).

To cause the computer 10 to implement the means 11 to 14 and 21 to 29, the program 100 implements the following functions:
a data input/storage processing function 101 of inputting or storing each of the actual moving images and the surface reflection characteristic information in the respective input/storage means 11 to 14;
a person image extraction processing function 102 of extracting the person image from each frame image of the person actual moving image;
a tracking processing function 103 of obtaining the face image of the person and the face tracking information on the basis of the person image and the three-dimensional shape information of the face;
a light environment information processing function 104 of obtaining, on the basis of each frame image of the light environment actual moving image, light environment information in which the light source is approximated by a point light source, corresponding to each frame image of the background actual moving image;
a synthetic face image information processing function 105 of obtaining the shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information;
a skin information processing function 106 of obtaining the skin information, consisting of color component information and shading component information, from the face image of the person;
a shading component information processing function 107 of obtaining high-frequency shading component information from the shading component information;
a face image synthesis processing function 108 of synthesizing the composite image of the face on the basis of the skin information, the high-frequency shading component information, the shading information of the face, and the surface reflection information of the face;
a person-background composite moving image synthesis processing function 109 of synthesizing a person-background composite moving image on the basis of the composite image of the face, the person image, and the background actual moving image; and
a moving image display processing function 110 of displaying the person-background composite moving image on display means.
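The data dependencies among functions 101 to 110 can be read off from their stated inputs. The following is a minimal sketch, assuming that each function simply consumes the outputs named in its description; the table `DEPENDS_ON` is an interpretation made for illustration and is not part of the program 100 itself. It derives one valid per-frame execution order with the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Function number -> function numbers whose outputs it consumes,
# as read from the description of functions 101 to 110 (an interpretation).
DEPENDS_ON = {
    101: set(),            # data input/storage
    102: {101},            # person image extraction
    103: {101, 102},       # tracking (person image + 3D face shape)
    104: {101},            # light environment information
    105: {101, 103, 104},  # face shading and surface reflection information
    106: {103},            # skin information from the face image
    107: {106},            # high-frequency shading component
    108: {105, 106, 107},  # face image synthesis
    109: {101, 102, 108},  # person-background composition
    110: {109},            # display
}

# static_order() yields every function after all of its prerequisites.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
position = {func: index for index, func in enumerate(order)}
```

The table also shows that functions 104 and 106 do not depend on each other, so in principle they could run concurrently; whether the embodiment does so is not stated.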

The image composition apparatus 1 consists of a computer 10, a person actual moving image input device (video camera) 10K, a background actual moving image input device (video camera) 10L, and a light environment actual moving image input device (video camera) 10M. The computer 10 includes a central processing unit (CPU) 10A, a main memory (RAM) 10B, a graphics processing unit (GPU) 10C, a video memory (VRAM) 10D, a chipset 10E, a file storage device (HDD) 10F, a display device (monitor) 10G, a surface reflection characteristic information input device 10H, an input device 10I, and a bus 10J connecting them. The image composition apparatus 1 may be built from a general-purpose computer with a general-purpose CPU, as in this embodiment, or as a computer dedicated to image composition processing with a dedicated CPU.

The person actual moving image input device 10K, the background actual moving image input device 10L, and the light environment actual moving image input device 10M are each an imaging means such as a video camera. The actual moving images used for composition can be acquired from these devices in real time. Alternatively, they can be converted beforehand, using the computer 10 or a separately prepared computer, into moving image files that can be stored in the file storage device 10F, and the data used for composition can then be read from the file storage device 10F. The two acquisition methods can also be combined.
In this embodiment, as described below, the moving image from the person actual moving image input device 10K is acquired in real time, whereas the data from the background actual moving image input device 10L, the light environment actual moving image input device 10M, and the surface reflection characteristic information input device 10H are stored in advance in the file storage device 10F as predetermined files, and image composition is performed using these data. This combination of acquisition methods reduces the computational load on the computer and is therefore preferable for achieving real-time processing. Real-time processing as in this embodiment can be implemented by using an image processing description language (for example, OpenGL (a registered trademark of Silicon Graphics, Inc., USA) or DirectX (a registered trademark of Microsoft Corporation, USA)) together with the functions of the GPU 10C.

The CPU 10A includes an arithmetic processing circuit that interprets the program 100 stored in the main memory 10B and executes its instructions. The main memory 10B is a storage device that holds the basic software (OS), the program 100, and the processing data used by the CPU 10A and the GPU 10C. In this embodiment, the GPU 10C has its own memory (not shown) for arithmetic processing, so the data processed by the GPU 10C is stored in that memory. The GPU 10C is an arithmetic processing device with a dedicated circuit that performs image processing, including the final moving image composition, in place of the CPU 10A. The video memory 10D stores the image that is finally displayed as a result of the image processing performed by the GPU 10C. The chipset 10E is a set of integrated circuits (LSIs) containing control circuits and is interposed between the CPU 10A and the peripheral devices connected to it. The file storage device 10F is an auxiliary storage device in which the information required for image processing is stored in predetermined file formats. The display device 10G displays, via the video memory 10D, the images processed by the GPU 10C. The person actual moving image input device 10K inputs, in real time, the actual moving image of the person to be composited. The background actual moving image input device 10L, the light environment actual moving image input device 10M, and the surface reflection characteristic information input device 10H input the moving images and information that are to be imported into and stored in the file storage device 10F in predetermined file formats. The input device 10I is used by the operator for various input operations. The bus 10J connects the circuits and devices so that they can communicate according to their respective communication protocols.
In the image composition apparatus 1 of this embodiment, the CPU 10A and/or the GPU 10C, in cooperation with the basic software (OS) and the program 100, function mainly as the processing means 21 to 28, and the person actual moving image input device 10K and/or the file storage device 10F function mainly as the input/storage means 11 to 14.
Various file formats, such as MPEG, QuickTime, and AVI, can be used to save moving images to a storage device. A file format based on lossless (reversible) conversion, which causes no degradation in image quality, is preferred, and among these the uncompressed AVI format, which is in common use and highly versatile, is more preferred.
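Choosing an uncompressed format trades storage bandwidth for image quality. As a rough illustration, the raw data rate of uncompressed video can be estimated as below. The frame size, bit depth, and frame rate are assumed example values that do not appear in the description, and container overhead is ignored.

```python
def uncompressed_rate_bytes_per_s(width: int, height: int,
                                  bytes_per_pixel: int, fps: int) -> int:
    """Raw video payload per second, ignoring container headers and padding."""
    return width * height * bytes_per_pixel * fps


# Assumed example: 640x480 frames, 24-bit RGB (3 bytes per pixel), 30 frames/s.
rate = uncompressed_rate_bytes_per_s(640, 480, 3, 30)
megabytes_per_s = rate / 1_000_000
```

Under these assumed values each of the pre-stored streams (background and light environment) would require about 27.6 MB per second of read bandwidth from the file storage device.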

The program 100 cooperates with the CPU 10A, the main memory 10B, the file storage device 10F, the person actual moving image input device 10K, the surface reflection characteristic information input device 10H, and other components and, through the data input/storage processing function 101, stores in a data storage unit on the GPU 10C the person actual moving image from the person actual moving image input device 10K, and the background actual moving image, the light environment moving image of the background, and the surface reflection characteristic information of the person's face from the file storage device 10F.

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, and other components and, through the person image extraction processing function 102, extracts the person image from each frame image of the person actual moving image.

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and other components and, through the tracking processing function 103, obtains the face image of the person and the face tracking information on the basis of the person image and the three-dimensional shape information of the face.

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and other components and, through the light environment information processing function 104, obtains, on the basis of each frame image of the light environment actual moving image, light environment information in which the light source is approximated by a point light source, corresponding to each frame image of the background actual moving image.
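This passage states only that the light source is approximated by a point light source; it does not say how. One simple possibility, shown here purely as an assumed sketch, is to place a single equivalent light at the luminance-weighted centroid of the light environment frame. The function name `approximate_point_light` and the use of image coordinates are hypothetical; mapping the result to a 3D direction would depend on the capture geometry, which is not given here.

```python
from typing import List, Tuple


def approximate_point_light(luma: List[List[float]]) -> Tuple[float, float, float]:
    """Return (row, col, intensity) of one light standing in for the whole frame.

    The position is the luminance-weighted centroid in image coordinates and
    the intensity is the total luminance (an assumed approximation).
    """
    total = sum(sum(row) for row in luma)
    if total == 0:
        raise ValueError("frame is completely dark; no light to approximate")
    row_c = sum(i * v for i, row in enumerate(luma) for v in row) / total
    col_c = sum(j * v for row in luma for j, v in enumerate(row)) / total
    return row_c, col_c, total


# Tiny luminance frame with two bright pixels on the right-hand edge.
frame = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
    [0.0, 0.0, 4.0],
]
row, col, intensity = approximate_point_light(frame)
```

Running this once per light environment frame yields one light estimate per background frame, which matches the per-frame correspondence described above.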

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and other components and, through the synthetic face image information processing function 105, obtains the shading information and surface reflection information of the face on the basis of the surface reflection characteristic information of the face, the face tracking information, and the light environment information.
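To make the roles of the three inputs concrete, the following sketch computes a shading (shadow) term and a surface reflection term for one surface point under a single point light. It assumes a Lambertian diffuse term and a Phong specular term, and it models the tracked face direction as a rotation about the vertical axis. The reflection model, the names `rotate_y` and `shade`, and the parameters `ks` and `shininess` are all assumptions for illustration; this passage does not specify the model actually used.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def rotate_y(v: Vec, yaw: float) -> Vec:
    """Rotate a stored normal by the tracked head yaw (radians) about the y axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (v[0] * c + v[2] * s, v[1], -v[0] * s + v[2] * c)


def shade(normal: Vec, to_light: Vec, to_viewer: Vec,
          ks: float, shininess: float) -> Tuple[float, float]:
    """Return (shading, specular) for one point: Lambert term and Phong term."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0, 0.0  # the point faces away from the light
    # Mirror direction of the light about the normal: r = 2(n.l)n - l
    r = (2 * n_dot_l * n[0] - l[0], 2 * n_dot_l * n[1] - l[1], 2 * n_dot_l * n[2] - l[2])
    specular = ks * max(0.0, dot(r, v)) ** shininess
    return n_dot_l, specular


light = (0.0, 0.0, 1.0)    # direction toward the point light
viewer = (0.0, 0.0, 1.0)   # direction toward the camera
frontal = shade((0.0, 0.0, 1.0), light, viewer, ks=0.3, shininess=20.0)
turned = shade(rotate_y((0.0, 0.0, 1.0), math.pi / 2), light, viewer, ks=0.3, shininess=20.0)
```

In the example, a point facing the light and camera receives full shading and the full specular weight, while the same point after a 90-degree head turn receives essentially none, which is the kind of pose-dependent change the tracking information makes possible.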

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and other components and, through the skin information processing function 106, obtains the skin information, consisting of the color component information and the shading component information, from the face image of the person.

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and other components and, through the shading component information processing function 107, obtains the high-frequency shading component information from the shading component information.
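This passage does not state how the high-frequency component is separated from the shading (shadow) component. A common way to do so, shown here as an assumed sketch rather than the embodiment's method, is to subtract a low-pass (blurred) copy from the original. The box blur, its radius, and the function names are hypothetical choices.

```python
from typing import List

Grid = List[List[float]]


def box_blur(img: Grid, radius: int = 1) -> Grid:
    """Mean over a (2*radius+1)^2 window, truncated at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def high_frequency(shading: Grid, radius: int = 1) -> Grid:
    """High-pass: the shading component minus its low-pass (blurred) version."""
    low = box_blur(shading, radius)
    return [[s - l for s, l in zip(srow, lrow)] for srow, lrow in zip(shading, low)]


flat = [[0.5] * 4 for _ in range(4)]                          # smooth shading only
spike = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]   # one fine detail
flat_high = high_frequency(flat)
spike_high = high_frequency(spike)
```

Smooth, large-scale shading produces a near-zero high-frequency component, whereas fine detail survives the subtraction. This is consistent with keeping fine skin detail from the captured face while the large-scale shading is recomputed from the light environment.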

プログラム１００は、ＣＰＵ１０Ａ、メインメモリ１０Ｂ、ＧＰＵ１０Ｃ及びファイル記憶装置１０Ｆ等と協働し、顔画像合成処理機能１０８により、前記肌情報、前記高周波陰影成分情報、前記顔の陰影情報及び前記顔の表面反射情報に基づいて、前記顔の合成画像を合成することを実現している。 The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and the like, and uses the face image synthesis processing function 108 to perform the skin information, the high-frequency shadow component information, the face shadow information, and the face surface. Based on the reflection information, the composite image of the face is synthesized.

プログラム１００は、ＣＰＵ１０Ａ、メインメモリ１０Ｂ、ＧＰＵ１０Ｃ及びファイル記憶装置１０Ｆ等と協働し、人物背景合成動画像合成処理機能１０９により、前記顔の合成画像、前記人物画像、及び前記背景実動画像から前記人物背景合成動画像を合成することを実現している。 The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the file storage device 10F, and the like from the synthesized image of the face, the person image, and the background actual moving image by the person background synthesized moving image synthesis processing function 109. The composition of the person background composition moving image is realized.

The program 100 cooperates with the CPU 10A, the main memory 10B, the GPU 10C, the video memory 10D, the file storage device 10F, and the display device 10G so that the moving image display processing function 110 displays the person-background composite moving image on the display device 10G.
When the program 100 is installed and used on a general-purpose computer system, as in the present embodiment, it is provided recorded on a recording medium together with its installer. When it is built into a dedicated computer for image composition processing, it is provided recorded in a nonvolatile memory.

Next, the image composition method of the present invention will be described based on a preferred embodiment.
As shown in FIGS. 3 to 7, the image composition method of the present embodiment comprises: a data input/storage processing step S1 of inputting and storing a background actual moving image captured separately from the actual moving image of the person, a light environment actual moving image of the background captured together with the background actual moving image, and surface reflection characteristic information of the person's face consisting of three-dimensional shape information, normal information, and light reflection characteristic information of the face; a light environment information processing step S2 of obtaining, based on each frame image of the light environment actual moving image, light environment information corresponding to each frame image of the background actual moving image; a person image extraction processing step S3 of extracting a person image from each frame image of the person actual moving image; a tracking processing step S4 of obtaining a face image of the person and tracking information of the person's face based on the person image and the three-dimensional shape information of the face; a skin information processing step S5 of obtaining, from each frame image of the face image of the person, skin information consisting of color component information and shadow component information; a synthetic face image information processing step S6 of obtaining shadow information and surface reflection information of the face based on the surface reflection characteristic information of the face, the tracking information of the face, and the light environment information; a face image synthesis processing step S7 of synthesizing a composite image of the face based on the skin information, the shadow information of the face, and the surface reflection information of the face; and a person-background composite moving image synthesis processing step S8 of synthesizing a person-background composite moving image based on the composite image of the face, the person image, and the background actual moving image.

Hereinafter, the image composition method of the present embodiment will be described in detail as carried out with the image composition apparatus 1.
In the image composition apparatus 1, as shown in FIG. 3, first, in the data input/storage processing step S1, the CPU 10A, using the data input/storage processing function 101 of the program 100, obtains from the background actual moving image input device 10L, the light environment actual moving image input device 10M, and the surface reflection characteristic information input device 10H the actual moving image of the background captured separately from the actual moving image of the target person, the light environment actual moving image of the background captured together with that background actual moving image, and the surface reflection characteristic information of the person's face consisting of the three-dimensional shape information, normal information, and light reflection characteristic information of the face, and stores them in advance in the file storage device 10F in a predetermined file format.

The background actual moving image and the light environment actual moving image input through the background actual moving image input device 10L and the light environment actual moving image input device 10M are acquired as follows.

As shown in FIG. 8, a background video camera V1 that captures the actual moving image of the background and a video camera V2 that captures the light environment actual moving image of the background are set up one behind the other, spaced apart, so that the optical axes X of their lenses lie on the same straight line. A sphere S with a mirror-finished surface is placed between the video cameras V1 and V2 so that the optical axis X passes through its center. The camera V2 is focused so that the image reflected on the sphere S is as sharp as possible. The cameras V1 and V2 and the sphere S are fixed so that their relative positions do not change during imaging. The background actual moving image may be captured from a stationary fixed point or while moving. The video camera V1 is preferably the same model as the camera later used to capture the person actual moving image. The video cameras V1 and V2 may be either analog or digital as long as they can capture in color (RGB), but digital video cameras are preferable in view of later data handling.

In the light environment actual moving image captured by the video camera V2, the sphere S appears large in the center with the background around it, but only the portion showing the sphere S is used in the light environment information processing described later. In the captured image, the surrounding scene appears distorted on the sphere S. Because specular reflection (reflection in which the angle of incidence exactly equals the angle of reflection) occurs on the surface of the sphere S, the image of the sphere S shows how much light arrives from each corresponding direction. Since the shape of the sphere S can be described exactly in geometric terms, the direction from which the light entering the video camera V2 via each part of the sphere S originated can be determined exactly.

In the present embodiment, the background actual moving image and the light environment actual moving image are stored in advance as moving image files in the file storage device 10F, so the video cameras V1 and V2 need not be connected to the computer 10. Alternatively, these video cameras can serve as the background actual moving image input device 10L and the light environment actual moving image input device 10M, with the actual moving images captured from them and the image composition performed in real time. In that case, the video cameras V1 and V2 are connected to the computer 10 via an interface such as IEEE 1394. The moving images acquired from the video cameras V1 and V2 must be synchronized. The synchronization method is not particularly limited; one simple method is to also record a strobe flash before and after imaging and to align the two moving images so that the times at which the strobe fires coincide.
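The strobe-based synchronization can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: both cameras are assumed to run at the same frame rate, the mean brightness of each frame is assumed to have been computed already, and the strobe is assumed to produce the brightest frame in each recording. The function names and sample values are hypothetical.

```python
def flash_frame(frame_brightness):
    """Index of the brightest frame, taken here to be the strobe flash."""
    return max(range(len(frame_brightness)), key=lambda i: frame_brightness[i])


def frame_offset(brightness_v1, brightness_v2):
    """Frames by which the V2 recording lags the V1 recording (negative if it leads)."""
    return flash_frame(brightness_v2) - flash_frame(brightness_v1)


# Hypothetical per-frame mean brightness: the strobe fires at frame 2 in V1 and frame 5 in V2.
v1 = [0.20, 0.21, 0.95, 0.22, 0.20, 0.21, 0.20]
v2 = [0.30, 0.31, 0.30, 0.29, 0.30, 0.97, 0.31]
offset = frame_offset(v1, v2)  # frame i of V1 corresponds to frame i + offset of V2
```

Repeating the same comparison on the flash recorded after imaging gives a simple check that the offset has not drifted over the course of the recording.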

In the image composition apparatus 1, in the light environment information processing step S2 (see FIG. 3), the CPU 10A obtains the light environment information using the light environment information processing function 104 of the program 100. In the present embodiment, each frame image of the light environment actual moving image stored in the file storage device 10F is first coordinate-converted so that the elevation angle and azimuth angle become two-dimensional (x, y) coordinates, and then light environment information in which the light sources are approximated by point light sources is obtained for each corresponding frame image of the background actual moving image. The method of extracting the region of the sphere S from the background actual moving image is not particularly limited. In the present embodiment, a shot is taken in advance under conditions in which the boundary of the sphere S is clearly visible, and the position in the image of the circle formed by that boundary is obtained from one frame of that shot using commercially available image processing software. As long as the relative position of the video camera V2 and the sphere S does not change, the position occupied by the sphere S in the background actual moving image does not change either, so the region of the sphere S can be extracted using that circle.
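The conversion from a pixel on the mirror sphere to an azimuth and elevation can be sketched as follows. This is a minimal sketch under assumptions that the text does not state: the camera is treated as orthographic (far from the sphere), the sphere's circle in the image is given by a center (cx, cy) and a radius in pixels, and the coordinate frame (+z toward the camera, +y up) and the function name are hypothetical choices.

```python
import math


def sphere_pixel_to_direction(px, py, cx, cy, radius):
    """Return (azimuth, elevation) in radians of the light reflected at pixel (px, py).

    Returns None for pixels outside the sphere's circle.
    Azimuth 0 points toward the camera; elevation pi/2 points straight up.
    """
    u = (px - cx) / radius
    v = (cy - py) / radius          # image y grows downward, so flip it
    r2 = u * u + v * v
    if r2 > 1.0:
        return None
    nz = math.sqrt(1.0 - r2)        # surface normal on the unit sphere is (u, v, nz)
    # Mirror reflection of the viewing direction d = (0, 0, 1): r = 2 (n . d) n - d
    rx, ry, rz = 2.0 * nz * u, 2.0 * nz * v, 2.0 * nz * nz - 1.0
    azimuth = math.atan2(rx, rz)
    elevation = math.asin(max(-1.0, min(1.0, ry)))
    return azimuth, elevation


# The center of the sphere reflects light arriving from the camera side.
center_direction = sphere_pixel_to_direction(100, 100, 100, 100, 100)
```

Evaluating this function for every pixel inside the circle, and resampling the pixel values onto a regular azimuth/elevation grid, yields the two-dimensional (x, y) image on which the subsequent processing operates.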

In the present embodiment, this light environment information is obtained as follows, based on the median cut method (Debevec, P. 2005. A median cut algorithm for light probe sampling. In Proceedings of SIGGRAPH 2005 Conference Abstracts and Applications (Poster), ACM Press/ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM).

First, the number of point light sources used for the approximation is set. This number is chosen in advance from 2ⁿ (2, 4, 8, 16, 32, 64, ...) based on the fineness of the structure of the light environment actual moving image and on the gloss width in the light reflection characteristics of the face to be synthesized. Then, as shown in FIG. 9, the CPU 10A performs the following processes 1 to 4 according to the set number of light sources.
Process 1: The entire image is put on a list as the sole rectangular region. That is, the program prepares a variable-length array in which each element represents an arbitrary rectangular region of the image, and assigns the entire image to its first element. Specifically, when the image is W pixels wide and H pixels high, the coordinates of each pixel are written (x, y), with the coordinate system chosen so that the upper left of the entire image is (1, 1) and the lower right is (W, H). A rectangular region whose upper-left corner is (x1, y1) and whose lower-right corner is (x2, y2) is then specified by the four numbers (x1, y1, x2, y2), so each array element only needs to hold four numbers. Putting the entire image on the list as the sole rectangular region therefore means preparing an array with a single element and setting that element to (1, 1, W, H).
Process 2: Each rectangular region on the list is divided into two by a boundary line parallel to its short side. The boundary is placed so that the two resulting regions contain equal amounts of light. In the present embodiment, the light environment actual moving image is obtained in three RGB channels, and the light amount Y at each pixel follows ITU-R BT.709 and is obtained from RGB by the following equation (1).
Y = 0.2125R + 0.7154G + 0.0721B (1)
Process 3: Process 2 is repeated n times according to the number of point light sources, so that the original image is divided into 2ⁿ regions.
Process 4: The centroid of the light in each region is obtained, and a point light source equivalent to the total light amount in the region is assumed to lie there. Specifically, when the light amount at coordinates (x, y) in the above coordinate system is written Y(x, y), the centroid (xc, yc) is given by xc = Σ(x × Y(x, y)) / ΣY(x, y) and yc = Σ(y × Y(x, y)) / ΣY(x, y), where Σ denotes the sum over all pixels in the region.
When the light environment actual moving image is acquired in advance and stored as a moving image file, as in the present embodiment, this processing can be completed beforehand, which reduces the computational load on the computer when the composition is performed.
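Processes 1 to 4 can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation of the embodiment: it uses 0-based pixel coordinates instead of the 1-based coordinates of the text, places each cut at the pixel boundary that makes the two light sums as nearly equal as the grid allows, and falls back to the geometric center for a region with no light. The function names are hypothetical.

```python
def luminance(r, g, b):
    """Light amount Y of one pixel, using the coefficients of equation (1)."""
    return 0.2125 * r + 0.7154 * g + 0.0721 * b


def split_region(Y, region):
    """Process 2: cut a region along its long axis into two parts of nearly equal light."""
    x1, y1, x2, y2 = region
    wide = (x2 - x1) >= (y2 - y1)
    if wide:  # vertical cut line, parallel to the short side
        sums = [sum(Y[y][x] for y in range(y1, y2 + 1)) for x in range(x1, x2 + 1)]
    else:     # horizontal cut line
        sums = [sum(Y[y][x] for x in range(x1, x2 + 1)) for y in range(y1, y2 + 1)]
    if len(sums) == 1:
        return [region]  # a single pixel cannot be divided further
    total, acc = sum(sums), 0.0
    best_k, best_diff = 1, float("inf")
    for k in range(1, len(sums)):  # k = number of columns (or rows) in the first part
        acc += sums[k - 1]
        diff = abs(2.0 * acc - total)
        if best_diff > diff:
            best_k, best_diff = k, diff
    if wide:
        return [(x1, y1, x1 + best_k - 1, y2), (x1 + best_k, y1, x2, y2)]
    return [(x1, y1, x2, y1 + best_k - 1), (x1, y1 + best_k, x2, y2)]


def median_cut(Y, n):
    """Return up to 2**n point lights as (xc, yc, total light) tuples."""
    height, width = len(Y), len(Y[0])
    regions = [(0, 0, width - 1, height - 1)]               # process 1
    for _ in range(n):                                       # process 3
        regions = [part for r in regions for part in split_region(Y, r)]
    lights = []
    for x1, y1, x2, y2 in regions:                           # process 4
        total = sx = sy = 0.0
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                total += Y[y][x]
                sx += x * Y[y][x]
                sy += y * Y[y][x]
        if total > 0.0:
            lights.append((sx / total, sy / total, total))
        else:
            lights.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0, 0.0))
    return lights


# Hypothetical 4 x 2 uniform white image, approximated by 2**2 = 4 point lights.
rgb = [[(1.0, 1.0, 1.0)] * 4 for _ in range(2)]
Y = [[luminance(*pixel) for pixel in row] for row in rgb]
lights = median_cut(Y, 2)
```

For the uniform test image, each of the four resulting lights carries a quarter of the total light, and the centroids fall in the middle of the four one-pixel-wide columns.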

The surface reflection characteristic information of the face includes the three-dimensional shape information, the normal information, and the light reflection characteristic information (the light reflection characteristics of gloss) of the face. The function expressing the light reflection characteristics is called the bidirectional reflectance distribution function (BRDF); it describes how much light is reflected in each direction when light is incident from a given direction, and in general takes a complicated form that depends on both the incident and outgoing directions. Perceptually, however, gloss is known to be described almost entirely by its intensity and width, so in practice the values of the two parameters obtained when the gloss component of the BRDF is approximated by a model with intensity and width parameters may be used as the light reflection characteristic information. In the present embodiment as well, the intensity and width of the gloss are used as the light reflection characteristic information. In the present embodiment, the surface reflection characteristic information is calculated from data measured by the surface reflection characteristic measuring apparatus A (hereinafter, the measuring apparatus) described below.

As shown in FIG. 10, the measuring apparatus A of the present embodiment comprises light reflection characteristic information measuring means 1A for the face, three-dimensional shape information measuring means 1B for the face, and normal information measuring means 1C for the face.

The light reflection characteristic information measuring means 1A comprises a movable light source 2 that is suspended vertically so as to be movable along a trajectory facing the face F and that irradiates the face F with linear light; imaging means 3 arranged to capture the face F while it is illuminated by the movable light source 2 as the light source is moved along the trajectory; and information processing means 4 that obtains the light reflection characteristics of the face F from the image data of the face F in that state captured by the imaging means 3.

The three-dimensional shape information measuring means 1B comprises projection means 5 that projects onto the face F an optical pattern consisting of repeating bright and dark bands. In the present embodiment, the imaging means 3 is arranged to capture the face F with the optical pattern projected onto it, and the information processing means 4 obtains the three-dimensional shape information of the face F from the image data, captured by the imaging means 3, of the face F with the optical pattern projected onto it.

The normal information measuring means 1C comprises a plurality of fixed light sources 6 that are fixed in position and irradiate the face with light. In the present embodiment, the imaging means 3 is arranged to capture the measurement target while it is illuminated by each of the fixed light sources, and the information processing means 4 obtains the normal information of the face from the image data, captured by the imaging means 3, of the face F illuminated by the fixed light sources 6.

＜Light reflection characteristic information measuring means 1A＞
The measuring apparatus comprises control means 7 that controls the movement of the movable light source 2, the turning on and off of each fixed light source 6, the projection and switching of the optical pattern by the projection means 5, and the capture of the face in each of these states by the imaging means 3 together with the transfer of the captured image data to the information processing means 4. In the present embodiment, the information processing means 4 and the control means 7 are implemented by a computer system (hereinafter also simply called the computer) 8 that functions as both.

The measuring apparatus comprises a frame 9 that forms a space able to accommodate the person to be measured in a seated position and to which the movable stages serving as the trajectory of the movable light source 2 and the fixed light sources 6 are attached.

The type of the light source of the movable light source 2 is not particularly limited as long as it can irradiate the measurement target, from any position on its trajectory, with light strong enough to be captured by the imaging means 3. Fluorescent lamps, light-emitting diode arrays, neon tubes, and the like are preferable, and neon tubes are more preferable because the emitted light is highly axially symmetric about the axis of the line light source and thin tubes are readily available. The light source of the movable light source 2 need only be long enough to extend well beyond the measurement target both above and below.

The trajectory of the movable light source 2 is set according to the outer shape of the measurement target, the shape of the light source, and the form of the movable stage that moves the light source. In the present embodiment, the trajectory consists of two tracks that, when the measuring apparatus is viewed from above, are left-right symmetric with respect to the measurement target and form an approximate V shape opening toward the measurement target. Linear movable stages are readily available, and arranging two of them so as to surround the measurement target allows the movable light source 2 to be moved around the measurement target. Moving the movable light source along such an approximately V-shaped trajectory makes it possible to obtain gloss information from portions where gloss could not be observed if the light source were simply moved in a straight line toward the measurement target. The trajectory of the movable light source 2 may also be curved or approximately circular.

In the present embodiment, the movable light source 2 comprises two light sources: a light source 2A that moves on the left side of the measurement target and a light source 2B that moves on the right side. At the start of measurement, these light sources are at home positions to the rear left and rear right, respectively, as seen from the measurement target. The light intensity of a neon tube is not very stable immediately after it is turned on and fluctuates strongly at that time, and fluctuations in light source intensity strongly affect the light reflection characteristic information obtained. For this reason, the light sources are turned on when the apparatus is started and are kept at their home positions while not in use. In addition, shielding plates 21A and 21B are installed in the direction from the home positions toward the measurement target so that the light sources, although lit, do not illuminate the measurement target while they are at their home positions.

The light sources 2A and 2B are attached to carriages (not shown) that move along the movable stages. Each carriage is electrically connected to the control means 7, which controls its drive means so that the carriage stops and moves according to the predetermined stop positions of the light sources 2A and 2B.

From the viewpoint of improving the accuracy of the light reflection characteristics, the shorter the interval between the stop positions at which the light sources 2A and 2B irradiate the measurement target, the better; however, the measurement time increases accordingly. When the measurement target is something that cannot easily remain still in one place for a long time, such as a person as in the present embodiment, the interval is in practice determined as a trade-off with the measurement time. The upper limit of the interval is determined as follows. Looking at how any given part of the measurement target appears from the imaging means, as the light source is moved the part changes from dark to bright and back to dark near the angular condition at which gloss is observed. The upper limit of the interval is preferably set so that it is sufficiently shorter than the width of this bright range. In the present embodiment, the interval between stop positions was set to 1 cm on the movable stage.

The imaging means 3 is fixed approximately in front of the face so as to face the face being measured. The imaging means 3 is not particularly limited as long as it can capture the entire face being measured, in each of the above states, within a predetermined angle of view and has sufficient spatial resolution. It is preferably able to capture images at different exposures, and more preferably able to capture both moving images and still images. A video camera with a charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) sensor, for which the control means 7 can control image capture and the transfer of captured images to the computer 8 as image data (electronic data), is preferable. If the imaging means is fast enough that the light source can be regarded as hardly moving during a series of captures at different exposures, the movable light source 2 may be moved continuously without stopping. Furthermore, if the dynamic range of the imaging means is wide enough to handle strongly glossy areas and its intensity resolution is fine enough to represent weakly glossy areas, there is no need to capture at different exposures, and a single exposure suffices.

To simplify later data processing, the imaging means 3 used has pixel values that are linear with respect to the amount of incident light, and it can capture images at different exposures to cover a wide range of reflected light amounts.
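Because the pixel values are linear in the incident light, captures at different exposures can be combined by dividing each pixel value by its exposure and averaging the well-exposed samples. The following is a minimal sketch of that idea, not the processing of the embodiment: pixel values are assumed to be normalized to the range 0 to 1, exposure is represented by a relative exposure time, and the thresholds `low` and `high` and the function name are hypothetical.

```python
def merge_exposures(pixel_values, exposure_times, low=0.05, high=0.95):
    """Estimate the relative light amount at one pixel from several exposures.

    Samples that are nearly black or nearly saturated are ignored; if every
    sample is out of range, the one closest to mid-scale is used instead.
    """
    usable = [p / t for p, t in zip(pixel_values, exposure_times)
              if high >= p >= low]
    if usable:
        return sum(usable) / len(usable)
    p, t = min(zip(pixel_values, exposure_times), key=lambda pt: abs(pt[0] - 0.5))
    return p / t


# Hypothetical glossy pixel: the longest exposure saturates at 1.0 and is ignored.
exposure_times = [0.1, 0.4, 1.0]
pixel_values = [0.2, 0.8, 1.0]
light_amount = merge_exposures(pixel_values, exposure_times)
```

With a sensor whose single exposure already covers both strong and weak gloss, the list simply contains one sample and the function reduces to a division by the exposure.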

＜Three-dimensional shape information measuring means 1B＞
The projection means 5 is a device that projects onto the face an optical pattern consisting of repeating black and white (bright and dark) horizontal stripes. Its type is not particularly limited as long as it can project onto the measurement target, from its projection position and in an arbitrary pattern, light strong enough to be captured by the imaging means 3; a commercially available projector that can be connected to a computer is preferable because it is inexpensive, readily available, and allows the projection pattern to be set easily. The projection means 5 is preferably one whose projected optical pattern can be changed automatically by the control means 7, as in the present embodiment.

The projection means 5 is preferably positioned so that its direction relative to the measurement target differs as much as possible from that of the imaging means 3, provided that no part of the region to be measured falls in shadow and fails to receive the projection pattern. Placing it below the imaging means as seen from the measurement target is preferable, because shadows of the nose are then unlikely to occur and the projection pattern is projected evenly on the left and right.

The width of the bright and dark bands in the optical pattern is switched automatically by the control means 7. For the accuracy of the obtained shape, the finer the repetition of bright and dark the better, but the resolutions of the projection means and of the imaging means impose a limit. For the present purpose, the repetition width of the finest pattern projected onto the measurement target is preferably 20 mm or less, and more preferably 5 mm or less. The projection means and the imaging means need sufficient resolution to achieve this. Specifically, for example, the projection starts with a pattern that divides the space into an upper and a lower half, the number of divisions is then doubled each time, and 10 patterns are projected in total, the last of which has 2¹⁰ (= 1024) divisions.
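The doubling sequence of stripe patterns described above can be sketched as follows. This is only an illustration: the function name `stripe_patterns` and the representation of a pattern as one 0/1 value per projector row are assumptions, not part of the apparatus described here.

```python
def stripe_patterns(height, n_patterns=10):
    """Build horizontal stripe patterns for a projector with `height` rows.

    Pattern k (1-based) splits the rows into 2**k horizontal bands that
    alternate dark (0), bright (1), dark, ... starting from the top row.
    Returns a list of n_patterns lists, each holding one value per row.
    """
    patterns = []
    for k in range(1, n_patterns + 1):
        bands = 2 ** k
        # Index of the band containing row y, then its parity (0 dark, 1 bright).
        rows = [((y * bands) // height) % 2 for y in range(height)]
        patterns.append(rows)
    return patterns


patterns = stripe_patterns(1024)
# First pattern: upper half dark, lower half bright.
# Last pattern: 1024 bands, so each band is one row high.
```

With 1024 projector rows, the tenth pattern alternates on every row, which is why the projector and camera resolutions limit how many patterns are useful.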

＜Normal information measuring means 1C＞
Point light sources are used for the fixed light sources 6. The type of light source is not particularly limited as long as it can irradiate the measurement target, from its fixed position, with light of an intensity that the imaging means 3 can capture. Considering that the intensity should stabilize immediately after lighting and that durability should be high, light emitting diodes (LEDs) are preferable.

Because the normal is obtained by fitting to multiple pairs of fixed light source direction and the amount of light entering the imaging means 3 at that time, the more fixed light sources 6 the better, and they are preferably arranged evenly as viewed from the measurement target. On the other hand, fewer light sources are preferable for shortening the measurement time. To acquire data efficiently with a small number, the light sources are preferably arranged in a ring in front of the measurement target so as to surround it. In this embodiment, nine fixed light sources 6 were sufficient to secure the accuracy required for the surface reflection characteristics.

The fixed light source 6 includes a relay circuit (not shown) to which the individual light sources are connected. The relay circuit is connected to the control means 7, which switches the relay circuit on and off to turn the individual light sources on and off.

In the measuring apparatus, polarizers 3A and 6A are placed between the face to be measured and the imaging means 3 and each fixed light source 6, respectively. The polarizer on the light source side and the polarizer on the imaging means side are arranged so that their polarization planes cross each other. Here, crossed polarization planes means the relationship in which the amount of light that is emitted from the light source, reflected by the measurement target, and incident on the imaging means is smallest. In practice, the polarizer on the imaging means side is fixed, the polarizer on the light source side is rotated, and the orientation at which the image obtained from the imaging means is darkest is taken as the crossed orientation. When the light source and the imaging means are in almost the same direction relative to the measurement target, the two polarizers are orthogonal. Under this condition, gloss is light reflected only once at the surface, so the polarization direction of the light that passed through the light-source-side polarizer is preserved on reflection, and that light cannot pass through the orthogonal polarizer on the imaging means side.
In this embodiment, to simplify the sequence of operations, the polarizer on the imaging means side is left in place for the other captures as well. The light sources used for those captures have no polarizers, so the light entering the imaging means is almost unpolarized and the other captures can be regarded as unaffected.

As described above, the information processing means 4 and the control means 7 are implemented by a computer 8 that functions as each of these means. The computer 8 comprises hardware including a central processing unit (CPU), a main storage device, an auxiliary storage device, an input device, and an output device, and software including basic software and a program that works with the basic software to make the measuring apparatus calculate the surface reflection characteristics of the face according to the steps below. The CPU interprets the programs held in the main storage device and executes their instructions, causing the computer 8 to function as the information processing means 4 and the control means 7 according to the steps below.

＜Measurement method＞
Next, the method of measuring the surface reflection characteristics of the face with the measuring apparatus is described with reference to FIGS. 10 to 14.
First, in step S11 (see FIG. 11), under the control of the control means 7, the projection means 5 projects the optical pattern of repeated black and white horizontal stripes described above onto the face of the person being measured. The control means 7 changes the period of the optical pattern in succession, and the imaging means 3 captures each projection state in succession. After all the optical patterns have been captured, the projection means 5 stops projecting onto the face.

The projection image data of the face with the optical pattern projected on it is taken into the information processing means 4 under the control of the control means 7.

Next, in step S12 (see FIG. 11), each fixed light source 6 is turned on and off once in a predetermined order under the control of the control means 7, and the imaging means 3 captures a fixed light source light irradiation image of the face for each fixed light source 6. The sequence ends when every fixed light source 6 has been turned on and off once.

The fixed light source light irradiation image of the face for each fixed light source 6 is taken into the information processing means 4 as electronic data (fixed light source light irradiation image data, hereinafter also called fixed light source image data) under the control of the control means 7.

Next, in step S13 (see FIG. 11), under the control of the control means 7, the light source 2A of the movable light source 2 moves from its home position at the left rear of the face to the apex of the substantially V-shaped track, stopping at predetermined positions along the way. At each stop, the imaging means 3 captures a movable light source light irradiation image of the face under two exposure conditions. When irradiation and capture with the light source 2A are finished, the light source 2A returns to its home position at the left rear, where its light is blocked by the shielding plate 21A. Next, under the control of the control means 7, the light source 2B first moves to the apex of the substantially V-shaped track and then moves from the apex to the right rear of the face, stopping at predetermined positions along the way. At each stop, the imaging means 3 captures a movable light source light irradiation image of the face under two exposure conditions. When irradiation and capture with the light source 2B are finished, the light source 2B returns to its home position, where its light is blocked by the shielding plate 21B.

The movable light source light irradiation images of the face obtained with the light sources 2A and 2B are taken into the information processing means 4 as electronic data (movable light source light irradiation image data, hereinafter also called movable light source image data) under the control of the control means 7.

Based on the projection image data, the fixed light source image data, and the movable light source image data taken into the information processing means 4 as described above, the information processing means 4 obtains the surface light reflection characteristics of the face as follows.

That is, in step S14 (see FIG. 11), the information processing means 4 computes three-dimensional shape information of the face (coordinates in a three-dimensional reference coordinate system) from the projection image data. Methods for obtaining three-dimensional shape information by projecting an optical pattern onto a measurement target include the light-section method and the image encoder method, but the spatial coding method, which calculates the three-dimensional shape information from the projection image data, is preferable because it keeps the apparatus and the analysis program relatively simple.

In the measuring apparatus of this embodiment, the three-dimensional shape information is obtained by the spatial coding method (Valkenburg, R.J., et al., 1998. Accurate 3D measurement using a structured light system. Image and Vision Computing 16, 2, 99-110).

In the first projection, the projection means 5 projects a pattern whose upper half is dark and whose lower half is bright. Of the parts of the measurement target visible from the imaging means, those that appear dark belong to the upper half of the space and those that appear bright belong to the lower half. In the second projection, the upper and lower halves are each divided in two, giving a dark-bright-dark-bright pattern from the top. A region that is dark in both the first and second projections lies in the top quarter, a region that is dark in the first and bright in the second lies in the second quarter from the top, and so on. In this way, the part of the space to which each position on the measurement target belongs can be narrowed down.
Specifically, the shape is obtained from this as follows. As shown in FIG. 12, the state of each position on the measurement target is expressed as a binary number based on the series of captured images. The ones place is 0 if the position of interest is dark in the image captured with the first pattern and 1 if it is bright. The tens place is 0 if the position is dark in the image captured with the second pattern and 1 if it is bright. The image for the third pattern is likewise assigned to the hundreds place, the image for the fourth pattern to the thousands place, and so on. Continuing in this way, one binary number is assigned to each small surface element of the measurement target seen by the imaging means 3. Each binary number obtained corresponds to one of the thin horizontal slices into which the space is cut. Meanwhile, when a surface element of the measurement target appears at a certain pixel of the captured image, that element lies somewhere on the straight line that extends from the imaging means through that pixel. If each thin slice is approximated as a plane, the spatial position of the surface element is obtained, from the binary number and the position of the corresponding pixel, as the intersection of that plane and that straight line.
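The decoding and intersection steps can be illustrated with a small sketch. It makes several assumptions that are not in the text: the function names are hypothetical, the first pattern is treated as the most significant digit so that the decoded number equals the slice index counted from the top (a different digit order from the one described above, but it labels the same slices), and the plane for each slice is taken as given by a prior calibration.

```python
def decode_band_index(bits):
    """bits[k] is 0 (dark) or 1 (bright) at one camera pixel under pattern k+1.

    Returns the index, counted from the top, of the finest slice that
    contains the observed surface element.
    """
    index = 0
    for b in bits:
        index = index * 2 + b
    return index


def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Intersect the camera ray origin + t * direction with a slice plane.

    Returns the 3D intersection point, or None if the ray is parallel
    to the plane.
    """
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-12:
        return None
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = sum(c * n for c, n in zip(offset, plane_normal)) / denom
    return tuple(o + t * d for o, d in zip(origin, direction))


# Dark in the first pattern, bright in the second: second quarter from the top.
quarter = decode_band_index([0, 1])
point = intersect_ray_plane((0, 0, 0), (0, 0, 1), (0, 0, 5), (0, 0, 1))
```

In a real system the ray direction for each pixel and the plane for each slice would come from calibrating the camera and the projector; that calibration is outside the scope of this sketch.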

Next, in step S15 (see FIG. 11), the information processing means 4 computes normal information of the face from the fixed light source image data and the three-dimensional shape information. One way to obtain the normals of a measurement target is to derive them from the slope of each face of the three-dimensional shape. For measurement accuracy, however, it is desirable to obtain the normals by a method separate from the shape measurement, specifically the photometric stereo method. To keep the apparatus small, it is more preferable to use a modified photometric stereo method that works from the fixed light source image data and the three-dimensional shape information. The specific method is described below.

In the measuring apparatus of this embodiment, the normal information is obtained by a method based on a published normal calculation method (Woodham, R.J., 1980. Photometric method for determining surface orientation from multiple images. Optical Engineering 19, 1, 139-144). In this calculation method, the normal information is obtained as follows.

The normal at an arbitrary point (coordinate) of the fixed light source image data is obtained over the entire face as follows.

Let $I_n$ ($n = 1, \dots, 9$) be the intensity at an arbitrary point in the fixed light source image data when each light source 6 (nine in this embodiment) is lit, and let $N$ be the normal vector at the point of interest. Let $D_n$ ($n = 1, \dots, 9$) be the direction vector from the point of interest to each light source, and let $L_n$ ($n = 1, \dots, 9$) be the distance between each light source 6 and the point of interest. Assuming that the light intensity is proportional to the inner product of the normal at the point and the direction of the light source (a Lambertian surface), the theoretical light intensity $I_{tn}$ is given by equation (2) in terms of $N$ and $D_n$. Here $c$ is a constant determined by the choice of units, the intensity of the light source, the exposure condition of the imaging means, and so on; $\max(a, b)$ is the larger of $a$ and $b$; and $\mathrm{dot}(A, B)$ is the inner product of vectors $A$ and $B$. The division by $L_n^{2}$ represents the attenuation of the emitted light with distance.
$$I_{tn} = \frac{\max\bigl(0,\; c \times \mathrm{dot}(N, D_n)\bigr)}{L_n^{2}} \qquad (n = 1, \dots, 9) \tag{2}$$
With $N$ as the fitting parameter, the value of $N$ for which this expression is closest to the measurements is taken as the true value. The normal vector $N$ is therefore obtained by the least squares method so as to minimize $\sum_n \bigl(I_n - \max(0,\; c \times \mathrm{dot}(N, D_n)) / L_n^{2}\bigr)^{2}$.
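As an illustration of this fit, the sketch below solves a simplified version of the problem. It is an assumption-laden example rather than the procedure of the embodiment: lights whose measured intensity is not positive are skipped so that the max(0, ...) clamp can be dropped, which turns the fit into a linear least squares problem for the vector g = c × N. The names `solve3` and `fit_normal` are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 linear system m x = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions do not constrain the normal")
    solution = []
    for col in range(3):
        replaced = [[v[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det(replaced) / d)
    return solution


def fit_normal(intensities, directions, distances):
    """Least squares fit of I_n = c * dot(N, D_n) / L_n**2 for unit N and c.

    Lights with non-positive intensity are skipped (assumed shadowed or
    facing away), which removes the clamp from the model.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for i_n, d_n, l_n in zip(intensities, directions, distances):
        if i_n <= 0.0:
            continue
        a = [x / (l_n * l_n) for x in d_n]  # row of the design matrix
        for r in range(3):
            atb[r] += a[r] * i_n
            for c in range(3):
                ata[r][c] += a[r] * a[c]
    g = solve3(ata, atb)  # g = c * N
    c = math.sqrt(sum(x * x for x in g))
    return [x / c for x in g], c
```

The function returns the unit normal together with the scale constant c. At least three usable, non-coplanar light directions are needed, which nine lights arranged in a ring in front of the target would normally provide.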

The three-dimensional shape information and normal information obtained as described above are preferably further corrected in step S16 (see FIG. 11) by a correction process called the hybrid method. This correction makes the three-dimensional shape information and the normal information consistent with each other and improves the accuracy of both. At present the hybrid method is the only method for making shape information and normal information obtained by different methods consistent while improving their accuracy, but it may be replaced by a new method if one is developed in the future.

In the measuring apparatus A of this embodiment, the three-dimensional shape information and normal information are corrected by the hybrid method (Nehab, D., Rusinkiewicz, S., et al., 2005. Efficiently combining positions and normals for precise 3D geometry. ACM Transactions on Graphics 24, 3, 536-543).

Specifically, as shown in FIG. 13, consider the shape near a certain pixel in the three-dimensional shape information, and take the four triangles formed by that pixel and the pixels adjacent to it above, below, left, and right. Each pixel carries a position in three-dimensional space, so the normal vector of each triangle can be computed. The average of the four triangle normals is taken as the normal vector N' at that pixel. As pointed out in the cited paper, N' is less accurate than the normal vector N in its high-frequency components and more accurate in its low-frequency components, so the high-frequency components of N and the low-frequency components of N' are combined into a new normal vector to increase accuracy. First, N' is smoothed so that information on fine variations is lost, and the result is called Ns'. A general Gaussian filter is used for the smoothing. The optimal filter width (that is, the cutoff frequency) depends on how fine the surface irregularities of the measurement target are, so a value is chosen in advance, by inspecting the smoothed output, at which structures attributable to pores and fine wrinkles disappear but the overall average shape is not distorted. If the measurement target is a human face, for example, the spatial frequency of the fine irregularities varies little between people, so a value chosen for one person's face may be reused when measuring another person's face. In this embodiment, a Gaussian filter with a variance of 12 is applied to data captured so that the entire measurement target fits within 1024 × 678 pixels. The normal vector N is smoothed under the same conditions as N', and the result is called Ns. Then, for each point, the rotation that maps the smoothed normal vector Ns onto the normal vector N is obtained, and this rotation is applied to the smoothed normal vector Ns' to obtain the normal vector N''.
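The last step, finding the rotation that carries Ns onto N and applying it to Ns', can be written compactly with Rodrigues' rotation formula. The sketch below is one possible way to do this for a single point; the function name `transfer_detail` is hypothetical, all inputs are assumed to be unit vectors, and the smoothing itself is not shown.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def transfer_detail(ns, n, ns_prime):
    """Apply to ns_prime the minimal rotation that maps unit vector ns onto n.

    Uses Rodrigues' formula in the form
        v' = v cos(t) + (a x v) + a (a . v) / (1 + cos(t)),
    where a = ns x n and cos(t) = ns . n. This is undefined when ns and n
    point in exactly opposite directions.
    """
    a = cross(ns, n)
    cos_t = dot(ns, n)
    if cos_t <= -1.0 + 1e-12:
        raise ValueError("ns and n are antiparallel; rotation axis is undefined")
    k = dot(a, ns_prime) / (1.0 + cos_t)
    axv = cross(a, ns_prime)
    return tuple(cos_t * v + w + k * u for v, w, u in zip(ns_prime, axv, a))


# If smoothing changed nothing (ns == n), ns_prime is returned unchanged.
n_combined = transfer_detail((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.6, 0.8))
```

Because Ns and N differ only by fine detail, the rotation is small in practice, so the antiparallel case should not arise for real data.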

Next, the actual shape of the face being measured is hypothesized, and the shape is corrected so that its error from the measured values, that is, the three-dimensional shape information and the normal vector N'', is minimized. Specifically, with the most probable actual shape taken as the corrected shape information, the corrected shape information and the corrected normal information (corrected normal vector N) are obtained so as to minimize the weighted sum of two terms: the sum of squared errors between the corrected shape information and the three-dimensional shape information, and the sum of squared errors between the corrected normal information derived from the corrected shape information and the normal vector N''. The weights depend on the measurement system and the measurement target, so they are set in advance so that this step works well.
In this embodiment, the light reflection characteristic information of the face and the normal information of the face used hereafter refer, respectively, to the corrected shape information and the corrected normal information obtained here.
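To make the weighted objective concrete, here is a deliberately reduced one-dimensional sketch. It is not the solver of the cited hybrid method: the surface is a height profile sampled at spacing `dx`, measured slopes stand in for the measured normals, and the single parameter `weight` (a hypothetical name) balances the two error terms.

```python
def hybrid_energy(z, z_measured, slopes_measured, dx, weight):
    """Weighted sum of squared position error and squared slope error.

    z               candidate corrected heights
    z_measured      heights from the shape measurement
    slopes_measured slopes implied by the measured normals, one per
                    interval between neighboring samples
    weight          0..1, share given to the position term
    """
    position_term = sum((a - b) ** 2 for a, b in zip(z, z_measured))
    slope_term = 0.0
    for i in range(len(z) - 1):
        slope = (z[i + 1] - z[i]) / dx  # slope implied by the candidate shape
        slope_term += (slope - slopes_measured[i]) ** 2
    return weight * position_term + (1.0 - weight) * slope_term


# A candidate that agrees with both measurements has zero energy.
e = hybrid_energy([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0], 1.0, 0.5)
```

The corrected shape would be the profile `z` that minimizes this energy. Since the energy is quadratic in `z`, that minimization reduces to a linear system.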

Next, in step S17, the information processing means 4 obtains light reflection characteristic information of the face from the movable light source image data.
The light reflection characteristic information obtained by the information processing means 4 can be the bidirectional reflectance distribution function (BRDF) or any of various feature quantities derived from it. In this embodiment it refers specifically to the intensity and width of the gloss under constant illumination intensity, and it is obtained as follows.

First, the movable light sources 2A and 2B are characterized in advance. Each light source, when lit, is approximated as a continuous row of m virtual point light sources, and the direction and intensity with which light from a single virtual point light source radiates into space are measured. That is, everything except an element of the length corresponding to one virtual point light source is shielded, and the amount of light radiated in each direction is measured. In this embodiment, to simplify handling, the characteristics of the light source are assumed to be axisymmetric about the axis of the line light source, and the radiation characteristics of each element are assumed to be the same regardless of its position along the light source (at the center or at the ends). The direction and intensity are measured for each of the m units into which each light source is divided along its height. The number of units m is set according to the length of the light source, the size of the measurement target, the distance between them, and so on. A larger m improves accuracy but raises the computational cost, so m is preferably as small as possible without degrading accuracy. In this embodiment, m is preferably about 100.

Next, the light reflection characteristics of the face are assumed to follow the Torrance-Sparrow model (hereinafter, the TS model).
That is, the light reflectance $f$ is given by equation (3), where $n$ is the refractive index of the skin, $\alpha$ is the gloss intensity, and $M$ is the gloss spread.
$$f = \alpha \times \frac{F \cdot G \cdot D}{4 \cos\theta_i \cos\theta_r} \tag{3}$$
The Fresnel term $F$ is given by equation (4) for normal incidence and by equation (5) otherwise.
$$F = \frac{(n - 1)^{2}}{(n + 1)^{2}} \tag{4}$$
$$F = \frac{1}{2} \left[ \frac{\tan^{2}(\theta_i - \theta_t)}{\tan^{2}(\theta_i + \theta_t)} + \frac{\sin^{2}(\theta_i - \theta_t)}{\sin^{2}(\theta_i + \theta_t)} \right] \tag{5}$$
The surface roughness term $D$ is given by equation (6).
$$D = \frac{\exp(-\tan^{2}\beta / M^{2})}{\pi M^{2} \cos^{4}\beta} \tag{6}$$
The shape term $G$ is given by equation (7), where $\min(a, b, c)$ is the smallest of $a$, $b$, and $c$.
$$G = \min(G_s, G_m, 1) \tag{7}$$
Here,
$$G_s = \frac{2 (N \cdot H)(N \cdot S)}{S \cdot H}, \qquad G_m = \frac{2 (N \cdot H)(N \cdot V)}{V \cdot H}, \qquad H = \frac{S + V}{|S + V|}$$
$$\theta_i = \arccos(\mathrm{dot}(N, S)), \qquad \theta_r = \arccos(\mathrm{dot}(N, V)), \qquad \theta_t = \arcsin\!\left(\frac{\sin\theta_i}{n}\right)$$
where $\mathrm{dot}(A, B)$ is the inner product of vectors $A$ and $B$, $n$ is the refractive index of the skin surface, $N$ is the corrected normal vector, $S$ is the unit vector toward the light source, $V$ is the unit vector toward the imaging means, $H$ is the unit vector that bisects $S$ and $V$, $\beta$ is the angle between $N$ and $H$, $\theta_i$ is the angle of incidence, $\theta_r$ is the angle of reception, and $\theta_t$ is the angle of refraction.
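Equations (3) to (7) can be evaluated directly. The sketch below does so for one set of unit vectors, under a few stated assumptions: H is computed as the normalized sum of S and V (consistent with its description as the bisecting unit vector), the reflectance is taken as zero when the light or the camera is below the surface horizon, and the function and parameter names are illustrative only.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fresnel(theta_i, n):
    """Fresnel term: equation (4) at normal incidence, equation (5) otherwise."""
    if theta_i < 1e-6:
        return ((n - 1.0) / (n + 1.0)) ** 2
    theta_t = math.asin(math.sin(theta_i) / n)
    tan_ratio = math.tan(theta_i - theta_t) ** 2 / math.tan(theta_i + theta_t) ** 2
    sin_ratio = math.sin(theta_i - theta_t) ** 2 / math.sin(theta_i + theta_t) ** 2
    return 0.5 * (tan_ratio + sin_ratio)


def ts_reflectance(normal, to_light, to_camera, alpha, spread, n):
    """Evaluate f of equation (3); all three vectors must have unit length."""
    cos_i = _dot(normal, to_light)
    cos_r = _dot(normal, to_camera)
    if cos_i <= 0.0 or cos_r <= 0.0:
        return 0.0  # assumption: no specular term from below the horizon
    half = [s + v for s, v in zip(to_light, to_camera)]
    length = math.sqrt(_dot(half, half))
    half = [h / length for h in half]
    cos_b = min(1.0, _dot(normal, half))
    tan_b_sq = (1.0 - cos_b * cos_b) / (cos_b * cos_b)
    d_term = math.exp(-tan_b_sq / spread ** 2) / (math.pi * spread ** 2 * cos_b ** 4)
    g_s = 2.0 * cos_b * cos_i / _dot(to_light, half)
    g_m = 2.0 * cos_b * cos_r / _dot(to_camera, half)
    g_term = min(g_s, g_m, 1.0)
    f_term = fresnel(math.acos(min(1.0, cos_i)), n)
    return alpha * f_term * g_term * d_term / (4.0 * cos_i * cos_r)


# Light and camera both along the normal, with illustrative parameter values.
f_head_on = ts_reflectance((0, 0, 1), (0, 0, 1), (0, 0, 1), 1.0, 0.5, 1.4)
```

A fitting routine for α and M would call a function like this once per virtual point light source and stop position, then compare the summed result with the measured brightness.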

Next, for each point of the movable light source image data that corresponds to the face surface, the light reflection characteristics are obtained over the entire face as follows.
First, focusing on one point on the face, the image brightness (measured values) $I_1$ to $I_n$ at that point is obtained for light emitted by the movable light source at each of its stop positions ($n$ positions, $P_1$ to $P_n$).
Then, for each stop position of the movable light source, the directions $V_{11}$ to $V_{1m}$, ..., $V_{n1}$ to $V_{nm}$ of the virtual point light sources illuminating that point and the corresponding light amounts $L_1(V_{11})$ to $L_1(V_{1m})$, ..., $L_n(V_{n1})$ to $L_n(V_{nm})$ are obtained from the direction and intensity data measured in advance.

Next, the image brightness values $I_{c1}$ to $I_{cn}$ for light emitted by the movable light source at each stop position are computed from $V_{11}$ to $V_{nm}$ and $L_1(V_{11})$ to $L_n(V_{nm})$, assuming the TS model. The computed values contain the variables $\alpha$ and $M$ (the light reflection characteristic information), which represent the intensity and width of the gloss. As shown in FIG. 14, $\alpha$ and $M$ are therefore fitted by the simplex method so that $I_{c1}$ to $I_{cn}$ agree as closely as possible with the measured values $I_1$ to $I_n$, which yields the light reflection characteristic information $\alpha$ and $M$. The measured values $I_1$ to $I_n$ are read from the images captured under the two exposure conditions. Specifically, a coefficient $C$ is determined in advance as the factor by which a pixel value in an image captured under the bright exposure condition exceeds the value at the same location captured under the dark exposure condition. Then, with $I_{1j}$ the pixel value at the $j$-th stop position under the dark exposure condition and $I_{2j}$ the pixel value under the bright exposure condition, $I_j = C \times I_{1j}$ is used in bright regions of the image (regions where the pixel value saturates under the bright exposure condition) and $I_j = I_{2j}$ is used in dark regions.
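The combination of the two exposures can be sketched as follows. The saturation level of 255, the lower cutoff `floor`, the use of a median to estimate C, and the function names are all assumptions made for this example; the text only states that C is determined in advance.

```python
import statistics


def exposure_ratio(dark_values, bright_values, saturation=255, floor=8):
    """Estimate C from pixel pairs that are usable in both exposures.

    A pair is used when the dark-exposure value is at least `floor`
    (to avoid dividing by noise) and the bright-exposure value is
    below `saturation`.
    """
    ratios = [b / d for d, b in zip(dark_values, bright_values)
              if d >= floor and b < saturation]
    if not ratios:
        raise ValueError("no pixel pair is usable in both exposures")
    return statistics.median(ratios)


def merge_exposures(dark_values, bright_values, c, saturation=255):
    """Return I_j = C * I_1j where the bright exposure saturates, else I_2j."""
    return [c * d if b >= saturation else float(b)
            for d, b in zip(dark_values, bright_values)]


dark = [10, 20, 100]      # I_1j: dark exposure condition
bright = [40, 80, 255]    # I_2j: bright exposure condition, last one saturated
c = exposure_ratio(dark, bright)
merged = merge_exposures(dark, bright, c)
```

This relies on the pixel values being linear in the incident light, which is the property required of the imaging means earlier in this description.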

In the maps of the light reflection characteristic information α and M obtained as described above, values that are clearly incorrect, for example because the point lies in the shadow of the nose, are discarded and interpolated from the surrounding values. The surface reflection characteristic information obtained in this way is transferred by the CPU 10A from the file device 10F to the processing data storage area in the GPU 10C and stored there.

Prior to the real-time processing of S3 to S8 in FIGS. 4 to 7, the CPU 10A loads the following functions of the program 100 into the program area of the GPU 10C: the person image extraction processing function 102, the tracking processing function 103, the composite face image information processing function 105, the skin information processing function 106, the shadow component information processing function 107, the face image composition processing function 108, the person-background composite moving image composition processing function 109, and the moving image display processing function 110.

In the present embodiment, because the actual moving image of the person is captured and processed in real time, the person actual moving image input device 10K is a video camera connected to the computer via an IEEE 1394 interface. To cancel out model-specific characteristics such as the pixel aspect ratio and distortion at the periphery, this video camera is preferably the same model as the video camera V1 used in the background actual moving image input device 10L. When the person is imaged, as shown in FIG. 15, a polarizing plate P3 is attached to the lens of the video camera V3, and a polarizing plate P3 is also attached to the illumination L3 that lights the person. Behind the person is placed a screen SC (green or blue) of the kind conventionally used for chroma key composition. The direction of the illumination L3 and the way the screen SC is stretched are arranged so that the screen appears as a single color in the captured image. The person covers everything other than the face (from the neck down, and the hair) with black clothing that shows little gloss. Because only the skin region of the face is processed, as described later, this keeps the non-face parts of the person image from looking unnatural next to the face in the resulting composite video. At the beginning of the actual moving image of the person, an image in which the person holds still while directly facing the video camera V3 (a frontal image) is preferably captured for a certain period (several seconds) for use in the tracking process described later.

As shown in FIG. 4, first, in the data input/storage processing step S1', the CPU 10A uses the data input/storage processing function 101 of the program 100 to load the captured moving image of the person from the person actual moving image input device 10K into the computer 10. The loaded moving image is used for the real-time image processing by the GPU 10C described later.

In the image composition processing apparatus 1, in the person image extraction processing step S3 (see FIG. 4), the GPU 10C executes the preloaded person image extraction processing function 102 of the program 100 and extracts only the person image from each frame image of the actual moving image of the person captured from the person actual moving image input device 10K. The extraction uses the ordinary method conventionally used in chroma key composition: the person region and the background screen region are separated using a specific pixel value as a threshold, and the extracted person image is stored in the processing data storage area in the GPU 10C. Concretely, the frame image is stored together with a mask image whose pixel value is 0 in the background and 1 in the person region, and the mask image is consulted whenever the frame image is read.
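The mask described above could be built along the following lines. This is a minimal sketch: the description only says that a specific pixel value is used as a threshold, so the ratio test, the `key_channel` parameter, and the default `threshold` below are illustrative assumptions.

```python
def chroma_key_mask(frame, key_channel=1, threshold=1.3):
    """Return a mask with 0 for screen pixels and 1 for person pixels.

    frame is a list of rows of (R, G, B) tuples. A pixel is treated as
    screen when its key channel exceeds `threshold` times the larger of
    the other two channels (an assumed rule, chosen for illustration).
    """
    mask = []
    for row in frame:
        mask_row = []
        for pixel in row:
            others = [v for k, v in enumerate(pixel) if k != key_channel]
            is_screen = pixel[key_channel] > threshold * max(others)
            mask_row.append(0 if is_screen else 1)
        mask.append(mask_row)
    return mask
```

With a green screen (`key_channel=1`), a strongly green pixel such as (20, 200, 30) is marked 0 and a skin-like pixel such as (180, 140, 120) is marked 1.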

In the image composition processing apparatus 1, in the tracking processing step S4 (see FIG. 4), the GPU 10C executes the tracking processing function 103 of the program 100 and, based on the person image and the three-dimensional shape information of the face stored in the storage area on the GPU 10C, obtains the face image of the person and the tracking information of the person's face.

In the present embodiment, the tracking information of the person's face is obtained as follows, based on particle filtering (Oka, K., and Sato, Y. 2005. Real-time modeling of face deformation for 3-d head pose estimation. In Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 308-320.). To simplify processing, however, this embodiment does not take into account shape changes due to facial expressions. Taking them into account is preferable for a more natural result.

First, to obtain the initial coordinates of the feature points in the face image used for the tracking process, the GPU 10C displays the first frame of the person image stored in the storage area on the GPU 10C on the screen of the display device 10G and, based on the three-dimensional shape information of the face stored in that storage area, also displays on the screen a preset group of feature points as seen when the person's face is viewed from the front. Possible facial feature points include the inner and outer corners of both eyes, the nostrils, the boundary positions of the lips, and the inner and outer ends of the eyebrows. In this embodiment, in view of processing simplicity and tracking accuracy, ten feature points are set: the inner and outer corners of both eyes, both nostrils, and the top, bottom, left, and right boundaries of the lips. In the program 100 of this embodiment, the group of feature points can be translated and scaled on the screen of the display device 10G with the input device 10I, so that it can be aligned accurately with the image of the person.

The operator translates or scales the group of feature points with the input device 10I to align it with the coordinates of the corresponding feature points in the image of the person. When alignment is complete, the operator signals this to the CPU 10A through an input operation on the input device 10I. On receiving this signal, the CPU 10A sends the positions of the feature points on the image to the GPU 10C. The GPU 10C then acquires the image in the neighborhood of each feature point position and stores it in the storage area as a template, together with the position information of the feature point. Triggered by the signal, the GPU 10C starts the tracking process shown in the flowchart of FIG. 16.

As described above, this embodiment does not consider changes due to facial expressions, so the movement of the face in the image of the person can be described in six dimensions: translation (three dimensions) and rotation (three dimensions).
From the position information of the feature points on the face in the frame images one and two frames before the current one, stored in the storage area on the GPU 10C, the GPU 10C prepares a plurality of candidates for the position information of the face in the current frame image. The candidates are chosen around the position expected if the face did not accelerate.

Specifically, the center is given by the following equation (8).
(position/rotation value in the current frame image) = (position/rotation value in the previous frame image) + ((position/rotation value in the previous frame image) − (position/rotation value in the frame image two frames before)) (8)
The second term on the right-hand side of equation (8) is the term for the velocity of the face motion. At the start of the tracking process the values for the frame image two frames before are not available, so the velocity term is set to 0.
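Equation (8) is a constant-velocity prediction, which might be written as the following sketch. The function name `predict_pose` and the representation of the pose as a six-element list (three translations, three rotations) are assumptions made for illustration.

```python
def predict_pose(prev, prev2=None):
    """Constant-velocity prediction of the 6-D pose, as in equation (8).

    prev  : pose in the previous frame, e.g. [tx, ty, tz, rx, ry, rz]
    prev2 : pose two frames before, or None at the start of tracking
    Returns (center, velocity).
    """
    if prev2 is None:
        # No frame two steps back yet: the velocity term is set to 0.
        velocity = [0.0] * len(prev)
    else:
        velocity = [a - b for a, b in zip(prev, prev2)]
    center = [a + v for a, v in zip(prev, velocity)]
    return center, velocity
```

Returning the velocity alongside the center is a convenience of this sketch, since the next step scales its noise by the same velocity term.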

The GPU 10C generates Gaussian noise (random numbers) whose fluctuation width is proportional to the value of the velocity term and adds it to the center, thereby preparing a plurality of pieces of position information around the center as the candidates. When the fluctuation width falls to or below a certain value, it is set to that value, so that Gaussian noise is still generated even when the velocity term is 0.
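A possible sketch of this candidate generation is shown below. The proportionality constant `scale`, the floor `min_width`, and the per-dimension treatment of the fluctuation width are assumptions; the description gives no numerical values.

```python
import random

def sample_candidates(center, velocity, count, scale=0.5, min_width=0.01, rng=None):
    """Draw `count` candidate poses scattered around `center`.

    The standard deviation in each dimension is proportional to the
    magnitude of the velocity term, but never smaller than `min_width`,
    so candidates still spread out when the velocity term is 0.
    """
    rng = rng or random.Random()
    candidates = []
    for _ in range(count):
        candidate = []
        for c, v in zip(center, velocity):
            width = max(scale * abs(v), min_width)
            candidate.append(c + rng.gauss(0.0, width))
        candidates.append(candidate)
    return candidates
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when comparing tracking runs.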

For each candidate, the GPU 10C computes the degree of coincidence π between the image near the feature points and the templates. For a given candidate, π is obtained by subtracting the template from the image near each feature point, per feature point, per pixel, and per RGB channel, summing the squares of the differences, and taking the reciprocal of that sum. Specifically, π is given by the following equation (9).
$$\pi = \frac{1}{\sum \left( I_r(x, y, \lambda) - I_t(x, y, \lambda) \right)^2} \tag{9}$$
Here x and y are the coordinates of a pixel in the image, λ is one of R, G, and B, $I_r(x, y, \lambda)$ is the pixel value of that pixel and channel in the actual moving image of the person, $I_t$ is the pixel value, for that channel, of the template pixel at the position corresponding to that pixel when the template is moved according to the motion of the candidate, and Σ denotes the sum over all pixels covered by a template and over all channels.

Whether a template corresponds to each pixel of the actual moving image of the person can also be expressed by a mask image. With a mask image whose value is 1 at pixels that have a corresponding template and 0 at pixels that do not, equation (9) for the degree of coincidence π can be written as the following equation (10).
$$\pi = \frac{1}{\sum \mathrm{mask}(x, y) \left( I_r(x, y, \lambda) - I_t(x, y, \lambda) \right)^2} \tag{10}$$
Here Σ denotes the sum over all pixels in the image.
The larger this value, the better the image near the feature points matches the templates.
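The masked score of equation (10) might be computed as in the sketch below. It assumes the template has already been warped to the candidate pose, and it adds a small `eps` to the denominator to avoid division by zero on a perfect match; neither the warping step nor `eps` comes from the description.

```python
def match_score(image, template, mask, eps=1e-12):
    """Degree of coincidence following equation (10).

    image, template : rows of (R, G, B) tuples of equal size; the template
                      is assumed to be warped to the candidate pose already
    mask            : rows of 0/1, 1 where a template pixel exists
    eps             : assumed guard against a zero denominator
    """
    total = 0.0
    for img_row, tpl_row, mask_row in zip(image, template, mask):
        for img_px, tpl_px, m in zip(img_row, tpl_row, mask_row):
            if m:
                # Squared difference summed over the R, G, and B channels.
                total += sum((a - b) ** 2 for a, b in zip(img_px, tpl_px))
    return 1.0 / (total + eps)
```

Pixels whose mask value is 0 contribute nothing, so only the neighborhoods of the feature points influence the score.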

Focusing on one dimension x of the six dimensions, let b(i, x) be the value of dimension x for the i-th candidate and π(i) its degree of coincidence. The value of dimension x in the current frame image is then given by the following equation (11).
$$b(x) = \frac{\sum_i b(i, x)\, \pi(i)}{\sum_i \pi(i)} \tag{11}$$
Here Σ denotes the sum over all selected candidates.
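Equation (11) is a score-weighted mean of the candidates, applied to each dimension separately. A minimal sketch, with the hypothetical name `weighted_pose`, is:

```python
def weighted_pose(candidates, scores):
    """Score-weighted mean of the candidate poses, as in equation (11).

    candidates : list of poses, each a list with one value per dimension
    scores     : degree of coincidence of each candidate (same order)
    """
    total = sum(scores)
    dims = len(candidates[0])
    return [
        sum(c[d] * s for c, s in zip(candidates, scores)) / total
        for d in range(dims)
    ]
```

For instance, two candidates [0.0, 2.0] and [4.0, 6.0] with scores 1.0 and 3.0 give the estimate [3.0, 5.0], which lies closer to the better-matching candidate.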

In the image composition processing apparatus 1, in the skin information processing step S5 (see FIG. 4), the GPU 10C executes the preloaded skin information processing function 106 of the program 100 and obtains skin information, consisting of color component information and shadow component information, from each frame image of the actual moving image of the person stored in the storage area on the GPU 10C. In this embodiment, the skin information is obtained based on the principal component analysis method described below, which is set out in, for example, Norimichi, T., Ojima, N., Sato, K., Shiraishi, M., Shimizu, H., Nabeshima, H., Akazaki, S., Hori, K., and Miyake, Y. 2003. Image-based skin color and texture analysis/synthesis by extracting hemoglobin and melanin information in the skin. ACM Transactions on Graphics 22, 3, 770-779.

Consider the three-dimensional space spanned by R' = −log(R), G' = −log(G), and B' = −log(B), the sign-inverted logarithms of the RGB channel values. It is known that when the pixel values of a captured skin image are projected into this space, the skin colors gather approximately on a certain color component plane, and that deviations from this plane arise from uneven illuminance across the skin, that is, from shading. This embodiment uses this property to separate color component information and shadow component information from each frame image as follows.

First, the color component plane in the three-dimensional space is obtained by a statistical method; specifically, principal component analysis and/or independent component analysis can be used. In this embodiment the principal components are obtained by principal component analysis, partly because the calculation gives stable results. Specifically, principal component analysis is applied to the skin pixel values projected into the three-dimensional space, and the plane spanned by the first and second principal component vectors is taken as the color component plane. Because the effect of shading can be expressed by multiplying each of the R, G, and B values by a factor that does not depend on the channel, the shadow unit vector (1, 1, 1) is chosen as the third vector, and the three-dimensional space is expressed in an oblique coordinate system whose axes are these three vectors.
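Once the two color axes are known, expressing a pixel in the oblique coordinate system amounts to solving a 3×3 linear system. The sketch below uses Cramer's rule; the function names are hypothetical, the axes passed in are placeholders for the vectors that principal component analysis would actually provide, and RGB values are assumed to be strictly positive.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def decompose_pixel(rgb, axis1, axis2, shade_axis=(1.0, 1.0, 1.0)):
    """Return (c1, c2, shade) with -log(rgb) = c1*axis1 + c2*axis2 + shade*shade_axis."""
    v = [-math.log(ch) for ch in rgb]
    cols = [axis1, axis2, shade_axis]
    # Matrix whose columns are the three axes of the oblique coordinate system.
    a = [[cols[j][i] for j in range(3)] for i in range(3)]
    d = det3(a)
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k with the target vector.
        ak = [[v[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(ak) / d)
    return tuple(coeffs)
```

Halving all three RGB values of a pixel leaves `c1` and `c2` unchanged and raises `shade` by log 2, which is the property that lets the (1, 1, 1) axis carry the shading.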

To realize image effects such as making the person's skin darker or fairer, it is preferable to further apply independent component analysis to the projections of the pixel values onto the color component plane in the oblique coordinate system, obtain two independent component vectors, and use them in place of the first and second principal component vectors. These two independent component vectors correspond to the two pigments that give skin its color, melanin and hemoglobin. For example, scaling the axis corresponding to melanin and then applying the inverse transformation makes the skin darker or fairer while keeping a natural skin tone. This is described in detail in the above-mentioned literature on independent component analysis.

Because the pixel values only need to be comparable relative to one another, the choice of the oblique coordinate system and of its origin is arbitrary. Non-negative values are easier to handle as image data, however, and with extensions such as changing the skin color in mind, it is preferable in this embodiment to perform principal component analysis, then obtain the melanin and hemoglobin pigment axes by independent component analysis, and then apply an offset to the skin-color pixel values so that the minimum of each of the three components does not fall below 0, as shown in FIG. 17. The independent component vectors and the offset are obtained beforehand by analyzing images from several test shots.
Using the coordinate system obtained in this way, color component information and shadow component information are separated from each frame image of the person's face image, skin information consisting of this information is obtained, and the information is stored in the processing data storage area in the GPU 10C.

In the image composition processing apparatus 1, in the composite face image information processing step S6 (see FIG. 5), the GPU 10C executes the preloaded composite face image information processing function 105 of the program 100 and obtains the shadow information of the face for each frame, as shown in FIG. 18, based on the face normal information in the surface reflection characteristic information of the face stored in the storage area on the GPU 10C, the face tracking information obtained in the preceding steps, and the positions and intensities (light quantities) of the point light sources in the light environment information.

For an arbitrary pixel in each frame image, let N be the normal vector of the face surface at that position, S(i) the direction vector of each point light source as seen from that position, and I(i) the intensity of each point light source. A value proportional to the illuminance at that pixel, that is, the shadow component information L, is given by the following equation (12).
$$L = \sum_i \operatorname{dot}(N, S(i))\, I(i) \tag{12}$$
Here Σ denotes the sum over all point light sources (the n approximating point light sources), and dot(A, B) denotes the inner product of A and B. Multiplying the shadow-free skin information computed later by this value gives a skin color that takes shading into account.
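Equation (12) can be evaluated per pixel as in the following sketch. The optional `clamp_backfacing` flag, which ignores light sources behind the surface, is an assumption added for illustration; with its default value the function evaluates the sum exactly as written in equation (12).

```python
def dot(a, b):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def shading(normal, light_dirs, intensities, clamp_backfacing=False):
    """Shadow component L of equation (12) for one pixel.

    normal      : unit normal vector N of the face surface
    light_dirs  : unit direction vectors S(i) toward each point light source
    intensities : intensities I(i) of the point light sources
    """
    total = 0.0
    for s, i in zip(light_dirs, intensities):
        cosine = dot(normal, s)
        if clamp_backfacing:
            # Assumed refinement: lights behind the surface contribute nothing.
            cosine = max(cosine, 0.0)
        total += cosine * i
    return total
```

A light source along the normal contributes its full intensity, while one at a right angle to the normal contributes nothing.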

In the image composition processing apparatus 1, in the composite face image information processing step S6, the GPU 10C also executes the composite face image information processing function 105 of the program 100 and obtains the surface reflection information of the face based on the surface reflection characteristic information of the face stored in the storage area on the GPU, the face tracking information obtained in the preceding steps, and the light environment information. Here too, this embodiment obtains the surface reflection information on the assumption that the light reflection characteristics of the face follow the TS model.

As before, the TS model gives the light reflectance f by the following equation (13), where n is the refractive index of the skin, α is the intensity of the gloss, and M is its spread (width).
$$f = \alpha \, \frac{F \, G \, D}{4 \cos\theta_i \cos\theta_r} \tag{13}$$
Here F is the Fresnel term, G is the shape term, and D is the surface roughness term.
The Fresnel term F is given by equation (14) for normal incidence and by equation (15) otherwise.
$$F = \frac{(n - 1)^2}{(n + 1)^2} \tag{14}$$
$$F = \frac{1}{2} \left[ \frac{\tan^2(\theta_i - \theta_t)}{\tan^2(\theta_i + \theta_t)} + \frac{\sin^2(\theta_i - \theta_t)}{\sin^2(\theta_i + \theta_t)} \right] \tag{15}$$
The surface roughness term D is given by equation (16).
$$D = \frac{\exp\left( -\tan^2\beta / M^2 \right)}{\pi M^2 \cos^4\beta} \tag{16}$$
The shape term G is given by equation (17), where min(a, b, c) denotes the smallest of a, b, and c.
$$G = \min(G_s, G_m, 1) \tag{17}$$
$$G_s = \frac{2 (N \cdot H)(N \cdot S)}{S \cdot H}, \qquad G_m = \frac{2 (N \cdot H)(N \cdot V)}{V \cdot H}, \qquad H = \frac{S + V}{|S + V|}$$
Here N is the corrected normal vector, S is the unit vector in the direction of the light source, V is the unit vector in the direction of the imaging means, H is the unit vector that bisects S and V, β is the angle between N and H, θi is the angle of incidence, θr is the angle of reception, and θt is the angle of refraction. θt is obtained from the following equation (18), with n the refractive index of the skin. As described above, the normal vector used here is also the corrected normal vector obtained by the hybrid method.
$$n = \frac{\sin\theta_i}{\sin\theta_t} \tag{18}$$
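For a single light source, equations (13) to (18) might be evaluated as in the sketch below. Several points are assumptions of this sketch rather than statements of the description: H is taken as the normalized sum of S and V (the unit bisector), the default refractive index 1.4 is only a placeholder, directions facing away from the surface return 0, and the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def fresnel(theta_i, n):
    """Fresnel term F: equation (14) at normal incidence, else equation (15)."""
    if theta_i < 1e-8:
        return (n - 1.0) ** 2 / (n + 1.0) ** 2
    theta_t = math.asin(math.sin(theta_i) / n)  # refraction angle, equation (18)
    return 0.5 * (math.tan(theta_i - theta_t) ** 2 / math.tan(theta_i + theta_t) ** 2
                  + math.sin(theta_i - theta_t) ** 2 / math.sin(theta_i + theta_t) ** 2)

def ts_reflectance(n_vec, s_vec, v_vec, alpha, m, n=1.4):
    """Reflectance f of equation (13) for one light direction."""
    n_vec, s_vec, v_vec = normalize(n_vec), normalize(s_vec), normalize(v_vec)
    cos_i, cos_r = dot(n_vec, s_vec), dot(n_vec, v_vec)
    if cos_i <= 0.0 or cos_r <= 0.0:
        return 0.0  # light or camera behind the surface (assumed handling)
    h = normalize([a + b for a, b in zip(s_vec, v_vec)])  # unit bisector of S and V
    cos_b = min(1.0, dot(n_vec, h))
    tan2_b = (1.0 - cos_b ** 2) / cos_b ** 2
    d = math.exp(-tan2_b / m ** 2) / (math.pi * m ** 2 * cos_b ** 4)  # equation (16)
    gs = 2.0 * cos_b * cos_i / dot(s_vec, h)
    gm = 2.0 * cos_b * cos_r / dot(v_vec, h)
    g = min(gs, gm, 1.0)                                              # equation (17)
    f = fresnel(math.acos(min(1.0, cos_i)), n)
    return alpha * f * g * d / (4.0 * cos_i * cos_r)
```

With this sketch the reflectance peaks when the camera lies in the mirror direction of the light source and falls off as the bisector H moves away from the normal, at a rate controlled by M.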

For an arbitrary pixel in each frame image of the actual moving image of the face, let N be the normal vector of the face surface at that position, S(i) the direction vector of each point light source as seen from that position, I(i) the intensity of each point light source, and V the direction vector of the camera as seen from that position. The light reflectance F of that pixel is then given by the following equation (19). The point light sources here are those approximating point light sources, obtained for each frame by the light environment information processing means, that correspond to the frame of the actual moving image of the face; they are stored as the light environment information.
$$F = \sum_i f(i) \tag{19}$$
Here Σ denotes the sum of the calculated values over all point light sources, and f(i) is the amount of reflected light obtained from the TS model for the i-th point light source. It is obtained by substituting, as the input parameters of the TS model equations, S(i) for S, cos θi = dot(N, S), cos θr = dot(N, V), and, for α and M, the gloss intensity and spread at that position from the surface reflection characteristics of the face.

In the image composition processing apparatus 1, in the face moving image composition processing step S7 (see FIG. 6), the GPU 10C executes the preloaded shadow component information processing function 107 and face image composition processing function 108 of the program 100 and synthesizes the composite image of the face based on the skin information, the shadow information of the face, and the surface reflection information of the face.

In this embodiment, as shown in FIG. 19, the shadow component information passed through a low-pass filter is taken as the low-frequency shadow component information, and the shadow component information divided by the low-frequency shadow component information is taken as the high-frequency shadow component information. Because the pieces of information are acquired by different methods, the face color component (hemoglobin and melanin) information and the face shadow component differ in their high-frequency characteristics, which would make the composite image look unnatural; the high-frequency shadow component information is used to prevent this.
In this embodiment, the product of the shadow information of the face and the high-frequency shadow component information is taken as the final shadow information of the face. This is substituted for the shadow component information obtained by the independent component analysis method and, together with the melanin and hemoglobin component information, is passed through the inverse of the transformation performed in the independent component analysis method to obtain color component information that takes shading into account. The surface reflection component information of the face is then added to this result. Before the addition, each image is multiplied by an appropriate constant so that the final composite with the background looks natural. These constants are needed because the amount of light represented by a unit pixel value differs among the result images of the process that obtains the surface reflection information, the process that obtains the color component information, and the process that acquires the background image.
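The high-frequency correction can be illustrated on a one-dimensional row of pixels. In this sketch the low-pass filter is a simple box average, which is an assumption (the description does not specify the filter), the function names are hypothetical, and all shadow values are assumed to be positive so that the division is defined.

```python
def box_lowpass(values, radius):
    """Box-average low-pass filter with windows truncated at the borders."""
    out = []
    for k in range(len(values)):
        lo, hi = max(0, k - radius), min(len(values), k + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out

def final_shading(measured_shadow, computed_shading, radius=2):
    """Multiply the computed shading by the high-frequency shadow component.

    measured_shadow  : shadow component separated from the captured face image
    computed_shading : shading computed from the normals and light sources
    """
    low = box_lowpass(measured_shadow, radius)
    # High-frequency component: measured shadow divided by its low-pass version.
    high = [s / l for s, l in zip(measured_shadow, low)]
    return [c * h for c, h in zip(computed_shading, high)]
```

When the measured shadow has no fine detail (a constant row), the high-frequency component is 1 everywhere and the computed shading passes through unchanged; fine detail in the measured shadow is otherwise carried over onto the computed shading.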

In the image composition processing apparatus 1, in the person-background composite moving image composition processing step S8 (see FIG. 7), the GPU 10C executes the person-background composite moving image composition processing function 109 of the program 100 and synthesizes the person-background composite moving image (final composite image) based on the composite image of the face and the person image obtained in the preceding steps and the actual moving image of the background stored in the file storage device 10F. Specifically, the composite image of the face, the corresponding frame image of the actual moving image of the person, and the corresponding frame image of the actual moving image of the background are combined frame by frame and then concatenated into a moving image. In this embodiment, as shown in FIG. 20, the final composite image is obtained by replacing the region of the screen SC, which forms the background of the composite moving image of the person, the person image, and the composite image of the face, with the actual moving image of the background.
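For a single frame, replacing the screen region with the background reduces to a per-pixel selection driven by the mask from the extraction step. The sketch below assumes that the person frame (with the composite face already written into it), the background frame, and the mask all have the same size; the function name is hypothetical.

```python
def composite_frame(person, background, mask):
    """Replace screen pixels (mask 0) with the background, keep person pixels (mask 1).

    person, background : rows of (R, G, B) tuples of equal size
    mask               : rows of 0/1 produced by the person extraction step
    """
    out = []
    for p_row, b_row, m_row in zip(person, background, mask):
        out.append([p if m else b for p, b, m in zip(p_row, b_row, m_row)])
    return out
```

Applying this to each pair of corresponding frames and concatenating the results gives the composite moving image.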

As described above, in the image composition apparatus and method of the present embodiment, when a person actual moving image is combined with a background actual moving image captured separately from it, the surface reflection characteristic information of the person's face is obtained in advance, and a light environment actual moving image of the background is captured together with the background actual moving image. The face surface reflection characteristic information, the light environment information obtained by approximating the light sources in the light environment actual moving image as point light sources, and the face tracking information extracted from the person actual moving image are combined by real-time processing to obtain the face shadow information and the face surface reflection information. These are combined with the skin information and the high-frequency shadow component information obtained from the person's face image, and the result is then combined with the background actual moving image. The illumination conditions in the moving images of the person and the background can therefore be made substantially the same, and a natural moving image can be obtained that looks as if both had been captured at the same place.

The present invention is not limited to the embodiment described above.
For example, surface reflection characteristic information obtained by other methods can be used for the image composition.

In the skin information processing means, independent component analysis can be used instead of principal component analysis as the analysis method for deriving the color component plane. In the light environment information processing means, a surface light source, a line light source, or the like may be used instead of the point light source, provided that the light environment actual moving image can be converted appropriately. If the acquisition and processing of each kind of information are devised so that the frequency characteristics of the facial color component (hemoglobin component and melanin component) information and of the facial shadow component match, the high-frequency shadow component information of the shadow component need not be used.

The measurement target of the surface reflection characteristic measuring apparatus of the present invention is not particularly limited. In addition to the surface reflection characteristics of parts of the body other than the face, the surface reflection characteristics of articles can be measured.

The person background composite moving image obtained by the present invention is used, for example, for attractions in amusement facilities and for real-time composition processing in television and the like.

A block diagram, including the data flow, showing an outline of one embodiment of the image composition apparatus of the present invention.
A block diagram in which one embodiment of the image composition apparatus of the present invention is applied to a general-purpose computer.
A data flow diagram showing processing steps of the image composition method of the present invention.
A data flow diagram showing processing steps of the image composition method of the present invention.
A data flow diagram showing processing steps of the image composition method of the present invention.
A data flow diagram showing processing steps of the image composition method of the present invention.
A data flow diagram showing processing steps of the image composition method of the present invention.
A schematic diagram of the imaging means used to capture the background actual moving image and the light environment actual moving image in the image composition method of the present invention.
A diagram showing the flow of the point approximation method for light sources used to obtain the light environment information in the image composition method of the present invention.
A schematic diagram showing one embodiment of the surface reflection characteristic measuring apparatus used in the image composition method of the present invention; (a) is a plan view and (b) is a side view.
A flowchart showing an outline of the measurement procedure in the surface reflection characteristic measuring apparatus of the embodiment.
A diagram for explaining the step of calculating three-dimensional shape information by the spatial coding method in the information processing means of the surface reflection characteristic measuring apparatus of the embodiment.
A diagram for explaining the step of correcting the three-dimensional shape information and the normal information by the hybrid method in the information processing means of the surface reflection characteristic measuring apparatus of the embodiment.
A conceptual diagram for explaining the step of calculating light reflection characteristic information by the simplex method in the information processing means of the surface reflection characteristic measuring apparatus of the embodiment; (a) shows a series of captured frame images, and (b) shows a graph (points) of the pixel values at a specific position in the series of frame images together with a graph (broken line) of the result of fitting with the reflection intensity and width as parameters.
A diagram showing the apparatus used to capture the person actual moving image in the image composition method of the present invention; (a) is a plan view and (b) is a side view.
A flowchart explaining the tracking processing step in the image composition method of the embodiment.
A schematic diagram explaining the step of obtaining skin information, consisting of color component information (hemoglobin component and melanin component) and shadow component information, from a person's face image by the principal component analysis method and the independent component analysis method.
A flowchart explaining the synthetic face image information processing step in the image composition method of the embodiment.
A flowchart explaining the face image composition processing step in the image composition method of the embodiment.
A flowchart explaining the person background composite moving image composition processing step in the image composition method of the embodiment.

Explanation of symbols

1 Image composition apparatus
11 Person actual moving image input / storage means
12 Background actual moving image input / storage means
13 Light environment actual moving image input / storage means
14 Surface reflection characteristic information input / storage means
21 Person image extraction processing means
22 Tracking processing means
23 Light environment information processing means
24 Synthetic face information processing means
25 Skin information processing means
26 Shadow component information processing means
27 Face image composition processing means
28 Person background composite moving image composition processing means
29 Moving image display processing means
10 Computer system
100 Image composition program
A Surface reflection characteristic measuring apparatus
1A Light reflection characteristic information measuring means
1B Three-dimensional shape information measuring means
1C Normal information measuring means
2 Movable light source
2A, 2B Light source
21A, 21B Shielding plate
3 Imaging means
3A Polarizer
4 Information processing means
5 Projection means
6 Fixed light source
6A Polarizer
7 Control means
8 Computer system
9 Frame

Claims

An image synthesizer for synthesizing a synthesized image of a person's face synthesized based on an actual moving image of a person with an actual moving image of a background captured separately from the actual moving image of the person,
Person actual moving image input / storage means for inputting or storing the actual moving image of the person;
Background actual moving image input / storage means for inputting or storing the actual moving image of the background;
Light environment actual moving image input / storage means for inputting or storing the light environment actual moving image captured together with the background actual moving image;
Surface reflection characteristic information input / storage means for inputting or storing the surface reflection characteristic information of the person's face composed of three-dimensional shape information, normal information and light reflection characteristic information of the person's face;
Person image extraction processing means for extracting a person image from each frame image of the person's actual moving image stored in the person actual moving image input / storage means;
Tracking processing means for obtaining, based on the extracted person image and the three-dimensional shape information of the face input or stored by the surface reflection characteristic information input / storage means, a face image of the person and face tracking information comprising position information of a specific part of the person's face and orientation information of the face;
Light environment information processing means for obtaining light environment information corresponding to each frame image of the background actual moving image based on each frame image of the light environment actual moving image input or stored by the light environment actual moving image input / storage means;
Synthetic face image information processing means for determining shadow information and surface reflection information of the face based on the face surface reflection characteristic information, the face tracking information, and the light environment information;
Skin information processing means for obtaining skin information comprising color component information and shadow component information from each frame image of the face image of the person;
Face image synthesis processing means for synthesizing the synthesized image of the face based on the skin information, the shadow information of the face, and the surface reflection information of the face;
and person background synthesized moving image synthesis processing means for synthesizing a person background synthesized moving image based on the synthesized image of the face, the person image, and the background actual moving image.

The image synthesis apparatus according to claim 1, wherein the skin information processing means obtains the skin information comprising the color component information and the shadow component information from each frame image of the person's face image using a principal component analysis method and / or an independent component analysis method.

The image synthesis apparatus according to claim 1, wherein the light environment information processing means obtains the light environment information corresponding to each frame image of the background actual moving image by approximating the light sources as point light sources, based on each frame image of the light environment actual moving image input or stored by the light environment actual moving image input / storage means.

The image synthesis apparatus according to claim 1, wherein the face image synthesis processing means includes shadow component information processing means for obtaining high-frequency shadow component information from the shadow component information, and synthesizes the synthesized image of the face based on the high-frequency shadow component information in addition to the skin information, the shadow information of the face, and the surface reflection information of the face.

The image synthesis apparatus according to any one of claims […], wherein the light environment actual moving image is a moving image captured by a light environment imaging video camera that is disposed behind a background imaging video camera for capturing the actual moving image of the background and is set so that the optical axes of the lenses of the two cameras lie on the same straight line, the moving image including a sphere or spherical shell that is disposed in front of the light environment imaging video camera, through whose center the optical axis passes, and whose surface is mirror-finished.

A data input / storage processing step of inputting or storing an actual moving image of a person, an actual moving image of a background captured separately from the actual moving image of the person, a light environment actual moving image of the background captured together with the actual moving image of the background, and surface reflection characteristic information of the person's face composed of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
A light environment information processing step of obtaining light environment information corresponding to each frame image of the background actual moving image, based on each frame image of the light environment actual moving image;
A person image extraction processing step of extracting a person image from each frame image of the actual moving image of the person;
A tracking processing step for obtaining tracking information of the face image of the person and the face of the person based on the three-dimensional shape information of the person image and the face;
A synthetic face image information processing step for obtaining shadow information and surface reflection information of the face based on the surface reflection characteristic information of the face, the tracking information of the face, and the light environment information;
A skin information processing step of obtaining skin information consisting of color component information and shadow component information from the face image of the person;
A face image synthesis processing step of synthesizing the synthesized image of the face based on the skin information, the shadow information of the face, and the surface reflection information of the face;
An image synthesis method comprising: a person background synthesized moving image synthesizing step of synthesizing a person background synthesized moving image based on the face synthesized image, the person image, and the background actual moving image.

An image composition program for realizing:
A data input / storage processing function of inputting or storing an actual moving image of a person, an actual moving image of a background captured separately from the actual moving image of the person, a light environment actual moving image of the background captured together with the actual moving image of the background, and surface reflection characteristic information of the person's face composed of three-dimensional shape information, normal information, and light reflection characteristic information of the person's face;
A person image extraction processing function of extracting a person image from each frame image of the actual moving image of the person;
A tracking processing function of obtaining a face image of the person and tracking information of the person's face based on the person image and the three-dimensional shape information of the face;
A light environment information processing function of obtaining light environment information corresponding to each frame image of the stored background actual moving image, based on each frame image of the light environment actual moving image;
A synthetic face image information processing function of obtaining shadow information and surface reflection information of the face based on the surface reflection characteristic information of the face, the tracking information of the face, and the light environment information;
A skin information processing function of obtaining skin information comprising color component information and shadow component information from the face image of the person;
A face image synthesis processing function of synthesizing a synthesized image of the face based on the skin information, the shadow information of the face, and the surface reflection information of the face;
and a person background synthesized moving image synthesis processing function of synthesizing a person background synthesized moving image based on the synthesized image of the face, the person image, and the background actual moving image.