JP3984191B2

JP3984191B2 - Virtual makeup apparatus and method

Info

Publication number: JP3984191B2
Application number: JP2003160973A
Authority: JP
Inventors: 真由美湯浅; 朗子中島; 修山口; 智和若杉; 達夫小坂谷
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-07-08
Filing date: 2003-06-05
Publication date: 2007-10-03
Anticipated expiration: 2023-06-05
Also published as: JP2004094917A

Description

【０００１】
【発明の属する技術分野】
本発明は、主として人物の顔画像に対してメイクアップ等の変形を施した画像をディスプレイ等の表示装置に表示するか、あるいは、ネットワークを通じて送信する仮想化粧装置及びその方法に関する。
【０００２】
【従来の技術】
口紅やアイシャドウ等の化粧品、あるいはメガネやカラーコンタクトレンズ等は実際に自分がつけた時にどのように見えるかを知りたいという顧客の要求がある。しかしながらこれらの実際に肌等に装着するものは装着自体が面倒であったり、衛生上の問題があったり、異なる商品間の比較が困難であったり、といった問題がある。そこでこれらの商品を実際に装着することなく、装着した状態をバーチャルに画像中で体験できるものは非常に有用である。
【０００３】
これまで画像を用いて化粧、髪型、メガネ、カラーコンタクトレンズ等の装着をシミュレートする装置やソフトウエアは静止画を用いた簡易的なものは存在する（例えば、特許文献１）。しかしながら、動画像でユーザの動きに応じて動かすことができなかったため、実際の装着感を得ることができない。また、仮想画像の生成の元となる顔画像中の特徴点や特徴領域の検出がロバストでないため、照明の変化や動き、個人差に十分対応できていない。
【０００４】
特徴点の検出を行い、それに対して予め決められた画像を重ねることにより変装画像を表示するものとしては、特許文献２の手法があるが、実画像のシミュレーションという意味では満足できるものではない。
【０００５】
また、今日ブロードバンドの普及により、個人間においてもテレビ電話等の顔を見ることができるコミュニケーションツールが普通に使われるようになりつつある。その際に自分の顔や姿を自宅でいるそのままの状態で出て行ってしまうことが問題となると予想される。もちろん、アバタ等を用いることで顔を出さない方法もあるが、それでは臨場感が失われてしまい、折角のブロードバンドの意味がなくなってしまう。しかし、テレビ電話に出るためにわざわざ外出時のような化粧をするのは面倒であるといった問題点がある。
【０００６】
【特許文献１】
特開２０００−２８５２２２公報
【特許文献２】
特開２０００−３２２５８８公報
【０００７】
【発明が解決しようとする課題】
本発明は上記の問題を解決するために、リアルタイムで画像の入力表示を行い、特徴点等の検出をロバストに行うことで、ユーザが自然に装着感を得られる仮想化粧装置及びその方法を提供することを目的とする。
【０００８】
【課題を解決するための手段】
請求項１の発明は、少なくとも人物の顔領域を含む動画像における化粧前の各フレームの画像を画像処理して、化粧を施した人物の顔の画像に加工する仮想化粧装置において、顔形状モデルを記憶する顔形状モデル記憶手段と、前記化粧前の各フレームの画像から前記顔に関する特徴点、特徴領域、または、その両方を検出する特徴検出手段と、前記化粧を施すためのアイテムに関する前記アイテムの種類に関するデータと、顔の特徴点、特徴領域、または、その両方と前記アイテムの位置関係を示す位置関係データと、前記アイテムの色、形状、または、塗布範囲を含んでいる化粧データを記憶するデータ記憶手段と、前記記憶されたアイテムの種類から、化粧に必要なアイテムの種類に関するデータを決定する加工方式決定手段と、前記各フレームの画像から検出された特徴点の２次元座標にもとづいて回転角を求めることによって顔の向きを検出する顔方向検出手段と、前記顔の形状モデルを用いて前記各フレームの顔の向きに応じた前記アイテムの塗布位置及び範囲を計算する塗布範囲計算手段と、前記塗布範囲計算手段により計算された前記アイテムの塗布範囲に応じて、動画像の各フレームにおける前記顔の画像を化粧するように画像を加工する画像加工手段と、前記化粧が施された顔の画像を表示する表示手段と、を有することを特徴とする仮想化粧装置である。
【０００９】
請求項２の発明は、前記化粧が施された顔の画像を、ネットワークを通じて送信する表示画像送信手段を有することを特徴とする請求項１記載の仮想化粧装置である。
【００１０】
請求項３の発明は、前記特徴検出手段は、ネットワークを通じて前記化粧前の画像に関するデータを受信することを特徴とする請求項１記載の仮想化粧装置である。
【００１１】
請求項４の発明は、前記化粧を施した画像を前記人物の顔の３次元モデルにテクスチャマッピングで貼り付けることを特徴とする請求項１から３のうち少なくとも一項に記載の仮想化粧装置である。
【００１２】
請求項５の発明は、前記表示手段において、異なる時刻において化粧を施した画像、異なる顔の方向の化粧を施した画像、異なる解像度の化粧を施した画像、異なる部位の化粧を施した画像、または、異なる化粧を施した画像を同時に表示することを特徴とする請求項１から４のうち少なくとも一項に記載の仮想化粧装置である。
【００１３】
請求項６の発明は、前記アイテムが、カラーコンタクトレンズ、アイシャドウ、チーク、口紅、または、タトゥーであることを特徴とする請求項１から５のうち少なくとも一項に記載の仮想化粧装置である。
【００１４】
請求項７の発明は、少なくとも人物の顔領域を含む動画像における化粧前の各フレームの画像を画像処理して、化粧を施した人物の顔の画像に加工する仮想化粧方法において、顔形状モデルを記憶する顔形状モデル記憶ステップと、前記化粧前の各フレームの画像から前記顔に関する特徴点、特徴領域、または、その両方を検出する特徴検出ステップと、前記化粧を施すためのアイテムに関する前記アイテムの種類に関するデータと、顔の特徴点、特徴領域、または、その両方と前記アイテムの位置関係を示す位置関係データと、前記アイテムの色、形状、または、塗布範囲を含んでいる化粧データを記憶するデータ記憶ステップと、前記記憶されたアイテムの種類から、化粧に必要なアイテムの種類に関するデータを決定する加工方式決定ステップと、前記各フレームの画像から検出された特徴点の２次元座標にもとづいて回転角を求めることによって顔の向きを検出する顔方向検出ステップと、前記顔の形状モデルの向きに応じて前記アイテムの塗布位置及び範囲を計算する塗布範囲計算ステップと、前記塗布範囲計算手段により計算された前記アイテムの塗布範囲に応じて、動画像の各フレームにおける前記顔の画像を化粧するように画像を加工する画像加工ステップと、前記化粧が施された顔の画像を表示する表示ステップと、を有することを特徴とする仮想化粧方法である。
【００１５】
請求項８の発明は、少なくとも人物の顔領域を含む動画像における化粧前の各フレームの画像を画像処理して、化粧を施した人物の顔の画像に加工する仮想化粧方法をコンピュータによって実現するプログラムにおいて、顔形状モデルを記憶する顔形状モデル記憶機能と、前記化粧前の各フレームの画像から前記顔に関する特徴点、特徴領域、または、その両方を検出する特徴検出機能と、前記化粧を施すためのアイテムに関する前記アイテムの種類に関するデータと、顔の特徴点、特徴領域、または、その両方と前記アイテムの位置関係を示す位置関係データと、前記アイテムの色、形状、または、塗布範囲を含んでいる化粧データを記憶するデータ記憶機能と、前記記憶されたアイテムの種類から、化粧に必要なアイテムの種類に関するデータを決定する加工方式決定機能と、前記各フレームの画像から検出された特徴点の２次元座標にもとづいて回転角を求めることによって顔の向きを検出する顔方向検出機能と、前記顔の形状モデルの向きに応じて前記アイテムの塗布位置及び範囲を計算する塗布範囲計算機能と、前記塗布範囲計算手段により計算された前記アイテムの塗布範囲に応じて、動画像の各フレームにおける前記顔の画像を化粧するように画像を加工する画像加工機能と、前記化粧が施された顔の画像を表示する表示機能と、をコンピュータによって実現することを特徴とする仮想化粧方法のプログラムである。
【００１９】
【発明の実施の形態】
以下に本発明を詳細に説明する。
【００２０】
（第１の実施形態）本発明に関わる第１の実施形態の仮想化粧装置１００について、図１から図４に基づいて述べる。
【００２１】
なお、本明細書において、「化粧」とは、いわゆる化粧品等をつけて顔をよそおい飾ること、美しく見えるように、表面を磨いたり飾ったりすることに限らず、顔の表面にタトゥーを付けたり、ペインティングを施したりして、化粧前の顔に対し、化粧後の顔の一部、または、全部の色等が異なる状態をいう。
【００２２】
（１）仮想化粧装置１００の構成
図１は本実施形態の構成を表す図であり、図２は、仮想化粧装置１００の概観図例である。
【００２３】
仮想化粧装置１００は、パーソナルコンピュータ（以下、ＰＣという）にビデオカメラを接続し、ＣＲＴもしくは液晶ディスプレイに画像を表示する例である。
【００２４】
具体的には人物の瞳領域に対してカラーコンタクトレンズを擬似的につけた画像を表示するものである。
【００２５】
画像入力部１０１は、ビデオカメラであり、動画像を入力する。
【００２６】
特徴検出部１０２では、入力された画像から瞳位置及び虹彩領域の輪郭を抽出する。
【００２７】
データ記憶部１０３では、化粧のアイテムの一つであるカラーコンタクトレンズの種類毎に色、テクスチャー、透明度、サイズ等の化粧データと、顔の特徴点、特徴領域、または、その両方と前記アイテムの位置関係を示す位置関係データを記憶している。
【００２８】
加工方式決定部１０５では、記憶されたカラーコンタクトレンズの種類を選択することができるインターフェースを提供する。
【００２９】
画像加工部１０４においては、選択されたカラーコンタクトレンズのデータに基づき、画像中の対象者の瞳虹彩部分にカラーコンタクトレンズを装着した状態に加工する。
【００３０】
表示部１０６では、ＣＲＴもしくは液晶ディスプレイに加工された画像を表示する。
【００３１】
各構成部１０１〜１０６について以下で詳細に述べる。なお、特徴検出部１０２、データ記憶部１０３、画像加工部１０４、加工方式決定部１０５の各機能は、ＰＣに記録されているプログラムによって実現される。
【００３２】
（２）画像入力部１０１
画像入力部１０１では、ビデオカメラから対象となる人物の顔を含む動画像を入力する。
【００３３】
例えば、一般的なＵＳＢカメラやデジタルビデオ等、または特別な画像入力デバイスを介して入力する方法もある。
【００３４】
入力された動画像は特徴検出部１０２に逐次送られる。
【００３５】
（３）特徴検出部１０２
特徴検出部１０２では、入力された画像から特徴点、領域を検出する。
【００３６】
本実施形態の場合では、図３に示すような瞳の虹彩領域の輪郭を検出する。瞳虹彩領域の輪郭検出方式を以下で述べる。
【００３７】
具体的には、文献１（福井、山口：“形状抽出とパターン照合の組合せによる顔特徴点抽出”，信学論（Ｄ−II）ｖｏｌ．Ｊ８０−Ｄ−II，ｎｏ．９，ｐｐ．２１７０−２１７７，Ａｕｇ．１９９７）で示す方法で瞳位置を検出する。
【００３８】
その後、その位置を初期値としてさらに輪郭を検出する。最初に顔領域部分の検出を行ってもよい。
【００３９】
瞳輪郭抽出する方法としては、輪郭が円形で近似できる場合においては、文献２（湯浅他：「パターンとエッジの統合エネルギー最小化に基づく高精度瞳検出」，信学技報，ＰＲＭＵ２０００−３４，ＰＰ．７９−８４，Ｊｕｎ．２０００）が利用できる。
【００４０】
先に検出した瞳の中心位置と半径を初期値として、位置と半径を変化させたときのパターン類似度によるパターンエネルギーと円形分離度フィルタの大きさによるエッジエネルギーの組合せによる値が最小になるような位置と半径を求めることで、より精度の高い輪郭を求める方法である。
【００４１】
その際、移動部分空間と名づけられた正しい輪郭からずれた位置や半径を基準として切り出したパターンによる部分空間を最小化の際の指標として用いることで、初期値が正解から離れている場合にも対応でき、収束時間を短くすることができる。
【００４２】
先に示した文献２では輪郭が円形の場合であったが、この方式は輪郭形状が任意であってもよい。
【００４３】
例えば、楕円や輪郭上の複数個のサンプル点の組からなるスプライン曲線等でもよい。
【００４４】
楕円の場合には、位置のパラメータ２＋形状パラメータ３の５パラメータで表される。
【００４５】
また、スプライン曲線の場合には、任意の形状を表現可能な個数のサンプル点数で表現される。
【００４６】
但し、任意の形状の場合にはパラメータ数が多くなりすぎて、収束が難しい、あるいは例えできたとしても非常に時間がかかって現実的ではない場合が考えられる。
【００４７】
その場合には、ある基準点（例えば瞳の中心）を元に、基準の大きさ（例えば瞳の平均半径、本実施形態の場合には、先に瞳位置を求めた際の分離度フィルタの半径が使用可能である）からのずれを、予め収集した多数の実画像における輪郭形状についてデータを取得しておき、それらのデータを主成分分析することで、パラメータの次元を減らすことが可能である。
【００４８】
また、パターンの正規化方法としては位置と形状を元にアフィン変換等で画像を変形させる方法の他に、輪郭に沿って画像を切り出したパターンを用いることができる。
【００４９】
（４）データ記憶部１０３
データ記憶部１０３においては各選択可能なカラーコンタクトレンズに対して必要な情報を保持する。情報はアイテムに必要な特徴点、特徴領域、または、その特徴点と特徴領域との両方に対する塗布領域の基準位置、サイズ、形状の相対的な位置関係よりなる位置関係データを含む。
【００５０】
本実施形態においては、特徴点は瞳の虹彩領域の中心であり、特徴領域は虹彩領域である。
【００５１】
カラーコンタクトレンズの中心点と瞳の中心との距離は０とする。
【００５２】
また、レンズデータに対してデータ中の基準点と特徴点及び特徴領域を元に算出される基準点とそれに対応するレンズデータの基準点が情報として記録されている。
【００５３】
さらに、データ記憶部１０３は、基準点に対応する化粧データ（色のデータ、テクスチャのデータ、透明度のデータ）が同時に記録されている。
【００５４】
（５）加工方式決定部１０５
加工方式決定部１０５では、予め入力されてデータ記憶部１０３で記録されているデータの組を選択する。
【００５５】
本実施形態では、化粧のアイテムであるカラーコンタクトレンズの種類を選択する。選択を促すダイアログの例を図４に示す。
【００５６】
選択画面では可能なカラーコンタクトレンズを示す画像や、色、文字列等を表示する。
【００５７】
（６）画像加工部１０４
画像加工部１０４では、検出された特徴領域と選択された位置関係データ、化粧データに基づいて画像の加工を行う。
【００５８】
本実施形態では、瞳の虹彩領域に選択されたカラーコンタクトレンズの画像を重ねて表示することにする。
【００５９】
カラーコンタクトレンズの化粧データは、第１にレンズ半径と中心からの距離に対応した色と透明度を持つ場合と、第２にレンズの形状及び色やテクスチャー情報を２次元の画像データとして保持する場合が考えられる。
【００６０】
本実施形態では第１の場合について述べる。
【００６１】
まず、検出された瞳の中心を位置関係データに基づいてレンズの中心に対応させ、虹彩輪郭上の各点をレンズの虹彩対応点に対応させ、アフィン変換で対応半径を算出する。
【００６２】
次に、各虹彩内の画素について算出された半径に対応する色や透明度に応じて各画素の値を変更する。例えば、色及び透明度が中心からの距離ｒの関数としてそれぞれＣ（ｒ）、α（ｒ）で表される場合には、元の画素値をＩｏｒｇ（ｒ）とすると、変更された画素値Ｉ（ｒ）は以下の式で表される。
【００６３】
【数１】

【００６４】
このように透明度を考慮して２つの輝度を重ねあわせる方式をαブレンディングという。
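The image for 【数１】 is not reproduced in this text. The following is a minimal Python sketch of α-blending by radius as described above, not the patent's exact formula. It assumes that α(r) is a transparency in [0, 1], where 1 leaves the original pixel unchanged, and that r is normalized by the iris radius. The names `blend_pixel` and `apply_lens` are hypothetical.

```python
def blend_pixel(i_org, color, alpha):
    """Alpha-blend one channel value.

    Assumption: alpha is the lens transparency in [0, 1];
    alpha = 1 keeps the original pixel, alpha = 0 shows only the lens color.
    """
    return alpha * i_org + (1.0 - alpha) * color


def apply_lens(pixels, center, radius, color_of, alpha_of):
    """Blend a lens into every pixel inside the iris circle.

    pixels   : dict mapping (x, y) -> grey value
    color_of : C(r), lens color as a function of normalized radius r in [0, 1]
    alpha_of : alpha(r), lens transparency as a function of r
    """
    cx, cy = center
    out = dict(pixels)
    for (x, y), i_org in pixels.items():
        r = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 / radius
        if r <= 1.0:  # only pixels inside the iris are changed
            out[(x, y)] = blend_pixel(i_org, color_of(r), alpha_of(r))
    return out
```

If the source intends α as opacity rather than transparency, the two weights in `blend_pixel` simply swap.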
【００６５】
（７）画像表示部１０６
画像表示部１０６では、画像加工部１０４で作成された画像をディスプレイ等に表示する。
【００６６】
（第２の実施形態）本発明に関わる第２の実施形態の仮想化粧装置５００について、図５から図８に基づいて説明する。
【００６７】
（１）仮想化粧装置５００の構成
本実施形態の仮想化粧装置５００は、ＰＣにビデオカメラを接続し、加工した画像をネットワークで送信し、通信相手のディスプレイに画像を表示する例である。
【００６８】
具体的には人物の顔領域に対して口紅等のメイクアップを施した画像を表示するものである。
【００６９】
図５は本実施形態の構成を表す図である。
【００７０】
画像入力部５０１は、ビデオカメラであり、２Ｄの動画像を入力する。
【００７１】
特徴検出部５０２では、入力された画像から必要とされる特徴点及び特徴領域を検出する。本実施形態では、瞳位置、鼻孔位置、唇領域とする。
【００７２】
データ記憶部５０３では、対象とするメイクアップのアイテム及び種類毎に色、テクスチャー、透明度、サイズ等の情報を記憶しておく。
【００７３】
加工方式決定部５０５では、記憶されたメイクアップアイテム及び種類、濃さ、グラデーションのパラメータ等を選択することができるインターフェースを提供する。
【００７４】
画像加工部５０４においては、選択されたメイクアップのデータに基づき、画像中の対象者における各メイクアップアイテムの対象領域にメイクアップデータを塗布した状態に加工する。
【００７５】
表示画像送信部５０６では、加工された画像を送信すると共に、その加工された画像を表示する。
【００７６】
各構成部について以下で詳細に述べる。なお、特徴検出部５０２、データ記憶部５０３、画像加工部５０４、加工方式決定部５０５、表示画像送信部５０６の各機能は、ＰＣに記録されているプログラムによって実現される。
【００７７】
（２）画像入力部５０１
画像入力部５０１では、カメラからの人物の顔が撮影された画像を入力する。この部分は第１の実施形態と同様である。
【００７８】
（３）特徴検出部５０２
特徴検出部５０２では最初に瞳位置、虹彩半径、鼻孔位置、口端位置を検出する。それらの検出は第１の実施形態と同様である。
【００７９】
それらの特徴点位置、特徴量から以下の塗布領域及び塗布方法を決定する。
【００８０】
データ記憶部５０３には、各メイクアップのアイテム固有の、色、サイズ等の情報の他に、各メイクアップアイテムの種類（アイシャドウ、チーク等）に対応する、必要な特徴点／領域、及びそれらの特徴点／領域からのメイクアップアイテム塗布位置／領域までの相対座標／距離を与える定数値もしくは関数が記録されている。
【００８１】
本実施形態においては、チークでは必要な特徴点／領域は両瞳中心位置及び鼻孔位置であり、これらの特徴点からチーク塗布領域（楕円で表現される）の中心座標と長径／短径／回転角を算出する関数が記録される。
【００８２】
（３−１）アイシャドウ塗布
アイシャドウ塗布の例を示す。
【００８３】
アイシャドウ領域は図６に示すように検出された瞳位置及び虹彩半径を元に算出された領域（斜線部分）について塗布を行う。この領域は、上側は眉で下側はまぶたの下端で規定され、左右においても眉もしくはまぶたの存在領域に規定される。但し、この存在範囲の抽出はそれほど厳密でなくてよい。
【００８４】
なぜなら、これらのアイシャドウの塗布は通常、塗布されない領域に対してぼかしを入れるため、塗布領域の端部においては、多少位置がずれても表示上はあまり問題にならないためである。
【００８５】
さらに、塗布領域においては、その塗布領域内での元画像の輝度や色の情報を利用して塗布を行うか行わないかを決めることが可能である。例えば、ある一定以上輝度の低いところには塗らないとすればまつげの上等に誤って塗布してしまうことを防ぐことができる。もちろん、直接塗布領域を詳細に抽出しても差し支えない。
【００８６】
（３−２）チーク塗布
次に、チーク塗布の例を示す。
【００８７】
チーク領域は図７に示すように、瞳の位置及び虹彩半径及び鼻孔位置を元に算出された領域について塗布を行う。チークの場合もアイシャドウの場合と同様に、ぼかしを入れたり、元画像の輝度や色の情報によって塗り分けたりすることが可能である。
【００８８】
画素値を決める方法としては、例えば図の中心が（ｘ０，ｙ０）の短径、長径がそれぞれａ，ｂで表される楕円を基準として、中心からの距離をそれらで正規化した距離ｒを（ｘ，ｙ）の点について（２）式で表されるように規定する。
【００８９】
【数２】

【００９０】
このｒを（１）式に当てはめたものを利用する。但し、この場合、Ｃ（ｒ）を（３）式で表されるものとする。
【００９１】
【数３】
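The images for equations (2) and (3) are not reproduced in this text. The sketch below shows one plausible reading of the normalized distance r described in 【００８８】: the offset from the ellipse center divided by the semi-axes, so that r = 1 on the boundary. Pairing a with x and b with y, and ignoring the ellipse rotation, are assumptions. `linear_falloff` is a purely hypothetical opacity profile and is not equation (3).

```python
import math


def normalized_ellipse_distance(x, y, x0, y0, a, b):
    """Distance from (x0, y0) scaled so that r == 1 on the ellipse boundary.

    Assumption: the ellipse is axis-aligned, with a along x and b along y.
    """
    return math.hypot((x - x0) / a, (y - y0) / b)


def linear_falloff(r):
    """Hypothetical opacity profile (NOT equation (3) of the source):
    full strength at the centre, fading linearly to zero at the boundary."""
    return max(0.0, 1.0 - r)
```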

【００９２】
（３−３）口紅塗布
次に、口紅塗布の例を示す。
【００９３】
口紅領域は図８に示すように、口端位置、あるいはそれに加えて鼻孔や瞳位置等の情報を元に、唇の輪郭を抽出し、その輪郭の内部を基本的には塗布する。
【００９４】
但し、その領域内には口を開いた場合には歯や歯茎等が含まれてしまう可能性があるため、内側の輪郭についても抽出するか、もしくは他の輝度や色情報を用いてそれらの領域を無視する必要がある。
【００９５】
唇の輪郭を抽出するには例えば口端の２点を元に、予め取得した一般的な唇輪郭の多数のデータを主成分分析して得られた主成分輪郭を元に初期値を生成し、生成された初期値から輪郭を抽出する。この方式は特願平１０−０６５４３８で開示されている方式である。また、同様に内側の輪郭も抽出し、内側輪郭と外側輪郭の間に対して処理を行ってもよい。
【００９６】
（３−４）その他
加工すべきアイテムは前記の特徴に限らない。例えば特徴としては、肌領域、眉領域、まぶた輪郭等が使用可能である。顔領域を決定すれば、その部分にファンデーションを塗ることが可能である。
【００９７】
また、塗布領域の決定はこれら６個の特徴点位置のみによらなくてもよいし、すべてを使う必要もない。得られた特徴量から必要なもののみを使用してもよいし、入手可能な得られる特徴量の任意の組合せが利用可能である。
【００９８】
さらに、画像を入力して特徴検出部５０２において特徴点を抽出した後に、アイシャドウの塗布、チーク塗布、口紅の塗布を順番に行ってメイクアップを完了しても良い。
【００９９】
（４）表示画像送信部５０６
表示画像送信部５０６はネットワークを通じて作成された画像を送信し、相手方のディスプレイ上に送信された画像を表示する。
【０１００】
（第３の実施形態）本発明に関わる第３の実施形態の仮想化粧装置９００について、図９から図１２に基づいて説明する。
【０１０１】
（１）仮想化粧装置９００の構成
本実施形態の仮想化粧装置９００は、ペインティングやタトゥー等を施したと同等の画像を表示し、ユーザが表示されたいくつかのアイテムや色、形状、位置等のパラメータを選択し、選択された画像と同等の効果が実世界で実現できる型紙や具材を作成したり、予め対応する型紙や具材を準備しておき、ユーザの選択に合わせて提供したりする仮想化粧装置９００に関するものである。
【０１０２】
本実施形態の構成図を図９に示す。
【０１０３】
加工具材配布部９０７以外の構成は、第１の実施形態と同様である。
【０１０４】
但し、特徴検出部９０２においては、ペインティングを施すために、顔の向きの情報が不可欠であるため、顔向き検出を導入する。
【０１０５】
（２）顔向き検出方法
顔の向きは目鼻等の特徴点の位置関係から簡易的に求めることもできるし、顔領域のパターンを使って求めることも可能である。ペインティングを施した画像の例を図１０に示す。
【０１０６】
文献としては、例えば、特徴点を利用した手法として特開２００１−３３５６６６、パターンを利用した手法として特開２００１−３３５６６３、文献３（山田、中島、福井「因子分解法と部分空間法による顔向き推定」信学技報ＰＲＭＵ２００１−１９４，２００１）があげられる。
【０１０７】
ここでは特開２００１−３３５６６６で提案された特徴点の位置座標から顔の向きを計算する方法について簡単に説明する。
【０１０８】
まず、顔の映っている画像（互いに異なる顔向きの画像が三フレーム以上）から、目鼻口端等の特徴点（四点以上）を検出する。
【０１０９】
その特徴点位置座標から、因子分解法（Tomasi, C. and T. Kanade: Technical Report CMU-CS-91-172, CMU (1991); International Journal of Computer Vision, 9:2, 137-154 (1992)）を用いて特徴点の３次元座標を求める。
【０１１０】
この特徴点の３次元座標を要素にもつ行列を形状行列Ｓとして保持しておく。向きを求めたい顔の映っている画像が入力されたら、その画像から特徴点を検出し、その特徴点の２次元座標を要素にもつ計測行列Ｗｎｅｗに形状行列Ｓの一般化逆行列を掛ければ、顔の向きを表す運動行列Ｍｎｅｗが求まり、ロール・ピッチ・ヨー等の回転角がわかる。運動行列Ｍｎｅｗを計算する様子を図１１に示す。なお、図１１では、形状行列Ｓの一般化逆行列をＳの上に＋をつけた記号で表示している。
【０１１１】
この図では、カメラを固定して顔を動かす様子を相対的に、顔を固定してカメラ向きが変化するとみなしている。特徴点座標は特徴検出部９０２で検出されたものを利用すればよく、向き自体は行列の掛け算を一度するだけで求まるため、非常に高速な手法であり、フレーム毎に独立に計算できるため誤差が蓄積することもない。
【０１１２】
このようにして求まる回転角等を元に、ペインティングアイテムにアフィン変形を施し顔画像にマッピングすれば、顔向きに合致した自然なペインティング画像が得られる。
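The computation in 【０１１０】 can be written compactly as follows. The matrix sizes are an assumption based on the usual orthographic factorization setting: with P ≥ 4 feature points, the shape matrix S is 3 × P, the measurement matrix W_new of one frame is 2 × P, and the motion matrix M_new is 2 × 3. The closed form for the generalized inverse holds only when S has full row rank.

```latex
M_{\mathrm{new}} = W_{\mathrm{new}}\, S^{+},
\qquad
S^{+} = S^{\top}\left(S S^{\top}\right)^{-1}
```

Because S is fixed after the initial factorization, S⁺ can be computed once and reused, so each frame costs a single matrix product, which matches the speed argument in 【０１１１】.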
【０１１３】
（３）加工具材配布部９０７
加工具材配布部９０７では図１０のような画像を実世界で実現するための具材を配布する。
【０１１４】
具材としては、図１２に示すような型紙を生成する。
【０１１５】
型紙にはペイントを行う部分には穴が開いていて、簡単に塗りつぶしを行うことができる。
【０１１６】
また、型紙には図中×で表すような基準点を設定することで誰でも簡単に位置あわせができる。
【０１１７】
また、型紙は１枚である必要はなく、必要に応じて複数枚生成することで、複雑な形状や、何色も色が存在する場合にも対応可能である。
【０１１８】
（第４の実施形態）
本発明に関わる第４の実施形態の仮想化粧装置９００について、図１３から図１４に基づいて説明する。
【０１１９】
本実施形態では、デジタルメイクを行う際の照明環境を考慮した例について述べる。処理の流れを図１３に示す。
【０１２０】
照明環境の記述法は、光源の位置、方向、色、強度等を指定でき、一般のＣＧで取り扱われているような点光源、面光源を複数用意することで、複雑な光源のシミュレートや、屋外での太陽光等をシミュレートする。
【０１２１】
顔の各部分での反射率はＢＲＤＦ（双方向反射関数）等の関数を用いて記述しておき、記述された照明環境に基づいて各画素の輝度値を計算し、表示を行う。
【０１２２】
次に、カメラにて実際の人物の顔を撮影する際の照明環境を取得し、照明環境をコントロールする方法について述べる。図１３のようなフローチャートを用いて行う。
【０１２３】
まず、顔の３次元モデルを、複数のカメラによる取得、もしくは、単一のカメラにてStructure from Motion技術の手法を用いて作成する。
【０１２４】
また、文献４（牧「Ｇｅｏｔｅｎｓｉｔｙ拘束による３次元形状獲得−複数光源への対応−」電子情報通信学会パターン認識・メディア理解研究会研究報告、ＰＲＭＵ９９−１２７，１９９９）等の方法を用いてもよい。
【０１２５】
作成した３次元モデルに対して、３次元モデルを構成する複数の面情報について、各面の法線方向等を求めておき、各面についての反射の様子から、光源を推定するために必要な情報を得る。
【０１２６】
光源の推定方法は、文献５（西野、池内、張「疎な画像列からの光源状況と反射特性の解析」、文化財のデジタル保存自動化手法開発プロジェクト平成１３年度成果報告書、ｐｐ．８６−５０１）のように、各面での鏡面反射の情報を用いて、複数の点光源方向を求めるものや、文献６（佐藤いまり、佐藤洋一、池内克史、“物体の陰影に基づく光源環境の推定、”情報処理学会論文誌：コンピュータビジョンとイメージメディア「Physics-based VisionとCGの接点」特集号，Ｖｏｌ．４１，Ｎｏ．ＳＩＧ１０（ＣＶＩＭ１），ｐｐ．３１−４０，December 2000．）のように影の情報をもちいた光源位置の推定法でもよい。
【０１２７】
図１４（ａ）のように、ある環境で撮影した顔に対して、図１４（ｂ）のように別の環境での照明条件での画像を作成する場合を考える。
【０１２８】
推定した光源の情報を用いて、それらの光源の方向、数、強度等を抑制することによって、３次元モデルに対しての照明環境を変化させ、様々な環境における画像を生成する。
【０１２９】
また、３次元モデルを陽に用いない方法も考えられる。図１４（ａ）における撮影時の照明条件を考慮して、影、鏡面反射等の特徴的な部分を取り除いた画像を作成する。そして、図１４（ｂ）の別の環境での照明条件を考慮し、影、鏡面反射等を画像に付加することで、別の照明条件での画像を得る。
【０１３０】
また、合成したい、これらの照明環境については、予めいくつかの照明パターンを用意しておいてもよい。例として、「屋内（自宅）」「屋内（レストラン）」「屋外（晴れ）」「屋外（くもり）」等、環境、場所、時間等に応じた照明環境を作成し、合成してもよい。
【０１３１】
以上のように撮影した環境と違った照明環境の画像を作成することにより、あたかも別の場所で撮影したかのような顔画像を仮想的に作成することができる。
【０１３２】
（第５の実施形態）以下、図面を参照して本発明の第５の実施形態の仮想化粧装置について説明する。
【０１３３】
人は化粧をする時に、鏡で自分の顔を見ながら行うことが多い。人は化粧をする時、顔を鏡に近づけたり、あるいは、手鏡等、鏡を動かせる場合は鏡を顔に近づけたりして、顔の見たい部分を拡大して見ようとする。
【０１３４】
本実施形態の仮想化粧装置は、カメラで撮影した利用者の顔の画像から検出した複数の特徴点から特定の領域（例えば瞼、唇）を決定し、その領域に対してメイクアップを行う。さらには、顔の動きを検知して当該領域を基準に顔を拡大表示する。
【０１３５】
尚、以下の説明では瞼にアイシャドウを擬似的に塗布する場合を例に説明を行う。
【０１３６】
図２０は本実施形態の仮想化粧装置を実現するためのパーソナルコンピュータ（ＰＣ）の構成の一例を説明する図である。
【０１３７】
このＰＣは、プロセッサ２００１と、メモリ２００２と、磁気ディスクドライブ２００３と、光ディスクドライブ２００４とを備える。
【０１３８】
更に、ＣＲＴやＬＣＤ等の表示装置２００８とのインターフェース部に相当する画像出力部２００５と、キーボードやマウス等の入力装置２００９とのインターフェース部に相当する入力受付部２００６と、外部装置２０１０とのインターフェース部に相当する出入力部２００７とを備える。
【０１３９】
出入力部２００７としては、例えばＵＳＢ（Universal Serial Bus）やＩＥＥＥ１３９４等のインターフェースがある。
【０１４０】
本実施形態の仮想化粧装置では、画像を入力するためのビデオカメラは外部装置２０１０に相当する。
【０１４１】
磁気ディスクドライブ２００３には、入力された画像に対してメイクアップ等の画像処理を行うプログラムが格納されている。プログラムを実行する際に磁気ディスクドライブ２００３からメモリ２００２に読み込まれ、プロセッサ２００１においてプログラムが実行される。
【０１４２】
ビデオカメラで撮像された動画像は出入力部２００７を経由してメモリ２００２にフレーム単位で順次記憶される。プロセッサ２００１は記憶された各フレームに対して画像処理を行うとともに、利用者が本装置を操作するためのＧＵＩの生成も行う。プロセッサ２００１は処理した画像及び生成したＧＵＩを画像出力部２００５を経由して表示装置２００８に出力する。表示装置２００８は画像及びＧＵＩを表示する。
【０１４３】
図２１は、本実施形態の仮想化粧装置の機能ブロックを説明する図である。
【０１４４】
本装置は、動画像を入力するビデオカメラである画像入力部２１０１と、入力された画像から人物の顔の特徴点及び特徴領域を検出する特徴検出部２１０２と、化粧データ及び位置関係データを記憶するデータ記憶部２１０３を備える。
【０１４５】
化粧データとは、化粧のアイテムの一つであるアイシャドウの種類毎の色、テクスチャー、透明度のデータである。位置関係データとは、顔の特徴点及び特徴領域とアイシャドウの塗布領域との位置関係を示すデータである。
【０１４６】
さらに、利用者が予めデータ記憶部２１０３に記憶させてあるアイシャドウの種類を選択するためのＧＵＩを提供する加工方式決定部２１０５と、瞼の部分に選択されたアイシャドウの色を重ねて擬似的に塗布した画像を生成する画像加工部２１０４と、加工された画像を表示するＬＣＤやＣＲＴ等の表示装置である表示部２１０６と、顔の大きさ・位置・傾きを推定し、推定に基づいて表示部２１０６を制御して拡大表示させる大きさ・位置情報管理部２１０７を備える。
【０１４７】
以下、各部について説明する。
【０１４８】
（画像入力部２１０１）画像入力部２１０１は、ビデオカメラから対象となる人物の顔が映っている動画像を入力する。ビデオカメラとしては、例えばＵＳＢ（Universal Serial Bus）接続のカメラや、デジタルビデオカメラ等の一般的なカメラを用いればよい。入力された動画像は、特徴検出部２１０２に逐次出力される。
【０１４９】
（特徴検出部２１０２）特徴検出部２１０２は、入力された画像から顔の特徴領域を検出する。例えば、瞼にアイシャドウを塗る場合は、図３に示した瞳の虹彩領域の輪郭３０１を検出する。
【０１５０】
瞳の虹彩領域輪郭検出は第１の実施形態と同様に文献１に示す手法で行う。検出処理の流れは、特徴点を検出して、特徴点周辺の画素情報を部分空間法により照合していくというものである。以下、概要を図２２に基づいて説明する。
【０１５１】
（Ｓ２２０１分離度マップ生成）
入力画像全体の各画素毎に図２３（Ａ）に示す円形分離度フィルターの出力値を求め、分離度マップを生成する。分離度とは２つの領域の間の画素情報の分離度を表す指標である。本実施形態では、領域２３０１と領域２３０２との間の分離度を次式を用いて求める。
【０１５２】
【数４】
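The image for 【数４】 is not reproduced in this text. The sketch below assumes it is the usual separability measure used with the separability filter: the between-region variance divided by the total variance of all pixels in the two regions (here regions 2301 and 2302). The function name and the list-of-luminances input are hypothetical.

```python
def separability(region1, region2):
    """Separability between two pixel sets, a value in [0, 1].

    eta = sigma_b^2 / sigma_T^2, where sigma_b^2 is the between-region
    variance and sigma_T^2 the total variance of all pixels (both unnormalized).
    """
    n1, n2 = len(region1), len(region2)
    all_pixels = list(region1) + list(region2)
    mean_all = sum(all_pixels) / (n1 + n2)
    mean1 = sum(region1) / n1
    mean2 = sum(region2) / n2
    sigma_b = n1 * (mean1 - mean_all) ** 2 + n2 * (mean2 - mean_all) ** 2
    sigma_t = sum((p - mean_all) ** 2 for p in all_pixels)
    if sigma_t == 0:
        return 0.0  # flat patch: no separation at all
    return sigma_b / sigma_t
```

A value near 1 means the two regions have clearly different luminance, as expected for a dark iris surrounded by brighter sclera and skin.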

【０１５３】
尚、入力画像全体について分離度マップを求めるのではなく、例えばテンプレートマッチングや背景差分法等の画像中における顔の領域を特定する前処理を行った上で分離度マップを求めても良い。
【０１５４】
（Ｓ２２０２特徴点候補抽出）
生成した分離度マップにおいて分離度が局所最大値になる点を特徴点候補とする。
【０１５５】
局所最大値を求める前に分離度マップに対して適宜平滑化等の処理を行っておくと、ノイズの影響を抑制できるので良い。
【０１５６】
（Ｓ２２０３パターン類似度算出）
各特徴点候補の近傍から、分離度フィルタの半径ｒに応じた局所正規化画像を取得する。正規化は半径に応じたアフィン変換で行う。正規化画像と予め瞳周辺の画像から作成しておいた辞書（左の瞳、右の瞳）との類似度を部分空間法により算出する。
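As a rough illustration of the subspace-method similarity mentioned above, the sketch below computes the squared cosine of the angle between a normalized-image vector and a dictionary subspace. It assumes the dictionary is given as a list of orthonormal basis vectors, for example leading eigenvectors of training patterns. The source does not specify the data layout, so the function and argument names are hypothetical.

```python
def subspace_similarity(vector, basis):
    """Squared cosine between `vector` and the subspace spanned by `basis`.

    Assumption: `basis` is a list of orthonormal vectors (for example the
    leading eigenvectors obtained from dictionary images).
    """
    norm_sq = sum(v * v for v in vector)
    if norm_sq == 0:
        return 0.0
    projected = sum(
        sum(v * b for v, b in zip(vector, axis)) ** 2 for axis in basis
    )
    return projected / norm_sq
```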
【０１５７】
（Ｓ２２０４特徴点統合）
各特徴点候補の位置、分離度フィルタの半径、類似度の値に基づいて、正しい瞳の位置を検出する。
【０１５８】
左の瞳及び右の瞳それぞれについて、Ｓ２２０３パターン類似度算出で求めた類似度の高い順に所定数の特徴点候補若しくは類似度が所定の閾値の以上となる特徴点候補を抽出する。抽出された左右の瞳の特徴点候補を組み合わせて左右の瞳候補を生成する。
【０１５９】
生成した左右の瞳候補を予め想定した条件と照合して、条件に合致しないものを候補から除外する。条件とは、例えば左右の瞳の間隔（異常に狭い・広い場合は条件外）や、左右の瞳を結ぶ線分の方向（画像中で垂直方向に伸びるような場合は条件外）等である。
【０１６０】
残った左右の瞳候補のうち、左右それぞれの瞳の類似度の和がもっとも高い組み合わせを左右の瞳とする。
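The integration step in 【０１５８】 to 【０１６０】 can be sketched as follows. Candidate pairs that violate the distance or tilt conditions are discarded, and the remaining pair with the largest similarity sum is kept. The candidate tuple layout and the parameters `min_dist`, `max_dist`, and `max_tilt_deg` are hypothetical stand-ins for the conditions assumed in advance.

```python
import math


def select_pupil_pair(left_cands, right_cands, min_dist, max_dist, max_tilt_deg):
    """Pick the (left, right) candidate pair with the highest similarity sum.

    Each candidate is a tuple (x, y, similarity).
    Returns None when no pair satisfies the conditions.
    """
    best, best_score = None, -math.inf
    for lx, ly, ls in left_cands:
        for rx, ry, rs in right_cands:
            dist = math.hypot(rx - lx, ry - ly)
            if not (min_dist <= dist <= max_dist):
                continue  # pupils implausibly close or far apart
            tilt = abs(math.degrees(math.atan2(ry - ly, rx - lx)))
            tilt = min(tilt, 180.0 - tilt)  # angle to the horizontal axis
            if tilt > max_tilt_deg:
                continue  # line between the pupils is too close to vertical
            if ls + rs > best_score:
                best, best_score = ((lx, ly, ls), (rx, ry, rs)), ls + rs
    return best
```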
【０１６１】
（Ｓ２２０５詳細検出）
瞳の輪郭をさらに正確に求め、瞳の正確な位置を求める。このステップでは文献６（Yuasa et al. "Precise Pupil Contour Detection Based on Minimizing the Energy of Pattern and Edge", Proc.IAPR Workshop on Machine Vision Applications, pp. 232-235, Dec. 2002）の手法を利用する。以下では、この手法の概要を説明する。
【０１６２】
Ｓ２２０４特徴点統合で得られた瞳位置とその瞳を検出する際に用いた分離度フィルタの半径とを利用して、瞳の輪郭を円形状と仮定して初期形状を作成する。作成した初期形状は楕円パラメータで表現しておく。
【０１６３】
例えば中心座標が（ｘ₀，ｙ₀）で半径がｒならば、楕円パラメータは、長短径をａ，ｂ、回転角をθとして、（ｘ，ｙ，ａ，ｂ，θ）＝（ｘ₀，ｙ₀，ｒ，ｒ，０）と表すことができる。ｘ、ｙは中心位置、ａ、ｂは楕円の長径、短径、θは目の傾きである。
【０１６４】
まず、これらの楕円パラメータを初期値として、ガイドパターンを用いて楕円パラメータを大まかに求める。図２４はガイドパターンについて説明する図である。
【０１６５】
ここで用いるガイドパターンは、基準となる瞳周辺の画像（正解パターン）に対して、瞳の半径、瞳の中心位置等のパラメータを様々に変えた画像を生成して主成分分析により作成した辞書である。ここでは楕円パラメータ等を様々に変えて作成した図２５に示す１１種類の辞書を用いる。尚、正解パターンは利用者自身のものでなくても良い。
【０１６６】
辞書との照合は部分空間法により行う。複数の辞書の中から最も類似度が高いものを求める。求めた辞書を作成した時のパラメータが、入力した瞳の画像に関するパラメータに近いと推定される。
【０１６７】
ガイドパターンを利用して推定した楕円パラメータと入力画像とを用いて、楕円パラメータの値を変えながらエッジエネルギーとパターンエネルギーとを算出する。
【０１６８】
エッジエネルギーとは、当該楕円パラメータを持つ楕円形状の分離度フィルタ（図２３（Ｂ））を用いて求めた分離度の値の符号を負にしたものである。分離度の値の計算には、前述の数４を用いる。
【０１６９】
パターンエネルギーとは、当該楕円パラメータを基準にして瞳近傍の縮小画像と、予め準備した正解画像及び楕円パラメータから作成した辞書との類似度の値の符号を負にしたものである。辞書との類似度は、部分空間法を用いて求める。
【０１７０】
このステップでは、楕円パラメータを変化させて上述の二つのエネルギーの和が局所的に最小となる時の楕円パラメータの組を求めることにより、正確な瞳の輪郭を求める。
【０１７１】
以上のステップにより正確な瞳の輪郭が求まる。瞳の位置は、輪郭の中心として求めることができる。
【０１７２】
（大きさ・位置情報管理部２１０７）大きさ・位置情報管理部２１０７では、特徴検出部２１０２で求めた瞳の位置、瞳の輪郭半径（楕円パラメータ）から、現在の顔の大きさ、位置、傾きを推定する。そして、顔の大きさの変化率から顔を表示する際の拡大率を求める。
【０１７３】
顔の大きさ及び傾きは、例えば両瞳を結ぶ線分の長さや傾きから推定する。さらに、左右の瞳の輪郭半径の比率からは顔の回転（首を中心とした回転）を推定する。
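A minimal sketch of the size and tilt estimate described above, assuming pupil centers are given as (x, y) image coordinates. The inter-pupil distance serves as the face-size measure, and the angle of the connecting segment serves as the in-plane tilt. The function name is hypothetical.

```python
import math


def face_scale_and_tilt(left_pupil, right_pupil):
    """Return (inter-pupil distance, in-plane tilt in degrees) from pupil centres."""
    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))
```

Dividing the distance in the current frame by that of an earlier frame gives a size change rate from which a display magnification could be derived.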
【０１７４】
推定した情報をもとにして、利用者が注目している領域を中心に拡大表示するように、拡大の中心位置と拡大率とを画像加工部２１０４に通知する。顔において仮想的に化粧を行っている部位（目、頬、唇等）のうち、画像の中心位置に最も近い位置にあると推定された部分を含む領域を、利用者が注目している領域とする。
【０１７５】
拡大の際に、例えば、顔全体が大きくなりつつある場合は、実際の顔の大きさの変化率を上回る拡大率で拡大するように通知する。このようにすることで、カメラに接近しすぎることを防ぐことができる。図２８は拡大表示の一例を説明する図である。最初は顔全体が表示されていても、顔の大きさの変化率から利用者の接近を検出し、利用者が接近しすぎる前に、例えば目を拡大表示させている。
【０１７６】
人物がカメラに接近し過ぎた状態になると顔全体を撮影することが困難になる。結果として顔の向きや大きさを推定することが困難になる。
【０１７７】
このような場合は、画面中で注目領域を追跡する。追跡には前述したガイドパターンを応用する。
【０１７８】
前述したガイドパターンは正解パターンから位置及び瞳半径がズレた状態のパターンであるので、これを用いれば現在の位置及び形状におけるパターンがどの「ズレ方」であるかを推定できる。すなわち、前のフレームにおける「ズレ方」と今のフレームにおける「ズレ方」との違いから、追跡対象となる領域がどのように運動しているかを推定することができる。
【０１７９】
例えば左右に位置がずれていれば左右の方向に運動していることが分かる。大きさが大きくなっていれば近づいていることがわかる。さらに変形具合から３次元的な運動の推定も可能である。
【０１８０】
図２９は、人物の顔位置がカメラに対して画面内で時間ｄｔの間にｘ方向に−ｄｘ移動した場合の例を説明する図である。時刻ｔ０における領域２９０１と同じ位置にある、時刻ｔ０＋ｄｔにおける領域は、領域２９０２である。ところが、顔が移動したため、領域２９０２と同じ（最も類似する）パターンを持つ、時刻ｔ０における領域は、領域２９０３になる。
【０１８１】
よって、時刻ｔ０において、ガイドパターンを用いて領域２９０１のパターンの正解パターンからの「ズレ」を調べておく。そして、時刻ｔ０＋ｄｔにおいて、ガイドパターンを用いて領域２９０２のパターンの正解パターンからの「ズレ」を調べる。両時刻における「ズレ」の差から移動量を推測することができる。
【０１８２】
（画像加工部２１０４）画像加工部２１０４においては、特徴検出部２１０２において検出された瞳の位置を利用して画像の加工を行なう。また、大きさ・位置情報管理部２１０７から通知された拡大率及び拡大の中心位置に基づいて画像を拡大する加工を行う。
【０１８３】
以下の説明では、図２６を参照して、右眼の瞼にアイシャドウを仮想的に塗布した画像を作成する場合を例に説明する。
【０１８４】
アイシャドウを仮想的に塗布する手法の概略は次のようになる。瞳の検出された領域、両瞳の位置関係、事前知識として得られる瞳の大きさと目頭及び目尻位置等の関係から、大まかな瞼領域として図２６における矩形領域２６０１を推定する。矩形領域２６０１内の輝度情報から、瞼に相当する領域、すなわち本領域中で眉や瞳でない部分、を決定する。瞼の領域に対して、アイシャドウを塗布する加工を施す。
【０１８５】
矩形領域２６０１は以下のように決定する。
【０１８６】
矩形領域２６０１の下底が、特徴検出部２１０２で検出した瞳の中心２６０２を通るように設定する。矩形領域２６０１の左右の辺は目頭２６０４や目尻２６０３の位置を検出して求めることもできるが、本実施形態では瞳の中心２６０２から目尻寄りに瞳の半径の例えば２．５倍の位置を矩形領域左下端部２６０５と仮定し、瞳の中心２６０２から目頭寄りに瞳の半径の例えば３倍の位置を右下端部２６０６と仮定する。左下端部２６０５を通り先ほど設定した下底と垂直に交わる線と右下端部２６０６を通り下底と垂直に交わる線とを左右の辺とする。
【０１８７】
矩形領域の上底は、下底から瞳の半径の例えば４倍の距離だけ上方に設定する。左右の辺との交点を、左上端部２６０８、右上端部２６０９とする。
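The construction of rectangle 2601 can be sketched as follows, using the example factors quoted in the text (2.5, 3, and 4 times the pupil radius). The sketch assumes image coordinates with y increasing downward, an upright face, and a right eye whose outer corner lies on the left side of the image. The function name and the returned dictionary keys are hypothetical.

```python
def eyelid_search_rect(pupil_center, pupil_radius,
                       outer_factor=2.5, inner_factor=3.0, height_factor=4.0):
    """Corner points of the rough eyelid rectangle for the right eye.

    Assumptions: y grows downward, the face is upright, and the outer eye
    corner is on the left side of the image.
    """
    cx, cy = pupil_center
    left = cx - outer_factor * pupil_radius    # toward the outer corner
    right = cx + inner_factor * pupil_radius   # toward the inner corner
    bottom = cy                                # lower edge passes through the pupil centre
    top = cy - height_factor * pupil_radius    # upper edge, above the eye
    return {
        "bottom_left": (left, bottom),
        "bottom_right": (right, bottom),
        "top_left": (left, top),
        "top_right": (right, top),
    }
```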
【０１８８】
アイシャドウの塗布領域２６１０の決定方法について述べる。
【０１８９】
まずはじめに、矩形領域２６０１全体において、アイシャドウの塗布領域２６１０とその他の領域間の輝度を分ける閾値を決定する。
【０１９０】
矩形領域２６０１における輝度分布は図２７のようなヒストグラムとして表わすことができる。矩形領域２６０１に含まれる領域は輝度の大きさによって大別して３つの領域に分割される。もっとも輝度の低い領域２７０１は主に虹彩（瞳）内部、まつ毛、眉毛等であり、もっとも輝度の高い領域２７０３は白目領域であり、残った中間の領域２７０２が瞼、すなわちアイシャドウを塗布すべき領域となる。
【０１９１】
矩形領域２６０１の輝度分布のヒストグラムに判別分析法を適用して、輝度を３分割する閾値Ｔｈ１及びＴｈ２を決定する（Ｔｈ１＜Ｔｈ２とする）。判別分析法は、判別基準としてクラス間分散σ_B ²が最大になるような閾値を決定する方法である。分割による分離度を表わす指標としては、次式で表わされるものを用いることができる。ここではこの値が極大となる時の閾値をＴｈ１およびＴｈ２とする。
【０１９２】
【数５】
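The image for 【数５】 is not reproduced in this text. The sketch below assumes the standard discriminant-analysis criterion, the ratio of between-class variance to total variance, and finds the two thresholds by exhaustive search over a luminance histogram. Because the total variance does not depend on the thresholds, maximizing the between-class variance is equivalent. The function name and the histogram layout are hypothetical.

```python
def two_thresholds(hist):
    """Exhaustive 3-class discriminant analysis on a luminance histogram.

    hist[v] is the number of pixels with luminance v. Returns (th1, th2) with
    th1 < th2; classes are v <= th1, th1 < v <= th2 and v > th2.
    """
    levels = len(hist)
    total = sum(hist)
    mean_all = sum(v * h for v, h in enumerate(hist)) / total
    # Prefix sums give each class weight and first moment in O(1).
    cum_n, cum_m = [0], [0]
    for v, h in enumerate(hist):
        cum_n.append(cum_n[-1] + h)
        cum_m.append(cum_m[-1] + v * h)

    def class_term(lo, hi):
        """Weighted squared deviation of the class covering levels lo .. hi-1."""
        n = cum_n[hi] - cum_n[lo]
        if n == 0:
            return 0.0
        mean = (cum_m[hi] - cum_m[lo]) / n
        return n * (mean - mean_all) ** 2

    best, best_var = None, -1.0
    for th1 in range(levels - 2):
        for th2 in range(th1 + 1, levels - 1):
            var = (class_term(0, th1 + 1) + class_term(th1 + 1, th2 + 1)
                   + class_term(th2 + 1, levels))
            if var > best_var:
                best, best_var = (th1, th2), var
    return best
```

For a 256-level histogram the double loop evaluates about 32,000 threshold pairs, which is small enough for a per-frame rectangle.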

【０１９３】
次に、矩形領域２６０１を上下に２等分する線分２６１１を設定する。線分２６１１上の各点から上下方向にそれぞれ塗布領域２６１０の探索を行う（図中、中抜きの矢印で表わされた方向）。
【０１９４】
探索は次の手順で行なう。上半分については眉との境界が問題であるから、先に求めた２つの閾値のうち低い方、すなわち閾値Ｔｈ１を使用する。線分２６１１上の各点から上方向に移動し、その輝度値がＴｈ１より高ければさらに上へ移動する。輝度値がＴｈ１より低ければ眉領域に達したと判断して探索をやめる。本処理を各点について行なうと、塗布領域２６１０の上側境界が求まることになる。
【０１９５】
下半分についても同様である。下半分には輝度の高い部分である白目が存在するので、輝度値が閾値Ｔｈ２を超えないことを条件とする。また、虹彩領域に入ることも妥当でないので、輝度値が閾値Ｔｈ１より大きいことも条件とする。
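The boundary search in 【０１９４】 and 【０１９５】 can be sketched per image column as below. `column` is a hypothetical list of luminance values for one column, indexed by row with row 0 at the top, and `start_row` is the row of the bisecting segment 2611.

```python
def search_upper_boundary(column, start_row, th1):
    """Walk upward from start_row while the luminance stays above th1.

    Stops when the next row is darker than th1 (taken to be the eyebrow).
    Returns the topmost row still considered eyelid.
    """
    row = start_row
    while row > 0 and column[row - 1] > th1:
        row -= 1
    return row


def search_lower_boundary(column, start_row, th1, th2):
    """Walk downward while the luminance stays above th1 and does not exceed th2.

    Stops at the white of the eye (above th2) or at the iris or lashes (th1 or below).
    """
    row = start_row
    while row + 1 < len(column) and th1 < column[row + 1] <= th2:
        row += 1
    return row
```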
【０１９６】
さらに、目頭２６０４と目尻２６０３の位置を検出するために、白目のラインと矩形領域２６０１の下底との交点を求める。矩形領域２６０１の右下端部２６０６から左に向かって輝度値を調べる。輝度値が閾値Ｔｈ２を超える点を目頭２６０４とする。同様に矩形領域２６０１の左下端部２６０５から右に向かって輝度値を調べる。輝度値が閾値Ｔｈ２を超える点を目尻２６０３とする。
【０１９７】
矩形領域２６０１の左の辺と線分２６１１との交点２６１２を求める。矩形領域２６０１のうち交点２６１２と目尻２６０３とを結ぶ線分より下にある領域は塗布領域２６１０には含めない。同様に、矩形領域２６０１の右の辺と線分２６１１との交点２６１３を求める。矩形領域２６０１のうち交点２６１３と目頭２６０４とを結ぶ線分より下にある領域は塗布領域２６１０には含めない。
【０１９８】
このようにして、塗布領域２６１０を求めることができる。画像にノイズがある場合、境界線が不連続になることが考えられる。そこで、あらかじめ矩形領域２６０１内の輝度を平滑化するか、求められた境界を平滑化するか、或いは、隣接する境界点間に拘束条件を設ける等の処理でそのような状態を避けると良い。本実施形態では、メディアンフィルタを用いて平滑化する。
【０１９９】
上述のようにして設定した塗布領域２６１０に対してアイシャドウを仮想的に塗布する加工を施す。第１の実施形態と同様にαブレンディングを用いて加工する。本実施形態では次式を用いる。
【０２００】
【数６】

【０２０１】
上式の（ｘ，ｙ）は瞳の中心２６０２からの相対的な位置である。Ｄ１は瞳の半径の３倍で、Ｄ２は瞳の半径の４倍の値とする。
【０２０２】
尚、顔が傾いている場合は、人物の顔向きを検出し、それに従って前述の矩形領域２６０１をアフィン変換等で変換して処理を行い、最後に逆変換すればよい。
【０２０３】
（データ記憶部２１０３、加工方式決定部２１０５、表示部２１０６）データ記憶部２１０３、加工方式決定部２１０５及び表示部２１０６は第１の実施形態と同様である。
【０２０４】
（第５の実施形態の効果）以上に説明したように、本実施形態の仮想化粧装置ならば、利用者がカメラに近づいたのを検出し、利用者が見ようとしていると推定される部位を拡大表示させることができる。
【０２０５】
（第６の実施形態）本実施形態の仮想化粧装置は、カメラで複数の人物の顔が撮像された場合に、各々の人物に対して異なる加工を行えるようにしたものである。以下、第５の実施形態と異なる部分を中心に説明する。
【０２０６】
図３０は本実施形態の仮想化粧装置の構成を説明する図である。本装置は、動画像を入力するビデオカメラである画像入力部２１０１と、入力された画像から人物の顔の特徴点及び特徴領域を検出する特徴検出部２１０２と、入力された画像中から人物の顔に相当する領域を検出する顔領域検出部３００１と、検出された顔画像情報から顔認識用の辞書を生成して記憶しておく顔辞書生成記憶部３００２と、化粧データ及び位置関係データを記憶するデータ記憶部２１０３を備える。
【０２０７】
化粧データとは、化粧のアイテムの一つであるアイシャドウの種類毎の色、テクスチャー、透明度のデータである。位置関係データとは、顔の特徴点及び特徴領域とアイシャドウの塗布領域との位置関係を示すデータである。
【０２０８】
さらに、利用者が予めデータ記憶部２１０３に記憶させてあるアイシャドウの種類を選択するためのＧＵＩを提供する加工方式決定部２１０５と、瞼の部分に選択されたアイシャドウの色を重ねて擬似的に塗布した画像を生成する画像加工部２１０４と、加工された画像を表示するＬＣＤやＣＲＴ等の表示装置である表示部２１０６と、顔の大きさ・位置・傾きを推定し、推定に基づいて表示部２１０６を制御して拡大表示させる大きさ・位置情報管理部２１０７を備える。
【０２０９】
本装置では、入力された画像から顔領域検出部３００１が顔の領域を検出する。検出された顔の領域と特徴検出部２１０２で得られる情報とを用いて、大きさ・位置情報管理部２１０７は人物の顔が画像中のどの位置にどの大きさで何個あるかを推定する。
【０２１０】
そして、時間的・空間的に連続した位置にある顔については同一人物と推定し、大きさ・位置の情報に識別子を割り当てて記憶する。画像加工部２１０４は識別子を利用して人物毎に異なる加工を行うとともに、行った加工の内容を識別子とともにデータ記憶部２１０３に記憶させておく。
【０２１１】
大きさ・位置情報管理部２１０７は、一定数以上同一人物の画像が得られた場合、顔領域検出部３００１に対して顔認識を行うように通知を行い、識別子と顔の位置とを出力する。
【０２１２】
顔領域検出部３００１は通知を受けて、顔辞書生成記憶部３００２に、識別子と顔辞書の生成を行うのに必要な顔領域の画素情報を出力する。また、顔領域検出部３００１は顔認識を行って識別子による人物特定を行う。
【０２１３】
本実施形態では、顔認識は文献７（山口他、「動画像を用いた顔認識システム」、信学技報ＰＲＭＵ９７−５０、１９９７）で提案されている相互部分空間法を利用する。そのため、顔辞書の生成は、顔領域の画素情報から特徴ベクトルを求め、特徴ベクトルを主成分分析した結果の上位の固有ベクトルで張られる部分空間を求めることに対応する。
【０２１４】
顔領域検出部３００１における顔認識では、顔辞書生成と同様にして入力画像の顔領域から特徴ベクトル若しくは部分空間を求め、顔辞書生成記憶部３００２に記憶されている顔辞書との類似度を算出する。
【０２１５】
顔認識を行うことによって、画像中に複数人の顔が同時に映っていても正確に人物を識別して各個人に対して加工を行うことができる。
【０２１６】
尚、人物が一度画像から消えてから再び画像中に出現した場合等でも、顔領域検出部３００１で顔辞書と照合を行えば同一人物かを識別することができる。そして、顔辞書に登録された人であれば、以前に割り当てられた識別子を用いてデータ記憶部２１０３に格納された加工情報を検索し、以前に行った加工を再現することが可能である。従って、本実施形態ならば、追跡中に一度見失っても再び追跡が可能となる。
【０２１７】
（変形例）
（１）変形例１
第２の実施形態において、このように化粧等の変形を施した場合、受信する側において、送信側の人物の顔が変わりすぎて、送信側人物を特定できなくなるといった問題が起こる可能性がある。
【０２１８】
そこで、受信側のセキュリティを保つために、化粧等の変形を施す前の画像で顔による個人認証を行い、その結果を同時に送信する。そうすることで、確かに送信側人物がその人であるとわかるので、受信側人物が安心して会話できる。化粧等の変形を施す前の画像で認識するので、認識率が低下したりしない。もちろん顔認識以外の方法で個人認証を行っても差し支えない。顔認識による個人認証には、特開平９−２５１５３４で開示されている方法を利用する。
【０２１９】
（２）変形例２
第２の実施形態において、個々のユーザ特有の３Ｄモデル、動きのモデル装着、塗布の嗜好性、色や具材の選択等をユーザパラメータと呼ぶ。こういったユーザパラメータをユーザ毎に保存することもできる。ログインユーザ情報もしくはカメラからの画像を利用した個人認証、もしくはその他の指紋やパスワード等による個人認証によってこれらのユーザパラメータを呼び出すことができる。
【０２２０】
（３）変形例３
メイクアップデータ保持やメイクアップアイテムの付加を行うのはサーバであってもよいし、自分または相手のローカルの装置でもかまわない。
【０２２１】
また、互いに通信を行う場合にあっては、その間に仲介するサーバがあってもなくてもよい。
【０２２２】
さらに、表示部１０６、９０６においても、通信を行う例においては自分の表示装置及び送信される側の表示装置のいずれかもしくは双方に元の画像及び加工済み画像の双方もしくはいずれかをどのような組合せで表示を行っても差し支えない。
【０２２３】
なお、通常は加工済みデータのみを表示するのが望ましい。
【０２２４】
（４）変形例４
第１の実施形態においては、カラーコンタクトレンズ装着のシミュレーションをする場合について述べたが、メイクアップアイテムはカラーコンタクトレンズに限らない。
【０２２５】
第２の実施形態で述べた一般的なメイクアップアイテムのいずれでも実施可能である。
【０２２６】
また、第２の実施形態においてカラーコンタクトレンズを装着してもよい。
【０２２７】
第３、４の実施形態についても同様である。
【０２２８】
さらに、それ以外にも頭髪やアクセサリ等であってもよい。
【０２２９】
（５）変形例５
第１〜４の実施形態において１台のカメラから２Ｄ動画像を入力する例について述べたが、カメラは一台に限る必要はない。
【０２３０】
すなわち、複数のカメラを用いて一般的なステレオ手法により、３Ｄのデータを取得してもよい。その場合には２Ｄによって得られた場合よりもより多くの特徴が得られる可能性がある。
【０２３１】
例えば、チークを塗布する際に、頬骨の出ている位置等を基準位置として使用することができる。
【０２３２】
さらに、３Ｄデータは複数台のカメラでなく、一台であっても、モーションステレオを用いれば取得可能であり、そのようなデータを利用してもよい。
【０２３３】
なお、カメラ以外のデバイスによって３Ｄデータを取得しても差し支えない。
【０２３４】
（６）変形例６
第１、２の実施形態では顔領域の特徴のみを用いたが、第３の実施形態のように顔の向きを検出し、その向きの変化を加工方式に加味してもよい。
【０２３５】
例えば、顔向きに応じて塗布位置をアフィン変換することにより、変化させるといったことが考えられる。この変換は単純なアフィン変換のみならず、予め一般的な顔の形状モデルを準備しておき、顔向きが何度の時に塗布位置や範囲がどのように変化するかを予め計算しておいたり、その場で計算したりしてもよい。
【０２３６】
また、顔の形状モデルは１つだけでなく、複数個準備しておき、ユーザの形状に合ったものを選択してもよい。
【０２３７】
このように、ユーザまたは一般的な３次元形状モデルが何らかの方法で得られる場合には、２次元画像に対して加工を施した加工済画像をそれらの形状モデルにテクスチャとしてマッピングを行なうことで、良好な３次元表示が得られることになる。
【０２３８】
（７）変形例７
第１〜４の実施形態においては、ＰＣを使用する例について述べたが、画像入力装置と画像表示装置を備えていれば、上記処理を行う部分はＰＣである必要はない。
【０２３９】
例えば、半導体チップやそれらの組み合わせたボード等の処理ハードウエア、組み込みコンピュータ、ゲーム機等でも差し支えない。
【０２４０】
（８）変形例８
第１〜４の実施形態においては、リアルタイムで入力画像に対して即時に表示を行う例について述べたが、これもその用途に限られるものではなく、画像を予め取得しておいて随時処理や表示を行ったり、処理結果を記憶メディアに保存したりしても差し支えない。
【０２４１】
（９）変形例９
第１〜４の実施形態については、予めメイクアップアイテムに対するデータセットが用意されている例について述べたが、これについても必ずしもはじめから用意されている必要はなく、ユーザが自由に実際のデータを入力できるシステムであってもよい。
【０２４２】
例えば、口紅の例について述べる。
【０２４３】
ユーザは、画像入力装置であるカメラで口紅を塗る前の画像と塗った後の画像の双方を同じ条件で撮影し、それらの双方から特徴（すなわち、この場合においては唇の輪郭）の抽出を行い、唇領域における画像特徴、すなわち、色やテクスチャーの差を抽出し、それらのデータを所定の方式によって加工することにより、メイクアップデータを作成する。
【０２４４】
加工の方法としては、以下の方法が考えられる。
【０２４５】
まず、抽出された唇領域の色、例えばＲＧＢそれぞれの平均値を口紅塗布前後で算出し、それらの差をメイクアップデータとする。
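A minimal sketch of this processing step: compute the mean RGB of the lip region before and after applying lipstick and store the per-channel difference as the makeup data. Pixels are assumed to be (R, G, B) tuples already restricted to the extracted lip region. The function names are hypothetical.

```python
def mean_rgb(pixels):
    """Average (R, G, B) over a list of pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def makeup_delta(before_pixels, after_pixels):
    """Per-channel difference of mean lip colour: after minus before."""
    before, after = mean_rgb(before_pixels), mean_rgb(after_pixels)
    return tuple(a - b for a, b in zip(after, before))
```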
【０２４６】
また、元から存在せず、ユーザが入力する場合でもなく、予めインターネットやＣＤ−ＲＯＭ等のメディアを介してメイクアップアイテムやそのデータを配布してもよい。
【０２４７】
さらに、先に述べたメイクアップアイテムデータの入力装置はユーザの端末には必ずしも必要ではなく、これらの配布サイトに対してのみその機能があってよい。
【０２４８】
（１０）変形例１０
第１〜４の実施形態において、メイクアップデータの持ち方は、背景のない状態で元となる具材をカメラで撮像した画像、元となる具材を対象物に対して装着もしくは塗布した状態、及び、装着もしくは塗布する以前の状態における対象物の画像をカメラで撮像し、その差分情報を記録したものでもよい。また、それらの撮像を異なる対象物に対して行った情報の平均値もしくはなんらかの統計操作を施したもの等がある。
【０２４９】
また、それらの撮像は同じ対象物の異なる大きさや向きに対して行い、それらを大きさや向きのパラメータで正規化を行うかもしくはそれらのパラメータに対して分類して保持するとさらによい。
【０２５０】
さらに、それ以外にもデータは撮影条件等にかかわらず同じ画像を一部そのまま貼り付けたり、部分的にパールやグロスによるつや等の効果を与えたりすることもできる。
【０２５１】
例えば、付けひげ等は前者の例である。
【０２５２】
（１１）変形例１１
第１〜４の実施形態においては、メガネをかけていない場合について述べたが、最終的な状態としてメガネをかけた状態でもかまわない場合には問題がないが、最終状態としてメガネをはずした状態にしたい場合には、メガネの領域を検出してメガネを取り除いた画像を作成してもよい。
【０２５３】
（１２）変形例１２
第３の実施形態において、表示部９０６で表示された画像と同等のものを実世界において作成する具材をメイクアップデータから作成する例を述べたが、これについては第１、２の実施形態において、カラーコンタクトレンズや化粧具材を実際に作成してもよいし、具材ではなく、その補助となる型紙のようなものを作成あるいは、予め準備しておいても差し支えない。
【０２５４】
（１３）変形例１３
第１〜４の実施形態において、顔特徴等の検出に失敗した場合には、そのフレームの画像を表示せず、検出されている以前のフレームの画像にそのフレームにおいて検出された特徴により加工を施した画像を表示することもできる。
【０２５５】
こうすることで、特徴の検出に失敗した場合でも、不自然にならない表示が可能である。
【０２５６】
（１４）変形例１４
第１〜４の実施形態において、表示は一画面のみの場合について述べたが、これは２つ以上の画像を含んでも差し支えない。
【０２５７】
例えば、図１５のように時刻の異なる画像を並べて表示したり、図１６のように方向の異なる画像を並べて表示したり、図１７のように異なる具材によって加工された画像を並べて表示したりしてもかまわない。
【０２５８】
また、その場合、具材は同じ具材でかつ、色のみ違うパターンを表示すればより選択の際にユーザの補助として有効である。
【０２５９】
さらに、図１８のように、画像中の一部のみを拡大して表示する機能があればより好ましい。
【０２６０】
（１５）変形例１５
第１〜４の実施形態において、画像の加工をαブレンディングによって行う例をあげたが、他の方法でも差し支えない。
【０２６１】
例えば、当該範囲の色情報のうち色相のみを入れ替えるといった方法も考えられる。
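The hue-replacement alternative can be sketched with Python's standard `colorsys` module. The HSV color space is an assumption, since the source does not name one, and `replace_hue` is a hypothetical helper for a single 8-bit RGB pixel.

```python
import colorsys


def replace_hue(rgb, new_hue):
    """Swap only the hue of an 8-bit RGB pixel, keeping saturation and value.

    new_hue is in [0, 1) as used by colorsys (0 = red, 0.5 = cyan).
    HSV is an assumption; the source does not name a colour space.
    """
    r, g, b = (c / 255.0 for c in rgb)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    nr, ng, nb = colorsys.hsv_to_rgb(new_hue, s, v)
    return tuple(round(c * 255) for c in (nr, ng, nb))
```

Because saturation and value are preserved, shading and highlights of the original region survive the color change, and grey pixels are left unchanged.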
【０２６２】
（１６）変形例１６
第１〜４の実施形態において、２Ｄの画像を表示する例について述べたが、これには限らない。
【０２６３】
例えば、３Ｄデータを作成した場合には、これを３Ｄのビューアで表示してもよい。
【０２６４】
（１７）変形例１７
第１、３、４の実施形態において、表示部１０６、９０６はディスプレイ等の装置に表示する場合について述べたが、例えばユーザの顔にプロジェクタ等で直接照射してもかまわない。
【０２６５】
（１８）変形例１８
メイクは、例えば、チークを頬の筋肉の流れに沿って塗布したり、眉墨を眉近辺の骨格に沿って目尻の位置の延長まで塗布したりする等、個人の顔の形状に合わせて行われると自然な仕上がりになる。
【０２６６】
従って、入力画像を元に作成された顔の３次元形状を見てメイクアップアーティスト等がその形状に合ったメイクを施すことができれば、個人の形状に合ったより自然なメイクを提案できる。そのような場合に３Ｄエディタが有効である。
【０２６７】
また、予め用意されているメイクアイテムの中から選択されたものが自動的にメイクが塗布されるのではなく、自由にメイクやペインティングを創作したい場合にも、３Ｄエディタが有効である。
【０２６８】
３Ｄエディタとしては、例えば図１９に示すような、３次元形状データを読み込み、その形状を３Ｄ表示する表示部１０６と、色パレットや、ブラシ、ペンシル、パフ、消しゴム等のメイクツールパレットや、ぼかし効果、つや効果、テクスチャ等の効果ツールパレット等をもつ。メイクツールや効果ツール等を使って、データから読み込まれた３Ｄオブジェクトの上にメイクやペインティング等を好きな形や色で書き込み、作成した書き込みオブジェクトは、オブジェクト毎に或いはグルーピングされたオブジェクトをまとめてメイクアイテムとして保存しておくこともできる。
【０２６９】
メイクエディタは、メイクを施される対象者本人が使ってもよいし、メイクアップアーティスト等が使っても良い。
【０２７０】
また、メイクを専門に学ぶためのツールとしても利用できる。
【０２７１】
従って、自分の好きなメイクを編み出すために顔画像を撮影しエディタを使うこともあれば、顔画像を撮影しエディタを使ってメイクを施した画像あるいはデータを専門家に送信し、コメントや修正を加えて送り返すこともあれば、顔画像を撮影しメイクを施さずに顔画像を送信し、送信された顔画像を受信側が受け取り、受信側でエディタを使って３Ｄオブジェクトにメイクを施し、メイクを施した画像や３Ｄデータを送り返すこともある。
【０２７２】
ここでいう「メイク」は、通常のメイクに限ったものではなく、舞台用メイク（例えば、宝塚メイク、動物メイク、歌舞伎メイク等）やペインティングを含んでもかまわない。この舞台用メイクでは、俳優の顔を予め撮影しておき、メイクアップアーティストが舞台に合わせたメイクを編み出すために利用することもできる。ペインティングも自分の好みのデザインを作ることができる。
【０２７３】
以上、３次元メイクエディタについて述べたが、画像に直接書き込むことのできる２次元エディタでもかまわない。このようなエディタ機能により、自由にメイクアイテムを作成、編集できる。
【０２７４】
実際にメイクを物理的に施す際、メイクを初めてする者にとってはメイクの仕方が全くわからないためメイクの仕方を知りたいという要求や、初心者でなくてもプロのテクニックを知りたいという要求等がある。このような要求に応え、メイクを物理的に施すための支援ツールとして、ファンデーション、眉墨、チーク等の塗り方を順に表示する機能をメイクエディタに組み込むこともできる。
【０２７５】
表示は２次元でも３次元でもかまわないし、必ずしもメイクエディタに組み込まれている必要はなく、支援ツールが独立していてもよい。このような支援ツールは、子供等を対象に歌舞伎メイク等がどのように施されるかを見せる等、後世に無形文化を伝えていくためのツールとしても利用できる。
【０２７６】
（１９）変形例１９
第１、３の実施形態では、メイクアップ等の変形の結果はディスプレイ等の表示装置に表示していたが、直接ユーザの顔等にメイクアップ等の結果を反映してもかまわない。
【０２７７】
これは、装置が機械的に行ってもよいし、人間が行ってもよい。例えば、ブラシ等を用いてユーザの顔にペインティングを施すといったことが考えられる。
【０２７８】
（２０）変形例２０
第１、３の実施形態において、加工方式決定部１０５，５０５はブラシやタブレット等の入力装置を用いて、ユーザの顔や模型の顔やディスプレイ上の顔等に直接塗布してもよい。
【０２７９】
例えば、ユーザの顔にブラシを用いて塗る場合は、ユーザにブラシ用のセンサをつけた面を装着する等して塗る場所を判別するといったことが考えられる。
【０２８０】
（２１）変形例２１
第２の実施形態において、表示部５０６がネットワークを経由して接続されている例を示したが、加工方式決定部５０５もネットワークを経由して一つまたは複数存在してもよい。
【０２８１】
例えば、離れた場所にいるメイクアップアーティスト等がユーザをメイクアップしたり、ユーザのメイクアップを修正したり、アドバイスを与えるといったことが考えられる。
【０２８２】
また、ネットワークの相手側の加工方式決定部５０５を操作するのは人間だけでなく、計算機等によって操作されてもよい。
【０２８３】
さらに、ネットワークの相手側を操作するのは複数でもよい。その場合は、操作するのは人間だけでなく計算機も組み合わせてもよく、それぞれが同時に操作をしてもかまわない。
【０２８４】
（２２）変形例２２
仮想化粧装置において仮想的に施された化粧についてユーザが興味を持った場合、そのような化粧を実際に行うための方法や必要となる製品についての情報を入手したいという欲求が生じることが考えられる。
【０２８５】
本変形例においては上述のユーザ欲求を満足するための方法として、仮想メイクを実際に行いたいと考えるユーザをサポートする機能について詳述する。
【０２８６】
ユーザが施された化粧を実際にそのまま自分の顔に行いたいと考えた場合、まず必要な化粧用製品の情報を入手する必要がある。そこで予め施されるメイクの種類毎にそれぞれ必要とされる化粧用製品のリストを用意しておき、ユーザがそのリストを画面上で閲覧できるようにしておく。
【０２８７】
このリストは化粧用製品のメーカ、販売価格、含有成分等の情報を含んでおり、ユーザがその製品を購入しようとする際に重要な参考情報となる。閲覧するタイミングとしては、ユーザが自分で製品リスト表示ボタンを押す「自発的閲覧」や、ユーザが同じ化粧を一定時間表示させているとシステムが自動的に製品リストを表示する「自動表示」等が考えられる。
【０２８８】
ユーザが実際にリストの中から選んだ製品を購入する方法として、ユーザが直接販売店に足を運ぶこと無くネット上で購入処理を行う方法が考えられる。この方法はいわゆる「ネット通販」と呼ばれるものと類似した購入システムであり、ユーザは欲しい製品をリストから選択し、購入ボタンを押せば自動的にその情報が契約している販売店へ送信され、製品が送られてくるシステムである。
【０２８９】
（２３）変形例２３
第２の実施形態において、通信を相互で行っている場合、相手画像からメイクアップデータを抽出し、そのデータに基づいてメイクアップシミュレーションを行ってもよい。
【０２９０】
（２４）変形例２４
第１〜４の実施形態において、メイクアップデータの組み合わせは通信を行う場合には通信の相手によって予め準備しておいてもよい。
【０２９１】
また、その組み合わせは有名人に似せたもの等を予め準備しておいてもよい。
【０２９２】
メイクアップデータを作成する場合に雑誌等の写真をスキャナで取り込み、その画像からメイクアップデータを抽出し、同じメイクを再現する機能があってもよい。
【０２９３】
また、選択されたメイクアップデータの組み合わせに基づいて必要なアイテムのリストを出力してもよい。
【０２９４】
また、メイクアップのイメージやパラメータを選択する際に、画面中に表示されたダイアログをマウスで選択する例については述べたが、これらの選択は音声を用いても差し支えない。
【０２９５】
その場合には、仮想化粧装置にマイク等の音声を取り込むデバイスから音声を取り込み、音声認識装置により取り込まれた音声で表されたコマンドを実行する。例えば、「もっとシャープに」等とイメージを伝えたり、「もう少し下に」等と位置を指定したり、「もっと赤く」「アイシャドウはブルーに」等さまざまな指定が可能となる。ユーザはいちいち手を動かさないでいいので、非常に使い勝手がよくなる。
【０２９６】
また、メイクアップデータやその組み合わせは予め定められたメイクアップランゲージとして保存することも可能である。この言語はたとえば口元や目元と言った部位毎あるいは口紅やアイシャドウといったアイテム毎に指定し、それぞれに位置や形状、色や質感をパラメータとして指定する。それらの組み合わせによって顔全体のメイクアップ方式を定めることができる。それらのデータは一般的なデータとして定めることもできるし、メイクアップアイテムを製造するメーカ毎に個々に定めてもよい。一度決められればメイクアップを言語で再現できるため、たとえば通信する際にデータを削減できるし、だれでも同じメイクが再現できる。
【０２９７】
（２５）変形例２５
第５の実施形態では、注目領域を瞳領域として説明したが、これに限らない。例えば、利用者が指定した場所を注目領域としても良いし、画像の中心を固定して注目領域としても良い。また、特開平１１−１７５２４６号公報において提案されている注視位置の検出手法を用いて、利用者の注視位置を注目領域としても良い。
【０２９８】
また、第５の実施形態において瞳の追跡の際にガイドパターンを用いる例を挙げた。顔が動かなくても瞳だけが動くため、拡大する領域が必要以上に揺れ動く可能性がある。この問題を防ぐためには、例えば、ガイドパターン生成の際に目頭及び目尻に対する瞳の相対位置を予め保持しておき追跡時に比較すると良い。
【０２９９】
（２６）変形例２６
図３１（Ａ）及び図３１（Ｂ）は、第５の実施形態の仮想化粧装置における、画像入力部２１０１および表示部２１０６の一例を説明する図である。本変形例では、画像入力部２１０１であるカメラ３１０１と、表示部２１０６であるディスプレイ３１０２とが一体になっており、かつ、利用者が手で持つための持ち手３１０３が付いている。
【０３００】
図３１（Ａ）は利用者がカメラ３１０１から離れた状態で、ディスプレイ３１０２に顔全体が映っている場合である。図３１（Ｂ）は利用者がカメラ３１０１に目を近づけた状態で、ディスプレイ３１０２には目の周辺が表示されている。
【０３０１】
画像入力部２１０１および表示部２１０６を、このような一体構造にすると、利用者は実際に手鏡を使用しているような感覚で使用することができるので良い。
【０３０２】
その他の部分（例えば特徴検出部２１０２、画像加工部２１０４等）との通信は、ケーブルを介して行っても良いし、無線を用いてもよい。
【０３０３】
尚、本変形例は、カメラ付ＰＤＡ、カメラ付携帯電話、カメラ付タブレット型ＰＣ等の、カメラ付携帯端末を用いて実現しても良い。
【０３０４】
【発明の効果】
以上説明したように本発明によれば、リアルタイムで画像の入力表示を行い、特徴点等の検出をロバストに行うことで、ユーザが位置合わせ等にわずらわされることなく、自然に装着感を得られる仮想化粧装置が可能となる。
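[Brief description of the drawings]
[FIG. 1] Configuration diagram of the first embodiment.
[FIG. 2] External view of the first embodiment.
[FIG. 3] Example of detection of the pupil iris region.
[FIG. 4] Example of a selection dialog.
[FIG. 5] Configuration diagram of the second embodiment.
[FIG. 6] Example of an eye shadow application region.
[FIG. 7] Example of a cheek application region.
[FIG. 8] Example of a lipstick application region.
[FIG. 9] Configuration diagram of the third embodiment.
[FIG. 10] Example in which painting has been applied.
[FIG. 11] How the motion matrix is calculated.
[FIG. 12] Example of a processing tool material.
[FIG. 13] Processing flow of the fourth embodiment.
[FIG. 14] Example of creating an image under the lighting conditions of another environment.
[FIG. 15] First example of multiple-image display in Modification 14.
[FIG. 16] Second example of multiple-image display in Modification 14.
[FIG. 17] Third example of multiple-image display in Modification 14.
[FIG. 18] Example of magnified image display in Modification 14.
[FIG. 19] Example of a 3D editor in Modification 18.
[FIG. 20] Example of a computer that runs the program in the fifth embodiment.
[FIG. 21] Configuration diagram of the fifth embodiment.
[FIG. 22] Diagram explaining the processing flow of the feature detection unit of the fifth embodiment.
[FIG. 23] (A) Example of detection of the pupil iris region. (B) Example of detection of the pupil iris region.
[FIG. 24] Example of a normalized image for the guide pattern.
[FIG. 25] Example of guide pattern parameters.
[FIG. 26] Example of an eye shadow application region.
[FIG. 27] Diagram explaining threshold determination from a luminance histogram.
[FIG. 28] Display example of the fifth embodiment.
[FIG. 29] Diagram explaining the method of tracking the region of interest.
[FIG. 30] Configuration diagram of the sixth embodiment.
[FIG. 31] (A) Modification 26 with the whole face displayed. (B) Modification 26 with the area around the eye magnified.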
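[Explanation of reference numerals]
100 Virtual makeup apparatus
101 Image input unit
102 Feature detection unit
103 Data storage unit
104 Image processing unit
105 Processing method determination unit
106 Display unit
2001 Processor
2002 Memory
2003 Magnetic disk drive
2004 Optical disk drive
2005 Image output unit
2006 Input reception unit
2007 Input/output unit
2008 Display device
2009 Input device
2010 External device
2101 Image input unit
2102 Feature detection unit
2103 Data storage unit
2104 Image processing unit
2105 Processing method determination unit
2106 Display unit
2107 Size and position information management unit
3001 Face region detection unit
3002 Face dictionary generation unit
[0001]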
[Technical field of the invention]
The present invention relates mainly to a virtual makeup apparatus and method that apply a modification such as makeup to a face image of a person and either display the resulting image on a display device or transmit it over a network.
[0002]
[Prior art]
Customers want to know how cosmetics such as lipstick and eye shadow, or glasses and color contact lenses, would look when actually worn. However, actually putting such products on the skin is troublesome, raises hygiene concerns, and makes it hard to compare different products. It is therefore very useful to be able to experience the worn state virtually in an image without actually wearing the products.
[0003]
Devices and software that use images to simulate makeup, hairstyles, glasses, color contact lenses, and the like do exist, but only as simple ones that work on still images (for example, Patent Document 1). Because they cannot follow the user's movement in a moving image, they do not give a realistic feeling of wearing. In addition, detection of the feature points and feature regions in the face image, on which generation of the virtual image is based, is not robust, so they cannot adequately cope with changes in lighting, movement, or individual differences.
[0004]
The method of Patent Document 2 detects feature points and displays a disguise image by superimposing a predetermined image on them, but it is not satisfactory as a simulation of a real image.
[0005]
In addition, with the spread of broadband, communication tools such as videophones that let individuals see each other's faces are becoming commonplace. Appearing on screen with one's face and figure exactly as they are at home is then expected to become a problem. One can of course avoid showing the face by using an avatar or the like, but that loses the sense of presence and defeats the purpose of broadband. On the other hand, putting on makeup as if going out just to answer a videophone call is troublesome.
[0006]
[Patent Document 1]
JP 2000-285222 A
[Patent Document 2]
JP 2000-322588 A
[0007]
[Problems to be solved by the invention]
To solve the above problems, an object of the present invention is to provide a virtual makeup apparatus and method that input and display images in real time and detect feature points and the like robustly, so that the user obtains a natural feeling of wearing.
[0008]
[Means for Solving the Problems]
The invention of claim 1 is a virtual makeup apparatus that processes the image of each frame before makeup in a moving image including at least a person's face region into an image of the person's face with makeup applied, the apparatus comprising: face shape model storage means for storing a face shape model; feature detection means for detecting feature points, feature regions, or both relating to the face from the image of each frame before makeup; data storage means for storing data on the types of items used to apply the makeup, positional relationship data indicating the positional relationship between the items and the feature points, feature regions, or both of the face, and makeup data including the color, shape, or application range of the items; processing method determination means for determining, from the stored item types, the data on the types of items needed for the makeup; face direction detection means for detecting the orientation of the face by obtaining rotation angles based on the two-dimensional coordinates of the feature points detected from the image of each frame; application range calculation means for calculating, using the face shape model, the application position and range of the items according to the orientation of the face in each frame; image processing means for processing the image so that the face image in each frame of the moving image is made up according to the application range of the items calculated by the application range calculation means; and display means for displaying the image of the face with the makeup applied.
[0009]
According to a second aspect of the present invention, there is provided a virtual makeup apparatus according to the first aspect, further comprising display image transmission means for transmitting the face image to which the makeup has been applied through a network.
[0010]
The invention according to claim 3 is the virtual makeup apparatus according to claim 1, wherein the feature detection means receives data relating to the image before makeup through a network.
[0011]
The invention of claim 4 is the virtual makeup apparatus according to at least one of claims 1 to 3, wherein the image is pasted onto a three-dimensional model of the person's face by texture mapping.
[0012]
The invention of claim 5 is the virtual makeup apparatus according to claim 1, wherein the display means simultaneously displays images with makeup applied at different times, images with makeup applied in different face directions, images with makeup applied at different resolutions, images with makeup applied to different parts, or images with different makeup.
[0013]
The invention of claim 6 is the virtual makeup apparatus according to at least one of claims 1 to 5, wherein the item is a color contact lens, eye shadow, cheek color, lipstick, or tattoo.
[0014]
The invention of claim 7 is a virtual makeup method that processes the image of each frame before makeup in a moving image including at least a person's face region into an image of the person's face with makeup applied, the method comprising: a face shape model storage step of storing a face shape model; a feature detection step of detecting feature points, feature regions, or both relating to the face from the image of each frame before makeup; a data storage step of storing data on the types of items used to apply the makeup, positional relationship data indicating the positional relationship between the items and the feature points, feature regions, or both of the face, and makeup data including the color, shape, or application range of the items; a processing method determination step of determining, from the stored item types, the data on the types of items needed for the makeup; a face direction detection step of detecting the orientation of the face by obtaining rotation angles based on the two-dimensional coordinates of the feature points detected from the image of each frame; an application range calculation step of calculating the application position and range of the items according to the orientation of the face shape model; an image processing step of processing the image so that the face image in each frame of the moving image is made up according to the calculated application range of the items; and a display step of displaying the image of the face with the makeup applied.
[0015]
The invention of claim 8 is a program that causes a computer to realize a virtual makeup method that processes the image of each frame before makeup in a moving image including at least a person's face region into an image of the person's face with makeup applied, the program causing the computer to realize: a face shape model storage function of storing a face shape model; a feature detection function of detecting feature points, feature regions, or both relating to the face from the image of each frame before makeup; a data storage function of storing data on the types of items used to apply the makeup, positional relationship data indicating the positional relationship between the items and the feature points, feature regions, or both of the face, and makeup data including the color, shape, or application range of the items; a processing method determination function of determining, from the stored item types, the data on the types of items needed for the makeup; a face direction detection function of detecting the orientation of the face by obtaining rotation angles based on the two-dimensional coordinates of the feature points detected from the image of each frame; an application range calculation function of calculating the application position and range of the items according to the orientation of the face shape model; an image processing function of processing the image so that the face image in each frame of the moving image is made up according to the calculated application range of the items; and a display function of displaying the image of the face with the makeup applied.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
The present invention is described in detail below.
[0020]
(First Embodiment) A virtual makeup apparatus 100 according to a first embodiment of the present invention will be described with reference to FIGS.
[0021]
In this specification, "makeup" is not limited to decorating the face with so-called cosmetics. It also covers painting and the like, that is, any processing that makes the color of part or all of the face after makeup differ from the face before makeup.
[0022]
(1) Configuration of virtual makeup device 100
FIG. 1 is a diagram illustrating the configuration of the present embodiment, and FIG. 2 is an example of an overview diagram of the virtual makeup device 100.
[0023]
The virtual makeup apparatus 100 is an example in which a video camera is connected to a personal computer (hereinafter referred to as a PC) and an image is displayed on a CRT or a liquid crystal display.
[0024]
Specifically, it displays an image in which a color contact lens is virtually fitted to a person's pupil region.
[0025]
The image input unit 101 is a video camera and inputs a moving image.
[0026]
The feature detection unit 102 extracts the pupil position and the outline of the iris region from the input image.
[0027]
The data storage unit 103 stores, for each type of color contact lens (one kind of makeup item), makeup data such as color, texture, transparency, and size, together with positional relationship data indicating the positional relationship between the item and the facial feature points, feature regions, or both.
[0028]
The processing method determination unit 105 provides an interface capable of selecting the type of stored color contact lens.
[0029]
Based on the selected color contact lens data, the image processing unit 104 performs processing so that the color contact lens is attached to the pupil iris portion of the subject in the image.
[0030]
The display unit 106 displays the processed image on the CRT or liquid crystal display.
[0031]
Each component 101-106 is described in detail below. Note that each function of the feature detection unit 102, the data storage unit 103, the image processing unit 104, and the processing method determination unit 105 is realized by a program recorded in the PC.
[0032]
(2) Image input unit 101
The image input unit 101 inputs a moving image including the face of a target person from a video camera.
[0033]
For example, there is a method of inputting via a general USB camera, digital video, or a special image input device.
[0034]
The input moving image is sequentially sent to the feature detection unit 102.
[0035]
(3) Feature detection unit 102
The feature detection unit 102 detects feature points and regions from the input image.
[0036]
In the present embodiment, the contour of the iris region of the pupil as shown in FIG. 3 is detected. The outline detection method of the pupil iris region will be described below.
[0037]
Specifically, the pupil position is detected by the method of Reference 1 (Fukui, Yamaguchi: "Face Feature Point Extraction by Combination of Shape Extraction and Pattern Matching", Trans. IEICE (D-II), vol. J80-D-II, no. 9, pp. 2170-2177, Aug. 1997).
[0038]
The contour is then detected using that position as the initial value. The face region may be detected beforehand.
[0039]
As a method for extracting the pupil contour, in the case where the contour can be approximated by a circle, Reference 2 (Yuasa et al .: “High-precision pupil detection based on minimizing integrated energy of pattern and edge”, IEICE Technical Report, PRMU2000-34, PP. 79-84, Jun. 2000) can be used.
[0040]
This method takes the previously detected pupil center position and radius as initial values, varies the position and radius, and finds the position and radius that minimize a value combining a pattern energy based on pattern similarity with an edge energy based on the output of a circular separability filter, thereby obtaining a more accurate contour.
[0041]
In doing so, it uses as a reference a subspace, called the "moving subspace", built from patterns cut out at positions and radii deliberately shifted from the correct contour. This makes it possible to cope with initial values that are far from the correct answer and shortens the convergence time.
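The search over position and radius can be pictured with the minimal sketch below. It is not the method of Reference 2: it substitutes a plain greedy neighborhood search for the moving-subspace approach, and `pattern_energy`, `edge_energy`, and `weight` are hypothetical callables and parameters supplied by the caller.

```python
def combined_energy(pattern_energy, edge_energy, weight=1.0):
    """Return E(x, y, r) = pattern_energy + weight * edge_energy (weighting is an assumption)."""
    return lambda x, y, r: pattern_energy(x, y, r) + weight * edge_energy(x, y, r)


def refine_circle(x, y, r, energy, max_iter=100):
    """Greedy local search: step (x, y, r) by one pixel while the energy decreases."""
    best = (x, y, r)
    best_e = energy(*best)
    moves = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for _ in range(max_iter):
        improved = False
        for dx, dy, dr in moves:
            cand = (best[0] + dx, best[1] + dy, best[2] + dr)
            if cand[2] < 1:
                continue  # a circle needs a positive radius
            e = energy(*cand)
            if e < best_e:
                best, best_e, improved = cand, e, True
        if not improved:
            break  # local minimum reached
    return best, best_e
```

A greedy search like this only finds a local minimum near the initial value, which is exactly the weakness the shifted-pattern subspace described above is meant to reduce.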
[0042]
In Reference 2 shown above, the outline is a circular shape, but the outline shape may be arbitrary in this method.
[0043]
For example, it may be an ellipse or a spline curve composed of a set of a plurality of sample points on the contour.
[0044]
An ellipse is represented by five parameters: two position parameters plus three shape parameters.
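A minimal sketch of such a five-parameter contour follows. It assumes that the two position parameters are the center `(cx, cy)` and that the three shape parameters are the semi-axes `a`, `b` and a rotation angle `theta`; the text does not name the parameters, so this split is an assumption.

```python
import math


def ellipse_contour(cx, cy, a, b, theta, n=32):
    """Sample n points on an ellipse given center, semi-axes, and rotation (radians)."""
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        ex, ey = a * math.cos(t), b * math.sin(t)  # point on the axis-aligned ellipse
        points.append((
            cx + ex * math.cos(theta) - ey * math.sin(theta),  # rotate, then translate
            cy + ex * math.sin(theta) + ey * math.cos(theta),
        ))
    return points
```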
[0045]
A spline curve is represented by as many sample points as are needed to express an arbitrary shape.
[0046]
However, with an arbitrary shape the number of parameters grows so large that convergence becomes difficult, or, even if the search converges, it takes so long that it is not practical.
[0047]
In that case, contour-shape data normalized by a reference point (for example, the pupil center) and a reference size (for example, the mean pupil radius; in this embodiment, the radius of the separability filter used when the pupil position was first determined) can be acquired from a large number of real images collected in advance, and principal component analysis of these data can reduce the dimensionality of the parameters.
[0048]
As a pattern normalization method, in addition to a method of deforming an image by affine transformation or the like based on a position and a shape, a pattern obtained by cutting an image along an outline can be used.
[0049]
(4) Data storage unit 103
The data storage unit 103 holds the necessary information for each selectable color contact lens. This information includes positional relationship data: the feature points, feature regions, or both that the item requires, and the relative position of the reference position, size, and shape of the application region with respect to them.
[0050]
In the present embodiment, the feature point is the center of the iris region of the pupil, and the feature region is the iris region.
[0051]
The distance between the center point of the color contact lens and the center of the pupil is zero.
[0052]
In addition, a reference point calculated from the feature points and feature regions, and the reference point of the lens data that corresponds to it, are recorded as information.
[0053]
Further, the data storage unit 103 records makeup data (color data, texture data, transparency data) corresponding to the reference point at the same time.
[0054]
(5) Processing method determination unit 105
The processing method determination unit 105 selects a data set that has been input in advance and recorded in the data storage unit 103.
[0055]
In this embodiment, the type of color contact lens that is a cosmetic item is selected. An example of a dialog prompting selection is shown in FIG.
[0056]
On the selection screen, an image showing possible color contact lenses, colors, character strings, and the like are displayed.
[0057]
(6) Image processing unit 104
The image processing unit 104 processes an image based on the detected feature region, the selected positional relationship data, and makeup data.
[0058]
In the present embodiment, an image of the selected color contact lens is displayed in an overlapping manner on the iris region of the pupil.
[0059]
Two forms of color contact lens makeup data are conceivable: first, the lens radius together with the color and transparency as functions of the distance from the center; second, the lens shape, color, and texture stored as two-dimensional image data.
[0060]
In the present embodiment, the first case will be described.
[0061]
First, the detected center of the pupil is made to correspond to the center of the lens based on the positional relationship data, each point on the iris contour is made to correspond to the iris corresponding point of the lens, and the corresponding radius is calculated by affine transformation.
[0062]
Next, the value of each pixel inside the iris is changed according to the color and transparency corresponding to the radius calculated for that pixel. For example, if the color and transparency are expressed as functions C(r) and α(r) of the distance r from the center, and the original pixel value is Iorg(r), the changed pixel value I(r) is given by the following equation.
[0063]
[Expression 1]

[0064]
Such a method of superimposing two luminances in consideration of transparency is called α blending.
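A minimal sketch of α blending over a circular iris region is given below. It assumes that α(r) is the weight of the lens color, so that α = 1 fully covers the original pixel; if α instead weights the original pixel, the two terms swap. The function names and the list-of-rows image layout are illustrative choices, not taken from the specification.

```python
import math


def blend_pixel(i_org, c, alpha):
    """Blend lens color c over original pixel i_org; alpha is the lens weight (assumption)."""
    return tuple(alpha * cv + (1.0 - alpha) * ov for ov, cv in zip(i_org, c))


def apply_lens(image, cx, cy, radius, color_of, alpha_of):
    """Return a copy of image (rows of RGB tuples) with a lens blended onto the iris disc.

    color_of(r) and alpha_of(r) play the roles of C(r) and alpha(r), where r is the
    distance of the pixel from the iris center (cx, cy).
    """
    out = [row[:] for row in image]
    for y, row in enumerate(image):
        for x, px in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            if r <= radius:  # only pixels inside the iris contour are changed
                out[y][x] = blend_pixel(px, color_of(r), alpha_of(r))
    return out
```

Making α fall toward zero near the iris contour would fade the lens color out at the edge instead of leaving a hard boundary.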
[0065]
(7) Display unit 106
The display unit 106 displays the image created by the image processing unit 104 on a display or the like.
[0066]
(Second Embodiment) A virtual makeup apparatus 500 according to a second embodiment of the present invention will be described with reference to FIGS.
[0067]
(1) Configuration of virtual makeup apparatus 500
The virtual makeup device 500 of this embodiment is an example in which a video camera is connected to a PC, the processed image is transmitted over a network, and the image is displayed on the display of the communication partner.
[0068]
Specifically, an image in which makeup such as lipstick is applied to the face area of a person is displayed.
[0069]
FIG. 5 is a diagram showing the configuration of the present embodiment.
[0070]
The image input unit 501 is a video camera and inputs a 2D moving image.
[0071]
The feature detection unit 502 detects necessary feature points and feature regions from the input image. In this embodiment, the pupil position, nostril position, and lip area are used.
[0072]
The data storage unit 503 stores information such as color, texture, transparency, and size for each target makeup item and type.
[0073]
The processing method determination unit 505 provides an interface capable of selecting a stored makeup item and type, darkness, gradation parameters, and the like.
[0074]
Based on the selected makeup data, the image processing unit 504 processes the image so that the makeup data is applied to the target region of each makeup item on the subject in the image.
[0075]
The display image transmission unit 506 transmits the processed image and displays the processed image.
[0076]
Each component will be described in detail below. Note that the functions of the feature detection unit 502, the data storage unit 503, the image processing unit 504, the processing method determination unit 505, and the display image transmission unit 506 are realized by programs recorded in the PC.
[0077]
(2) Image input unit 501
The image input unit 501 inputs an image obtained by photographing a human face from a camera. This part is the same as in the first embodiment.
[0078]
(3) Feature detection unit 502
The feature detection unit 502 first detects the pupil position, iris radius, nostril position, and mouth end position. Their detection is the same as in the first embodiment.
[0079]
The following application region and application method are determined from the feature point positions and feature amounts.
[0080]
In the data storage unit 503, in addition to information such as color and size specific to each makeup item, necessary feature points / areas corresponding to each makeup item type (eye shadow, cheek, etc.), and Constant values or functions giving relative coordinates / distances from these feature points / areas to makeup item application positions / areas are recorded.
[0081]
In the present embodiment, the feature points and regions needed for cheek color are the centers of both pupils and the nostril positions, and functions are recorded that calculate, from these feature points, the center coordinates, major axis, minor axis, and rotation angle of the cheek application region (represented by an ellipse).
[0082]
(3-1) Eye shadow application
An example of eye shadow application will be described.
[0083]
As shown in FIG. 6, eye shadow is applied to a region (the shaded portion) calculated from the detected pupil position and iris radius. This region is bounded above by the eyebrow and below by the lower edge of the eyelid, and on the left and right by the extent of the eyebrow or eyelid. However, these extents need not be extracted very strictly.
[0084]
This is because eye shadow is usually applied with its boundary to the unapplied area blurred, so a slight shift in the edge of the application region causes no problem in the display.
[0085]
Within the application region, the luminance and color information of the original image can also be used to decide whether to apply the makeup at each point. For example, not applying it where the luminance is below a certain level prevents it from being painted onto the eyelashes by mistake. Of course, the application region itself may instead be extracted directly and in detail.
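The luminance test can be sketched as follows. The threshold `min_luma`, the Rec. 601 luma weights, and the representation of the region as a list of `(x, y)` coordinates are assumptions chosen for the example.

```python
def luminance(rgb):
    """Approximate luminance of an RGB pixel using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def apply_eye_shadow(image, region, shade, alpha, min_luma=60.0):
    """Blend shade into the pixels of region, skipping dark pixels such as eyelashes.

    image is a list of rows of RGB tuples; region is an iterable of (x, y) pixels.
    """
    out = [row[:] for row in image]
    for x, y in region:
        px = image[y][x]
        if luminance(px) < min_luma:
            continue  # too dark: probably eyelash or eyebrow, leave untouched
        out[y][x] = tuple(alpha * s + (1.0 - alpha) * p for p, s in zip(px, shade))
    return out
```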
[0086]
(3-2) Cheek application
Next, an example of cheek application will be shown.
[0087]
As shown in FIG. 7, the cheek region is applied to a region calculated based on the pupil position, iris radius, and nostril position. In the case of cheeks, as in the case of eye shadows, it is possible to blur or to paint differently according to the luminance and color information of the original image.
[0088]
As a method for determining the pixel value, for example, an ellipse with center (x0, y0) and major and minor axes a and b is used as a reference, and the distance r of a point (x, y) from the center, normalized by this ellipse, is defined by equation (2).
[0089]
[Expression 2]

[0090]
This r is applied to the equation (1). However, in this case, C (r) is represented by the expression (3).
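One plausible reading of the normalized distance is sketched below: r equals 0 at the center and 1 on the ellipse boundary. The sketch treats a and b as semi-axis lengths and ignores any rotation of the ellipse; both points are assumptions, and the form of C(r) in equation (3) is not reproduced here.

```python
import math


def normalized_ellipse_distance(x, y, x0, y0, a, b):
    """Distance from (x0, y0), scaled so that points on the ellipse give r = 1."""
    return math.sqrt(((x - x0) / a) ** 2 + ((y - y0) / b) ** 2)


def cheek_region(width, height, x0, y0, a, b):
    """Map every pixel inside the ellipse (r <= 1) to its normalized distance r."""
    region = {}
    for y in range(height):
        for x in range(width):
            r = normalized_ellipse_distance(x, y, x0, y0, a, b)
            if r <= 1.0:
                region[(x, y)] = r
    return region
```

Each value of r in the returned map can then be passed to whatever color and transparency functions are chosen for the cheek color.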
[0091]
[Equation 3]

[0092]
(3-3) Lipstick application
Next, an example of lipstick application will be shown.
[0093]
For lipstick, as shown in FIG. 8, the lip contour is extracted based on the mouth end positions, optionally together with the nostril and pupil positions, and the makeup is basically applied inside that contour.
[0094]
However, when the mouth is open, that region may contain teeth, gums, and so on, so either the inner contour must also be extracted or such areas must be excluded using brightness and color information.
[0095]
To extract the lip contour, for example, an initial contour is generated from the two mouth end points using principal component contours obtained in advance by principal component analysis of a large amount of general lip contour data, and the contour is then extracted starting from that initial value. This is the method disclosed in Japanese Patent Application No. 10-065438. The inner contour may be extracted in the same way, and the processing applied between the inner and outer contours.
[0096]
(3-4) Other
Items to be processed are not limited to the above features. For example, as a feature, a skin region, an eyebrow region, an eyelid contour, or the like can be used. Once the face area is determined, it is possible to apply a foundation to the area.
[0097]
Further, the application area may not be determined only by these six feature point positions, and it is not necessary to use all of them. Only necessary ones from the obtained feature values may be used, or any combination of available feature values obtained can be used.
[0098]
Furthermore, after the image is input and the feature points are extracted by the feature detection unit 502, makeup may be completed by applying eye shadow, cheek color, and lipstick in order.
[0099]
(4) Display image transmission unit 506
The display image transmission unit 506 transmits the image created through the network and displays the transmitted image on the display of the other party.
[0100]
(Third Embodiment) A virtual makeup apparatus 900 according to a third embodiment of the present invention will be described with reference to FIGS.
[0101]
(1) Configuration of virtual makeup apparatus 900
The virtual makeup apparatus 900 of this embodiment displays an image corresponding to painting or a tattoo. The user selects among several displayed items and parameters such as color, shape, and position, and the apparatus either creates a pattern or material with which the same effect as the image can be achieved in reality, or prepares the corresponding pattern or material in advance and provides it according to the user's selection.
[0102]
FIG. 9 shows a configuration diagram of the present embodiment.
[0103]
The configuration other than the processing tool material distribution unit 907 is the same as that of the first embodiment.
[0104]
However, because face orientation information is indispensable for painting, face orientation detection is added to the feature detection unit 902.
[0105]
(2) Face orientation detection method
The orientation of the face can be easily obtained from the positional relationship of feature points such as the eyes and nose, or can be obtained using a pattern of the face area. An example of a painted image is shown in FIG.
[0106]
Relevant references include, for example, Japanese Patent Application Laid-Open No. 2001-335666 as a method using feature points, and Japanese Patent Application Laid-Open No. 2001-335663 and Reference 3 (Yamada, Nakajima, Fukui: "Face Orientation Estimation by Factorization Method and Subspace Method", IEICE Technical Report PRMU2001-194, 2001) as methods using patterns.
[0107]
Here, a method for calculating the orientation of the face from the position coordinates of the feature points proposed in Japanese Patent Laid-Open No. 2001-335666 will be briefly described.
[0108]
First, feature points (four or more points) such as the eyes, nose and mouth ends are detected from an image showing a face (three or more frames having different face orientations).
[0109]
From the feature point position coordinates, the three-dimensional coordinates of the feature points are obtained using the factorization method (Tomasi, C. and T. Kanade: Technical Report CMU-CS-91-172, CMU (1991); International Journal of Computer Vision, 9:2, 137-154 (1992)).
[0110]
A matrix having the three-dimensional coordinates of the feature points as elements is held as a shape matrix S. When an image showing the face whose orientation is to be obtained is input, feature points are detected from the image, and a measurement matrix Wnew having the two-dimensional coordinates of the feature points as elements is multiplied by the generalized inverse matrix of the shape matrix S. This yields the motion matrix Mnew representing the face orientation, from which rotation angles such as roll, pitch, and yaw can be found. FIG. 11 shows how the motion matrix Mnew is calculated. In FIG. 11, the generalized inverse matrix of the shape matrix S is written as S with a superscript +.
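The product Mnew = Wnew S+ can be sketched in plain Python as below. The sketch assumes that S is a 3 x P matrix (one column per feature point, P of at least 4, points not coplanar), that Wnew is 2 x P with coordinates measured in the same centered frame, and that S+ is computed as Sᵀ(S Sᵀ)⁻¹; the helper names are illustrative.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate; raises if the matrix is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("shape matrix is rank deficient (feature points coplanar?)")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def pseudo_inverse(S):
    """Generalized inverse S+ = S^T (S S^T)^-1 of a full-row-rank 3 x P matrix."""
    St = transpose(S)
    return matmul(St, inverse3(matmul(S, St)))


def motion_matrix(W_new, S):
    """M_new = W_new S+ : a 2 x 3 matrix describing the face orientation."""
    return matmul(W_new, pseudo_inverse(S))
```

For a face rotated only about the vertical axis and viewed orthographically, the first row of the result is (cos θ, 0, sin θ), so the yaw angle could be read off as `atan2(M[0][2], M[0][0])`; general roll, pitch, and yaw extraction depends on the angle convention chosen.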
[0111]
This figure treats the situation in which the camera is fixed and the face moves as, relatively, the face being fixed and the camera orientation changing. The feature point coordinates may be those detected by the feature detection unit 902, and the orientation itself is obtained with a single matrix multiplication. The method is therefore very fast, and because it is computed independently for each frame, errors do not accumulate.
[0112]
If a painting item is subjected to affine deformation and mapped to a face image based on the rotation angle obtained in this way, a natural painting image that matches the face direction can be obtained.
[0113]
(3) Processing tool material distribution section 907
The processing tool material distribution unit 907 distributes materials for realizing an image as shown in FIG. 10 in the real world.
[0114]
As the material, a pattern as shown in FIG. 12 is generated.
[0115]
The pattern paper has holes in the paint area, so it can be easily filled.
[0116]
In addition, anyone can easily align the pattern by setting a reference point as indicated by x in the drawing.
[0117]
Further, it is not necessary to have a single pattern paper, and it is possible to cope with a complicated shape or a number of colors by generating a plurality of paper patterns as necessary.
[0118]
(Fourth embodiment)
A virtual makeup apparatus 900 according to a fourth embodiment of the present invention will be described with reference to FIGS.
[0119]
In this embodiment, an example in consideration of the lighting environment when performing digital makeup will be described. The flow of processing is shown in FIG.
[0120]
The lighting environment is described by specifying the position, direction, color, intensity, and so on of each light source. Complex light sources can be simulated by preparing multiple point light sources and surface light sources of the kind handled in general CG, and outdoor sunlight can be simulated as well.
[0121]
The reflectance of each part of the face is described using a function such as a BRDF (bidirectional reflectance distribution function), and the luminance value of each pixel is calculated from the described lighting environment for display.
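As a minimal sketch of this luminance calculation, the code below uses the simplest possible BRDF, a Lambertian (purely diffuse) surface lit by directional light sources. The choice of the Lambertian model, the function names, and the `(direction, intensity)` light format are assumptions; a realistic skin model would need additional specular terms.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def lambert_luminance(albedo, normal, lights):
    """Luminance of a diffuse surface point under several directional lights.

    lights is a list of (direction_to_light, intensity) pairs; surfaces facing
    away from a light receive nothing from it.
    """
    n = normalize(normal)
    total = 0.0
    for direction, intensity in lights:
        l = normalize(direction)
        total += intensity * max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * total
```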
[0122]
Next, a description will be given of a method of acquiring an illumination environment when photographing an actual human face with a camera and controlling the illumination environment. This is performed using a flowchart as shown in FIG.
[0123]
First, a three-dimensional model of a face is acquired by using a plurality of cameras or created by using a structure from motion technique using a single camera.
[0124]
Alternatively, a method such as Reference 4 (Maki “Acquisition of three-dimensional shape by Geotensity constraint—corresponding to a plurality of light sources—” IEICE Pattern Recognition / Media Understanding Study Group Report, PRMU99-127, 1999) may be used. .
[0125]
For the created three-dimensional model, the normal direction of each of the surfaces that make up the model is obtained. This provides the information needed to estimate the light source from the state of reflection on each surface.
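For a model built from triangles, the normal of each surface can be computed with a cross product, as in this minimal sketch. The triangle representation and the counter-clockwise vertex order that determines the outward direction are assumptions.

```python
import math


def face_normal(p0, p1, p2):
    """Unit normal of the triangle (p0, p1, p2), using the right-hand rule."""
    u = tuple(b - a for a, b in zip(p0, p1))  # edge p0 -> p1
    v = tuple(b - a for a, b in zip(p0, p2))  # edge p0 -> p2
    n = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("degenerate triangle has no normal")
    return tuple(c / length for c in n)
```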
[0126]
For light source estimation, either of two methods may be used. One is the method of Reference 5 (Nishino, Ikeuchi, Zhang, “Analysis of light source conditions and reflection characteristics from sparse image sequences”, Development project for digital preservation automation technique for cultural assets, 2001 results report, pp. 86- 501), which uses information on specular reflection at each surface to obtain the directions of a plurality of point light sources. The other is the method of Document 6 (Imari Sato, Yoichi Sato, Katsushi Ikeuchi, “Estimation of light source environment based on object shading”, Information Processing Society Journal: Computer Vision and Image Media, “Physics-based Vision and CG” Special Issue, Vol. 41, No. SIG 10 (CVIM 1), pp. 31-40, December 2000), which estimates light source positions using shadow information.
[0127]
Consider the case in which, for a face photographed in a certain environment, an image is created under the illumination conditions of another environment as shown in FIG. 14B. The photographing environment is as shown in FIG.
[0128]
Using the estimated light source information, the lighting environment applied to the three-dimensional model is changed by controlling the direction, number, intensity, and the like of the light sources, and images in various environments are generated.
[0129]
A method that does not explicitly use the three-dimensional model is also conceivable. Taking into account the illumination conditions at the time of photographing in FIG. 14A, an image is created from which characteristic portions such as shadows and specular reflections have been removed. Then, taking into account the illumination conditions of the other environment in FIG. 14B, shadows, specular reflections, and the like are added to that image to obtain an image under the other illumination conditions.
[0130]
Several illumination patterns may also be prepared in advance for the lighting environments to be composited. For example, illumination environments corresponding to the setting, location, time, and so on, such as “indoor (home)”, “indoor (restaurant)”, “outdoor (clear)”, and “outdoor (cloudy)”, may be created and composited.
[0131]
By creating an image of an illumination environment different from the environment where the image was taken as described above, it is possible to virtually create a face image as if the image was taken at another location.
[0132]
(Fifth Embodiment) A virtual makeup apparatus according to a fifth embodiment of the present invention will be described below with reference to the drawings.
[0133]
People often apply makeup while looking at their face in a mirror. When doing so, a person enlarges the part they want to see by moving their face close to the mirror or, with a hand mirror for example, by moving the mirror close to their face.
[0134]
The virtual makeup apparatus according to the present embodiment determines a specific region (for example, eyelids and lips) from a plurality of feature points detected from a user's face image captured by a camera, and performs makeup on the region. Furthermore, the movement of the face is detected and the face is enlarged and displayed based on the area.
[0135]
In the following description, a case where eye shadow is applied in a pseudo manner is described as an example.
[0136]
FIG. 20 is a diagram for explaining an example of the configuration of a personal computer (PC) for realizing the virtual makeup apparatus of the present embodiment.
[0137]
The PC includes a processor 2001, a memory 2002, a magnetic disk drive 2003, and an optical disk drive 2004.
[0138]
The PC further includes an image output unit 2005, which is the interface to a display device 2008 such as a CRT or LCD; an input receiving unit 2006, which is the interface to an input device 2009 such as a keyboard or a mouse; and an input/output unit 2007, which is the interface to an external device 2010.
[0139]
Examples of the input/output unit 2007 include interfaces such as USB (Universal Serial Bus) and IEEE1394.
[0140]
In the virtual makeup device of this embodiment, the video camera for inputting an image corresponds to the external device 2010.
[0141]
The magnetic disk drive 2003 stores a program for performing image processing such as makeup on the input image. When the program is executed, the program is read from the magnetic disk drive 2003 to the memory 2002, and the program is executed by the processor 2001.
[0142]
The moving images captured by the video camera are stored sequentially in the memory 2002, frame by frame, via the input/output unit 2007. The processor 2001 performs image processing on each stored frame and also generates a GUI with which the user operates the apparatus. The processor 2001 outputs the processed image and the generated GUI to the display device 2008 via the image output unit 2005, and the display device 2008 displays the image and the GUI.
[0143]
FIG. 21 is a diagram illustrating functional blocks of the virtual makeup device according to the present embodiment.
[0144]
The apparatus includes an image input unit 2101, which is a video camera that inputs a moving image; a feature detection unit 2102 that detects feature points and feature regions of a person's face from the input image; and a data storage unit 2103 that stores makeup data and positional relationship data.
[0145]
The makeup data is data of color, texture, and transparency for each type of eye shadow that is one of the makeup items. The positional relationship data is data indicating the positional relationship between the facial feature points and the feature region and the eye shadow application region.
[0146]
The apparatus further includes a processing method determination unit 2105 that provides a GUI with which the user selects one of the eye shadow types stored in advance in the data storage unit 2103; an image processing unit 2104 that superimposes the selected eye shadow color on the eyelid portion to generate an image in which the eye shadow is virtually applied; a display unit 2106, which is a display device such as an LCD or CRT that displays the processed image; and a size / position information management unit 2107 that estimates the size, position, and inclination of the face and, based on the estimate, controls the display unit 2005 so that an enlarged image is displayed.
[0147]
Hereinafter, each part will be described.
[0148]
(Image Input Unit 2101) The image input unit 2101 inputs a moving image showing the face of the target person from a video camera. The video camera may be a general camera such as a USB (Universal Serial Bus) connection camera or a digital video camera. The input moving image is output sequentially to the feature detection unit 2102.
[0149]
(Feature Detection Unit 2102) The feature detection unit 2102 detects a facial feature region from the input image. For example, when an eye shadow is applied to the eyelid, the outline 301 of the iris region of the pupil shown in FIG. 3 is detected.
[0150]
The contour of the iris region of the pupil is detected by the method shown in Document 1, as in the first embodiment. The detection process detects feature points and collates the pixel information around them by a subspace method. An outline is given below with reference to FIG.
[0151]
(S2201 Separability map generation)
The output value of the circular separability filter shown in FIG. 23A is obtained for each pixel of the entire input image to generate a separability map. Separability is an index of how well the pixel information of two regions is separated. In this embodiment, the separability between the region 2301 and the region 2302 is obtained using the following equation.
[0152]
[Expression 4]

[0153]
Instead of obtaining the separability map for the entire input image, for example, the separability map may be obtained after performing preprocessing for specifying a face region in the image, such as template matching or background subtraction.
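A minimal sketch of a separability computation is shown here. It assumes the commonly used definition, the ratio of the between-class variance of the two regions to the total variance of their pixel values, and it assumes that the two regions are an inner disc of radius r and a surrounding ring. Both the formula and the `ring` width factor are assumptions made for illustration, not the expression of this document.

```python
def separability(region1, region2):
    """Between-class variance divided by total variance, in the range 0..1."""
    all_pixels = list(region1) + list(region2)
    n1, n2 = len(region1), len(region2)
    mean_all = sum(all_pixels) / len(all_pixels)
    total = sum((p - mean_all) ** 2 for p in all_pixels)
    if total == 0:
        return 0.0  # flat patch: the two regions cannot be separated
    m1 = sum(region1) / n1
    m2 = sum(region2) / n2
    between = n1 * (m1 - mean_all) ** 2 + n2 * (m2 - mean_all) ** 2
    return between / total

def circular_separability(image, cx, cy, r, ring=1.5):
    """Separability between a disc of radius r and the ring from r to ring * r.

    `image` is a list of rows of luminance values; `ring` is a hypothetical
    parameter controlling the outer radius.
    """
    inner, outer = [], []
    r_out = r * ring
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= r * r:
                inner.append(value)
            elif d2 <= r_out * r_out:
                outer.append(value)
    if not inner or not outer:
        return 0.0
    return separability(inner, outer)
```

Evaluating `circular_separability` at every pixel (and for each candidate radius) would give the separability map; a dark disc on a bright background scores close to 1.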
[0154]
(S2202 Feature point candidate extraction)
Points at which the separability takes a local maximum in the generated separability map are taken as feature point candidates.
[0155]
If an appropriate smoothing process or the like is performed on the separability map before obtaining the local maximum value, the influence of noise can be suppressed.
[0156]
(S2203 Pattern similarity calculation)
A local normalized image corresponding to the radius r of the separability filter is acquired from the vicinity of each feature point candidate. Normalization is performed by affine transformation according to the radius. The similarity between the normalized image and a dictionary (left pupil, right pupil) created beforehand from the image around the pupil is calculated by the subspace method.
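The similarity used in the subspace method can be sketched as follows, assuming each dictionary is stored as a list of orthonormal basis vectors (for example, the leading principal components of training patterns). The similarity is then the squared length of the projection of the normalized input vector onto that subspace. The function name and data layout are illustrative assumptions.

```python
def subspace_similarity(vector, basis):
    """Squared projection length of the unit-normalized vector onto the subspace.

    `basis` is assumed to hold orthonormal vectors, so the result lies in 0..1.
    """
    norm2 = sum(v * v for v in vector)
    if norm2 == 0:
        return 0.0
    projected = sum(sum(v * b for v, b in zip(vector, axis)) ** 2 for axis in basis)
    return projected / norm2
```

With one such dictionary for the left pupil and one for the right pupil, each candidate patch is flattened to a vector and scored against both.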
[0157]
(S2204 Feature point integration)
The correct pupil position is detected based on the position of each feature point candidate, the radius of the separability filter, and the similarity value.
[0158]
For each of the left pupil and the right pupil, feature point candidates are extracted in descending order of the similarity obtained in the S2203 pattern similarity calculation, either up to a predetermined number or all those whose similarity is at or above a predetermined threshold. Left and right pupil candidate pairs are generated by combining the extracted candidates for the left pupil with those for the right pupil.
[0159]
The generated left and right pupil candidate pairs are checked against conditions assumed in advance, and those that do not satisfy the conditions are excluded. The conditions are, for example, the interval between the left and right pupils (a pair is excluded when the interval is abnormally narrow or wide) and the direction of the line segment connecting the left and right pupils (a pair is excluded when the segment runs in the vertical direction in the image).
[0160]
Of the remaining candidate pairs, the pair with the highest sum of the left pupil similarity and the right pupil similarity is taken as the left and right pupils.
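The integration step described above can be sketched as follows. Candidates are assumed to be `(x, y, similarity)` tuples, and the distance and angle limits are hypothetical parameters standing in for the "conditions assumed in advance".

```python
import math

def select_pupil_pair(left_cands, right_cands, min_dist, max_dist, max_angle_deg):
    """Pick the left/right candidate pair with the highest similarity sum.

    Pairs are rejected when the pupils are abnormally close or far apart, or
    when the connecting segment is too far from horizontal in the image.
    Returns ((lx, ly), (rx, ry)) or None if no pair passes the checks.
    """
    best, best_score = None, float("-inf")
    for lx, ly, ls in left_cands:
        for rx, ry, rs in right_cands:
            dx, dy = rx - lx, ry - ly
            dist = math.hypot(dx, dy)
            if not (min_dist <= dist <= max_dist):
                continue
            angle = abs(math.degrees(math.atan2(dy, dx)))
            angle = min(angle, 180.0 - angle)  # deviation from horizontal
            if angle > max_angle_deg:
                continue
            score = ls + rs
            if score > best_score:
                best, best_score = ((lx, ly), (rx, ry)), score
    return best
```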
[0161]
(S2205 Detailed detection)
The contour of the pupil is obtained more accurately, and with it the exact position of the pupil. This step uses the method of Reference 6 (Yuasa et al., “Precise Pupil Contour Detection Based on Minimizing the Energy of Pattern and Edge”, Proc. IAPR Workshop on Machine Vision Applications, pp. 232-235, Dec. 2002). An outline of the method follows.
[0162]
Using the pupil position obtained by S2204 feature point integration and the radius of the separability filter used to detect the pupil, an initial shape is created assuming that the pupil contour is circular. The created initial shape is expressed by ellipse parameters.
[0163]
For example, if the center coordinates are (x₀, y₀) and the radius is r, the ellipse parameters are (x, y, a, b, θ) = (x₀, y₀, r, r, 0). Here x and y are the center position, a and b are the major axis and minor axis of the ellipse, and θ is the inclination of the eye.
[0164]
First, the ellipse parameters are roughly determined using the guide pattern with these ellipse parameters as initial values. FIG. 24 is a diagram for explaining the guide pattern.
[0165]
The guide patterns used here are dictionaries created by principal component analysis from images generated by varying parameters, such as the pupil radius and the pupil center position, of a reference image of the pupil periphery (the correct pattern). Here, the eleven kinds of dictionaries shown in FIG. 25, created by varying the ellipse parameters and the like, are used. The correct pattern need not come from the user.
[0166]
Collation with the dictionary is performed by the subspace method. The one with the highest similarity is obtained from a plurality of dictionaries. It is estimated that the parameters when the obtained dictionary is created are close to the parameters related to the input pupil image.
[0167]
Starting from the ellipse parameters estimated with the guide patterns, the edge energy and the pattern energy are calculated on the input image while the values of the ellipse parameters are varied.
[0168]
The edge energy is the negated separability obtained with an elliptical separability filter having the given ellipse parameters (FIG. 23B). Expression 4 above is used to calculate the separability value.
[0169]
The pattern energy is the negated similarity between a reduced image of the pupil neighborhood, extracted with reference to the ellipse parameters, and a dictionary created in advance from correct images and their ellipse parameters. The similarity to the dictionary is obtained using the subspace method.
[0170]
In this step, the ellipse parameters are varied to find the set of parameters at which the sum of the two energies is locally minimal, which gives an accurate pupil contour.
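One simple way to search for such a local minimum is a greedy coordinate search, sketched below. This is an illustrative stand-in, not necessarily the optimization used in the cited method: the `energy` callable is assumed to return the sum of the edge energy and the pattern energy for a parameter set (x, y, a, b, θ), and `steps` is a hypothetical per-parameter step size.

```python
def refine_ellipse(params, energy, steps, max_iter=100):
    """Greedy coordinate search for a local minimum of `energy`.

    Each parameter is nudged by +/- its step; any move that lowers the energy
    is accepted. The search stops when no single move improves the energy.
    """
    current = list(params)
    current_e = energy(current)
    for _ in range(max_iter):
        improved = False
        for i, step in enumerate(steps):
            for delta in (-step, step):
                trial = list(current)
                trial[i] += delta
                e = energy(trial)
                if e < current_e:
                    current, current_e, improved = trial, e, True
        if not improved:
            break
    return tuple(current), current_e
```

In use, `energy` would wrap the negated elliptical separability and the negated dictionary similarity for the current frame; a toy quadratic energy is enough to check the search itself.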
[0171]
Through the above steps, an accurate pupil contour can be obtained. The position of the pupil can be obtained as the center of the contour.
[0172]
(Size / Position Information Management Unit 2107) The size / position information management unit 2107 estimates the current size, position, and inclination of the face from the pupil positions and pupil contour radii (ellipse parameters) obtained by the feature detection unit 2102. It then obtains the enlargement ratio for displaying the face from the rate of change of the face size.
[0173]
The size and inclination of the face are estimated from, for example, the length and inclination of a line segment connecting both pupils. Further, the rotation of the face (rotation around the neck) is estimated from the ratio of the contour radius of the left and right pupils.
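A minimal sketch of these estimates from the two detected pupils follows. The function name and the returned keys are hypothetical; the radius ratio is returned as a raw value, since the mapping from that ratio to a rotation angle is not specified here.

```python
import math

def face_pose_from_pupils(left, right, r_left, r_right):
    """Estimate face size, in-plane tilt, and a rotation cue from the pupils.

    `left` and `right` are (x, y) pupil centres in image coordinates;
    `r_left` and `r_right` are the pupil contour radii.
    """
    dx, dy = right[0] - left[0], right[1] - left[1]
    return {
        "size": math.hypot(dx, dy),                    # inter-pupil distance
        "tilt_deg": math.degrees(math.atan2(dy, dx)),  # inclination of the segment
        "radius_ratio": r_left / r_right,              # deviates from 1 as the head turns
    }
```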
[0174]
Based on the estimated information, the image processing unit 2104 is notified of the enlargement center position and the enlargement rate so that the enlarged display is centered on the region the user is paying attention to. That region is taken to be the one containing whichever of the parts being virtually made up (eyes, cheeks, lips, etc.) is estimated to be closest to the center of the image.
[0175]
When enlarging, for example, if the face as a whole is growing larger in the image, the notified enlargement rate is made to exceed the actual rate of change of the face size. This prevents the user from coming too close to the camera. FIG. 28 is a diagram for explaining an example of enlarged display. Even if the entire face is displayed at first, the user's approach is detected from the rate of change of the face size and, before the user comes too close, the eyes, for example, are displayed enlarged.
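The idea of enlarging by more than the measured change in face size can be sketched as below. The power-law mapping, the `gain` exponent, and the `max_zoom` cap are all assumptions chosen for illustration; the text only requires that the displayed enlargement exceed the actual size change.

```python
def display_zoom(current_size, reference_size, gain=2.0, max_zoom=4.0):
    """Map the measured face-size ratio to a larger display enlargement rate.

    With gain > 1 the display zooms faster than the face actually grows, so
    the user sees the detail they want before getting too close to the camera.
    """
    ratio = current_size / reference_size
    if ratio <= 1.0:
        return 1.0  # face is not approaching: no enlargement
    return min(max_zoom, ratio ** gain)
```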
[0176]
If a person gets too close to the camera, it becomes difficult to capture the entire face. As a result, it becomes difficult to estimate the orientation and size of the face.
[0177]
In such a case, the attention area is tracked in the screen. The aforementioned guide pattern is applied to the tracking.
[0178]
Since the guide patterns described above are patterns whose position and pupil radius are shifted from the correct pattern, it is possible to estimate which “shift” the pattern at the current position and shape corresponds to. That is, how the tracked region is moving can be estimated from the difference between the “shift” in the previous frame and the “shift” in the current frame.
[0179]
For example, if the position has shifted to the left or right, the movement is in the left-right direction. If the size has become larger, the face is coming closer. Furthermore, three-dimensional motion can be estimated from the deformation.
[0180]
FIG. 29 is a diagram for explaining an example in which the face position of a person moves −dx in the x direction during the time dt within the screen with respect to the camera. A region at time t0 + dt at the same position as region 2901 at time t0 is region 2902. However, since the face has moved, the region at time t0 having the same (most similar) pattern as region 2902 becomes region 2903.
[0181]
Therefore, at time t0 the “shift” of the pattern in the region 2901 from the correct pattern is examined using the guide patterns. Then, at time t0 + dt, the “shift” of the pattern in the region 2902 from the correct pattern is examined in the same way. The amount of movement can be estimated from the difference between the “shifts” at the two times.
[0182]
(Image Processing Unit 2104) The image processing unit 2104 processes the image using the position of the pupil detected by the feature detection unit 2102. Further, processing for enlarging the image is performed based on the enlargement ratio and the enlargement center position notified from the size / position information management unit 2107.
[0183]
In the following description, with reference to FIG. 26, an example of creating an image in which eye shadow is virtually applied to the eyelid of the right eye will be described.
[0184]
The outline of the method of virtually applying the eye shadow is as follows. A rectangular region 2601 in FIG. 26 is estimated as a rough eyelid region from the detected pupil region, the positional relationship between both pupils, and prior knowledge of the relationship between the pupil size and the positions of the inner and outer corners of the eye. From the luminance information in the rectangular area 2601, the area corresponding to the eyelid, that is, the portion of this area that is neither eyebrow nor pupil, is determined. Eye shadow is then applied to that eyelid area.
[0185]
The rectangular area 2601 is determined as follows.
[0186]
The lower base of the rectangular area 2601 is set so that it passes through the center 2602 of the pupil detected by the feature detection unit 2102. The left and right sides of the rectangular area 2601 could be obtained by detecting the positions of the inner corner 2604 and the outer corner 2603 of the eye. In this embodiment, however, the lower left end 2605 of the rectangular area is taken to be the position at a distance of, for example, 2.5 times the pupil radius from the pupil center 2602 toward the outer corner, and the lower right end 2606 is taken to be the position at a distance of, for example, three times the pupil radius from the pupil center 2602 toward the inner corner. The line that passes through the lower left end 2605 perpendicular to the lower base and the line that passes through the lower right end 2606 perpendicular to the lower base are the left and right sides.
[0187]
The upper base of the rectangular area is set upward from the lower base by a distance, for example, four times the radius of the pupil. Intersections with the left and right sides are defined as an upper left end 2608 and an upper right end 2609.
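The construction of the rectangular area 2601 can be sketched as follows for an upright face, using the example multiples given in the text (2.5, 3, and 4 times the pupil radius). The sketch assumes image coordinates with y increasing downward and the outer corner 2603 lying to the left of the pupil in the image; the function and key names are hypothetical.

```python
def eyelid_rectangle(cx, cy, r, outer=2.5, inner=3.0, height=4.0):
    """Corners of the rough eyelid rectangle for a pupil at (cx, cy), radius r.

    The lower base passes through the pupil centre; the rectangle extends
    `outer * r` toward the outer corner, `inner * r` toward the inner corner,
    and `height * r` upward.
    """
    left = cx - outer * r
    right = cx + inner * r
    bottom = cy
    top = cy - height * r  # upward in the image means smaller y
    return {
        "lower_left": (left, bottom),    # 2605
        "lower_right": (right, bottom),  # 2606
        "upper_left": (left, top),       # 2608
        "upper_right": (right, top),     # 2609
    }
```

For the other eye, the `outer` and `inner` offsets would be swapped so that the longer side still faces the inner corner.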
[0188]
A method for determining the eye shadow application region 2610 will be described.
[0189]
First, a threshold value for dividing the luminance between the eye shadow application region 2610 and other regions in the entire rectangular region 2601 is determined.
[0190]
The luminance distribution in the rectangular area 2601 can be represented as a histogram as shown in FIG. The rectangular area 2601 is roughly divided into three regions according to luminance level. The region 2701 with the lowest luminance is mainly the interior of the iris (pupil), the eyelashes, the eyebrows, and so on; the region 2703 with the highest luminance is the white of the eye; and the remaining intermediate region 2702 is the eyelid, that is, the area to which eye shadow should be applied.
[0191]
A discriminant analysis method is applied to the histogram of the luminance distribution of the rectangular area 2601 to determine threshold values Th1 and Th2 (Th1 < Th2) that divide the luminance into three classes. The discriminant analysis method determines the thresholds that maximize the between-class variance σ_B², which serves as the discriminant criterion. As an index of the degree of separation achieved by the division, the index given by the following equation can be used. The thresholds at which this value is maximized are taken as Th1 and Th2.
[0192]
[Equation 5]
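As a hedged illustration of the two-threshold discriminant analysis, the sketch below exhaustively searches a luminance histogram for the pair (th1, th2) that maximizes the between-class variance of three classes. It assumes the standard criterion (maximizing the between-class variance is equivalent to maximizing its ratio to the total variance, since the total is constant); the function name and the class convention `[0, th1)`, `[th1, th2)`, `[th2, levels)` are illustrative choices.

```python
def two_thresholds(hist):
    """Return (th1, th2) maximizing the three-class between-class variance.

    `hist[v]` is the number of pixels with luminance v and must contain at
    least one pixel. Returns None when fewer than three luminance levels are
    populated in a way that allows three non-empty classes.
    """
    levels = len(hist)
    # Prefix sums of counts and of count * level give O(1) class statistics.
    cum_n, cum_s = [0], [0]
    for level, count in enumerate(hist):
        cum_n.append(cum_n[-1] + count)
        cum_s.append(cum_s[-1] + count * level)
    mean_all = cum_s[-1] / cum_n[-1]
    best, best_var = None, -1.0
    for th1 in range(1, levels - 1):
        for th2 in range(th1 + 1, levels):
            var = 0.0
            for lo, hi in ((0, th1), (th1, th2), (th2, levels)):
                n = cum_n[hi] - cum_n[lo]
                if n == 0:
                    var = -1.0  # skip splits that leave a class empty
                    break
                mean = (cum_s[hi] - cum_s[lo]) / n
                var += n * (mean - mean_all) ** 2
            if var > best_var:
                best, best_var = (th1, th2), var
    return best
```

For a 256-level histogram this brute-force search evaluates roughly 32,000 threshold pairs, which is small enough to run per frame on a region as small as the rectangular area 2601.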

[0193]
Next, a line segment 2611 that divides the rectangular area 2601 into two equal parts is set. The application area 2610 is searched from the respective points on the line segment 2611 in the vertical direction (the direction indicated by the hollow arrow in the figure).
[0194]
The search is performed according to the following procedure. For the upper half, the boundary with the eyebrow is the issue, so the lower of the two thresholds obtained earlier, Th1, is used. The search moves upward from each point on the line segment 2611. While the luminance value is higher than Th1, it continues upward; when the luminance value falls below Th1, the eyebrow area is judged to have been reached and the search stops. Performing this process for each point yields the upper boundary of the application region 2610.
[0195]
The lower half is searched in the same way. Because the lower half contains the white of the eye, which is a high-luminance part, one condition is that the luminance value does not exceed the threshold Th2. Since entering the iris region is also inappropriate, a further condition is that the luminance value is larger than the threshold Th1.
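The column-wise search for the upper and lower boundaries can be sketched as follows. `lum` is assumed to be a list of rows of luminance values for the rectangular area 2601, with row 0 at the top, and `y_start` is the row of the line segment 2611; these names and the exact handling of pixels equal to a threshold are assumptions.

```python
def upper_boundary(lum, x, y_start, th1, y_min=0):
    """Walk upward from (x, y_start) while pixels stay brighter than th1.

    Returns the topmost row still inside the application region; the search
    stops just below the first pixel judged to belong to the eyebrow.
    """
    y = y_start
    while y - 1 >= y_min and lum[y - 1][x] > th1:
        y -= 1
    return y

def lower_boundary(lum, x, y_start, th1, th2, y_max):
    """Walk downward while luminance stays strictly between th1 and th2.

    Stops before the white of the eye (above th2) or the iris (below th1).
    """
    y = y_start
    while y + 1 <= y_max and th1 < lum[y + 1][x] < th2:
        y += 1
    return y
```

Running both functions for every x along the line segment 2611 gives the upper and lower boundary of the application region 2610 as two lists of row indices.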
[0196]
Furthermore, to detect the positions of the outer corner 2603 and the inner corner 2604 of the eye, the intersections of the boundary of the white of the eye with the lower base of the rectangular area 2601 are obtained. The luminance value is examined from the lower right end 2606 of the rectangular area 2601 toward the left, and the point where it exceeds the threshold Th2 is taken as the inner corner 2604. Similarly, the luminance value is examined from the lower left end 2605 toward the right, and the point where it exceeds the threshold Th2 is taken as the outer corner 2603.
[0197]
An intersection point 2612 between the left side of the rectangular area 2601 and the line segment 2611 is obtained. Within the rectangular area 2601, the area below the line connecting the intersection point 2612 and the outer corner 2603 is excluded from the application area 2610. Similarly, an intersection point 2613 between the right side of the rectangular area 2601 and the line segment 2611 is obtained, and the area below the line connecting the intersection point 2613 and the inner corner 2604 is excluded from the application area 2610.
[0198]
In this way, the application region 2610 can be obtained. If there is noise in the image, the boundary line may be discontinuous. Therefore, such a state may be avoided by processing such as smoothing the luminance in the rectangular area 2601 in advance, smoothing the obtained boundary, or providing a constraint between adjacent boundary points. In the present embodiment, smoothing is performed using a median filter.
[0199]
The application region 2610 set as described above is processed to virtually apply the eye shadow. Processing is performed using α blending as in the first embodiment. In the present embodiment, the following formula is used.
[0200]
[Formula 6]

[0201]
In the above equation, (x, y) is a relative position from the center 2602 of the pupil. It is assumed that D1 is three times the pupil radius and D2 is four times the pupil radius.
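For orientation, a generic per-pixel α blend is sketched below. It shows only the standard blend of the original pixel with the eye shadow color; how α varies with the relative position (x, y) and with D1 and D2 is determined by the formula of this embodiment and is deliberately not modelled here. The function name is hypothetical.

```python
def alpha_blend(pixel, color, alpha):
    """Blend an RGB pixel with a makeup color: (1 - alpha) * pixel + alpha * color.

    `alpha` is the opacity of the makeup in the range 0..1.
    """
    return tuple(round((1.0 - alpha) * p + alpha * c) for p, c in zip(pixel, color))
```

An α of 0 leaves the skin unchanged and an α of 1 replaces it with the eye shadow color, so letting α fall off toward the edge of the application region avoids a visible boundary.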
[0202]
If the face is tilted, the direction of the person's face is detected, the rectangular area 2601 is transformed by an affine transformation or the like according to the detected direction, and finally the inverse transformation is applied.
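For the simplest case of an in-plane tilt, the transformation and its inverse reduce to a rotation about the pupil center, sketched below. This assumes a pure 2D rotation; a full affine transformation for out-of-plane rotation would need additional terms. The function name is hypothetical.

```python
import math

def rotate_about(point, centre, angle_deg):
    """Rotate `point` about `centre` by `angle_deg` (a 2D affine transform)."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (centre[0] + dx * math.cos(a) - dy * math.sin(a),
            centre[1] + dx * math.sin(a) + dy * math.cos(a))
```

Rotating by the negative of the detected tilt brings the eye region upright for processing, and rotating the result by the positive tilt maps it back onto the input frame.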
[0203]
(Data storage unit 2103, processing method determination unit 2105, display unit 2106) The data storage unit 2103, processing method determination unit 2105 and display unit 2106 are the same as those in the first embodiment.
[0204]
(Effect of Fifth Embodiment) As described above, the virtual makeup device according to the present embodiment detects that the user is approaching the camera and can display, enlarged, the part that the user is estimated to be looking at.
[0205]
(Sixth Embodiment) The virtual makeup apparatus of the present embodiment is configured such that when a plurality of human faces are imaged by a camera, different processing can be performed on each person. Hereinafter, a description will be given centering on differences from the fifth embodiment.
[0206]
FIG. 30 is a diagram illustrating the configuration of the virtual makeup device according to the present embodiment. This apparatus includes an image input unit 2101, which is a video camera that inputs a moving image; a feature detection unit 2102 that detects feature points and feature regions of a person's face from the input image; a face area detection unit 3001 that detects the area corresponding to a person's face from the input image; a face dictionary generation storage unit 3002 that generates and stores a dictionary for face recognition from the detected face image information; and a data storage unit 2103 that stores makeup data and positional relationship data.
[0207]
The makeup data is data of color, texture, and transparency for each type of eye shadow that is one of the makeup items. The positional relationship data is data indicating the positional relationship between the facial feature points and the feature region and the eye shadow application region.
[0208]
The apparatus further includes a processing method determination unit 2105 that provides a GUI with which the user selects one of the eye shadow types stored in advance in the data storage unit 2103; an image processing unit 2104 that superimposes the selected eye shadow color on the eyelid portion to generate an image in which the eye shadow is virtually applied; a display unit 2106, which is a display device such as an LCD or CRT that displays the processed image; and a size / position information management unit 2107 that estimates the size, position, and inclination of the face and, based on the estimate, controls the display unit 2005 so that an enlarged image is displayed.
[0209]
In this apparatus, the face area detection unit 3001 detects a face area from the input image. Using the detected face area and the information obtained by the feature detection unit 2102, the size / position information management unit 2107 estimates where in the image, and at what size, a human face exists.
[0210]
Faces at temporally and spatially continuous positions are then presumed to belong to the same person, and an identifier is assigned to each and stored together with the size / position information. The image processing unit 2104 uses the identifier to perform different processing for each person, and stores the details of the processing performed in the data storage unit 2103 together with the identifier.
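The assignment of identifiers by temporal and spatial continuity can be sketched with a nearest-neighbour tracker, as below. The class name, the `max_jump` distance limit, and the greedy matching order are assumptions for illustration; faces that are not matched in a frame are simply dropped here, which is where the face dictionary described later would take over.

```python
import math

class FaceTracker:
    """Assign stable identifiers to face centres across consecutive frames."""

    def __init__(self, max_jump):
        self.max_jump = max_jump  # largest plausible movement between frames
        self.tracks = {}          # identifier -> last known (x, y)
        self.next_id = 0

    def update(self, detections):
        """Return {identifier: (x, y)} for the faces detected in this frame."""
        assigned = {}
        free = dict(self.tracks)  # tracks not yet matched in this frame
        for pos in detections:
            best = min(free, key=lambda i: math.dist(free[i], pos), default=None)
            if best is not None and math.dist(free[best], pos) <= self.max_jump:
                ident = best  # close to a previous face: same person
                del free[best]
            else:
                ident = self.next_id  # no nearby face: new person
                self.next_id += 1
            assigned[ident] = pos
        self.tracks = assigned
        return assigned
```

The identifier returned for each face is what the image processing unit would use to look up the makeup settings for that person.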
[0211]
When more than a certain number of images of the same person have been obtained, the size / position information management unit 2107 notifies the face area detection unit 3001 to perform face recognition, and outputs the identifier and the face position.
[0212]
Upon receiving the notification, the face area detection unit 3001 outputs the identifier and the pixel information of the face area necessary for generating the face dictionary to the face dictionary generation storage unit 3002. In addition, the face area detection unit 3001 performs face recognition and specifies a person using an identifier.
[0213]
In the present embodiment, face recognition uses the mutual subspace method proposed in Reference 7 (Yamaguchi et al., “Face recognition system using moving images”, IEICE Technical Report PRMU97-50, 1997). Generating a face dictionary therefore corresponds to obtaining feature vectors from the pixel information of the face region and obtaining the subspace spanned by the leading eigenvectors that result from principal component analysis of those feature vectors.
[0214]
In the face recognition performed by the face area detection unit 3001, a feature vector or a subspace is obtained from the face area of the input image in the same manner as in face dictionary generation, and its similarity to the face dictionaries stored in the face dictionary generation storage unit 3002 is calculated.
[0215]
By performing face recognition, it is possible to accurately identify a person and process each person even if a plurality of faces appear in the image at the same time.
[0216]
Even when a person once disappears from the image and then appears again in the image, the face region detection unit 3001 can identify the same person by collating with the face dictionary. If the person is registered in the face dictionary, the processing information stored in the data storage unit 2103 can be searched using the previously assigned identifier, and the processing performed previously can be reproduced. Therefore, according to the present embodiment, tracking can be performed again even if it is lost once during tracking.
[0217]
(Modification)
(1) Modification 1
In the second embodiment, when a deformation such as makeup is applied in this way, the sender's face may change so much that the sender cannot be identified on the receiving side.
[0218]
Therefore, to maintain security on the receiving side, personal authentication by face is performed on the image before deformation such as makeup is applied, and the result is transmitted at the same time. This assures the receiver that the sender is who they appear to be, so the receiver can talk with peace of mind. Because recognition is performed on the image before deformation such as makeup is applied, the recognition rate does not decrease. Personal authentication may of course be performed by methods other than face recognition. For personal authentication by face recognition, the method disclosed in Japanese Patent Laid-Open No. 9-251534 is used.
[0219]
(2) Modification 2
In the second embodiment, a 3D model specific to each user, a motion model, preferences for wearing and application, the selection of colors and materials, and the like are referred to as user parameters. These user parameters can be stored for each user. They can be retrieved through personal authentication using login user information or a camera image, or through other personal authentication such as fingerprints or passwords.
[0220]
(3) Modification 3
The makeup data may be held, and the makeup items added, either by a server or by the local device of the other party.
[0221]
When communicating with each other, there may or may not be a server that mediates between them.
[0222]
Further, in the communication example, the display units 106 and 906 may display the original image and/or the processed image on either or both of the user's own display device and the transmission-side display device. The images may also be displayed in combination.
[0223]
Normally, it is desirable to display only processed data.
[0224]
(4) Modification 4
In the first embodiment, a simulation of wearing color contact lenses was described, but the makeup item is not limited to color contact lenses.
[0225]
Any of the general makeup items described in the second embodiment can be implemented.
[0226]
In the second embodiment, a color contact lens may be attached.
[0227]
The same applies to the third and fourth embodiments.
[0228]
In addition, hair, accessories, etc. may be used.
[0229]
(5) Modification 5
In the first to fourth embodiments, examples of inputting a 2D moving image from one camera have been described. However, the number of cameras is not limited to one.
[0230]
That is, 3D data may be acquired by a general stereo method using a plurality of cameras. In that case, more features may be obtained than with 2D.
[0231]
For example, when applying cheek color, the position where the cheekbone protrudes can be used as the reference position.
[0232]
Furthermore, 3D data can be obtained by motion stereo even if there is only one camera, and such data may be used.
[0233]
Note that 3D data may be acquired by a device other than the camera.
[0234]
(6) Modification 6
In the first and second embodiments, only the feature of the face area is used. However, as in the third embodiment, the orientation of the face may be detected and the change in the orientation may be added to the processing method.
[0235]
For example, the application position may be changed by an affine transformation according to the face orientation. The conversion need not be a simple affine transformation: a general face shape model may be prepared in advance, and how the application position and range change with the face orientation may be calculated beforehand or on the spot.
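The simple affine case can be sketched as follows. This is only an illustration under stated assumptions: the face orientation is taken as already estimated (an in-plane roll angle and a yaw angle), yaw is approximated by horizontal foreshortening, and the function name and the choice of anchor point are hypothetical. A face shape model would replace this approximation.

```python
import math


def transform_application_points(points, anchor, roll_deg=0.0, yaw_deg=0.0):
    """Move application-region points with a 2D affine map about `anchor`.

    Assumption: yaw is approximated by horizontal shrinkage (cos yaw),
    and roll by an in-plane rotation.
    """
    roll = math.radians(roll_deg)
    sx = math.cos(math.radians(yaw_deg))  # horizontal shrink as the face turns
    # Affine matrix A = R(roll) * diag(sx, 1)
    a11, a12 = math.cos(roll) * sx, -math.sin(roll)
    a21, a22 = math.sin(roll) * sx, math.cos(roll)
    ax, ay = anchor
    moved = []
    for x, y in points:
        dx, dy = x - ax, y - ay
        moved.append((ax + a11 * dx + a12 * dy, ay + a21 * dx + a22 * dy))
    return moved
```

For instance, with the anchor at a facial feature point, a cheek region outlined in the frontal pose narrows horizontally as the yaw angle grows.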
[0236]
Further, not only one face shape model but also a plurality of face shape models may be prepared, and a face shape model may be selected.
[0237]
As described above, when a three-dimensional shape model of the user or a general three-dimensional shape model is obtained by any method, a good three-dimensional display can be obtained by mapping the processed image, obtained by processing the two-dimensional image, onto the shape model as a texture.
[0238]
(7) Modification 7
In the first to fourth embodiments, examples in which a PC is used have been described. However, if an image input device and an image display device are provided, the portion that performs the above process need not be a PC.
[0239]
For example, processing hardware such as a semiconductor chip or a combination board thereof, an embedded computer, a game machine, or the like may be used.
[0240]
(8) Modification 8
In the first to fourth embodiments, an example in which the input image is processed and displayed immediately in real time has been described. However, the application is not limited to this: images may be acquired in advance and processed and displayed as needed, or the processing result may be stored in a storage medium.
[0241]
(9) Modification 9
In the first to fourth embodiments, an example in which a data set for a makeup item is prepared in advance has been described. However, the data set need not be prepared from the beginning; the system may allow such data to be input.
[0242]
For example, an example of lipstick will be described.
[0243]
Using the camera that serves as the image input device, the user shoots an image before applying the lipstick and an image after applying it, and a feature (in this case, the lip contour) is extracted from both. Then the difference in image features within the lip region, that is, in color and texture, is extracted, and this data is processed by a predetermined method to create makeup data.
[0244]
As processing methods, the following methods are conceivable.
[0245]
First, the average values of the extracted lip region colors, for example, RGB, are calculated before and after lipstick application, and the difference between them is used as makeup data.
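The averaging method above can be sketched as follows. The function names and the representation of the lip region as a list of (R, G, B) tuples are assumptions made for illustration; adding the stored difference back to a bare-lip pixel is one possible way to use the data, not necessarily the one used in the embodiments.

```python
def mean_rgb(pixels):
    """Average (R, G, B) over a non-empty list of pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def lipstick_makeup_data(lip_before, lip_after):
    """Makeup data: mean lip color after application minus mean before."""
    before, after = mean_rgb(lip_before), mean_rgb(lip_after)
    return tuple(a - b for a, b in zip(after, before))


def apply_makeup_data(pixel, delta):
    """Add the stored difference to a bare-lip pixel, clamped to 0..255."""
    return tuple(max(0, min(255, round(v + d))) for v, d in zip(pixel, delta))


# Example with two pixels per lip region.
delta = lipstick_makeup_data([(150, 90, 90), (170, 110, 110)],
                             [(200, 40, 60), (220, 60, 80)])
```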
[0246]
In addition, the makeup item and its data may be distributed in advance via media such as the Internet or a CD-ROM.
[0247]
Further, the makeup item data input device described above is not necessarily required at the user terminal; the function may be provided only at these distribution sites.
[0248]
(10) Modification 10
In the first to fourth embodiments, the makeup data may be held as an image of the raw material itself captured by a camera without a background. Alternatively, the material may be imaged in a state of being worn on or applied to an object, together with an image of the object before wearing or application, and the difference information may be recorded. The data may also be an average of information obtained by imaging different objects, or the result of some other statistical operation.
[0249]
Further, it is more preferable that the imaging is performed with respect to different sizes and orientations of the same object, and they are normalized with parameters of size and orientation, or classified and held with respect to those parameters.
[0250]
In addition to that, the same image can be pasted as it is regardless of the shooting conditions, etc., or effects such as pearl or gloss can be partially applied to the data.
[0251]
Whiskers are an example of the former.
[0252]
(11) Modification 11
In the first to fourth embodiments, the case where glasses are not worn has been described. If the glasses may remain on in the final state, there is no problem. If the user wants an image with the glasses removed as the final state, the region of the glasses may be detected to create an image with the glasses removed.
[0253]
(12) Modification 12
In the third embodiment, an example was described of producing, from the makeup data, materials for realizing in the real world the equivalent of the image displayed on the display unit 906. Likewise in the first and second embodiments, a color contact lens or cosmetic material may actually be produced, or, instead of the material itself, an auxiliary such as a pattern sheet may be created or prepared in advance.
[0254]
(13) Modification 13
In the first to fourth embodiments, when detection of a facial feature or the like fails in a frame, the image of that frame need not be displayed. Instead, the image of the most recent frame in which detection succeeded, processed using the features detected in that frame, can be displayed.
[0255]
By doing so, even if feature detection fails, a display that does not appear unnatural is possible.
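A minimal sketch of this fallback follows. The class name and the two callables `detect` and `apply_makeup` are hypothetical stand-ins for the feature detection unit and the image processing unit; `detect` is assumed to return None when detection fails.

```python
class FallbackDisplay:
    """Shows the last successfully processed frame when detection fails."""

    def __init__(self, detect, apply_makeup):
        self.detect = detect              # frame -> features, or None on failure
        self.apply_makeup = apply_makeup  # (frame, features) -> processed frame
        self.last_good = None             # most recent processed frame

    def next_output(self, frame):
        features = self.detect(frame)
        if features is not None:
            self.last_good = self.apply_makeup(frame, features)
        # On failure the previous result is reused (None before any success).
        return self.last_good
```

Holding the processed result rather than the raw frame means that no additional processing is needed for a frame in which detection fails.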
[0256]
(14) Modification 14
In the first to fourth embodiments, the case where only one screen is displayed has been described. However, this may include two or more images.
[0257]
For example, images at different times may be displayed side by side as shown in FIG. 15, images in different directions as shown in FIG. 16, or images processed with different materials as shown in FIG. 17.
[0258]
In that case, displaying variants that use the same material and differ only in color is more effective as an aid to the user's selection.
[0259]
Furthermore, as shown in FIG. 18, it is more preferable if there is a function for enlarging and displaying only a part of an image.
[0260]
(15) Modification 15
In the first to fourth embodiments, an example in which image processing is performed by α blending has been described, but other methods may be used.
[0261]
For example, a method in which only the hue of the color information in the range is replaced can be considered.
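Hue replacement can be sketched with the HSV color model as follows. The use of HSV, the function names, and the image representation (rows of (R, G, B) tuples with a boolean mask for the application range) are assumptions for illustration only.

```python
import colorsys


def replace_hue(pixel, target_hue):
    """Replace only the hue of an (R, G, B) pixel with values in 0..255.

    `target_hue` is in [0, 1) as used by colorsys. Saturation and value
    of the original pixel are kept, so shading and texture survive.
    """
    r, g, b = (c / 255.0 for c in pixel)
    _, s, v = colorsys.rgb_to_hsv(r, g, b)
    nr, ng, nb = colorsys.hsv_to_rgb(target_hue, s, v)
    return tuple(round(c * 255) for c in (nr, ng, nb))


def recolor_region(image, mask, target_hue):
    """Apply replace_hue where mask is True; other pixels are unchanged."""
    return [[replace_hue(p, target_hue) if m else p
             for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]
```

Unlike α blending, which mixes the item color into the original pixel, this keeps the brightness variation of the original image inside the range and changes only the color tone.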
[0262]
(16) Modification 16
In the first to fourth embodiments, examples of displaying a 2D image have been described, but the present invention is not limited to this.
[0263]
For example, when 3D data is created, it may be displayed by a 3D viewer.
[0264]
(17) Modification 17
In the first, third, and fourth embodiments, the case where the display units 106 and 906 display on a device such as a display has been described. However, the image may instead be projected directly onto the user's face with a projector or the like.
[0265]
(18) Modification 18
Makeup that follows the shape of the individual's face gives a natural finish, for example applying cheek color along the flow of the cheek muscles, or drawing the eyebrows along the bone structure near the brow out to the extension of the outer corner of the eye.
[0266]
Therefore, if a makeup artist or the like can apply a makeup that matches the three-dimensional shape of the face created based on the input image, a more natural makeup that matches the individual's shape can be proposed. In such a case, the 3D editor is effective.
[0267]
The 3D editor is also effective when, rather than automatically applying makeup selected from makeup items prepared in advance, the user wants to create makeup or painting freely.
[0268]
As shown in FIG. 19, for example, the 3D editor reads three-dimensional shape data and has a display unit 106 that displays the shape in 3D, a color palette, a makeup tool palette with tools such as a brush, pencil, puff, and eraser, and an effect tool palette for blur effects, gloss effects, textures, and the like. Using the makeup tools, effect tools, and so on, the user draws makeup, painting, or the like in any shape and color on the 3D object read from the data. The drawn objects can be stored as makeup items, either individually or as grouped objects.
[0269]
The makeup editor may be used by the person to whom makeup is applied, or may be used by a makeup artist or the like.
[0270]
It can also be used as a specialized tool for makeup work.
[0271]
Therefore, several uses are possible. A user can shoot a face image and use the editor to create a favorite makeup. A user can shoot a face image, apply makeup with the editor, and send the resulting image or data to an expert, who sends back comments or corrections. Alternatively, the face image can be sent without makeup, so that the receiving side applies makeup to the 3D object with its own editor and sends back the made-up image or 3D data.
[0272]
The “makeup” here is not limited to normal makeup, and may include stage makeup (for example, Takarazuka makeup, animal makeup, Kabuki makeup, etc.) and painting. In this stage make-up, the actor's face can be photographed in advance and used by make-up artists to create makeup that matches the stage. You can also create your own designs for painting.
[0273]
Although the three-dimensional makeup editor has been described above, a two-dimensional editor that can directly write to an image may be used. With such an editor function, makeup items can be created and edited freely.
[0274]
When physically applying makeup, beginners want to learn how to apply makeup in the first place, and even non-beginners want to learn professional techniques. In response to such requests, a function that sequentially displays how to apply foundation, eyebrows, cheek color, and so on can be incorporated into the makeup editor as a support tool for physically applying makeup.
[0275]
The display may be two-dimensional or three-dimensional. The support tool need not be incorporated in the makeup editor and may be independent. Such a support tool can also be used to pass intangible culture on to future generations, for example by showing children how kabuki makeup is applied.
[0276]
(19) Modification 19
In the first and third embodiments, the result of deformation such as makeup is displayed on a display device such as a display. However, the result of makeup or the like may be directly reflected on the user's face or the like.
[0277]
This may be done mechanically by the device or by a human. For example, it may be possible to paint the user's face using a brush or the like.
[0278]
(20) Modification 20
In the first and third embodiments, input to the processing method determination units 105 and 505 may be made by drawing directly on the user's face, a model face, a face on a display, or the like with an input device such as a brush or a tablet.
[0279]
For example, when painting on the user's face with a brush, the painted location may be determined by having the user wear a mask fitted with a sensor that detects the brush.
[0280]
(21) Modification 21
In the second embodiment, an example is shown in which the display unit 506 is connected via a network, but one or more processing method determination units 505 may also exist via the network.
[0281]
For example, a makeup artist or the like in a remote place may make up the user, modify the user's makeup, or give advice.
[0282]
Further, the processing method determination unit 505 on the other side of the network may be operated not only by a human but also by a computer or the like.
[0283]
Furthermore, a plurality of other parties on the network may operate it. In that case, humans and computers may be combined, and each may operate simultaneously.
[0284]
(22) Modification 22
When a user takes an interest in makeup applied virtually by the virtual makeup device, the user may want information on how to actually apply that makeup and on the products it requires.
[0285]
In this modification, as a method for satisfying the above-described user desire, a function for supporting a user who wants to actually perform virtual makeup will be described in detail.
[0286]
When the user wants to actually apply the makeup to his or her own face as shown, information on the necessary cosmetic products is needed. Therefore, a list of the cosmetic products required for each type of applied makeup is prepared in advance, and the user can browse the list on the screen.
[0287]
This list includes information such as the manufacturer, sales price, and ingredients of each cosmetic product, and serves as important reference information when the user considers a purchase. Possible browsing timings include “voluntary browsing”, in which the user presses a product list display button, and “automatic display”, in which the system automatically displays the product list when the user has displayed the same makeup for a certain period of time.
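The two browsing timings can be sketched as a small state holder. The class name, the makeup identifiers, and the default threshold of 10 seconds are hypothetical; the text only says "a certain period of time".

```python
class ProductListTrigger:
    """Decides when to show the product list for the current makeup."""

    def __init__(self, dwell_seconds=10.0):
        self.dwell_seconds = dwell_seconds  # assumed threshold, not from the text
        self.current = None                 # makeup currently displayed
        self.since = None                   # time at which it was first displayed

    def update(self, makeup_id, now, button_pressed=False):
        """Return True if the list should be shown at time `now` (seconds)."""
        if makeup_id != self.current:
            # The makeup changed, so restart the dwell clock.
            self.current, self.since = makeup_id, now
        if button_pressed:
            return True  # voluntary browsing
        return now - self.since >= self.dwell_seconds  # automatic display
```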
[0288]
As a method for purchasing a product that the user selects from the list, the user may carry out the purchase over the network without visiting a store. This is a purchase system similar to so-called "online shopping": the user selects the desired product from the list and presses the purchase button, the information is automatically transmitted to a contracted dealer, and the product is delivered.
[0289]
(23) Modification 23
In the second embodiment, when communication is performed mutually, makeup data may be extracted from the partner image and a makeup simulation may be performed based on the data.
[0290]
(24) Modification 24
In the first to fourth embodiments, when communication is performed, combinations of makeup data may be prepared in advance for each communication partner.
[0291]
Moreover, combinations that resemble a celebrity's look may be prepared in advance.
[0292]
When creating makeup data, there may be a function of taking a photograph of a magazine or the like with a scanner, extracting makeup data from the image, and reproducing the same makeup.
[0293]
Further, a list of necessary items may be output based on the selected combination of makeup data.
[0294]
Further, although an example has been described in which a dialog displayed on the screen is operated with a mouse to select a makeup image or parameter, voice may be used for these selections.
[0295]
In that case, voice is captured into the virtual makeup apparatus from a voice capture device such as a microphone, and the command represented by the captured voice, as recognized by a speech recognition apparatus, is executed. For example, the user can convey an impression such as “sharper”, specify a position such as “a little lower”, or give instructions such as “more red” or “make the eyeshadow blue”. The user does not have to move a hand for each operation, which is very convenient.
[0296]
Further, the makeup data and combinations thereof can be stored in a predetermined makeup language. In this language, a specification is given for each part such as the mouth and eyes, or for each item such as lipstick and eye shadow, with the position, shape, color, and texture specified as parameters. The makeup for the entire face is determined by combining them. Such data may be defined as general data or defined individually by each manufacturer of makeup items. Once defined, makeup can be reproduced from the language, so, for example, the amount of data can be reduced when communicating, and anyone can reproduce the same makeup.
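One way such a makeup language could be represented is sketched below. The field names follow the parameters listed above (part, item, position, shape, color, texture), but the record layout, the symbolic values, and the choice of JSON as the encoding are all assumptions made for illustration; the text does not define a concrete syntax.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MakeupItem:
    part: str       # e.g. "mouth", "eyes"
    item: str       # e.g. "lipstick", "eyeshadow"
    position: str   # symbolic anchor relative to facial features (hypothetical)
    shape: str      # symbolic shape name (hypothetical)
    color: tuple    # (R, G, B)
    texture: str    # e.g. "gloss", "matte"


def encode(items):
    """Serialize a whole-face makeup as compact JSON text."""
    return json.dumps([asdict(i) for i in items], separators=(",", ":"))


def decode(text):
    """Rebuild the item list on the receiving side."""
    return [MakeupItem(**{**d, "color": tuple(d["color"])})
            for d in json.loads(text)]


look = [
    MakeupItem("mouth", "lipstick", "lip_contour", "full", (200, 30, 60), "gloss"),
    MakeupItem("eyes", "eyeshadow", "upper_lid", "gradient", (60, 80, 200), "matte"),
]
message = encode(look)
```

Sending such a description instead of processed frames is what allows the amount of data to be reduced during communication, since the receiving side can apply the same items to its own image.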
[0297]
(25) Modification 25
Although the attention area has been described as the pupil area in the fifth embodiment, the present invention is not limited to this. For example, a place designated by the user may be set as the attention area, or the center of the image may be fixed as the attention area. Further, the gaze position of the user may be set as the attention area by using a gaze position detection method proposed in Japanese Patent Application Laid-Open No. 11-175246.
[0298]
In the fifth embodiment, an example in which a guide pattern is used for tracking the pupil has been described. Because the pupil alone can move even when the face does not, the region to be enlarged may shake more than necessary. To prevent this, for example, the relative position of the pupil with respect to the inner and outer corners of the eye may be stored when the guide pattern is generated and compared at the time of tracking.
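A possible form of this comparison is sketched below. It expresses the pupil position in a coordinate frame defined by the two eye corners, so that the stored relation is unaffected by translation, rotation, or scaling of the face in the image. The function names, and the idea of placing the attention area from the eye corners and the stored relation, are assumptions for illustration and not the procedure of the embodiment.

```python
import math


def relative_position(pupil, inner, outer):
    """Pupil position in eye-corner coordinates: (along, across) the eye axis."""
    ex, ey = outer[0] - inner[0], outer[1] - inner[1]
    px, py = pupil[0] - inner[0], pupil[1] - inner[1]
    len2 = ex * ex + ey * ey
    along = (px * ex + py * ey) / len2
    across = (py * ex - px * ey) / len2
    return along, across


def gaze_shift(stored_rel, pupil, inner, outer):
    """How far the pupil has moved relative to the eye corners since storage."""
    now = relative_position(pupil, inner, outer)
    return math.hypot(now[0] - stored_rel[0], now[1] - stored_rel[1])


def attention_center(stored_rel, inner, outer):
    """Place the attention area from the eye corners and the stored relation."""
    ex, ey = outer[0] - inner[0], outer[1] - inner[1]
    along, across = stored_rel
    return (inner[0] + along * ex - across * ey,
            inner[1] + along * ey + across * ex)
```

When `gaze_shift` is large while the eye corners have not moved, the motion can be attributed to the pupil alone, and the enlarged region can be kept at `attention_center` instead of following the pupil.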
[0299]
(26) Modification 26
FIGS. 31A and 31B are diagrams illustrating an example of the image input unit 2101 and the display unit 2106 in the virtual makeup device according to the fifth embodiment. In this modified example, a camera 3101 that is an image input unit 2101 and a display 3102 that is a display unit 2106 are integrated, and a handle 3103 is provided for the user to hold.
[0300]
FIG. 31A shows a case where the entire face is shown on the display 3102 while the user is away from the camera 3101. FIG. 31B shows a state in which the user has his eyes close to the camera 3101 and the periphery of the eyes is displayed on the display 3102.
[0301]
If the image input unit 2101 and the display unit 2106 have such an integrated structure, the user can use the image input unit 2101 and the display unit 2106 as if they were actually using a hand mirror.
[0302]
Communication with other parts (for example, the feature detection unit 2102 and the image processing unit 2104) may be performed via a cable or may be performed wirelessly.
[0303]
Note that this modification may be realized using a camera-equipped mobile terminal such as a camera-equipped PDA, a camera-equipped mobile phone, or a camera-equipped tablet PC.
[0304]
[Effects of the invention]
As described above, according to the present invention, images are input and displayed in real time and feature points and the like are detected robustly, which makes possible a virtual makeup device with which the user can naturally get a sense of wearing the items.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a first embodiment.
FIG. 2 is an external view of the first embodiment.
FIG. 3 is a diagram illustrating a detection example of a pupil iris region.
FIG. 4 is a diagram of an example of a selection dialog.
FIG. 5 is a configuration diagram of a second embodiment.
FIG. 6 is a diagram illustrating an example of an eye shadow application region.
FIG. 7 is a diagram of an example of a cheek application region.
FIG. 8 is a diagram of an example of a lipstick application region.
FIG. 9 is a configuration diagram of a third embodiment.
FIG. 10 is a diagram illustrating an example of painting.
FIG. 11 shows how a motion matrix is calculated.
FIG. 12 is a diagram of an example of a processing tool material.
FIG. 13 is a process flow of the fourth embodiment.
FIG. 14 is a diagram illustrating an example of creating an image under a lighting condition in another environment.
FIG. 15 is a first diagram of an example of multi-image display in Modification 14.
FIG. 16 is a second diagram of an example of multi-image display in Modification 14.
FIG. 17 is a third diagram of an example of multi-image display in Modification 14.
FIG. 18 is a diagram illustrating an example of an enlarged image display in Modification 14.
FIG. 19 is a diagram illustrating an example of a 3D editor according to Modification 18.
FIG. 20 shows an example of a computer that operates a program in the fifth embodiment.
FIG. 21 is a configuration diagram of a fifth embodiment.
FIG. 22 is a diagram for explaining a processing flow of a feature detection unit according to the fifth embodiment.
FIG. 23A is a detection example of a pupil iris region. FIG. 23B is a detection example of a pupil iris region.
FIG. 24 shows an example of a normalized image for a guide pattern.
FIG. 25 shows an example of guide pattern parameters.
FIG. 26 shows an example of an eye shadow application area.
FIG. 27 is an explanatory diagram of threshold determination based on a luminance histogram.
FIG. 28 is a display example of the fifth embodiment.
FIG. 29 is an explanatory diagram of a tracking method for a region of interest.
FIG. 30 is a configuration diagram of the sixth embodiment.
FIG. 31A is a diagram in which the entire face is displayed in Modification 26. FIG. 31B is a diagram in which the periphery of the eyes is enlarged and displayed in Modification 26.
[Explanation of symbols]
100 Virtual makeup device
101 Image input unit
102 Feature detection unit
103 Data storage unit
104 Image processing unit
105 Processing method determination unit
106 Display unit
2001 Processor
2002 Memory
2003 Magnetic disk drive
2004 Optical disk drive
2005 Image output unit
2006 Input acceptance unit
2007 I/O unit
2008 Display device
2009 Input device
2010 External device
2101 Image input unit
2102 Feature detection unit
2103 Data storage unit
2104 Image processing unit
2105 Processing method determination unit
2106 Display unit
2107 Size/position information management unit
3001 Face area detection unit
3002 Face dictionary generation unit

Claims

1. A virtual makeup apparatus that performs image processing on each pre-makeup frame image of a moving image including at least a person's face region and processes it into an image of the person's face with makeup applied, the apparatus comprising:
face shape model storage means for storing a face shape model;
feature detection means for detecting feature points, feature regions, or both of the face from the image of each frame before makeup;
data storage means for storing data on the type of each item used to apply the makeup, positional relationship data indicating the positional relationship between the item and the facial feature points, feature regions, or both, and makeup data including the color, shape, or application range of the item;
processing method determining means for determining, from the stored item types, data on the types of items necessary for the makeup;
face direction detecting means for detecting the face direction by obtaining a rotation angle based on the two-dimensional coordinates of the feature points detected from the image of each frame;
application range calculation means for calculating the application position and range of the item according to the face orientation of each frame using the face shape model;
image processing means for processing the image so as to apply makeup to the face image in each frame of the moving image in accordance with the application range of the item calculated by the application range calculation means; and
display means for displaying the image of the face to which the makeup has been applied.

2. The virtual makeup apparatus according to claim 1, further comprising display image transmission means for transmitting the face image to which makeup has been applied through a network.

3. The virtual makeup apparatus according to claim 1, wherein the feature detection means receives data related to the image before makeup through a network.

4. The virtual makeup apparatus according to at least one of claims 1 to 3, wherein the image to which the makeup has been applied is pasted by texture mapping onto a three-dimensional model of the person's face.

5. The virtual makeup apparatus according to at least one of claims 1 to 4, wherein the display means simultaneously displays made-up images at different times, made-up images in different face directions, made-up images at different resolutions, made-up images of different parts, or images to which different makeup has been applied.

6. The virtual makeup apparatus according to at least one of claims 1 to 5, wherein the item is a color contact lens, eye shadow, cheek color, lipstick, or tattoo.

7. A virtual makeup method for performing image processing on each pre-makeup frame image of a moving image including at least a person's face region and processing it into an image of the person's face with makeup applied, the method comprising:
a face shape model storing step of storing a face shape model;
a feature detection step of detecting feature points, feature regions, or both of the face from the image of each frame before the makeup;
a data storage step of storing data on the type of each item used to apply the makeup, positional relationship data indicating the positional relationship between the item and the facial feature points, feature regions, or both, and makeup data including the color, shape, or application range of the item;
a processing method determining step of determining, from the stored item types, data on the types of items necessary for the makeup;
a face direction detecting step of detecting the face direction by obtaining a rotation angle based on the two-dimensional coordinates of the feature points detected from the image of each frame;
an application range calculation step of calculating the application position and range of the item according to the orientation of the face shape model;
an image processing step of processing the image so as to apply makeup to the face image in each frame of the moving image according to the application range of the item calculated by the application range calculation means; and
a display step of displaying the image of the face to which the makeup has been applied.

8. A program for causing a computer to realize a virtual makeup method that performs image processing on each pre-makeup frame image of a moving image including at least a person's face region and processes it into an image of the person's face with makeup applied, the program causing the computer to realize:
a face shape model storage function for storing a face shape model;
a feature detection function for detecting feature points, feature regions, or both of the face from the image of each frame before the makeup;
a data storage function for storing data on the type of each item used to apply the makeup, positional relationship data indicating the positional relationship between the item and the facial feature points, feature regions, or both, and makeup data including the color, shape, or application range of the item;
a processing method determination function for determining, from the stored item types, data on the types of items necessary for the makeup;
a face direction detection function for detecting the face direction by obtaining a rotation angle based on the two-dimensional coordinates of the feature points detected from the image of each frame;
an application range calculation function for calculating the application position and range of the item according to the orientation of the face shape model;
an image processing function for processing the image so as to apply makeup to the face image in each frame of the moving image according to the application range of the item calculated by the application range calculation means; and
a display function for displaying the image of the face to which the makeup has been applied.