JP4318825B2

JP4318825B2 - Image processing apparatus and image processing method

Info

Publication number: JP4318825B2
Application number: JP2000036759A
Authority: JP
Inventors: 和裕佐伯
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-02-15
Filing date: 2000-02-15
Publication date: 2009-08-26
Anticipated expiration: 2020-02-15
Also published as: JP2001229400A

Description

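[0001]
[Technical Field]
The present invention relates to an image processing apparatus and an image processing method for generating a three-dimensional image from an input image. In particular, it relates to image conversion processing that requires controlled extraction of several facial feature points (facial feature portions), such as converting face image data (two-dimensional face image data) obtained with a CCD camera or the like into a three-dimensional image.
[0002]
[Prior Art]
In recent years, a growing number of information devices, typified by personal computers, have a built-in CCD camera. This is owing to advances in camera-device miniaturization and falling prices of camera-device components. PC application software that exploits this capability is increasing accordingly. One example is software that generates a 3D human model from a two-dimensional face image of oneself, a neighbor, or an acquaintance captured with a CCD camera. Another is software that turns a two-dimensional face image into a caricature or an animation.
[0003]
Techniques for shaping two-dimensional face images have long been researched and developed, often with facial-expression generation as the theme.

For 3D model generation (coding), the paper "Detection of shape changes and coding of facial moving images based on a three-dimensional shape model" (IEICE Technical Report IE87-101) discloses methods for extracting, synthesizing, and coding shape changes of the eyes and mouth.

For caricature/animation, Japanese Patent Laid-Open No. 10-74271 ("Three-dimensional caricature generation method and apparatus") aims to generate, automatically, a caricature seen from an arbitrary direction that captures each individual's features. It works from hand-drawn frontal and profile caricatures. Japanese Patent Laid-Open No. 10-232950 ("Image creating apparatus, image creating method, and image creating program recording medium") aims to select and generate one or more images from several kinds of images that differ in style, expression, and so on, based on a single input image.
[0004]
Both kinds of method described above require extracting facial feature points from the two-dimensional face image data: generating a 3D human model from face image data obtained with a CCD camera or the like, and shaping a two-dimensional face image into a caricature or animation. FIG. 6 is a schematic block diagram for extracting these facial feature points.

Face image data input from original image input means 101, such as a CCD camera, is output to feature point extraction means 102. Feature point extraction means 102 extracts the positions of the parts of the face (eyes, nose, mouth, and so on). It outputs the coordinates of each extracted part, together with the original image, to original image shaping means 103.

How feature point extraction means 102 extracts the positions of the facial parts depends on the kind of shaping to be performed. The extracted data depend on what original image shaping means 103 does: they may be regions rather than points, or the center position and shape of each part.
[0005]
For ease of explanation, however, the following description treats them as "points". The subsequent control differs from case to case, so its details are omitted. For a 3D human model, for example, the control procedure is as follows.
[0006]
(1) Generate a texture from the feature point information and the original image data.
(2) Using the feature point information, generate a three-dimensional human model for texture mapping (a 3D human object whose face is featureless), by deforming a standard model.
(3) Perform texture mapping; that is, encode a 3D human object in which the face image obtained from original image input means 101 is embedded.
(4) Display the 3D human object.

Steps (1) and (2) are generally processed in parallel; neither has to precede the other.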
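[0007]
[Problems to Be Solved by the Invention]
Two ways of implementing feature point extraction means 102 are conceivable:
- extracting all required feature points automatically (method 1);
- having the user specify all required feature points (method 2).
[0008]
Method 1 is generally implemented in two steps: (a) extract the face contour from the given original image data; (b) extract each part within the extracted face contour region.

Step (a), however, must handle all original images equally, including images in which the size or position of the face contour region differs drastically. FIG. 7 shows examples: the face is large and centered in the left image, and small and at the left edge in the right image. This degrades extraction accuracy and speed.

Moreover, the face contour region is generally extracted by extracting skin-colored areas. At present, automatic extraction essentially has no option but to rely on skin-color extraction. In the part extraction of step (b), the nose is therefore especially hard to extract, because its base color is the same as that of the skin.

Method 1 also has a fatal problem, shown in FIG. 8. When an image showing several people is input as the original image, the system cannot determine which person is to be shaped.
[0009]
Before the problems of method 2 are described, the feature points that feature point extraction means 102 must extract are explained, taking the generation of a 3D human model as an example.
[0010]
- Feature points to be extracted by feature point extraction means 102 -
The more feature points are extracted, the more faithfully the texture can be mapped onto the three-dimensional human model (a 3D human object whose face is featureless). The final 3D human model can then produce more sophisticated expressions and is of higher quality. Considering the factors listed below as 1) to 4), it is reasonable for method 2 to extract seven points: the nose, the eyes (left and right), the mouth, the ears (left and right), and the chin.
[0011]
1) Recognizing the relief of the face
The ultimate purpose is to generate and display a stereoscopic (three-dimensional) image, so some detection of the face's relief is needed. At minimum, therefore, the nose (the highest point of the face) and the eyes must be extracted. The mouth could substitute for the eyes, but the eyes take priority for expressing facial expressions.
[0012]
2) Face orientation in the original image
The original image is not guaranteed to be a perfectly frontal image, so the orientation of the face in the original image must be calculated before a stereoscopic image can be generated.

This requires specifying at least three facial parts, two of which must form a pair, such as the eyes. The best choice is considered to be the eyes (left and right) plus the nose, which lies almost at the center of the face. If only left-right orientation is needed, the mouth could replace the nose. The up-down orientation, however, cannot be calculated from the eyes (left and right) and the mouth alone.

The ears (left and right) could replace the eyes, but they are somewhat unsuitable once gaze detection for more advanced expression generation is considered. The eyes (left and right) and the nose are therefore considered the best choice.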
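The patent does not say how orientation is computed from the two eyes and the nose. The following is a minimal sketch of one plausible approach, under stated assumptions, not the patent's method:
- Roll is taken from the tilt of the eye line.
- The nose's offset from the eye midpoint is normalized by eye spacing and read in a frame aligned with the eye line. The sideways component serves as a left-right (yaw) cue. The component perpendicular to the eye line, minus an assumed frontal value, serves as an up-down (pitch) cue.
- The names `orientation_cues` and `neutral_drop` are hypothetical, and `neutral_drop = 0.6` is an illustrative calibration constant, not a value from the source.

```python
import math
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def orientation_cues(right_eye: Point, left_eye: Point, nose: Point,
                     neutral_drop: float = 0.6) -> dict:
    """Rough orientation cues from the subject's two eyes and nose (image coords, y grows downward).

    neutral_drop is an assumed nose-below-eye-line distance, as a fraction of
    the eye spacing, for a frontal face; it is a tuning constant, not a value
    taken from the patent.
    """
    dx, dy = left_eye.x - right_eye.x, left_eye.y - right_eye.y
    eye_dist = math.hypot(dx, dy)
    if eye_dist == 0:
        raise ValueError("the two eye points must differ")
    # Roll: in-plane tilt of the line through both eyes.
    roll_deg = math.degrees(math.atan2(dy, dx))
    # Axes aligned with the eye line: u along it, v perpendicular (downward when roll is 0).
    ux, uy = dx / eye_dist, dy / eye_dist
    vx, vy = -uy, ux
    # Nose offset from the midpoint between the eyes.
    nx = nose.x - (right_eye.x + left_eye.x) / 2
    ny = nose.y - (right_eye.y + left_eye.y) / 2
    yaw_cue = (nx * ux + ny * uy) / eye_dist                    # sideways shift of the nose
    pitch_cue = (nx * vx + ny * vy) / eye_dist - neutral_drop   # drop relative to a frontal face
    return {"roll_deg": roll_deg, "yaw_cue": yaw_cue, "pitch_cue": pitch_cue}
```

For a level, frontal face such as eyes at (100, 100) and (160, 100) with the nose at (130, 136), all three cues come out as zero. Moving the nose sideways changes only `yaw_cue`.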
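[0013]
3) Extracting the face contour region from the original image
As noted above, automatic detection of facial feature points generally starts by extracting the face contour region from the original image. An equivalent step, though different in detail, is also needed when all facial feature points are specified by the user. It is performed when generating the face contour of the three-dimensional human model (a 3D human object whose face is featureless). In some cases that model is generated by deforming a standard model prepared in advance. To simplify this step (face contour generation/deformation), at least three feature points forming the face contour are specified. These three points are usually the ears (left and right) and the chin.
[0014]
4) Designation as feature points
Both caricature generation and 3D model generation aim to output a shaped image resembling the person in the original image. At least the eyes and mouth are therefore particularly important facial feature points.

Hairstyle, and the presence and shape of glasses, are also important for identifying a person. However, these are easy to turn into interchangeable parts, and ought to be, so they are not the main theme of the present invention. Glasses could be extracted, but application software may work better if it restricts input to original images without glasses and offers several glasses parts instead. The same applies to hairstyles, except that some extraction is still needed, because the person in the original image cannot simply be given a shaved head.

For ease of explanation, the description assumes that no person in the original images wears glasses. It also assumes that hairstyle extraction is part of the face contour region extraction.
[0015]
As explained in 1) to 4), method 2 (having the user specify all feature points to be extracted) still has to decide which facial parts to request. That choice depends on the final purpose (the shaped image), but seven points or more are typical: the nose, the eyes (left and right), the mouth, the ears (left and right), and the chin.
[0016]
Method 2 nevertheless has two problems: the burden that specification places on the user, and the guidance messages for paired parts (eyes and ears).
[0017]
First, the burden of user specification. CCD cameras used to be relatively expensive, and information devices fitted with them as standard were commercialized only for special purposes. In that era, uses such as converting a two-dimensional image of a person into a three-dimensional image were sold as one function (application software) of a fairly expensive system, such as a video conferencing system. The quality of the three-dimensional image (for example, how to generate realistic expressions) therefore tended to matter more than the user's effort in producing it.
[0018]
Recently, however, CCD cameras have become cheaper and CPUs and image-processing hardware faster. Three-dimensional images are increasingly used for entertainment, for example by attaching a three-dimensional image representing the sender to an e-mail. Demand is therefore growing for a lighter, simpler user interface for generating three-dimensional images.
[0019]
Second, the guidance messages for paired parts (eyes and ears). As shown in FIG. 9, the interface for specifying each feature point is generally interactive and displays a message such as "Please specify XXX." When the message "Please specify the right eye." appears, as in FIG. 9, it is unclear which eye the user will point to. It could be the person's actual right eye or the eye on the right side as seen by the user.
[0020]
The person in the original image faces the user (the person doing the pointing), so the subject's right eye appears on the left and the left eye on the right. Whether the user keeps this in mind, and points to the eye on the left when asked for the right eye, will probably vary from user to user.
[0021]
As far as the processing logic is concerned, this is harmless provided the user points to the opposite eye at the next message, "Please specify the left eye." The problem is that users can become confused partway through. For example, at "Please specify the right eye." a user mistakenly (through a misunderstanding) points to the eye on the left. At "Please specify the left eye." the user then notices the earlier mistake. The user may then be unsure what to do, or may point again to the eye on the right, that is, to the correct left eye.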
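Paragraph [0021] observes that the processing logic only needs the two eye clicks to be on opposite eyes. The sketch below illustrates that point under one assumption: the face is upright and roughly frontal, so the subject's right eye always appears at the smaller x coordinate. Labels are then assigned by screen position rather than by which prompt was answered. This is a hypothetical illustration (the function name `label_eyes` is invented). It is not the solution the patent adopts, which is to ask for the nose only.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def label_eyes(first_click: Point, second_click: Point) -> Dict[str, Point]:
    """Label two eye clicks by screen position instead of by which prompt they answered.

    Assumes an upright, roughly frontal face: facing the viewer, the subject's
    right eye appears on the viewer's left, i.e. at the smaller x coordinate.
    """
    if first_click[0] == second_click[0]:
        raise ValueError("the two eye clicks must differ horizontally")
    on_left, on_right = sorted([first_click, second_click])  # tuples sort by x first
    return {"subject_right_eye": on_left, "subject_left_eye": on_right}
```

Whichever order the user clicks in, `label_eyes((160, 100), (100, 102))` and `label_eyes((100, 102), (160, 100))` yield the same labelling.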
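[0022]
One current workaround, shown in FIG. 10, is a user interface that asks for each part by number rather than by name, so that the user does not make mistakes or hesitate. It still leaves a question open: for the mouth (part number 4), should the user point to the center of the mouth, or is anywhere on it acceptable? The user cannot see the internal logic. Normally the logic accepts any point around the mouth, because extraction simply proceeds from the specified point. The user does not know this, however, and so wonders what to do; having that doubt at all is itself the problem.
[0023]
The present invention was devised to solve these problems. Its object is to provide an image processing apparatus and an image processing method for three-dimensional or animation conversion of two-dimensional face images that simplify the user input required for facial feature point extraction and achieve fast facial feature point extraction.
[0024]
[Means for Solving the Problems]
To solve the above problems, the image processing apparatus of the present invention comprises:
- image input means for inputting an original image;
- a color palette prepared in advance to assist extraction of facial parts;
- display means for displaying the original image data input by the image input means together with a message prompting the user to specify the display position of the nose, which is one facial part of the face image in the original image data, and for displaying the original image data and the color palette side by side;
- input means for inputting, in accordance with that message, the position coordinates of the nose as a single point on the display;
- parameter specifying means for specifying the color value of the nose using the color palette displayed on the display means;
- facial part extraction means for extracting the facial parts other than the nose, using the nose position coordinates input by the input means and the nose color value specified by the parameter specifying means, by extracting skin-colored or non-skin-colored areas;
- image generation means for generating a texture from the facial parts extracted by the facial part extraction means and the coordinate information of the input base point, and mapping the texture onto a three-dimensional human model to generate a three-dimensional image.

The facial part extraction means divides the original image into a plurality of search regions according to their positions relative to the nose position coordinates. In each search region it extracts a closed space or closed figure whose color falls outside the nose color value widened by a predetermined tolerance. This yields the facial part for that region.
[0025]
The image input means covers not only camera devices such as a CCD camera but also selecting a target file from a group of image data stored as files. The input means includes pointing devices, such as a mouse, for inputting any single point on the display.
[0026]
With these features, the present invention improves processing speed and the quality of the generated or shaped image data compared with conventional method 1 ("automatically extract all facial parts"). It also handles original image data containing several people, because only one feature point (facial part) needs to be specified. Compared with conventional method 2 ("have the user specify two or more facial parts"), it reduces the user's operating burden and prevents operating errors without significantly degrading the quality of the generated or shaped image data.
[0027]
In the image processing apparatus of the present invention, the single point input with the input means is the position of the nose. That is, the "message prompting specification of the display position of one facial part" is a message asking for the nose. This choice rests on several observations:
- A face in profile or a photo taken from behind is hardly ever input as the original image for feature point extraction. Normally a roughly frontal photo is input, and then the nose lies at approximately the center of the face.
- When the message "Please specify the nose." is displayed, few people are expected to point to, say, the bridge of the nose.
- Each person has only one nose.
- Face contour and facial part extraction logic is based on detecting skin-colored or non-skin-colored areas, and by that logic the nose is harder to extract automatically than other parts such as the mouth and eyes.
- The layout of facial parts is common to everyone.

For these reasons, having the user specify the nose point in particular reduces the operating burden and prevents operating errors, compared with having several points specified, without degrading the quality of the final output data.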
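[0028]
The image processing apparatus of the present invention also includes a color palette that assists facial part extraction. When the user drags from the palette and drops onto the nose in the original image, the parameter specifying means takes the palette value at the drag start position as the color value of the nose at the drop position.

In other words, the parameter that assists the feature point (the palette value) is determined by the coordinates at which the user presses on the displayed parameter and keeps pressing. This is a drag: with a mouse, moving the pointer while holding the button down; with a pen, moving the pen while pressing it against the tablet. The display position of the facial part is determined by the coordinates at which the user ends that drag. This is a drop: mouse button up or pen up.
[0029]
Alternatively, the parameter specifying means specifies, as the nose color value, the palette value (auxiliary parameter) at the position of a slider on the displayed color palette as the user moves the slider. The display means may also show a hair-color palette and several face-shape patterns. In that case, the parameter specifying means takes the palette value at the hair-color slider position as the hair color, and specifies one face shape chosen from the displayed patterns.
[0030]
Specifying this auxiliary parameter (palette value) substitutes for specifying more feature points while still yielding high-quality output data. Specifying one point instead of none drastically reduces the amount of auxiliary parameters that must be specified, so reasonable quality can be secured without asking for detailed input. Depending on the use scene, product concept, extraction logic, and so on, a single parameter may be enough. In that case drag-and-drop is the most suitable man-machine interface.
[0031]
This facial feature point extraction technique, which simplifies user input for three-dimensional or animation conversion of two-dimensional face images, need not take the form of an apparatus; it can also be realized as a method. Accordingly, the image processing method of the present invention is an image generation method using an image processing apparatus that generates a three-dimensional image from an input image. A color palette for assisting facial part extraction is prepared in advance, and the method comprises the steps of:
- storing an original image input from image input means;
- displaying on display means the original image data input from the image input means and a message prompting the user to specify the display position of the nose, which is one facial part of the face image in the original image data;
- displaying on the display means the original image data input from the image input means and the color palette side by side;
- storing the position coordinates of the nose, input from input means as a single point on the display;
- storing the nose color value specified using the color palette displayed on the display means;
- facial part extraction means extracting the facial parts other than the nose, using the nose position coordinates input from the input means and the specified nose color value, by extracting skin-colored or non-skin-colored areas;
- image generation means generating a texture from the facial parts extracted by the facial part extraction means and the coordinate information of the input base point, and mapping the texture onto a three-dimensional human model to generate a three-dimensional image.
[0032]
The technical idea of the present invention can also be provided as a recording medium storing an image processing program that causes a computer to execute the steps of the above image processing method.
[0033]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
[0034]
FIG. 1 is a block diagram of the image processing apparatus of the present invention, which converts two-dimensional face image data of a person into a three-dimensional face image.
[0035]
The image processing apparatus comprises a two-dimensional image input control unit 10, an original image specifying unit 11, a nose point specifying unit 12, a feature point extraction unit 13, a 3D image encoding unit 14, a 3D image display unit 15, a CCD camera 20, an input device 30, an output device 40, an original image storage device 50, a variable/constant storage device 51, a standard human model (encoding constants) storage device 52, and a shaped image storage device 53.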
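[0036]
The two-dimensional image input control unit 10 controls the shutter of the connected CCD camera 20 so that it sends original image data. It then stores that data in original image storage device 50 as a file. In this embodiment, CCD camera 20 is an image capture device capable of color input.
[0037]
The original image specifying unit 11 reads the original image file specified with input device 30 from among the files stored in original image storage device 50 and passes it to nose point specifying unit 12.
[0038]
The nose point specifying unit 12 outputs the original image data received from original image specifying unit 11 to output device 40. It then passes the nose point and the extraction auxiliary parameter entered by the user through input device 30 (here, the face color value described later) to feature point extraction unit 13.
[0039]
The feature point extraction unit 13 refers to the nose point coordinates and face color value (extraction auxiliary parameter) received from nose point specifying unit 12 and reads the variables and constants for extraction from variable/constant storage device 51. It uses these values to extract each facial feature point.

Taking the nose point coordinates as the base coordinates, it divides the original image into several regions using the variables and constants selected according to the face color value and other extraction auxiliary parameters. It then extracts each part (eyes, mouth, and so on) only within its corresponding region. The extracted feature points are passed to 3D image encoding unit 14 as feature point information.
[0040]
The 3D image encoding unit 14 generates a texture from the received feature point information and original image data. It reads an appropriate three-dimensional human model from standard human model storage device 52 based on the feature point information, performs texture mapping, and stores the result in shaped image storage device 53.
[0041]
The 3D image display unit 15 reads the 3D image from shaped image storage device 53 and displays it on output device 40.
[0042]
That concludes the description of each unit. The control performed by nose point specifying unit 12, the characteristic part of the invention, is described in more detail below.
[0043]
FIG. 2 shows an example screen layout of nose point specifying unit 12 when the user specifies an approximate face color of the person in the original image (a color palette value) as the extraction auxiliary parameter. The screen has three areas:
- Right side: the face color palette area [(cXmin, cYmin) to (cXmax, cYmax)], which shows the colors a person's face can have (Cmin to Cmax) as a gradation.
- Left side: the original image display area [(uXmin, uYmin) to (uXmax, uYmax)], which shows the image data input from original image specifying unit 11.
- Bottom: a message area. It tells the user to drag from the palette color closest to the face color of the person in the original image and drop it onto that person's nose. In this example the message reads "Drag the color palette that matches the person's face color and drop it on the nose."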
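Paragraph [0043] only says that the palette shows face colors Cmin to Cmax "as a gradation". The sketch below shows how a press position could be turned into a palette value under two assumptions: the gradient runs vertically from cYmin to cYmax, and it is a per-channel linear blend in RGB. The function name and the RGB representation are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def palette_color_at(y: int, c_y_min: int, c_y_max: int, c_min: RGB, c_max: RGB) -> RGB:
    """Color shown at row y of a palette drawn as a vertical gradient from c_min to c_max.

    Assumes the gradient runs top to bottom over rows c_y_min..c_y_max and is a
    per-channel linear blend in RGB.
    """
    if not c_y_min <= y <= c_y_max:
        raise ValueError("y lies outside the palette area")
    if c_y_max == c_y_min:
        return c_min
    t = (y - c_y_min) / (c_y_max - c_y_min)
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(c_min, c_max))
    return (r, g, b)
```

With a palette from (200, 150, 120) at row 0 to (120, 80, 60) at row 100, row 50 maps to (160, 115, 90). A horizontal or non-linear gradient would change only the computation of `t`.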
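[0044]
This guides the user through one sequence of operations: drag from the point in the face color palette area closest to the face color of the person in the original image, then drop onto that person's nose. From the input thus obtained, the nose point coordinates within the original image display area and the face color palette value are calculated.
[0045]
FIG. 3 is a flowchart of the control performed by nose point specifying unit 12, that is, the logic that calculates the nose point coordinates and the face color palette value.
[0046]
- Step S1: The input screen of FIG. 2 is generated and displayed. It guides the user to drag from the point in the face color palette area closest to the person's face color and drop it on the nose in the original image.
- Steps S2 to S4: A "left button down" message is received from input device 30, for example a mouse (step S2). If the press coordinates lie within the face color palette area (YES in step S3), the face color palette value at those coordinates is assigned to the variable FaceC (step S4).
- Steps S5 to S7: While the button is held, a "left button up" message is received (YES in step S5). If the release coordinates lie within the original image display area (YES in step S6), they are assigned to (NosePx, NosePy) (step S7).
- Step S8: Control passes to feature point extraction unit 13, with the variables FaceC, NosePx, and NosePy as parameters.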
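Steps S2 to S8 can be sketched as a small event handler. The names `NosePointInput`, `on_left_button_down`, `on_left_button_up`, and `palette_lookup` are hypothetical, and rectangles are assumed to be inclusive (x_min, y_min, x_max, y_max) tuples. The flowchart's NO branches are not described in the text. As an assumption, this sketch simply ignores a press outside the palette and a drop outside the original image.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive
RGB = Tuple[int, int, int]


def inside(rect: Rect, x: int, y: int) -> bool:
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1


@dataclass
class NosePointInput:
    palette_area: Rect                         # (cXmin, cYmin, cXmax, cYmax)
    image_area: Rect                           # (uXmin, uYmin, uXmax, uYmax)
    palette_lookup: Callable[[int, int], RGB]  # palette color under a given point
    face_c: Optional[RGB] = None               # FaceC
    dragging: bool = False

    def on_left_button_down(self, x: int, y: int) -> None:
        # S2-S4: only a press inside the palette starts the gesture and sets FaceC.
        if inside(self.palette_area, x, y):
            self.face_c = self.palette_lookup(x, y)
            self.dragging = True

    def on_left_button_up(self, x: int, y: int) -> Optional[Tuple[RGB, int, int]]:
        # S5: a release only counts if it ends a drag that began on the palette.
        if not self.dragging:
            return None
        self.dragging = False
        # S6: the drop must land inside the original image area.
        if not inside(self.image_area, x, y):
            return None
        # S7-S8: (NosePx, NosePy) is the drop point; hand FaceC, NosePx, NosePy on.
        return (self.face_c, x, y)
```

For example, with the image at (0, 0, 399, 299) and the palette at (420, 0, 459, 299), a press at (430, 90) followed by a release at (210, 150) returns the palette color at (430, 90) together with the nose point (210, 150).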
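[0047]
Feature point extraction unit 13 uses the variables FaceC, NosePx, and NosePy input from nose point specifying unit 12 to extract the facial feature points other than the nose. Whether these are coordinates, regions, or objects depends on the specification of feature point extraction unit 13. The basic extraction logic is the same as conventional facial feature point extraction logic and is not described in detail here. Its basic idea is to extract skin-colored or non-skin-colored areas.
[0048]
FIG. 4 summarizes in table form how feature point extraction unit 13 uses the nose point coordinates and the face color palette value, that is, the three variables FaceC, NosePx, and NosePy. It outlines the extraction logic built on these three variables.
[0049]
Following FIG. 4, the contour is extracted first. Starting from the nose point (NosePx, NosePy), the unit extracts the closed space beyond which the color no longer lies within FaceC ± d (d being the tolerance). Processing can start from a nose point specified in advance, so the nose point serves as the base point for finding non-face-color regions. The extraction logic is therefore far simpler, and far more accurate, than automatically detecting all feature points (method 1, described under Problems to Be Solved by the Invention).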
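The contour step of [0049] can be read as a region grow (flood fill) seeded at the nose point. The sketch below rests on several assumptions:
- The image is a row-major list of RGB rows, `img[y][x]`.
- "FaceC ± d" is applied to each channel separately.
- Connectivity is 4-neighbour.

The function names are hypothetical. Non-skin features inside the face (eyes, mouth) appear as holes, so `boundary` returns their edges as well as the outer contour.

```python
from collections import deque
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, int]


def near_face_color(c: RGB, face_c: RGB, d: int) -> bool:
    # Assumption: "FaceC ± d" is applied to each RGB channel separately.
    return all(abs(a - b) <= d for a, b in zip(c, face_c))


def face_region(img: List[List[RGB]], nose: Pixel, face_c: RGB, d: int) -> Set[Pixel]:
    """Pixels reachable from the nose point through 4-connected face-colored pixels."""
    height, width = len(img), len(img[0])
    sx, sy = nose
    if not near_face_color(img[sy][sx], face_c, d):
        return set()          # the seed itself is not face-colored
    region = {(sx, sy)}
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height and (nx, ny) not in region
                    and near_face_color(img[ny][nx], face_c, d)):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


def boundary(region: Set[Pixel]) -> Set[Pixel]:
    """Region pixels with at least one 4-neighbour outside the region (the contour)."""
    return {(x, y) for x, y in region
            if any(n not in region for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))}
```

Because the fill starts at a point the user has already placed on the face, it never has to search the whole image for a face. This is the advantage over method 1 that [0049] describes.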
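[0050]
Next, the right eye. Its search region is u1, determined by its position relative to the nose point. The basic extraction logic looks for a closed figure whose color lies outside FaceC ± d (d being the tolerance) and that contains both the white and the dark part of the eye. Compared with method 1, as above, this simplifies the logic and improves accuracy. It also speeds up the search, because the search region is narrowed. The left eye and the mouth are extracted in the same way, each with its own search region.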
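The geometry of search region u1 appears only in FIG. 4, so the sketch below takes the window as caller-supplied offsets from the nose point; these offsets are hypothetical. Inside the window it works as follows:
- It collects 4-connected groups of pixels outside FaceC ± d (per channel, an assumption).
- It treats a group as "closed" only if it does not touch the window edge.
- It accepts the largest closed group that contains both a dark (iris) pixel and a bright (white) pixel.

The brightness thresholds `dark=60` and `bright=190` are illustrative, not values from the patent.

```python
from collections import deque
from typing import List, Optional, Set, Tuple

RGB = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]   # (x0, y0, x1, y1), inclusive
Pixel = Tuple[int, int]


def search_region(nose: Pixel, dx0: int, dy0: int, dx1: int, dy1: int,
                  width: int, height: int) -> Rect:
    """Search window given as offsets from the nose point, clipped to the image."""
    nx, ny = nose
    return (max(0, nx + dx0), max(0, ny + dy0),
            min(width - 1, nx + dx1), min(height - 1, ny + dy1))


def non_skin_blobs(img: List[List[RGB]], rect: Rect, face_c: RGB, d: int) -> List[Set[Pixel]]:
    """4-connected groups of pixels inside rect whose color is outside FaceC ± d."""
    x0, y0, x1, y1 = rect

    def is_skin(x: int, y: int) -> bool:
        return all(abs(a - b) <= d for a, b in zip(img[y][x], face_c))

    seen: Set[Pixel] = set()
    blobs: List[Set[Pixel]] = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if (x, y) in seen or is_skin(x, y):
                continue
            seen.add((x, y))
            blob: Set[Pixel] = set()
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                blob.add((cx, cy))
                for px, py in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (x0 <= px <= x1 and y0 <= py <= y1
                            and (px, py) not in seen and not is_skin(px, py)):
                        seen.add((px, py))
                        queue.append((px, py))
            blobs.append(blob)
    return blobs


def find_eye(img: List[List[RGB]], rect: Rect, face_c: RGB, d: int,
             dark: int = 60, bright: int = 190) -> Optional[Set[Pixel]]:
    """Largest closed non-skin blob in rect that contains both a dark and a bright pixel."""
    x0, y0, x1, y1 = rect

    def brightness(p: Pixel) -> float:
        return sum(img[p[1]][p[0]]) / 3

    candidates = []
    for blob in non_skin_blobs(img, rect, face_c, d):
        closed = all(x0 < x < x1 and y0 < y < y1 for x, y in blob)  # does not touch the window edge
        has_iris = any(brightness(p) <= dark for p in blob)
        has_white = any(brightness(p) >= bright for p in blob)
        if closed and has_iris and has_white:
            candidates.append(blob)
    return max(candidates, key=len) if candidates else None
```

Only pixels inside the window are examined. This reflects the speed-up [0050] attributes to narrowing the search region. The left eye and mouth would reuse `find_eye`-style logic with different offsets (and, for the mouth, a different acceptance test).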
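[0051]
Method 1 generally does not extract the ears or the chin at all. Method 2 (having the user specify every feature point) asks for the ears and chin to improve accuracy and simplify the logic of steps such as contour extraction and face orientation estimation. For application software that does not demand much accuracy, substitutes are preferable.

The face orientation can be approximated from the nose point and the left and right eye points, so using this instead causes little loss of accuracy. Contour extraction and similar processing raise no major speed or accuracy problems as long as the base point for extraction is clear. Instead of asking for more facial feature points, they can also be supported by having the user specify several extraction auxiliary parameters, as described later.
[0052]
In the above embodiment, nose point specifying unit 12 asks the user for a single extraction auxiliary parameter, the face color. As noted above, however, the user may specify several extraction auxiliary parameters, to make facial feature point extraction more accurate and faster.
[0053]
FIG. 5 shows an example input screen. It schematically shows face color and hair color specified with sliders, and one of several face shapes prepared in advance selected with radio buttons. The hair color and face shape parameters, newly available in addition to the face color palette of FIG. 2, are passed unchanged to feature point extraction unit 13, just like the face color parameter. Feature point extraction unit 13 uses them mainly for face contour extraction.
[0054]
More extraction auxiliary parameters make the extraction logic simpler and extraction faster and more accurate. Compared with method 2 ("have the user specify all feature points to be extracted"), which needs many user operations, they also remove confusion of two kinds: which eye to click for "Click the right eye", and where on the mouth to click for "Click the mouth". Changing the face and hair colors with slider bars, and choosing a contour, can be done intuitively and easily.
[0055]
The image processing apparatus of the above embodiment performs its three-dimensional face image generation with a three-dimensional face image processing program stored in a storage unit (not shown). This program is stored on a computer-readable recording medium. In the present invention, the recording medium may be a program medium that becomes readable when inserted into a program reading device (not shown) inside the apparatus. Alternatively, the program may be stored in storage means such as a program memory inside the apparatus. In either case, the stored program may be accessed and executed directly. Alternatively, it may be read out, downloaded into a main memory (not shown), and executed there. The program used for this downloading is assumed to be stored in the apparatus body in advance.
[0056]
The program medium is a recording medium separable from the main body and may carry the program in fixed form. Examples include:
- tape media such as magnetic tape and cassette tape;
- disk media, including magnetic disks such as floppy disks and hard disks, and optical disks such as CD-ROM, MO, MD, and DVD;
- card media such as IC cards and optical cards;
- semiconductor memory such as mask ROM, EPROM, EEPROM, and flash ROM.
[0057]
If the apparatus can communicate externally (through a wireless communication function, or a wired communication function via a public line such as the Internet), the medium may also carry the program in transient form, with the program downloaded from outside over that connection. When the program is downloaded from a communication network in this way, the download program may be stored in the apparatus body in advance or installed from another recording medium.
[0058]
[Effects of the Invention]
The image processing apparatus of the present invention improves processing speed and the quality of the generated or shaped image data compared with conventional automatic extraction of all facial parts. It also handles original image data containing several people, because only one feature point needs to be specified. Compared with the conventional approach of having two or more facial parts specified, it reduces the user's operating burden and prevents operating errors without significantly degrading the quality of the generated or shaped image data.
[0059]
Because the single point input with the input means is the nose position, having the user specify the nose point reduces the operating burden and prevents operating errors compared with specifying several points. The quality of the final output data is not degraded.
[0060]
The apparatus further includes parameter specifying means that assists facial part extraction: a displayed auxiliary parameter is specified by dragging it to the nose position and dropping it there. High-quality output data can therefore be obtained even with fewer specified feature points.
[0061]
Furthermore, the present invention provides an image processing method that improves processing speed and the quality of the generated or shaped image data compared with conventional methods that automatically extract all facial parts.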
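[Brief Description of the Drawings]
[FIG. 1] A block diagram of the image processing apparatus of the present invention.
[FIG. 2] An explanatory diagram of an example screen layout when the user specifies the approximate face color of the person in the original image as the extraction auxiliary parameter.
[FIG. 3] A flowchart of the logic that calculates the nose point coordinates and the face color palette value.
[FIG. 4] An explanatory diagram outlining how the feature point extraction unit uses the nose point coordinates.
[FIG. 5] An explanatory diagram of an example input screen for specifying extraction auxiliary parameters.
[FIG. 6] A schematic block diagram for extracting facial feature points.
[FIG. 7] An explanatory diagram of original images in which the size and position of the face contour region differ drastically.
[FIG. 8] A diagram explaining the problem that arises when an image showing several people is input as the original image.
[FIG. 9] An explanatory diagram of an example user interface for specifying feature points.
[FIG. 10] An explanatory diagram of an example user interface in which each part is specified by number rather than by name.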
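[Description of Reference Numerals]
10 Two-dimensional image input control unit
11 Original image specifying unit
12 Nose point specifying unit
13 Feature point extraction unit
14 3D image encoding unit
15 3D image display unit
20 CCD camera
30 Input device
40 Output device
50 Original image storage device
51 Variable/constant storage device
52 Standard human model (encoding constants) storage device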
５３成形画像記憶装置[0001]
BACKGROUND OF THE INVENTION
  The present invention relates to an image processing apparatus and an image processing method for generating a three-dimensional image from an input image.To the lawIn particular, image conversion processing that requires extraction control of several facial feature points (facial feature portions), such as making facial image data (two-dimensional facial image data) obtained by a CCD camera or the like into a three-dimensional image. About.
[0002]
[Prior art]
2. Description of the Related Art In recent years, in information equipment represented by personal computers, the number of models with a built-in CCD camera is increasing due to advances in miniaturization technology of camera devices and lower prices of camera device components. Therefore, using these functions, a function to generate a 3D human model based on a 2D face image captured by a CCD camera such as yourself, neighbors, acquaintances, etc. The number of PC application software and the like having a function such as performing is increasing.
[0003]
These two-dimensional face image forming technologies have been researched and developed with the theme of facial expression generation, etc., and for 3D model generation (encoding), for example, extraction of shape changes in the eyes and mouth, A paper on composition and coding method “Detection of shape change and coding of face moving image based on 3D shape model” (Science Technical Report IE87-101) has been disclosed. Is a “three-dimensional caricature generation method and apparatus” for automatically generating a caricature viewed from an arbitrary direction using, for example, a front face and a side face caricature drawn manually. (Japanese Patent Laid-Open No. 10-74271) is disclosed, and a single image or a plurality of images are selected from a plurality of types of images having different types such as style and expression based on a single input image data. Generate Aimed "image creating apparatus, an image forming method and an image generating program recording medium" (Japanese Patent Laid-Open No. 10-232950) discloses and.
[0004]
  By the way, in the method of generating a 3D human model based on the face image data (two-dimensional face image data) obtained by a CCD camera or the like as described above, and the method of caricature / animation shaping of a two-dimensional face image. The extraction of facial feature points in the two-dimensional face image data is an essential constituent requirement. A schematic block diagram for extracting the face feature points is shown in FIG. That is, such as CCD cameraoriginalThe face image data input from the image input unit 101 is output to the feature point extraction unit 102. The feature point extraction means 102 extracts the position of each part (eyes, nose, mouth, etc.) of the face, and outputs the extracted coordinate position of each part and the original image to the original image shaping means 103. Here, the extraction of the position of each part of the face by the feature point extraction unit 102 differs depending on what kind of shaping is performed. That is, the data content to be extracted depends on the molding content of the original image shaping unit 103, and is not a point but an area, or the center position and shape of each part.
[0005]
However, in order to facilitate the description here, the following description will be given as “points”. Further, since the subsequent control contents are individually different, the details are omitted. For example, in the case of a 3D human model, the following control procedure is used.
[0006]
That is, (1) a texture is generated based on feature point information and original image data. (2) A three-dimensional human model (a 3D human object whose face portion is a stick) is generated for texture mapping using the feature point information (generated by modifying the standard model). (3) Perform texture mapping. That is, the 3D person object in which the face image obtained from the original image input unit 101 is embedded is encoded. (4) Display a 3D person object. Such a control procedure (1) to (4) is performed. Here, (1) and (2) are generally processed in parallel. That is, there is no order in control.
[0007]
[Problems to be solved by the invention]
As a method for embodying the feature point extraction means 102 described above, all feature points to be extracted are extracted (this is called method 1), and all feature points to be extracted are designated by the user (this is called method 2). There are two types.
[0008]
In the method 1, generally, (a) the outline of a face is extracted from given original image data. (B) Each part is extracted within the range of the contour area of the obtained face. This is realized by two steps. However, in the process of (a), as shown in FIG. 7, original images with extremely different face outline area sizes (left is large, right is small) and positions (left is center, right is left end) are extremely different. Need to be treated equally, which causes a decrease in extraction accuracy, a decrease in speed performance, and the like. In addition, the method for extracting the face contour region is generally realized by extracting the skin color part (currently, the only way to automatically extract is to rely on the skin color part extraction). In particular, the extraction of the nose in the extraction process of each part in the next step (b) is very difficult because the basic color is the same as that of the skin. In addition, there is a fatal problem in the realization by the method 1. As shown in FIG. 8, it occurs when an image in which a plurality of persons are photographed is input as an original image, and the object to be molded cannot be determined.
[0009]
Next, the problem of the method 2 will be described. Before that, the feature point to be extracted by the feature point extraction unit 102 will be described by taking as an example the case of generating a 3D human model.
[0010]
-Feature points to be extracted by the feature point extraction means 102-
The more feature points that are extracted, the more it is possible to generate a 3D human model (a 3D human object whose face part is a sticker) with which the actual texture is cha-mapped. Naturally, the generated 3D human model can also generate a higher level of facial expression and be of high quality. In this case, considering from the following various factors [1) to 4)], the extracted feature points in Method 2 are the nose, eyes (left and right), mouth, ears (left and right), jaws It is reasonable to consider these as 7 points.
[0011]
1) Recognizing facial irregularities
Since the final purpose is to generate and display a three-dimensional image (three-dimensional image), it is necessary to detect the unevenness of the face to some extent, and in that sense, at least, the nose (corresponding to the apex of the face) It is necessary to extract the eyes. The eyes can be replaced by the mouth, but in terms of expressing facial expressions, the eyes are considered to be given priority.
[0012]
2) Face orientation of the original image
There is no guarantee that the original image is necessarily a complete front image. Therefore, in order to generate a stereoscopic image, it is necessary to calculate the orientation of the face of the original image. In order to calculate the orientation of the face, it is necessary to specify at least three points of each part of the face (of which two points must be paired such as the eyes). In this case, It is considered that the eyes (left and right) and the nose located almost in the center of the face are the best. In other words, if it is limited to the left and right orientations, it is possible to replace the nose with the mouth, but considering the calculation of the orientation of the upper and lower faces, it is impossible to calculate with only the eyes (left and right) and the mouth. is there. In addition, the ears (left and right) can be used as an alternative to the eyes (left and right). However, the eye (left and right) is considered to be slightly unsuitable when considering gaze detection when performing more advanced facial expression generation. And the nose are considered the best.
[0013]
3) Extraction of facial contour region from original image
As described above, in the case of automatically detecting facial feature points, in general, a face contour region is extracted from an original image. Even when this process is implemented by a method in which all facial feature points are specified by the user, a detailed processing content is different, but a three-dimensional human model (a 3D human object with a face sticker) is generated. This is processed at the time of generating the face contour of the three-dimensional human model when (in some cases, generating a standard model prepared in advance). In order to facilitate this process (facial contour generation / deformation), at least three feature points constituting the face contour are designated. These three points are the ear (left and right) and the jaw. Is generally.
[0014]
4) Designation as a feature point
In both the case of caricature generation and 3D model generation, the purpose is to output a shaped image resembling a person in the original image, so at least the eyes and mouth are particularly important facial feature points. . The hairstyle, the presence or absence of eyeglasses, the shape, and the like are also important feature points for specifying a person. However, since these are easy-to-part parts and parts to be parted, they are not the main theme in the present invention. In other words, for eyeglasses, although it is possible to extract the eyeglass part, it may be preferable application software to limit to the original image without eyeglasses and prepare some eyeglass parts. The hairstyle is the same as that of glasses. However, in the case of a hairstyle, the original image cannot be made into a shaved head, so some extraction processing is necessary. Here, for ease of explanation, the principle of glasses is described based on the principle that there is no person image with glasses in the original image. The hairstyle is described based on the principle that the hair contour extraction is included in the process of extracting the face contour region.
[0015]
As described above in 1) to 4), which part of the face is designated as the feature point even in the method 2 (that is, “all the feature points to be extracted are designated by the user”). Depending on the final purpose (molded image), it is common to have 7 points or more in the nose, eyes (left and right), mouth, ears (left and right), and jaws.
[0016]
The following two problems also occur in the method 2 that “allows the user to specify all feature points to be extracted”. That is, there is a problem of a user's designation addition and a problem of a designation guidance message for a pair of parts (eyes and ears).
[0017]
The problem of user-specified addition will be explained. In the era when CCD cameras and the like were relatively expensive devices, and information equipment equipped with this device as standard equipment was commercialized for special purposes, the above-described two-dimensional person Use scenes such as making images into three-dimensional images have been commercialized as one function (application software) of a reasonably expensive system (for example, a TV conference system). For this reason, the quality of the three-dimensional image (how to generate a realistic expression, etc.) tends to be more important than the load on the user's generation of the three-dimensional image.
[0018]
However, recently, along with the price reduction of CCD cameras and the high performance of CPU and image processing arithmetic processing, for example, a three-dimensional image of a sender is added to an e-mail and transmitted. For example, 3D images are also being used for entertainment purposes. For this reason, there is a tendency to increase the demand for reducing the user load (user interface) for generating a three-dimensional image and making it easy.
[0019]
Next, the problem of the designation guidance message for the paired parts (eyes and ears) will be explained. Generally, as shown in FIG. 9, the user interface for designating each feature point specifies “XXX”. It becomes an interactive user interface that displays the message "Please." Here, as shown in FIG. 9, when the message “Please specify the right eye” is displayed, the user (hereinafter also referred to as the user) has a problem as to which eye is pointing. That is, the problem is whether the user points to the original right eye of the person image and the right eye on the right side when viewed from the user.
[0020]
The original image of the person is the one facing the user (pointing person), the right eye is on the left side and the left eye is on the right side. Whether it is called the right eye and pointing to the left eye will probably depend on the user.
[0021]
However, on the processing logic, in the “Specify left eye” message that will continue next, you can even specify the reverse eye part that was specified in the “Specify right eye” message. For example, if there is no problem, but the user gets lost on the way, for example, in the message "Please specify your right eye." "Please specify" message, you may notice a mistake in the previous input, and the user may be worried about what to do or specify the right eye, that is, the correct left side again. It is a problem.
[0022]
In addition, at present, as a method for solving the above problem in a pseudo manner, as shown in FIG. 10, by designating each designated part with a number instead of a name, the user is not caused an error or hesitation. There is also a method of taking a user interface in consideration. However, for example, in the case of the mouth part (number 4 part), it is impossible to solve the problem that raises the question of whether the center of the mouth or anywhere in the mouth is acceptable. That is, the user does not know the internal logic. In this case, it is common to use good logic anywhere around the mouth (that is, extraction processing is performed based on the specified point), but the user does not know that itself, so the question of what to do However, having the question itself becomes a problem.
[0023]
  The present invention was devised to solve such a problem, and its purpose is to make it easy to specify the user input accompanying the facial feature point extraction process in 3D conversion or animation conversion of a 2D face image. Image processing apparatus and image processing method capable of realizing high-speed facial feature point extraction processingThe lawIt is to provide.
[0024]
[Means for Solving the Problems]
  In order to solve the above problems, an image processing apparatus according to the present invention provides:originalAn image input means for inputting an image;A color palette prepared in advance to assist in extracting facial parts;Input by image input meansoriginalImage data and thisoriginalOne face part of the face image existing in the image dataThe noseA message prompting the user to specify the display position ofOn the other hand, the original image data and the color palette are displayed side by side.Display means;In accordance with the message prompting specification of the display position of the nose, which is a facial part, the position coordinates of the nose, which is one point on the displayAn input means for inputtingParameter specifying means for specifying the color value of the nose using the color palette displayed on the display means;Entered by the input meansNose positionCoordinateAnd other than the nose using the color value of the nose specified by the parameter specifying meansFace partsBy extracting the skin color or non-skin colorBased on the facial part extracting means to be extracted, the facial part extracted by the facial part extracting means, and the input base point coordinate informationGenerate a texture and map it to a 3D human modelAnd an image generating means for generating a three-dimensional image.Further, the face part extracting means divides the original image into a plurality of search areas based on a relative positional relationship with the nose position coordinates, and gives the nose color value a predetermined allowable range for each search area. By extracting a closed space or a closed figure that is no longer in the range, facial parts in each search area are extracted.
[0025]
Here, the image input means includes not only a camera device such as a CCD camera but also a selection of a target file from a filed image data group. The input means includes a pointing device for inputting any one point on the display such as a mouse.
[0026]
According to the present invention having these features, compared with conventional method 1, "automatic extraction of all facial parts", the processing speed can be improved and the quality of the generated or shaped image data can be improved. Further, original image data containing several persons can be handled by specifying only one feature point (facial part). In addition, compared with conventional method 2, "user specification of two or more facial parts", the user's operational burden can be reduced and erroneous operations can be prevented without significantly degrading the quality of the generated or shaped image data.
[0027]
In the image processing apparatus of the present invention, the one point input by the input means is used as the position coordinates of the nose. That is, the "message prompting the user to specify the display position of one facial part" is a message prompting the user to input the nose. In facial feature point extraction, the input original image is unlikely to be a profile or a rear view; normally a photograph facing roughly forward is input, and in that case the nose lies approximately at the center of the face. Moreover, when the message "Please specify the nose" is displayed, few users are expected to point at the root of the nose, so the designated position is fairly consistent. Each person also has only one nose. Furthermore, because facial contour and facial part extraction logic is basically built on detecting skin-colored or non-skin-colored portions, the nose is harder to extract automatically than other parts such as the mouth and eyes. Finally, the layout of facial parts is common to all people. For these reasons, having the user specify the nose point in particular, rather than several points, reduces the user's operational burden and prevents erroneous operations without degrading the quality of the final output data.
[0028]
  The image processing apparatus of the present invention includes a color palette that assists in extracting facial parts, and the parameter specifying means specifies the color palette value at the drag position as the nose color value at the drop position when the user drags from the palette and drops onto the nose position in the original image. According to the present invention, the parameter that assists feature point extraction (the color palette value), which is displayed as an on-screen object, is determined by the coordinates at which the pressing operation begins (drag operation: moving the pointer while holding down the mouse button, or moving the pen while pressing it on a tablet), and the display position of the facial part is determined by the coordinates at which the pressing ends (drop operation: mouse button up or pen up).
[0029]
  The image processing apparatus of the present invention may also include a color palette that assists in extracting facial parts, with the parameter specifying means specifying the color palette value (auxiliary parameter) at the slider position as the nose color value when the user operates a slider provided on the color palette displayed on the display means. In addition, the display means may display a hair color palette and a plurality of face shape patterns; the parameter specifying means then specifies the color palette value at the slider position as the hair color when the user operates a slider provided on the hair color palette, and specifies one face shape from among the face shape patterns displayed on the display means.
[0030]
  This auxiliary parameter (color palette value) is an alternative means of obtaining high-quality output data even when fewer feature points are specified. Note that the number of auxiliary parameters that must be specified drops sharply between specifying no point and specifying one point; in other words, quality can be ensured without detailed specification. Depending on the usage scene, product concept, extraction logic, and so on, specifying a single parameter may be enough to achieve the purpose, and drag-and-drop is the optimal man-machine interface in that case.
[0031]
  The facial feature point extraction processing that makes input easy for the user in the three-dimensional conversion or animation of a two-dimensional face image is not limited to the form of an apparatus and can also be realized as a method. Accordingly, the image processing method of the present invention is an image processing method using an image processing apparatus that generates a three-dimensional image from an input image, in which a color palette for assisting facial part extraction is prepared in advance, the method comprising: storing the original image input from the image input means; displaying on the display means the original image data input from the image input means together with a message prompting the user to specify the display position of the nose, which is one facial part of the face image contained in the original image data; displaying the original image data input from the image input means and the color palette on the display means; storing the position coordinates of the nose, as one point on the display, input from the input means; storing the nose color value specified using the color palette displayed on the display means; extracting, by facial part extracting means, facial parts other than the nose by extracting skin-colored or non-skin-colored portions using the nose position coordinates input from the input means and the specified nose color value; and generating, by image generating means, a texture based on the facial parts extracted by the facial part extracting means and the coordinate information of the input base point, and mapping the texture onto a three-dimensional human model to generate a three-dimensional image.
[0032]
  Furthermore, the technical idea of the present invention can also be provided as a recording medium on which is recorded an image processing program for causing a computer to execute each step of the above image processing method.
[0033]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0034]
FIG. 1 is a block diagram of an image processing apparatus according to the present invention that converts two-dimensional face image data of a person into a three-dimensional face image.
[0035]
The image processing apparatus includes a two-dimensional image input control unit 10, an original image designation unit 11, a nose point designation unit 12, a feature point extraction unit 13, a 3D image encoding unit 14, a 3D image display unit 15, a CCD camera 20, an input device 30, an output device 40, an original image storage device 50, a variable/constant storage device 51, a standard person model (encoding constant) storage device 52, and a shaped image storage device 53.
[0036]
The two-dimensional image input control unit 10 controls the process from triggering the shutter of the connected CCD camera 20 so that it sends original image data, up to storing the original image data sent from the CCD camera 20 in the original image storage device 50 (file creation). In this embodiment, the CCD camera 20 is an image capture device capable of inputting color images.
[0037]
The original image designation unit 11 controls the process of reading the original image file designated via the input device 30 from among the original image files stored in the original image storage device 50 and delivering it to the nose point designation unit 12.
[0038]
The nose point designation unit 12 outputs the original image data delivered from the original image designation unit 11 to the output device 40, and controls the process up to delivering the nose point and the extraction auxiliary parameter (here, the face color value described later) input by the user via the input device 30 to the feature point extraction unit 13.
[0039]
The feature point extraction unit 13 reads various variables and constants for extraction from the variable/constant storage device 51, referring to the nose point coordinates and the face color value (extraction auxiliary parameter) delivered from the nose point designation unit 12, and extracts the facial feature points using these values. In this extraction method, the nose point coordinates serve as the base coordinates; the original image is divided into several regions using the variables and constants (extraction auxiliary parameters) read based on the face color value and the like, and each part to be extracted, such as the eyes and the mouth, is extracted only within its corresponding region. The extracted feature points are delivered to the 3D image encoding unit 14 as feature point information. The feature point extraction unit 13 performs this series of control.
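The division of the original image into search regions anchored at the nose point can be pictured with a minimal Python sketch. It assumes each search area is a rectangle given as pixel offsets from the nose point; the table SEARCH_AREA_OFFSETS and its numbers are hypothetical stand-ins for the constants the apparatus reads from the variable/constant storage device 51, and search_areas is an illustrative name, not part of the described apparatus.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive

# Hypothetical offsets in pixels relative to the nose point. In the apparatus,
# comparable constants would be read from the variable/constant storage device 51.
SEARCH_AREA_OFFSETS: Dict[str, Rect] = {
    "right_eye": (-60, -70, -10, -20),  # person's right eye: viewer's left, above nose
    "left_eye": (10, -70, 60, -20),     # person's left eye: viewer's right, above nose
    "mouth": (-40, 20, 40, 60),         # below the nose
}


def search_areas(nose_px: int, nose_py: int, width: int, height: int,
                 offsets: Dict[str, Rect] = SEARCH_AREA_OFFSETS) -> Dict[str, Rect]:
    """Anchor each search rectangle at the nose point and clip it to the image."""
    areas: Dict[str, Rect] = {}
    for name, (dx0, dy0, dx1, dy1) in offsets.items():
        x0 = max(0, nose_px + dx0)
        y0 = max(0, nose_py + dy0)
        x1 = min(width, nose_px + dx1)
        y1 = min(height, nose_py + dy1)
        if x0 < x1 and y0 < y1:  # drop areas that fall entirely outside the image
            areas[name] = (x0, y0, x1, y1)
    return areas
```

Fixed pixel offsets keep the sketch short; a practical version would presumably scale them with the size of the face, for example using the extracted contour.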
[0040]
The 3D image encoding unit 14 performs a series of controls: it generates a texture based on the delivered feature point information and original image data, reads an appropriate 3D human model from the standard person model storage device 52 based on the feature point information, performs texture mapping, and stores the result in the shaped image storage device 53.
[0041]
The 3D image display unit 15 reads the 3D image from the shaped image storage device 53 and displays it on the output device 40.
[0042]
This concludes the description of the control performed by each part of the image processing apparatus of the present invention. The control performed by the nose point designation unit 12, which is the characteristic part of the present invention, is described in more detail below.
[0043]
FIG. 2 shows an example of the screen layout of the nose point designation unit 12 when the user specifies an approximate value (color palette value) of the face color of the person in the original image as the extraction auxiliary parameter. Here, the face color palette designation area [(cXmin, cYmin) to (cXmax, cYmax)] shows a gradation of the colors (Cmin to Cmax) that a human face color can take, and the original image display area [(uXmin, uYmin) to (uXmax, uYmax)] shows the image data input from the original image designation unit 11. In addition, a message guiding the user to drag from the palette position closest to the face color of the person in the original image and drop it at the nose position in the original image (in this example, "Drag the palette color that matches the person's face color and drop it at the nose position") is displayed in the message display area at the bottom of the output screen.
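The face color palette designation area maps a screen position to a color in the gradation Cmin to Cmax. A minimal sketch, assuming a horizontal linear gradient between two RGB endpoints (the source states neither the gradient direction nor the color model); palette_color is a hypothetical helper.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]
Area = Tuple[int, int, int, int]  # (cXmin, cYmin, cXmax, cYmax), inclusive


def palette_color(x: int, y: int, area: Area, c_min: RGB, c_max: RGB) -> Optional[RGB]:
    """Color under (x, y) in the palette area, or None if the point is outside it.

    Assumes the gradation runs left to right: c_min at cXmin, c_max at cXmax.
    """
    cx_min, cy_min, cx_max, cy_max = area
    if not (cx_min <= x <= cx_max and cy_min <= y <= cy_max):
        return None
    t = (x - cx_min) / (cx_max - cx_min) if cx_max > cx_min else 0.0
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(c_min, c_max))
    return (r, g, b)
```

Under this assumption, the value returned for the button-down coordinates is what would be stored as the face color palette value.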
[0044]
This screen guides the user through a series of input operations: dragging the portion of the face color palette designation area closest to the person's face color and dropping it on the nose position in the original image. Based on the input information obtained from this operation, the nose point coordinates in the original image display area and the face color palette value are calculated.
[0045]
FIG. 3 is a flowchart showing the control performed by the nose point designation unit 12, that is, the logic for calculating the nose point coordinates and the face color palette value.
[0046]
First, the input screen shown in FIG. 2, which guides the user to drag the portion of the face color palette designation area closest to the person's face color and drop it on the nose position in the original image, is generated and displayed (step S1). Next, when a "left button down" message is input from the input device 30, for example a mouse (step S2), and the coordinates at which the button was pressed lie within the face color palette designation area (YES in step S3), the face color palette value at the button-down coordinates is assigned to the variable FaceC (step S4). Then, if a "left button up" message is input from the mouse while the button is held down (YES in step S5), and the coordinates at which the button was released lie within the original image display area (YES in step S6), the button-up coordinates are assigned to (NosePx, NosePy) (step S7). Finally, control is transferred to the feature point extraction unit 13 with the variables FaceC, NosePx, and NosePy as parameters (step S8).
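The FIG. 3 flow can be sketched as a small mouse-event handler in Python. This is an illustration under assumptions: the rectangles are inclusive pixel bounds, the palette lookup is passed in as a callable returning a scalar palette value, and a press outside the palette or a release outside the image simply cancels the drag so that the user starts again from step S2. NosePointDesignator and its method names are hypothetical.

```python
from typing import Callable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def inside(rect: Rect, x: int, y: int) -> bool:
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1


class NosePointDesignator:
    """Mouse-event sketch of steps S2 to S8 in FIG. 3 (S1, drawing the screen, is omitted)."""

    def __init__(self, palette_area: Rect, image_area: Rect,
                 palette_value: Callable[[int, int], int]) -> None:
        self.palette_area = palette_area
        self.image_area = image_area
        self.palette_value = palette_value
        self.face_c: Optional[int] = None  # FaceC; None while no drag is in progress

    def on_left_button_down(self, x: int, y: int) -> None:
        # S2/S3: a press starts a drag only inside the face color palette area.
        if inside(self.palette_area, x, y):
            self.face_c = self.palette_value(x, y)  # S4
        else:
            self.face_c = None

    def on_left_button_up(self, x: int, y: int) -> Optional[Tuple[int, int, int]]:
        # S5: a release matters only while a drag started in the palette is active.
        face_c, self.face_c = self.face_c, None
        if face_c is None or not inside(self.image_area, x, y):  # S6
            return None  # cancelled: wait for the next button-down (back to S2)
        nose_px, nose_py = x, y  # S7
        return face_c, nose_px, nose_py  # S8: parameters for unit 13
```

Returning the triple (FaceC, NosePx, NosePy) from the button-up handler mirrors step S8, where exactly these three variables are handed to the feature point extraction unit 13.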
[0047]
The feature point extraction unit 13 uses the variables FaceC, NosePx, and NosePy input from the nose point designation unit 12 to extract facial feature points other than the nose; whether these are coordinates, regions, or objects depends on the specifications of the feature point extraction unit 13. Since the basic logic of this extraction is the same as conventional facial feature point extraction logic, a detailed description is omitted here; its basic idea is to extract skin-colored or non-skin-colored portions.
[0048]
FIG. 4 summarizes in tabular form how the feature point extraction unit 13 uses the nose point coordinates and the face color palette value (FaceC, NosePx, NosePy), that is, an outline of the extraction logic using these three variables.
[0049]
Referring to FIG. 4, contour extraction can first be realized by extracting the closed region around the nose point (NosePx, NosePy) that is bounded by pixels whose color no longer falls within the face color palette value FaceC ± d (d is the allowable range). Because processing can be centered on the nose point specified in advance (that is, the nose point can serve as the base point for extracting the region that is not face-colored), the extraction logic is much simpler and the extraction far more accurate than with method 1, described in the section on the problem to be solved, which automatically detects all feature points to be extracted.
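Contour extraction around the nose point amounts to a region-growing (flood-fill) step. The sketch below assumes an RGB image stored as nested lists, a per-channel tolerance d, and 4-connectivity; these choices and the function names are illustrative, not taken from the source, which leaves the extraction details to conventional logic.

```python
from collections import deque
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x]


def within(color: RGB, face_c: RGB, d: int) -> bool:
    """True if every channel of `color` lies in FaceC ± d."""
    return all(abs(a - b) <= d for a, b in zip(color, face_c))


def face_region(image: Image, nose_px: int, nose_py: int,
                face_c: RGB, d: int) -> Set[Tuple[int, int]]:
    """4-connected flood fill from the nose point over face-colored pixels.

    Assumes the nose point itself lies inside the face-colored area.
    """
    h, w = len(image), len(image[0])
    seen = {(nose_px, nose_py)}
    queue = deque(seen)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen
                    and within(image[ny][nx], face_c, d)):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


def contour(region: Set[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Region pixels with at least one 4-neighbor outside the region."""
    return {
        (x, y) for x, y in region
        if any(n not in region for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
    }
```

Starting the fill at the user-supplied nose point is what removes the need to search the whole image for a face, which is the advantage over method 1 that the paragraph above describes.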
[0050]
Next, for the right eye, the search area u1 is set based on its position relative to the nose point, and the basic extraction logic is realized by extracting, within that area, a closed figure whose color falls outside the face color palette value FaceC ± d (d is the allowable range), as indicated by the presence of the dark iris. Compared with method 1, in addition to simplifying the logic and improving accuracy, narrowing the search area shortens the search time. The left eye and the mouth are extracted in the same way, differing only in the search area.
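One plausible reading of "a closed figure that falls outside FaceC ± d" is a blob of non-face-colored pixels that is completely surrounded by face-colored pixels inside the search area. The sketch below implements that reading under stated assumptions: 4-connectivity, rectangular search areas with exclusive upper bounds, and the largest enclosed blob taken as the part, reported by its centroid. find_part is a hypothetical name.

```python
from collections import deque
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]           # image[y][x]
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def find_part(image: Image, area: Rect, face_c: RGB, d: int) -> Optional[Tuple[float, float]]:
    """Centroid of the largest non-face-colored blob enclosed within `area`, or None."""
    x0, y0, x1, y1 = area

    def is_face(x: int, y: int) -> bool:
        return all(abs(p - c) <= d for p, c in zip(image[y][x], face_c))

    seen = set()
    best: List[Tuple[int, int]] = []
    for sy in range(y0, y1):
        for sx in range(x0, x1):
            if (sx, sy) in seen or is_face(sx, sy):
                continue
            # Grow one 4-connected blob of non-face pixels, staying inside the area.
            blob: List[Tuple[int, int]] = []
            closed = True
            seen.add((sx, sy))
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                blob.append((x, y))
                if x in (x0, x1 - 1) or y in (y0, y1 - 1):
                    closed = False  # blob reaches the area border, so it is not enclosed
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (x0 <= nx < x1 and y0 <= ny < y1 and (nx, ny) not in seen
                            and not is_face(nx, ny)):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            if closed and len(blob) > len(best):
                best = blob
    if not best:
        return None
    n = len(best)
    return sum(x for x, _ in best) / n, sum(y for _, y in best) / n
```

Only pixels inside the search area are visited, which reflects the speed-up from narrowing the search area mentioned above.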
[0051]
Next, ears and jaw are generally not extracted in method 1. In method 2, described in the section on the problem to be solved by the invention, in which the user specifies all the feature points to be extracted, the user specifies the ears and jaw; here they can instead be derived from the extracted contour and the face orientation. Since specifying them serves mainly to improve accuracy and simplify the logic, this substitution is desirable for application software that does not require high accuracy. That is, the face orientation can be calculated from the nose point and the left and right eye points, and the orientation of the ears and jaw is almost the same as that of the face. Furthermore, face contour extraction and the like pose no serious problems of speed or accuracy as long as the extraction base point is clear, so instead of having the user specify facial feature points, a method of having the user specify several extraction auxiliary parameters can be substituted.
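The statement that the face orientation can be calculated from the nose point and the two eye points can be illustrated with a simple geometric heuristic. This is an assumption, not the method of the source: roll is taken as the angle of the line joining the eyes, and a normalized yaw indicator as the nose's offset from the eye midpoint along that line. In a frontal image the person's right eye appears on the viewer's left, so right_eye normally has the smaller x coordinate; face_orientation is a hypothetical name.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def face_orientation(nose: Point, right_eye: Point, left_eye: Point) -> Tuple[float, float]:
    """Heuristic (roll_degrees, yaw_ratio) from three feature points.

    roll_degrees: angle of the eye line; 0 when the eyes are level.
    yaw_ratio: nose offset from the eye midpoint along the eye line, divided by
    half the eye distance; 0 for a frontal face, its sign shows the turn direction.
    """
    (rx, ry), (lx, ly) = right_eye, left_eye
    eye_dist = math.hypot(lx - rx, ly - ry)
    if eye_dist == 0:
        raise ValueError("eye points must be distinct")
    roll = math.degrees(math.atan2(ly - ry, lx - rx))
    ux, uy = (lx - rx) / eye_dist, (ly - ry) / eye_dist  # unit vector along the eye line
    mx, my = (rx + lx) / 2, (ry + ly) / 2                # eye midpoint
    offset = (nose[0] - mx) * ux + (nose[1] - my) * uy
    return roll, offset / (eye_dist / 2)
```

Projecting the nose offset onto the eye line keeps the yaw indicator from being distorted by head tilt (roll).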
[0052]
In the above embodiment, the extraction auxiliary parameter specified by the user in the control of the nose point designation unit 12 is a single parameter, the face color; however, as described above, having the user specify several extraction auxiliary parameters may improve the accuracy of facial feature point extraction and shorten the processing time.
[0053]
FIG. 5 shows an example of such an input screen. In this example, the face color and hair color are specified with sliders, and the face shape is specified with radio buttons from several patterns prepared in advance. The hair color and face shape parameters that can be newly specified here (in addition to the face color palette of FIG. 2) are delivered directly to the feature point extraction unit 13 in the same way as the face color parameter. In the feature point extraction unit 13, these parameters are mainly used for face contour extraction.
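The FIG. 5 inputs can be bundled into one parameter object before they are delivered to the feature point extraction unit 13. In the sketch below the gradient endpoints, the face-shape names, and all identifiers are hypothetical; the source states only that face and hair color are set with sliders and the face shape with radio buttons.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette endpoints and face-shape patterns (not given in the source).
FACE_GRADIENT: Tuple[RGB, RGB] = ((255, 224, 196), (120, 80, 50))
HAIR_GRADIENT: Tuple[RGB, RGB] = ((30, 20, 10), (230, 200, 150))
FACE_SHAPES: Tuple[str, ...] = ("oval", "round", "square")


def slider_to_color(pos: float, gradient: Tuple[RGB, RGB]) -> RGB:
    """Linearly interpolate along a palette; pos in [0, 1] is the slider position."""
    if not 0.0 <= pos <= 1.0:
        raise ValueError("slider position must be in [0, 1]")
    lo, hi = gradient
    r, g, b = (round(a + pos * (c - a)) for a, c in zip(lo, hi))
    return (r, g, b)


@dataclass(frozen=True)
class ExtractionParams:
    """Extraction auxiliary parameters delivered to the feature point extraction unit 13."""
    face_color: RGB
    hair_color: RGB
    face_shape: str


def read_input_screen(face_slider: float, hair_slider: float, shape_index: int) -> ExtractionParams:
    """Collect two slider positions and one radio-button choice into one object."""
    if not 0 <= shape_index < len(FACE_SHAPES):
        raise ValueError("no such face shape pattern")
    return ExtractionParams(
        face_color=slider_to_color(face_slider, FACE_GRADIENT),
        hair_color=slider_to_color(hair_slider, HAIR_GRADIENT),
        face_shape=FACE_SHAPES[shape_index],
    )
```

Making the parameter object immutable is a design choice for the sketch: once the user confirms the screen, the extraction step receives a fixed set of values.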
[0054]
Increasing the number of extraction auxiliary parameters not only simplifies the extraction logic, speeds up extraction, and improves accuracy; even compared with method 2, in which all feature points to be extracted are specified through many user operations, it removes doubts such as which eye to click for "Click on the right eye" or which part of the mouth to click for "Click on the mouth". That is, the user can intuitively and easily adjust the face color and hair color with slider bars and select the contour.
[0055]
As described above, the image processing apparatus of the embodiment is realized by a three-dimensional face image processing program stored in a storage unit (not shown). This program is stored on a computer-readable recording medium. In the present invention, this recording medium (not shown) may be a program medium that is read by inserting it into a program reading device provided in the image processing apparatus, or the program may be stored in storage means such as a program memory inside the apparatus. In either case, the stored program may be accessed and executed directly, or it may be read out, downloaded into a main memory (not shown), and then executed. The program used for this download is assumed to be stored in the apparatus main body in advance.
[0056]
Here, the program medium may be a recording medium configured to be separable from the main body and may carry the program in fixed form, including: a tape system such as magnetic tape or cassette tape; a disk system including magnetic disks such as floppy disks and hard disks and optical disks such as CD-ROM, MO, MD, and DVD; a card system such as an IC card or an optical card; or semiconductor memory such as mask ROM, EPROM, EEPROM, or flash ROM.
[0057]
Further, in the present invention, when means capable of communicating with the outside (a wireless or wired communication function via a public line such as the Internet) is provided, the medium may carry the program in a fluid manner, so that the program is downloaded from outside over that connection. When the program is downloaded from a communication network in this way, the download program may be stored in the apparatus main body in advance or may be installed from another recording medium.
[0058]
[Effects of the Invention]
According to the image processing apparatus of the present invention, compared with the conventional approach of automatically extracting all facial parts, the processing speed can be improved and the quality of the generated or shaped image data can be improved. Further, original image data containing several persons can be handled by specifying only one feature point. In addition, compared with the conventional approach of specifying two or more facial parts, the user's operational burden can be reduced and erroneous operations can be prevented without significantly degrading the quality of the generated or shaped image data.
[0059]
Further, according to the image processing apparatus of the present invention, since the one point input by the input means is used as the position coordinates of the nose, having the user specify the nose point reduces the user's operational burden and prevents erroneous operations without degrading the quality of the final output data, compared with specifying several points.
[0060]
Further, according to the image processing apparatus of the present invention, the apparatus further includes parameter specifying means for assisting facial part extraction, and the parameter specifying means specifies a displayed auxiliary parameter when the user drags it to the nose position and drops it there; therefore, high-quality output data can be obtained even when fewer feature points are specified.
[0061]
Furthermore, according to the present invention, an image processing method can be provided that improves the processing speed and the quality of the generated or shaped image data compared with the conventional method of automatically extracting all facial parts.
[Brief description of the drawings]
FIG. 1 is a block diagram of an image processing apparatus according to the present invention.
FIG. 2 is an explanatory diagram showing an example of a screen layout when a user specifies an approximate value of a human face color of an original image as an extraction auxiliary parameter.
FIG. 3 is a flowchart showing arithmetic logic of nose point coordinates and facial color palette values.
FIG. 4 is an explanatory diagram showing an outline of how the feature point extraction unit uses the nose point coordinate values.
FIG. 5 is an explanatory diagram showing an example of an input screen for specifying extraction auxiliary parameters.
FIG. 6 is a schematic block diagram for extracting face feature points.
FIG. 7 is an explanatory diagram illustrating an example of an original image in which the size and position of a face outline region are extremely different.
FIG. 8 is a diagram for explaining a problem that occurs when an image in which a plurality of persons are photographed is input as an original image.
FIG. 9 is an explanatory diagram showing an example of a user interface for designating feature points.
FIG. 10 is an explanatory diagram illustrating an example of a user interface for designating each designated portion by a number instead of a name.
[Explanation of symbols]
10 Two-dimensional image input control unit
11 Original image designation unit
12 Nose point designation unit
13 Feature point extraction unit
14 3D image encoding unit
15 3D image display unit
20 CCD camera
30 Input device
40 Output device
50 Original image storage device
51 Variable/constant storage device
52 Standard person model (encoding constant) storage device
53 Shaped image storage device

Claims

Image input means for inputting an original image;
A color palette prepared in advance to assist in the extraction of facial parts;
Display means for displaying the original image data and the color palette side by side, while displaying to the user the original image data input by the image input means and a message prompting designation of the display position of the nose, which is one facial part of the face image contained in the original image data;
Input means for inputting, in accordance with the message prompting designation of the display position of the nose as a facial part, the position coordinates of the nose as one point on the display;
Parameter specifying means for specifying the color value of the nose using the color palette displayed on the display means;
Facial part extracting means for extracting facial parts other than the nose by extracting skin-colored or non-skin-colored portions, using the nose position coordinates input by the input means and the nose color value specified by the parameter specifying means; and
Image generating means for generating a texture based on the facial parts extracted by the facial part extracting means and the coordinate information of the input base point, mapping the texture onto a three-dimensional human model, and generating a three-dimensional image; an image processing apparatus characterized by comprising the above means.

The image processing apparatus according to claim 1.
wherein the parameter specifying means, when a desired location in the color palette designation area displayed on the display means is dragged and dropped onto the nose position in the original image, specifies the color palette value at the drag position as the nose color value at the drop position.

The image processing apparatus according to claim 1.
wherein the parameter specifying means, when a slider provided on the color palette displayed on the display means is operated, specifies the color palette value at the slider position as the nose color value.

The image processing apparatus according to claim 3.
The display means displays a hair color palette and a plurality of patterns of face shapes,
wherein the parameter specifying means, when a slider provided on the hair color palette displayed on the display means is operated, specifies the color palette value at the slider position as the hair color, and specifies one face shape from among the face shape patterns displayed on the display means.

The image processing apparatus according to any one of claims 1 to 4, wherein:
wherein the facial part extracting means divides the original image into a plurality of search areas based on their positions relative to the nose position coordinates, and extracts the facial parts in each search area by extracting, within each search area, a closed space or closed figure that falls outside a predetermined allowable range around the nose color value.

An image processing method using an image processing apparatus that generates a three-dimensional image from an input image,
wherein a color palette for assisting facial part extraction is prepared in advance, the method comprising:
Storing the original image input from the image input means;
Displaying on the display means the original image data input from the image input means together with a message prompting the user to specify the display position of the nose, which is one facial part of the face image contained in the original image data;
Displaying the original image data input from the image input means and the color palette on the display means;
Storing the position coordinates of the nose, which is one point on the display, input from the input means;
Storing the color value of the nose designated using the color palette displayed on the display means;
A step in which facial part extracting means extracts facial parts other than the nose by extracting skin-colored or non-skin-colored portions, using the nose position coordinates input from the input means and the specified nose color value; and
A step in which image generating means generates a texture based on the facial parts extracted by the facial part extracting means and the coordinate information of the input base point, and generates a three-dimensional image by mapping the texture onto a three-dimensional human model; an image processing method characterized by comprising the above steps.