JP2004505353A

JP2004505353A - Method and apparatus for customizing facial feature tracking by accurate landmark detection on an expressionless facial image

Info

Publication number: JP2004505353A
Application number: JP2002514665A
Authority: JP
Inventors: ブッデンメイヤー，ウルリッヒ，エフ; ネベン，ハートマット
Original assignee: アイマティック・インターフェイシズ・インコーポレイテッド
Priority date: 2000-07-24
Filing date: 2001-07-24
Publication date: 2004-02-19
Also published as: WO2002009038A2; AU2001277148B2; WO2002009038A3; KR100827939B1; KR20030041131A; AU7714801A; EP1303842A2

Abstract

The present invention provides a method and apparatus for customizing a visual sensor for facial feature tracking using an expressionless facial image of an actor. The method creates a correction graph that improves the sensor's performance in tracking the actor's facial features.

Description

[0001]
[Related application]
This application claims priority to U.S. Provisional Patent Application No. 60/220,288, filed July 24, 2000, entitled "Method and Apparatus for Customizing Facial Features by Accurate Landmark Detection on an Expressionless Face Image," and to U.S. continuation-in-part Patent Application No. 09/188,079, filed November 6, 1998, entitled "Wavelet-Based Facial Motion Capture for Avatar Animation."
[0002]
BACKGROUND OF THE INVENTION
The present invention relates to avatar animation, and more particularly to facial feature tracking technology.
[0003]
Virtual spaces filled with avatars are an attractive way to share the experience of a common environment. However, photo-realistic avatar animation generally requires tedious tracking of the actor's movements, particularly of the facial features.
[0004]
Therefore, there is a need for an improved facial feature tracking technology. The present invention fulfills this need.
[0005]
Summary of the Invention
The present invention provides a method and associated apparatus for customizing a visual sensor using an expressionless face of an actor. The method includes capturing an expressionless facial image of the actor and automatically detecting facial feature positions on the expressionless facial image using elastic bunch graph matching. Nodes (reference points) are automatically placed at the facial feature positions on the actor's expressionless facial image. The node positions are then manually corrected on the actor's expressionless facial image.
[0006]
Further, the method can create a correction graph based on the corrected node positions.
[0007]
Other features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings.
[0008]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a method and apparatus for customizing a visual sensor for tracking a facial feature using an expressionless facial image of an actor. This method creates a correction graph that is used to improve the performance of the sensor used for the actor's facial feature tracking.
[0009]
As shown in FIG. 1, the method first captures a facial image of the actor (block 12). The expressionless facial image is captured using the visual sensor customization wizard 22, as shown in FIG. 2. An example image 24 is shown to the actor to indicate the desired alignment of the captured image 26.
[0010]
Next, the facial feature positions are automatically detected using elastic bunch graph matching (block 14). A facial feature detection method using elastic bunch graph matching is disclosed in U.S. Patent Application No. 09/188,079. In the elastic graph matching technique, an image is transformed into Gabor space using a wavelet transform based on Gabor wavelets. The transformed image is represented by complex wavelet component values associated with each pixel of the original image.
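The Gabor transform described above can be illustrated with a minimal sketch that computes the complex wavelet component values (a "jet") at a single pixel. This is not the implementation of the referenced application: the function names `gabor_kernel` and `jet_at`, the kernel size, the two wavelengths, the four orientations, and the choice of sigma as half the wavelength are all assumptions made for illustration, and the kernel is not corrected for its DC response.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2-D Gabor kernel: a plane wave under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate measured along the wave direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def jet_at(image, row, col, wavelengths=(4.0, 8.0), n_orient=4, size=15):
    """Complex Gabor responses (a jet) at one pixel of a grayscale image.

    `size` is assumed odd; all parameter defaults are illustrative only.
    """
    half = size // 2
    # Reflect-pad so that pixels near the border still get a full patch.
    padded = np.pad(image.astype(float), half, mode="reflect")
    patch = padded[row:row + size, col:col + size]
    coeffs = []
    for wl in wavelengths:
        for k in range(n_orient):
            kern = gabor_kernel(size, wl, np.pi * k / n_orient, sigma=0.5 * wl)
            # Correlate the patch with the kernel: one complex value per wavelet.
            coeffs.append(np.sum(patch * np.conj(kern)))
    return np.array(coeffs)
```

Calling `jet_at(image, row, col)` on a 2-D grayscale array returns one complex coefficient per (wavelength, orientation) pair, ordered with orientation varying fastest. A full transform would evaluate this at every pixel, typically by convolution rather than patch by patch.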
[0011]
As shown in FIG. 3, nodes 28 are automatically placed on the facial image at particular facial feature positions (block 16). Because of image features specific to the actor, the facial feature graph placed on the actor's facial image may include nodes that are not properly positioned on the facial image. For example, the four nodes for the actor's eyebrows are placed slightly above the eyebrows in the facial image.
[0012]
The apparatus of the present invention uses the visual sensor customization wizard 22 to pick up and move the nodes 28. A node is moved manually on the expressionless facial image using a pointing device such as a mouse, and is dragged to the desired position (block 18).
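The manual correction step can be pictured as a small data structure that stores node positions and records which nodes the operator has moved. The sketch below is hypothetical: the class `FaceGraph`, the method `move_node`, the node name `left_brow_1`, and the clamping to the image bounds are illustrative assumptions and do not describe the wizard's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class FaceGraph:
    """Named landmark nodes with (x, y) pixel positions on the facial image."""
    nodes: dict = field(default_factory=dict)
    corrected: set = field(default_factory=set)

    def move_node(self, name, x, y, width, height):
        """Drag a node to (x, y), clamped to the image bounds."""
        if name not in self.nodes:
            raise KeyError(f"unknown node: {name}")
        cx = min(max(x, 0), width - 1)
        cy = min(max(y, 0), height - 1)
        self.nodes[name] = (cx, cy)
        # Remember that this node was corrected by hand.
        self.corrected.add(name)
        return cx, cy

# Example: an eyebrow node detected too high is dragged down onto the eyebrow.
graph = FaceGraph(nodes={"left_brow_1": (120, 80)})
graph.move_node("left_brow_1", 120, 92, width=320, height=240)
```

Keeping a record of the corrected nodes is a design choice of this sketch; it would let later steps recompute jets only where positions changed.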
[0013]
For example, as shown in FIG. 4, the nodes placed on the eyebrows in the actor's image have been adjusted to align accurately with the actor's eyebrows, in accordance with the example image 24.
[0014]
As shown in FIG. 5, after the nodes 28 for facial features A through E have been accurately placed on the expressionless facial image 24, the image jets are recalculated for each facial feature and compared with the corresponding jets in the bunch graph gallery 32. The bunch graph gallery includes sub-galleries for a number (N) of persons. Each person's sub-gallery includes jets from an expressionless facial image 34 and from expressive facial images 36 to 38 showing, for example, a smile or surprise.
[0015]
Each facial feature jet from the corrected actor image 24 is compared with the corresponding facial feature jet among the expressionless jets of the sub-galleries. For a given feature (for example, feature A), the sub-gallery whose expressionless jet is closest to the actor's jet for feature A is selected to create the jet gallery for feature A in the correction graph 40.
[0016]
In another example, for feature E, the sub-gallery of person N has the expressionless jet for feature E that most closely resembles the jet for feature E from the expressionless image 24. The correction graph jets for feature E are created using the feature E jet from the expressionless jets of sub-gallery N together with the feature E jets from each of the expressive jets 36 to 38 of sub-gallery N.
[0017]
Thus, for the expressionless facial image 24, the correction graph 40 is formed using the best-matching jets from the gallery 32 that forms the bunch graph.
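The selection procedure described above can be sketched as follows. The text does not state which similarity measure is used, so the normalized dot product of jet magnitudes in `jet_similarity` is an assumption, as are the function name `build_correction_graph` and the dictionary layout of the gallery.

```python
import numpy as np

def jet_similarity(a, b):
    """Normalized dot product of jet magnitudes, in [0, 1] (assumed measure)."""
    ma, mb = np.abs(a), np.abs(b)
    denom = np.linalg.norm(ma) * np.linalg.norm(mb)
    return float(ma @ mb / denom) if denom > 0 else 0.0

def build_correction_graph(actor_jets, gallery):
    """Pick, per feature, the person whose expressionless jet best matches.

    actor_jets: {feature: jet} recomputed at the corrected node positions.
    gallery: list of per-person sub-galleries, each of the form
             {"neutral": {feature: jet}, "expressive": [{feature: jet}, ...]}.
    Returns {feature: [jets]} holding the chosen person's expressionless jet
    followed by that person's expressive jets for the same feature.
    """
    correction = {}
    for feature, jet in actor_jets.items():
        best = max(gallery,
                   key=lambda p: jet_similarity(jet, p["neutral"][feature]))
        jets = [best["neutral"][feature]]
        jets += [expr[feature] for expr in best["expressive"]]
        correction[feature] = jets
    return correction
```

Because the choice is made independently for each feature, different features in the resulting graph can come from different persons in the gallery, which matches the example in which feature E is taken from sub-gallery N.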
[0018]
The resulting correction graph 40 provides a more powerful sensor for tracking node positions. A customized facial feature tracking sensor that incorporates the correction graph yields more photo-realistic avatars and an enhanced virtual space experience.
[0019]
Although the foregoing description discloses embodiments of the present invention, those skilled in the art will be able to modify those embodiments as appropriate within the scope of the present invention.
[Brief description of the drawings]
FIG. 1 is a flowchart illustrating a method of customizing facial feature tracking by accurate landmark detection on an expressionless facial image according to the present invention.
FIG. 2 shows the visual sensor customization wizard with an actor's camera image and a generic model image.
FIG. 3 shows the visual sensor customization wizard after automatic detection and placement of node positions on the camera image of the actor's face.
FIG. 4 is a visual sensor customization wizard showing correction node positions for creating a correction graph according to the present invention.
FIG. 5 is a block diagram showing a correction graph creation technique using an expressionless facial image according to the present invention.

Claims

A method of customizing facial feature tracking, comprising:
capturing an expressionless facial image of an actor;
automatically detecting facial feature positions on the expressionless facial image using elastic bunch graph matching;
automatically placing nodes at the facial feature positions on the expressionless facial image; and
manually correcting the node positions on the expressionless facial image.

The method of claim 1, further comprising creating a correction graph based on the corrected node positions.

An apparatus for customizing facial feature tracking, comprising:
means for capturing an expressionless facial image of an actor;
means for automatically detecting facial feature positions on the expressionless facial image using elastic bunch graph matching;
means for automatically placing nodes at the facial feature positions on the expressionless facial image; and
means for manually correcting the node positions on the expressionless facial image.

The apparatus of claim 3, further comprising means for creating a correction graph based on the corrected node positions.

A method of customizing facial feature tracking, comprising:
capturing an expressionless facial image of an actor;
automatically detecting facial feature positions on the expressionless facial image using image analysis based on wavelet component values generated by wavelet transforms of the expressionless facial image;
automatically placing nodes at the facial feature positions on the expressionless facial image; and
manually correcting the node positions on the expressionless facial image.

The method of claim 5, wherein the wavelet transform utilizes Gabor wavelets.

A method of customizing facial feature tracking, comprising:
capturing an expressionless facial image of an actor;
detecting facial feature positions on the facial image using image analysis based on wavelet component values generated by wavelet transforms of the expressionless facial image; and
creating a correction graph for providing an expressive feature based on the wavelet component values at the facial feature positions on the facial image.

The method of claim 7, wherein the wavelet transform utilizes Gabor wavelets.