JP2002008057A

JP2002008057A - Device and method for compositing animation image

Info

Publication number: JP2002008057A
Application number: JP2001109346A
Authority: JP
Inventors: Kigyo Boku; ▲き▼ 業朴; Seikyu Boku; 成九朴; Heikan Zen; 炳煥全; Junshin Sai; 淳眞崔; Jongen Jo; ▲ジョン▼ 源徐
Original assignee: MORIA TECHNOLOGY KK
Current assignee: MORIA TECHNOLOGY KK
Priority date: 2000-05-08
Filing date: 2001-04-06
Publication date: 2002-01-11
Also published as: KR20010102718A; KR100411760B1

Abstract

PROBLEM TO BE SOLVED: To provide a device and a method that efficiently composite the face of a substitute person to match the face pose and expression of a specified character rendered in an animation, thereby shortening the processing time for image compositing and improving the quality of the composited image. SOLUTION: A target image is extracted from an animation image frame 10, and a first set and a second set of image information relating to the extracted target image are determined. A digital reference image is extracted from a predetermined still image (namely, a substitute person image frame 20) and is composited on the basis of the second set of image information to generate a composited image, which is then placed at the target image in the animation image frame 10. The first set of image information is the area data surrounding the target image and its center coordinate; the second set of image information is the element area data surrounding each main structural element in the target image and the feature data of each structural element.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to animation image synthesis and, more particularly, to an improved animation image synthesizing apparatus and method capable of efficiently synthesizing the face of a desired person to match the facial expression and pose of a particular character in a moving image frame such as an animation.

[0002]

2. Description of the Related Art

Conventional techniques for extracting a face region from an ID photograph include, for example, a technique using color distribution information and a technique using the face outline. In the conventional face region extraction method using color distribution information, a face color histogram is built from a large number of face samples, and the pixel values of the photograph are evaluated against that histogram so that the face region is extracted as an ellipse. In addition, the positional relationship between the eyes and the mouth is expressed as formulas based on information about a wide variety of faces, and these formulas are used to locate the regions of both eyes and the mouth. However, because this conventional method uses a fixed elliptical mask when extracting the face region, it is difficult to extract the face region accurately when face sizes vary, and it is likewise difficult when the face is tilted to the left or right or when the background is complicated.

[0003] In the technique using the outline, on the other hand, each region of the initial image is tested as to whether it has an inverted-U shape, whether it contains a small region inside it, and whether it is too small; the most suitable region is selected as the head, and the face region is extracted by approximating the face outline with an ellipse based on the size of that region. However, because this outline-based technique simply extracts the face region as the head region, it has the drawback that the positions of the eyes and mouth are difficult to extract accurately. Consequently, conventional face region extraction techniques based on color distribution information or outline information can find neither an accurate face region nor the varied expressions produced by subtle changes in the eyes and mouth. When they are applied to moving images such as animations, in which facial expressions and poses are relatively exaggerated and simple colors are used, natural image synthesis is difficult to achieve.

[0004] Furthermore, even if components such as the face outline, eyes and mouth can be detected with the conventional techniques, extracting a specific face from an animation moving image frame by frame and replacing it with a desired face takes a long processing time, and there is a limit to how well the feature portions of the desired face can be reconstructed appropriately to replace the specific face in the animation image frame.

[0005]

SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide an improved animation image synthesizing apparatus and method that efficiently synthesize the face of a substitute person to match the face pose and expression of a specific character in an animation moving image, thereby shortening the image synthesis processing time and improving the image quality of the synthesized animation image.

[0006] Another object of the present invention is to provide a storage medium containing a program for use in the animation image synthesizing apparatus.

[0007]

To achieve the above objects, according to a first preferred embodiment of the present invention, there is provided an image synthesizing method for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each consisting of a plurality of pixels, the method comprising: (a) extracting the target image from an image medium having the image frame, and determining a first set of image information and a second set of image information for the extracted target image; (b) extracting a digital reference image from the predetermined still image, and synthesizing the digital reference image on the basis of the second set of image information for the target image to provide a synthesized image; and (c) replacing the target image in the image frame with the synthesized image.

[0008] According to a second preferred embodiment of the present invention, there is provided an image synthesizing apparatus for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each consisting of a plurality of pixels, the apparatus comprising: image extracting means for extracting the target image from an image medium having the image frame and determining a first set of image information and a second set of image information for the extracted target image; first synthesizing means for extracting a digital reference image from the predetermined still image and generating a synthesized image by synthesizing the digital reference image, through coordinate movement and pixel interpolation, on the basis of the second set of image information; and second synthesizing means for replacing the target image in the image frame with the synthesized image, wherein the first set of image information comprises region data surrounding the target image and its center coordinates, and the second set of image information comprises element region data surrounding the main components in the target image and feature data of each component.

[0009]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention are described below in more detail with reference to the drawings.

[0010] FIG. 1 is a schematic block diagram of an animation image synthesizing apparatus according to the first preferred embodiment of the present invention. The animation image synthesizing apparatus 100 comprises an animation face extracting unit 30, a substitute face synthesizing unit 40, an image synthesizing unit 50 and a post-processing unit 60.

[0011] The animation face extracting unit 30 receives a plurality of animation image frames 10 written on a storage medium such as a hard disk. In each image frame it extracts and masks, for example, the main character's face, stores the animation image with the masked face in an internal memory (not shown), and outputs the information of the animation image frame 10 together with the face region information and face change information relating to the main character's face. Here, the face region information indicates the whole area of the face, including main components such as the eyes and mouth, and the face change information represents the degree of rotation or the expression of the main character's face. The face region information and the face change information are obtained frame by frame.

[0012] The substitute face synthesizing unit 40 according to the first preferred embodiment receives a substitute person image frame 20, which is a still image such as a photograph taken with a camera, synthesizes the substitute face on the basis of the face change information from the animation face extracting unit 30, and outputs the synthesized reference image.

[0013] The image synthesizing unit 50 combines the synthesized reference image input from the substitute face synthesizing unit 40 into the masked face region of the animation image, on the basis of the information of the animation image frame 10 and the main character's face region information from the animation face extracting unit 30, and outputs the result.

[0014] Next, the post-processing unit 60 processes error portions in the animation image frame synthesized by the image synthesizing unit 50 and outputs the finally synthesized animation image frame to the display 70.

[0015] The display 70 displays the input animation image frames, that is, the synthesized animation image frames, in real time. Optionally, the animation image frames output from the post-processing unit 60 may be stored on a storage medium (not shown) and played back on an image playback device as needed.

[0016] The animation face extracting unit 30 according to the preferred embodiment of the present invention is described in detail below with reference to FIGS. 2 and 3.

[0017] Extraction of the face adjacent box

As shown in FIG. 3(a), the user uses a pointing device such as a mouse to specify an approximate reference position Pref of the face in the first animation image frame 10 loaded on a display device (not shown) (step S31). Once the reference position Pref of the face has been specified, an adjacent box (bounding box) surrounding the face is obtained by a first region extraction algorithm 31 (step S32). More specifically, in step S32, starting from the pixel at the specified reference position, neighboring pixels whose values have a predetermined similarity to the reference pixel value are searched for in the radial direction and the face region (solid line in FIG. 3) is expanded accordingly. The pixel coordinates A, B, C and D lying at the maximum and minimum extents of the expanded face region are then determined, and the adjacent box of the face (dotted line in FIG. 3) is obtained by connecting these coordinates. The information of the face adjacent box containing the face region is output by the animation face extracting unit 30 as the face region information.
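
The region expansion and box construction of step S32 can be sketched as a seeded region-growing pass. This is only an illustrative sketch under stated assumptions: the patent does not define the similarity measure or the neighborhood, so an absolute grey-level difference with a hypothetical `tolerance` and 4-connectivity are assumed here, and the names `grow_region` and `adjacent_box` are not from the source.

```python
from collections import deque

def grow_region(image, seed, tolerance):
    """Collect 4-connected pixels whose value is within `tolerance`
    of the value at the seed (the reference position Pref)."""
    rows, cols = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]  # pixel value at the reference position
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region \
                    and abs(image[nr][nc] - ref) <= tolerance:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

def adjacent_box(region):
    """Return (x_min, y_min, x_max, y_max) enclosing the grown region."""
    ys = [r for r, _ in region]
    xs = [c for _, c in region]
    return min(xs), min(ys), max(xs), max(ys)

# Toy 5x5 grey-level frame: the 9-valued blob stands in for the face colour.
frame = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
region = grow_region(frame, seed=(2, 2), tolerance=1)
box = adjacent_box(region)
```

The same two helpers would apply unchanged to the mouth box, since the text describes that extraction as following the face procedure.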

[0018] Note that the reference position and reference pixel value for subsequently input animation image frames are set to the reference position and reference pixel value specified in the previously input and processed image frame. Also, when the outline of the face in the animation image frame 10 is divided into two or more parts, that is, when light and shade on the face divide the outline into two regions, the user specifies the approximate position of each light or dark region as a reference position, and the adjacent box of the face can be extracted through the process described above.

[0019] Extraction of the eye adjacent boxes

Next, in step S33, the adjacent boxes surrounding the eyes are extracted by a second region extraction algorithm 32. When the user specifies the approximate center of each eye region within the face adjacent box using a pointing device such as a mouse, an approximate eye position range, sized as a fixed ratio of the face adjacent box, is set automatically. Next, when the user uses the mouse to specify in turn the iris region and the white region of each eye, a binary eye-box image expressed by the corresponding pixel values is generated. A projection is then performed on the binary image of the approximate eye box to determine in which rows and columns the pixels are concentrated, whereby the actual eye region, containing the iris region and the white region, is detected.
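
The projection step can be illustrated as row and column sums over the binary eye-box image. This is a minimal sketch, assuming that eye pixels are encoded as 1 and that any row or column with at least `min_count` eye pixels belongs to the eye; both the encoding and the threshold are assumptions, and `projection_box` is a hypothetical name.

```python
def projection_box(binary, min_count=1):
    """Locate the eye region in a binary eye-box image by projecting
    pixel counts onto the rows and columns."""
    row_sums = [sum(row) for row in binary]        # horizontal projection
    col_sums = [sum(col) for col in zip(*binary)]  # vertical projection
    rows = [i for i, s in enumerate(row_sums) if s >= min_count]
    cols = [j for j, s in enumerate(col_sums) if s >= min_count]
    if not rows or not cols:
        return None  # no eye pixels found in the box
    # (x_min, y_min, x_max, y_max) of the rows/columns that hold eye pixels
    return cols[0], rows[0], cols[-1], rows[-1]

eye_box_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
eye_region = projection_box(eye_box_image)
```

Raising `min_count` would discard rows and columns containing only isolated stray pixels, which is one plausible reason for using a projection rather than a plain bounding box.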

[0020] The iris region of each eye is specified in order to find the actual center position of that eye, and the eye center position specified in the current image frame is set as the eye center position for the subsequent image frame. As with the face adjacent box, the eye adjacent boxes are obtained by determining, among the pixels found, the four pixel coordinates at the maximum and minimum extents and connecting those coordinates, which yields the eye adjacent boxes ELX and ERX shown in FIG. 3(b).

[0021] Extraction of the mouth adjacent box

The adjacent box of the mouth is extracted in step S34 using a third region extraction algorithm 33. The user uses a mouse or the like to specify one point on the upper lip or the lower lip to serve as the reference position. Once the reference position has been specified, the process, as in the face adjacent box extraction, searches in the radial direction from the pixel at the reference position for neighboring pixels having a predetermined similarity to its value, and extracts the mouth adjacent box MX shown in FIG. 3(c).

[0022] Although only the eyes and the mouth have been described here as the main feature portions of the face, the regions of other feature portions such as the ears and nose can be extracted by the same method, so their description is omitted.

[0023] Extraction of facial expression information

FIGS. 4 and 5 are schematic diagrams for explaining the expression information extraction method performed in step S35 of FIG. 2 using an expression information extraction algorithm 34 according to the first preferred embodiment of the present invention.

[0024] As shown in FIG. 4, the process measures the eye width We and height He in the eye adjacent box extracted in step S33. The eye width We is obtained as the distance between the maximum and minimum pixel coordinates along the X-axis, and the height He as the distance between the maximum and minimum pixel coordinates along the Y-axis. The expression of the eye (opening and closing) is produced by moving the eyelid up and down. To detect the degree of eye opening clearly, it is preferable to select as the first image frame one in which the animation main character's face has a neutral expression, as in an ID photograph.

[0025] For example, when no iris or white region is detected at all in the extracted eye adjacent box, the eye is judged to be closed and the degree of opening can be calculated as 0; that is, the height He is 0. FIG. 4(a) shows an eye that is fully open, and FIG. 4(b) shows an eye that is half open.
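
The eye measurement can be sketched as follows. The width We and height He follow the text directly; normalizing He by the height measured in the neutral first frame (`reference_height`) is an assumption made here to obtain a degree of opening between 0 and 1, and the function names are hypothetical.

```python
def eye_size(eye_pixels):
    """Return (We, He): the extents of the detected eye pixels along X and Y.
    An empty set means no iris/white pixels were found, i.e. a closed eye."""
    if not eye_pixels:
        return 0, 0
    xs = [x for x, _ in eye_pixels]
    ys = [y for _, y in eye_pixels]
    return max(xs) - min(xs), max(ys) - min(ys)

def eye_openness(eye_pixels, reference_height):
    """Degree of opening relative to the neutral (first) frame's eye height.
    The normalization is an assumption; the source only says the degree is 0
    for a closed eye."""
    if reference_height <= 0:
        raise ValueError("reference_height must be positive")
    _, height = eye_size(eye_pixels)
    return height / reference_height

fully_open = {(2, 5), (8, 5), (5, 3), (5, 7)}   # He = 4, as in FIG. 4(a)
half_open = {(2, 5), (8, 5), (5, 4), (5, 6)}    # He = 2, as in FIG. 4(b)
reference = eye_size(fully_open)[1]
```

With these toy pixel sets, the fully open eye serves as its own reference, the half-open eye yields half that value, and an empty detection yields 0.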

[0026] Thus, the expression data relating to the eyes is obtained by calculating the degree of eye opening with the expression information extraction algorithm 34 described above. Similarly, the expression of the mouth (opening and closing) is produced by moving the upper and lower lips up, down, left and right. To detect the degree of mouth opening accurately, it is preferable to select as the first image frame one in which the animation main character's face has a neutral expression, as in an ID photograph.

[0027] As shown in FIG. 5, the mouth, unlike the eye, can be represented by two adjacent boxes: an outer box Ebox and an inner box Ibox. As described above, the outer box Ebox is obtained by connecting the maximum and minimum pixel points along the X-axis and Y-axis among the pixels found. The inner box Ibox is obtained by connecting the maximum and minimum pixel coordinate points along the X-axis and Y-axis among the pixels inside the outer box Ebox whose values differ from the lip pixel value.

[0028] The expression data relating to the mouth is extracted by calculating the ratio between the heights of the two adjacent boxes thus obtained, that is, between the height M0 of the outer box Ebox and the height Mi of the inner box Ibox.
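
The mouth measure can be sketched as a simple ratio of box heights. The source does not state which height is the numerator; Mi / M0 is assumed here so that a closed mouth (no inner box) gives 0 and a wider opening gives a larger value. The box layout `(x_min, y_min, x_max, y_max)` and the name `mouth_openness` are also assumptions.

```python
def mouth_openness(outer_box, inner_box):
    """Ratio Mi / M0 of the inner box height to the outer box height.
    Boxes are (x_min, y_min, x_max, y_max); pass inner_box=None when no
    non-lip pixels are found inside the outer box (closed mouth)."""
    m0 = outer_box[3] - outer_box[1]  # height M0 of the outer box Ebox
    if m0 <= 0:
        raise ValueError("outer box must have a positive height")
    if inner_box is None:
        return 0.0
    mi = inner_box[3] - inner_box[1]  # height Mi of the inner box Ibox
    return mi / m0

ebox = (0, 0, 10, 8)
ibox = (2, 2, 8, 6)
ratio = mouth_openness(ebox, ibox)
```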

[0029] The eye-related expression data and mouth-related expression data extracted in step S35 are then transmitted to the substitute face synthesizing unit 40 of FIG. 1.

[0030] Extraction of face pose information

The face pose information extraction method according to the first preferred embodiment of the present invention is described below with reference to FIGS. 6 and 7.

[0031] In general, face poses can be distinguished by the various directions in which the face rotates, and can be broadly classified into three types. As shown in FIG. 6(a), the first pose is that of tilting the head to one side (hereinafter "tilt rotation"), the second pose is that of shaking the head from side to side (hereinafter "negative rotation"), and the third pose is that of nodding the head up and down (hereinafter "positive rotation").

[0032] In step S61 of FIG. 7, the process determines whether the center of the animation main character's face has changed. Here, the center of the face refers to the line segment perpendicular to the line segment connecting the centers of the two eyes. If the center of the face has not changed in step S61, the process judges that the current face is frontal and outputs frontal information, that is, rotation data of 0 (step S62).

[0033] If, on the other hand, the center of the face has changed in step S61, the process checks whether it has changed toward the left or the right (step S63). If so, the process judges that the current face is in the "negative pose" state and calculates the corresponding degree of rotation (step S64). FIG. 6(b) shows an example of the "negative rotation" state: the distances from the centers of the left and right eyes to the face box boundary differ, and the face has rotated toward the side on which the distance from the eye center to the face box is shorter, that is, toward the left. The degree of negative rotation is obtained by calculating the ratio of the shorter distance a to the longer distance b from the eye center to the face box boundary. The rotation ratio obtained in step S64 is output as the negative rotation data.
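
The negative rotation ratio of step S64 can be sketched as below. The distances are assumed to be measured horizontally, from each eye center to the nearer vertical edge of the face adjacent box; "left" and "right" refer to image coordinates, and the returned direction label and the name `negative_rotation` are additions for illustration only.

```python
def negative_rotation(face_box, left_eye_x, right_eye_x):
    """Return (a / b, direction) where a is the shorter and b the longer
    eye-centre-to-box-edge distance. face_box is (x_min, y_min, x_max, y_max)."""
    x_min, _, x_max, _ = face_box
    d_left = left_eye_x - x_min    # left eye centre to left box edge
    d_right = x_max - right_eye_x  # right eye centre to right box edge
    a, b = sorted((d_left, d_right))
    if b <= 0:
        raise ValueError("eye centres must lie inside the face box")
    if d_left < d_right:
        direction = "left"     # rotated toward the side with the shorter distance
    elif d_right < d_left:
        direction = "right"
    else:
        direction = "front"
    return a / b, direction

turned = negative_rotation((0, 0, 100, 120), left_eye_x=20, right_eye_x=60)
frontal = negative_rotation((0, 0, 100, 120), left_eye_x=30, right_eye_x=70)
```

Under this convention a ratio of 1 corresponds to a frontal face and smaller values to a stronger turn.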

[0034] If the center of the face has not changed toward the left or the right in step S63, the process checks whether it has moved in the vertical direction (step S65). If so, the process judges that the current face is in the "positive pose" state and calculates the corresponding degree of rotation (step S66). FIG. 6(c) shows the "positive rotation" state, in which the maximum and minimum pixel coordinates of the face region on the Y-axis have moved up or down. For example, when the main character's face is directed downward, the maximum Y-axis pixel coordinate Pmax of the face region moves downward while the minimum Y-axis pixel coordinate Pmin does not change. Conversely, when the main character's face is directed upward, the minimum Y-axis pixel coordinate Pmin moves upward while the maximum Y-axis pixel coordinate Pmax does not change. In other words, the positive rotation data is obtained by calculating the displacement of the maximum and minimum pixel coordinates along the Y-axis. The displacement obtained in step S66 is output as the positive rotation data.
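
The displacement of step S66 can be sketched as a difference of Y extents between a reference frame and the current frame. The choice of the reference (for example the neutral first frame) and the box layout `(x_min, y_min, x_max, y_max)` are assumptions; the sketch returns raw signed differences and does not assume which sign means "up".

```python
def positive_rotation(reference_box, current_box):
    """Return (d_pmin, d_pmax): the change in the minimum and maximum
    Y coordinates of the face region relative to the reference frame."""
    d_pmin = current_box[1] - reference_box[1]  # displacement of Pmin
    d_pmax = current_box[3] - reference_box[3]  # displacement of Pmax
    return d_pmin, d_pmax

neutral = (10, 20, 90, 140)
nodding = (10, 20, 90, 150)   # only Pmax has moved
shift = positive_rotation(neutral, nodding)
```

Exactly one of the two values being non-zero matches the two cases in the text, where either Pmax or Pmin moves while the other stays fixed.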

[0035] If the center of the face has not changed upward or downward in step S65, the process checks whether the center of the face has been tilted to the left or right (step S67). If so, the process judges that the current face is in the "tilt pose" state and calculates the corresponding degree of rotation (step S68). FIG. 6(d) shows the "tilt pose" state, in which the line segment connecting the eye centers in the respective adjacent boxes is inclined with respect to the horizontal. The degree of inclination of the eyes is obtained by calculating the ratio Dy between the horizontal distance Lh, measured from the midpoint of the line segment connecting the centers of both eyes to the center of the left eye (or the right eye), and the vertical distance Lv, measured from the center of the left eye (or the right eye) to the horizontal line. The ratio obtained in step S68 is output as the tilt rotation data.
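
The tilt ratio of step S68 can be sketched from the two eye centers. The source gives only "the ratio between Lh and Lv"; taking Dy = Lv / Lh (so that level eyes give 0) and taking the horizontal line to pass through the midpoint of the eye segment are assumptions, as is the name `tilt_ratio`.

```python
def tilt_ratio(left_eye, right_eye):
    """Return Dy = Lv / Lh for eye centres given as (x, y) tuples."""
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    lh = abs(left_eye[0] - mid_x)  # horizontal distance Lh: midpoint to eye centre
    lv = abs(left_eye[1] - mid_y)  # vertical distance Lv: eye centre to horizontal line
    if lh == 0:
        raise ValueError("eye centres must not share the same x coordinate")
    return lv / lh

tilted = tilt_ratio((40, 50), (80, 70))
level = tilt_ratio((40, 60), (80, 60))
```

Under these assumptions Dy equals the tangent of the angle between the eye line and the horizontal, so an angle could be recovered with an arctangent if needed.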

[0036] In step S70, face pose data based on the negative rotation data, positive rotation data and tilt rotation data obtained through the above process is output to the substitute face synthesizing unit 40 of FIG. 1.

[0037] Synthesis of facial expressions

FIG. 8 is a diagram for explaining the process performed in the substitute face synthesizing unit 40 according to the first preferred embodiment of the present invention, and FIGS. 9 and 10 are, respectively, a schematic diagram and a flowchart for explaining the method of synthesizing a face on the basis of the expression data.

[0038] In step S41, the user acquires the region of the substitute face from the substitute person image frame 20 in the usual manner, using a pointing device such as a mouse.

[0039] In step S42, the user specifies the main feature points of the face on the acquired face region and generates a mesh connecting the feature points. Each feature point must be located at a main boundary of the face (for example, the upper, lower, left and right boundaries of the eyebrows, eyes and mouth) or at a point of curvature (for example, the tip of the nose or the cheekbones), and the mesh must be arranged so that the components of the face (that is, the eyes, mouth and so on) are enclosed by the mesh grid.

[0040] More specifically, as shown in FIGS. 9(a) and 9(b) respectively, an adjacent box containing the eye mesh and an adjacent box containing the mouth mesh are set. In FIG. 9, the mark "□" located on the adjacent box boundary and elsewhere indicates a fixed point, the mark "■" located on the mesh indicates a moving point to be moved according to the expression data from the animation face extracting unit 30, and a further mark "□" indicates a target point showing the actual degree of opening of the animation main character's eye (or mouth). As shown in FIG. 9(a), the effect of opening and closing the eye is produced by controlling the moving points ■ located at the eyelid mesh points. In FIG. 9, the space provided between the mesh and the adjacent box, that is, between the fixed points and the moving points, is the set of pixels used as reference pixels during pixel interpolation (described later).

[0041] As shown in FIG. 9(b), the effect of opening and closing the mouth is produced by moving up and down the moving points ■ located on the boundary between the lower lip and the upper lip. In addition, the moving points provided to the left and right of the moving point at the vertical center of the lips are set to move up and down in inverse proportion to their horizontal distance from that central moving point, so that the expression of the mouth is rendered more naturally. In practice, because the increase in height of the mouth mesh during expression synthesis may distort the shape of the synthesized animation main character's face, it is preferable, when creating the mesh for the substitute face, to leave a margin around the mouth mesh.
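
The distance-dependent movement of the lip points can be sketched as a weighted shift. The source states only that the movement is inversely proportional to the horizontal distance from the central moving point; the specific weight 1 / (1 + |dx| / scale) used here, which keeps the central point's shift finite, is an assumption, as are the parameter `scale` and the name `move_lip_points`.

```python
def move_lip_points(points, center_x, center_shift, scale=1.0):
    """Shift each (x, y) lip point vertically by an amount that falls off
    with its horizontal distance from the central moving point."""
    moved = []
    for x, y in points:
        weight = 1.0 / (1.0 + abs(x - center_x) / scale)  # 1 at the centre
        moved.append((x, y + center_shift * weight))
    return moved

lower_lip = [(6, 20), (8, 20), (10, 20), (12, 20), (14, 20)]
opened = move_lip_points(lower_lip, center_x=10, center_shift=4.0, scale=2.0)
```

The central point receives the full shift and points nearer the mouth corners move progressively less, which gives the lip a curved rather than uniformly displaced contour.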

[0042] In step S43, the process reconstructs (synthesizes) the expression of the eyes and mouth of the substitute person using the first synthesis algorithm 40a, based on the facial expression data input from the animation face extracting unit 30. FIG. 10 is a flowchart for explaining the first synthesis algorithm 40a.

[0043] As shown in FIG. 10, in step S11 the process determines whether facial expression data has been input from the animation face extracting unit 30. If facial expression data has been input, the process determines in step S12 whether it is eye-related expression data. If it is determined in step S12 to be eye-related expression data, the process moves the mesh control points based on that data, as described with reference to FIG. 9(a), to synthesize the eyes of the substitute person.

[0044] On the other hand, if the data input from the animation face extracting unit 30 is determined in step S12 not to be eye-related expression data, the process proceeds to step S14 and determines whether it is mouth-related expression data.

[0045] If it is determined in step S14 that the input data is mouth-related expression data, the process moves the mesh control points in step S15, as described with reference to FIG. 9(b), to synthesize the open-mouth expression of the substitute person. That is, as shown in FIG. 9(b), the open-mouth expression can be synthesized by moving each moving point located in the upper part of the mouth adjacent box upward and each moving point located in the lower part downward. Here, the maximum size or height of the upper lip and the lower lip is assumed to be fixed.
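The branching of FIG. 10 (steps S11 to S15) can be sketched roughly as follows. The data layout is an assumption made only for illustration: the dictionary keys `kind`, `shift`, `eyelid`, `mouth_upper` and `mouth_lower`, and the function name `apply_expression`, do not appear in the specification.

```python
def apply_expression(mesh, expression):
    """Apply one piece of expression data to a substitute-face mesh.

    mesh       -- dict of point lists, e.g. {"eyelid": [...], "mouth_upper": [...],
                  "mouth_lower": [...]} (hypothetical layout)
    expression -- None, or {"kind": "eye" | "mouth", "shift": pixels} (hypothetical)
    """
    # S11: has expression data been input at all?
    if expression is None:
        return mesh

    new_mesh = {name: list(points) for name, points in mesh.items()}
    shift = expression["shift"]

    # S12: eye-related data moves the eyelid control points.
    if expression["kind"] == "eye":
        new_mesh["eyelid"] = [(x, y + shift) for x, y in mesh["eyelid"]]
    # S14 / S15: mouth-related data moves upper points up and lower points down.
    elif expression["kind"] == "mouth":
        new_mesh["mouth_upper"] = [(x, y - shift) for x, y in mesh["mouth_upper"]]
        new_mesh["mouth_lower"] = [(x, y + shift) for x, y in mesh["mouth_lower"]]
    return new_mesh


mesh = {
    "eyelid": [(10, 20), (12, 19)],
    "mouth_upper": [(10, 60)],
    "mouth_lower": [(10, 64)],
}
closed_eye = apply_expression(mesh, {"kind": "eye", "shift": 3})
open_mouth = apply_expression(mesh, {"kind": "mouth", "shift": 2})
```

The sketch returns a new mesh rather than modifying the input, so the stored substitute-face mesh can be reused for the next frame.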

[0046] Synthesis of the face pose

Hereinafter, a method of synthesizing the face pose of the substitute person using the second synthesis algorithm 40b according to the present invention will be described.

[0047] In step S44, the process synthesizes the face of the substitute person into the face area of the animation hero by rotating the face of the substitute person about its center, as described with reference to FIG. 6, based on the series of pose data input from the animation face extracting unit 30 of FIG. 1, namely front data, negation rotation data (turning the head left and right), affirmation rotation data (nodding up and down), and “サア” rotation data (tilting the head sideways).

[0048] In the synthesis process of the substitute face synthesizing unit 40 described above, the expression synthesis for the eyes and mouth is performed first and the face pose synthesis afterwards, but the order may be reversed.

[0049] Further, according to the first preferred embodiment of the present invention, after each moving point in the adjacent box of the eye or mouth has been moved (or expanded) to its target point, a pixel interpolation technique is used to fill the space between the moving point and the target point with appropriate pixel values. FIG. 11 is a schematic diagram for explaining the pixel interpolation technique according to the present invention.

[0050] More specifically, suppose the second moving point ■ located in the upper part of the eye adjacent box shown in FIG. 10 is moved (expanded) to the second target point □. If the space between the second moving point ■ and the second target point □ were filled using only the pixel value corresponding to the second moving point ■, the transition between the filled pixel values and the pixel values in the adjacent box would look unnatural. Therefore, in the present invention, as shown in FIG. 11(a), the pixel values to be inserted into the expanded area are obtained from the pixel values lying between each fixed point on the adjacent box and each moving point.

[0051] That is, if the input pixel values between the fixed point and the moving point are, for example, “90, 80, 60, 65, 71, 81”, the output pixel values inserted into the expanded area are interpolated to, for example, “90, 83.3, 73.2, 60, 65, 69.08, 74.5, 81.18”, so that the expression of the eyes is rendered more naturally. FIG. 11(b) shows the interpolated pixel values when a mesh moving point is contracted. The face image of the substitute person in which the expressions of the eyes and mouth have been synthesized in this way is input to the image synthesizing unit 50 of FIG. 1.
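The idea of stretching or shrinking a run of reference pixels to a new length can be illustrated with ordinary linear resampling. This is a stand-in technique chosen for the sketch, not the interpolation actually used in the specification: the example values in paragraph [0051] are not reproduced exactly by linear resampling, and the function name `resample_row` is hypothetical.

```python
def resample_row(values, new_length):
    """Linearly resample a run of pixel values to new_length samples.

    The first and last values are preserved; intermediate values are
    linear blends of the two nearest input pixels.
    """
    if len(values) < 2 or new_length < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(values) - 1) / (new_length - 1)
    output = []
    for i in range(new_length):
        position = i * step
        left = min(int(position), len(values) - 2)
        fraction = position - left
        output.append(values[left] * (1 - fraction) + values[left + 1] * fraction)
    return output


# Expansion: six reference pixels stretched over eight positions.
expanded = resample_row([90, 80, 60, 65, 71, 81], 8)
# Contraction: five reference pixels squeezed into three positions.
contracted = resample_row([0, 10, 20, 30, 40], 3)
```

Because the whole run between the fixed point and the moving point is resampled, the gradient inside the adjacent box is stretched smoothly instead of repeating a single pixel value.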

[0052] FIG. 12 is a schematic block diagram showing an animation image synthesizing apparatus 200 according to a second preferred embodiment of the present invention.

[0053] The animation image synthesizing apparatus 200 according to the second preferred embodiment of the present invention is identical to the first embodiment except that corresponding reference images are extracted from a plurality of substitute person image frames 20', which are still images such as photographs taken with a camera, and these reference images are used during synthesis. Here, a reference image means an image taken in order to obtain the intended synthesized image optimally.

[0054] As shown in FIG. 13, the plurality of substitute person image frames 20' comprise seven reference images: front, upper, lower, half-left, half-right, left, and right.

[0055] As described with reference to FIG. 8, the substitute face synthesizing unit 40' extracts the seven reference images described above from the substitute person image frames 20', creates a face mesh as shown in FIG. 13 for each extracted reference image, and stores it in a storage medium (not shown).

[0056] Before proceeding, consider the case where only the front reference image is used when synthesizing the desired face according to the expression and pose of the animation hero's face. When the hero's face is rotated sharply to the left or right and the front reference image is used as the source mesh, the y coordinate of each mesh point changes, producing an image whose side is distorted as shown in FIG. 14(b). Similarly, when an intermediate image is generated for an up-down rotation, the x coordinate of each mesh point changes, producing an image whose upper and lower portions are distorted (not shown). That is, when the face turns sharply to the left or right, the quality of the synthesized image may be degraded. Therefore, in the second preferred embodiment of the present invention, the front reference image is normally used as the source mesh, and when the face is rotated further to the side than the half-right or half-left position, the half-right or half-left reference image is set as the source mesh to generate an intermediate frame (intermediate image) for an arbitrary pose. Here, an intermediate frame means an image between the front reference image and the half-left or half-right reference image, between the front reference image and the upper or lower reference image, or between the half-left or half-right reference image and the left or right reference image.

[0057] More specifically, as shown in FIG. 15, when the face pose data input from the animation face extracting unit 30 indicates that the face is rotated to the right of the front reference image 20" by more than a predetermined reference value of 0.5, that is, when it lies at position R, the substitute face synthesizing unit 40' retrieves the previously stored half-right reference image and sets it as the source mesh. The substitute face synthesizing unit 40' then generates an intermediate image mesh based on the half-right reference image and applies the linear interpolation and pixel interpolation described above to that mesh to generate the desired intermediate image. The linear interpolation is performed separately for the x coordinate component and the y coordinate component.
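The choice of source mesh and the per-coordinate linear interpolation of paragraph [0057] can be sketched as follows. The signed rotation scale (negative for left, positive for right), the symmetric handling of the left side, and the names `select_source` and `interpolate_mesh` are assumptions for illustration; the specification gives only the reference value 0.5 for the rightward case.

```python
def select_source(rotation, threshold=0.5):
    """Pick the reference image used as the source mesh.

    rotation  -- signed horizontal rotation of the hero's face
                 (assumed scale: negative = left, positive = right)
    threshold -- reference value beyond which a half-side image is used
    """
    if rotation > threshold:
        return "half_right"
    if rotation < -threshold:
        return "half_left"
    return "front"


def interpolate_mesh(mesh_a, mesh_b, t):
    """Linearly interpolate two meshes, treating x and y separately."""
    result = []
    for (xa, ya), (xb, yb) in zip(mesh_a, mesh_b):
        x = (1 - t) * xa + t * xb  # x component interpolated on its own
        y = (1 - t) * ya + t * yb  # y component interpolated on its own
        result.append((x, y))
    return result


source = select_source(0.7)
halfway = interpolate_mesh([(0, 0), (10, 0)], [(4, 2), (10, 8)], 0.5)
```

Switching the source mesh once the rotation passes the threshold keeps each interpolation span short, which is what limits the side distortion described in paragraph [0056].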

[0058] Further, the second preferred embodiment of the present invention provides an algorithm that realizes the effect of the face being tilted to the left or right, that is, rotated clockwise or counterclockwise. With the rotation axis set at the center of the face image, rotations by multiples of 90°, such as 180° and 270°, can be implemented by simple coordinate substitution, and other rotation angles can be implemented by applying ordinary bilinear interpolation together with the pixel interpolation method of the present invention.

[0059] Finally, the image synthesizing unit 50 synthesizes the synthesized face image input from the substitute face synthesizing unit 40 into the masked face area of the animation hero, based on the face area information of the animation hero provided by the animation face extracting unit 30, that is, the image information in which the hero's face is masked.
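The mask-based composition performed by the image synthesizing unit 50 can be illustrated with a very small sketch. The representation is assumed for the example only: the frame, the synthesized face and the mask are same-sized lists of rows, and the name `composite` is hypothetical.

```python
def composite(frame, face, mask):
    """Copy face pixels into the frame wherever the mask marks the hero's face.

    frame -- animation frame (list of rows of pixel values)
    face  -- synthesized substitute face, aligned with the frame
    mask  -- same size; truthy where the hero's face was masked
    """
    return [
        [face[y][x] if mask[y][x] else frame[y][x] for x in range(len(frame[0]))]
        for y in range(len(frame))
    ]


frame = [[10, 10, 10], [10, 10, 10]]
face = [[0, 200, 0], [0, 210, 220]]
mask = [[0, 1, 0], [0, 1, 1]]
result = composite(frame, face, mask)
```

Pixels outside the mask keep their original animation values, so only the hero's face area is replaced.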

[0060] In the post-processing unit 60, when a gap arises between the face area of the animation hero and the substitute face area because of a processing error, the gap can be filled either by using the pixel interpolation technique of the present invention based on the pixel values located at the boundary of the substitute face, or by calculating the average of the pixel value just outside the face area of the animation hero and the pixel value located at the boundary of the substitute face and filling the gap with that average pixel value.
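The averaging option of paragraph [0060] can be sketched for a single row of pixels as follows. The one-dimensional layout, the index convention and the name `fill_gap` are assumptions made for illustration.

```python
def fill_gap(row, start, end):
    """Fill row[start:end] with the average of its two neighbouring pixels.

    row[start - 1] -- pixel just outside the hero's face area
    row[end]       -- pixel on the boundary of the substitute face
    """
    if start < 1 or end >= len(row) or start >= end:
        raise ValueError("gap must lie strictly inside the row")
    average = (row[start - 1] + row[end]) / 2.0
    return row[:start] + [average] * (end - start) + row[end:]


# Two unfilled pixels (indices 2 and 3) between background 100 and face 60.
filled = fill_gap([100, 100, 0, 0, 60, 60], 2, 4)
```

A uniform average is the simplest choice; resampling the boundary pixels across the gap, as in the pixel interpolation technique, would give a smoother transition for wider gaps.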

[0061] As a result, the animation image frame in which the face of the animation hero has been replaced by, or synthesized with, the face of the substitute person is stored in a data storage device (not shown) such as a computer hard disk or a video cassette recorder. The animation image synthesis for subsequent frames is performed by repeating the series of steps described above, so its description is omitted.

[0062] Although the present invention has been described so far with respect to two-dimensional animation images, it can of course also be applied to three-dimensional image media by using the pose/expression synthesis technique and the pixel interpolation technique of the present invention. Accordingly, there is the advantage that, during home shopping using a computer, a two-dimensional product image provided by a retailer can be reproduced and viewed as a three-dimensional image on a personal computer.

[0063] Although preferred embodiments of the present invention have been described above, those skilled in the art will be able to make various modifications without departing from the scope of the claims of the present invention.

[0064]

[Effects of the Invention] Accordingly, the present invention provides the following effects.
1. According to the present invention, the face of the hero appearing in an animated moving image can be replaced with one's own face using a single photograph. This can also be used for educational purposes, and a moving image in which the user is the hero can be produced at low cost.
2. According to the present invention, by acquiring a plurality of reference images and using the reference image best suited to the rotation direction of the animation hero's face, an animated moving image that is more natural and of improved image quality can be produced.
3. According to the image synthesis technique of the present invention, a product image can also be reproduced as a three-dimensional image during home shopping using a computer.
4. By using the image synthesis technique of the present invention to replace the face of a game's hero with the user's face during play, the game can be made more engaging.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of an animation image synthesizing apparatus according to a first preferred embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the animation face extracting unit according to the present invention.

FIG. 3 is a schematic diagram for explaining a method of extracting the adjacent box for the hero's face image according to the present invention.

FIG. 4 is a schematic diagram for explaining a method of extracting eye expression information according to the first preferred embodiment of the present invention.

FIG. 5 is a schematic diagram for explaining a method of extracting mouth expression information according to the first preferred embodiment of the present invention.

FIG. 6 is a schematic diagram for explaining a method of extracting face pose information according to the first preferred embodiment of the present invention.

FIG. 7 is a flowchart for explaining a method of extracting face pose information according to the first preferred embodiment of the present invention.

FIG. 8 is a flowchart for explaining the operation of the substitute face extracting unit according to the first preferred embodiment of the present invention.

FIG. 9 is a schematic diagram for explaining a method of synthesizing the face of the substitute person based on expression data according to the present invention.

FIG. 10 is a flowchart for explaining a method of synthesizing the face of the substitute person based on expression data according to the present invention.

FIG. 11 is a schematic diagram for explaining the pixel interpolation technique according to a preferred embodiment of the present invention.

FIG. 12 is a schematic block diagram of an animation image synthesizing apparatus according to a second preferred embodiment of the present invention.

FIG. 13 is a schematic diagram showing a plurality of reference images.

FIG. 14 is a schematic diagram showing a synthesized image distorted by left-right rotation.

FIG. 15 is a schematic diagram showing a case where the face pose deviates to the right from the front reference image.

FIG. 16 is a schematic diagram for explaining a process of performing a series of face synthesis operations in the order of expression and then pose according to the present invention.

[Explanation of symbols]

10 Animation image frame
20 Substitute person image frame
30 Animation face extracting unit
40 Substitute face synthesizing unit
50 Image synthesizing unit
60 Post-processing unit
70 Display

Continuation of the front page
(72) Inventor: 全炳煥 (Heikan Zen), 509-501 Wolbong Ilsung Apartment, Ssangyong-dong, Cheonan-si, Chungcheongnam-do, Republic of Korea
(72) Inventor: 崔淳眞 (Junshin Sai), No. 102, 270-75 Gaebong 3-dong, Guro-gu, Seoul, Republic of Korea
(72) Inventor: 徐▲じょん▼源 (Jongen Jo), 398-10, 26/1, Jeonnong-dong, Dongdaemun-gu, Seoul, Republic of Korea
F-terms (reference):
5B050 AA06 AA08 AA09 BA08 BA12 CA07 DA02 DA04 DA07 EA07 EA09 EA12 EA18 EA19 EA24 FA02 FA05 GA08
5B057 AA20 BA02 BA29 CA01 CA08 CA13 CA16 CB01 CB08 CB13 CB16 CC03 CD03 CE08 DA08 DB03 DB06 DB09 DC03 DC25 DC32

Claims

[Claims]

1. An image synthesizing method for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each including a plurality of pixels, the method comprising: a) extracting the target image from an image medium having the image frame, and determining a first set of image information and a second set of image information for the extracted target image; b) extracting a digital reference image from the predetermined still image, and synthesizing the digital reference image based on the second set of image information for the target image to provide a synthesized image; and c) replacing the target image in the image frame with the synthesized image.

2. The method according to claim 1, wherein the image medium is an animation image.

3. The method according to claim 2, wherein the target image is a face of a specific person in the animation image.

4. The method according to claim 3, wherein step a) comprises: a1) extracting a first boundary area surrounding the face of the specific person in the animation image; and a2) extracting a plurality of second boundary areas, each enclosing a main component of the face, within the first boundary area.

5. The method according to claim 1, wherein step a1) comprises: a11) when a predetermined position in the first boundary area is designated by a user, determining the first boundary area by searching for adjacent pixels having a predetermined similarity with reference to the pixel value at the designated position; and a step of determining, among the searched adjacent pixels, the pixel coordinates that are maximal and minimal in the horizontal and vertical directions, obtaining a first adjacent area by connecting those coordinates so as to surround the first boundary area, selecting from the pixel coordinates a coordinate corresponding to the center of the face of the specific person, and outputting the first adjacent area and the center coordinate of the face as the first set of image information.

6. The method according to claim 4, wherein the main components of the face of the specific person include a pair of eyes and a mouth, and step a2) comprises: a21) when an approximate center position of each eye is designated by the user, searching for pixel values belonging to a predetermined range within the first boundary area; a22) determining, for each eye, the rows and columns in which the searched pixel values are concentrated and, based on that determination, determining a second boundary area containing a binary image representing the dark part and the white part of the eye; and a23) determining, for each eye, the pixel coordinates that are maximal and minimal in the horizontal and vertical directions among the pixels located in the second boundary area, and determining a second adjacent area for each eye by connecting those coordinates.

7. The method according to claim 6, wherein step a2) further comprises: a24) when two approximate center positions of the mouth are designated by the user, searching for adjacent pixels having a predetermined similarity with reference to the pixel value at each center position to determine a second boundary area for the mouth; and a25) determining the pixel coordinates that are maximal and minimal in the horizontal and vertical directions among the pixels searched in step a21, and determining a second adjacent area for the mouth by connecting those coordinates.

8. The method according to claim 7, wherein step a25) comprises: a26) obtaining a second internal boundary area for the mouth, formed inside the second boundary area, by connecting the horizontally and vertically maximal and minimal pixel coordinates among the pixel coordinates whose pixel values differ from the lip pixel value within the second boundary area for the mouth.

9. The method according to claim 8, wherein step a) further comprises, after step a23): a27) calculating the width and height of the second boundary area for each eye and outputting the ratio of the calculated width and height as first expression data; and a28) calculating the height of the second boundary area and the height of the second internal boundary area for the mouth and outputting the ratio of the calculated heights as second expression data.

10. The method according to claim 8, wherein step a) further comprises, after step a23): a29) calculating each distance to the boundary of the adjacent area, outputting first rotation data if the calculated distances are equal, and, if they are not equal, outputting the value of the smaller of the distances as second rotation data; a30) determining whether the maximal and minimal pixel coordinates along the vertical axis have moved up or down within the first adjacent area, and outputting a value corresponding to the degree of movement as third rotation data; and a31) determining whether the line segment connecting the centers of the second adjacent areas for the respective eyes is inclined with respect to a horizontal line, and outputting a value corresponding to the degree of inclination as fourth rotation data.

11. The method according to claim 6, wherein the center position and reference pixel value of each eye designated in the animation image frame are used as the reference position and reference pixel value for a subsequent animation image frame.

12. The method according to claim 9, wherein step b) comprises: b1) extracting the digital reference image from the predetermined still image and designating feature points, including fixed points and control points, on the main components in the reference image; and b2) generating a mesh by connecting the feature points so that the main components of the face of the specific person are enclosed by a mesh grid.

13. The method according to claim 12, wherein the control point is located on each eyelid of the reference image.

14. The method according to claim 12, wherein the control point is located on a second boundary area and a second internal boundary area for the mouth of the reference image.

15. The method according to claim 12, wherein the second adjacent area is formed at a predetermined distance from the second boundary area.

16. The method according to claim 12, wherein step b2) further comprises: b3) moving the control points designated in the second boundary area for the eye by a distance corresponding to the first expression data; and b4) moving the control points designated in the second boundary area or the second internal boundary area for the mouth by a distance corresponding to the second expression data.

17. The method according to claim 10, wherein step b2) includes a step b5) of rotating the reference image about its center based on at least one of the first to fourth rotation data.

18. The method according to claim 16, wherein, in step b4), the control points having the maximal and minimal vertical-axis values in the second internal boundary area for the mouth are taken as reference control points, and each control point adjacent to a reference control point is moved vertically in inverse proportion to its horizontal distance from that reference control point.

19. The method according to claim 18, wherein the control points of the mesh for the mouth are extended further outward than the control points of the mesh for the other components.

20. The method according to claim 9, wherein, after the control points are moved by the distances corresponding to the first and second expression data, pixel values corresponding to the moved distance are obtained based on a series of pixel values located between the control point and the fixed point, and the obtained pixel values are interpolated over the moved distance to generate the mesh.

21. An image synthesizing method for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each including a plurality of pixels, the method comprising: a) extracting the target image from an image medium having the image frame, and determining a first set of image information and a second set of image information for the extracted target image; b) extracting and storing a plurality of digital reference images, and searching the plurality of digital reference images for the digital reference image best suited to the second set of image information and synthesizing it to generate an optimum synthesized image; and c) replacing the target image in the image frame with the synthesized image.

22. The method according to claim 21, wherein the video medium is an animation video.

23. The method according to claim 22, wherein the target image is the face of a specific person in the animation image, the first set of image information is area data surrounding the face of the specific person and the center coordinates of the face, and the second set of image information is expression data for the main components in the face of the specific person and rotation data indicating the degree of rotation of the face.

24. The method of claim 21, wherein the plurality of digital reference images are front, upper, lower, half-left, half-right, left, and right images of the face.

25. The method according to claim 23, wherein, in step b), when the rotation data in the second set of image information indicates that the face of the specific person in the animation image is oriented in a first direction, the corresponding reference image is searched from the plurality of digital reference images, and the optimum synthesized image is output using the searched reference image.

26. An image synthesizing apparatus for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each including a plurality of pixels, the apparatus comprising: image extracting means for extracting the target image from an image medium having the image frame and determining a first set of image information and a second set of image information for the extracted target image; first synthesizing means for extracting a digital reference image from the predetermined still image and synthesizing the digital reference image through coordinate movement and pixel interpolation, based on the second set of image information, to generate a synthesized image; and second synthesizing means for replacing the target image in the image frame with the synthesized image, wherein the first set of image information comprises area data surrounding the target image and its center coordinates, and the second set of image information comprises element area data surrounding the main components in the target image and feature data of each component.

27. The video synthesizing apparatus according to claim 26, wherein the video medium is an animation video, and the target video is a face of a characteristic person in the animation video.

28. An image synthesizing apparatus for synthesizing a predetermined still image with a target image in an image frame, the predetermined still image and the target image each including a plurality of pixels, the apparatus comprising: image extracting means for extracting a desired target image from an image medium having the image frame and extracting a first set of image information and a second set of image information for the extracted target image; first synthesizing means for extracting and storing a plurality of digital reference images from a plurality of the predetermined still images, and for searching the plurality of digital reference images for the digital reference image best suited to the second set of image information and synthesizing it to generate an optimum synthesized image; and second synthesizing means for replacing the target image in the image frame with the synthesized image, wherein the first set of image information comprises area data surrounding the target image and its center coordinates, and the second set of image information comprises element area data surrounding the main components in the target image and feature data of each component.

29. The apparatus according to claim 27, wherein the image medium is an animation image, and the target image is a face of a characteristic person in the animation image.