TW201342305A - Method and arrangement for 3D model morphing - Google Patents

Method and arrangement for 3D model morphing

Info

Publication number
TW201342305A
Authority
TW
Taiwan
Prior art keywords
model
standard
dimensional
deformation
optical flow
Application number
TW101150514A
Other languages
Chinese (zh)
Inventor
Sammy Lievens
Donny G Tytgat
Maarten Aerts
Erwin Six
Original Assignee
Alcatel Lucent
Application filed by Alcatel Lucent
Publication of TW201342305A


Abstract

A method for morphing a standard 3D model based on 2D image data input comprises the steps of: performing an initial morphing (100) of said standard 3D model using a detection model and a morphing model, thereby obtaining a morphed standard 3D model; determining (200) the optical flow between the 2D image data input and the morphed standard 3D model; and applying (300) the optical flow to said morphed standard 3D model, thereby providing a fine-tuned morphed standard 3D model.

Description

Method and arrangement for 3D model morphing

The present invention relates to a method for morphing a three-dimensional model.

Morphing a model so that it follows a live dynamic scene, or even footage captured with an inexpensive camera, is currently a difficult task. For example, three-dimensional (abbreviated hereafter as 3D) model artists may spend a great deal of time and effort creating highly detailed, lifelike 3D content and 3D animation. Such a workflow, however, cannot meet the demands of next-generation communication systems, in which, for instance, the participants of a meeting are to be visualized in 3D in real time.

Embodiments of the present invention therefore propose a method and an arrangement for image model morphing that can generate a high-quality 3D image model from two-dimensional (abbreviated hereafter as 2D) image scenes, even from lower-quality live captures, while providing a cheap, simple and automated solution.

An object of the present invention is achieved by embodiments providing a method for morphing a standard 3D model based on 2D image data input, the method comprising the steps of:
- performing an initial morphing of said standard 3D model using a detection model and a morphing model, thereby obtaining a morphed standard 3D model;
- determining the optical flow between the 2D image data input and the morphed standard 3D model;
- applying the optical flow to said morphed standard 3D model, thereby providing a fine-tuned morphed standard 3D model.
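The three steps above can be sketched in a minimal, illustrative form. All names below are hypothetical, and the point-list representation is a simplification: a real system uses an AAM detector and dense optical flow rather than index-matched lists of 2D points.

```python
# Toy sketch of the three-step method (hypothetical helper names).

def initial_morph(model, detected, alpha=0.5):
    """Step 1: coarse model-based morph toward detected feature points."""
    return [(mx + alpha * (dx - mx), my + alpha * (dy - my))
            for (mx, my), (dx, dy) in zip(model, detected)]

def optical_flow(target, source):
    """Step 2: per-point displacement from the morphed model to the input."""
    return [(tx - sx, ty - sy) for (tx, ty), (sx, sy) in zip(target, source)]

def apply_flow(points, flow):
    """Step 3: fine-tune the morphed model by applying the flow."""
    return [(px + fx, py + fy) for (px, py), (fx, fy) in zip(points, flow)]

def morph_pipeline(model, detected):
    coarse = initial_morph(model, detected)   # initial morphing (100)
    flow = optical_flow(detected, coarse)     # flow determination (200)
    return apply_flow(coarse, flow)           # flow application (300)
```

In this toy version the flow simply closes the residual gap that the coarse, model-based morph left open, which mirrors the refinement role the flow step plays in the claimed method.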

In this way the traditional detection-based morphing is reinforced by an optical-flow morphing, yielding a more realistic model while still allowing real-time operation.

In one embodiment, the optical flow between the 2D image data input and the morphed standard 3D model is determined using a previous fine-tuned morphed standard 3D model that was itself determined from a previous 2D image frame.

In a variant embodiment, the step of determining the optical flow between the 2D image data input and the morphed standard 3D model comprises:
- determining a first optical flow between the 2D projection of the morphed standard 3D model and the 2D projection of the previous fine-tuned morphed standard 3D model;
- determining a second optical flow between the actual 2D image frame and the 2D projection of the previous fine-tuned morphed standard 3D model;
- combining the first and second optical flows to obtain a third optical flow between the actual 2D image frame and the 2D projection of the morphed standard 3D model;
- adapting the third optical flow according to depth information obtained during the 2D projection of the morphed standard 3D model, thereby obtaining the optical flow between the 2D image input and the morphed standard 3D model.

This yields a high-quality and time-efficient method.

In another embodiment, the morphing model is adapted using the optical flow between the 2D image data input and the morphed standard 3D model. This further improves the quality of the generated model and its correspondence with the input video object.

In another embodiment, the detection model used in the initial morphing step is adapted according to optical flow information determined between the current 2D image frame and a previous 2D image frame. This allows the standard 3D model to be shaped and morphed faster and more realistically toward the input 2D image.

In another variant embodiment, the step of applying the optical flow comprises an energy minimization procedure.

This further enhances the quality of the resulting fine-tuned morphed model.

The invention also relates to embodiments of an arrangement for performing the above method, to image or video processing devices comprising such an arrangement, and to a computer program product comprising software which, when executed on a data processing apparatus, is adapted to perform the method set out above or in the claims.

It is to be noticed that the term "coupled", as used in the claims, should not be interpreted as being restricted to direct connections only. Thus the scope of the expression "a device A coupled to a device B" should not be limited to the output of device A being directly connected to an input of device B. It means that there exists a path between an output of device A and an input of device B, which may be a path including other devices or means.

It is also to be noticed that the term "comprising", as used in the claims, should not be interpreted as being restricted to the means listed thereafter. Thus the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that, with respect to the present invention, the relevant components of the device are A and B.

As mentioned above, throughout this description two-dimensional is abbreviated as 2D and three-dimensional as 3D.

Those skilled in the art will appreciate that any block diagram herein represents a conceptual view of exemplary circuitry embodying the principles of the invention. Similarly, any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer-readable media and executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

Figure 1 shows a high-level scheme of a first embodiment of an arrangement, and of the corresponding method, for generating a high-quality real-time 3D model from input 2D video. This embodiment operates on the successive frames of a video sequence; the steps in Figure 1 are those performed for one particular frame, e.g. the 2D video frame at time T.

A first operation, in module 100, morphs an available standard 3D model, which was selected beforehand or pre-stored in memory. Module 100 morphs this standard 3D model according to the input 2D video frame at time T. A detailed embodiment of this morphing procedure will be described with reference to Figure 2. The output of module 100 is the morphed standard 3D model at time T.

Partly in parallel with morphing step 100, the optical flow from the 2D video frame at time T to the morphed standard 3D model at time T is determined. This is performed in module 200, which takes as inputs the 2D video frame at time T, the morphed standard 3D model provided by module 100, and the output of the arrangement as determined at a previous time step. That previously determined output is the fine-tuned morphed standard 3D model computed at the previous time step (time T-1 in the embodiment of Figure 1), and is provided to module 200 via a feedback connection from the arrangement's output. In Figure 1 the feedback loop contains a delay element D enabling provision of this earlier output. Of course, many other implementations based on simple memory storage can be used instead, thereby avoiding the need for a dedicated delay element. Note also that outputs determined at earlier times than the previous video frame at T-1 may be used; the delay then has to be adapted accordingly.
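The delay element D can be sketched as a simple buffer of previous outputs. The class below is a hypothetical helper; as the text notes, plain memory storage serves the same purpose.

```python
from collections import deque

class DelayLine:
    """Delay element D: returns the output produced k frames earlier."""

    def __init__(self, k=1, initial=None):
        # Pre-fill with an initial value so the first frames have an output.
        self.buf = deque([initial] * k, maxlen=k)

    def step(self, new_output):
        oldest = self.buf[0]          # fine-tuned model from time T-k
        self.buf.append(new_output)   # store the model for time T
        return oldest
```

With k=1 this reproduces the one-frame feedback of Figure 1; a larger k corresponds to feeding back an output from further in the past, with the delay adapted accordingly.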

The embodiment of Figure 1 further comprises another module 300 for applying the optical flow determined by module 200 to the morphed standard 3D model provided by module 100. The basic idea of the invention is to combine the model-based approach of module 100, which uses a rather simple 3D model, with the optical-flow-based morphing details of module 300, the optical flow itself being derived in module 200. In practice, when used for face modeling, the model morphing obtained from module 100 usually yields a somewhat artificial-looking face, which is then improved and corrected by the optical-flow morphing of module 300, based on the flow determined by module 200.

As mentioned above, the resulting fine-tuned morphed 3D standard model is used in the feedback loop for determining the optical flow.

A more detailed embodiment will now be described for the modeling of facial features. A person skilled in the art can apply the teachings of this description to the morphing of other deformable objects appearing in video, such as animals.

Figure 2 shows a more detailed embodiment of the standard 3D morphing block 100 of Figure 1. This module comprises a detection module such as an Active Appearance Model (AAM) detection module. However, embodiments exist that use other detection modules, such as an Active Shape Model (ASM) detection module.

Detection module 110 enables the detection of facial features in the video frame at time T according to a detection model, such as an AAM detection model. AAM models and AAM detection are known techniques in computer vision for detecting feature points of non-rigid objects. When 3D video is input to the system, the AAM detection can also be extended to 3D positions, and the AAM detection module can detect feature points on objects other than faces. The class of objects to be detected relates to the training stage of the AAM detection module; this training can be done offline, or performed in an earlier training procedure. In the described embodiment, AAM detection module 110 is trained to detect feature points of a human face in the 2D video frame, such as the nose, mouth, eyes, eyelashes and cheeks of this non-rigid object. The AAM detection model used in module 110 may be selected from a set of models, or may be pre-programmed or trained offline so as to be generic for all human faces.

When morphing an animal model, e.g. a cat, the training procedure has to be adapted to detect the relevant feature points for the shape and possible expressions of a cat. Such techniques are well known to a person skilled in the art.

In the face modeling example, AAM detection block 110 typically comprises detecting the coarse position of the human face within the video frame and, simultaneously or subsequently, detecting the facial expressions related to human emotions. The relative or absolute position of the complete face within the live video frame is denoted as "position" information in Figure 1. This position information is used to move or rotate the 3D standard model of the face, represented in module 120 as the standard 3D model. In addition, module 110 can detect a limited number of facial expressions by means of coarse indications of the positions of e.g. the nose, eyelashes and mouth. This output, denoted as "features" in Figure 1, is used in morphing module 130 to adapt the corresponding facial features of the position-adapted standard model output by module 120.

The 3D standard model input to module 120 can typically also be found in or selected from a standard database. Such a database may contain standard 3D models of human faces as well as of animals such as cats and dogs. The standard 3D model is translated, rotated and/or scaled according to the position information from module 110.

In the face modeling example, the position adaptation step makes the 3D standard model reflect the same pose as the face in the live video input. To further adapt the 3D model so that it shows the correct facial expression of the 2D frame, the features detected by module 110 are applied in step 130 to the partially adapted 3D standard model. This morphing module 130 further uses a dedicated adaptation model, denoted "morphing model" in Figure 2, which may comprise instructions for adapting the facial features of the standard 3D model in response to the information provided by the detection module. When an AAM detection model is used, the morphing model is usually an AAM morphing model. The same considerations apply to the previously mentioned ASM variant.

The result is the morphed standard 3D model provided by module 130.

An exemplary implementation of this model-based morphing may comprise repositioning the vertices of the standard 3D model that relate to the facial features, according to the facial feature detection results on the live video input. The 3D content between the facial features can be filled in by simple linear interpolation or, for more complex higher-order AAM morphing models that also account for the elasticity of the face, by higher-order interpolation or even other, more complex functions. Both how the vertices are displaced and how the data between the facial features is filled in are encoded in the morphing model.
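As an illustration of the interpolation idea, a toy morphing rule might blend the detected feature displacements over the in-between vertices with inverse-distance weights. This is a hypothetical stand-in for the linear or higher-order interpolation encoded in a real morphing model.

```python
import math

def interpolated_displacement(vertex, feature_points, feature_disp):
    """Blend the displacements of the detected feature points by
    inverse-distance weights to move an in-between vertex (2D toy)."""
    weights = [1.0 / (math.hypot(vertex[0] - fx, vertex[1] - fy) + 1e-9)
               for fx, fy in feature_points]
    total = sum(weights)
    dx = sum(w * vx for w, (vx, _) in zip(weights, feature_disp)) / total
    dy = sum(w * vy for w, (_, vy) in zip(weights, feature_disp)) / total
    return dx, dy
```

A vertex halfway between a feature that moved and one that stayed put receives half the displacement; nearer features dominate, which crudely mimics the local influence of a feature on its surrounding facial region.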

It is to be remarked that, no matter how good the quality of the available detection and morphing models (AAM), the result may still look artificial. The generally available detection models are only used to detect the positions of the facial features within the live video input, which are then used to replace the facial features of the 3D position-adapted model according to those positions. The regions between the facial features of the 3D standard model are then interpolated by means of an AAM morphing model. The latter, however, has no or only limited knowledge of how the displacement of each facial feature affects the neighboring facial regions. Generic information about facial expressions and their influence on facial regions, possibly related to elasticity, can be put into the morphing model, but artificial-looking morphing results may still occur, simply because every person is different and no single generic model can cover all facial expressions.

Similar considerations remain valid when morphing other deformable objects detected in the video, such as animals, based on a 3D standard model.

To further improve the morphed standard 3D model, the artificial-looking morphed model provided by module 100 is refined by the optical-flow-based morphing of step 300, as explained earlier with reference to Figure 1.

Before the optical-flow morphing can be performed, the optical flow itself has to be determined. Optical flow, as defined here, is the pattern or displacement of the apparent motion of objects, surfaces and edges in a visual scene from one frame to another, or from a frame to a 2D or 3D model. In the embodiments described here, the methods for determining the optical flow focus on calculating, at the pixel level, the motion between images captured at different times, e.g. T and T-1, or alternatively on calculating the displacement between a pixel at time T and the corresponding voxel in the 3D model, or vice versa.
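Pixel-level flow estimation between two frames can be illustrated with a brute-force matching sketch. This toy version matches single pixel values within a small search radius; real systems use far more robust dense estimators (e.g. variational or pyramidal methods), so everything here is illustrative.

```python
def block_flow(prev, curr, radius=1):
    """For each pixel of `prev`, find the displacement (dx, dy) within
    `radius` whose value in `curr` matches best (toy optical flow)."""
    h, w = len(prev), len(prev[0])
    flow = [[(0, 0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        cost = abs(curr[ny][nx] - prev[y][x])
                        if best_cost is None or cost < best_cost:
                            best_cost, best_d = cost, (dx, dy)
            flow[y][x] = best_d
    return flow
```

For a bright pixel that shifts one position to the right between the two frames, the recovered displacement at its original location is (1, 0).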

Since module 300 has to apply the optical flow to the morphed standard 3D model according to the 2D video frame, the optical flow must be calculated from the 2D video frame to the 3D model. In general, optical flow is computed from one 2D video frame to another, so some additional steps have to be added to determine the flow from the 2D video frame to the 3D morphed model. These additional steps can involve the use of a reference 3D input, namely the previously determined fine-tuned 3D model, e.g. the one determined at time T-1. This information is provided to module 200 from the output of the arrangement.

Figure 3 shows a detailed embodiment of module 200. In this embodiment, a first module 250 is adapted to determine a first optical flow between the 2D projection of the morphed standard 3D model and the 2D projection of the previous fine-tuned morphed 3D standard model. A second module 290 is adapted to determine a second optical flow between the actual 2D video frame at time T and the 2D projection of the previous fine-tuned morphed 3D standard model. A combining module 270 calculates a third optical flow from the first and second optical flows; this third optical flow runs between the actual 2D video frame at time T and the 2D projection of the morphed standard 3D model at time T. Module 280 then further adapts the third optical flow to obtain the desired optical flow between the 2D image data input at time T and the morphed standard 3D model at time T. Further details are described below.

To determine the first optical flow, between the 2D projection of the morphed standard 3D model and the 2D projection of the previous fine-tuned morphed 3D standard model, these 2D projections are computed from the respective 3D models provided to module 200. Module 230 is thus adapted to perform a 2D mapping or projection of the morphed standard 3D model provided by module 100, while module 240 is adapted to perform a similar 2D projection of the previous fine-tuned morphed 3D standard model, which in the embodiment of Figure 3 is the one determined at time T-1. The projection parameters used in these projections are preferably those of the video camera that recorded the 2D video frames; they relate to the calibration parameters of that camera.
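Such a 2D projection can be sketched as a pinhole projection that also records per-vertex depth, since the depth buffer is reused later for the back-projection. The focal length f and principal point (cx, cy) stand in for hypothetical camera calibration parameters.

```python
def project(vertices, f=1.0, cx=0.0, cy=0.0):
    """Pinhole projection of 3D vertices (x, y, z) to 2D, keeping depth."""
    points_2d = [(f * x / z + cx, f * y / z + cy) for x, y, z in vertices]
    depth = [z for _, _, z in vertices]
    return points_2d, depth
```

A vertex at (2, 4, 2) with unit focal length projects to (1, 2) with a stored depth of 2; that depth is exactly what the later back-projection step needs to undo the division by z.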

In the embodiment of Figure 3, module 290 comprises three sub-modules. In sub-module 220 of module 290, the optical flow between the video frame at the current time T and a previous video frame, at time T-1 in this example, is determined. The time of the previous 2D video frame is the same as that of the previous fine-tuned morphed 3D standard model.

To this end, delay element 210 of module 290 introduces the same delay as used in the feedback loop of the complete arrangement of Figure 1. Of course, other embodiments can provide the previous values of the 2D video by simply storing them in an internal memory, avoiding the need for an additional delay block.

The optical flow computed in module 220 between the successive video frames T and T-1 is then used in module 260 to determine the flow from the 2D projection of the fine-tuned 3D output at time T-1 to the 2D video frame at time T. The projection itself is performed in module 240. The projection parameters used are those of the 2D camera with which the 2D video frames were recorded.

In step 260, the determination of the second optical flow takes into account that the standard model and the live video input may represent different persons, which requires registration. In some embodiments, module 260 can comprise two steps: a first face registration step, in which the face shape of the live video input at the previous time T-1 is mapped onto the face shape of the 2D projection of the previously fine-tuned morphed 3D content at time T-1. This registration step can again make use of an AAM detector. Next, the optical flow computed on the live video input at time T is registered, by interpolation, to the face shape of the 2D-projected 3D content at time T-1. These embodiments are further illustrated in Figures 7 and 8.

The first optical flow, determined by module 250 between the 2D projection of the morphed standard model at time T and that of the previous fine-tuned standard model at time T-1, is then combined with the second optical flow determined by module 260, so as to generate the third optical flow, from the 2D video at time T to the 2D projection of the morphed standard model at time T. This is the desired 2D optical flow information. Since this combination removes an intermediate common element, namely the 2D projection of the previously determined fine-tuned model, module 270 is denoted with a minus sign "-".
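Under a small-displacement assumption, this combination can be sketched as a per-pixel subtraction: both input flows end at the projection of the previous fine-tuned model, so subtracting them cancels that shared endpoint. This is a hypothetical simplification; an exact composition would warp one flow field through the other rather than subtract aligned vectors.

```python
def combine_flows(second_flow, first_flow):
    """Third flow (frame T -> morphed-model projection) as the per-pixel
    difference between the second flow (frame T -> previous projection)
    and the first flow (morphed projection -> previous projection)."""
    return [(u2 - u1, v2 - v1)
            for (u2, v2), (u1, v1) in zip(second_flow, first_flow)]
```

If a pixel moves 3 units toward the previous projection while the morphed projection moves 1 unit toward it, the residual flow of 2 units maps the frame directly onto the morphed projection, which is why the module is drawn with a minus sign.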

However, since the third optical flow is still a flow between two 2D images, an additional step 280 is needed to transfer this flow from the 2D video frame at time T to the 3D content of the morphed standard 3D model at time T. This may involve a back-projection, being the inverse of the procedure used for the 2D projection and thus using the same projection parameters. To this end, the depth generated during the 2D projection is used to recompute the 3D vertices from their 2D counterparts.
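The back-projection of step 280 can be sketched as the inverse of a pinhole projection, reusing the depth buffer produced during the 2D projection. As before, f, cx and cy denote hypothetical camera calibration parameters.

```python
def back_project(points_2d, depth, f=1.0, cx=0.0, cy=0.0):
    """Lift 2D points back to 3D vertices using the stored depth values
    (inverse of the pinhole projection, with the same parameters)."""
    return [((u - cx) * z / f, (v - cy) * z / f, z)
            for (u, v), z in zip(points_2d, depth)]
```

Applying this to a projected point together with its stored depth recovers the original 3D vertex, which is what allows the 2D flow vectors to be turned into 3D vertex displacements.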

It is to be noted that, instead of using the video frames at consecutive times T and T-1 together with the consecutively determined fine-tuned morphed 3D models, the time difference between the new video frame and the previous one may be larger than one frame delay. In that case a correspondingly earlier determined output morphed model is used, such that the time difference between the actual frame and the previous frame used in module 200 corresponds to the time difference between the new output to be determined and the previous output used for determining the optical flow. In an embodiment this can be realized with delay elements similar to delay element D of the feedback loop of Figure 1 and module 210 of Figure 3.

Module 300 of Figure 1 then applies the computed optical flow to the morphed standard 3D model, thereby generating the fine-tuned morphed 3D standard model.

As shown in Figure 4, in a first variant embodiment of the arrangement an additional feedback loop is present at the output of module 200, which computes the optical flow between the 2D video at time T and the morphed standard 3D model at time T; this flow is fed to an adapted module 1000 that performs the initial morphing of the standard 3D model. Details of this adapted module 1000 are shown in Figure 5. Compared to Figure 2, module 1000 receives an additional input signal, denoted "optical flow", provided by the output of optical flow computation module 200; this information is used to adapt the morphing model used in morphing module 130. An additional module 140 within morphing module 1000 updates the previous version of the morphing model based on the optical flow information. The embodiment of Figure 5 again uses a delay element, but in other embodiments simply storing the previous values may be used instead.

Updating the morphing model with the optical flow feedback is useful because the standard generic morphing model does not know how the displacement of each facial feature affects the neighboring facial regions: elasticity is not, or not sufficiently, annotated in the basic morphing model. Providing the optical flow information allows a more complex, higher-order morphing model to be learned. The idea here is that a perfect morphing model could morph the 3D standard model such that it exactly mimics the live video input; in that case the "optical flow combination" block 270 of module 200 would have no remaining optical flow to process and would become superfluous.

In another variant embodiment, shown in Figure 6, there is a further feedback loop for feeding an internal signal of optical flow computation module 200 back to standard 3D morphing module 100. Figure 7 shows a detailed embodiment: the feedback actually consists of the optical flow between the 2D video frames at times T and T-1, which is provided to an additional AAM (or other) detection model adaptation module. Here it can be assumed that the optical flow between video frames T-1 and T of the live video input maps the facial features detected in video frame T-1 onto the facial features detected in video frame T. Since not all facial expressions are found by the detection module, the detection of facial features in the live video input may sometimes fail. This can be solved by adapting the detection model so that such facial expressions are incorporated, can be detected when they occur in the future, and can be applied to the 3D standard model.
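The assumed mapping of features from frame T-1 to frame T along the inter-frame flow can be sketched as follows. The helper name is hypothetical; the adapted detection model would then incorporate the propagated feature positions.

```python
def propagate_features(features_prev, flow_at_features):
    """Carry feature points detected in frame T-1 over to frame T by
    following the optical flow sampled at those feature points."""
    return [(x + dx, y + dy)
            for (x, y), (dx, dy) in zip(features_prev, flow_at_features)]
```

This gives a predicted feature position for frame T even when the detector itself fails on the new expression, which is what the detection model adaptation exploits.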

The embodiment shown in Figure 8 combines all the feedback loops mentioned so far.

Fig. 9 shows another high-level embodiment, in which a more probabilistic approach is used to combine model-based and optical-flow-based morphing. The model-based module 100 provides accurate displacements for a limited, sparse set of feature points of the 3D model, whereas the flow-based module provides less accurate 2D displacement estimates, but for a much denser set of points on the model. By combining observations of different kinds and different accuracies in a probabilistic way, an even more accurate result for the fine-tuned morphed 3D standard model can be obtained. This probabilistic approach is implemented by the energy-minimization module 400 of Fig. 9.
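One simple way to combine a sparse but accurate model-based observation with a dense but noisier flow-based observation of the same displacement is inverse-variance weighting. The sketch below is an illustrative assumption — the patent does not prescribe this particular combination rule, and the variance figures are invented for the example.

```python
def fuse(model_val, model_var, flow_val, flow_var):
    """Inverse-variance (precision-weighted) fusion of two estimates of the
    same displacement; the fused variance is smaller than either input's."""
    w_m, w_f = 1.0 / model_var, 1.0 / flow_var
    fused = (w_m * model_val + w_f * flow_val) / (w_m + w_f)
    var = 1.0 / (w_m + w_f)
    return fused, var

# accurate model-based point (var 0.01) vs. noisy flow estimate (var 0.25):
# the fused value stays close to the accurate observer
val, var = fuse(1.00, 0.01, 1.30, 0.25)
print(round(val, 3), round(var, 4))  # -> 1.012 0.0096
```

The weighting mirrors the text above: the distance (penalty) attached to each observer scales with that observer's accuracy, so the precise but sparse model points dominate wherever they exist, while the dense flow fills in everywhere else.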

In the case of face modelling, this probabilistic approach intuitively fills the unobserved gaps with a basic elastic model of the face. A face can only move in certain ways, and those movements are constrained. For example, neighbouring points on the model move in similar ways. Likewise, symmetric points of the face are correlated: if the left side of the face is seen smiling, the right side is very likely smiling too, even though it is not observed.

Mathematically, this can be cast as an energy-minimization problem comprising two data terms and one smoothness term:

E = S + D_FLOW + D_MODEL

D_FLOW is a distance measure between a proposed candidate for the final fine-tuned morphed 3D model and the optical flow of the 2D input image. The closer the proposed candidate lies to the probability distribution over the observed dense optical-flow map, the smaller this distance. The distance measure is weighted inversely with the accuracy of the optical-flow estimate.

D_MODEL is a similar distance measure, but between the candidate and the observed AAM-based morphed 3D model. D_MODEL is likewise weighted inversely with the accuracy of the AAM algorithm.

S penalizes movements that a face is unlikely to make. It contains two kinds of sub-terms: absolute and relative penalties. The absolute penalty is proportional to the implausibility of a point on the face moving in the proposed direction. The relative penalty has the same meaning, but considers the displacements between neighbouring points (or other related points, such as symmetric ones).

The energy-minimization problem can be solved with numerical methods. Examples include gradient descent, stochastic methods (simulated annealing, genetic algorithms, random walks), as well as graph cuts, belief propagation, Kalman filters, and so on. The goal is always the same: find the proposed morphed 3D model that minimizes the energy in the equation above.
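As a toy instance of such a numerical solution, the sketch below minimizes E = S + D_FLOW + D_MODEL by plain gradient descent for one-dimensional point displacements: the flow term is dense but loosely weighted, the model term is sparse but heavily weighted, and the smoothness term S couples neighbouring points. All weights and observations are illustrative assumptions, not values from the patent.

```python
import numpy as np

def minimize_energy(flow_obs, model_obs, w_flow, w_model, w_smooth,
                    steps=2000, lr=0.01):
    """Gradient descent on E = S + D_FLOW + D_MODEL for 1-D point
    displacements; NaN in an observation means 'not observed'."""
    x = np.zeros_like(flow_obs)
    f_mask = ~np.isnan(flow_obs)
    m_mask = ~np.isnan(model_obs)
    for _ in range(steps):
        g = np.zeros_like(x)
        # data terms: quadratic distance to each observation, weighted by
        # the confidence of that observer (flow: dense but inaccurate;
        # model: sparse but accurate)
        g[f_mask] += 2 * w_flow * (x[f_mask] - flow_obs[f_mask])
        g[m_mask] += 2 * w_model * (x[m_mask] - model_obs[m_mask])
        # smoothness term S: neighbouring points should move alike
        d = np.diff(x)
        g[:-1] += -2 * w_smooth * d
        g[1:] += 2 * w_smooth * d
        x -= lr * g
    return x

# dense but noisy flow observations; sparse but accurate model observations
flow_obs = np.array([1.1, 0.9, 1.2, 1.0, 0.8])
model_obs = np.array([np.nan, 1.0, np.nan, np.nan, 1.0])
x = minimize_energy(flow_obs, model_obs, w_flow=1.0, w_model=10.0, w_smooth=5.0)
print(np.round(x, 2))  # all displacements end up close to 1.0, smoothed
```

Any of the other solvers listed above (simulated annealing, belief propagation, a Kalman filter over time) would target the same minimizer; gradient descent is merely the shortest to write down.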

Fig. 10 shows a more detailed embodiment of Fig. 9.

Fig. 11 shows a second probability-related embodiment. In this embodiment, the calibrated optical flow is accumulated over time, and the accumulated calibrated optical flow and the AAM detection/morphing results are combined in an energy-minimization problem that allows simple yet lifelike morphing of 3D database content. Drift that the accumulated optical flow may introduce can be corrected by adding in the AAM morphing result, while artificial-looking morphing results can be removed by adding in the optical-flow morphing result.
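The drift correction described above can be illustrated by accumulating per-frame flow and blending the running total toward the drift-free (but coarse) model-based estimate at each frame. The blend weight alpha and the 10% flow over-estimate below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def track_with_drift_correction(per_frame_flows, model_estimates, alpha=0.2):
    """Accumulate per-frame optical flow; at each frame, blend the running
    total toward the model-based estimate to cancel accumulated drift."""
    accumulated = np.zeros_like(per_frame_flows[0])
    history = []
    for flow, model in zip(per_frame_flows, model_estimates):
        accumulated = accumulated + flow                          # integrate motion
        accumulated = (1 - alpha) * accumulated + alpha * model   # correct drift
        history.append(accumulated.copy())
    return history

# each frame the true motion is +1.0 px, but the flow over-estimates by 10%;
# the model-based estimate knows the true cumulative displacement
flows = [np.array([1.1])] * 20
models = [np.array([float(t + 1)]) for t in range(20)]
track = track_with_drift_correction(flows, models)
print(float(track[-1][0]))  # stays close to the true value 20.0
```

Without the blend, the raw accumulation would end at 22.0 after 20 frames (10% drift); the periodic pull toward the model keeps the error bounded, which is exactly the role the AAM result plays against the accumulated flow above.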

Note that the described embodiments are not limited to the morphing of human faces. With the model-based approach of the invention, a model of any non-rigid object can be built and used for morphing. Moreover, embodiments of the invention are not restricted to AAM models; the initial morphing module 100 may also use other models, such as the Active Shape Model (ASM).

Although the principles of the invention have been described with reference to specific apparatus, it should be noted that this description is given by way of example only and does not limit the scope of the invention, which is defined by the claims. Any element in the claims expressed as a means for performing a specified function is intended to encompass any way of performing that function, including, for example, a combination of electrical or mechanical elements, or software in any form (firmware, microcode, and the like) combined with appropriate circuitry for executing that software, as well as mechanical elements coupled to software-controlled circuitry. The invention is defined by the claims; the functions provided by the various recited means are combined in the manner called for by the claims, and unless expressly stated otherwise, no particular physical structure is of more than minor importance to the claimed invention. The applicant therefore regards any means capable of providing those functions as equivalent to those shown herein.

100‧‧‧First operation module

110‧‧‧Detection module

120‧‧‧Module

130‧‧‧Morphing module

140‧‧‧Module

200‧‧‧Module

210‧‧‧Delay element

220‧‧‧Module

230‧‧‧Module

240‧‧‧Module

250‧‧‧First module

260‧‧‧Module

270‧‧‧Combination module

280‧‧‧Step

290‧‧‧Second module

300‧‧‧Module

400‧‧‧Energy minimization module

1000‧‧‧Adaptation module

2000‧‧‧Module

Those skilled in the art will better appreciate the various advantages of the invention from the following description and the appended claims, taken together with the accompanying drawings, in which: Fig. 1 shows a high-level embodiment of the method of the invention; Figs. 2 and 3 show more detailed embodiments of some modules of the embodiment of Fig. 1; Fig. 4 shows a high-level illustration of another embodiment of the method of the invention; Figs. 5 and 6 show further details of some modules of the embodiment of Fig. 4; Figs. 7 and 8 show two further embodiments; Fig. 9 shows another high-level embodiment of the method of the invention; and Figs. 10 and 11 show more detailed alternative embodiments.

100‧‧‧First operation module

200‧‧‧Module

300‧‧‧Module

Claims (13)

1. A method for morphing a standard three-dimensional model based on a two-dimensional image data input, the method comprising the steps of:
- performing an initial morphing (100) of said standard three-dimensional model using a detection model and a morphing model, thereby obtaining a morphed standard three-dimensional model;
- determining (200) the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model;
- applying (300) said optical flow to said morphed standard three-dimensional model, thereby providing a fine-tuned morphed standard three-dimensional model.

2. The method of claim 1, wherein the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model is determined with respect to a previously fine-tuned morphed standard three-dimensional model determined from a previous two-dimensional image frame.

3. The method of claim 2, wherein the step (200) of determining the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model comprises:
- determining (250) a first optical flow between a two-dimensional projection of said morphed standard three-dimensional model and a two-dimensional projection of said previously fine-tuned morphed standard three-dimensional model;
- determining (290) a second optical flow between the actual two-dimensional image frame and the two-dimensional projection of said previously fine-tuned morphed standard three-dimensional model;
- combining (270) said first and second optical flows to obtain a third optical flow between the actual two-dimensional image frame and the two-dimensional projection of said morphed standard three-dimensional model;
- adapting (280) said third optical flow according to depth information obtained during the two-dimensional projection of said morphed standard three-dimensional model, thereby obtaining the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model.

4. The method of any of claims 1 to 3, further comprising the step of adapting (140) the morphing model used in the initial morphing step (1000) according to the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model.

5. The method of any of claims 1 to 4, further comprising the step of adapting the detection model used in the initial morphing step according to optical-flow information determined between the two-dimensional image frame and a previous two-dimensional image frame.

6. The method of any of claims 1 to 3, wherein the step of applying said optical flow comprises an energy-minimization procedure (400).

7. An arrangement for morphing a standard three-dimensional model based on a two-dimensional image data input, the arrangement being adapted to:
- perform an initial morphing (100) of said standard three-dimensional model using a detection model and a morphing model, thereby obtaining a morphed standard three-dimensional model;
- determine (200) the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model;
- apply (300) said optical flow to said morphed standard three-dimensional model, thereby providing a fine-tuned morphed standard three-dimensional model at an output of the arrangement.

8. The arrangement of claim 7, further adapted to determine the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model with respect to a previously fine-tuned morphed standard three-dimensional model determined from a previous two-dimensional image frame.

9. The arrangement of claim 8, further adapted to determine the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model by:
- determining (250) a first optical flow between a two-dimensional projection of said morphed standard three-dimensional model and a two-dimensional projection of said previously fine-tuned morphed standard three-dimensional model;
- determining (290) a second optical flow between the actual two-dimensional frame and the two-dimensional projection of said previously fine-tuned morphed standard three-dimensional model;
- combining (270) said first and second optical flows to obtain a third optical flow between the actual two-dimensional frame and the two-dimensional projection of said morphed standard three-dimensional model;
- adapting (280) said third optical flow according to depth information obtained during the two-dimensional projection of said morphed standard three-dimensional model, thereby obtaining the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model.

10. The arrangement of any of claims 7 to 9, further enabled to adapt (140) the morphing model used in the initial morphing step (1000) according to the optical flow between said two-dimensional image data input and said morphed standard three-dimensional model.

11. The arrangement of any of claims 7 to 10, further enabled to adapt the detection model used in the initial morphing step according to optical-flow information determined between the two-dimensional image frame and a previous two-dimensional image frame.

12. An image-processing apparatus comprising an arrangement as claimed in any of claims 7 to 11.

13. A computer program product comprising software which, when implemented on a data-processing device, is adapted to perform the method of any of claims 1 to 6.
TW101150514A 2012-01-12 2012-12-27 Method and arrangement for 3D model morphing TW201342305A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP12030504 2012-01-12

Publications (1)

Publication Number Publication Date
TW201342305A true TW201342305A (en) 2013-10-16

Family

ID=49627539

Family Applications (2)

Application Number Title Priority Date Filing Date
TW101142622A TW201336333A (en) 2012-01-12 2012-11-15 Method for selecting a transmission mode, mobile station, network node, and communication network thereof
TW101150514A TW201342305A (en) 2012-01-12 2012-12-27 Method and arrangement for 3D model morphing


Country Status (1)

Country Link
TW (2) TW201336333A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI650711B (en) * 2018-03-05 2019-02-11 國立中央大學 Action recognition methods and systems thereof


Also Published As

Publication number Publication date
TW201336333A (en) 2013-09-01
