TW201320005A - Method and arrangement for 3-dimensional image model adaptation - Google Patents

Method and arrangement for 3-dimensional image model adaptation

Info

Publication number
TW201320005A
Authority
TW
Taiwan
Prior art keywords
model
projection
state
adjusted
image
Prior art date
Application number
TW101121335A
Other languages
Chinese (zh)
Inventor
Donny G Tytgat
Sammy Lievens
Maarten Aerts
Original Assignee
Alcatel Lucent
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP11305768.1A external-priority patent/EP2538388B1/en
Application filed by Alcatel Lucent filed Critical Alcatel Lucent
Publication of TW201320005A publication Critical patent/TW201320005A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20121Active appearance model [AAM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

1. Method for adapting a 3D model (m) of an object, said method comprising the steps of: performing at least one projection of said 3D model to obtain at least one 2D image model projection (p1) with associated depth information (d1); performing at least one state extraction operation on said at least one 2D image model projection (p1), thereby obtaining at least one state (s1); adapting said at least one 2D image model projection (p1) and said associated depth information (d1) in accordance with said at least one state (s1) and with a target state (s), thereby obtaining at least one adapted 2D image model (p1') and an associated adapted depth (d1'); and back-projecting said at least one adapted 2D image model (p1') to 3D, based on said associated adapted depth (d1'), for thereby obtaining an adapted 3D model (m').

Description

Method and arrangement for three-dimensional image model adaptation

The present invention relates to a method for three-dimensional (hereinafter abbreviated 3D) image model adaptation.

3D model adaptation is usually performed manually, which is often undesirable. Another way of adapting a 3D model uses state adaptation, which concerns adapting the 3D model so as to conform to a certain state. Such states affect the 3D positions of the shape and/or the appearance, such as the texture, of certain parts or features of the model. A main remaining problem of prior-art 3D model state adaptation is that the number of features to be adapted in 3D is usually quite large, so that manual intervention is often still required because of insufficient computing resources. Moreover, the present state of the art is limited to the use of rigged models, which is a serious limitation when used in dynamic systems, in which the model can be learned such that its shape may also change during the learning process.

It is therefore an object of embodiments of the present invention to provide a method and arrangement for 3D image model adaptation that can be used fully automatically and that can work with dynamically adaptable models.

According to embodiments of the invention this object is achieved by a method for adapting a 3D model of an object, the method comprising the steps of: performing at least one projection of the 3D model to obtain at least one 2D image model projection (p1) with associated depth information (d1); performing at least one state extraction operation on the at least one 2D image model projection (p1), thereby obtaining at least one state (s1); adapting the at least one 2D image model projection (p1) and the associated depth information in accordance with the at least one state (s1) and a target state (s), thereby obtaining at least one adapted 2D image model (p1') and an associated adapted depth (d1'); and back-projecting the at least one adapted 2D image model to 3D, based on the associated adapted depth (d1'), thereby obtaining an adapted 3D model (m').

By adapting the state of at least one 2D projection of the 3D image model and its associated depth information, fewer computing resources are used, thereby removing the need for manual intervention in the process. The back-projection to 3D ensures that the 3D model itself is adapted as realistically as possible.

In an embodiment the adapted 3D model (m') is further determined based on the initial 3D model (m) information.

In this way a smooth morphing of the adapted model can be obtained.

In another embodiment the target state (s) is determined by externally applied constraints.

These may, for instance, comprise high-level information related to features such as the shape of the nose or the colour of the eyes.

In yet another embodiment the target state (s) is derived from a state (se) of an external image input (IV).

When this state (se) of the external image input (IV) is combined with the at least one state (s1) to obtain the target state, this allows a 3D model to be smoothly adapted to, for instance, the changing features of an object in a live video, or to be adapted so as to resemble such an object appearing in a still image.

In a preferred variant the external image input (IV) comprises a 2D image input, and one of the at least one 2D projections of the 3D model is performed in accordance with a virtual camera derived from the external image input (IV).

This is suitable for obtaining an optimal relationship between the external image input and the 3D model.

In yet another variant the external image input may comprise a 2D-plus-disparity input, meaning that 2D as well as disparity information is provided externally, for instance by a stereo camera. Depth information can then be derived directly from the disparity information using the relation depth x disparity = constant.
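As an illustration of this relation, a minimal sketch in Python is given below; it assumes the constant equals the focal length (in pixels) times the stereo baseline, which is the usual choice for a rectified stereo rig, and the function and variable names are illustrative only, not part of the patent:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a disparity map to a depth map using depth * disparity = constant.

    The constant is taken here as focal_length_px * baseline_m (rectified stereo
    assumption); pixels with near-zero disparity are marked invalid (inf).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > eps
    depth[valid] = (focal_length_px * baseline_m) / disparity[valid]
    return depth
```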

In this way the depth data from that input can be used to update the associated depth.

The present invention also relates to embodiments of an arrangement comprised in an image or video processing device for performing this method, and to a computer program product comprising software which, when executed on a data processing apparatus, is adapted to perform the method steps set out above or in the claims.

Note that the term "coupled", as used in the claims, should not be interpreted as being restricted to direct connections only. Thus the scope of the expression "a device A coupled to a device B" should not be limited to devices or systems in which an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B, which may be a path including other devices or means.

Note that the term "comprising", as used in the claims, should not be interpreted as being restricted to the means listed thereafter. Thus the scope of the expression "a device comprising means A and B" should not be limited to devices consisting only of components A and B. It means that, with respect to the present invention, the only relevant components of the device are A and B.

Throughout this text, two-dimensional is abbreviated as 2D and, as mentioned above, three-dimensional is abbreviated as 3D.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo-code and the like represent various processes which may be substantially represented in computer-readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

Figure 1a shows the steps performed by a first variant of the method for adapting a 3D model m.

In a first step a projection of the 3D model to 2D is performed. The parameters used for this projection are those of the well-known pinhole camera model, as described for instance in chapter 6 of the tutorial handbook "Multiple View Geometry in Computer Vision" by Richard Hartley and Andrew Zisserman (Cambridge University Press, second edition 2003, ISBN 0521 54051 8).

It thus concerns the projection of points in 3D space onto a plane through a central "pinhole". In this model the plane corresponds to the projection plane of a camera, while the pinhole corresponds to the aperture (diaphragm opening) of that camera, usually also denoted the camera center. The result of this projection step is denoted p1, d1, where p1 indicates the 2D projection itself, which may be represented by a 2D matrix of pixel values containing colour information, and where d1 indicates the projection depth map, which may likewise be represented by a 2D matrix of associated depth values. These associated depth values are calculated from the original depth values and the camera position according to some well-known equations, which are also given below.

In alternative embodiments the projection and the depth map may be represented within one large 2D matrix, in which, for each projected pixel, the colour information and the associated depth information are given in the corresponding matrix row and column.

The projection itself is schematically illustrated in figure 2a, which shows a point A having three spatial coordinates xA, yA and zA with respect to an origin O, these coordinates being defined with respect to the three axes x, y, z of a reference coordinate system. A pinhole camera is denoted by its camera center position C, with coordinates xC, yC and zC relative to the same reference origin and reference coordinate system. The projection of point A is made onto a projection screen, denoted S, associated with this camera. The projection of point A through the pinhole C onto this screen is denoted p(A) and has associated coordinates (xPA, yPA). These coordinates are, however, defined with respect to the two-dimensional axes xP and yP defined within the projection plane S.

In order not to overload figure 2a it is assumed there that the camera is not rotated with respect to the three reference axes x, y, z. Well-known formulas also apply to this more general situation, however, and these formulas are used in embodiments of the invention for calculating the projections and the associated depth maps. As schematically shown in figure 2b, the rotations of the camera are denoted θx, θy, θz, indicating the rotations of the camera center around the x, y and z axes respectively; they are only shown for the case where the origin O coincides with the camera center C.

In the most general case C may be translated and rotated with respect to the reference origin O and the reference axes x, y, z.

In embodiments of the invention a projection of a 3D model will thus contain the colour or texture information of the projected 3D points of this model, provided these projected points fall within the outline of the screen area S and are not occluded by another projection of another 3D point of the model. Occlusion indeed occurs almost inherently for 2D projections of 3D objects and concerns the situation in which more than one 3D point of the model is projected onto the same 2D point of the projection screen.

The depth map associated with this projection will thus contain, for each of the projected pixels p(A), the respective relative depth value with respect to the position of the camera. It is given by

dz = cos θx (cos θy (az - cz) + sin θy (sin θz (ay - cy) + cos θz (ax - cx))) - sin θx (cos θz (ay - cy) - sin θz (ax - cx))   (1)

where θx, θy, θz indicate the respective rotations of the camera around the reference axes shown in figure 2b,

where ax, ay, az represent the coordinates of a point a in the reference coordinate system, where cx, cy, cz represent the coordinates of the camera center c in the reference coordinate system, and where dz represents the associated depth of point a relative to the camera center c.

If the camera is not rotated with respect to the reference coordinate system x, y, z with reference origin O, these rotation angles are all zero, so that equation (1) reduces to: dz = az - cz (2)

Using the notation of figure 2a, this equation corresponds to: d(A) = zA - zC (3)

as is also indicated in figure 2a.
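The projection and the associated depth of equations (1) to (3) can be sketched as follows; this is a minimal illustration of the pinhole model only, in which the focal length f, the function names and the handling of points behind the camera are assumptions rather than details taken from the patent:

```python
import numpy as np

def rotation_matrix(theta_x, theta_y, theta_z):
    """Camera rotation R = Rx(theta_x) @ Ry(theta_y) @ Rz(theta_z)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    rx = np.array([[1, 0, 0], [0, cx, sx], [0, -sx, cx]])
    ry = np.array([[cy, 0, -sy], [0, 1, 0], [sy, 0, cy]])
    rz = np.array([[cz, sz, 0], [-sz, cz, 0], [0, 0, 1]])
    return rx @ ry @ rz

def project_point(a, c, thetas, f=1.0):
    """Project 3D point a through camera center c; return 2D screen coords and depth dz.

    dz is the third component of R @ (a - c), i.e. equation (1); for zero rotation
    it reduces to a_z - c_z as in equations (2) and (3). The point is assumed to
    lie in front of the camera (dz > 0).
    """
    d = rotation_matrix(*thetas) @ (np.asarray(a, float) - np.asarray(c, float))
    dz = d[2]
    x_p = f * d[0] / dz   # screen coordinate along x_P
    y_p = f * d[1] / dz   # screen coordinate along y_P
    return (x_p, y_p), dz
```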

In general the projection will be chosen such that the features of the 3D model that are to be adapted in 3D form part of the projection at a sufficiently high resolution, or such that these features optimally fill the projected image. This can be done heuristically by trying a set of predetermined projection positions and then selecting the one that gives the best result.

In another embodiment this can be further determined via an intermediate step in which an approximation of the 3D surface of the model is calculated using 3D triangles. In general only the part of the model related to the features to be adapted will be approximated by such 3D triangles. For each of these triangles the normal, related to the perpendicular direction, is determined. In an ideal projection the direction of this normal makes an angle of 180 degrees with the direction from the camera towards the triangle. For each camera position the sum, over all triangles, of the cosines of the angles between the normal of the respective triangle and the direction from the camera to the center of that triangle should therefore be minimal. An optimal direction can be computed by evaluating this sum for a number of possible camera positions and selecting the position for which the minimum is obtained. In alternative embodiments a minimization problem may be solved, for instance to determine the optimal camera direction.
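A minimal sketch of this heuristic is given below; the representation of triangles as triples of numpy vertex arrays and the list of candidate positions are assumptions, and only the cosine-sum cost described above is implemented:

```python
import numpy as np

def triangle_normal_and_center(v0, v1, v2):
    """Unit normal and center of a 3D triangle given its three vertices."""
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n), (v0 + v1 + v2) / 3.0

def camera_cost(camera_pos, triangles):
    """Sum of cosines between triangle normals and camera-to-triangle directions.

    For an unoccluded, well-facing surface the normal opposes the viewing
    direction, so smaller (more negative) sums indicate better camera positions.
    """
    cost = 0.0
    for v0, v1, v2 in triangles:
        normal, center = triangle_normal_and_center(v0, v1, v2)
        view_dir = center - camera_pos
        view_dir /= np.linalg.norm(view_dir)
        cost += float(np.dot(normal, view_dir))
    return cost

def best_camera(candidate_positions, triangles):
    """Pick the candidate camera position with minimal cost."""
    return min(candidate_positions,
               key=lambda p: camera_cost(np.asarray(p, float), triangles))
```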

Of course many other techniques may be used as well, as is known to persons skilled in the art.

In a next step a state is extracted from this projection. A state means a configuration of object features, with these features themselves represented by a set of values. These values may thus describe possibly variable properties or characteristics of the object. The set of values may be arranged as a vector, but other representations of such a state are of course also possible. State extraction thus means determining a number of state parameters representing the state of an object of an image, in this case of a projection of a 3D model. As shown in the examples given below, this may be performed via some calculations based on the 3D model information, or by using more general methods comprising, for instance, a first step of recognition or detection of the object under consideration, possibly, but not necessarily, followed by some segmentation operations, followed by a further in-depth analysis of the object thus recognized or detected.

In most embodiments according to the invention, however, the 3D model itself is known, so that the state extraction calculations can be greatly reduced based on the state of this 3D model. If this 3D state concerns the coordinates of certain features, which in the example of a 3D model of a human head may be facial features, the 2D projection of these 3D points immediately yields the state parameters of the 2D image.
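For this situation the state extraction can be sketched as the projection of known 3D feature points, reusing the project_point sketch given earlier; the feature names and coordinates below are purely illustrative assumptions:

```python
import numpy as np

# Hypothetical 3D feature coordinates of the model (model frame, arbitrary units).
feature_points_3d = {
    "left_mouth_corner":  np.array([-0.03, -0.04, 0.55]),
    "right_mouth_corner": np.array([ 0.03, -0.04, 0.55]),
    "nose_tip":           np.array([ 0.00,  0.00, 0.50]),
}

def extract_state(camera_center, camera_rotation, points_3d=feature_points_3d):
    """State s1 of a projection: the projected 2D positions of the feature points,
    arranged as a vector with one (x, y) pair per feature in a fixed order."""
    state = []
    for name in sorted(points_3d):
        (x_p, y_p), _ = project_point(points_3d[name], camera_center, camera_rotation)
        state.extend([x_p, y_p])
    return np.array(state)
```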

If the state of the 3D model is not yet known, the recognition step mentioned above may be followed by a further analysis, which may for instance involve the use of an Active Appearance Model (AAM). An AAM can determine the shape and appearance of the facial features of the 2D projected image by fitting a 2D AAM internal shape model, for instance in the case where a human head, being the object model, is to be updated. It may start by comparing the 2D projection with the initial values of a 2D AAM model and then gradually adapt the AAM model itself so as to find the best fit. Once a good match has been found, parameters determined from this adapted AAM model, such as face_expression_1_x and face_expression_1_y, are output.

In figure 1a the state of the projected image is denoted s1 and is used in a target state composition step. The target state s is obtained from this state s1 of the 2D projection and from external state information. This external state information, denoted se, may be determined off-line and in advance, for instance from a static image input or from other descriptive information such as high-level semantic information about features like the shape of the nose, the colour of the eyes or a facial expression. In that case the external state information may be stored in advance in a memory.

In alternative embodiments this external state information se may be determined "on the fly", for instance from changing external video image input data that may vary rapidly over time. In that case the external state se is typically determined for successive frames of a video sequence.

This external state information, together with the state s1 of the 2D projection, is used to obtain the target state.

A method for determining the target state, denoted s in figure 1a, from the input states s1 and se may comprise performing a weighted combination of the s1 and se values, with weights reflecting confidence levels of these states, these confidence levels having been determined during state extraction. In the above example of the AAM method for determining the s1 parameters, the respective parameters quantifying the goodness of the match may, for instance, be selected as these confidence level metrics.
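Such a confidence-weighted combination can be sketched as follows; the state vectors and confidence values in the usage comment are illustrative assumptions:

```python
import numpy as np

def combine_states(states, confidences):
    """Target state as a confidence-weighted combination of input states.

    `states` is a list of equally-sized state vectors (e.g. s1 and se) and
    `confidences` contains the matching per-state confidence levels obtained
    during state extraction.
    """
    states = np.asarray(states, dtype=float)
    weights = np.asarray(confidences, dtype=float)
    weights = weights / weights.sum()
    return weights @ states

# e.g. target = combine_states([s1, se], confidences=[0.3, 0.7])
```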

Another method for determining the target state may simply consist of selecting one of them, for instance se, which may be preferred in situations where a check on the result of an interpolation or weighted combination of the different states, as described in the previous example, indicates that this interpolated result lies outside predetermined limits.

Particular implementations of the determination of the states and of the target state will be further described when discussing the embodiments shown in figures 4a-b.

Once the target state, denoted s in figure 1a, has been determined, the 2D projection p1 and the associated depth map d1 are transformed in accordance with this target state s. In one example a method may be used in which triangles are used to represent features such as facial features. By interpolating the distances defined by these triangles, and attributing features that previously belonged to pixels at their previous positions to the pixels at the new positions, an image transformation can be generated. This method is well suited to situations where many such triangles are used.

In a similar method the updated 2D coordinates of the pixels associated with these features in the projected image are calculated in accordance with the new state. The colour and texture information of pixels lying between the triangles defined on the original 2D projection is then attributed to the pixels lying between these triangles at their new positions in the updated image. If two points on the 2D projection have coordinates (100,100) and (200,200), and these two points are to be transformed to coordinates (50,50) and (100,100) on the transformed projection, the colour of the original pixel at coordinates (150,150) will be attributed to the pixel at coordinates (75,75) in the transformed image.
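A sketch of this coordinate remapping for the two-point example above is given below, using a simple per-axis linear interpolation between the two control points; for dense warps one would instead interpolate per triangle as described above:

```python
import numpy as np

def remap_point(p, src_points, dst_points):
    """Map a pixel position p by linear interpolation between two control points.

    src_points / dst_points hold the corresponding positions of the control
    points before and after the state adaptation.
    """
    (s0, s1), (d0, d1) = np.asarray(src_points, float), np.asarray(dst_points, float)
    t = (np.asarray(p, float) - s0) / (s1 - s0)   # per-axis interpolation factor
    return d0 + t * (d1 - d0)

# remap_point((150, 150), [(100, 100), (200, 200)], [(50, 50), (100, 100)])
# -> array([75., 75.]): the colour at (150,150) is attributed to (75,75).
```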

A more detailed implementation will be further described when discussing figures 4a-b.

The adapted 2D projection is denoted p1'.

The associated depth values of the associated depth map are adapted in parallel, likewise in accordance with the target state. In some embodiments the determination of the target state directly involves the calculation of adapted depth values for certain pixels of the projection. The adaptation of the other depth values in accordance with the target state may then be performed via interpolation between the adapted depth values already calculated, analogously to the adaptation of the colour values of the projected pixels described above.

The adapted depth map is denoted d1'.

Based on the transformed depth map and the transformed 2D projection, generally comprising the adapted 2D image model, a re-projection or back-projection to 3D can be performed, by applying transformations inverse to those used for the 3D-to-2D projection, but now using, for each 2D pixel of the adapted projected image, the adapted associated depth value.
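This back-projection can be sketched by inverting the project_point illustration given earlier, with the same assumed focal length f; only the geometry is shown, not the merging of colour data:

```python
import numpy as np

def back_project_pixel(x_p, y_p, depth, c, thetas, f=1.0):
    """Back-project an adapted 2D pixel with its adapted depth to a 3D point.

    Inverts the pinhole projection of project_point(): first reconstruct the
    point in camera coordinates from (x_p, y_p, depth), then undo the camera
    rotation and translation.
    """
    d = np.array([x_p * depth / f, y_p * depth / f, depth])   # camera coordinates
    r = rotation_matrix(*thetas)
    return r.T @ d + np.asarray(c, float)                     # back to the world frame
```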

The result of this back-projection is denoted p3d_1.

In some cases these back-projected 3D points suffice to constitute an updated 3D model.

In other embodiments the back-projection to 3D is merged with the original 3D model m, resulting in the updated or adapted 3D model m'.

Figure 1b shows an arrangement A for performing an embodiment of this method.

Figure 3a shows a variant embodiment in which more than one projection of the initial 3D model m is performed. The projections themselves may be chosen depending on the form and shape of the model and the amount of occlusion occurring for a chosen first projection, or by using one of the methods described earlier for determining the projection parameters themselves. A possible implementation may thus be based on approximating the 3D surface to be modelled by a set of triangles in 3D. For each of these triangles the perpendicular direction is computed; it can be represented by a 3D "normal" vector pointing outwards from the body of the 3D model. Calculating the difference between this 3D vector and the camera projection direction gives a simple way of determining occlusion, since for non-occluded surfaces the projection direction should be opposite to the normal vector. The camera projection can thus be adapted, and it may turn out that several projections are needed in order to obtain sufficiently good projections, with sufficient resolution, of all the features to be modelled. In alternative embodiments a default set of, for instance, three predetermined projections may be used, alleviating the trial-and-error computation of optimal camera positions.

These different projections are denoted p1, p2, to pn, and the associated depth maps are denoted d1, d2, to dn. Each of these projections is thus associated with a virtual camera having a certain position, rotation and associated screen width and length, as shown in figures 2a-b.

Each of these different projections p1 to pn also undergoes a state extraction operation, yielding respective determined states s1, s2 to sn. In some embodiments, in particular where the features to be adapted relate directly to coordinates or pixel positions of the features under consideration, the states of these respective projections may be calculated in the way described earlier.

These respective determined states s1, s2 to sn may, but need not, be used, together with the external state input se, as respective inputs for determining a target state s. This determination of the target state may comprise performing a weighted combination of the various input states, with weights reflecting their confidence levels, these confidence levels having been determined during state extraction. In the earlier example of the AAM method for determining the s1 parameters, the parameters quantifying the goodness of the match may again be selected as these confidence level metrics.

Another method for determining the target state may simply consist of selecting one of the input states, or selecting the external state, which may be preferred in situations where a check on the result of an interpolation or weighted combination of the different states, as described in the previous example, indicates that the interpolated result lies outside predetermined limits.

This target state s forms the basis on which the n respective projections and their respectively associated depth maps are updated. The updated projections are denoted p1', p2' to pn', and the updated depth maps are denoted d1', d2' to dn'.

These updated projections p1', p2' to pn' are then back-projected to 3D, based on the updated depth map values associated with each 2D pixel of these projections. These back-projections are merged with the original model, resulting in an updated or adapted model.

Figure 3b shows an embodiment of an arrangement for performing this variant of the method.

Figure 4a shows an embodiment for adapting a 3D model of a human head. In this embodiment the states of the model relate to facial expressions, but in other embodiments the state may also relate to, for instance, the colour of parts such as the hair, the eyes or the skin. The aim of this particular embodiment is to animate the 3D model with the facial features provided by an input 2D video.

This input video is denoted IV in figure 3a. For each frame of this video, the scaling and orientation of the object are estimated relative to the scaling and orientation of the object of the 3D model. This is preferred for determining a first projection related to a virtual camera viewpoint of the 3D model onto a 2D plane, where this projection should resemble as closely as possible the 2D projection used by the camera with which the 2D video was captured. This particular choice of the first projection is not required, but it may be advantageous for an easy update. For this particular projection, the projection of the 3D model onto a 2D plane should thus use a virtual camera with associated projection parameters that resemble as closely as possible those of the camera used for capturing the 2D images of the input video.

The calculation of these projection parameters is performed according to known techniques, such as the one explained below. The inputs of the procedure for determining the parameters of this virtual camera are a 3D database model of a human face and a live 2D video feed. Since the 3D positions of the facial features of the 3D database model, the 2D positions of the facial features in the live video feed, and the projection matrices of the webcam and of the virtual camera are known, these data are sufficient for calculating the 3D positions of these facial features of the face in the live video feed. If the 3D positions of these facial features in the live video feed, as well as the 3D positions of the corresponding facial features of the database model, are thus known, the 3D transformation (translation and rotation) between these corresponding 3D positions can be calculated. In alternative embodiments the 3D transformation (translation and rotation) needed for a virtual camera to capture the same 2D view of the 3D database model as the one seen in the live video feed can thus also be computed. The minimum number of feature points needed for the calculation of the transformation to be applied to the virtual camera is 3. Since a human face is not a rigid object, owing to changing and different emotions, a minimization problem has to be solved when taking facial features. Therefore three stable points are used, such as the left side of the left eye, the right side of the right eye and the top of the mouth. The 3D positions of these three facial features in the database model, together with the 2D positions of the corresponding facial features in the live video feed and the webcam projection matrix, are then input to the well-known Grunert algorithm. This algorithm provides the calculated 3D positions of these three corresponding facial features. These can then be used to move the virtual camera around the 3D database model so as to capture the same 2D view of the database model as the 2D view of the face provided by the live video feed.

In some embodiments, as shown in figure 4a, it may be desirable to use a further projection of the 3D model. This may be the case when the first projection using the camera parameters leads to an optimal projection of an image resembling the video feed, but still does not yield enough pixel data, for instance when part of the face in the projected image is occluded by the nose.

Figure 5a illustrates this situation; the left rectangle shows the video of the "real" person captured by the real camera, while the left part of the right rectangle shows the projection of the 3D model performed by a first virtual camera, denoted virtual camera 1. As shown, the projection of the 3D model by this virtual camera matches the projection conditions used by the "live" 2D camera. Nevertheless, some pixels of the left part of the face are occluded by the nose. Another projection is therefore performed by another virtual camera, denoted "virtual camera 2". The parameters of this virtual camera 2 are determined on the basis of the occluded pixels of the previous camera position. They may be determined from intrinsic parameters such as the facial points and from the extrinsic parameters of the virtual cameras, together with knowledge of the 3D model. This information allows determining whether two voxels or 3D points of the features of the 3D model to be modelled would be projected onto the same pixel of a 2D projection. If this is the case, occlusion will evidently occur. Based on this information another virtual camera position can then be computed, for which at least a different projection of this voxel is obtained. By performing such a check for all projected pixels, the presence of occlusions can be determined, and a further virtual camera position and rotation can be decided accordingly.
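The per-pixel occlusion check can be sketched as a simple z-buffer test over the projected model points, reusing the project_point sketch from above; the pixel quantisation step is an assumption:

```python
def find_occluded_points(points_3d, camera_center, thetas, f=1.0, pixel_size=0.01):
    """Return the indices of 3D points occluded in a given virtual-camera projection.

    Points projecting to the same (quantised) pixel compete; only the point
    closest to the camera is kept, the others are reported as occluded.
    """
    closest = {}        # pixel -> (depth, point index)
    occluded = set()
    for i, a in enumerate(points_3d):
        (x_p, y_p), dz = project_point(a, camera_center, thetas, f)
        pixel = (round(x_p / pixel_size), round(y_p / pixel_size))
        if pixel in closest and closest[pixel][0] <= dz:
            occluded.add(i)                      # a nearer point already occupies this pixel
        else:
            if pixel in closest:
                occluded.add(closest[pixel][1])  # the previously stored point is now hidden
            closest[pixel] = (dz, i)
    return occluded
```

A non-empty result indicates that an additional virtual camera is needed for the features concerned.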

In another embodiment a number of predetermined virtual cameras, or virtual cameras selected from among them, may be used for obtaining projections of the features of interest. Furthermore, in alternative embodiments a standard configuration of virtual cameras may be used, providing a frontal view and two 90-degree side views respectively, and all projections or a subset of them may be used depending on the features to be modelled.

If only two projections are used, the right part of the right rectangle of figure 5a shows the result of the second projection. Together with these projections p1 and p2, the associated depth maps, denoted d1 and d2, are also generated. These depth maps indicate the relative depth of each 2D projected pixel, taking into account the rotation information, as observed from the viewpoint of the respective virtual camera 1 or 2 with respect to the respective camera positions, obtained using equation (1). The depth maps of the two projections are indicated in the lower pictures of this right rectangle.

In a next step the states of the two projections p1 and p2 and of the successive frames of the input video are extracted. Since the state relates to facial expressions in this embodiment, these facial expressions are characterized. Features related to these facial expressions are extracted from the successive frames of the input video and from the 2D projections using state-of-the-art techniques such as the AAM technique mentioned earlier. The state of a projection may also be calculated from the 3D state of the model and the corresponding voxel projection, in the way described before. This is illustrated in figure 5b, where the left rectangle shows the positions of different pixels of the edges of the mouth and the eyes on a live 2D frame. The positions of these same features on the projections are determined as well. In the right part of figure 5b this is only shown for projection p1; it evidently also takes place for projection p2, but this is not shown in order not to overload the figure. In this particular embodiment the respective states correspond to the positions of the pixels associated with these features as they occur on p1, p2 and an input frame. These states are denoted s1, s2 and se respectively. As only p1 is shown in figure 5b, only s1 is shown as well. These three states are used for determining the target state, which in this embodiment corresponds to the state se. Although in this embodiment the respective states s1 and s2 are thus not used for determining the target state, they are still used during the transformation of the projections in accordance with this target state. This target state is thus also used for adapting the 2D projections p1 and p2. For the virtual camera corresponding to the "real" video camera this adaptation can simply be performed by replacing the pixel positions of the selected features with the corresponding pixel positions of these features as they occur in the video frame. By selecting virtual camera 1 to map onto the real camera, this can be performed very easily. For adapting the 2D projection p2 obtained by the other virtual camera 2, a possible method comprises calculating the positions of the adapted features of p2, which are first determined in 3D. This may be performed on the basis of the adapted projection p1' and the adapted depth map d1', which allows calculating the 3D positions of these features visible on p1'. By using the projection parameters of the second projection, their corresponding positions on p2' can be identified. For features occluded from p1 and p1', interpolation techniques may be used for calculating the adapted projection and the adapted depth map.

Once the new positions of the key features of p1 and p2 are known, morphing techniques such as weighted interpolation can be used to determine the colour and depth of the pixels that are not key features.

The bottom picture of the right rectangle of figure 5b shows the adaptation of projection p1. This projection has clearly been adapted towards the "smiling" facial expression appearing on the input video frame of the left rectangle. The same happens for projection p2 (not shown in figure 5b).

Next, using the adapted depth maps, the adapted projections p1' and p2' are re-projected to 3D and merged, so as to replace or update the old data. The data of d1' may be calculated using the following approximation: the adapted depth equals the initial depth, so that the initial depth d(A) of a pixel A, related to the feature under consideration and having projection coordinates xPA, yPA, is now attributed to the pixel at coordinates xPA', yPA', these being the adapted coordinates of the feature under consideration.
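Expressed on the matrix representation of the projections and depth maps described earlier, this approximation amounts to the following small helper; the integer pixel indices and array layout are assumptions:

```python
def move_feature(p1, d1, p1_new, d1_new, old_xy, new_xy):
    """Attribute the colour and the (unchanged) depth of a feature pixel to its adapted position."""
    (x_old, y_old), (x_new, y_new) = old_xy, new_xy
    d1_new[y_new, x_new] = d1[y_old, x_old]   # adapted depth = initial depth
    p1_new[y_new, x_new] = p1[y_old, x_old]   # colour follows the feature
```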

In this respect it should be mentioned that all back-projections of the adapted 2D images must be consistent in the 3D domain. This basically means that, when back-projecting a transformed feature that is visible in more than one 2D projected image, this feature should be back-projected from all these projections onto the same 3D position. So if a corner of the mouth is transformed, and this corner of the mouth appears in several of these projections, all back-projected coordinates should be the same.

Suppose x_3d is a certain feature under consideration of the 3D object (for instance the tip of the nose); x_3d is a vector with information (x, y, z, colour). x_2dz is a certain feature in the 2D+Z domain; it is a vector containing information (x_2d, y_2d, depth, colour).

The 3D to 2D+Z projection according to a certain virtual camera c1 is modelled by a function p: p(c1, x_3d) = x_2dz_c1

Now consider the state-adapted 3D model. The desired 3D feature after state adaptation is denoted x'_3d. The 3D state transfer function is m_3d: x'_3d = m_3d(x_3d)

This implies: x'_2dz_c1 = p(c1, x'_3d) = p(c1, m_3d(x_3d))

Since the state-related adaptations are performed on the projections, the m_3d function cannot be used in the 2D+Z domain. It can be approximated by using an m_2dz function: x"_2dz_c1 = m_2dz(c1, x_2dz_c1)

This is only consistent with the 3D state if: x'_2dz_c1 = x"_2dz_c1

This implies that the function p(c1, m_3d) and the function m_2dz(c1) are effectively the same within the domain under consideration.

If this is the case there is no problem, and the methods described above can be used without any issue; if not, an extra step has to be performed.

Taking this into account, a careful selection of the projection parameters may solve the problem from the start.

If it is not dealt with, however, inconsistencies may occur. One of the problems is that, when multiple 2D+Z sources are used for re-creating the 3D model, the back-projections of these sources have to be "consistent" with respect to the state transfer functions. When these functions are consistent with the 3D state, there is no problem (since all 2dz functions then effectively implement a specific 2dz version of the 3d state transfer function). When these functions are not consistent with the 3d state, consistency needs to be enforced via the "correct" 3d state transfer function or an approximation thereof. This can be done, for instance, by selecting one reference 2DZ state transfer function and projecting all other state transfer functions onto this reference: x'_2dz_c1ref = m_2dz(c1ref, x_2dz_c1ref)

Now consider m_2dz(c1ref) being used as this reference 2dz state transfer function. The other functions can then be constructed by going via the 3D domain: x'_3d = p_inv(c1ref, x'_2dz_c1ref) = p_inv(c1ref, m_2dz(c1ref, x_2dz_c1ref)) and x'_2dz_c2 = m_2dz(c2, x_2dz_c2) = p(c2, x'_3d) = p(c2, p_inv(c1ref, m_2dz(c1ref, x_2dz_c1ref))). Note that not all features of the 3D object will have valid values after going through p(c, x_3d). For instance, points that lie outside the field of view of the virtual camera, or that are occluded by other features of the object, have no valid values. In order to also obtain consistent transformed features for these points, additional reference cameras will be needed.
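This construction can be sketched in code by routing the adapted feature through the 3D domain, reusing the project_point and back_project_pixel sketches from above; the callable m_2dz_ref stands in for whichever reference 2D+Z state transfer function is chosen and is an assumption:

```python
def consistent_transfer(x_2dz_c1ref, m_2dz_ref, cam_ref, cam_2, f=1.0):
    """Derive the adapted 2D+Z feature for camera c2 from the reference camera c1ref.

    x_2dz_c1ref:   (x_p, y_p, depth) of the feature in the reference projection.
    m_2dz_ref:     callable implementing the reference 2D+Z state transfer.
    cam_ref/cam_2: (center, thetas) of the reference and second virtual cameras.
    """
    # 1. Apply the reference state transfer in the 2D+Z domain of c1ref.
    x_p, y_p, depth = m_2dz_ref(*x_2dz_c1ref)
    # 2. Back-project the adapted feature to 3D through c1ref (p_inv).
    x3d_adapted = back_project_pixel(x_p, y_p, depth, cam_ref[0], cam_ref[1], f)
    # 3. Re-project with the second virtual camera c2 (p): consistent by construction.
    (x_p2, y_p2), depth2 = project_point(x3d_adapted, cam_2[0], cam_2[1], f)
    return x_p2, y_p2, depth2
```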

A second embodiment is a variant of the first one; it also involves state adaptation of a 3D model of a human face, but differs from the previous embodiment in that no 2D camera is used but a 2D+Z camera, for instance a stereo camera or a time-of-flight camera such as the Microsoft Kinect. In this case facial feature points with 3D coordinates, instead of 2D coordinates, can be used. Again, as many 2D+Z projections of the off-line model are taken as are needed to cover all the points modified by the live data, and the state is inferred onto these projections. The data may be merged by, for instance, applying the morphing techniques of the previous embodiment to these "off-line" 2D+Z data, but now also using the modified Z data for these feature points.

In these embodiments the problem of 3D state adaptation is reduced. Where one would otherwise start by transferring the state from one or more 2D images to a full 3D model, the state transfer is now reduced to one from 2D to 2D+Z, making these operations readily applicable to real-time applications.

While the principles of the invention have been described above in connection with specific apparatus, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the invention as defined in the appended claims. In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function. This may include, for example, a combination of electrical or mechanical elements which performs that function, or software in any form, including therefore firmware, microcode or the like, combined with appropriate circuitry for executing that software so as to perform the function, as well as mechanical elements coupled to software-controlled circuitry, if any. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for, and unless otherwise specifically defined, any physical structure is of little or no importance to the novelty of the claimed invention. Applicant therefore regards any means which can provide those functionalities as equivalent to those shown herein.

The above-mentioned and other objects and features of the invention will become more apparent, and the invention itself will be best understood, by referring to the following description of an embodiment taken in conjunction with the accompanying drawings, wherein: figures 1a-b show a first variant of the method and of an arrangement according to the invention; figures 2a-b schematically show the geometry involved in embodiments of the invention; figures 3a-b show a second variant of the method according to the invention; figures 4a-b show third and fourth embodiments of the method according to the invention, respectively; and figures 5a-c show the different steps performed by the embodiment of figure 3a in case an additional 2D video input is present.

Claims (10)

1. A method for adapting a 3D model (m) of an object, the method comprising the steps of: performing at least one projection of the 3D model to obtain at least one 2D image model projection (p1) with associated depth information (d1); performing at least one state extraction operation on the at least one 2D image model projection (p1), thereby obtaining at least one state (s1); adapting the at least one 2D image model projection (p1) and the associated depth information (d1) in accordance with the at least one state (s1) and a target state (s), thereby obtaining at least one adapted 2D image model (p1') and an associated adapted depth (d1'); and back-projecting the at least one adapted 2D image model (p1') to 3D, based on the associated adapted depth (d1'), thereby obtaining an adapted 3D model (m').

2. Method according to claim 1, wherein the adapted 3D model (m') is further determined based on the initial 3D model (m) information.

3. Method according to claim 1 or 2, wherein the target state (s) is derived from externally applied semantic information.

4. Method according to claim 1 or 2, wherein the target state (s) is derived from a state (PS) of an external image input (IV).

5. Method according to claim 4, wherein the target state is obtained by combining said state (PS) of the external image input (IV) with said at least one state (s1).

6. Method according to claim 4, wherein one of said at least one 2D projections of the 3D model is performed in accordance with a virtual camera derived from the external image input (IV).

7. Method according to any one of claims 4 to 6, wherein transformations are performed on key features extracted from the externally provided live video and from the projected 2D images, and wherein the new positions of these key features of the projections are determined based on the positions of the key features of the live video.

8. An arrangement (A1) adapted to perform the method according to any one of claims 1 to 7.

9. An image processing device comprising an arrangement according to claim 8.

10. A computer program product comprising software which, when executed on a data processing apparatus, is adapted to perform the method steps according to any one of claims 1 to 6.
TW101121335A 2011-06-20 2012-06-14 Method and arrangement for 3-dimensional image model adaptation TW201320005A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP11305768.1A EP2538388B1 (en) 2011-06-20 2011-06-20 Method and arrangement for image model construction
EP11306127.9A EP2538389B1 (en) 2011-06-20 2011-09-12 Method and arrangement for 3-Dimensional image model adaptation

Publications (1)

Publication Number Publication Date
TW201320005A true TW201320005A (en) 2013-05-16

Family

ID=46208536

Family Applications (1)

Application Number Title Priority Date Filing Date
TW101121335A TW201320005A (en) 2011-06-20 2012-06-14 Method and arrangement for 3-dimensional image model adaptation

Country Status (1)

Country Link
TW (1) TW201320005A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI567476B (en) * 2015-03-13 2017-01-21 鈺立微電子股份有限公司 Image process apparatus and image process method


Similar Documents

Publication Publication Date Title
JP7181977B2 (en) Method and system for detecting and combining structural features in 3D reconstruction
Mueller et al. Real-time hand tracking under occlusion from an egocentric rgb-d sensor
KR101560508B1 (en) Method and arrangement for 3-dimensional image model adaptation
CN107408315B (en) Process and method for real-time, physically accurate and realistic eyewear try-on
JP7526412B2 (en) Method for training a parameter estimation model, apparatus for training a parameter estimation model, device and storage medium
US9317973B2 (en) Augmented reality method applied to the integration of a pair of spectacles into an image of a face
CN106023288B (en) A kind of dynamic scapegoat&#39;s building method based on image
EP3335197A1 (en) Method and system for generating an image file of a 3d garment model on a 3d body model
CN113628327B (en) Head three-dimensional reconstruction method and device
WO2021232690A1 (en) Video generating method and apparatus, electronic device, and storage medium
US11769309B2 (en) Method and system of rendering a 3D image for automated facial morphing with a learned generic head model
GB2612881A (en) Techniques for re-aging faces in images and video frames
US20140306953A1 (en) 3D Rendering for Training Computer Vision Recognition
US20210074076A1 (en) Method and system of rendering a 3d image for automated facial morphing
US12020363B2 (en) Surface texturing from multiple cameras
US11854156B2 (en) Method and system of multi-pass iterative closest point (ICP) registration in automated facial reconstruction
WO2013021150A1 (en) Fisheye video correction
TW201320005A (en) Method and arrangement for 3-dimensional image model adaptation
JP2023153534A (en) Image processing apparatus, image processing method, and program
Zhang et al. Human model adaptation for multiview markerless motion capture
US20240362862A1 (en) Multiresolution neural networks for 3d reconstruction
EP3009987B1 (en) Method and system for providing a dynamic appearance to a generic, morphable 3D model
Weon et al. Individualized 3D Face Model Reconstruction using Two Orthogonal Face Images
Niloofar Reconstruction of Complete Head Models with Consistent Parameterization