WO2015042867A1 - Method for editing a facial expression on the basis of motion capture data from a single camera - Google Patents

Method for editing a facial expression on the basis of motion capture data from a single camera

Info

Publication number
WO2015042867A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
face
model
motion capture
subspace
Prior art date
Application number
PCT/CN2013/084449
Other languages
English (en)
Chinese (zh)
Inventor
吴怀宇
潘春洪
王舒旸
沙金正
Original Assignee
中国科学院自动化研究所
Priority date
Filing date
Publication date
Application filed by 中国科学院自动化研究所
Priority to PCT/CN2013/084449 priority Critical patent/WO2015042867A1/fr
Publication of WO2015042867A1 publication Critical patent/WO2015042867A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 - Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 - Indexing scheme for editing of 3D models
    • G06T2219/2021 - Shape modification

Definitions

  • The present invention relates to the field of computer vision technology, and in particular to a facial expression editing method based on a single camera and motion capture data. Background Art
  • Keyframe interpolation is the simplest and most common method. A spatial vertex in three-dimensional space is moved from one position to another; the computer calculates all the intermediate positions between the two points and then moves the vertex along the computed trajectory. Keyframe interpolation is simple and fast, but it is only suitable when the changes between key frames are small; when the key frames differ greatly, the result is not ideal.
  • Parameterization method. This method still uses polygons to describe the face surface, but uses a small number of parameters to describe the motion changes. By changing the values of these parameters, users can directly and conveniently create various face shapes and expressions. The parameters include: a) shape parameters, which control the shape of a personalized face, including the size and shape of the face, the relative positions of the facial features, and parameters that control overall characteristics such as the height-to-width ratio; b) expression parameters, which control expressions; for the eye region, for example, they include the degree of eye opening, the size of the pupil, and the shape and position of the eyebrows. Because the parameterized representation depends on the face topology, it is difficult to design a general parametric model, and only experienced animators can design high-quality face animation.
  • Muscle-based approach. This method simulates the real muscles and muscle movements of the human body.
  • It introduces anatomical principles, embedding muscles into an anatomically based skin model of the human face to form a particle-spring model; the model is deformed by applying forces to the muscles, thereby simulating the face and its expression changes.
  • This method requires a large amount of computation. Different models differ greatly in depth and complexity: simplified models have difficulty achieving the desired visual effect, while complex models are computationally intensive, and even models of ordinary complexity are difficult to run in real time.
  • The most successful face animation technology today drives face animation with performance data.
  • This method captures the features of a real human face under various expressions and uses them to drive the face model to produce realistic facial expressions.
  • The main approach is to place many feature points on a performer's face; while the performer makes various expressions, the motion vectors of these feature points are captured and then used to drive the corresponding feature points of the face model to generate facial expressions. This provides an intuitive and efficient way to directly control the generation of facial expressions. Most of today's Hollywood movies, such as Avatar, use this motion capture technology.
  • The present invention utilizes the information contained in a motion capture database to compensate for the limited information collected by a single camera.
  • The virtual facial expression editing method includes an offline process and an online process. The offline process includes: Step 1, generating a virtual three-dimensional face model of the user from a frontal photograph of the face; Step 2, decoupling the motion capture data and separating pose from expression; Step 3, constructing the face subspace association model, so that the global expression can be controlled by local features.
  • The input of the online process includes the user's expression video captured in front of the camera, the virtual three-dimensional face model obtained offline, the decoupled motion capture data, and the face subspace association model. The online process includes the following steps:
  • Step 4: perform video analysis on the expression video captured by the single camera, use the active appearance model to track the rigid head motion and the facial feature points in the video, and then extract the expression control parameters from the tracked feature points, obtaining two sets of control parameters, namely the expression control parameters and the three-dimensional head pose parameters.
  • Step 5: the expression control parameters are dynamically filtered using the decoupled motion capture data; the filtered signal is input into the face subspace association model, and the global expression is calculated; the virtual facial expression is then edited by assigning the global expression to the virtual three-dimensional face generated by the offline process.
  • The present invention collects the facial expression information made by the user with a single camera, optimizes it using the motion capture data, and finally realizes expression editing of the virtual face model.
  • The difference from conventional three-dimensional facial expression animation techniques is that the present invention requires neither complicated hardware equipment nor manual editing by professionals, while still achieving high-quality expression effects.
  • In essence, the offline process of the system uses the motion capture data to construct the filter and the local-feature-to-global-expression correlator; the online process then uses the filter to filter the signal acquired from the video, calculates the global expression with the correlator, and finally realizes expression editing of the virtual face. DRAWINGS
  • FIG. 1 is a schematic diagram of a face expression editing method based on a single camera and motion capture data according to the present invention
  • FIG. 2 is a schematic diagram of generating a face model by marking feature points in the utility software;
  • Figure 3 is a schematic diagram of a motion capture video
  • Figure 4 is a schematic diagram of comparison before and after motion capture data decoupling
  • Figure 5 is a diagram of the hierarchical division of the human face;
  • FIG. 6 is a diagram of a three-dimensional expression association model of a human face, wherein FIG. 6A is a face subspace association model diagram, and FIG. 6B is a three-dimensional facial expression diagram corresponding to FIG. 6A;
  • Figure 7 is a schematic diagram of the process of transforming the texture model from the average shape to the target face;
  • Figure 8 is a schematic diagram of the result of the fit tracking;
  • Figure 9 is a flow chart of noise signal filtering
  • Figure 10 is a schematic diagram of an example of an expression editing result. Detailed description
  • FIG. 1 is a schematic diagram of a facial expression editing method based on a single camera and motion capture data according to the present invention.
  • The method is divided into an online process and an offline process; the online process corresponds to the part inside the dotted line in FIG. 1.
  • the offline process is the motion capture data preprocessing process.
  • Face modeling, video analysis and expression editing are the most basic parts of the method; with these basic components alone, the expression editing function can essentially be realized, without considering the quality and realism of the result;
  • decoupling and separating the pose, constructing the facial expression association model, filtering the expression parameters and calculating the global expression from local expression parameters are the core parts that remedy the shortcomings of video-based extraction.
  • the method includes the following steps:
  • Step 1: A frontal photograph of the face is provided, or taken with the camera, and input into the FaceGen Modeller software; the feature points are marked and the user's virtual 3D face model is generated automatically. The photograph must be taken under even lighting, and the face must be expressionless and unobstructed.
  • Figure 2 is a schematic diagram of generating a virtual three-dimensional face model. The purpose of this step is to create a virtual 3D face model that provides an entity for expression editing, after which the virtual expression will be reflected on this model.
  • Steps 2 and 3 are offline processes, and the motion capture data is preprocessed.
  • Steps 2 and 3 respectively construct the filter and the local-feature-to-global-expression correlator, so that the video signal can be processed during the user's online process.
  • Step 2 decouples the motion capture data, and Step 3 establishes the face subspace association model. Steps 2 and 3 are described in detail below.
  • Step 2: The motion capture data is decoupled in order to eliminate the rigid motion in the motion capture data and preserve the expression motion, which serves as the filter for the video signal.
  • The invention extracts reasonable and realistic facial expression motion from a large amount of motion capture data to compensate for the loss of video signal caused by noise. To this end, the interference of rigid motion with the facial expression changes must be eliminated, that is, the translation, the scale transformation, and the pitch, roll and yaw angles, i.e. the six rigid-body degrees of freedom together with scale, must be separated out of each frame of data.
  • the video signal is automatically filtered during the online process using the constructed filter.
  • The motion capture data decoupling process of the present invention utilizes the orthogonality of the weighted rotation matrix to construct a rotation constraint, and uses key frames to construct a basis constraint; each frame of data includes two parts, the three-dimensional head pose and the facial expression change.
  • The input is a motion capture database with sufficiently rich samples.
  • The ideal database should contain at least 50,000 frames and essentially cover all everyday expressions. This step does not require the skeleton information of the captured data, only the coordinates of all the points. The pose and the expression change are separated for each frame of data by singular value decomposition and two constraints (the rotation constraint and the basis constraint).
  • Motion capture (hereafter referred to as mocap) is used to accurately measure the motion of moving objects in three-dimensional space. Based on computer graphics principles, the moving objects (trackers) are tracked by several video capture devices arranged in space; the motion state is recorded in the form of images, and the image data is then processed by a computer to obtain the spatial coordinates (X, Y, Z) of the different trackers at different measurement times.
  • The present invention uses a short, openly downloadable segment of facial expression capture data from Vicon, including data files in asf and amc formats and a video demonstration file; video screenshots are shown in Figure 3. During the capture phase the performer inevitably makes head movements in addition to the expressions; as a consequence, the head motion and the expression motion are coupled in the recorded data, so before using the motion capture data the present invention must decouple the head pose and the expression in the following manner.
  • The mocap data is imported into Matlab. All the data points form a matrix of size 3F × P, where F is the number of frames of the entire mocap data and P is the number of points in the model.
  • A facial expression is composed of L independent deformation modes, that is, it can be expressed as a linear combination of deformation bases S_1, S_2, ..., S_L.
  • Each deformation base is a 3 × P matrix that describes the deformation of the P points.
  • The mocap data recorded in each frame contains two parts, the three-dimensional head pose and the facial expression change: X_f = R_f (Σ_{i=1..L} c_fi S_i) + T_f, where R_f is the 3 × 3 head rotation matrix and T_f is the 3 × 1 head translation vector, which together represent the three-dimensional head pose; L is the number of deformation bases; f denotes the f-th frame and i the i-th deformation base; and c_fi is the weight of the i-th deformation base in the f-th frame.
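  • Read as code, the per-frame model above is simply a weighted sum of the deformation bases, rotated and translated by the head pose. A minimal Python/NumPy sketch (the array shapes and names are illustrative; the patent's own offline processing is described below as Matlab code):

      import numpy as np

      def expression_shape(R_f, T_f, c_f, S):
          """Reconstructs one frame of the model described above.
          R_f: (3, 3) head rotation; T_f: (3, 1) head translation;
          c_f: (L,) deformation-base weights for frame f; S: (L, 3, P) deformation bases.
          Returns X_f as a (3, P) array of point coordinates."""
          return R_f @ np.tensordot(c_f, S, axes=1) + T_f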
  • The next step is to separate the two head pose parameters R_f and T_f from the original data X_f so that it contains only the expression deformation. First, the average of all 3D points is subtracted from X_f (the x, y and z values of every 3D point are reduced by the averages of x, y and z over all 3D points, respectively), which eliminates T_f and yields a product of two matrices:
  • Equation (2) shows that, in the absence of noise, the rank of M is at most 3L, three times the number of deformation bases (in practice F and P are large, and the number of frames in particular reaches tens of thousands), so the present invention can automatically determine the size of L while guaranteeing that a specified fraction of the energy of the raw data is preserved. In the present example, the number of points in the model is 36, and taking L = 12 preserves enough of the raw data energy.
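  • As an illustration of the rank-based choice of L described above, the following is a minimal Python/NumPy sketch (the (F, 3, P) input layout and the 99% energy threshold are assumptions made for the example, not values taken from the patent):

      import numpy as np

      def choose_num_bases(mocap, energy=0.99):
          """mocap: (F, 3, P) array of P three-dimensional points over F frames.
          Returns the centred measurement matrix M (3F x P), its singular values,
          and the smallest L whose 3L leading singular values keep `energy`
          of the total squared energy."""
          F, _, P = mocap.shape
          # Subtract, per frame, the mean of all 3-D points to eliminate the translation T_f
          centred = mocap - mocap.mean(axis=2, keepdims=True)
          M = centred.reshape(3 * F, P)
          s = np.linalg.svd(M, compute_uv=False)
          cum = np.cumsum(s ** 2) / np.sum(s ** 2)
          rank = int(np.searchsorted(cum, energy) + 1)
          L = int(np.ceil(rank / 3))          # the rank of M is at most 3L
          return M, s, L

  • With a 36-point model, this kind of energy criterion is consistent with the choice of L = 12 mentioned above.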
  • The two matrices obtained from equation (3) have the same dimensions as Q and B respectively, but they are not yet the decomposition desired by the present invention; they must be corrected by a linear transformation. For any non-singular 3L × 3L matrix G, inserting G and its inverse between the two factors leaves the product unchanged, so the true weighted rotation matrix Q and deformation base matrix B can be expressed as:
  • where G is a suitable 3L × 3L matrix.
  • To find G, the present invention uses two sets of linear constraints, a rotation constraint and a basis constraint, and first solves for GG^T.
  • the orthogonality of the rotation matrix is a very strong constraint. This property is often used for structural reconstruction of static objects and complex rigid moving objects.
  • Each frame in the Q matrix corresponds to only one rotation matrix R_f, because R_f is caused by the rigid motion of the head; that is, within the same frame the rotation of every deformation base is the same, and only the weights differ.
  • Q and B obtained by the above two constraints do not necessarily meet this condition.
  • The first 3 columns of the Q matrix can be written as (c_11 R_1 ... c_F1 R_F)^T, and each 3 × 3 block matrix may be left-multiplied by one and the same rotation matrix.
  • The present invention takes the rotation R of the first column triple as the reference and normalizes the rotations of the remaining L-1 column triples of Q against it, thereby obtaining unique Q and B matrices that meet the requirements. After that, all the rotation matrices in the Q matrix are removed, and only the weights of the deformation bases are retained.
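  • The correction by G and the normalization to the first column triple are not reproduced here; the sketch below only illustrates the final step described above, i.e. removing the per-frame rotation from Q and keeping the deformation-base weights, assuming Q already has the corrected block structure [c_f1*R_f, ..., c_fL*R_f] in every frame (a layout assumed for illustration):

      import numpy as np

      def strip_rotations(Q, L):
          """Q: (3F, 3L) weighted rotation matrix whose f-th 3-row block is
          [c_f1*R_f, ..., c_fL*R_f].  Returns the per-frame rotations R (F, 3, 3)
          and the deformation-base weights C (F, L)."""
          F = Q.shape[0] // 3
          R = np.zeros((F, 3, 3))
          C = np.zeros((F, L))
          for f in range(F):
              blocks = Q[3 * f:3 * f + 3].reshape(3, L, 3).transpose(1, 0, 2)  # (L, 3, 3)
              # Estimate R_f from the block with the largest energy (assumes its weight
              # is positive) by projecting onto the nearest rotation matrix.
              ref = blocks[int(np.argmax([np.linalg.norm(b) for b in blocks]))]
              U, _, Vt = np.linalg.svd(ref)
              Rf = U @ Vt
              if np.linalg.det(Rf) < 0:       # keep a proper rotation (det = +1)
                  U[:, -1] *= -1
                  Rf = U @ Vt
              R[f] = Rf
              # Weight of each base: c_fi = trace(R_f^T (c_fi R_f)) / 3
              C[f] = [np.trace(Rf.T @ b) / 3.0 for b in blocks]
          return R, C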
  • FIG. 4 shows the mocap data of the present invention before and after decoupling.
  • The two figures in the upper row are the front and top views before decoupling, and the two figures in the lower row are the front and top views after decoupling.
  • Z_i (i = 1, 2, ..., 12) denote the resulting control parameters; the Z_i act as the filter and are responsible for filtering the video signal during the online process.
  • Step 3 The face subspace association model is established.
  • The invention constructs a layered subspace model of the face from the motion capture data and uses the hierarchical Gaussian process latent variable model algorithm to compute a mapping between the high-dimensional global expression and the low-dimensional local features, so that the global expression can be constructed from local expression features.
  • a hierarchical face subspace association model is constructed according to the skeleton information in the motion capture database.
  • the upper layer is the overall expression appearance
  • the lower layer is the local facial feature
  • the hierarchical Gaussian process latent variable model and algorithms such as radial basis functions are used to establish the mapping from the upper layer to the lower layer and the inverse mapping from the lower layer to the upper layer.
  • the face subspace association model implements two functions, one is to decompose the whole expression change into a partial facial feature change, and the other is to calculate the overall expression change through the partial facial feature change.
  • the latter function is used in the online process of the system, so that the local information input by the single camera can be correlated with each other to produce a realistic expression.
  • the expression of the human face is the result of the joint action of the local muscles and the skin, and the local features are highly correlated in the expression movements.
  • the present invention uses a hierarchical Gaussian process latent variable model (hereinafter referred to as HGPLVM) to achieve correlation and control between global expressions and local features.
  • the expression is high-dimensional, it is necessary to express the expression through a low-dimensional subspace.
  • the subspace in the present invention is a two-dimensional space, and each of the coordinates represents a high-dimensional expression state.
  • the facial expressions are layered, the upper layer is the overall expression appearance, and the lower layer is the partial facial features, such as the left cheek, forehead, and lower lip.
  • The invention divides the face into five parts: chin, eyebrows, eyes, mouth and cheeks; at the same time, the eyebrows, eyes and cheeks are each divided into left and right, and the mouth is divided into upper and lower, as shown in Figure 5.
  • The face is divided into three layers: the top layer is the global facial expression; the middle layer consists of the chin, eyebrows, eyes, mouth and cheeks; and in the bottom layer the eyebrows, eyes and cheeks are split into left and right parts and the mouth into upper and lower parts. Each node in the hierarchy is then assigned a subspace state, denoted X^reg, where reg indexes the corresponding face region.
  • The top-layer subspace state X^face corresponds to the middle-layer subspace states of the chin, eyebrows, eyes, mouth and cheeks, which in turn correspond to the bottom-layer subspace states of the left and right eyebrows, eyes and cheeks and of the upper and lower mouth.
  • the states are all one coordinate in their respective subspaces, and the positions of the coordinates are mapped to each other.
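  • The three-layer division described above can be written down directly as a small data structure; a minimal sketch, in which the region names follow Figure 5 and every node carries a two-dimensional latent coordinate X^reg (the dictionary layout itself is illustrative, not the patent's):

      import numpy as np

      # Top layer: the global expression; middle layer: five face regions;
      # bottom layer: the left/right and upper/lower subdivisions.
      FACE_HIERARCHY = {
          "face": {
              "chin": [],
              "eyebrows": ["left_eyebrow", "right_eyebrow"],
              "eyes": ["left_eye", "right_eye"],
              "mouth": ["upper_mouth", "lower_mouth"],
              "cheeks": ["left_cheek", "right_cheek"],
          }
      }

      # Every node, global or local, is represented by a 2-D subspace coordinate.
      all_regions = ["face"] + list(FACE_HIERARCHY["face"]) + \
                    [c for subs in FACE_HIERARCHY["face"].values() for c in subs]
      latent_state = {region: np.zeros(2) for region in all_regions}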
  • The mapping from an upper layer to a lower layer in the subspace can be calculated by the HGPLVM (the upper and lower layers mentioned here refer to the mapping relationships between adjacent layers of the three layers described above, that is, between the bottom layer and the middle layer, and between the middle layer and the top layer).
  • the present invention calculates the inverse mapping in the subspace. Given a low-level node, the state of its parent node is calculated by:
  • where X_i denotes the child-node (low-level) state of the i-th expression sample,
  • N is the number of samples
  • the input of the inverse mapping is a low-level subspace variable, and the output is a high-level subspace variable.
  • the low-level variables in the training data set are linked to the high-level variables and used together as training data to learn the parameters of the kernel in the projection function.
  • the subspace state corresponding to the global expression can be calculated by the low layer variable, and the process of establishing the face subspace association model is completed.
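  • The exact kernel used in the inverse mapping is not reproduced in this text; as an illustration of the kind of kernel regression described above (mapping a low-level child state to the high-level parent state through the training pairs), a minimal sketch with a Gaussian radial basis function, whose width is an assumed parameter:

      import numpy as np

      def rbf_inverse_map(child_train, parent_train, child_query, width=1.0):
          """child_train: (N, d_c) low-level latent states from the training set;
          parent_train: (N, d_p) corresponding high-level latent states;
          child_query: (d_c,) a new low-level state.
          Returns an estimate of the parent (high-level) state as a kernel-weighted
          combination of the training parents."""
          d2 = np.sum((child_train - child_query) ** 2, axis=1)
          w = np.exp(-d2 / (2.0 * width ** 2))   # Gaussian RBF weights
          w /= w.sum()                           # normalise (Nadaraya-Watson style)
          return w @ parent_train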
  • the operation diagram of the face subspace association model is shown in FIGS. 6A and 6B.
  • the system constructs the filter and correlator for the expression control parameter processing.
  • In the online process, the user only needs to make various expressions in front of the camera; the system then automatically collects the information, filters it, calculates the expression, and completes the editing of the 3D face model.
  • step 4 is video analysis.
  • the active appearance model is used to track and collect the expression of the user in front of the camera.
  • Step 5 processes the expression control parameters, passing the data through the filter and the correlator to optimize it and achieve realistic editing of the virtual facial expression. The details are described below.
  • Step 4: Video analysis. Video analysis is the data input part; as much accurate user motion and expression information as possible must be obtained from the video for subsequent processing, which requires the face tracking algorithm to have very good robustness and real-time performance.
  • In this step, video analysis is performed on the video images captured by the single camera; the active appearance model is used to track the three-dimensional spatial position and orientation of the head and the main facial features in the video, and these features are then automatically converted into two sets of control parameters, namely the expression control parameters and the three-dimensional head pose parameters.
  • AAM Active Appearance Model
  • The AAM is a parametric generative model for visual phenomena, mainly used for face modeling.
  • ASM Active Shape Model
  • AAM is an improvement on ASM.
  • One of the main problems with the ASM in face modeling is that it does not make full use of all available information and ignores texture features, whereas the AAM models both the face shape and the texture.
  • where p_i and λ_i are the i-th combination coefficients of the shape and the texture with respect to the corresponding principal vectors, respectively.
  • Before modeling the face texture features with PCA, the face shape must first be normalized to obtain a shape-independent patch, and the normalization standard is obtained by Procrustes analysis; the shape modeling is similar:
  • a global transformation is performed before the PCA, but the shape model thus obtained does not contain information related to rotation, translation and scale, so the subsequent model fitting must first apply the same transformation to the target face.
  • After modeling, the corresponding parameters p and λ are obtained. For a new input image, the goal of model fitting is to adjust the two sets of parameters so that the combined model instance matches the input image.
  • The difference can be defined as the error image E(x) = A_0(x) + Σ_{i=1..m} λ_i A_i(x) - I(W(x; p)), where A_0(x) is the average texture model, m is the number of texture principal components, A_i(x) is the i-th texture principal component, and I(W(x; p)) is the target face image warped by the deformation W(x; p); note that E(x) is defined in the coordinate system of the model instance.
  • The fitting method used in the present invention is based on the inverse compositional image alignment algorithm; its biggest difference from conventional fitting algorithms is that the update is applied not directly to the parameters p but to the warp W(x; Δp), that is, the update rule changes from the previous p ← p + Δp to W(x; p) ← W(x; p) ∘ W(x; Δp)^(-1), where the composition symbol ∘ is used to represent the inverse compositional update and to distinguish it from the direct additive relationship.
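  • As a rough illustration of the error image defined above (not of the patent's full inverse compositional fitter), a minimal sketch, assuming the input image has already been warped into the model frame by W(x; p):

      import numpy as np

      def aam_error_image(A0, A, lam, I_warped):
          """A0: (H, W) mean texture; A: (m, H, W) texture principal components;
          lam: (m,) texture coefficients; I_warped: (H, W) input image resampled
          into the model frame by W(x; p).
          Returns E(x) = A0(x) + sum_i lam_i * A_i(x) - I(W(x; p))."""
          model_instance = A0 + np.tensordot(lam, A, axes=1)
          return model_instance - I_warped

  • The inverse compositional algorithm then minimizes the squared norm of this error image over the warp increment and composes the inverse of the incremental warp with the current warp, as stated above.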
  • the result of the fitting tracking is shown in Fig. 8.
  • the face tracking program of the present invention can give six pose parameters of a face and two-dimensional coordinates of 66 feature points of a face in real time.
  • the motion of the facial feature points is separated from the rigid motion of the head.
  • The x and y parameters are used to translate the points to the center of the screen; the rotation matrix composed of the three angles pitch, yaw and roll is then used to restore the face to a non-rotated state; finally, the scale transformation given by the z parameter restores the face to its normal size.
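  • A minimal sketch of this rigid normalization, assuming the tracker outputs the 66 two-dimensional landmarks together with the six pose parameters (x, y, z, pitch, yaw, roll); the parameter conventions and the screen centre are assumptions:

      import numpy as np

      def normalize_landmarks(pts2d, x, y, z, pitch, yaw, roll, screen_center=(320.0, 240.0)):
          """pts2d: (66, 2) tracked feature points.  Removes translation, in-plane
          rotation and scale; pitch and yaw are accepted but ignored in this 2-D
          sketch (a full treatment would undo them with the 3-D rotation matrix)."""
          c = np.asarray(screen_center)
          p = pts2d - np.array([x, y]) + c           # translate the face to the screen centre
          rot = np.array([[np.cos(-roll), -np.sin(-roll)],
                          [np.sin(-roll),  np.cos(-roll)]])
          p = (p - c) @ rot.T + c                    # undo the roll (in-plane) rotation
          return (p - c) / z + c                     # undo scale via the z parameter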
  • The first frame of the user in front of the camera must show a neutral expression; the program automatically records the two-dimensional data collected in this first frame as the initial values for the current face, and every subsequent frame is compared with this initial expression to obtain the expression state of the current frame.
  • The invention extracts 12 expression control parameters from the 66 two-dimensional feature points:
  • Mouth: by tracking the feature points around the mouth, the following are collected: the distance between the upper and lower lips (1), the distance between the left and right mouth corners (1), the angle between the upper and lower lips and the vertical line (1), the angle between the left and right mouth corners and the horizontal line (1), and the relative position of the midpoint of the upper and lower lips with respect to the mouth corners (1), a total of 5 parameters.
  • Eyes: the distance between the upper and lower eyelids of the left and right eyes (2), a total of 2 parameters.
  • Eyebrows: the distance between the two eyebrows (1), the distances of the left and right eyebrows relative to the left and right eyes (2), and the angles between the eyebrows and the horizontal line (2), a total of 5 parameters.
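  • A minimal sketch of how the distance and angle parameters above can be computed from the normalized landmarks, shown for the five mouth parameters; the landmark indices are placeholders, since the index layout of the 66 points is not given here:

      import numpy as np

      def angle_to_horizontal(p, q):
          """Signed angle (radians) between the line p -> q and the horizontal axis."""
          d = q - p
          return np.arctan2(d[1], d[0])

      def mouth_parameters(pts, upper_lip=51, lower_lip=57, left_corner=48, right_corner=54):
          """pts: (66, 2) normalized landmarks; the four indices are illustrative only.
          Returns the five mouth control parameters listed above."""
          open_dist   = np.linalg.norm(pts[upper_lip] - pts[lower_lip])       # lip opening
          width_dist  = np.linalg.norm(pts[left_corner] - pts[right_corner])  # mouth width
          open_angle  = angle_to_horizontal(pts[lower_lip], pts[upper_lip]) - np.pi / 2
          width_angle = angle_to_horizontal(pts[left_corner], pts[right_corner])
          lip_mid     = 0.5 * (pts[upper_lip] + pts[lower_lip])
          corner_mid  = 0.5 * (pts[left_corner] + pts[right_corner])
          rel_pos     = np.linalg.norm(lip_mid - corner_mid)                  # relative position
          return np.array([open_dist, width_dist, open_angle, width_angle, rel_pos])

  • The eye and eyebrow parameters are computed in the same way from the corresponding landmark pairs.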
  • Step 5: Expression control parameter processing. Since the active appearance model tracking algorithm produces jitter in the tracking of the feature points, the present invention uses the decoupled motion capture data to dynamically filter the expression control parameters extracted by the video analysis; the filtered signal is input into the face subspace association model, the global expression is calculated, and finally expression editing of the virtual face is realized by assigning the global expression to the virtual 3D face generated by the offline process.
  • The expression control parameters obtained by video tracking are not directly comparable with the mocap data, because the user and the person from whom the mocap data was collected have different facial geometries; therefore, before the two are used together, the control parameters must be standardized against the control parameters of the neutral expression so that they are consistent. Filtering is then possible.
  • The vision-based control parameters typically carry a lot of noise; the present invention divides them into segments of a fixed length W, which are filtered using the prior knowledge in the motion capture database.
  • K is chosen in the range of 2 to 4 times the length, in frames, of the motion capture and video control parameter segments.
  • In the present embodiment, the segment length is 10 frames and K is 30.
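  • The precise filtering rule is not spelled out here; as an illustration of prior-based segment filtering of the kind described above (a W-frame video segment replaced by a combination of its K nearest motion capture segments), a minimal sketch in which the distance-weighted averaging of the K neighbours is an assumption:

      import numpy as np

      def filter_segment(video_seg, mocap_segs, K=30):
          """video_seg: (W, D) noisy control-parameter segment from the video
          (e.g. W = 10 frames, D = 12 parameters).  mocap_segs: (M, W, D) segments
          cut from the decoupled motion capture database.  Returns a denoised
          segment built from the K nearest mocap segments."""
          d = np.linalg.norm((mocap_segs - video_seg).reshape(len(mocap_segs), -1), axis=1)
          nearest = np.argsort(d)[:K]
          w = 1.0 / (d[nearest] + 1e-8)        # closer segments weigh more (assumed rule)
          w /= w.sum()
          return np.tensordot(w, mocap_segs[nearest], axes=1)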
  • The above process completes the filtering of the video signal; the next task is to calculate the global expression from the filtered video signal, that is, from the local features of the face.
  • the present invention utilizes a relay algorithm.
  • When the method of the present invention is applied, the user first provides a frontal photograph of himself or herself and uses the FaceGen Modeller software to generate a three-dimensional face model, as shown in FIG. 2; the motion capture data shown in FIG. 3 is then preprocessed (the performer in FIG. 3 wears 38 markers used for the capture).
  • The preprocessing includes two parts: (1) decoupling the data, eliminating rigid-body motion and retaining only the facial expression changes, as shown in Figure 4 (the data points in Figure 4 are 30 frames selected at equal frame intervals from the motion capture data); (2) establishing the face layering model shown in FIG. 5 and calculating the mapping relationship between the upper and lower subspaces shown in FIG. 6A.
  • Each rectangle represents a node in the two-dimensional space; the white curve represents the trajectory of the expression data after projection into the subspace, and the motion produced by controlling any of the nodes is visually reflected in the model of FIG. 6B.
  • Figure 7 shows the basic principle of the active appearance model, showing the transformation of the texture model from the average shape to the target face W(x; p).
  • Figure 8 shows the result of the fitting and tracking in the video analysis. The video signal obtained by the fitting is then filtered, with the filtering process shown in FIG. 9, and input into the face subspace association model shown in FIG. 6A, from which the final facial expression is calculated;
  • FIG. 10 shows the expression editing result.
  • A Core 2 computer with a 2.6 GHz central processing unit and 1 GB of memory is used; the online process of the system is programmed in the C language, and the offline processing of the motion capture data is written in Matlab,
  • thereby implementing the processing and the facial expression editing system of the present invention; other execution environments may also be used, and the details are not described herein again.
  • By making different expressions in front of the camera, the user of the present invention drives the virtual three-dimensional face to make the same, realistic expressions.
  • Real-time face tracking is used to extract the rigid head motion parameters and the expression control parameters.
  • the present invention utilizes the information contained in the motion capture data to filter the video signal, which requires decoupling the motion capture data.
  • the present invention uses a layered Gaussian process latent variable model to create a subspace mapping for a face.
  • the online process of the system does not require any mouse or keyboard interaction.
  • The video capture signal is filtered by the preprocessed, decoupled motion capture data, and the filtered signal is input into the face layered model and converted into a high-dimensional expression signal used to control the movements and expressions of the virtual face.
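  • Putting the online steps together, a highly simplified per-frame loop; every helper below refers to the illustrative sketches above, and the tracker and face_model interfaces are assumed placeholders rather than the patent's actual implementation:

      import numpy as np

      def online_loop(frames, tracker, mocap_segs, child_train, parent_train, face_model, W=10):
          """frames: iterable of video frames.  Buffers W frames of control
          parameters, filters them against the mocap prior, maps the local
          subspace states to the global expression and applies it to the
          virtual 3-D face."""
          buffer = []
          for frame in frames:
              pose, pts2d = tracker.track(frame)            # 6 pose parameters + 66 landmarks
              pts = normalize_landmarks(pts2d, *pose)       # remove rigid head motion
              buffer.append(mouth_parameters(pts))          # ...plus the eye/eyebrow parameters
              if len(buffer) == W:                          # one W-frame segment is full
                  seg = filter_segment(np.stack(buffer), mocap_segs)
                  for params in seg:
                      local_state = face_model.to_local_subspace(params)     # assumed helper
                      global_state = rbf_inverse_map(child_train, parent_train, local_state)
                      face_model.apply_global_expression(global_state)       # drive the 3-D face
                  buffer.clear()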
  • the system combines convenient and fast single-camera tracking and high-quality motion capture data, combining versatility and realism, without the need for expensive multi-camera capture devices, and can be implemented on inexpensive PC platforms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a method for editing a virtual facial expression, which comprises an offline process and an online process. The offline process comprises the following steps: step 1, using a full-face photograph to generate a virtual three-dimensional face model of a user; step 2, decoupling the motion capture data and separating the pose from the expression; and step 3, constructing a face subspace association model. The online process comprises the following steps: step 4, performing video analysis on the video image captured by a single camera, using an active appearance model to track the three-dimensional spatial position and orientation of the head and of the main facial features in the video, and automatically converting these features into two sets of control parameters; and step 5, using the decoupled motion capture data to perform dynamic data filtering on the noisy, low-resolution expression control parameters, feeding the filtered signals into the face subspace association model, calculating and obtaining a global expression, and finally assigning the global expression to the virtual three-dimensional face generated in the offline process in order to implement editing of the virtual facial expression.
PCT/CN2013/084449 2013-09-27 2013-09-27 Procédé de modification d'une expression faciale sur la base de données de capture de mouvement provenant d'une seule caméra WO2015042867A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084449 WO2015042867A1 (fr) 2013-09-27 2013-09-27 Procédé de modification d'une expression faciale sur la base de données de capture de mouvement provenant d'une seule caméra

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084449 WO2015042867A1 (fr) 2013-09-27 2013-09-27 Procédé de modification d'une expression faciale sur la base de données de capture de mouvement provenant d'une seule caméra

Publications (1)

Publication Number Publication Date
WO2015042867A1 true WO2015042867A1 (fr) 2015-04-02

Family

ID=52741811

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/084449 WO2015042867A1 (fr) 2013-09-27 2013-09-27 Procédé de modification d'une expression faciale sur la base de données de capture de mouvement provenant d'une seule caméra

Country Status (1)

Country Link
WO (1) WO2015042867A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984943A (zh) * 2023-01-16 2023-04-18 支付宝(杭州)信息技术有限公司 Facial expression capture and model training method, apparatus, device, medium and product
US11941753B2 (en) 2018-08-27 2024-03-26 Alibaba Group Holding Limited Face pose estimation/three-dimensional face reconstruction method, apparatus, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040017372A1 (en) * 2002-07-18 2004-01-29 Park Min Je Motion reconstruction method from inter-frame feature correspondences of a singular video stream using a motion library
US20070236501A1 (en) * 2006-04-05 2007-10-11 Ig-Jae Kim Method for generating intuitive quasi-eigen paces
CN101216949A (zh) * 2008-01-14 2008-07-09 浙江大学 Method for producing three-dimensional face animation based on region segmentation and segment-based learning
CN101944238A (zh) * 2010-09-27 2011-01-12 浙江大学 Data-driven facial expression synthesis method based on the Laplace transformation
CN103093490A (zh) * 2013-02-02 2013-05-08 浙江大学 Real-time facial animation method based on a single video camera
CN103473801A (zh) * 2013-09-27 2013-12-25 中国科学院自动化研究所 Facial expression editing method based on a single camera and motion capture data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040017372A1 (en) * 2002-07-18 2004-01-29 Park Min Je Motion reconstruction method from inter-frame feature correspondences of a singular video stream using a motion library
US20070236501A1 (en) * 2006-04-05 2007-10-11 Ig-Jae Kim Method for generating intuitive quasi-eigen paces
CN101216949A (zh) * 2008-01-14 2008-07-09 浙江大学 Method for producing three-dimensional face animation based on region segmentation and segment-based learning
CN101944238A (zh) * 2010-09-27 2011-01-12 浙江大学 Data-driven facial expression synthesis method based on the Laplace transformation
CN103093490A (zh) * 2013-02-02 2013-05-08 浙江大学 Real-time facial animation method based on a single video camera
CN103473801A (zh) * 2013-09-27 2013-12-25 中国科学院自动化研究所 Facial expression editing method based on a single camera and motion capture data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11941753B2 (en) 2018-08-27 2024-03-26 Alibaba Group Holding Limited Face pose estimation/three-dimensional face reconstruction method, apparatus, and electronic device
CN115984943A (zh) * 2023-01-16 2023-04-18 支付宝(杭州)信息技术有限公司 Facial expression capture and model training method, apparatus, device, medium and product
CN115984943B (zh) * 2023-01-16 2024-05-14 支付宝(杭州)信息技术有限公司 Facial expression capture and model training method, apparatus, device, medium and product

Similar Documents

Publication Publication Date Title
Achenbach et al. Fast generation of realistic virtual humans
JP5344358B2 (ja) 演技から作り出される顔アニメーション
Xia et al. A survey on human performance capture and animation
Cao et al. 3D shape regression for real-time facial animation
Wechsler Reliable Face Recognition Methods: System Design, Impementation and Evaluation
Ersotelos et al. Building highly realistic facial modeling and animation: a survey
CN103473801B (zh) 一种基于单摄像头与运动捕捉数据的人脸表情编辑方法
Cheng et al. Parametric modeling of 3D human body shape—A survey
KR100715735B1 (ko) 디지털 안면 모델을 애니메이팅하기 위한 시스템 및 방법
WO2017044499A1 (fr) Système de régularisation et de reciblage d'image
WO2023071964A1 (fr) Procédé et appareil de traitement de données, dispositif électronique et support de stockage lisible par ordinateur
CN113421328B (zh) 一种三维人体虚拟化重建方法及装置
Rhee et al. Real-time facial animation from live video tracking
Malleson et al. Rapid one-shot acquisition of dynamic VR avatars
Goto et al. MPEG-4 based animation with face feature tracking
WO2022060230A1 (fr) Systèmes et procédés pour construire une topologie pseudo-musculaire d'un acteur en direct dans une animation informatique
Chen et al. 3D face reconstruction and gaze tracking in the HMD for virtual interaction
Zhang et al. Simuman: A simultaneous real-time method for representing motions and emotions of virtual human in metaverse
Fechteler et al. Markerless multiview motion capture with 3D shape model adaptation
CN108908353B (zh) 基于平滑约束逆向机械模型的机器人表情模仿方法及装置
Kang et al. Appearance-based structure from motion using linear classes of 3-d models
CN117333604A (zh) 一种基于语义感知神经辐射场的人物面部重演方法
WO2015042867A1 (fr) Procédé de modification d'une expression faciale sur la base de données de capture de mouvement provenant d'une seule caméra
Gahlot et al. Skeleton based human action recognition using Kinect
CN116740290A (zh) 基于可变形注意力的三维交互双手重建方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13894689

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13894689

Country of ref document: EP

Kind code of ref document: A1