WO2022143197A1 - Virtual object facial animation generation method and apparatus, storage medium, and terminal - Google Patents

Virtual object facial animation generation method and apparatus, storage medium, and terminal

Info

Publication number
WO2022143197A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
face
actor
facial
preset
Prior art date
Application number
PCT/CN2021/138747
Other languages
English (en)
French (fr)
Inventor
金师豪
王从艺
柴金祥
Original Assignee
魔珐(上海)信息科技有限公司
上海墨舞科技有限公司
Priority date
Filing date
Publication date
Application filed by 魔珐(上海)信息科技有限公司 and 上海墨舞科技有限公司
Publication of WO2022143197A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions

Definitions

  • the invention relates to the technical field of virtual digital objects, in particular to a method and device for generating facial animation of a virtual object, a storage medium and a terminal.
  • the facial capture (referred to as face capture) animation technology mainly includes two parts: offline facial animation production and real-time driving of the facial expressions of virtual characters.
  • the face performance animation technology (that is, the aforementioned face capture animation technology) that has appeared in recent years accelerates the production of offline animation by using the captured facial information of actors, in order to reduce the labor cost of offline facial animation production.
  • the application of face capture animation technology makes it possible to drive the facial expressions of virtual characters in real time.
  • the existing face capture animation technology still has many problems, such as time-consuming production, high labor cost, and low quality of the generated virtual object faces.
  • the technical problem solved by the present invention is to provide an efficient and high-precision facial animation generation solution for virtual objects.
  • an embodiment of the present invention provides a method for generating facial animation of a virtual object, including: receiving an image frame to be processed, the image frame including a facial image of an actor; reconstructing the actor's three-dimensional face based on a preset three-dimensional facial model and the facial image, where the preset three-dimensional facial model is used to describe the actor's facial expression changes; extracting a plurality of three-dimensional feature points from the three-dimensional face; determining, based on a mapping relationship between three-dimensional feature points and animation data, the animation data corresponding to the plurality of three-dimensional feature points; and generating a corresponding expression of the virtual object face based on the animation data, where the generated expression of the virtual object face is consistent with the expression made by the actor in the facial image.
  • the preset three-dimensional facial model includes a preset three-dimensional face model and a preset three-dimensional gaze model
  • the three-dimensional face of the actor includes the actor's three-dimensional human face and the actor's three-dimensional gaze, wherein the actor's three-dimensional human face is reconstructed based on the preset three-dimensional face model and the facial image, and the actor's three-dimensional gaze is reconstructed based on the preset three-dimensional gaze model and the facial image.
  • the process of reconstructing the actor's three-dimensional human face based on the preset three-dimensional face model and the facial image includes the following steps: detecting the facial image to obtain at least a plurality of two-dimensional face feature points; generating an estimated three-dimensional face according to the preset three-dimensional face model; extracting a plurality of estimated three-dimensional feature points from the estimated three-dimensional face; projecting the plurality of estimated three-dimensional feature points onto a two-dimensional plane to obtain a plurality of two-dimensional projection points; calculating the coordinate difference between the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points; and, if the coordinate difference is less than a preset threshold, determining the estimated three-dimensional face as the reconstructed three-dimensional face of the actor.
  • the two-dimensional face feature points have corresponding semantic information
  • the two-dimensional projection points have corresponding semantic information
  • calculating the coordinate difference between the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points includes: calculating, among the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points, the coordinate difference between each two-dimensional face feature point and the two-dimensional projection point corresponding to the same semantic information; and determining the sum of the calculated coordinate differences as the coordinate difference between the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points.
  • the process of reconstructing the three-dimensional face of the actor based on the preset three-dimensional face model and the facial image further includes the following steps: if the coordinate difference is greater than a preset threshold, iteratively adjust the preset 3D face model and camera external parameters, until the coordinate difference between the multiple 2D projection points obtained based on the adjusted preset 3D face model and the multiple 2D face feature points is less than the preset threshold .
  • the output result of the preset three-dimensional face model is associated with input weights, and iteratively adjusting the preset three-dimensional face model includes: iteratively adjusting the input weights to obtain different output results of the preset three-dimensional face model, where different output results correspond to different expressions.
  • the camera external parameters include the relative position and orientation between the actor's face and the image capture device that captures the face image.
  • the process of establishing the preset three-dimensional face model includes the following steps: obtaining a mixed-shape model group of the actor, where the mixed-shape model group includes multiple mixed-shape models and is used to describe multiple expressions; Principal component analysis is performed on the mixed shape model group to obtain the preset three-dimensional face model.
  • the plurality of expressions include at least a neutral expression
  • the mixed-shape model group includes at least one mixed-shape model describing the neutral expression
  • the process of establishing the mapping relationship between the three-dimensional feature points and the animation data includes the following steps: acquiring training data, where the training data includes a plurality of three-dimensional feature points and animation data corresponding to each of a plurality of training frames, the training frames being facial images captured when the actor makes different expressions; and establishing the mapping relationship between the three-dimensional feature points and the animation data based on the training data.
  • the multi-frame training frame is selected from a single video, and the multi-frame training frame is an image frame with the largest difference in feature information of corresponding three-dimensional feature points among all image frames included in the video.
  • the to-be-processed image frame is selected from image frames other than training frames in the video.
  • the training data is adjusted according to expression similarity, where the expression similarity is the similarity between the expression made by the actor in the image frame to be processed and the expression of the virtual object face generated based on that image frame.
  • the multi-frame training frames are obtained from a plurality of videos, and the plurality of videos are captured when the actor performs according to a preset script.
  • the to-be-processed image frame is a facial image of the actor captured in real time.
  • an embodiment of the present invention also provides a virtual object facial animation generation device, including: a receiving module for receiving an image frame to be processed, the image frame including an actor's facial image; a reconstruction module for reconstructing the actor's three-dimensional face based on a preset three-dimensional facial model and the facial image, where the preset three-dimensional facial model is used to describe the actor's facial expression changes; an extraction module for extracting a plurality of three-dimensional feature points from the three-dimensional face; a determining module for determining, based on the mapping relationship between three-dimensional feature points and animation data, the animation data corresponding to the plurality of three-dimensional feature points; and a generating module for generating a corresponding expression of the virtual object face based on the animation data, where the generated expression of the virtual object face is consistent with the expression made by the actor in the facial image.
  • an embodiment of the present invention further provides a storage medium on which a computer program is stored, and the computer program executes the steps of the above method when the computer program is run by a processor.
  • an embodiment of the present invention further provides a terminal, including a memory and a processor, where the memory stores a computer program that can run on the processor, and when the processor runs the computer program, the steps of the above method are performed.
  • An embodiment of the present invention provides a method for generating facial animation of a virtual object, including: receiving an image frame to be processed, the image frame including a facial image of an actor; reconstructing the actor's three-dimensional face based on a preset three-dimensional facial model and the facial image, where the preset three-dimensional facial model is used to describe the actor's facial expression changes; extracting a plurality of three-dimensional feature points from the three-dimensional face; determining, based on the mapping relationship between three-dimensional feature points and animation data, the animation data corresponding to the plurality of three-dimensional feature points; and generating a corresponding expression of the virtual object face based on the animation data, where the generated expression of the virtual object face is consistent with the expression made by the actor in the facial image.
  • the present embodiment can provide an efficient and high-precision virtual object facial animation generation solution, and the generated virtual object face has a high similarity in expression with the actor's real face. Specifically, the quality of 3D facial reconstruction is improved based on a preset 3D facial model. Further, since the three-dimensional face of the actor is accurately reconstructed, the animation data can be more accurately predicted, and finally a high-quality virtual object face can be obtained. In addition, the overall production efficiency of the virtual object face generated by this embodiment is high. Further, due to the adoption of 3D facial reconstruction technology, the requirement for actors to wear helmets can be relaxed, and there is no need to strictly return to the helmet wearing position when facial data was last captured.
  • the preset three-dimensional facial model includes a preset three-dimensional face model and a preset three-dimensional gaze model
  • the three-dimensional face of the actor includes the actor's three-dimensional human face and the actor's three-dimensional gaze, wherein the actor's three-dimensional human face is reconstructed based on the preset three-dimensional face model and the facial image, and the actor's three-dimensional gaze is reconstructed based on the preset three-dimensional gaze model and the facial image.
  • the process of establishing the preset three-dimensional face model includes the following steps: acquiring a mixed-shape model group of the actor, where the mixed-shape model group includes multiple mixed-shape models and is used to describe multiple expressions; and performing principal component analysis on the mixed-shape model group to obtain the preset three-dimensional face model.
  • a highly accurate principal component analysis model of the actor is obtained based on the mixed-shape model group and used as the preset three-dimensional face model. Since the quality of the preset 3D face model is sufficiently high, a high-precision 3D face of the actor can be reconstructed as the mapping basis for the animation data when the virtual object face is generated.
  • a machine learning model is used to automatically detect the facial image to obtain a plurality of two-dimensional feature points.
  • the present embodiment automatically detects the two-dimensional feature points for each of the actor's facial images; that is, this step is completely automated, which greatly improves the efficiency of animation production.
  • the process of establishing the mapping relationship between the three-dimensional feature points and the animation data includes the following steps: acquiring training data, where the training data includes a plurality of three-dimensional feature points and animation data corresponding to each of a plurality of training frames, the training frames being facial images captured when the actor makes different expressions; and establishing the mapping relationship between the three-dimensional feature points and the animation data based on the training data.
  • the multi-frame training frame is selected from a single video, and the multi-frame training frame is an image frame with the largest difference in feature information of corresponding three-dimensional feature points among all image frames included in the video.
  • the image frames to be processed are selected from image frames other than training frames in the video. This embodiment is suitable for offline facial animation production scenarios, and can greatly improve offline production efficiency.
  • the process of establishing the mapping relationship between the three-dimensional feature points and the animation data includes the following steps: acquiring training data, where the training data includes a plurality of three-dimensional feature points and animation data corresponding to each of a plurality of training frames, the training frames being facial images captured when the actor makes different expressions; and establishing the mapping relationship between the three-dimensional feature points and the animation data based on the training data.
  • the multi-frame training frames are obtained from a plurality of videos, and the plurality of videos are captured when the actor performs according to a preset script.
  • the to-be-processed image frame is a facial image of the actor obtained by real-time shooting. This embodiment is suitable for real-time driving application scenarios, and can drive virtual character faces of various precisions in real time.
  • FIG. 1 is a flowchart of a method for generating a virtual object facial animation according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a process for establishing a preset three-dimensional face model according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a specific implementation of step S102 in FIG. 1;
  • FIG. 4 is a schematic structural diagram of an apparatus for generating a facial animation of a virtual object according to an embodiment of the present invention.
  • the existing facial capture animation technology still has many problems, such as time-consuming production, high labor cost, and low quality of the generated virtual object facial animation.
  • the inventors of the present application found that the existing facial performance animation technology (that is, face capture technology) is mainly divided into two categories: one is based on two-dimensional (2D) facial feature points; the other is based on three-dimensional (3D) face reconstruction.
  • This type of technology based on 3D face reconstruction needs to reconstruct a 3D face based on the captured face images, and then redirect the 3D face information into animation data of the virtual character's face.
  • the quality of the three-dimensional face reconstructed by the existing technology is generally not high.
  • the existing technology directly transfers the calculated blend shape weights to the blend shape weights of the virtual character, which only works for relatively simple character rigs, so the effect is limited.
  • the two technical means usually adopted by the existing face capture technology have many defects, and cannot efficiently generate high-quality virtual object faces.
  • an embodiment of the present invention provides a method for generating facial animation of a virtual object, including: receiving an image frame to be processed, the image frame including a facial image of an actor; reconstructing the actor's three-dimensional face based on a preset three-dimensional facial model and the facial image, where the preset three-dimensional facial model is used to describe the actor's facial expression changes; extracting a plurality of three-dimensional feature points from the three-dimensional face; determining, based on the mapping relationship between three-dimensional feature points and animation data, the animation data corresponding to the plurality of three-dimensional feature points; and generating a corresponding expression of the virtual object face based on the animation data, where the generated expression of the virtual object face is consistent with the expression made by the actor in the facial image.
  • This embodiment can provide an efficient and high-precision virtual object facial animation generation solution, and the generated virtual object face has a high similarity in expression with the actor's real face. Specifically, the quality of 3D facial reconstruction is improved based on a preset 3D facial model. Further, since the three-dimensional face of the actor is accurately reconstructed, the animation data can be more accurately predicted, and finally a high-quality virtual object face can be obtained. In addition, the overall production efficiency of the virtual object face generated by this embodiment is high. Further, due to the adoption of 3D facial reconstruction technology, the requirement for actors to wear helmets can be relaxed, and there is no need to strictly return to the helmet wearing position when facial data was last captured.
  • FIG. 1 is a flowchart of a method for generating facial animation of a virtual object according to an embodiment of the present invention.
  • This embodiment can be applied to application scenarios such as virtual digital object generation, animation production, etc., such as an animation generation scenario applied to the face of a virtual object.
  • the facial expressions of the actors can be redirected to the faces of the virtual objects based on the face capture technology, so that the facial expressions presented by the virtual objects are consistent with the expressions made by the actors.
  • the virtual object may include a virtual person, and may also include multiple types of virtual objects with faces, such as virtual animals and virtual plants. Virtual objects can be three-dimensional.
  • the virtual object facial animation data may include controller data for generating virtual object animation, in the form of a sequence of digitized vectors.
  • the animation data of the face of the virtual object can be obtained, and the animation data is the attribute value of the controller.
  • the animation data is converted into a data form that can be received by a rendering engine such as UE or Unity3d and is input to the rendering engine, so as to drive the face of the virtual object to make corresponding actions.
  • the animation data may include facial expressions of virtual objects, ie, expression parameters of the virtual objects.
  • the facial expression information may include expression, gaze, and other information.
  • the method for generating facial animation of a virtual object described in this embodiment may include the following steps:
  • Step S101 receiving an image frame to be processed, the image frame including an actor's facial image
  • Step S102 reconstructing a three-dimensional face of the actor based on a preset three-dimensional facial model and the facial image, and the preset three-dimensional facial model is used to describe the facial expression changes of the actor;
  • Step S103 extracting a plurality of three-dimensional feature points from the three-dimensional face
  • Step S104 based on the mapping relationship between the three-dimensional feature points and the animation data, determine the animation data corresponding to the plurality of three-dimensional feature points;
  • Step S105 generating a corresponding facial expression of the virtual object based on the animation data, and the generated facial expression of the virtual object is consistent with the expression made by the actor in the facial image.
  • the preset three-dimensional facial model is a mathematical model established based on scan data when the actor makes a specific expression, and can describe any expression of the actor.
  • the preset three-dimensional facial model can describe the three-dimensional facial expression changes of the actor with as few expression parameters as possible, which is beneficial to improve the processing efficiency in the subsequent reconstruction of the three-dimensional face.
  • the preset three-dimensional facial model may be associated with the actor, and the actor is the actor captured in the to-be-processed image frame input in step S101. That is, when the actor is replaced, the operation of establishing the preset three-dimensional face model needs to be repeatedly performed. Thereby, the manufacturing precision can be improved and the calculation cost can be saved.
  • the preset three-dimensional facial model may include a preset three-dimensional face model and a preset three-dimensional gaze model
  • correspondingly, the three-dimensional face of the actor may include the actor's three-dimensional human face and the actor's three-dimensional gaze, wherein the three-dimensional human face may be reconstructed based on the preset three-dimensional face model and the facial image, and the three-dimensional gaze of the actor may be reconstructed based on the preset three-dimensional gaze model and the facial image.
  • the process of establishing the preset three-dimensional face model may include the following steps:
  • Step S201 obtaining a mixed shape model group of the actor, where the mixed shape model group includes multiple mixed shape models and is used to describe multiple expressions;
  • Step S202 performing principal component analysis on the mixed shape model group to obtain the preset three-dimensional face model.
  • the plurality of expressions include at least a neutral expression
  • the mixed-shape model group includes at least one mixed-shape model describing the neutral expression.
  • the neutral expression means no expression.
  • Other expressions can include open mouth, pouted mouth, puffed cheeks, right eye closed, etc.
  • multiple sets of scan data can be obtained by scanning multiple expressions of the actor, and based on the multiple sets of scan data, a blend shape model group (i.e., a blendshape model group) of the actor, or a multi-linear model, etc., can be generated.
  • RGBD refers to a three-channel color image plus a depth image, abbreviated as RGB + Depth Map.
  • a principal component analysis (Principle Component Analysis, PCA for short) may be performed on the mixed shape model group to obtain a preset three-dimensional face model of the actor.
  • the preset three-dimensional face model can be described based on formula (1):
  • M(α_1, ..., α_n) = M̄ + Σ_{i=1}^{n} α_i · e_i    (1)
  • where α_1, ..., α_n are the input weights; n is the number of expressions; M(α_1, ..., α_n) is the output result of the preset three-dimensional face model; M̄ is the average expression; e_i is the i-th principal component vector; and M̄ and e_i are results of the principal component analysis, which are fixed quantities that are not related to a specific expression but are related to the actor.
  • the input weight is the weight of n principal component vectors, that is, n expression parameters. Combined with formula (1), inputting different input weights can generate 3D faces with different shapes (ie expressions).
  • the 3D face of the actor with the corresponding expression can be generated according to the preset 3D face model.
  • a highly accurate principal component analysis model of the actor is obtained based on the mixed shape model group and used as the preset three-dimensional face model. Since the quality of the preset 3D face model is sufficiently high, a high-precision 3D face of the actor can be reconstructed as the mapping basis of the animation data when the face of the virtual object is generated.
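  • The following sketch illustrates, under assumed data shapes, how such a PCA face model could be built from the actor's blend shape meshes and evaluated for a given set of input weights, matching formula (1); the names `blendshapes`, `mean_face`, and `components` are illustrative and do not come from the patent.

```python
import numpy as np

# Illustrative sketch of formula (1): M(a_1..a_n) = mean + sum_i a_i * e_i.
# `blendshapes` is assumed to be an array of shape (num_expressions, num_vertices, 3),
# one scanned mesh per expression of the same actor, with shared topology.

def build_pca_face_model(blendshapes: np.ndarray, n_components: int):
    flat = blendshapes.reshape(len(blendshapes), -1)   # (num_expressions, 3 * num_vertices)
    mean_face = flat.mean(axis=0)                      # average expression, the "M bar" term
    centered = flat - mean_face
    # SVD of the centered data yields the principal component vectors e_i.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                     # (n_components, 3 * num_vertices)
    return mean_face, components

def evaluate_face_model(mean_face, components, weights):
    """Return the 3D face M(a_1..a_n) for the given input weights."""
    weights = np.asarray(weights)
    flat_face = mean_face + weights @ components
    return flat_face.reshape(-1, 3)                    # back to (num_vertices, 3)

# Usage: different input weights produce 3D faces with different expressions.
# blendshapes = np.load("actor_blendshapes.npy")       # hypothetical file
# mean_face, components = build_pca_face_model(blendshapes, n_components=50)
# neutral_guess = evaluate_face_model(mean_face, components, np.zeros(50))
```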
  • when producing an animation offline or driving an animation in real time, it is necessary to reconstruct a three-dimensional face corresponding to the expression from the actor's facial image, that is, to perform step S102.
  • step S102 may include reconstructing the actor's three-dimensional human face based on the preset three-dimensional face model and the actor's facial image
  • step S102 may further include reconstructing the actor's three-dimensional gaze based on the preset three-dimensional gaze model and the actor's facial image.
  • the step S102 may include the following steps:
  • Step S1021 detecting the facial image to obtain at least a plurality of two-dimensional facial feature points
  • Step S1022 generating an estimated three-dimensional face according to the preset three-dimensional face model
  • Step S1023 extracting a plurality of estimated three-dimensional feature points from the estimated three-dimensional face
  • Step S1024 projecting the plurality of estimated three-dimensional feature points to a two-dimensional plane to obtain a plurality of two-dimensional projection points;
  • Step S1025 calculate the coordinate difference between the multiple two-dimensional face feature points and the multiple two-dimensional projection points
  • Step S1026 if the coordinate difference is less than a preset threshold, determine the estimated three-dimensional face as the reconstructed three-dimensional face of the actor.
  • a plurality of two-dimensional feature points can be obtained by detecting the facial image, wherein the two-dimensional feature points include two-dimensional face feature points and two-dimensional pupil feature points.
  • extracting a plurality of 3D feature points from the 3D face includes: predetermining the vertex indices of the 3D face that respectively correspond to the plurality of 2D face feature points, and extracting, according to these vertex indices, the corresponding vertices of the 3D face as the plurality of 3D feature points.
  • the estimated three-dimensional feature points are extracted from the estimated three-dimensional face.
  • Actors need to wear a helmet when performing, and a camera will be fixed on the helmet.
  • the camera will record the facial images of the actors during performance.
  • the camera may be a head-mounted RGB (R for red, G for green, B for blue) camera, or an RGBD (D for depth map) camera.
  • the specific implementation of step S102 shown in FIG. 3 is described in detail below by taking one of the frames as an example.
  • a machine learning method may be used to detect the facial image, so as to detect two-dimensional feature points and corresponding semantic information therein.
  • the semantic information is used to describe the face position corresponding to the two-dimensional feature point.
  • the semantic information of each 2D feature point is predefined, for example, the 2D feature point No. 64 represents the tip of the nose.
  • 73 two-dimensional face feature points and 6 two-dimensional pupil feature points, together with the semantic information of each two-dimensional feature point, are detected in each facial image.
  • numbers 0-14 are two-dimensional facial contour points
  • 15-72 are two-dimensional facial feature points
  • 73-78 are two-dimensional pupil feature points.
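  • As an illustration of the numbering described above, the detected landmark array could be partitioned by index range as in the following sketch; the detector itself is not shown, and `landmarks_2d` is an assumed array of 79 (x, y) points rather than something defined in the patent.

```python
import numpy as np

# Assumed output of the 2D landmark detector: 79 points, indexed as described above.
landmarks_2d = np.zeros((79, 2))         # placeholder; a real detector would fill this in

contour_points = landmarks_2d[0:15]      # indices 0-14: 2D facial contour points
inner_face_points = landmarks_2d[15:73]  # indices 15-72: 2D facial feature points
pupil_points = landmarks_2d[73:79]       # indices 73-78: 2D pupil feature points (3 per eye)

face_points = landmarks_2d[:73]          # the 73 2D face feature points used for fitting
```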
  • the machine learning model may include a model constructed based on a convolutional neural network (Convolution Neural Network, CNN for short), or an active appearance model (Active appearance model).
  • the present embodiment automatically detects the two-dimensional feature points for each of the actor's facial images; that is, this step is completely automated, which greatly improves the efficiency of animation production.
  • each 2D face feature point may correspond to a vertex index of a 3D face.
  • the vertex index of the 3D face corresponding to the 2D face feature point No. 64 (that is, the tip of the nose) is 3780.
  • the vertex indices of the 73 three-dimensional faces corresponding to the 73 two-dimensional face feature points may be predetermined.
  • 2D pupil feature points are used for subsequent eye reconstruction, wherein there are three 2D pupil feature points for each of the left and right eyes, including one pupil center feature point and two 2D pupil edge feature points.
  • extracting the plurality of three-dimensional feature points from the three-dimensional face includes: predetermining the 73 vertex indices of the three-dimensional face that respectively correspond to the 73 two-dimensional face feature points, and extracting, according to these vertex indices, the 73 corresponding vertices of the three-dimensional face as the 73 three-dimensional feature points.
  • step S1022 according to the input weights and the preset 3D face model established in the step S202, an estimated 3D face corresponding to the expressions made by the actor in the current facial image can be obtained.
  • the coordinate locations of the 73 vertices of the estimated three-dimensional face corresponding to the expression made by the actor in the current facial image can be obtained; that is, 73 estimated three-dimensional feature points are extracted from the estimated three-dimensional face.
  • the above 73 estimated 3D feature points can be projected onto the facial image according to formula (2) to obtain 73 two-dimensional projection points, and the coordinate difference between the two-dimensional projection points and the two-dimensional face feature points detected in step S1021 is calculated:
  • E(α_1, ..., α_n, R, t) = Σ_i || π(R · M(α_1, ..., α_n)_{v_i} + t) - p_i ||    (2)
  • where M(α_1, ..., α_n) is the output result of the preset three-dimensional face model described in formula (1); p_i is the i-th two-dimensional face feature point detected in step S1021; v_i is the vertex index of the three-dimensional face corresponding to the i-th two-dimensional face feature point; R is the rotation matrix of the actor's face relative to the camera; t is the translation vector of the actor's face relative to the camera; π(·) is the perspective projection function, which projects three-dimensional vertices into two-dimensional points and uses the camera intrinsic parameters obtained by camera calibration; and π(R · M(α_1, ..., α_n)_{v_i} + t) denotes the two-dimensional projection point corresponding to the i-th two-dimensional face feature point.
  • the Euclidean distance is used to measure the above coordinate difference.
  • based on the coordinate difference calculated by formula (2), it can be determined whether the estimated three-dimensional face obtained with the current input weights fits the expression made by the actor in the facial image.
  • if the coordinate difference is smaller than the preset threshold, the estimated three-dimensional face is determined as the actor's three-dimensional face.
  • otherwise, the preset three-dimensional face model and the camera extrinsic parameters can be iteratively adjusted, and steps S1022 to S1025 are repeatedly performed to iteratively calculate the coordinate difference, until the coordinate difference between the plurality of two-dimensional projection points obtained based on the adjusted preset three-dimensional face model and the plurality of two-dimensional face feature points is smaller than the preset threshold. At this point, an estimated three-dimensional face that best fits the expression made by the actor in the facial image is obtained. Different estimated 3D faces correspond to different expressions.
  • the output result of the preset 3D face model is associated with the input weight, and accordingly, the input weight can be iteratively adjusted to obtain different output results of the preset 3D face model, different output results corresponding to different expressions.
  • the output result of the preset three-dimensional face model is a three-dimensional face. Different three-dimensional faces correspond to different expressions.
  • the input weights can be iteratively adjusted from zero at the beginning, that is, iteratively adjusted from the expressionless.
  • the camera extrinsic parameters include the relative position and orientation (R, t) between the actor's face and the image capture device that captures the face image.
  • the image acquisition device includes a camera.
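  • A minimal sketch of the fitting described in steps S1022 to S1026 is given below, assuming a pinhole camera and the PCA model evaluation from the earlier sketch; it jointly refines the input weights and the extrinsic parameters (R, t) by minimizing the reprojection error of formula (2) with SciPy's least-squares solver. All function and variable names are illustrative, not taken from the patent.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points_3d, rvec, tvec, K):
    """Perspective projection of 3D points given a rotation vector, translation and intrinsics K."""
    cam = points_3d @ Rotation.from_rotvec(rvec).as_matrix().T + tvec
    uv = cam[:, :2] / cam[:, 2:3]
    return uv @ K[:2, :2].T + K[:2, 2]

def fit_face(mean_face, components, vertex_ids, landmarks_2d, K, n_weights):
    """Iteratively adjust input weights and camera extrinsics until the
    reprojection error against the detected 2D face feature points is small."""
    def residuals(params):
        weights = params[:n_weights]
        rvec, tvec = params[n_weights:n_weights + 3], params[n_weights + 3:]
        face = (mean_face + weights @ components).reshape(-1, 3)
        pred = project(face[vertex_ids], rvec, tvec, K)   # 2D projection points
        return (pred - landmarks_2d).ravel()              # per-landmark coordinate differences

    x0 = np.zeros(n_weights + 6)                          # start from the neutral expression
    x0[-1] = 1.0                                          # place the face in front of the camera
    result = least_squares(residuals, x0)
    weights = result.x[:n_weights]
    fitted_face = (mean_face + weights @ components).reshape(-1, 3)
    return fitted_face, result.x[n_weights:]              # 3D face and extrinsics (rvec, tvec)
```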
  • the three-dimensional gaze of the actor is reconstructed according to the preset three-dimensional gaze model and the facial image.
  • the three-dimensional gaze model may be a model established according to the reconstructed three-dimensional face and the camera extrinsic parameters (R, t), or it may be an artificial neural network prediction model.
  • specifically, the three-dimensional gaze can be reconstructed according to the reconstructed three-dimensional face, the camera extrinsic parameters (R, t), and the six two-dimensional pupil feature points detected from the facial image.
  • alternatively, the three-dimensional gaze can also be directly predicted from the facial image using an artificial neural network.
  • the reconstructed three-dimensional face of the actor is obtained by merging the reconstructed three-dimensional human face with the eyeballs having the reconstructed three-dimensional gaze.
  • the process of establishing the mapping relationship between the three-dimensional feature points and the animation data in step S104 may include the following steps: acquiring training data, where the training data includes a plurality of three-dimensional feature points and animation data corresponding to each of a plurality of training frames, the training frames being facial images of the actor when making different expressions; and establishing the mapping relationship between the three-dimensional feature points and the animation data based on the training data.
  • the plurality of three-dimensional feature points corresponding to each training frame may be obtained by executing the above steps S101 to S103; that is, each training frame is used as the image frame to be processed, and the above steps S101 to S103 are executed to obtain the corresponding plurality of three-dimensional feature points.
  • the three-dimensional coordinates of the centers of the left and right pupils are obtained as the three-dimensional pupil center feature points of the left and right eyes.
  • the plurality of three-dimensional feature points may include 73 three-dimensional feature points representing human faces and 2 three-dimensional pupil center feature points representing eyes.
  • two three-dimensional pupil center feature points are selected from the eyeballs having the reconstructed three-dimensional eye gaze to represent the gaze directions of the left and right eyes.
  • the method for extracting the 73 three-dimensional feature points representing the face can be performed according to the method of step S1023.
  • the multi-frame training frame may be selected from a single video, and the video may be filmed when the actor performs according to a preset script.
  • the multi-frame training frame is an image frame with the largest difference in the feature information of the three-dimensional feature points among all the image frames included in the video.
  • image frames to be processed may be selected from image frames other than training frames in the video.
  • This embodiment is suitable for offline facial animation production scenarios, and can greatly improve offline production efficiency.
  • Taking the offline production of facial animation for a 1000-frame video as an example, about 30 training frames and their animation data can be selected from the 1000 frames as training data, and the mapping relationship between the three-dimensional feature points and the animation data can be obtained by training. The animation data of the remaining 970 frames can then be predicted directly based on the mapping relationship obtained by the training, instead of having an animator carefully and manually produce the animation of all 1000 frames as in the traditional production process.
  • a performance video of actor A may be recorded.
  • step S102 is performed to reconstruct the corresponding 3D face
  • step S103 is performed to extract the 3D feature points of each frame of the 3D face.
  • this video has a total of 1000 frames.
  • the farthest point sampling algorithm is used to sample 30 frames with the largest difference (ie, the largest difference in feature information) as training frames.
  • the large difference in feature information refers to a relatively large difference in the positions of the corresponding three-dimensional feature points of the two frames of three-dimensional faces. That is, the most representative 30 frames are selected from the 1000 frames as the training frames, and the most representative refers to the largest and most prominent expression difference.
  • other methods can be used to obtain training frames, such as cluster sampling, stratified sampling, random sampling.
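  • A greedy farthest point sampling over per-frame feature vectors could look like the following sketch; here each frame is represented by its flattened 3D feature point coordinates, and Euclidean distance between these vectors stands in for the "difference in feature information". The helper name `farthest_point_sampling` and the data shapes are assumptions for illustration.

```python
import numpy as np

def farthest_point_sampling(frame_features: np.ndarray, num_samples: int):
    """Greedily pick frames whose 3D feature points differ most from the already-selected set.

    frame_features: array of shape (num_frames, feature_dim), e.g. the flattened
    (75, 3) feature points of each frame, giving feature_dim = 225.
    """
    selected = [0]                                   # seed with the first frame
    min_dist = np.linalg.norm(frame_features - frame_features[0], axis=1)
    for _ in range(num_samples - 1):
        next_idx = int(np.argmax(min_dist))          # frame farthest from all selected frames
        selected.append(next_idx)
        dist_to_new = np.linalg.norm(frame_features - frame_features[next_idx], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected

# e.g. training_frame_ids = farthest_point_sampling(features_per_frame, num_samples=30)
```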
  • regarding the animation data, there is a facial rig for the virtual character, which contains the controllers used by the animator.
  • the expression of the avatar can be adjusted by adjusting the attribute value of the controller.
  • Animation data refers to controller data.
  • 30 frames of three-dimensional feature points and corresponding animation data can be obtained, and these data form training data as the training basis for predicting the animation data corresponding to the remaining 970 frames.
  • in the training data, there are 30 frames of facial images, 30 frames of corresponding 3D facial feature point data, and 30 frames of corresponding animation data.
  • each frame of training data includes the three-dimensional feature point data of the three-dimensional face of the frame and the animation data of the frame.
  • the radial basis function (RBF) algorithm is used to establish the mapping relationship between the three-dimensional feature points of the three-dimensional face and the animation data. For example, use the training data of these 30 frames to train the RBF algorithm model to obtain the RBF weight parameters.
  • the RBF weight parameter can describe the above mapping relationship.
  • algorithms such as linear regression can also be used to establish the mapping relationship.
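  • Using SciPy's RBFInterpolator, a sketch of such a mapping could look as follows: the flattened 3D feature points of the training frames are the inputs, the corresponding controller (animation) values are the outputs, and the fitted interpolator predicts animation data for new frames. The shapes and function names are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Assumed training data: 30 frames of flattened 3D feature points and controller values.
# train_features: (30, 225)  -> 75 3D feature points per frame (73 face + 2 pupil centers)
# train_anim:     (30, num_controllers) -> animation (controller) data made for those frames

def fit_feature_to_anim_mapping(train_features: np.ndarray, train_anim: np.ndarray):
    # The interpolator's learned coefficients play the role of the "RBF weight parameters"
    # that describe the mapping between 3D feature points and animation data.
    return RBFInterpolator(train_features, train_anim, kernel="thin_plate_spline")

def predict_animation(mapping: RBFInterpolator, frame_features: np.ndarray) -> np.ndarray:
    """Predict controller values for the remaining (non-training) frames."""
    return mapping(frame_features)
```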
  • the training data may be adjusted according to expression similarity, where the expression similarity is the similarity between the expression made by the actor in the image frame to be processed and the expression of the virtual object face generated based on that image frame.
  • the training data may be a result of feedback adjustment based on the expression similarity.
  • for example, all 30 training frames need not be selected at the beginning. Instead, a dozen or so frames may first be selected for training, and the mapping relationship obtained by training is used to predict the animation data of the remaining frames. If the predicted animation quality is good enough, the adjustment stops; if not, additional image frames are selected from the video as training frames.
  • the user may specify the number of training frames (ie, the number of frames), and then the terminal executing this specific implementation may select the number of training frames specified by the user.
  • the user can delete or add the currently selected training frame, and can also designate any frame in the current video as the training frame.
  • after the animation data of the remaining frames is predicted by using the mapping relationship established based on the training data, the training data used for the production (i.e., the aforementioned 30 frames of 3D feature points and 30 frames of animation data) can be exported.
  • these training data can then be reused in real-time driving scenarios.
  • the multi-frame training frames may be obtained from a plurality of videos, and the plurality of videos are shot when the actor performs according to a preset script.
  • the image frame to be processed may be a facial image of the actor obtained by real-time shooting.
  • This embodiment is suitable for real-time driving application scenarios, and can drive virtual character faces of various precisions in real time.
  • the training data needs to be prepared in advance.
  • the quality of training data has a great influence on the final real-time driving effect, so there are high requirements for the production of training data.
  • training data preparation process is as follows:
  • the specified content may include a general expression, such as smile, surprise, shock, contempt, and the like.
  • the specified content may also include basic expressions, such as eyebrow lowering, nostril constriction, and the like.
  • the specified content may also include articulation and vocalization, such as the actor's expressions when pronouncing sounds such as the Chinese pinyin a, o, e, i, u, and so on.
  • the specified content may also include the reading of text, for example, a specified actor reads and records one or several text segments, and the segment is selected in advance.
  • animation data corresponding to each frame of facial image in these performance videos can be produced by using the above-mentioned offline facial animation production process.
  • export these animation data as training data.
  • the acquired training data may be the result of the expression similarity feedback adjustment used in the offline facial animation production process, so as to obtain better training data.
  • a portion of the image frames may be selected as training frames from the performance video in which the actor records the specified content and from each of the videos recorded while the actor performs according to the preset script.
  • the training data (including the three-dimensional feature points and animation data in the training frame) is adjusted by the method of expression similarity feedback adjustment adopted in the above-mentioned offline facial animation production process to obtain adjusted training data.
  • the adjusted training data are superimposed together as the training data for training. This ensures that the coverage of the training data is wider and can cover most expressions.
  • an RBF algorithm model can be trained based on the training data, so as to obtain an RBF weight parameter to describe the mapping relationship.
  • in the real-time driving scenario, for each facial image captured in real time, the three-dimensional feature points are extracted as in step S103; the RBF algorithm model trained on the training data is used to predict the animation data in real time; the character rig is used to convert the animation data in real time into a data form that UE or Unity3d can receive (such as blend shape weights and bone data); and the converted data is sent to UE or Unity3d in real time, thereby driving the virtual object face in real time.
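  • A high-level sketch of this real-time loop, with every component (capture, reconstruction, rig conversion, engine transport) reduced to an injected placeholder, might look like the following; none of these callables are named in the patent or in the UE/Unity3d APIs, they only mark where those pieces would plug in.

```python
def realtime_drive(camera, reconstruct_face, extract_points, mapping,
                   convert_for_engine, send_to_engine):
    """One pass per captured frame: reconstruct -> extract -> predict -> convert -> send.

    All six arguments are hypothetical callables/objects standing in for the real
    capture device, the 3D face reconstruction, the 3D feature point extraction,
    the trained RBF mapping, the character rig conversion, and the UE/Unity3d transport.
    """
    while camera.is_open():
        frame = camera.read()                      # facial image captured in real time
        face_3d = reconstruct_face(frame)          # 3D face fitted to the current expression
        features = extract_points(face_3d)         # flattened 3D feature points of this frame
        anim_data = mapping(features[None, :])[0]  # predict controller (animation) data
        payload = convert_for_engine(anim_data)    # e.g. blend shape weights and bone data
        send_to_engine(payload)                    # drive the virtual object face in real time
```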
  • the difference is that the prediction objects in offline production are the remaining frames of a single video that were not selected as training frames, whereas the prediction objects in real-time driving are video data received in real time, which are not image frames of the videos used for training.
  • the training data can be universal, thereby enriching the training samples.
  • the RBF algorithm model obtained by training can express the mapping relationship between the three-dimensional feature points and animation data under enough kinds of expressions
  • the RBF algorithm model can also be a general model and suitable for different videos.
  • the offline-produced prediction object may also be an image frame in the newly acquired video data.
  • in the marker-based face capture method, a number of points are marked on the actor's face to capture the face and obtain the facial information; in the markerless method, no markers are placed on the actor's face, and an algorithm is used to extract information directly from the actor's face to capture the face and obtain the facial information.
  • a single camera or multiple cameras can be used to capture the face.
  • a single camera is light and easy to wear, and it can also achieve the result of multiple cameras. Multiple cameras can capture face data from multiple angles. For capture devices, RGB cameras and/or RGBD cameras may be employed.
  • the present embodiment can provide an efficient and high-precision virtual object face animation generation solution, and the generated virtual object face has a high similarity in expression with the actor's real face.
  • the quality of 3D facial reconstruction is improved based on a preset 3D facial model.
  • the animation data can be more accurately predicted, and finally a high-quality virtual object face can be obtained.
  • the overall production efficiency of the virtual object face generated by this embodiment is high. Further, due to the adoption of 3D facial reconstruction technology, the requirement for actors to wear helmets can be relaxed, and there is no need to strictly return to the helmet wearing position when facial data was last captured.
  • FIG. 4 is a schematic structural diagram of an apparatus for generating a facial animation of a virtual object according to an embodiment of the present invention.
  • the virtual object facial animation generating apparatus 4 in this embodiment can be used to implement the method and technical solutions described in the embodiments described in FIG. 1 to FIG. 3 above.
  • the virtual object facial animation generation device 4 in this embodiment may include: a receiving module 41 for receiving an image frame to be processed, the image frame including an actor's facial image; a reconstruction module 42 for reconstructing the actor's three-dimensional face based on the preset three-dimensional facial model and the facial image, where the preset three-dimensional facial model is used to describe the actor's facial expression changes; an extraction module 43 for extracting a plurality of three-dimensional feature points from the three-dimensional face; a determining module 44 for determining, based on the mapping relationship between three-dimensional feature points and animation data, the animation data corresponding to the plurality of three-dimensional feature points; and a generating module 45 for generating a corresponding expression of the virtual object face based on the animation data, where the generated expression of the virtual object face is consistent with the expression made by the actor in the facial image.
  • the preset three-dimensional facial model includes a preset three-dimensional face model and a preset three-dimensional gaze model
  • the three-dimensional face of the actor includes the actor's three-dimensional human face and the actor's three-dimensional gaze, wherein the actor's three-dimensional human face is reconstructed based on the preset three-dimensional face model and the facial image, and the actor's three-dimensional gaze is reconstructed based on the preset three-dimensional gaze model and the facial image.
  • the reconstruction module 42 may include: a first detection unit for detecting the facial image to obtain at least a plurality of two-dimensional face feature points; a first generation unit for generating an estimated three-dimensional face according to the preset three-dimensional face model; a first extraction unit for extracting a plurality of estimated three-dimensional feature points from the estimated three-dimensional face; a projection unit for projecting the plurality of estimated three-dimensional feature points onto a two-dimensional plane to obtain a plurality of two-dimensional projection points; a first calculation unit for calculating the coordinate difference between the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points; and a first determination unit for determining, if the coordinate difference is less than a preset threshold, the estimated three-dimensional face as the reconstructed three-dimensional face of the actor.
  • the two-dimensional face feature points have corresponding semantic information
  • the two-dimensional projection points have corresponding semantic information
  • the first calculation unit includes: a second calculation unit for calculating, among the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points, the coordinate difference between each two-dimensional face feature point and the two-dimensional projection point corresponding to the same semantic information; and a second determination unit for determining the sum of the calculated coordinate differences as the coordinate difference between the plurality of two-dimensional face feature points and the plurality of two-dimensional projection points.
  • the reconstruction module 42 may further include: an iterative adjustment unit for, if the coordinate difference is greater than the preset threshold, iteratively adjusting the preset 3D face model and the camera extrinsic parameters until the coordinate difference between the plurality of two-dimensional projection points obtained based on the adjusted preset 3D face model and the plurality of two-dimensional face feature points is smaller than the preset threshold.
  • the output result of the preset three-dimensional face model is associated with the input weight
  • the iterative adjustment unit includes: an input weight adjustment unit configured to iteratively adjust the input weights to obtain different output results of the preset three-dimensional face model, where different output results correspond to different expressions.
  • the camera external parameters include the relative position and orientation between the actor's face and the image capturing device that captures the face image.
  • the virtual object facial animation generating device 4 further includes: a first establishing module for establishing the preset three-dimensional facial model.
  • the first establishment module includes: a first acquisition unit for acquiring a mixed shape model group of the actor, where the mixed shape model group includes multiple mixed shape models and is used to describe multiple expressions; and an analysis unit for performing principal component analysis on the mixed shape model group to obtain the preset three-dimensional face model.
  • the plurality of expressions include at least neutral expressions
  • the set of blend shape models includes at least one blend shape model describing the neutral expressions.
  • the virtual object facial animation generation device 4 further includes: a second establishment module for establishing a mapping relationship between the three-dimensional feature points and animation data.
  • the second establishment module includes: a second acquisition unit for acquiring training data, where the training data includes a plurality of three-dimensional feature points and animation data corresponding to each of a plurality of training frames, the training frames being facial images captured when the actor makes different expressions; and an establishment unit for establishing the mapping relationship between the three-dimensional feature points and the animation data based on the training data.
  • the multi-frame training frame is selected from a single video, and the multi-frame training frame is an image frame with the largest difference in feature information of corresponding three-dimensional feature points among all image frames included in the video.
  • image frames to be processed are selected from image frames other than training frames in the video.
  • the training data is adjusted according to expression similarity, where the expression similarity is the similarity between the expression made by the actor in the image frame to be processed and the expression of the virtual object face generated based on that image frame.
  • the multi-frame training frames are obtained from a plurality of videos, and the plurality of videos are captured when the actor performs according to a preset script.
  • the to-be-processed image frame is a facial image of the actor obtained by real-time shooting.
  • the virtual object facial animation generating apparatus 4 may be integrated into computing devices such as terminals and servers.
  • the virtual object facial animation generating apparatus 4 may be centrally integrated in the same server.
  • the virtual object facial animation generating apparatus 4 may be integrated in a plurality of terminals or servers dispersedly and coupled to each other.
  • the preset three-dimensional face model can be separately set on the terminal or the server to ensure a better data processing speed.
  • the user inputs the image to be processed at the receiving module 41, and the corresponding expression of the virtual object face can be obtained at the output of the generating module 45, thereby realizing face capture of the actor.
  • an embodiment of the present invention also discloses a storage medium on which a computer program is stored, and when the computer program is run by a processor, the method and technical solutions described in the embodiments shown in FIG. 1 to FIG. 3 are executed.
  • the storage medium may include a computer-readable storage medium such as a non-volatile memory or a non-transitory memory.
  • the storage medium may include ROM, RAM, magnetic or optical disks, and the like.
  • an embodiment of the present invention also discloses a terminal, including a memory and a processor, where the memory stores a computer program that can run on the processor, and the processor, when running the computer program, executes the technical solutions of the methods described in the embodiments shown in FIG. 1 to FIG. 3.
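The module chain referenced above (receiving module 41 through generating module 45) can be summarized as a single per-frame function. The following is a minimal, hypothetical Python sketch; the three callables stand in for the reconstruction, extraction and mapping components, whose concrete implementations are not specified here.

```python
# Hypothetical sketch of the per-frame pipeline: receive -> reconstruct -> extract -> map.
# The callables are placeholders for the modules described above, not a disclosed API.
from typing import Callable
import numpy as np

def generate_animation_frame(
    image: np.ndarray,
    reconstruct_face: Callable[[np.ndarray], np.ndarray],   # facial image -> 3D face mesh vertices
    extract_points: Callable[[np.ndarray], np.ndarray],     # mesh vertices -> 3D feature points
    predict_animation: Callable[[np.ndarray], np.ndarray],  # feature points -> controller (animation) data
) -> np.ndarray:
    """One pass through the module chain for a single image frame of the actor."""
    mesh = reconstruct_face(image)        # reconstruction module
    points = extract_points(mesh)         # extraction module
    return predict_animation(points)      # determination / generation modules
```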

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

一种虚拟对象面部动画生成方法及装置、存储介质、终端,所述方法包括:接收待处理的图像帧,所述图像帧包括演员的面部图像;基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;从所述三维面部中提取得到多个三维特征点;基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。本发明方案能够提供一种高效且高精度的虚拟对象面部动画生成方案,生成的虚拟对象面部与演员真实面部的表情相似度高。

Description

虚拟对象面部动画生成方法及装置、存储介质、终端
本申请要求2020年12月31日提交中国专利局、申请号为202011639440.3、发明名称为“虚拟对象面部动画生成方法及装置、存储介质、终端”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及虚拟数字对象技术领域,具体地涉及一种虚拟对象面部动画生成方法及装置、存储介质、终端。
背景技术
面部捕捉(简称面捕)动画技术主要包括离线面部动画制作和实时驱动虚拟角色的面部表情这两大部分。
在传统动画制作流程里,为了制作出高质量的面部动画,不仅需要动画师有较高的制作能力,而且非常耗时。除此之外,在传统动画制作流程里是无法实时驱动虚拟角色的面部表情的。
近些年出现的人脸表演动画技术(即前述面捕动画技术),通过采用捕捉到的演员面部信息去加速离线动画的制作,以期减少离线面部动画制作的人力成本。同时,面捕动画技术的应用使得实时驱动虚拟角色的面部表情成为可能。
但是,现有的面捕动画技术仍然存在制作耗时、人力成本高、生成的虚拟对象面部质量低等诸多问题。
发明内容
本发明解决的技术问题是提供一种高效且高精度的虚拟对象面部动画生成方案。
为解决上述技术问题,本发明实施例提供一种虚拟对象面部动画 生成方法,包括:接收待处理的图像帧,所述图像帧包括演员的面部图像;基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;从所述三维面部中提取得到多个三维特征点;基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
所述预设三维面部模型包括预设三维人脸模型和预设三维眼神模型,所述演员的三维面部包括演员的三维人脸以及演员的三维眼神,其中,所述演员的三维人脸是基于所述预设三维人脸模型以及所述面部图像重建得到的,所述演员的三维眼神是基于所述预设三维眼神模型以及所述面部图像重建得到的。
可选的,基于所述预设三维人脸模型以及所述面部图像重建所述演员的三维人脸的过程包括如下步骤:检测所述面部图像以至少得到多个二维人脸特征点;根据所述预设三维人脸模型生成估算三维人脸;从所述估算三维人脸中提取得到多个估算三维特征点;将所述多个估算三维特征点投影至二维平面,以得到多个二维投影点;计算所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异;如果所述坐标差异小于预设阈值,则将所述估算三维人脸确定为重建得到的所述演员的三维人脸。
可选的,所述二维人脸特征点具有对应的语义信息,所述二维投影点具有对应的语义信息,所述计算所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异包括:分别计算所述多个二维人脸特征点和多个二维投影点中,对应相同语义信息的二维人脸特征点与二维投影点之间的坐标差异;将计算得到的多个坐标差异之和确定为所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异。
可选的,基于所述预设三维人脸模型以及所述面部图像重建所述 演员的三维人脸的过程还包括如下步骤:如果所述坐标差异大于预设阈值,则迭代调整所述预设三维人脸模型和相机外参,直至基于调整后的预设三维人脸模型得到的多个二维投影点与所述多个二维人脸特征点之间的坐标差异小于所述预设阈值。
可选的,所述预设三维人脸模型的输出结果与输入权重相关联,所述迭代调整所述预设三维人脸模型包括:迭代调整所述输入权重,以得到所述预设三维人脸模型的不同的输出结果,不同的输出结果对应不同的表情。
可选的,所述相机外参包括所述演员的面部与拍摄所述面部图像的影像采集设备之间的相对位置与朝向。
可选的,所述预设三维人脸模型的建立过程包括如下步骤:获取所述演员的混合形状模型组,所述混合形状模型组包括多个混合形状模型并用于描述多个表情;对所述混合形状模型组进行主成分分析,以得到所述预设三维人脸模型。
可选的,所述多个表情至少包括中性表情,所述混合形状模型组至少包括一个描述所述中性表情的混合形状模型。
可选的,所述三维特征点与动画数据之间映射关系的建立过程包括如下步骤:获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;基于所述训练数据建立所述三维特征点与动画数据之间的映射关系。
可选的,所述多帧训练帧选取自单个视频,并且,所述多帧训练帧是所述视频包括的所有图像帧中对应的三维特征点的特征信息差异最大的图像帧。
可选的,所述待处理的图像帧选取自所述视频中除训练帧之外的图像帧。
可选的,所述训练数据根据表情相似度调节,所述表情相似度为 所述演员在所述待处理的图像帧中做出的表情与基于所述待处理的图像帧生成的虚拟对象面部的表情之间的相似度。
可选的,所述多帧训练帧获取自多个视频,并且所述多个视频是所述演员按照预设脚本表演时拍摄得到的。
可选的,所述待处理的图像帧为实时拍摄得到的所述演员的面部图像。
为解决上述技术问题,本发明实施例还提供一种虚拟对象面部动画生成装置,包括:接收模块,用于接收待处理的图像帧,所述图像帧包括演员的面部图像;重建模块,用于基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;提取模块,用于从所述三维面部中提取得到多个三维特征点;确定模块,用于基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;生成模块,用于基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
为解决上述技术问题,本发明实施例还提供一种存储介质,其上存储有计算机程序,所述计算机程序被处理器运行时执行上述方法的步骤。
为解决上述技术问题,本发明实施例还提供一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,所述处理器运行所述计算机程序时执行上述方法的步骤。
与现有技术相比,本发明实施例的技术方案具有以下有益效果:
本发明实施例提供一种虚拟对象面部动画生成方法,包括:接收待处理的图像帧,所述图像帧包括演员的面部图像;基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;从所述三维面部中提 取得到多个三维特征点;基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
较之现有的面捕技术方案,本实施方案能够提供一种高效且高精度的虚拟对象面部动画生成方案,生成的虚拟对象面部与演员真实面部的表情相似度高。具体而言,基于预设三维面部模型提高三维面部重建的质量。进一步,由于准确地重建了演员的三维面部,从而能够更为准确地预测动画数据,最终得到高质量的虚拟对象面部。并且,采用本实施方案生成虚拟对象面部的整体制作效率高。进一步,由于采用三维面部重建技术,能够放松对演员佩戴头盔的要求,不需严格恢复到上次捕捉面部数据时的头盔佩戴位置。
进一步,所述预设三维面部模型包括预设三维人脸模型和预设三维眼神模型,所述演员的三维面部包括演员的三维人脸以及演员的三维眼神,其中,所述演员的三维人脸是基于所述预设三维人脸模型以及所述面部图像重建得到的,所述演员的三维眼神是基于所述预设三维眼神模型以及所述面部图像重建得到的。
进一步,所述预设三维人脸模型的建立过程包括如下步骤:获取所述演员的混合形状模型组,所述混合形状模型组包括多个混合形状模型并用于所述多个表情;对所述混合形状模型组进行主成分分析,以得到所述预设三维人脸模型。由此,基于混合形状模型获得精度非常高的该演员的主成分分析模型作为预设三维人脸模型。由于预设三维人脸模型的质量足够高,因而可以在生成虚拟对象面部生成时重建出高精度的演员的三维面部作为动画数据的映射基础。
进一步,对于输入的面部图像,使用机器学习模型自动检测所述面部图像以得到多个二维特征点。相比于现有技术需要动画师手动从演员的面部图像中标注出二维特征点,本实施方案针对每张演员的面部图片都是自动检测出二维特征点,也即该步骤是完全自动化的,大 幅提升了动画制作效率。
进一步,所述三维特征点与动画数据之间映射关系的建立过程包括如下步骤:获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;基于所述训练数据建立所述三维特征点与动画数据之间的映射关系。进一步,所述多帧训练帧选取自单个视频,并且,所述多帧训练帧是所述视频包括的所有图像帧中对应的三维特征点的特征信息差异最大的图像帧。进一步,所述待处理的图像帧选取自所述视频中除训练帧之外的图像帧。本实施方案适用于离线面部动画制作场景,能够大幅提高离线制作效率。以针对一段1000帧的视频制作离线面部动画为例,基于本实施方案,可以从1000帧中选取30帧左右的训练帧及其动画数据作为训练数据,并训练得到三维特征点与动画数据之间的映射关系。至于1000帧中剩余的970帧则可以直接基于前述训练得到的映射关系进行预测,以得到对应的动画数据,而无需如传统制作流程一般整整1000帧的动画都需要动画师手动精心制作得到对应的动画数据。
进一步,所述三维特征点与动画数据之间映射关系的建立过程包括如下步骤:获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;基于所述训练数据建立所述三维特征点与动画数据之间的映射关系。进一步,所述多帧训练帧获取自多个视频,并且所述多个视频是所述演员按照预设脚本表演时拍摄得到的。进一步,所述待处理的图像帧为实时拍摄得到的所述演员的面部图像。本实施方案适用于实时驱动应用场景,能够实时驱动各种精度的虚拟角色面部。
附图说明
图1是本发明实施例一种虚拟对象面部动画生成方法的流程图;
图2是本发明实施例一种预设三维人脸模型的建立过程流程图;
图3是图1中步骤S102的一个具体实施方式的流程图;
图4是本发明实施例一种虚拟对象面部动画生成装置的结构示意图。
具体实施方式
如背景技术所言,现有的面捕动画技术仍然存在制作耗时、人力成本高、生成的虚拟对象面部动画质量低等诸多问题。
本申请发明人经过分析发现,现有的人脸表演动画技术(即面捕技术)主要分为两大类:一种是基于二维(two-dimension,简称2D)人脸特征点,一种是基于三维(three-dimension,简称3D)人脸重建。
基于二维人脸特征点的这类技术,很多都需要动画师针对捕捉到的面捕视频手动标注二维特征点。这一步骤是非常耗时的,而且手动标注的质量好坏对最终结果影响很大。而且这类技术对演员佩戴头盔要求极高,不但要佩戴稳定,与上一次捕捉数据时佩戴的位置也要尽可能接近。
基于三维人脸重建的这类技术,需要先根据捕捉到的人脸画面重建出三维人脸,然后将三维人脸信息重定向成虚拟角色面部的动画数据。现有这类技术一般重建出的三维人脸质量不高。而且,现有这类技术是将计算得到的混合形状(blendshape)权重直接传递给虚拟角色的混合形状权重,这种做法只能针对比较简单的角色绑定起作用,且效果有限。
综上所述,现有面捕技术所通常采用两种技术手段均存在诸多缺陷,无法高效地生成高质量的虚拟对象面部。
为解决上述技术问题,本发明实施例提供一种虚拟对象面部动画生成方法,包括:接收待处理的图像帧,所述图像帧包括演员的面部图像;基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;从所述三维面部中提取得到多个三维特征点;基于三维特征点与 动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
本实施方案能够提供一种高效且高精度的虚拟对象面部动画生成方案,生成的虚拟对象面部与演员真实面部的表情相似度高。具体而言,基于预设三维面部模型提高三维面部重建的质量。进一步,由于准确地重建了演员的三维面部,从而能够更为准确地预测动画数据,最终得到高质量的虚拟对象面部。并且,采用本实施方案生成虚拟对象面部的整体制作效率高。进一步,由于采用三维面部重建技术,能够放松对演员佩戴头盔的要求,不需严格恢复到上次捕捉面部数据时的头盔佩戴位置。
为使本发明的上述目的、特征和有益效果能够更为明显易懂,下面结合附图对本发明的具体实施例做详细的说明。
图1是本发明实施例一种虚拟对象面部动画生成方法的流程图。
本实施方案可以应用于虚拟数字对象生成、动画制作等应用场景,如应用于虚拟对象面部的动画生成场景。采用本实施方案,能够基于面捕技术将演员的面部表情重定向到虚拟对象面部,使得虚拟对象呈现的面部表情与演员做出的表情相一致。
虚拟对象可以包括虚拟人,也可以包括虚拟动物、虚拟植物等多类型的具有面部的虚拟对象。虚拟对象可以是三维的。
所述虚拟对象面部动画数据可以包括用于生成虚拟对象动画的控制器数据,具体表现形式为数字化向量的序列。例如,采用本实施方案能够得到虚拟对象面部的动画数据,所述动画数据即为控制器的属性值,将所述动画数据转化为UE或Unity3d可以接收的数据形式,输入渲染引擎,如UE或Unity3d,即可驱动虚拟对象的面部做出相应的动作。
所述动画数据可以包括虚拟对象的面部表情,即虚拟对象的表情参数。例如,面部表情可以包括表情、眼神等信息。
具体地,参考图1,本实施例所述虚拟对象面部动画生成方法可以包括如下步骤:
步骤S101,接收待处理的图像帧,所述图像帧包括演员的面部图像;
步骤S102,基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;
步骤S103,从所述三维面部中提取得到多个三维特征点;
步骤S104,基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;
步骤S105,基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
在一个具体实施中,预设三维面部模型是基于所述演员做出特定表情时的扫描数据建立的数学模型,能够描述这个演员的任意表情。并且,所述预设三维面部模型能够用尽量少的表情参数描述出演员的三维面部表情变化,利于提高后续重建三维面部时的处理效率。
进一步,所述预设三维面部模型可以与所述演员相关联,所述演员即为步骤S101中输入的待处理的图像帧中所拍摄的演员。也即,更换演员时需要重复执行预设三维面部模型的建立操作。由此可以提高制作精度,节省计算开销。
进一步,所述预设三维面部模型可以包括预设三维人脸模型和预设三维眼神模型,所述演员的三维面部可以包括演员的三维人脸以及演员的三维眼神,其中,所述演员的三维人脸可以是基于所述预设三 维人脸模型以及所述面部图像重建得到的,所述演员的三维眼神可以是基于所述预设三维眼神模型以及所述面部图像重建得到的。
在一个具体实施中,参考图2,所述预设三维人脸模型的建立过程可以包括如下步骤:
步骤S201,获取所述演员的混合形状模型组,所述混合形状模型组包括多个混合形状模型并用于描述多个表情;
步骤S202,对所述混合形状模型组进行主成分分析,以得到所述预设三维人脸模型。
具体地,所述多个表情至少包括中性表情,所述混合形状模型组至少包括一个描述所述中性表情的混合形状模型。所述中性表情是指无表情。其他表情可以包括张嘴、撇嘴、脸颊鼓气、右闭眼等。进一步,可以扫描演员的多个表情获得多组扫描数据,基于多组扫描数据生成所述演员的混合形状模型组(即,blendshape模型组),或者利用多线性模型(multi-linear model)和所述演员的多个表情的RGBD(三通道彩色图像和深度图像,即RGB+Depth Map的简称)数据生成混合形状模型组(即,blendshape模型组)。
在一个具体实施中，在所述步骤S202中，可以对混合形状模型组进行主成分分析（Principal Component Analysis，简称PCA），得到所述演员的预设三维人脸模型。由此，能够以更少的数据量描述演员的面部表情变化。
具体而言,所述预设三维人脸模型可以基于公式(1)描述:
M(α_1, …, α_n) = μ + Σ_{i=1}^{n} α_i · e_i        （1）

其中，α_1, …, α_n 为输入权重，n 为表情的数量；M(α_1, …, α_n) 为所述预设三维人脸模型的输出结果；μ 为平均表情的向量；e_i 为第 i 个主成分向量。μ 和 e_i 是主成分分析的结果，为固定变量，与具体表情无关，与演员相关。
输入权重为 n 个主成分向量的权重，即 n 个表情参数。结合公式（1），输入不同的输入权重可以生成不同形状（即表情）的三维人脸。
当给定一组输入权重 α_1, …, α_n 时，可根据预设三维人脸模型生成对应表情的演员的三维人脸。
同时，可以得到第 i 个主成分对应的标准差 δ_i。
由此,基于混合形状模型组获得精度非常高的该演员的主成分分析模型作为预设三维人脸模型。由于预设三维人脸模型的质量足够高,因而在生成虚拟对象面部时可以重建出高精度的演员的三维面部作为动画数据的映射基础。
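The following is a minimal Python sketch, under assumed array shapes, of how an actor-specific PCA model of the kind described by formula (1) could be built from a flattened blend shape set and then evaluated for a given set of input weights. The function and variable names are illustrative, not part of the original disclosure.

```python
# Sketch: build mu, e_i and sigma_i from the actor's blend shape meshes, then evaluate
# M(alpha) = mu + sum_i alpha_i * e_i (formula (1)). Shapes are assumptions.
import numpy as np

def build_pca_face_model(blendshapes: np.ndarray, n_components: int):
    """blendshapes: (num_expressions, num_vertices * 3) flattened actor meshes."""
    mu = blendshapes.mean(axis=0)                          # average expression vector
    centered = blendshapes - mu
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    e = vt[:n_components]                                  # principal component vectors, (n, V*3)
    sigma = s[:n_components] / np.sqrt(len(blendshapes) - 1)  # std dev of each component
    return mu, e, sigma

def evaluate_face_model(mu: np.ndarray, e: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Formula (1): generate the 3D face for input weights alpha, reshaped to (V, 3)."""
    return (mu + alpha @ e).reshape(-1, 3)
```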
在一个具体实施中,在离线制作动画或者实时表演动画的时候,都需要通过演员的面部图像重建出对应表情的三维人脸,也即执行步骤S102。其中,步骤S102可以包括基于预设三维人脸模型以及所述演员的面部图像重建得到所述演员的三维人脸,所述步骤S102还可以包括基于预设三维眼神模型以及所述演员的面部图像重建得到所述演员的三维眼神。接下来,结合图3对基于预设三维人脸模型以及所述演员的面部图像重建所述演员的三维人脸的过程进行具体阐述。
具体地,参考图3,所述步骤S102可以包括如下步骤:
步骤S1021,检测所述面部图像以至少得到多个二维人脸特征点;
步骤S1022,根据所述预设三维人脸模型生成估算三维人脸;
步骤S1023,从所述估算三维人脸中提取得到多个估算三维特征点;
步骤S1024,将所述多个估算三维特征点投影至二维平面,以得到多个二维投影点;
步骤S1025,计算所述多个二维人脸特征点与所述多个二维投影 点之间的坐标差异;
步骤S1026,如果所述坐标差异小于预设阈值,则将所述估算三维人脸确定为重建得到的所述演员的三维人脸。
进一步,检测面部图像可以得到多个二维特征点,其中,所述二维特征点包括二维人脸特征点,还包括二维瞳孔特征点。
在一个具体实施中,从所述三维人脸中提取得到多个三维特征点包括,可以预先确定多个二维人脸特征点分别对应的多个三维人脸的顶点索引,依据多个三维人脸的顶点索引,将多个三维人脸的顶点提取出来作为多个三维特征点。在步骤S1023中从估算三维人脸中提取估算三维特征点即采取的该方法。
在演员表演时需要佩戴一个头盔,头盔上会固定一个摄像机,摄像机会录制演员表演时的面部画面,摄像机可以为头戴式RGB(R为红色RED的缩写,G为绿色Green的缩写,B为蓝色Blue的缩写)相机,或者RGBD(D为深度图Depth的缩写)相机。对每一帧的演员的面部图像,三维人脸重建的流程是一样,这里以其中一帧为例对图3所示步骤S102的具体流程进行详细说明。
在所述步骤S1021中,获取一帧的面部图像后,可以采用机器学习方法对面部图像进行检测,以检测出其中的二维特征点及对应的语义信息。所述语义信息用于描述该二维特征点所对应的面部位置。每个二维特征点的语义信息是预先定义好的,如64号二维特征点表示鼻尖点。
例如,每张面部图像会检测73个二维人脸特征点和6个二维瞳孔特征点,以及每个二维特征点的语义信息。如0~14号是二维脸部轮廓点,15~72是二维脸部五官点,73~78是二维瞳孔特征点。
例如,所述机器学习模型可以包括基于卷积神经网络(Convolution Neural Network,简称CNN)构建的模型,或者主动外观模型(Active appearance model)。
相比于现有技术需要动画师手动从演员的面部图像中标注出二维特征点,本实施方案针对每张演员的面部图片都是自动检测出二维特征点,也即该步骤是完全自动化的,大幅提升了动画制作效率。
进一步,每一个二维人脸特征点可以对应一个三维人脸的顶点索引。例如,64号二维人脸特征点(即鼻尖点)对应的三维人脸的顶点索引为3780。在本具体实施中,可以预先确定73个二维人脸特征点分别对应的73个三维人脸的顶点索引。
进一步,6个二维瞳孔特征点用于后续眼神重建,其中,左右眼各三个二维瞳孔特征点,包括一个瞳孔中心特征点和两个二维瞳孔边缘特征点。
具体地,从所述三维人脸中提取得到多个三维特征点包括,可以预先确定73个二维人脸特征点分别对应的73个三维人脸的顶点索引,依据73个三维人脸的顶点,将73个三维人脸的顶点提取出来作为73个三维特征点。
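As a small illustration of this step, the sketch below extracts the 73 three-dimensional feature points from a reconstructed mesh through a predefined landmark-to-vertex index table; apart from the nose-tip example (landmark 64, vertex 3780) taken from the text, the table entries are placeholders.

```python
# Sketch: pick the 73 3D feature points from the mesh by predefined vertex indices.
import numpy as np

# landmark_to_vertex[i] = mesh vertex index corresponding to 2D face landmark i
landmark_to_vertex = np.zeros(73, dtype=np.int64)   # placeholder table
landmark_to_vertex[64] = 3780                        # nose tip, as in the example above

def extract_3d_feature_points(mesh_vertices: np.ndarray) -> np.ndarray:
    """mesh_vertices: (V, 3) array; returns the (73, 3) 3D feature points."""
    return mesh_vertices[landmark_to_vertex]
```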
在所述步骤S1022中,依据输入权重和步骤S202建立的预设三维人脸模型,可以得到对应于演员在当前面部图像中所做表情的估算三维人脸。
然后,根据预先确定的对应于73个二维人脸特征点的73个三维人脸的顶点索引,可以得到对应于演员在当前面部图像中所做表情的估算三维人脸上这73个顶点的坐标位置。也即,从对应于演员在当前面部图像中所做表情的估算三维人脸中提取得到73个估算三维特征点。
然后,根据演员的面部与相机的相对位置(R,t)和透视投影函数Π,利用三维人脸重建的目标函数(公式(2)),可以将上述73个估算三维特征点投影到所述面部图像上得到二维投影点,并计算二维投影点与步骤S1021检测得到的二维人脸特征点之间的坐标差异。
E(α_1, …, α_n, R, t) = Σ_{i=1}^{73} ‖ Π( R · M(α_1, …, α_n)_{v_i} + t ) − p_i ‖² + λ · Σ_{j=1}^{n} ( α_j / δ_j )²        （2）

其中，M(α_1, …, α_n) 为公式（1）所述预设三维人脸模型的输出结果，M(α_1, …, α_n)_{v_i} 表示该三维人脸中索引为 v_i 的顶点；p_i 为步骤S1021检测到的第 i 个二维人脸特征点；v_i 为第 i 个二维人脸特征点对应的三维人脸的顶点索引；R 为演员的面部相对于相机的旋转矩阵；t 为演员的面部相对于相机的平移向量；Π 为透视投影函数，该函数的功能是将三维顶点投影成二维点，其中，透视投影函数需要用到相机内参，相机内参是通过相机标定得到的；Π( R · M(α_1, …, α_n)_{v_i} + t ) 表示第 i 个二维人脸特征点对应的二维投影点；‖·‖ 为取模函数；λ · Σ_{j=1}^{n} ( α_j / δ_j )² 为正则项，α_j 是第 j 个输入权重，正则项用于保证在迭代调整过程中不会求出奇怪的输入权重 α_1, …, α_n；λ 为可调节的超参数，用于调节正则项对整个迭代调整过程的影响；δ_j 为预设三维人脸模型里第 j 个主成分对应的标准差。
在本具体实施中,采用欧氏距离衡量上述坐标差异。
进一步,在公式(2)中,将对应相同语义信息的二维人脸特征点和二维投影点确定为编号相同的二维人脸特征点和二维投影点,编号取值范围为i=[1,73]。也即,在公式(2)中,分别计算所述多个二维人脸特征点和多个二维投影点中,对应相同语义信息的二维人脸特征点与二维投影点之间的坐标差异;将计算得到的多个坐标差异之和确定为所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异。
进一步,根据基于公式(2)计算得到的坐标差异,可以确定基于当前输入权重得到的估算三维人脸是否贴合于面部图像中演员做出的表情。
如果坐标差异小于预设阈值,则将估算三维人脸确定为演员的三维人脸。
如果所述坐标差异大于预设阈值,表明基于当前输入权重得到的估算三维人脸不符合面部图像中演员做出的表情。相应地,可以迭代调整所述预设三维人脸模型和相机外参,并重复执行步骤S1022至步骤S1025以迭代计算坐标差异,直至基于调整后的预设三维人脸模型得到的多个二维投影点与所述多个二维人脸特征点之间的坐标差异小于所述预设阈值。此时,可以获得最贴合于面部图像中演员所做表情的估算三维人脸。不同的估算三维人脸对应不同的表情。
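A hedged sketch of this fitting step is given below: it evaluates a reprojection objective of the form of formula (2) and solves for the input weights with a generic least-squares optimizer. For brevity the camera extrinsics (R, t) are held fixed here, whereas the description above iterates over both the model weights and the extrinsics; the array shapes, optimizer choice and default λ are assumptions.

```python
# Sketch of fitting the PCA weights alpha by minimizing reprojection error plus a
# Tikhonov-style regularizer (formula (2)), with R, t fixed for simplicity.
import numpy as np
from scipy.optimize import least_squares

def fit_expression(mu, e, sigma, vertex_ids, detected_2d, K, R, t, lam=1e-2):
    """Solve for PCA weights alpha that best explain the detected 2D landmarks."""
    def residuals(alpha):
        verts = (mu + alpha @ e).reshape(-1, 3)        # M(alpha), formula (1)
        pts = verts[vertex_ids] @ R.T + t               # selected vertices to camera coordinates
        proj = pts @ K.T                                # perspective projection Pi (intrinsics K)
        proj = proj[:, :2] / proj[:, 2:3]
        reproj = (proj - detected_2d).ravel()           # data term: projected vs detected 2D points
        reg = np.sqrt(lam) * (alpha / sigma)            # regularization term lam * (alpha/sigma)^2
        return np.concatenate([reproj, reg])
    alpha0 = np.zeros(e.shape[0])                       # start from the neutral expression
    return least_squares(residuals, alpha0).x
```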
例如,所述预设三维人脸模型的输出结果与输入权重相关联,相应的,可以迭代调整所述输入权重,以得到所述预设三维人脸模型的不同的输出结果,不同的输出结果对应不同的表情。所述预设三维人脸模型的输出结果为三维人脸。不同的三维人脸对应不同的表情。
例如,初始时输入权重可以从零开始迭代,即从无表情开始迭代调整。
例如,所述相机外参包括所述演员的面部与拍摄所述面部图像的影像采集设备之间的相对位置和朝向(R,t)。所述影像采集设备包括相机,即摄像头。
进一步,根据预设三维眼神模型以及面部图像重建得到所述演员的三维眼神。
在一个具体实施中,所述三维眼神模型可以是根据重建的三维人脸和相机外参(R,t)建立得到的模型,或者,所述三维眼神模型也可以是人工神经网络预测模型。具体地,可以根据重建的三维人脸和相机外参(R,t),以及检测到的自面部图像的6个二维瞳孔特征点 来重建出三维眼神。或者,也可以根据面部图像利用人工神经网络直接预测出三维眼神。
进一步,将重建的所述三维人脸和具有重建的三维眼神的眼球合并到一起得到重建后的所述三维面部。
在一个具体实施中,所述步骤S104中所述三维特征点与动画数据之间映射关系的建立过程可以包括步骤:获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;基于所述训练数据建立所述三维特征点与动画数据之间的映射关系。
具体地,各帧训练帧对应的多个三维特征点可以是通过执行上述步骤S101至步骤S103得到的,也即,将每一帧训练帧作为待处理的图像帧执行上述步骤S101至步骤S103以得到对应的多个三维特征点。
具体地,根据具有重建的三维眼神的眼球,获取左右瞳孔中心的三维坐标,作为左右眼的三维瞳孔中心特征点。
在一个具体实施例中,所述多个三维特征点可以包括表征人脸的73个三维特征点和表征眼神的2个三维瞳孔中心特征点。在本具体实施中,从具有重建的三维眼神的眼球中选择2个三维瞳孔中心特征点表示左右眼的眼神方向。表征人脸的73个三维特征点的提取方法可以按照步骤S1023的方法执行。
进一步,所述多帧训练帧可以选取自单个视频,并且视频可以是所述演员按照预设脚本表演时拍摄得到的。并且,所述多帧训练帧是所述视频包括的所有图像帧中三维特征点的特征信息差异最大的图像帧。
进一步,所述待处理的图像帧可以选取自所述视频中除训练帧之外的图像帧。
本实施方案适用于离线面部动画制作场景,能够大幅提高离线制 作效率。以针对一段1000帧的视频制作离线面部动画为例,基于本实施方案,可以从1000帧中选取30帧左右的训练帧及其动画数据作为训练数据,并训练得到三维特征点与动画数据之间的映射关系。至于1000帧中剩余的970帧则可以直接基于前述训练得到的映射关系进行预测,以得到对应的动画数据,而无需如传统制作流程一般整整1000帧的动画都需要动画师手动精心制作得到对应的动画数据。
具体而言,可以录制演员A的一段表演视频。对该段视频的每一帧面部图像,执行步骤S102以重建出对应的三维面部,然后执行步骤S103以提取出每一帧三维面部的三维特征点。在本示例中,假设这段视频共有1000帧。
然后,根据这1000帧三维面部各自对应的三维特征点,采用最远点采样算法采样出差异性最大(即特征信息差异最大)的30帧作为训练帧。其中,特征信息差异大是指两帧三维面部的对应的三维特征点的位置差异比较大。即从1000帧中选取最具代表性的30帧作为训练帧,最具代表性是指表情差别最大、最突出。除了最远点采样算法之外,还可以采用其他方法获取训练帧,例如,整群抽样(cluster sampling)、分层采样、随机采样。
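A compact Python sketch of farthest point sampling over per-frame 3D feature points is shown below; the frame descriptor layout and the default of 30 selected frames follow the example above, while everything else is illustrative.

```python
# Sketch: farthest-point sampling of the most distinct training frames, using each
# frame's flattened 3D feature points as its descriptor.
import numpy as np

def farthest_point_sampling(features: np.ndarray, num_frames: int = 30) -> list:
    """features: (num_total_frames, K*3) flattened 3D feature points per frame."""
    chosen = [0]                                            # start from an arbitrary frame
    dist = np.linalg.norm(features - features[0], axis=1)   # distance to the chosen set
    for _ in range(num_frames - 1):
        nxt = int(np.argmax(dist))                          # frame farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return chosen
```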
在制作动画数据之前,要有虚拟角色的面部绑定,绑定里包含了动画师要用到的控制器。可以通过调整控制器的属性值来调整虚拟角色的表情。
然后,动画师可以参考这30帧的面部图片制作出对应的30帧动画数据。动画数据是指控制器数据。
至此,可以得到30帧的三维特征点和对应的动画数据,这些数据形成训练数据,作为预测剩余970帧所对应动画数据的训练基础。具体而言,现在已经有30帧面部图像、对应的30帧三维面部的三维特征点数据以及对应的30帧动画数据。这样就有了30帧训练数据,每一帧训练数据包括该帧的三维面部的三维特征点数据和该帧的动画数据。
接下来,利用径向基函数(radial based function,简称RBF)算法建立三维面部的三维特征点与动画数据之间的映射关系。例如,利用这30帧的训练数据训练RBF算法模型,得到RBF权重参数。该RBF权重参数即可描述上述映射关系。除了RBF算法外,还可以用诸如线性回归等算法建立所述映射关系。
然后,利用训练得到的RBF算法模型去预测剩余970帧的动画数据。在预测的时候,向RBF算法模型输入任意一帧的三维面部的三维特征点,该模型即可输出该帧的动画数据。
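The sketch below shows one possible way to realize the feature-point-to-animation mapping with SciPy's RBFInterpolator, which is used here only as a stand-in for the RBF solver described above; the kernel choice and array shapes are assumptions.

```python
# Sketch: learn the mapping from 3D feature points to animation (controller) data with an
# RBF model, then predict the animation data of any remaining frame.
import numpy as np
from scipy.interpolate import RBFInterpolator

def train_rbf_mapping(train_points: np.ndarray, train_anim: np.ndarray):
    """train_points: (30, K*3) feature points; train_anim: (30, C) controller values."""
    return RBFInterpolator(train_points, train_anim, kernel="thin_plate_spline")

def predict_animation(rbf, frame_points: np.ndarray) -> np.ndarray:
    """frame_points: (K*3,) feature points of one frame -> (C,) animation data."""
    return rbf(frame_points[None, :])[0]
```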
在离线面部动画制作场景中,可以根据表情相似度调整所述训练数据,所述表情相似度为所述演员在所述待处理的图像帧中做出的表情与基于所述待处理的图像帧生成的虚拟对象面部的表情之间的相似度。
进一步,所述训练数据可以是基于所述表情相似度进行反馈调节后的结果。
例如,实际操作过程中,可以不是一开始就选择30帧为训练帧。而是先选择十几帧训练一下,用训练得到的映射关系预测一下剩余帧的动画数据。如果预测得到的动画质量足够好就停止调整。如果预测得到的动画质量不够好就从视频中再增加选择一些图像帧作为训练帧。
又例如,用户可以指定训练帧的数目(即帧数),然后执行本具体实施的终端可以选出用户指定数目的训练帧。在训练过程中,用户可以对当前选出的训练帧做删减或增加,还能指定当前视频中的任意一帧作为训练帧。
再例如,在用基于训练数据建立的映射关系预测得到剩余帧的动画数据后,可以根据所述表情相似度来确定是否需要调整训练数据。例如,若表情相似度较低,则可以增加或删减训练数据中的训练帧,或者调整训练数据中与训练帧对应的动画数据,以通过对剩余帧的动 画数据的预测结果来反馈影响训练数据的生成,以期得到更高质量、更贴合演员实际表情的虚拟对象面部。
进一步,在离线面部动画制作场景中,在制作完成一段表演视频对应的动画数据后,可以将制作用到的训练数据(即前述30帧三维特征点和30帧动画数据)导出并保存。在实时驱动场景中可以用到这些训练数据。
在一个具体实施中,所述多帧训练帧可以获取自多个视频,并且所述多个视频是所述演员按照预设脚本表演时拍摄得到的。
进一步,所述待处理的图像帧可以为实时拍摄得到的所述演员的面部图像。
本实施方案适用于实时驱动应用场景,能够实时驱动各种精度的虚拟角色面部。
具体而言,在演员实时驱动虚拟角色之前,需要提前准备好训练数据。训练数据的好坏对最终实时驱动的效果影响很大,因此对训练数据的制作有较高要求。
例如,训练数据的准备过程如下:
首先对演员录制一些指定内容的表演视频。所述指定内容可以包括一段常规表情,如微笑、惊讶、大惊、蔑视等。所述指定内容还可以包括基础表情,例如降眉、鼻孔收缩等。所述指定内容还可以包括吐字发声,如中文从a(啊)o(喔)e(鹅)i(衣)u(乌)...开始演员吐字时表情。所述指定内容还可以包括文字的朗读,如指定演员朗读一段或几段文字片段并录制,该片段是提前选好的。
进一步,还可以录制一些与剧本相关的视频。同时,针对角色特点和剧本,会录制一些为该虚拟角色准备的表演片段或表情。
录制好这些表演视频后,可以采用上述离线面部动画制作流程制作出这些表演视频中每一帧面部图像对应的动画数据。当觉得制作的 动画质量足够好后,导出这些动画数据作为训练数据。进一步,在实时驱动场景中,获取的训练数据可以是如离线面部动画制作流程所采用的经过表情相似度反馈调节的结果,以得到更优的训练数据。
在一个具体实施中,可以从所述演员录制一些指定内容的表演视频和所述演员按照预设脚本表演录制得到的每一段视频中分别选取一部分图像帧做出训练帧。针对每一段视频,采用上述离线面部动画制作流程所采用的经过表情相似度反馈调节的方法调节训练数据(包括训练帧中的三维特征点和动画数据),获得调节后的训练数据。将调节后的训练数据叠加到一起作为所述训练数据进行训练。由此,保证训练数据的覆盖面更广泛,能够覆盖大部分的表情。
进一步,准备好训练数据后,可以基于所述训练数据训练RBF算法模型,从而得到RBF权重参数以描述所述映射关系。
接下来,可以开始实时驱动虚拟对象面部了,实时驱动的具体过程如下:
演员佩戴好头盔，连接好相机；从相机实时捕捉到演员的面部图像；执行步骤S1021以实时检测出面部图像中的二维特征点；执行步骤S1022至步骤S1026以实时重建出三维面部，并执行步骤S103以提取出三维特征点；用基于训练数据训练得到的RBF算法模型实时预测出动画数据；利用角色绑定，实时地将动画数据转化为UE或Unity3d可以接收的数据形式（如混合形状权重和骨骼数据）；将转化后的数据实时发送给UE或者Unity3d，从而实时驱动虚拟对象面部。
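Purely as an illustration of this loop, the sketch below strings the per-frame steps together and streams the predicted controller values over UDP. The detection/reconstruction callables, the JSON-over-UDP transport, and the UE/Unity3d bridge are assumptions, not part of the original disclosure.

```python
# Hedged sketch of the real-time driving loop: capture -> detect -> reconstruct ->
# extract -> RBF predict -> stream controller data to the engine bridge.
import json
import socket
import cv2
import numpy as np

def drive_realtime(detect_landmarks, reconstruct_face, extract_points, rbf,
                   host="127.0.0.1", port=9001):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cap = cv2.VideoCapture(0)                         # helmet-mounted camera stream
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        lm2d = detect_landmarks(frame)                # 73 face + 6 pupil 2D landmarks
        mesh = reconstruct_face(frame, lm2d)          # fitted 3D face (and gaze)
        feats = extract_points(mesh)                  # (K*3,) flattened 3D feature points
        anim = rbf(feats[None, :])[0]                 # controller values for this frame
        sock.sendto(json.dumps(anim.tolist()).encode(), (host, port))  # to the UE/Unity3d bridge
    cap.release()
```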
通常而言,实时驱动与离线制作的区别在于,两者的预测对象不同。离线制作的预测对象是单个视频中未被选为训练帧的剩余帧,而实时驱动的预测对象是实时接收到的视频数据,实时驱动的预测对象不是用于训练的视频中的图像帧。
另一方面,实时驱动与离线制作的区别还在于,离线制作有反馈过程而实时驱动没有。离线制作的预测结果可以反过来影响训练数据 的选取和数据内容,实时驱动则没有此过程。
在一个变化例中,在离线制作场景中,虽然不同的视频训练得到的RBF算法模型一般而言不能通用的,但训练数据是可以通用的,从而丰富训练样本。
或者,当训练样本足够多,使得训练得到的RBF算法模型能够表达足够多种类表情下三维特征点与动画数据之间的映射关系时,所述RBF算法模型也能够是通用模型而适用于不同视频。
具体地,若离线制作的训练数据足够多到能覆盖所有表情,那么离线制作的预测对象也可以是新获取到的视频数据中的图像帧。
对于捕捉演员在表演过程中的表情,可采用以下方法进行捕捉。脸上描点法,在演员的脸上标记处若干个标记点,捕捉人脸,获得人脸信息;脸上不描点法:演员的脸上无标记点,运用算法直接在演员的脸上提取信息,捕捉人脸,获得人脸信息。在人脸捕捉过程中,可以采用单个相机或者多个相机对人脸进行捕捉。单个相机轻便易戴,也可以达到多个相机的结果,多个相机可以实现多个角度的人脸数据的捕捉。对于捕捉设备,可以采用RGB相机和/或RGBD相机。
由上,采用本实施方案能够提供一种高效且高精度的虚拟对象面部动画生成方案,生成的虚拟对象面部与演员真实面部的表情相似度高。具体而言,基于预设三维面部模型提高三维面部重建的质量。进一步,由于准确地重建了演员的三维面部,从而能够更为准确地预测动画数据,最终得到高质量的虚拟对象面部。并且,采用本实施方案生成虚拟对象面部的整体制作效率高。进一步,由于采用三维面部重建技术,能够放松对演员佩戴头盔的要求,不需严格恢复到上次捕捉面部数据时的头盔佩戴位置。
图4是本发明实施例一种虚拟对象面部动画生成装置的结构示意图。本领域技术人员理解,本实施例所述虚拟对象面部动画生成装置4可以用于实施上述图1至图3所述实施例中所述的方法技术方 案。
具体地,参考图4,本实施例所述虚拟对象面部动画生成装置4可以包括:接收模块41,用于接收待处理的图像帧,所述图像帧包括演员的面部图像;重建模块42,用于基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;提取模块43,用于从所述三维面部中提取得到多个三维特征点;确定模块44,用于基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;生成模块45,用于基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
进一步,所述预设三维面部模型包括预设三维人脸模型和预设三维眼神模型,所述演员的三维面部包括演员的三维人脸以及演员的三维眼神,其中,所述演员的三维人脸是基于所述预设三维人脸模型以及所述面部图像重建得到的,所述演员的三维眼神是基于所述预设三维眼神模型以及所述面部图像重建得到的。
进一步,所述重建模块42可以包括:第一检测单元,用于检测所述面部图像以至少得到多个二维人脸特征点;第一生成单元,用于根据所述预设三维人脸模型生成估算三维人脸;第一提取单元,用于从所述估算三维人脸中提取得到多个估算三维特征点;投影单元,用于将所述多个估算三维特征点投影至二维平面,以得到多个二维投影点;第一计算单元,用于计算所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异;第一确定单元,如果所述坐标差异小于预设阈值,则将所述估算三维人脸确定为重建得到的所述演员的三维人脸。
进一步,所述二维人脸特征点具有对应的语义信息,所述二维投影点具有对应的语义信息,所述第一计算单元包括:第二计算单元,用于分别计算所述多个二维人脸特征点和多个二维投影点中,对应相 同语义信息的二维人脸特征点与二维投影点之间的坐标差异;第二确定单元,用于将计算得到的多个坐标差异之和确定为所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异。
进一步,所述重建模块42还可以包括:迭代调整单元,如果所述坐标差异大于预设阈值,则迭代调整所述预设三维人脸模型和相机外参,直至基于调整后的预设三维人脸模型得到的多个二维投影点与所述多个二维人脸特征点之间的坐标差异小于所述预设阈值。
进一步,所述预设三维人脸模型的输出结果与输入权重相关联,所述迭代调整单元包括:输入权重调整单元,用于迭代调整所述输入权重,以得到所述预设三维人脸模型的不同的输出结果,不同的输出结果对应不同的表情。
进一步,所述相机外参包括所述演员的面部与拍摄所述面部图像的影像采集设备之间的相对位置与朝向。
进一步,所述虚拟对象面部动画生成装置4还包括:第一建立模块,用于建立所述预设三维面部模型。
进一步,所述第一建立模块包括:第一获取单元,获取所述演员的混合形状模型组,所述混合形状模型组包括多个混合形状模型并用于描述多个表情;分析单元,用于对所述混合形状模型组进行主成分分析,以得到所述预设三维人脸模型。
进一步,所述多个表情至少包括中性表情,所述混合形状模型组至少包括一个描述所述中性表情的混合形状模型。
进一步,所述虚拟对象面部动画生成装置4还包括:第二建立模块,用于建立所述三维特征点与动画数据之间的映射关系。
进一步,所述第二建立模块包括:第二获取单元,用于获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;建立单元,用于基于所述训练数据建立所述三维特征点与动画数据之 间的映射关系。
进一步,所述多帧训练帧选取自单个视频,并且,所述多帧训练帧是所述视频包括的所有图像帧中对应的三维特征点的特征信息差异最大的图像帧。
进一步,所述待处理的图像帧选取自所述视频中除训练帧之外的图像帧。
进一步,所述训练数据根据表情相似度调节,所述表情相似度为所述演员在所述待处理的图像帧中做出的表情与基于所述待处理的图像帧生成的虚拟对象面部的表情之间的相似度。
进一步,所述多帧训练帧获取自多个视频,并且所述多个视频是所述演员按照预设脚本表演时拍摄得到的。
进一步,所述待处理的图像帧为实时拍摄得到的所述演员的面部图像。
关于所述虚拟对象面部动画生成装置4的工作原理、工作方式的更多内容,可以参照上述图1至图3中的相关描述,这里不再赘述。
进一步,所述虚拟对象面部动画生成装置4可以集成于终端、服务器等计算设备。例如,虚拟对象面部动画生成装置4可以集中地集成于同一服务器内。或者,虚拟对象面部动画生成装置4可以分散的集成于多个终端或服务器内并相互耦接。例如,所述预设三维面部模型可以单独设置于终端或服务器上,以确保较优的数据处理速度。
基于本实施例虚拟对象面部动画生成装置4及对应的虚拟对象面部动画生成方法,用户在接收模块41一侧输入待处理的图像,即可在生成模块45的输出端得到对应的虚拟对象面部的表情,从而实现演员面捕。
进一步地,本发明实施例还公开一种存储介质,其上存储有计算机程序,所述计算机程序被处理器运行时执行上述图1至图3所示实 施例中所述的方法技术方案。优选地,所述存储介质可以包括诸如非挥发性(non-volatile)存储器或者非瞬态(non-transitory)存储器等计算机可读存储介质。所述存储介质可以包括ROM、RAM、磁盘或光盘等。
进一步地,本发明实施例还公开一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,所述处理器运行所述计算机程序时执行上述图1至图3所示实施例中所述的方法技术方案。
虽然本发明披露如上,但本发明并非限定于此。任何本领域技术人员,在不脱离本发明的精神和范围内,均可作各种更动与修改,因此本发明的保护范围应当以权利要求所限定的范围为准。

Claims (18)

  1. 一种虚拟对象面部动画生成方法,其特征在于,包括:
    接收待处理的图像帧,所述图像帧包括演员的面部图像;
    基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;
    从所述三维面部中提取得到多个三维特征点;
    基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;
    基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
  2. 根据权利要求1所述的方法,其特征在于,所述预设三维面部模型包括预设三维人脸模型和预设三维眼神模型,所述演员的三维面部包括演员的三维人脸以及演员的三维眼神,其中,所述演员的三维人脸是基于所述预设三维人脸模型以及所述面部图像重建得到的,所述演员的三维眼神是基于所述预设三维眼神模型以及所述面部图像重建得到的。
  3. 根据权利要求2所述的方法,其特征在于,基于所述预设三维人脸模型以及所述面部图像重建所述演员的三维人脸的过程包括如下步骤:
    检测所述面部图像以至少得到多个二维人脸特征点;
    根据所述预设三维人脸模型生成估算三维人脸;
    从所述估算三维人脸中提取得到多个估算三维特征点;
    将所述多个估算三维特征点投影至二维平面,以得到多个二维投 影点;
    计算所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异;
    如果所述坐标差异小于预设阈值,则将所述估算三维人脸确定为重建得到的所述演员的三维人脸。
  4. 根据权利要求3所述的方法,其特征在于,所述二维人脸特征点具有对应的语义信息,所述二维投影点具有对应的语义信息,所述计算所述多个人脸二维特征点与所述多个二维投影点之间的坐标差异包括:
    分别计算所述多个二维人脸特征点和多个二维投影点中,对应相同语义信息的二维人脸特征点与二维投影点之间的坐标差异;
    将计算得到的多个坐标差异之和确定为所述多个二维人脸特征点与所述多个二维投影点之间的坐标差异。
  5. 根据权利要求3所述的方法,其特征在于,基于所述预设三维人脸模型以及所述面部图像重建所述演员的三维人脸的过程还包括如下步骤:
    如果所述坐标差异大于预设阈值,则迭代调整所述预设三维人脸模型和相机外参,直至基于调整后的预设三维人脸模型得到的多个二维投影点与所述多个二维人脸特征点之间的坐标差异小于所述预设阈值。
  6. 根据权利要求5所述的方法,其特征在于,所述预设三维人脸模型的输出结果与输入权重相关联,所述迭代调整所述预设三维人脸模型包括:
    迭代调整所述输入权重,以得到所述预设三维人脸模型的不同的输出结果,不同的输出结果对应不同的表情。
  7. 根据权利要求5所述的方法,其特征在于,所述相机外参包括所 述演员的面部与拍摄所述面部图像的影像采集设备之间的相对位置与朝向。
  8. 根据权利要求2所述的方法,其特征在于,所述预设三维人脸模型的建立过程包括如下步骤:
    获取所述演员的混合形状模型组,所述混合形状模型组包括多个混合形状模型并用于描述多个表情;
    对所述混合形状模型组进行主成分分析,以得到所述预设三维人脸模型。
  9. 根据权利要求8所述的方法,其特征在于,所述多个表情至少包括中性表情,所述混合形状模型组至少包括一个描述所述中性表情的混合形状模型。
  10. 根据权利要求1所述的方法,其特征在于,所述三维特征点与动画数据之间映射关系的建立过程包括如下步骤:
    获取训练数据,所述训练数据包括多帧训练帧各自对应的多个三维特征点以及动画数据,所述多帧训练帧为所述演员做出不同表情时的面部图像;
    基于所述训练数据建立所述三维特征点与动画数据之间的映射关系。
  11. 根据权利要求10所述的方法,其特征在于,所述多帧训练帧选取自单个视频,并且,所述多帧训练帧是所述视频包括的所有图像帧中对应的三维特征点的特征信息差异最大的图像帧。
  12. 根据权利要求11所述的方法,其特征在于,所述待处理的图像帧选取自所述视频中除训练帧之外的图像帧。
  13. 根据权利要求12所述的方法,其特征在于,所述训练数据根据表情相似度调节,所述表情相似度为所述演员在所述待处理的图像帧中做出的表情与基于所述待处理的图像帧生成的虚拟对象面部 的表情之间的相似度。
  14. 根据权利要求10或13所述的方法,其特征在于,所述多帧训练帧获取自多个视频,并且所述多个视频是所述演员按照预设脚本表演时拍摄得到的。
  15. 根据权利要求14所述的方法,其特征在于,所述待处理的图像帧为实时拍摄得到的所述演员的面部图像。
  16. 一种虚拟对象面部动画生成装置,其特征在于,包括:
    接收模块,用于接收待处理的图像帧,所述图像帧包括演员的面部图像;
    重建模块,用于基于预设三维面部模型以及所述面部图像重建得到所述演员的三维面部,所述预设三维面部模型用于描述所述演员的面部表情变化;
    提取模块,用于从所述三维面部中提取得到多个三维特征点;
    确定模块,用于基于三维特征点与动画数据之间的映射关系,确定所述多个三维特征点对应的动画数据;
    生成模块,用于基于所述动画数据生成对应的虚拟对象面部的表情,且生成的所述虚拟对象面部的表情与所述演员在所述面部图像中做出的表情保持一致。
  17. 一种存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器运行时执行权利要求1至15中任一项所述方法的步骤。
  18. 一种终端,包括存储器和处理器,所述存储器上存储有能够在所述处理器上运行的计算机程序,其特征在于,所述处理器运行所述计算机程序时执行权利要求1至15中任一项所述方法的步骤。
PCT/CN2021/138747 2020-12-31 2021-12-16 虚拟对象面部动画生成方法及装置、存储介质、终端 WO2022143197A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011639440.3 2020-12-31
CN202011639440.3A CN112700523B (zh) 2020-12-31 2020-12-31 虚拟对象面部动画生成方法及装置、存储介质、终端

Publications (1)

Publication Number Publication Date
WO2022143197A1 true WO2022143197A1 (zh) 2022-07-07

Family

ID=75513962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138747 WO2022143197A1 (zh) 2020-12-31 2021-12-16 虚拟对象面部动画生成方法及装置、存储介质、终端

Country Status (2)

Country Link
CN (1) CN112700523B (zh)
WO (1) WO2022143197A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115526966A (zh) * 2022-10-12 2022-12-27 广州鬼谷八荒信息科技有限公司 一种用调度五官部件实现虚拟人物表情展现的方法
CN115908655A (zh) * 2022-11-10 2023-04-04 北京鲜衣怒马文化传媒有限公司 一种虚拟人物面部表情处理方法及装置
CN116503524A (zh) * 2023-04-11 2023-07-28 广州赛灵力科技有限公司 一种虚拟形象的生成方法、系统、装置及存储介质
CN116912373A (zh) * 2023-05-23 2023-10-20 苏州超次元网络科技有限公司 一种动画处理方法和系统

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700523B (zh) * 2020-12-31 2022-06-07 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端
CN112767453B (zh) * 2021-01-29 2022-01-21 北京达佳互联信息技术有限公司 人脸跟踪方法、装置、电子设备及存储介质
CN113724367A (zh) * 2021-07-13 2021-11-30 北京理工大学 一种机器人表情驱动方法及装置
CN113633983B (zh) * 2021-08-16 2024-03-15 上海交通大学 虚拟角色表情控制的方法、装置、电子设备及介质
CN113946209B (zh) * 2021-09-16 2023-05-09 南昌威爱信息科技有限公司 一种基于虚拟人的交互方法及系统
CN114219878B (zh) * 2021-12-14 2023-05-23 魔珐(上海)信息科技有限公司 虚拟角色的动画生成方法及装置、存储介质、终端
CN114419956B (zh) * 2021-12-31 2024-01-16 深圳云天励飞技术股份有限公司 基于学生画像的实物编程方法及相关设备
CN115116109B (zh) * 2022-04-27 2024-05-14 平安科技(深圳)有限公司 虚拟人物说话视频的合成方法、装置、设备及存储介质
CN114898020A (zh) * 2022-05-26 2022-08-12 唯物(杭州)科技有限公司 一种3d角色实时面部驱动方法、装置、电子设备及存储介质
CN115393486B (zh) * 2022-10-27 2023-03-24 科大讯飞股份有限公司 虚拟形象的生成方法、装置、设备及存储介质
CN115546366B (zh) * 2022-11-23 2023-02-28 北京蔚领时代科技有限公司 一种基于不同中之人驱动数字人的方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130215113A1 (en) * 2012-02-21 2013-08-22 Mixamo, Inc. Systems and methods for animating the faces of 3d characters using images of human faces
CN104077804A (zh) * 2014-06-09 2014-10-01 广州嘉崎智能科技有限公司 一种基于多帧视频图像构建三维人脸模型的方法
CN107330371A (zh) * 2017-06-02 2017-11-07 深圳奥比中光科技有限公司 3d脸部模型的脸部表情的获取方法、装置和存储装置
CN109584353A (zh) * 2018-10-22 2019-04-05 北京航空航天大学 一种基于单目视频重建三维人脸表情模型的方法
CN112700523A (zh) * 2020-12-31 2021-04-23 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7606392B2 (en) * 2005-08-26 2009-10-20 Sony Corporation Capturing and processing facial motion data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130215113A1 (en) * 2012-02-21 2013-08-22 Mixamo, Inc. Systems and methods for animating the faces of 3d characters using images of human faces
CN104077804A (zh) * 2014-06-09 2014-10-01 广州嘉崎智能科技有限公司 一种基于多帧视频图像构建三维人脸模型的方法
CN107330371A (zh) * 2017-06-02 2017-11-07 深圳奥比中光科技有限公司 3d脸部模型的脸部表情的获取方法、装置和存储装置
CN109584353A (zh) * 2018-10-22 2019-04-05 北京航空航天大学 一种基于单目视频重建三维人脸表情模型的方法
CN112700523A (zh) * 2020-12-31 2021-04-23 魔珐(上海)信息科技有限公司 虚拟对象面部动画生成方法及装置、存储介质、终端

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115526966A (zh) * 2022-10-12 2022-12-27 广州鬼谷八荒信息科技有限公司 一种用调度五官部件实现虚拟人物表情展现的方法
CN115908655A (zh) * 2022-11-10 2023-04-04 北京鲜衣怒马文化传媒有限公司 一种虚拟人物面部表情处理方法及装置
CN116503524A (zh) * 2023-04-11 2023-07-28 广州赛灵力科技有限公司 一种虚拟形象的生成方法、系统、装置及存储介质
CN116503524B (zh) * 2023-04-11 2024-04-12 广州赛灵力科技有限公司 一种虚拟形象的生成方法、系统、装置及存储介质
CN116912373A (zh) * 2023-05-23 2023-10-20 苏州超次元网络科技有限公司 一种动画处理方法和系统
CN116912373B (zh) * 2023-05-23 2024-04-16 苏州超次元网络科技有限公司 一种动画处理方法和系统

Also Published As

Publication number Publication date
CN112700523B (zh) 2022-06-07
CN112700523A (zh) 2021-04-23

Similar Documents

Publication Publication Date Title
WO2022143197A1 (zh) 虚拟对象面部动画生成方法及装置、存储介质、终端
Yu et al. Improving few-shot user-specific gaze adaptation via gaze redirection synthesis
Wu et al. Reenactgan: Learning to reenact faces via boundary transfer
He et al. Photo-realistic monocular gaze redirection using generative adversarial networks
Shi et al. Automatic acquisition of high-fidelity facial performances using monocular videos
WO2022095721A1 (zh) 参数估算模型的训练方法、装置、设备和存储介质
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
US11393149B2 (en) Generating an animation rig for use in animating a computer-generated character based on facial scans of an actor and a muscle model
US11158104B1 (en) Systems and methods for building a pseudo-muscle topology of a live actor in computer animation
JP7462120B2 (ja) 2次元(2d)顔画像から色を抽出するための方法、システム及びコンピュータプログラム
Zhao et al. Mask-off: Synthesizing face images in the presence of head-mounted displays
CN113192132A (zh) 眼神捕捉方法及装置、存储介质、终端
Wang et al. Digital twin: Acquiring high-fidelity 3D avatar from a single image
CN107862387A (zh) 训练有监督机器学习的模型的方法和装置
Kaur et al. Subject guided eye image synthesis with application to gaze redirection
KR20230110787A (ko) 개인화된 3d 머리 및 얼굴 모델들을 형성하기 위한 방법들 및 시스템들
Danieau et al. Automatic generation and stylization of 3d facial rigs
Song et al. Real-time 3D face-eye performance capture of a person wearing VR headset
Basak et al. Methodology for building synthetic datasets with virtual humans
CN112132107A (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
WO2020193972A1 (en) Facial analysis
US11587278B1 (en) Systems and methods for computer animation of an artificial character using facial poses from a live actor
US20240169635A1 (en) Systems and Methods for Anatomically-Driven 3D Facial Animation
US11715247B1 (en) Generating a facial rig for use in animating a computer-generated character based on facial scans and muscle models of multiple live actors
US20230154094A1 (en) Systems and Methods for Computer Animation of an Artificial Character Using Facial Poses From a Live Actor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913938

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205ADATED 22.11.2023)