WO2020193972A1 - Facial analysis - Google Patents

Facial analysis

Info

Publication number
WO2020193972A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
parameters
identity
data
frames
Prior art date
Application number
PCT/GB2020/050791
Other languages
English (en)
Inventor
Jane Haslam
Original Assignee
Cubic Motion Limited
Priority date
Filing date
Publication date
Application filed by Cubic Motion Limited
Publication of WO2020193972A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/755Deformable models or variational models, e.g. snakes or active contours
    • G06V10/7553Deformable models or variational models, e.g. snakes or active contours based on shape, e.g. active shape models [ASM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/469Contour-based spatial representations, e.g. vector-coding
    • G06V10/476Contour-based spatial representations, e.g. vector-coding using statistical shape modelling, e.g. point distribution models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • Disclosed examples relate to facial modeling, facial analysis and describing faces. Examples also relate to facial animation, and to training a solver to map a description of a face to control values for controlling a computer model of a face.
  • An actor's performance may be recorded, e.g. by one or more video cameras, and the changes in facial expression (i.e. deformations of the actor's face) between different frames may be described in terms of the location and/or motion of a set of points on the actor's face, referred to as landmark points or fiducial points.
  • Landmark points may correspond, for example, to corners of the mouth, corners of the eyes, the pupils of the eyes, tip of the nose, etc.
  • a representation may be used which captures the statistical variation of both landmark point location (modelling shape variation), and also the appearance of the image sampled at and around the location of the landmark point (modelling the variation of image appearance at the landmark point as facial expression and lighting are varied).
  • the landmark points may be identified automatically by a computer executing appropriate software. In some cases, the landmark points may be determined and annotated on the video manually by a user inspecting the video of the actor. In some cases, the actor’s face may be marked with landmark points prior to capturing the video, such that the landmark points are captured along with the actor’s face in the video recording.
  • a rig is a 3D model of the face that provides input controls that allow the expression of the simulated face to be controlled.
  • Various types of rig may be used.
  • One example is a rig based on the Facial Action Coding System (FACS), with controls corresponding to groups of facial muscles.
  • Other rig types may include, for example, rigs with controls that mimic bones or joints.
  • Rigs may make use of blendshapes, whereby a rig control has a value representing an interpolation (e.g. a linear interpolation) between extremes of movement of the muscle group or groups.
  • Non-linear movements/interpolations may also be included (e.g. a jaw opening and closing on an arc).
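  As a concrete illustration of the blendshape idea in the preceding items, the following is a minimal sketch (an assumption for illustration, not code from this publication) of linear blendshape interpolation, where each rig control value weights the offset between a neutral mesh and a blendshape extreme:

```python
import numpy as np

def blend_expression(neutral, blendshape_targets, control_values):
    """Linearly interpolate a face mesh from blendshape targets.

    neutral            : (V, 3) vertex positions of the neutral face
    blendshape_targets : (B, V, 3) vertex positions at each blendshape extreme
    control_values     : (B,) rig control values, typically in [0, 1]
    """
    offsets = blendshape_targets - neutral                    # per-blendshape deltas
    weighted = (control_values[:, None, None] * offsets).sum(axis=0)
    return neutral + weighted                                 # blended face mesh
```

  Non-linear controls (such as a jaw rotating on an arc) would replace the linear offset with a control-dependent transform.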
  • a so-called solver (or facial solver) may be used.
  • a solver describes hardware, software, a mapping or a combination thereof, that receives landmark points (e.g. describing a face or facial expression) as an input and provides rig control values as an output.
  • When landmark points (e.g. describing a face or facial expression) are input to the solver, rig control values that approximate the expressions in a simulation of the face are output.
  • the simulated face may then be used to produce CGI images or animations with facial expressions approximating, or corresponding with, the target expression or expressions.
  • A prior publication (ISSN 0167-7055) describes building a solver which solves directly from input video streams, rather than from geometrical landmark points describing locations of facial features.
  • a facial solver may be implemented using machine learning or a trained learning algorithm, such as a feed-forward artificial neural network (ANN) or another non-linear learning algorithm such as Support Vector Machines or Random Forest Regression.
  • This learning may involve generating training data that includes data describing a series of images with various facial expressions, e.g. landmark points derived from a video sequence of an actor adopting various facial expressions, and corresponding rig control values.
  • the training data may then be used in a training phase to train the solver.
  • Generating the rig control values for the training data may involve an animator manually selecting rig control values that approximate the facial expression in each training frame.
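  For illustration only, the following hedged sketch shows how such a learned solver might look using scikit-learn's MLPRegressor as the feed-forward network; the array shapes and the placeholder random data are assumptions, not the implementation described in this publication:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder training data, one row per training frame:
#   X : concatenated 2D landmark coordinates (F frames x 2N values)
#   Y : rig control values chosen (e.g. by an animator) for each frame
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2 * 60))      # e.g. 60 landmark points per frame
Y = rng.uniform(size=(500, 40))         # e.g. 40 rig controls per frame

solver = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=500)
solver.fit(X, Y)                        # training phase

rig_controls = solver.predict(X[:1])    # runtime: landmarks in, rig controls out
```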
  • a computer-implemented method of analysing a face comprises receiving data describing a plurality of frames, each frame including coordinates of a plurality of landmark points of the face; and determining, based on the plurality of frames: a set of first parameters indicative of the identity of the face, and for each frame, a respective set of second parameters that are independent of the identity of the face, wherein the set of first parameters and the sets of second parameters are determined such that for each frame, the coordinates of the landmark points of the face are approximated by a description in which the set of first parameters is the same across all frames of the plurality of frames, and the respective sets of second parameters are variable between frames.
  • the data describing the plurality of frames includes a plurality of 2D projections, each 2D projection including 2D point coordinates of the plurality of landmark points of the face.
  • the 2D projections correspond with images of the face captured by a single camera, respective 2D projections corresponding with one or more of variations in expression of the face, perturbations of the camera position, or perturbations of the camera angle.
  • the single camera is either (i) a virtual camera, and the face is a simulated face, or (ii) a real camera, and the face is a real face.
  • the 2D projections correspond with images of the face captured by a first camera
  • the receiving includes receiving second data describing a second plurality of frames captured by a second camera, the second camera differing from the first camera in one or more of optical properties, position or orientation
  • the method further comprises: performing the determining with the second data, wherein the determining associated with the first camera is independent of the determining associated with the second camera, and the determining associated with the second camera is independent of the determining associated with the first camera.
  • the set of first parameters and the set of second parameters satisfy:

        b_j = P_EC q_EC,j + P_I q_I

    where P_EC describes an identity-independent subspace, P_I describes a subspace indicative of the identity of the face, q_I is the set of first parameters, q_EC,j is the set of second parameters associated with the j-th frame of the plurality of frames, and b_j represents the coordinates of the landmark points in the j-th frame.
  • the data describing the plurality of frames includes 3D coordinates of the plurality of landmark points of the face.
  • the set of first parameters and the set of second parameters satisfy:

        b_j = P_EC q_EC,j + P_I q_I

    where P_EC describes an identity-independent subspace, P_I describes a subspace indicative of the identity of the face, q_I is the set of first parameters, q_EC,j is the set of second parameters associated with the j-th frame of the plurality of frames, and b_j represents the coordinates of the landmark points in the j-th frame.
  • b_p is a vector representing weights corresponding with the basis vectors
  • X_c is a 3D translation of the transformed coordinate frame relative to a coordinate frame of the coordinates of the landmark points, wherein determining the set of first parameters and the sets of second parameters is based on b_p.
  • a linear combination of the first set of parameters and the respective set of second parameters approximates coordinates of the landmark points of the face in each respective frame of the plurality of frames.
  • the set of first parameters may represent a projection of the coordinates of the plurality of landmark points onto a first subspace indicative of the identity of the face, and the respective set of second parameters may represent a projection of the coordinates of the plurality of landmark points onto one or more second subspaces.
  • the one or more second subspaces describe one or more of: expression of the face, perturbations of the camera position, or perturbations of the camera angle
  • the method further comprises generating training data based on the sets of second parameters, independent of the set of first parameters, the training data including control information for controlling deformations of a simulated model of the face in accordance with deformations of the face in corresponding frames; and training a solver using the training data, the solver to output control information for the simulated model in response to input describing coordinates of landmark points of a generalized face.
  • the method further comprises generating input data based on the sets of second parameters, independent of the set of first parameters; inputting the input data to a solver; and receiving, from the solver, control information for a simulated model of a further face, in response to inputting the input data.
  • a computer-implemented method of training a solver, the solver being arranged to map a facial description to rig control values that reproduce an expression in the facial description in a simulated facial model, comprises: receiving data describing a plurality of frames, each frame associated with a face, the face having, in each frame, a respective facial expression, the data describing each frame including coordinates of a plurality of landmark points of the face, and each frame associated with a set of rig control values describing parameter values to reproduce the respective facial expression in a simulated model of the face; producing training data by mapping each frame of one or more of the plurality of frames to an identity-free description based on a projection of the data into a non-identity subspace and an identity subspace; and training the solver using the training data.
  • identity parameters describing the projection of the data into the identity subspace are constrained to be the same for each frame of the one or more of the plurality of frames.
  • the received data describing a plurality of frames is associated with frames captured by a first camera, the method further comprising repeating the receiving and producing for data describing a plurality of frames associated with at least one additional camera, wherein the non-identity subspace and identity subspace for each camera is independent of the non-identity subspace and identity subspace of each other camera.
  • the received data is (i) video or image data of a real face captured by one or more real cameras, or (ii) video or image data of a simulated face captured by one or more simulated cameras.
  • a computer-implemented method of facial animation comprises: receiving data describing a plurality of frames, each frame associated with a face, the face having, in each frame, a respective facial expression, the data describing each frame including coordinates of a plurality of landmark points of the face; producing solver input data by mapping each frame of one or more of the plurality of frames to an identity-free description based on a projection of the data into a non-identity subspace and an identity subspace; and inputting the solver input data to a solver, the solver arranged to output rig control values for each frame, the rig control values describing parameter values to reproduce the respective facial expression in a simulated facial model.
  • the method further comprises performing a calibration, the calibration comprising receiving range of motion, ROM, data, the ROM data including a plurality of frames including respective facial expressions; and determining identity parameters for projecting the ROM data into the identity subspace based on the ROM data, wherein mapping each frame of one or more of the plurality of frames to an identity-free description is based on the determined identity parameters.
  • Some aspects provide an apparatus comprising means arranged to carry out the methods described herein.
  • Some aspects provide a computer program that, when executed by a processing device, causes the processing device to carry out the methods described herein.
  • Some aspects provide a computer-readable medium storing instructions thereon that, when executed by a processing device, cause the processing device to carry out the methods described herein.
  • Figure 1a illustrates a method 100 for generating a framework for describing a face.
  • Figure 1b illustrates a method 110 of generating a generic shape model.
  • Figure 1c illustrates a method 120 of generating simulated or synthetic training data.
  • Figure 1d illustrates a method 130 of generating training data from real camera footage.
  • Figure 2a illustrates a method 200 for decomposing a generic point distribution model into two subspaces.
  • Figure 2b shows a method 220 of generating a non-identity subspace.
  • Figure 2c shows a method 240 of generating an identity subspace.
  • Figure 3a illustrates a method 300 of analysing a face.
  • Figure 3b illustrates a method 320 for determining identity and non-identity parameters.
  • Figure 3c illustrates an apparatus for carrying out the method of Figure 3b.
  • Figure 4a illustrates a method 400 of producing training data.
  • Figure 4b illustrates an apparatus for carrying out the method of Figure 4a.
  • Figure 4c illustrates a process 430 for producing training data.
  • Figure 5 shows a method 500 of projecting a training set into an identity-free subspace.
  • Figure 6a shows a method 600 for calibrating the identity of a runtime actor.
  • Figure 6b shows a method 610 for capturing and tracking a range of motion performance.
  • Figure 6c shows a method 620 for determining the identity parameters.
  • Figure 7a illustrates a method 700 for using a generic solver at runtime.
  • Figure 7b illustrates an apparatus for carrying out the method of Figure 7a.
  • Figure 7c illustrates a method 730 of generating input data for a generic solver at runtime.
  • Figure 8 illustrates a computer system.
  • Figure 9 illustrates a computer readable medium.
  • The term "identity of a face" is used herein to describe the characteristics of the face (or an image/description of the face) that are intrinsic to the individual face and differentiate the face from other faces (e.g. associated with the character or distinctiveness of the face).
  • Non-identity factors, herein, are factors that describe a particular image or pose of a face but are independent of the identity and are not intrinsic to the face.
  • Non-identity factors may include an expression adopted by the face, a view of the face (e.g. relating to position and angle of an image capture device), lighting incident on the face, etc.
  • identity and non-identity parameters describing a plurality of frames associated with a particular face may be analyzed in a consistent manner, leading to an improvement in the description across the frames, and also providing a simplified description of the frames by reducing a number of parameters needed to describe the frames.
  • Examples disclosed herein may be applied to the generation and use of facial solvers (e.g. for CGI animation) to remove a dependency of the solver on an identity of a runtime user. This improves the applicability and flexibility of the solver, and may reduce duplication of work in generating multiple solvers for different runtime actors.
  • Examples disclosed herein may be used for facial recognition purposes, e.g. based on the extracted identity parameters. Some examples may be used to improve the accuracy of a generic facial feature tracker: disclosed examples may be used to determine the identity parameters of a tracked face. The identity parameters may then be "fixed" to give a more specific tracking model for that particular individual face that is more robust and accurate than the "unfixed" generic face model.
  • a separate shape model may be constructed for each required camera view of the face.
  • Each 2D shape model describes the expected 2D shape variation of a set of landmark points on the face, viewed from a particular camera (or virtual camera) view of the face (for example a frontal camera view, or a side camera view).
  • the shape model captures the statistics of how the landmark point positions for that particular camera view vary with respect to various factors, such as i) facial identity, ii) facial expression, and iii) small variations in camera position and orientation (e.g. due to wobbling of the camera).
  • Figure 1a illustrates a method 100 for generating a framework for describing a face in which the identity and non-identity aspects are separated.
  • the method begins at 102, and at 104 an initial generic shape model is created.
  • This model contains variation due to various factors, such as facial identity, facial expression and camera position and orientation.
  • parameters for these different sources of variation are not separated. This may be carried out by a generic shape model creation module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the initial shape model is refined by splitting it into at least two linear subspaces so that facial identity is modeled separately from other factors, such as facial expression and camera position and orientation. This may be carried out by a subspace determination module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof. The method ends at 108.
  • Figure 1b illustrates a method 110 of generating a generic shape model, as in 104 of Figure 1a.
  • the method begins at 112 and training data is received at 114.
  • the training data should include a number of images that depict a range of variation due to facial identity and other factors of interest, such as facial expression.
  • the additional factors may also include variations due to camera wobble, etc.
  • the training examples for a given shape model may be captured using the same camera and camera position (e.g. aside from perturbations, such as perturbations associated with camera wobble); herein, the camera position and orientation, including such minor variations (e.g. due to camera wobble), is referred to as camera pose.
  • the training data may correspond with real video or photographic data of a real face or may be derived from a simulation of the face, e.g. using a 3D facial rig. In some cases, a combination of real and simulated data may be used.
  • the training data may relate to a number of different faces. Where there is good variation (e.g. between facial shapes, genders and ethnicities) the applicability of the resulting framework will generally be improved.
  • the number of different faces is denoted as R. In some examples R may be between 20 and 30.
  • the training data may include or may be processed to produce 2D shape training examples, the shape training data describing fiducial points or landmarks corresponding with each frame of the training data.
  • the same set of landmarks e.g. denoting corresponding features, such as tip of nose, left pupil, etc.
  • the set of landmarks may include N points for each frame. In some examples the N landmark points are visible (i.e. not obscured) in every training frame.
  • the j-th training example of the i-th face may be described as a vector x_ij containing the concatenated 2D coordinates of the N landmark points (equation (1)):

        x_ij = (x_1, y_1, x_2, y_2, ..., x_N, y_N)^T
  • the training data may be generated or received by a training data input module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • at 116, the training data is used to generate a generic shape model from the M_i 2D shape training examples.
  • Principal Component Analysis (PCA) of the 2D shape training examples may be used to build a standard 2D Point Distribution Model (PDM), e.g. as described in Cootes, T. F. and Taylor, C. J.: Active Shape Models - 'Smart Snakes', Proc. British Machine Vision Conference, 1992.
  • the PDM approximates each training shape as (equation (2)):

        x = M(θ)[x̄ + P b] + X_c

    where M(θ) is a matrix defining a 2D rotation by angle θ, x̄ is the mean position of the points, P = (p_1, p_2, ..., p_t) is the matrix of the first t modes of variation corresponding to the most significant eigenvectors of the covariance matrix in a PCA of the aligned shape training examples, b = (b_1, ..., b_t) is a vector of weights for each mode, and X_c is a 2D translation of the transformed coordinate frame relative to the coordinate frame of the coordinates of the landmark points.
  • Equation (2) models/approximates the shape variation in the training set. It is to be understood that equation (2) is not a strict equality, as only t modes of variation are included. In some examples t may be chosen such that a variance of the model of equation (2) is within a threshold value of the variance of the training set (e.g. 95% or 99%). In some examples, t may be chosen as a fixed number (e.g. 20).
  • the generic shape model may be generated by a generic shape model generation module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the method 110 terminates at 118.
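  A minimal sketch (an assumption, not the publication's code) of building the generic PDM of equation (2) from aligned 2D shape training examples with PCA, keeping enough modes to explain a chosen fraction of the variance; rigid-body alignment is assumed to have been performed already:

```python
import numpy as np

def build_pdm(shapes, variance_kept=0.95):
    """Build a Point Distribution Model from aligned shape vectors.

    shapes : (M, 2N) array, each row the concatenated (x, y) landmark
             coordinates of one aligned training example.

    Returns the mean shape, the matrix P of the first t modes of variation
    and the per-mode variances (eigenvalues).
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape

    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Keep the first t modes explaining the requested fraction of variance.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    t = int(np.searchsorted(ratio, variance_kept)) + 1

    P = eigvecs[:, :t]                               # modes of variation
    return mean_shape, P, eigvals[:t]

# A shape is then approximated as x ≈ mean_shape + P @ b (equation (2),
# before applying the rigid-body transform M(θ), X_c).
```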
  • Figure 1c illustrates a method 120 of generating simulated or synthetic training data suitable for use in 114.
  • the method begins at 122, and at 124 a number of different 3D facial models or rigs are selected. Where the selected facial models cover a range of different facial shapes, gender and ethnicity a more general shape model is likely to be produced.
  • the number of facial rigs R may be between 20 and 30, inclusive, but R is not particularly limited to this range.
  • a virtual camera model is defined for use in producing the current 2D shape model.
  • the virtual camera model may be defined in terms of camera intrinsic and extrinsic parameters as described in Z. Zhang, A Flexible New Technique for Camera Calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000, which is hereby incorporated by reference in its entirety.
  • a set of 2D shape training data is generated for the current virtual camera. This may involve adjusting the rig controls to obtain a facial expression in a simulated face and capturing, based on the virtual cameras, 2D projections of landmark points on the simulated face.
  • 2D shape training data may be obtained by capturing, using the virtual cameras, images of the face simulated by the rig and automatically or manually identifying landmarks on the image of the face.
  • the same set of N facial fiducial points or landmarks may be used for each rig.
  • the j-th training example for the i-th rig may be described as a vector x_ij, as in equation (1). The method terminates at 129.
  • Figure 1d illustrates a method 130 of generating training data suitable for use in 114 from real camera footage.
  • the method begins at 132, and at 134 a number of different people are selected. Where the faces of the selected people cover a range of different facial shapes, gender and ethnicity the generality of the general shape model is likely to be improved.
  • the number of people/faces, R, selected may be from 20 to 30. However, in some examples R may be less than 20 or more than 30.
  • camera footage of each person of the R selected people is captured using a desired camera set-up of one or more synchronized camera views.
  • the person may perform a "range-of-motion" performance.
  • a range-of-motion performance may be about 30 seconds to 1 minute in length.
  • the person exercises a range of facial expressions.
  • the facial expressions may be chosen to include extremes of facial movement.
  • the required facial fiducial points or landmarks are annotated on the captured facial footage. This may be performed manually or using an automated tracking system. For a particular camera view, for each person, this produces a set of M training examples, each of which contains N points, as described in equation (1).
  • the tracking system may include a tracking module/section for carrying out automated identification and tracking of landmark points.
  • the tracking module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the method 130 terminates at 139.
  • This PDM may be further decomposed to produce a model or framework in which the variation due to different sources or factors may be separated into different linear sub-spaces.
  • one subspace may correspond with facial expression, camera position and camera orientation and another subspace may be associated with facial identity.
  • Figure 2a illustrates a method 200 for decomposing the generic PDM into two subspaces, corresponding with (i) facial identity and (ii) other factors, but similar methods could be used to decompose the PDM into different or additional subspaces.
  • the method 200 of Figure 2a may be used to perform 106 of Figure 1a (see, e.g., Costen, Nicholas, et al.).
  • the method 200 begins at 202, and at 204, shape model parameters b are determined (e.g. according to equation (2)) for each of the M_i 2D shape training examples across all faces or rigs and all expressions/camera poses (e.g. all shape training examples created in Section 1.1).
  • An iterative technique may be used, e.g. as described in Cootes, T., Chapter 7: "Model-Based Methods in Analysis of Biomedical Images" in "Image Processing and Analysis", Ed. R. Baldock and J. Graham, Oxford University Press, 2000, pp. 223-248, which is hereby incorporated by reference in its entirety.
  • a shape model parameter module/section may be provided to determine the shape model parameters; this module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • two shape sub-spaces are built using (potentially) different groups of training data from the original ensemble.
  • the shape model parameters b for each example may be used as the training data for building the model for each sub-space, rather than the original landmark point data.
  • a within group covariance matrix for the sub-space is constructed using some or all of the original training data, and then used to calculate a corresponding PCA model.
  • Identity and non-identity subspace generation modules/sections may be provided to generate the identity and non-identity subspaces, respectively.
  • the identity and non-identity subspace generation modules/sections may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Figure 2b shows a method 220 of generating 206 a non-identity subspace.
  • the method begins at 222.
  • the within group covariance is calculated at 224.
  • the covariance matrix is then constructed at 228 across all rigs, all expressions and camera poses, based upon subtracting the rig mean from each data item:

        C_EC ∝ Σ_i Σ_j (b_ij − b̄_i)(b_ij − b̄_i)^T

    where b̄_i is the mean of the shape model parameters over the training examples for the i-th rig/face, and the sum runs over all rigs i and all of their examples j.
  • Operation 224 may be performed by a covariance calculation module/section, which may include a mean calculation module/section and a covariance calculation module/section. Each of these modules/sections may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • at 230, PCA of this covariance matrix may be used to give a model of the expression and camera pose subspace, so that variation of the shape parameters within this subspace is described as P_EC q_EC, where q_EC is a vector of non-identity subspace parameters. Operation 230 may be carried out by a multivariate analysis module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • non-identity subspace may be thought of as an expression and camera pose shape sub-space, where expression and camera pose are the main causes of variation mapped to that subspace.
  • Method 220 terminates at 232.
  • Figure 2c shows a method 240 of generating 208 an identity subspace. The method begins at 242.
  • the training data is divided into groups of examples at 244. Examples within each group contain the same expression and camera pose (or as close to this as possible) across all rigs/faces.
  • a neutral expression e.g. a rig neutral expression or an image of an actor’s face posed with a neutral expression
  • a neutral expression may include eyes open and looking straight forward, mouth closed, neutral facial expression.
  • Examples that are not grouped may be discarded for the purposes of generating the identity subspace.
  • the identity sub-space training set may be a proper subset of the whole data-set.
  • the grouping of data may be carried out automatically or with user input by a grouping module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the within group covariance is calculated at 246.
  • the covariance is constructed from the shape model parameters for each group (e.g. facial expression/camera pose group), based upon subtracting the group mean from each data item; the number of rigs represented in each group may be less than the total number of rigs R where examples are discarded.
  • Operation 246 may be performed by a covariance calculation module/section, which may include a mean calculation module/section and a covariance calculation module/section. Each of these modules/sections may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Operation 252 may be carried out by a multivariate analysis module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Method 240 terminates at 254.
  • the groups were arranged such that the examples in each group have the same expression and camera pose. However, in other examples different factors may be used to determine the groups. For example, where camera pose is not expected to vary (e.g. where camera wobble is negligible) the groups may represent variation in expression only.
  • Methods 220 and 240 were described as using PCA to model the non-identity and identity subspaces. However, as described in Section 1.1 other analysis techniques could be used in place of PCA. For example, ICA or EFA could be used in 230 and 252.
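  A minimal NumPy sketch (an assumption, not the publication's code) of the common core of methods 220 and 240: both subspaces are obtained from a PCA of a within-group covariance of the shape parameters b, differing only in how the examples are grouped (by face/rig for the non-identity subspace, by expression/camera-pose group for the identity subspace):

```python
import numpy as np

def within_group_subspace(b, group_labels, n_modes):
    """PCA of the within-group covariance of shape model parameters.

    b            : (M, t) shape model parameters for all training examples
    group_labels : (M,)   group index of each example (face/rig index when
                          building the non-identity subspace; expression/camera
                          pose group index when building the identity subspace)
    n_modes      : number of modes (columns) to keep in the subspace basis
    """
    deviations = np.empty_like(b)
    for g in np.unique(group_labels):
        members = group_labels == g
        deviations[members] = b[members] - b[members].mean(axis=0)  # subtract group mean

    cov = deviations.T @ deviations / len(b)          # within-group covariance
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :n_modes]              # most significant eigenvectors first

# P_EC = within_group_subspace(b, face_index, n_EC)        # expression/camera pose subspace
# P_I  = within_group_subspace(b, expression_group, n_I)   # identity subspace
```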
  • the overall model may be constructed (210 of Figure 2a) from the two sub-spaces, e.g. such that the shape model parameters are approximated as b ≈ P_EC q_EC + P_I q_I, which may be substituted into equation (2) to recover the landmark point positions (equation (7)).
  • Operation 210 may be carried out using a model construction module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Method 200 may be performed independently for each camera view (where there is more than one camera view).
  • each camera view describes a separate camera (e.g. a camera differing from the other cameras in one or more of optical properties, position or orientation).
  • the result of repeating method 200 for each camera view is NC 2D sub-space shape models which each follow equation (7) (where NC is the number of camera views), each of which describes the variation in 2D landmark point positions for the points modelled for a corresponding camera view.
  • the camera views may be synchronized; that is, for each of the camera views, respective frames are captured relating to the same instant of time.
  • a set of frames corresponding to the same instant of time captured by a plurality of cameras may be referred to herein as a superframe.
  • 2D Subspace models associated with identity and non-identity parameters may be used to analyze data including a plurality of frames of a face.
  • the plurality of frames may be, for example, a collection of still images or a video sequence of a face.
  • the frames may be generated using one or more real cameras to capture images of a real actor, or may be generated using one or more virtual cameras to capture images of a simulated face (e.g. using a 3D facial rig).
  • the camera or cameras used to capture the frames may have the same parameters and positions relative to the face as the camera or cameras used in generating the 2D subspace models.
  • the face in the data to be analyzed may be different from the faces used to generate the 2D subspace models.
  • the data to be analyzed will be referred to here as analysis data.
  • the frames of the analysis data may be descriptions in terms of landmark points, rather than images captured by a real or virtual camera.
  • the analysis data may include from 100 to 200 frames, but more or fewer frames may be included.
  • the analysis data may be selected from a larger set of received data, for example as a subset of the received frames.
  • the selection may be according to any suitable method. For example, the selection may be based on a uniform sampling of frames, by random selection of frames, or by a more principled approach to select training frames which reflect the greatest range of variation in the training data (e.g. demonstrating a variety of distinct facial expressions).
  • the number of frames or examples to be used is denoted F.
  • the analysis method 300 is illustrated in Figure 3a, and begins at 301.
  • 2D landmark data for the k-th frame of the subset for the N-th camera may be described as a vector x_Nk.
  • the corresponding shape model parameters b_k and 2D rigid body transform (defined by M(θ)_k, X_ck) are calculated for the overall generic shape model given by equation (2), as described in Section 1.2.
  • the generic shape model may be calculated by a generic shape model calculation module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the generic shape model calculation module/section may be implemented by the generic shape model generation module/section described in relation to 116 of Figure 1b.
  • the best actor identity parameters q_I across all of the analysis frames (associated with a particular camera) are calculated.
  • the best non-identity parameters (e.g. actor expression and camera pose parameters) q_EC1, q_EC2, ..., q_ECF are calculated for each frame individually.
  • the following set of overdetermined linear equations (equation (8)) may be used to determine q_I and q_EC1, q_EC2, ..., q_ECF:

        b_j = P_EC q_EC,j + P_I q_I,   j = 1, ..., F

    where b_j are the shape model parameters determined for the j-th frame.
  • F is the number of frames in the analysis data associated with the current camera. In some examples, F may have the same value for all cameras (e.g. where the analysis data relates to synchronized frames).
  • the set of linear equations (8) may be solved using standard numerical techniques to give q_I and q_EC1, q_EC2, ..., q_ECF.
  • Equation (8) yields identity parameters q_I that are the same across all frames. This constraint may improve consistency and quality of the model. In addition, by constraining q_I to be consistent across all of the frames, the number of parameters is reduced (there is not a different q_I for each frame). This may simplify the description of the frames and reduce processing resources (e.g. CPU cycles, memory, etc.) used in processing the model.
  • Operation 306 may be carried out by a parameter determination module/section, which may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • 304 to 306 may be repeated for each camera for which data is included in the analysis data. If, at 308, there is analysis data related to cameras that have not been processed, the method returns to 304. Otherwise the method terminates at 310.
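  A minimal sketch (assumed for illustration, not the publication's code) of solving the stacked overdetermined system of equation (8) for a single camera with ordinary least squares, where q_I is shared across all frames and each frame has its own q_EC:

```python
import numpy as np

def solve_identity_and_expression(b, P_I, P_EC):
    """Solve the stacked overdetermined system b_j = P_EC q_EC,j + P_I q_I.

    b    : (F, t)   shape model parameters for each of F frames
    P_I  : (t, nI)  identity subspace basis
    P_EC : (t, nEC) non-identity (expression/camera pose) subspace basis

    Returns q_I shared across frames and q_EC (F, nEC), one row per frame.
    """
    F, t = b.shape
    nI, nEC = P_I.shape[1], P_EC.shape[1]

    # Unknowns are ordered [q_I, q_EC_1, ..., q_EC_F].
    A = np.zeros((F * t, nI + F * nEC))
    for j in range(F):
        rows = slice(j * t, (j + 1) * t)
        A[rows, :nI] = P_I                                   # q_I appears in every frame
        A[rows, nI + j * nEC : nI + (j + 1) * nEC] = P_EC    # q_EC,j only in frame j
    rhs = b.reshape(-1)

    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    q_I = sol[:nI]
    q_EC = sol[nI:].reshape(F, nEC)
    return q_I, q_EC
```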
  • Figure 3b illustrates a method 320 for determining identity parameters and non-identity parameters.
  • the method begins at 322, and at 324 data is received that describes a plurality of frames. Each frame may include coordinates of a plurality of landmark points of a face to be analyzed.
  • the received data may be analysis data, as described in relation to Figure 3a.
  • identity parameters and non-identity parameters may be determined based on the received data.
  • the identity parameters may be indicative of an identity of the face described by the received data, and non-identity parameters may be independent of the identity of the face.
  • the landmark points in the received data may be approximated by a description in which the identity parameters are the same across all frames of the plurality of frames, and respective sets of second parameters are allowed to vary between frames. The method terminates at 328.
  • data may be received relating to a single camera, and in other examples the received data may relate to two or more cameras.
  • the identity and non-identity parameters may be determined for each camera independently of the determination for other cameras.
  • the landmark points in the received data may be described such that the identity and non-identity parameters satisfy equation (8), where q_I are identity parameters and q_EC1, q_EC2, ..., q_ECF are non-identity parameters.
  • the identity parameters and the non-identity parameters represent projections of the landmark points into the identity subspace and the non-identity subspace, respectively.
  • Figure 3c illustrates an apparatus 330 for carrying out the method of Figure 3b.
  • the apparatus 330 includes a data input module/section 332 to carry out operation 324 and a parameter identification module/section 334 to carry out operation 326.
  • Data received by the data input module/section 332 may be passed, with or without analysis or modification, to the parameter identification module/section 334.
  • Each of the data input module/section 332 and parameter identification module/section 334 may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • a facial solver may be used to map input information describing a facial expression (e.g. information identifying landmark points derived from image or video data) to output information describing rig control values that recreate the facial expression in a 3D model of a face.
  • Solvers may be implemented using machine learning. Prior to use of the solver at runtime, the solver may be trained using input training data in a training phase.
  • runtime refers to a phase in which a solver is used to produce rig control values based on input frames (e.g. corresponding to images of a real actor) or information about input frames (such as coordinates of tracked landmark points in the input frames).
  • Training data may include a plurality of frames or training examples.
  • Each frame may include a set of landmark points associated with a facial expression and corresponding rig values for producing the same facial expression in a digital rig, the rig being a simulated model of a face.
  • a digital double rig may be used.
  • the face simulated by the rig corresponds as closely as possible with the face from which the landmark data was derived.
  • Using a digital double helps to ensure that the solver training data produced by the method is a realistic simulation of positions of 2D facial landmark points, e.g. as captured in videos of the corresponding real actor.
  • One approach to generating training data is to capture video of an actor using one or more cameras.
  • the cameras may correspond with an intended runtime camera arrangement, such as one camera directly in front of the actor’s face and one camera to the side.
  • the runtime cameras may be mounted on the actor’s head, for example. All, or a subset, of the frames of the captured video are selected for training, and the selected frames are annotated with landmark points, either manually or automatically.
  • Corresponding rig control values are then generated for each training frame. This may involve an animator selecting appropriate rig control values by hand to recreate the actor’s facial expression in each of the training frames.
  • the runtime actor may be the same as the actor in the training data, such that the rig is a digital double of the runtime actor.
  • simulated training data may be used, rather than video of a real actor.
  • the solver training data is produced by taking a number of animated example poses of a 3D model (or rig), selecting a set of 3D landmark points on the rig, and projecting these from 3D into 2D for each example pose.
  • Each example pose is defined on the animation rig by a set of rig control values.
  • the 3D to 2D point projection is performed using one or more virtual camera models.
  • the virtual camera models may correspond with real physical cameras that are to be used to capture video of a performance of a real-world actor’s face at runtime.
  • the final solver training data therefore consists of a number of examples, or frames, each of which contains a set of 2D points and a corresponding set of rig control-values.
  • the 3D rig or model may be a digital double of a real actor’s face.
  • the training data may use a rig that is a digital double of an actor that will be the subject of data input to the solver at runtime.
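  For illustration, a minimal sketch (an assumption, not the publication's code) of projecting 3D rig landmark points into a 2D virtual camera view using the intrinsic/extrinsic pinhole model referenced above; the parameter names are illustrative:

```python
import numpy as np

def project_landmarks(points_3d, K, R, t):
    """Project 3D rig landmark points into a 2D virtual camera view.

    points_3d : (N, 3) landmark positions in world/rig coordinates
    K         : (3, 3) camera intrinsic matrix
    R, t      : (3, 3) rotation and (3,) translation (camera extrinsics)

    Returns an (N, 2) array of 2D landmark positions, i.e. one training
    example for one frame and one virtual camera view.
    """
    cam = points_3d @ R.T + t            # world -> camera coordinates
    uvw = cam @ K.T                      # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide
```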
  • the other features may include one or more of raw pixel patches sampled from the input video in the vicinity of the landmark points, and/or features derived from those pixels by image pre-processing or other mathematical operations, for example: edge detection, intensity normalization of patch pixel values, or projecting the pre-processed sampled patch pixels in a principal component analysis basis.
  • in some examples, the features input to the solver include lighting information; the plurality of frames used to generate the input to the solver may include variations in lighting. Further, in some examples one or more non-identity subspaces may describe variation in lighting.
  • Facial solvers are linked to a specific face or identity (i.e. a specific actor) and perform poorly when used with a different face or identity. Decoupling the solver from a specific identity improves flexibility by allowing a runtime identity to differ from an identity during training of the solver.
  • Facial data that is represented by separate identity and non-identity parameters may be used to produce training data for training a facial solver. By using identity-free data to train the solver, an identity-free solver may be produced. This reduces or eliminates the influence of identity on the operation of the solver, and yields a solver that gives good results when the training data is based on a rig that is not a digital double of the runtime actor.
  • Figure 4a illustrates a method 400 of producing training data according to some examples.
  • the method begins at 402, and at 404 initial data is generated or received.
  • the initial data may describe a plurality of frames, each frame associated with a face, the face having, in each frame, a respective facial expression.
  • the data describing each frame may include coordinates of a plurality of landmark points of the face, and each frame may be associated with a set of rig control values describing parameter values to reproduce the respective facial expression in a simulated model of the face.
  • training data is produced by mapping each frame of one or more of the plurality of frames to an identity-free description, based on a projection of the data into a non-identity subspace and an identity subspace.
  • the training data may be used to train a solver.
  • the method terminates at 410.
  • Figure 4b illustrates an apparatus 420 for carrying out the method of Figure 4a.
  • Generation or receipt of initial data 404 may be carried out by an initial data handling module/section 422; mapping/projecting data to an identity-free subspace may be carried out by a training data production module/section 424.
  • a solver training module/section 426 may cause the training data to be used to train a solver.
  • Each of 422, 424 and 426 may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Figure 4c illustrates a process 430 for producing training data.
  • the method begins at 432, and at 434 a set of solver training data is generated or received.
  • the solver training data may be based on real video of an actor’s performance or on a simulation based on a 3D rig.
  • the training data may describe facial landmarks and corresponding rig control values for a plurality of frames or examples.
  • a generic linear sub-space 2D face shape model (e.g. as described in Section 1.2 and defined by equation (7)) is fit to the solver training data to establish best fit identity parameters q_I and non-identity parameters (e.g. expression/camera pose parameters) q_EC for each frame of the solver training data.
  • the subspace model is used to project the training data into an identity-free subspace. The method terminates at 440.
  • Generation or receipt of initial data 434 may be carried out by an initial data handling module/section. Finding q_I and q_EC may be performed by a parameter determination module/section. Projection of the training data into an identity-free subspace may be performed by a training data projection module/section.
  • Each of the initial data handling module/section, parameter determination module/section and training data projection module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Initial solver training data may be generated as described in Section 3.
  • rig values may be based on a rig that is a digital double of the actor in the initial training data.
  • the rig used for initial training data, whether the initial training data is based on real video or is simulated, need not be a digital double of the intended runtime actor.
  • the rig type and landmark points may correspond with a rig and landmark points to be used at runtime.
  • the training rig and runtime rig may have the same or corresponding controls.
  • the same or corresponding landmark points may be used to characterize a facial expression (e.g. tip of nose, pupils, corners of mouth, etc.)
  • the initial solver training data generated is a set of examples that may embody the range of variation of facial expression and a small amount of variation in position and orientation of each of the virtual camera views, plus also corresponding facial animation rig control values for each example.
  • Initial solver training data may include solver input training data and solver output training data defining a set of input data and the corresponding target outputs.
  • Solver input training data for the i-th example can therefore be represented as an array of vectors, each of which is a concatenation of the 2D landmark points for a particular camera view: T_i = {x_i,1, x_i,2, ..., x_i,NC}, where NC is the number of camera views.
  • the initial solver training data may be distinct (i.e. be different, non-overlapping data) from the training data of Section 1.1. However, in some examples the initial solver training data may overlap with the training data of Section 1.1.
  • Actor identity parameters q_I for the facial identity in the initial training data may be determined for the solver input training data in a similar manner to that described in Section 2, where the solver input training data is the analysis data in the method 300 of Figure 3a or the method 320 of Figure 3b.
  • q_I may be determined from the solver training data for a linear subspace shape model for each camera view. The linear subspaces may be determined as described in relation to Figure 2a, for example.
  • a subset of the initial solver training frames may be selected for use in determining the identity parameters.
  • 100 to 200 frames of data may be selected for use in method 300 or 320. More than 200 or fewer than 100 frames may be used in some examples.
  • the selection of frames may be based on any suitable method.
  • the selection may be based on uniform sampling of frames from the solver training data, random selection of frames, or a more principled approach to select training frames which reflect the greatest range of variation in the training data (e.g. variation in facial expression).
  • all of the initial training data is used.
  • Figure 5 shows a method 500 carrying out this projection for a single camera view. Method 500 may be repeated for each camera view in turn. The method 500 begins at 502.
  • at 504, the non-identity parameters q_EC,k for the k-th frame may be calculated using equation (9), e.g. by solving b_k = P_EC q_EC,k + P_I q_I for q_EC,k with the identity parameters q_I held fixed at the values determined as described in Section 3.2.
  • at 506, the solver training data for the frame is projected into an identity-free subspace by setting q_I to zero and substituting back into equation (7) to give a set of 'identity-corrected' or 'identity-free' points x_corrected,k for the k-th frame.
  • the solver training data is produced at 508.
  • the identity-free or identity-corrected points x_corrected,k may be used as the solver input training data for the identity-free solver.
  • Solver training data may be generated by combining or associating the identity-corrected points for each frame with the rig control values contained in the initial training data for the corresponding frame, the rig control values corresponding with solver output training data. The method terminates at 510.
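  A minimal sketch (an assumption, not the publication's code) of this per-frame projection for one camera view, reusing the b ≈ P_EC q_EC + P_I q_I decomposition; the rigid-body part of equation (2) (M(θ), X_c) is omitted for brevity:

```python
import numpy as np

def identity_free_points(b_k, q_I, P_I, P_EC, mean_shape, P):
    """Project one frame's shape parameters into an identity-free description.

    b_k        : (t,)   shape model parameters for the k-th frame (equation (2))
    q_I        : (nI,)  fixed identity parameters for this camera view
    P_I, P_EC  : (t, nI) and (t, nEC) identity and non-identity subspace bases
    mean_shape : (2N,)  mean landmark vector of the generic PDM
    P          : (2N, t) modes of variation of the generic PDM
    """
    # Best-fit non-identity parameters with the identity contribution removed
    residual = b_k - P_I @ q_I
    q_EC_k, *_ = np.linalg.lstsq(P_EC, residual, rcond=None)

    # Set the identity parameters to zero and substitute back
    b_corrected = P_EC @ q_EC_k

    # Reconstruct 'identity-corrected' landmark points in the model frame
    x_corrected = mean_shape + P @ b_corrected
    return x_corrected.reshape(-1, 2)
```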
  • Calculation of the non-identity parameters 504 may be carried out by a parameter determination module/section; projecting the training data into an identity-free subspace 506 may be performed by a training data projection module/section.
  • a training data association section may be provided to associate identity-corrected points for each frame with the rig control values 508.
  • Each of the parameter determination module/section, training data projection module/section and training data association section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.

4. Calibration for Runtime Actor
  • the system may be used to perform facial animation for any actor.
  • the animation may use a rig of the same type as the rig used in generating the identity-free solver (e.g. the rig may have the same or corresponding controls). In some examples, the same rig may be used.
  • the identity of the runtime actor may be calibrated, e.g. to determine a set of identity parameters q_I for each camera stream/view.
  • Figure 6a shows a method 600 for this calibration. The method begins at 602, and at 604 a video range of motion (ROM) take for the actor is captured and tracked. At 606, the identity parameters for the actor are determined from the tracked ROM for each camera view. The method terminates at 608.
  • the calibration 600 of the identity of the runtime actor may make use of a calibration module/section.
  • the calibration module/section may include a ROM handling module/section to capture a ROM or receive input (such as raw video of a ROM, a tracked ROM video, etc.) and, where necessary, perform processing to generate tracked ROM output.
  • the tracked ROM output may be provided to an identity parameter determination module/section to determine identity parameters 606 for the actor based on the tracked ROM received from the ROM handling module/section.
  • Each of the calibration module/section, ROM handling module/section and identity parameter determination module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Figure 6b shows a method 610 for capturing and tracking a ROM according to 604 of method 600. The method begins at 612.
  • NC synchronized videos of the actor are captured, for example from NC different cameras.
  • the synchronized videos may be captured using one or more head-mounted cameras (HMC), and may contain footage in which the actor acts out a so-called facial‘range of motion’.
  • the actor performs a range of facial expressions, e.g. including movements of the eyes, eyelids, eyebrows and mouth, intended to capture a range of typical facial expressions for that actor.
  • the ROM video might be around 1-2 minutes of footage, although longer or shorter videos are possible.
  • a set of tracked 2D facial landmark points are obtained.
  • for each frame, the tracked 2D facial landmark points across all camera views may be denoted T = {x_1, x_2, ..., x_NC}, where x_n is the vector of 2D landmark points for the n-th camera view and NC is the number of camera views.
  • a tracking module/section may be provided to obtain landmark points from the frames of the ROM video, either automatically or based on user input.
  • the tracking module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • the method terminates at 618.
  • Determining identity parameters from the tracked ROM according to 606 of method 600 may be achieved using a similar method to that described in Section 3.2.
  • Figure 6c shows an example method 620 for determining the identity parameters. The method begins at 622.
  • a subset of the frames of the ROM may be selected. This selection may be similar to the selection of frames in 3.2, e.g. uniform sampling of frames from the ROM, random selection of frames from the ROM, a selection of frames from the ROM that reflect a range of variation in the training data, etc. In some examples, all of the ROM frames may be used.
  • a camera view from among the NC camera views is selected.
  • overdetermined linear equations for solving q_I are set up and solved, as described in Section 2 (e.g. according to equation (8)).
  • the result is a set of identity parameters for all cameras, Q_I = {q_I,1, ..., q_I,NC}, where q_I,n is the vector of identity parameters for the n-th camera view.
  • Figure 7a illustrates a method 700 for using a generic solver at runtime to produce rig control values based on input frames.
  • the method begins at 702.
  • data is received or generated describing a plurality of frames, each frame associated with a face, the face having, in each frame, a respective facial expression, the data describing each frame including coordinates of a plurality of landmark points of the face.
  • the frames may be from captured video footage of an actor.
  • solver input data is produced by mapping each frame to an identity-free description based on a projection of the data into a non-identity subspace and an identity subspace.
  • the subspaces may be determined as described in Section 1.2, and projections into the same subspaces may have been used in training the generic solver.
  • the solver input data is input to the generic solver.
  • the generic solver may be arranged to output rig control values for each frame (or each superframe); the rig control values may describe parameter values to reproduce the respective facial expression in a simulated facial model.
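A minimal sketch of this runtime flow is shown below: each tracked frame has the identity contribution removed and is expressed in the non-identity subspace before being passed to a generic solver. The bases, the calibrated identity parameters and the solver itself are placeholders assumed to come from the models and training described earlier; the function names are not from the patent.

```python
import numpy as np

def identity_free_description(x, x_mean, P_id, q_id, P_exp):
    """Remove the actor-specific (identity) component from one tracked frame x and
    express the remainder as coordinates in the non-identity subspace."""
    residual = x - x_mean - P_id @ q_id
    return P_exp.T @ residual

def solve_performance(frames, x_mean, P_id, q_id, P_exp, generic_solver):
    """frames: iterable of tracked landmark vectors; generic_solver: any callable mapping
    solver input data to rig control values (e.g. a trained regressor)."""
    return [generic_solver(identity_free_description(x, x_mean, P_id, q_id, P_exp))
            for x in frames]
```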
  • Figure 7b illustrates an apparatus 720 for carrying out the method of Figure 7a.
  • Initial data handling module/section 722 is arranged to receive or generate data according to operation 704.
  • Mapping module/section 724 may perform operation 706.
  • Solver interface module/section 726 is provided to input the solver input data generated by the mapping module/section to a generic solver.
  • Each of the initial data handling module/section, mapping module/section and solver interface module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
  • Figure 7c illustrates a method 730 of generating input data for a generic solver at runtime based on initial data (e.g. video of an actor).
  • the method includes projecting the initial input data into an identity-free subspace, e.g. using a similar process to that described in Section 3.3.
  • the method begins at 732, and at 734 a frame, or superframe, of input video from a performance capture system is received.
  • the superframe may include NC synchronized camera images.
  • at 736, facial landmark points are tracked in the frame (or superframe); the tracking may be performed automatically, or could be based on marks placed on the actor’s face prior to capture of the video performance.
  • at 738, the input frame tracking data is projected into an identity-free subspace for each camera. This may be carried out as described in Section 3.3, based on the frame tracking data T and the actor identity parameters for all cameras Q_I. The resulting identity-free corrected tracking data may be used as final input data for an identity-free solver.
  • the solver may output rig control values based on the final input data. The method terminates at 740.
  • Receiving input video 734 may be performed by a video input module/section.
  • Tracking 736 may be performed by a tracking module/section.
  • Projection of the tracked data 738 into an identity-free subspace may be carried out by a projection module/section.
  • the video input module/section, tracking module/section and projection module/section may be implemented in software, firmware, hardware, circuitry, or a combination thereof.
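The per-superframe loop of 734-740 might look roughly as follows. `track_landmarks` and `project_identity_free` stand in for the tracking and Section 3.3 projection steps and are assumptions rather than functions defined in this description; `q_id_per_camera` is assumed to hold the calibrated identity parameters from the calibration of Figure 6.

```python
import numpy as np

def process_superframe(images, q_id_per_camera, track_landmarks, project_identity_free, solver):
    """images: list of NC camera images captured at the same time instant.
    Returns rig control values for this superframe."""
    solver_input = []
    for camera_index, image in enumerate(images):
        x = track_landmarks(image, camera_index)                           # 736: tracked 2D landmarks
        x_free = project_identity_free(x, q_id_per_camera[camera_index])   # 738: identity removed
        solver_input.append(x_free)
    return solver(np.concatenate(solver_input))                            # rig control values
```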
  • one or more generic shape models may be created, describing the expected shape variation of a set of fiducial markers or 'landmark points' on the face. While the method was described in relation to 2D facial landmark point positions, note that the method may also be applied to 3D facial landmark point positions.
  • generated facial landmark data may be constructed to give a set of 3D points prior to projection to 2D.
  • the 3D to 2D camera projection (projecting from a 3D description of landmark points to a 2D camera view) may be omitted, and the remaining stages of the method may be applied directly to 3D point data.
  • the 2D landmark points may be converted to 3D, e.g. using stereoscopic reconstruction from 2 or more 2D cameras to give 3D landmark points before applying further processing.
  • a standard triangulation method may be used (see, e.g., Richard Hartley and Andrew Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2003, which is hereby incorporated by reference in its entirety).
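For the two-view case, a standard linear (DLT) triangulation in the style of Hartley and Zisserman could be used. The sketch below is illustrative only and assumes calibrated 3x4 projection matrices are available for both camera views (e.g. from a calibration such as Zhang's method cited in this document).

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """P1, P2: 3x4 camera projection matrices for the two views.
    x1, x2: (u, v) pixel coordinates of the same landmark in each view.
    Returns the 3D landmark position in world coordinates."""
    u1, v1 = x1
    u2, v2 = x2
    # Each image point contributes two linear constraints on the homogeneous 3D point.
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]              # null-space solution of A @ X ≈ 0
    return X[:3] / X[3]     # dehomogenize
```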
  • Other information or sensors may be used to provide range information.
  • lidar may be used to generate 3D landmark point information.
  • Other landmark data such as the analysis data in Section 2, initial training data in Section 3.1, ROM performance in Section 4, and runtime data in Section 5 may be retained as 3D data (where it is originally 3D data) or converted to 3D data (from the 2D data used in the examples). In each case, subsequent operations may be carried out using 3D coordinates.
  • Equation (2), and similarly Equations (3), (4), (5), (6), (7), (8), (9) and (10), generalize to the 3D case.
  • a solver could be built by defining a mapping from a set of reconstructed 3D facial points onto a corresponding set of rig control-values, and the generation of solver training data described herein could be applied entirely in 3D rather than 2D.
  • a 3D solver may be applied more generally than a 2D solver trained for specific camera positions and parameters.
  • conversion from 3D to 2D may be applied at any stage.
  • the processing of data to be input to the solver produces 3D data
  • the data may be converted to 2D for input to a 2D solver by application of a simple camera projection.
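Such a conversion could be an ordinary pinhole camera projection. The sketch below is a minimal illustration, assuming the intrinsic matrix K and extrinsics R, t of the target camera are known; it is not a specific projection prescribed by this description.

```python
import numpy as np

def project_to_2d(points_3d, K, R, t):
    """points_3d: (N, 3) 3D landmark points in world coordinates.
    K: (3, 3) camera intrinsics.  R: (3, 3) rotation.  t: (3,) translation.
    Returns (N, 2) pixel coordinates."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # world -> camera coordinates
    proj = K @ cam                            # apply intrinsics
    return (proj[:2] / proj[2]).T             # perspective divide
```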
  • Linear models and methods generally require fewer computing resources and can be executed more rapidly than non-linear techniques.
  • non-linear techniques may provide advantages in terms of accuracy.
  • Methods disclosed herein may be implemented on one or more computers. Examples disclosed herein may be implemented as a computer program for running on a computer system, at least including executable code portions for performing steps of any method according to the examples when run on a programmable apparatus, such as a computer system, or enabling a programmable apparatus to perform functions of a device or system according to embodiments of the invention.
  • the computer system may comprise server-side resources and client-side resources.
  • a computer program may be formed of a list of executable instructions such as a particular application program, a precompiled component of an application (such as a browser) or an add-on/snap-in for the application and/or an operating system.
  • the computer program may, for example, include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, source code, object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a suitable computer system.
  • the computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on computer readable media, which may include non-transitory computer readable media.
  • the computer readable media may include, for example and without limitation, any one or more of the following: magnetic storage media, including disk and tape storage media; optical storage media, such as compact disk media (e.g. CD-ROM, CD-R, Blu-ray, etc.), digital video disk storage media (DVD, DVD-R, DVD-RW, etc.) or other high-density optical media; non-volatile memory storage media, including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM and ROM; ferromagnetic digital memories; MRAM; volatile storage media, including registers, buffers or caches, main memory, RAM, DRAM, DDR RAM, etc.; and data transmission media, including computer networks, point-to-point telecommunication equipment, carrier wave transmission media, and the like. Embodiments of the invention are not limited to the form of computer readable media used.
  • Figure 9 illustrates a computer readable medium 900 storing instructions 910 thereon that, when executed by a processing device, carry out a method as described herein.
  • a computer process may include an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process.
  • An operating system is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources.
  • An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
  • a computer system 800 suitable for use with examples herein is illustrated in Figure 8.
  • the computer system 800 may, for instance, include at least one processing unit 810, associated memory 820 and a number of input/output (I/O) devices 830.
  • the computer system 800 processes information according to the computer program, which may be stored in memory 820, and produces resultant output information via I/O devices 830.
  • the processor 810, memory 820 and I/O devices 830 may be operatively coupled by one or more buses 840 or any other suitable communication medium.
  • Examples disclosed herein, or portions of those examples, may be implemented as software or code, firmware, or hardware, or any combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method of analysing a face, comprising: receiving data describing a plurality of frames, each frame comprising coordinates of a plurality of landmark points of the face; and determining, on the basis of the plurality of frames: a set of first parameters indicative of the identity of the face, and, for each frame, a respective set of second parameters that are independent of the identity of the face, the set of first parameters and the sets of second parameters being determined such that, for each frame, the coordinates of the landmark points of the face are approximated by a description in which the set of first parameters is the same across all of the frames of the plurality of frames, and the respective sets of second parameters are allowed to vary between frames.
PCT/GB2020/050791 2019-03-26 2020-03-25 Analyse faciale WO2020193972A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1904122.7A GB201904122D0 (en) 2019-03-26 2019-03-26 Facial analysis
GB1904122.7 2019-03-26

Publications (1)

Publication Number Publication Date
WO2020193972A1 true WO2020193972A1 (fr) 2020-10-01

Family

ID=66381534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2020/050791 WO2020193972A1 (fr) 2019-03-26 2020-03-25 Analyse faciale

Country Status (2)

Country Link
GB (1) GB201904122D0 (fr)
WO (1) WO2020193972A1 (fr)

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
Child, D.: "The Essentials of Factor Analysis", Bloomsbury Academic Press, 2006
Cootes, T. F., Edwards, G. J., Taylor, C. J.: "Active appearance models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, 2001, page 681, XP055249235, DOI: 10.1109/34.927467
Cootes, T. F., Taylor, C. J.: "Active Shape Models - 'Smart Snakes'", Proceedings of the British Machine Vision Conference, Leeds, 1992
Cootes, T.: "Model-Based Methods in Analysis of Biomedical Images", in "Image Processing and Analysis", Oxford University Press, 2000, pages 223-248
Costen, N., Cootes, T. F., Edwards, G. J., Taylor, C. J.: "Automatic Extraction of the Face Identity-Subspace", Image Vision Comput., vol. 20, 2002, pages 319-329
Daniel Vlasic et al.: "Face transfer with multilinear models", ACM Transactions on Graphics, ACM, NY, US, vol. 24, no. 3, 1 July 2005, pages 426-433, XP058365615, ISSN: 0730-0301, DOI: 10.1145/1073204.1073209 *
Fei Yang et al.: "Facial expression editing in video using a temporally-smooth factorization", Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 16 June 2012, pages 861-868, XP032232158, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6247759 *
Klaudiny, M., McDonagh, S., Bradley, D., Beeler, T., Mitchell, K.: "Real-Time Multi-View Facial Capture with Synthetic Training", Computer Graphics Forum, vol. 36, no. 2, 2017, pages 325-336, XP055544469, ISSN: 0167-7055, DOI: 10.1111/cgf.13129
Richard Hartley, Andrew Zisserman: "Multiple view geometry in computer vision", Cambridge University Press, 2003
Stone, James V.: "Independent component analysis: a tutorial introduction", MIT Press, 2004
T. F. Cootes, C. J. Taylor, D. H. Cooper, J. Graham: "Active shape models - their training and application", Computer Vision and Image Understanding, vol. 61, 1995, pages 38-59
Z. Zhang: "A Flexible New Technique for Camera Calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, 2000, pages 1330-1334, XP055037019, DOI: 10.1109/34.888718

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963425A (zh) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 Test method and apparatus for a face liveness detection system, and storage medium
CN113963425B (zh) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 Test method and apparatus for a face liveness detection system, and storage medium
CN114463815A (zh) * 2022-01-27 2022-05-10 南京甄视智能科技有限公司 Facial expression capture method based on facial key points

Also Published As

Publication number Publication date
GB201904122D0 (en) 2019-05-08

Similar Documents

Publication Publication Date Title
Wood et al. 3d face reconstruction with dense landmarks
US11455496B2 (en) System and method for domain adaptation using synthetic data
US11036973B2 (en) Visual sign language translation training device and method
US11055521B2 (en) Real-time gesture recognition method and apparatus
Ploumpis et al. Towards a complete 3D morphable model of the human head
JP7200139B2 (ja) 仮想顔化粧の除去、高速顔検出およびランドマーク追跡
US11954904B2 (en) Real-time gesture recognition method and apparatus
US10949649B2 (en) Real-time tracking of facial features in unconstrained video
CN108369643B (zh) 用于3d手部骨架跟踪的方法和系统
US9361723B2 (en) Method for real-time face animation based on single video camera
WO2022095721A1 (fr) Procédé et appareil de formation de modèle d'estimation de paramètre, dispositif, et support de stockage
CN111354079A (zh) 三维人脸重建网络训练及虚拟人脸形象生成方法和装置
US11615516B2 (en) Image-to-image translation using unpaired data for supervised learning
JP2024501986A (ja) 3次元顔再構築の方法、3次元顔再構築の装置、デバイスおよび記憶媒体
Dundar et al. Unsupervised disentanglement of pose, appearance and background from images and videos
CN110598638A (zh) 模型训练方法、人脸性别预测方法、设备及存储介质
Yu et al. A video-based facial motion tracking and expression recognition system
WO2020193972A1 (fr) Analyse faciale
Purps et al. Reconstructing facial expressions of hmd users for avatars in vr
Zimmer et al. Imposing temporal consistency on deep monocular body shape and pose estimation
Lagneaux et al. An Automatic Highly Dynamical Digital Twin Design with YOLOv8 for hydrodynamic studies on living animals
Ardelean et al. Pose Manipulation with Identity Preservation
Aleksandrova et al. Approach for Creating a 3D Model of a Face from its 2D Image
Rochette Pose estimation and novel view synthesis of humans
Duignan Exploring Advanced Methodologies for the Generation of Synthetic Data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20716886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20716886

Country of ref document: EP

Kind code of ref document: A1