WO2022148349A1 - Face tracking method and apparatus, electronic device, and storage medium - Google Patents

Face tracking method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022148349A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
key frame
data
tracking
frame
Application number
PCT/CN2022/070133
Other languages
English (en)
French (fr)
Inventor
陈文喻
刘更代
Original Assignee
百果园技术(新加坡)有限公司
刘更代
Application filed by 百果园技术(新加坡)有限公司 and 刘更代
Priority to US18/260,126 (published as US20240062579A1)
Priority to EP22736511.1A (published as EP4276681A4)
Priority to JP2023541109A (published as JP7503348B2)
Publication of WO2022148349A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • G06V40/173Classification, e.g. identification face re-identification, e.g. recognising unknown faces across different face tracks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of video processing, for example, to a face tracking method, apparatus, electronic device and storage medium.
  • Video face tracking can be used to achieve some visual enhancement effects, such as trying on hats, glasses, adding beards, tattoos, etc.
  • the 2D face in a video is the 2D projection of a 3D face obtained by combining the user's face identity with the user's expression.
  • in real-time face tracking, the user's face identity and expression must be computed accurately from the 2D key points on the face in the video, so the face identity needs to be reconstructed quickly for face tracking.
  • the present application provides a face tracking method and apparatus, an electronic device and a storage medium, to solve the problems in the related art of slow convergence of the face identity vector and frequent calls to the face identity optimization thread during face identity reconstruction.
  • the present application provides a face tracking method, which is applied to a tracking thread, where the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking method includes: judging, during face tracking of a video frame, whether an optimization thread is running; in response to the optimization thread running, updating the second key frame data set according to the video frame when the video frame is a key frame; upon receiving a clear instruction sent by the optimization thread to clear the video frames in the second key frame data set, clearing the video frames in the second key frame data set and updating the second key frame data set into the first key frame data set; in response to the optimization thread not running, updating the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame; and, after the first key frame data set is updated, calling the optimization thread, so that the optimization thread optimizes the face identity based on the first key frame data set.
  • the present application provides a face tracking method, which is applied to an optimization thread, including: after the optimization thread is called, taking the current face identity vector used by the tracking thread as an initial face identity vector; acquiring a first key frame data set, where the first key frame data set is a data set updated after the tracking thread performs face tracking and includes face tracking data; optimizing the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data; iteratively optimizing the initial face identity vector based on the optimized face tracking data to obtain an optimized face identity vector; after each round of iteration, judging whether a stop-iteration condition is satisfied according to the optimized face identity vector and the initial face identity vector; in response to the stop-iteration condition being satisfied, sending the optimized face identity vector to the tracking thread, where the tracking thread, upon receiving the optimized face identity vector, determines the received face identity vector as the current face identity vector; in response to the stop-iteration condition not being satisfied, sending to the tracking thread a clear instruction to clear the video frames in a second key frame data set, where the tracking thread, after receiving the clear instruction, updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is non-empty; and taking the optimized face identity vector as the initial face identity vector and returning to the step of optimizing the face tracking data.
  • the present application provides a face tracking device, which is applied to a tracking thread, where the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking device includes:
  • the optimization thread operation judgment module is configured to judge, during face tracking of a video frame, whether the optimization thread is running;
  • a second key frame data set update module configured to, in response to the optimization thread running, update the second key frame data set according to the video frame when the video frame is a key frame;
  • a clearing module configured to, upon receiving a clear instruction sent by the optimization thread to clear the video frames in the second key frame data set, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set;
  • a first key frame data set update module configured to, in response to the optimization thread not running, update the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame;
  • the optimization thread calling module is configured to call the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes the face identity based on the first key frame data set.
  • the embodiment of the present application provides a face tracking device, which is applied to optimizing threads, including:
  • the face identity vector initialization module is set to take the current face identity vector used by the tracking thread as the initial face identity vector after the optimization thread is called;
  • the first key frame data set obtaining module is configured to obtain the first key frame data set, where the first key frame data set is the data set updated after the tracking thread performs face tracking and includes face tracking data;
  • a face tracking data optimization module configured to optimize the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data
  • a face identity vector optimization module configured to iteratively optimize the initial face identity vector based on the optimized face tracking data to obtain an optimized face identity vector
  • the stop iteration judgment module is set to, after each round of iteration, judge whether the stop iteration condition is met according to the optimized face identity vector and the initial face identity vector;
  • a stop iteration module configured to send the optimized face identity vector to the tracking thread in response to the stop-iteration condition being satisfied, the tracking thread determining the received face identity vector as the current face identity vector when it receives the optimized face identity vector;
  • the clearing instruction sending module is configured to send, in response to the stop-iteration condition not being satisfied, a clearing instruction to the tracking thread for clearing the video frames in the second key frame data set; after receiving the clearing instruction, the tracking thread updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is a non-empty set;
  • the initial face identity vector update module is set to take the optimized face identity vector as the initial face identity vector and return to the face tracking data optimization module.
  • the application provides an electronic device, including:
  • one or more processors;
  • storage means arranged to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned face tracking method.
  • the present application provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the above-mentioned face tracking method.
  • FIG. 1 is a flowchart of steps of a face tracking method provided in Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of steps of a face tracking method provided in Embodiment 2 of the present application;
  • FIG. 3 is a flowchart of steps of a face tracking method provided in Embodiment 3 of the present application.
  • FIG. 4 is a flowchart of steps of a face tracking method provided in Embodiment 4 of the present application.
  • FIG. 5 is a structural block diagram of a face tracking device according to Embodiment 5 of the present application.
  • FIG. 6 is a structural block diagram of a face tracking device according to Embodiment 6 of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 7 of the present application.
  • FIG. 1 is a flowchart of steps of a face tracking method provided in Embodiment 1 of the present application.
  • the embodiment of the present application is applicable to the case where, during face tracking, a tracking thread tracks a face and extracts key frames. The method may be executed by the face tracking apparatus of the embodiment of the present application, which may be implemented in hardware or software and integrated in the electronic device provided by the embodiment of the present application. As shown in FIG. 1, the face tracking method of the embodiment of the present application may include the following steps:
  • S101. During face tracking of a video frame, judge whether the optimization thread is running.
  • in the embodiment of the present application, a tracking thread and an optimization thread may be used to perform face tracking on video frames. The tracking thread performs face tracking on the video frames of a video to obtain face tracking data such as face pose data, expression data and face key points, and also detects whether a video frame is a key frame. The optimization thread optimizes the face identity vector based on the key frames detected by the tracking thread, and provides the optimized face identity vector to the tracking thread, which performs face tracking on video frames and detects key frames based on the optimized face identity vector.
  • Data can be exchanged between the tracking thread and the optimization thread, so that the tracking thread can determine whether the optimization thread is running.
  • the optimization thread can be set with a state flag, and the tracking thread can determine whether the optimization thread is running according to the state flag. If the optimization thread is running, execute S102; if the optimization thread is not running, execute S104.
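  • As an illustrative sketch of this state flag (the class and method names, and the use of threading.Event, are assumptions rather than the patent's API), the optimization thread can set an event while it runs and the tracking thread can poll it:

```python
import threading

class OptimizationThreadState:
    """State flag shared between the tracking and optimization threads (a sketch)."""

    def __init__(self):
        self._running = threading.Event()

    def mark_running(self):        # set by the optimization thread on entry
        self._running.set()

    def mark_finished(self):       # cleared by the optimization thread on exit
        self._running.clear()

    def is_running(self) -> bool:  # polled by the tracking thread in S101
        return self._running.is_set()
```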
  • S102. When the video frame is a key frame, update the second key frame data set according to the video frame.
  • in the embodiment of the present application, the tracking thread maintains a first key frame data set Φ1 and a second key frame data set Φ2; each key frame data set includes a key frame set, the frame vectors of the key frames, and the face tracking data of the key frames.
  • before the tracking thread starts face tracking on the video, it may initialize parameters such as the face identity vector, the first key frame data set Φ1 and the second key frame data set Φ2, where the current face identity vector is the face identity vector currently used to track video frames.
  • the tracking thread uses the current face identity vector to perform face tracking on a video frame to obtain face tracking data, which may include face pose data, face expression data and face key points. That is, during face tracking, given the current face identity vector α, the tracking thread obtains the following face tracking data after tracking the i-th video frame:
  • {Q_i | P_i, δ_i}, where Q_i denotes the face key points, P_i the pose data, and δ_i the expression data.
  • the optimization thread uses the face tracking data of the key frames in the first key frame data set to optimize the face identity vector, and the first key frame data set is not updated during the running process of the optimization thread.
  • the tracking thread also maintains a first principal component analysis (PCA) subspace and a second PCA subspace. A PCA subspace may be the space formed by the frame vectors of all key frames in the key frame set of a key frame data set, together with the average frame vector and the eigenvector matrix. Before the optimization thread is started, the first PCA subspace is assigned to the second PCA subspace, and whether the current video frame is a key frame can be detected according to the face tracking data of the current video frame and the second PCA subspace. If the current video frame F_i is a key frame, it is added to the second key frame data set and the second PCA subspace is updated, namely Φ2 ← Φ2 ∪ {F_i}; the updated second PCA subspace is then used to detect whether the next video frame is a key frame.
  • exemplarily, the tracking thread computes a frame vector for each video frame from the expression data δ and the rotation vector in the pose data, performs PCA on the frame vectors of all key frames in the first key frame data set while retaining 95% of the variation, and obtains an average frame vector v_0 and an eigenvector matrix M. The frame vectors of all key frames, the average frame vector v_0 and the eigenvector matrix M form the first PCA subspace. After the first PCA subspace is assigned to the second PCA subspace, the distance dis(v, M) from the frame vector v of any video frame to the second PCA subspace is computed from v_0 and M. If the distance dis(v, M) is smaller than the preset threshold ε1, the video frame is determined to be a key frame, the video frame is added to the second key frame set of the second key frame data set, the frame vector of the key frame is added to the second key frame data set, and the second PCA subspace is updated according to that frame vector. If the distance dis(v, M) is not smaller than the preset threshold ε1, the next video frame is tracked.
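  • A sketch of this key frame detection, assuming the frame vectors are stacked as rows of a matrix and that dis(v, M) is the standard PCA reconstruction residual (an assumption; the text above only states that the distance is computed from v_0 and M); the criterion dis(v, M) < ε1 follows the text above:

```python
import numpy as np

def fit_pca_subspace(frame_vectors: np.ndarray, variance_kept: float = 0.95):
    """Fit the PCA subspace of the key frames' frame vectors (rows of
    frame_vectors), retaining 95% of the variation as stated above; returns
    the average frame vector v0 and the eigenvector matrix M (as columns)."""
    v0 = frame_vectors.mean(axis=0)
    centered = frame_vectors - v0
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, variance_kept)) + 1
    return v0, vt[:k].T

def dis(v: np.ndarray, v0: np.ndarray, M: np.ndarray) -> float:
    """dis(v, M): assumed here to be the norm of the component of (v - v0)
    orthogonal to the subspace spanned by the columns of M."""
    d = v - v0
    return float(np.linalg.norm(d - M @ (M.T @ d)))

def is_key_frame(v: np.ndarray, v0: np.ndarray, M: np.ndarray, eps1: float) -> bool:
    """Per the criterion stated above: a frame whose frame vector lies within
    distance eps1 of the PCA subspace is taken as a key frame."""
    return dis(v, v0, M) < eps1
```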
  • S103. During iterative optimization, the optimization thread judges whether the second key frame set in the second key frame data set is non-empty. If it is non-empty, the tracking thread has tracked key frames while the optimization thread was iterating, and the optimization thread may send the tracking thread an instruction to clear the video frames of the second key frame data set.
  • S104. When the video frame is a key frame, update the first key frame data set according to the video frame and the second key frame data set.
  • if the optimization thread is not running, whether the video frame is a key frame can be determined based on the first PCA subspace and the face tracking data, in the same way as determining whether a video frame is a key frame based on the second PCA subspace and the face tracking data; details are not repeated here. If the video frame is a key frame, it is added to the first key frame set of the first key frame data set, its frame vector is added to the first key frame data set and to the first PCA subspace, and the average frame vector and eigenvector matrix of the first PCA subspace are updated before tracking the next video frame.
  • it is also judged whether the second key frame set in the second key frame data set is non-empty. If it is non-empty, the key frames in the second key frame set are moved into the first key frame set, and the frame vectors of those key frames are added to the first key frame data set.
  • S105. After the first key frame data set is updated, call the optimization thread, so that the optimization thread optimizes the face identity based on the first key frame data set.
  • an update of the first key frame data set indicates that a new key frame has been tracked; a non-empty second key frame set likewise means that new key frames have been tracked and that the first key frame data set will be updated. When the optimization thread is not running and the first key frame data set has been updated, the optimization thread can be called to optimize the face identity vector using the face tracking data of the key frames in the first key frame data set.
  • the face tracking method of the embodiment of the present application works as follows: while the optimization thread is running, the second key frame data set is updated according to detected key frames, and when the optimization thread's instruction to clear the video frames of the second key frame data set is received, the second key frame data set is merged into the first key frame data set and then cleared; while the optimization thread is not running, the first key frame data set is updated according to the video frame and the second key frame data set when the video frame is a key frame. In other words, the tracking thread can add key frames at any time, regardless of whether the optimization thread is running, which improves the convergence speed of the face identity vector. The optimization thread no longer has to be called once per detected key frame, which reduces the number of calls to the optimization thread. As a result, real-time face identity vector optimization achieves high real-time performance and consumes fewer resources, and more resources can be devoted to complex optimization algorithms to improve the accuracy of face identity optimization.
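  • Putting the steps of this embodiment together, a minimal sketch of the tracking thread's per-frame logic follows; all collaborator names (tracker, opt_state, optimizer, and the add/merge/clear methods) are illustrative assumptions, not the patent's API:

```python
def on_video_frame(frame, tracker, opt_state, phi1, phi2, optimizer):
    """One tracking-thread step per video frame (sketch of S101-S105)."""
    data = tracker.track(frame)          # face tracking with the current identity vector
    if opt_state.is_running():           # S101: is the optimization thread running?
        if tracker.is_key_frame(data):   # S102: buffer key frames in Phi_2
            phi2.add(frame, data)
    else:
        updated = False
        if tracker.is_key_frame(data):   # S104: key frames go straight to Phi_1 ...
            phi1.add(frame, data)
            updated = True
        if phi2.frames:                  # ... together with anything buffered in Phi_2
            phi1.merge(phi2)
            phi2.clear()
            updated = True
        if updated:                      # S105: call the optimization thread
            optimizer.start(phi1)

def on_clear_instruction(phi1, phi2):
    """S103: on the optimization thread's clear instruction, merge Phi_2 into
    Phi_1 and empty Phi_2 so newly tracked key frames join the optimization."""
    if phi2.frames:
        phi1.merge(phi2)
    phi2.clear()
```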
  • FIG. 2 is a flowchart of steps of a face tracking method provided in Embodiment 2 of the present application.
  • the embodiment of the present application is described on the basis of the foregoing Embodiment 1. As shown in FIG. 2, the face tracking method of the embodiment of the present application may include the following steps:
  • S201. Determine the current face identity vector. The tracking thread provides key frames for the optimization thread, and the optimization thread uses the key frames to optimize the face identity vector. After the optimization thread iteratively optimizes the face identity and obtains a converged face identity vector, that face identity vector is sent to the tracking thread; the tracking thread uses the received face identity vector as the current face identity vector and uses it to perform face tracking on received video frames.
  • S202. Track the video frame based on the current face identity vector, and obtain the face key points, pose data and expression data of the face in the video frame as face tracking data.
  • before the tracking thread starts face tracking on the video, it may initialize parameters such as the face identity vector; the current face identity vector is then the face identity vector currently used to track video frames. Using the current face identity vector, the tracking thread obtains face tracking data after tracking a video frame, which may include face pose data, face expression data and face key points. That is, given the current face identity vector, the tracking thread obtains the following face tracking data after tracking the i-th video frame:
  • {Q_i | P_i, δ_i}, where Q_i denotes the face key points, P_i the pose data, and δ_i the expression data.
  • S203. Judge whether the optimization thread is running. Data can be exchanged between the tracking thread and the optimization thread, so the tracking thread can determine whether the optimization thread is running. Exemplarily, the optimization thread may be set with a state flag, and the tracking thread judges whether the optimization thread is running according to that flag. If the optimization thread is running, execute S204; if the optimization thread is not running, execute S212.
  • S204. Assign the first PCA subspace to the second PCA subspace. The first PCA subspace is the space formed by the frame vectors of all key frames in the first key frame set of the first key frame data set, together with the average frame vector and the eigenvector matrix. When the optimization thread is running, the first PCA subspace, that is, the frame vectors, average frame vector and eigenvector matrix of all key frames in the first key frame set, may be assigned to the second PCA subspace.
  • S205. Determine whether the video frame is a key frame based on the second PCA subspace and the pose data and expression data of the video frame.
  • exemplarily, the frame vector of the video frame can be computed from the pose data and the expression data, and the distance between the frame vector and the second PCA subspace is calculated. When the distance is smaller than a preset threshold, the video frame is determined to be a key frame and S206 is executed; when the distance is not smaller than the preset threshold, the video frame is determined not to be a key frame and S209 is executed.
  • exemplarily, the tracking thread computes the frame vector v of each video frame from the expression data δ of the face in the video frame and the rotation vector in the pose data, and obtains an average frame vector v_0 and an eigenvector matrix M that define the second PCA subspace. After the first PCA subspace is assigned to the second PCA subspace, the distance dis(v, M) from the frame vector v of any video frame to the second PCA subspace is computed from the average frame vector v_0 and the eigenvector matrix M of the second PCA subspace.
  • S206. Add the video frame to the second key frame set of the second key frame data set. The current video frame can be added as a key frame to the second key frame set, and the face key points, pose data, expression data and frame vector of the video frame added to the second key frame data set, after which S207 and S210 may be performed; the frame vector of the video frame may be computed from its pose data and expression data and added to the second key frame data set.
  • the second key frame set of the second key frame data set may also be set with an identifier, and it may first be determined whether the identifier of the second key frame set is a preset first identifier, which indicates that the second key frame set is an empty set. Exemplarily, the preset first identifier may be 1, indicating that the second key frame set is empty; if the identifier is not 1, for example 0, the second key frame set is non-empty, and a detected key frame can be added directly to the second key frame set without changing the identifier. If the identifier of the second key frame set is the preset first identifier 1, the second key frame set is empty; after a key frame is detected and added to the second key frame set, the identifier can be set to a preset second identifier 0. Through the identifier, the optimization thread can easily know whether the second key frame set is non-empty, so that when the set is non-empty during optimization, the optimization thread can send the tracking thread a clear instruction in time to merge the first key frame set and the second key frame set, using the newly detected key frames to optimize the face identity vector.
  • when detecting key frames, a dynamic threshold is used, that is, the threshold is updated in real time; exemplarily, ε1 ← ε1 + ε0, where ε1 is the threshold and ε0 is the step size. In other words, the threshold is updated every time a key frame is detected, and then S208 is performed; a bookkeeping sketch of this step follows.
  • the frame vector of the video frame can be added to the second PCA subspace, and the average frame vector and eigenvector matrix of the second PCA subspace recalculated to update it. The updated second PCA subspace continues to be used to judge whether the next video frame is a key frame until the optimization thread is no longer running, after which the first PCA subspace is used to determine whether a video frame is a key frame.
  • after tracking a video frame, the tracking thread needs to track the next one. Before doing so, it can first determine whether a face identity vector has been received from the optimization thread. If one has been received, the face identity vector has been optimized and updated, and the process returns to S201 to use the received face identity vector as the current face identity vector. Otherwise the process returns to S202: the face identity vector has not been updated, and the current face identity vector continues to be used to track the next video frame.
  • during iterative optimization, the optimization thread judges whether the second key frame set in the second key frame data set is non-empty. If it is non-empty, the tracking thread has tracked key frames while the optimization thread was iterating, and the optimization thread can send the tracking thread an instruction to clear the second key frame set. After receiving the instruction, the tracking thread moves the tracked key frames from the second key frame set into the first key frame set and clears the second key frame set, so that the running optimization thread can use the newly tracked key frames to iteratively optimize the face identity vector. This allows key frames to be added while the optimization thread is running, avoids missing key frames, and increases the speed at which key frames accumulate; multiple key frames can be added per call, reducing the number of calls to the optimization thread.
  • the added frame vectors can also be added to the first PCA subspace, and the average frame vector and eigenvector matrix of the first PCA subspace recalculated to update it.
  • S212. Determine whether the video frame is a key frame based on the first PCA subspace. The frame vector of the video frame can be computed from the pose data and expression data of the face in the video frame, and the distance between the frame vector and the first PCA subspace calculated. If the distance is smaller than the preset threshold, the video frame is determined to be a key frame; if the distance is greater than or equal to the preset threshold, the video frame is determined not to be a key frame, and S216 is executed.
  • if the video frame is a key frame, it can be added to the first key frame set of the first key frame data set, and its frame vector, face key points, pose data and expression data added to the first key frame data set. At the same time, the key frames in the second key frame set of the second key frame data set are added to the first key frame set, and their frame vectors, face key points, pose data and expression data are added to the first key frame data set.
  • ⁇ 1 ⁇ 1 + ⁇ 0
  • ⁇ 1 is the threshold value
  • ⁇ 0 is the step size, that is, the threshold value is updated every time a key frame is detected.
  • when the frame vector of a detected key frame is added to the first key frame data set, or the frame vectors of the key frames from the second key frame data set are added, the newly added frame vectors can be added to the first PCA subspace, and the average frame vector and eigenvector matrix recalculated from the frame vectors in the first PCA subspace to update it.
  • the tracking thread determines whether the second key frame set is non-empty; if it is non-empty, S217 is executed, and if it is empty, the process returns to S209.
  • the key frames in the second key frame set of the second key frame data set can be added to the first key frame set, and the frame vectors, face key points, pose data and expression data of those key frames added to the first key frame data set; the first PCA subspace is updated according to the newly added frame vectors, and the updated first PCA subspace continues to be used to detect whether the next video frame is a key frame.
  • when the optimization thread is not running and the first key frame set has been updated, the optimization thread is called, so that it optimizes the face identity vector using the pose data and expression data of the key frames in the first key frame data set.
  • S219. Calculate the face change rate from the two most recently received face identity vectors. Let F(α) be the face mesh corresponding to the face identity vector α, and let s be the diagonal length of the minimum bounding box of the three-dimensional average face. When the received face identity vector changes from α1 to α2, the face change rate is
  • rate(α1, α2) = max_j ‖F(α2)_j − F(α1)_j‖ / s
  • that is, the maximum movement over the j vertices of the face mesh divided by the diagonal length of the minimum bounding box of the three-dimensional average face.
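  • Translated directly into code, with the mesh F(α) passed as a vertex array (an assumed representation):

```python
import numpy as np

def face_change_rate(mesh_a: np.ndarray, mesh_b: np.ndarray, s: float) -> float:
    """Face change rate between identity vectors alpha_1 and alpha_2:
    max_j ||F(alpha_2)_j - F(alpha_1)_j|| / s, where mesh_a and mesh_b are the
    vertex arrays of F(alpha_1) and F(alpha_2), shape (num_vertices, 3), and s
    is the diagonal length of the 3D average face's minimum bounding box."""
    max_move = float(np.linalg.norm(mesh_b - mesh_a, axis=1).max())
    return max_move / s
```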
  • when the face change rate is smaller than a preset threshold, the tracking thread can track video frames based on the current face identity vector without further key frame detection during tracking, that is, steps S203-S221 are no longer performed.
  • the face tracking method of the embodiment of the present application thus updates the second key frame data set according to detected key frames while the optimization thread is running, merges the second key frame data set into the first key frame data set and clears it upon receiving the optimization thread's clear instruction, and updates the first key frame data set from the video frame and the second key frame data set when the optimization thread is not running. The tracking thread can therefore add key frames at any time, regardless of whether the optimization thread is running, which improves the convergence speed of the face identity vector; the optimization thread need not be called once per detected key frame, which reduces the number of calls, so that real-time face identity vector optimization achieves high real-time performance with low resource consumption, leaving more resources for complex optimization algorithms that improve the accuracy of face identity optimization.
  • further, the threshold is updated according to the preset step size. As key frames accumulate, the threshold gradually increases and video frames become less likely to be detected as key frames, so the first key frame set in the first key frame data set is updated less frequently and the optimization thread is called less often.
  • FIG. 3 is a flowchart of steps of a face tracking method provided in Embodiment 3 of the present application.
  • the embodiment of the present application is applicable to the case where an optimization thread optimizes the face identity vector. The method may be executed by the face tracking apparatus of the embodiment of the present application, which may be implemented in hardware or software and integrated in the electronic device provided by the embodiment of the present application. As shown in FIG. 3, the face tracking method of the embodiment of the present application may include the following steps:
  • S301. After the optimization thread is called, take the current face identity vector used by the tracking thread as the initial face identity vector. After the optimization thread is called, the current face identity vector used by the tracking thread can be obtained and used as the initial face identity vector for optimization; since the optimization thread outputs the latest face identity vector to the tracking thread, the latest face identity vector serves as the initial face identity vector.
  • S302. Acquire the first key frame data set. The tracking thread maintains the first key frame data set, which includes the face tracking data of all key frames detected before the optimization thread was called; the face tracking data includes face key points, pose data and expression data.
  • S303. Optimize the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data. Exemplarily, a three-dimensional face model can be built from the initial face identity vector and the expression data, and the face key points of the three-dimensional face model obtained; the optimal pose data and expression data are then solved from the face key points of the three-dimensional face model and the face key points in the face tracking data, and the optimized pose data and expression data are obtained as the optimized face tracking data.
  • S304. Iteratively optimize the initial face identity vector based on the optimized face tracking data to obtain the optimized face identity vector. Exemplarily, the face size of the tracked face may be calculated from the face key points, and the expression weight of each key frame calculated from the expression data of that key frame; an optimization equation is then established from the face tracking data, the face size, the expression weights, the current face identity vector and the initial face identity vector, and solved iteratively to obtain the optimized face identity vector.
  • S305. After each round of iteration, judge whether the stop-iteration condition is satisfied according to the optimized face identity vector and the initial face identity vector. Exemplarily, the optimized face identity vector and the initial face identity vector can be used to calculate the face change rate; if the face change rate is smaller than a preset threshold, iteration stops and the optimized face identity vector is taken as the result of this call to the optimization thread. When the stop-iteration condition is reached, S306 is performed; otherwise S307 is performed.
  • S306. Send the optimized face identity vector to the tracking thread; the tracking thread takes the received face identity vector as the current face identity vector.
  • S307. If the stop-iteration condition is not reached and the tracking thread has detected new key frames and added them to the second key frame set of the second key frame data set, a clear instruction for the second key frame set can be sent to the tracking thread after each iteration. The tracking thread then adds the key frames in the second key frame set to the first key frame set and clears the second key frame set, so that the newly detected key frames can be used to optimize the face identity vector in the next round of optimization.
  • the optimization thread takes the optimized face identity vector obtained in this round of iteration as the initial face identity vector of the next round, and returns to S302 to continue iteratively optimizing the face identity vector.
  • in the embodiment of the present application, after the optimization thread is called, the current face identity vector used by the tracking thread is used as the initial face identity vector; the first key frame data set is obtained, and the face tracking data in it is optimized based on the initial face identity vector to obtain optimized face tracking data; the initial face identity vector is iteratively optimized based on the optimized face tracking data to obtain an optimized face identity vector; and after each round of iteration, whether the stop-iteration condition is satisfied is judged from the optimized and initial face identity vectors. If the condition is satisfied, the optimized face identity vector is sent to the tracking thread, which determines the received face identity vector as the current face identity vector; if not, a clear instruction for the video frames in the second key frame data set is sent to the tracking thread, which, after receiving it, updates the second key frame data set into the first key frame data set whenever the second key frame set is non-empty. The optimization thread can therefore use key frames newly added to the second key frame set during iterative optimization, without waiting until after each call to add key frames; a large number of key frames can be extracted quickly to optimize the face identity vector, which improves its convergence speed and reduces the number of calls, so that real-time face identity vector optimization achieves high real-time performance with low resource consumption, leaving more resources for complex optimization algorithms that improve the accuracy of face identity optimization.
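  • A sketch of the optimization thread's loop over S301-S307 follows; every collaborator (the tracking-thread interface, the two optimization helpers, mesh and the diagonal length s) is an assumed stand-in for the components described above, and face_change_rate is the function sketched earlier:

```python
def optimization_thread_loop(tracking, phi1, optimize_tracking_data,
                             optimize_identity_vector, mesh, s, eps):
    """Sketch of the optimization thread (S301-S307); all names are assumed."""
    alpha = tracking.current_identity_vector()               # S301/S302
    while True:
        tracked = optimize_tracking_data(phi1, alpha)        # S303: refine pose/expression
        alpha_new = optimize_identity_vector(tracked, alpha) # S304: one iteration round
        # S305: stop once the face change rate between rounds is below eps.
        if face_change_rate(mesh(alpha), mesh(alpha_new), s) < eps:
            tracking.receive_identity_vector(alpha_new)      # S306: hand result over
            return alpha_new
        tracking.send_clear_instruction()                    # S307: pull newly tracked
        alpha = alpha_new                                    # key frames into phi1
```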
  • FIG. 4 is a flowchart of steps of a face tracking method provided by Embodiment 4 of the present application.
  • the embodiment of the present application is described on the basis of the foregoing Embodiment 3. As shown in FIG. 4, the face tracking method of the embodiment of the present application may include the following steps:
  • the current face identity vector is ⁇ rre
  • the tracking thread uses the current face identity vector ⁇ pre to perform face tracking on the received video frame.
  • exemplarily, a three-dimensional face model can be built from the initial face identity vector and the expression data, and the face key points of the three-dimensional face model obtained; the optimal pose data and expression data are then solved from the face key points of the three-dimensional face model and the face key points in the face tracking data, and the optimized pose data and expression data are obtained as the optimized face tracking data.
  • the 3D face model F_i of the i-th video frame can be expressed as:
  • F_i = C_0 + C_exp δ_i
  • where C_0 is the expressionless neutral face of the user, C_exp is the expression shape fusion deformer (expression blendshapes) for the user, and δ_i is the facial expression data. The neutral face is in turn obtained from the average face:
  • C_0 = B + B_ID ⊗ α
  • where B is the average face, B_ID is the user's identity shape fusion deformer, B_exp is the expression shape fusion deformer designed for the average face B (B, B_ID and B_exp can be preset), α is the face identity vector, and ⊗ is the modal product.
  • in the k-th iteration, C_0^k denotes the neutral face used in the k-th iteration and C_exp^k the expression shape fusion deformer used in the k-th iteration; (C_0^k + C_exp^k δ_i)_j denotes the j-th key point obtained by projecting the three-dimensional face (C_0 + C_exp δ_i), and Q_i denotes the face key points directly extracted by the face key point extraction algorithm in the tracking thread.
  • {Q_i | P_i, δ_i}, where Q_i denotes the face key points, P_i the pose data, and δ_i the expression data.
  • exemplarily, the minimum bounding rectangle of the face can be determined from the face key points, and the face size f_i calculated from the face key points on the minimum bounding rectangle.
  • the first key frame set includes multiple key frames. After face tracking is performed on the multiple key frames to obtain the expression data of each key frame, the minimum expression data can be determined from the expression data of all key frames, and the expression weight of each key frame then calculated from a preset constant term, the minimum expression data and the key frame's expression data, where the expression weight of a key frame is negatively correlated with its expression data; one possible form is sketched below.
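  • A sketch of the face size and expression weights; the exact weight formula is not stated above, so the form below is an assumption that merely combines the stated ingredients (a preset constant term c, the minimum expression data, and negative correlation with each key frame's expression data):

```python
import numpy as np

def face_size(keypoints: np.ndarray) -> float:
    """Face size f_i: diagonal of the minimum bounding rectangle of the 2D
    face key points Q_i (shape (num_points, 2))."""
    extent = keypoints.max(axis=0) - keypoints.min(axis=0)
    return float(np.linalg.norm(extent))

def expression_weights(deltas, c: float = 1.0):
    """Per-key-frame expression weights (assumed form). Uses a preset constant
    term c and the minimum expression magnitude over all key frames, and
    decreases as a key frame's expression data grows, per the stated negative
    correlation."""
    norms = [float(np.linalg.norm(d)) for d in deltas]
    d_min = min(norms)
    return [c / (c + n - d_min) for n in norms]
```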
  • an optimization equation is then established, in which I is the first key frame set, λ1, λ2 and λ3 are parameters that can be preset, and α_pre is the current face identity vector used by the tracking thread; the equation combines a face key point data term scaled by the face size f_i and the expression weights with a λ2-weighted smoothing term on the face identity vector and a λ3-weighted term used in the optimization process. A hedged sketch of one possible form follows.
  • the face identity vector obtained by the current iteration and the face identity vector of the previous iteration are used to calculate the face change rate. Let F(α) be the face mesh corresponding to the face identity vector α, and s the diagonal length of the minimum bounding box of the three-dimensional average face; the face change rate is the maximum movement over the j vertices of the face mesh divided by s, that is, rate = max_j ‖F(α_k)_j − F(α_{k−1})_j‖ / s.
  • after receiving the optimized face identity vector α_k, the tracking thread determines the received face identity vector as the current face identity vector for tracking video frames. The optimization thread also calculates a new user neutral face and user expression shape fusion deformer from the optimized face identity vector α_k, and the tracking thread tracks video frames using the optimized face identity vector α_k, the new user neutral face and the new user expression shape fusion deformer.
  • the frame vector of each key frame is also updated according to the optimized expression data and pose data of the key frames in the first key frame data set, and the first PCA subspace is updated according to those frame vectors, so that the first PCA subspace is more accurate and key frames can be detected more accurately from it.
  • after the tracking thread receives the clear instruction, when the second key frame set is non-empty, it updates the second key frame set into the first key frame set and clears the second key frame set; the optimization thread takes the optimized face identity vector as the initial face identity vector and continues to iteratively solve for the optimized face identity vector. The optimization thread thus uses the key frames newly added to the second key frame set during iterative optimization, without having to wait until after each call of the optimization thread to add key frames for optimizing the face identity vector.
  • the face size f_i is introduced into the optimization equation, which adapts the optimization to different face sizes. The λ2-weighted smoothing term on the face identity vector is also introduced: although it reduces the overall convergence speed of the face identity vector, in special application scenarios such as face swapping it prevents sudden face jitter after the tracking thread updates the face identity vector. The λ3-weighted term is applied during the optimization process.
  • FIG. 5 is a structural block diagram of a face tracking apparatus according to Embodiment 5 of the present application. As shown in FIG. 5, the face tracking apparatus of the embodiment of the present application is applied to a tracking thread, the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking apparatus includes:
  • the optimization thread operation judgment module 501 is configured to judge whether the optimization thread is running during the face tracking process on the video frame;
  • the second key frame data set update module 502 is configured to, in response to the optimization thread running, update the second key frame data set according to the video frame when the video frame is a key frame;
  • the clearing module 503 is configured to, upon receiving the clear instruction sent by the optimization thread, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set;
  • the first key frame data set update module 504 is configured to, in response to the optimization thread not running, update the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame;
  • the optimization thread calling module 505 is configured to call the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes the face identity based on the first key frame data set.
  • the face tracking device provided by the embodiment of the present application can execute the face tracking method provided by the first embodiment and the second embodiment of the present application, and has functional modules and effects corresponding to the execution method.
  • FIG. 6 is a structural block diagram of a face tracking apparatus provided in Embodiment 6 of the present application. As shown in FIG. 6 , the face tracking apparatus of the embodiment of the present application is applied to an optimization thread, and may include the following modules:
  • the face identity vector initialization module 601 is configured to take the current face identity vector used by the tracking thread as the initial face identity vector after the optimization thread is called;
  • the first key frame data set acquisition module 602 is configured to acquire the first key frame data set, which is the data set updated after the tracking thread performs face tracking and which includes face tracking data;
  • the face tracking data optimization module 603 is configured to optimize the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data;
  • the face identity vector optimization module 604 is configured to iteratively optimize the initial face identity vector based on the optimized face tracking data to obtain the optimized face identity vector;
  • the stop iteration judgment module 605 is configured to judge, after each round of iteration, whether the stop-iteration condition is satisfied according to the optimized face identity vector and the initial face identity vector;
  • the stop iteration module 606 is configured to send the optimized face identity vector to the tracking thread in response to the stop-iteration condition being satisfied, the tracking thread determining the received face identity vector as the current face identity vector when it receives the optimized face identity vector;
  • the clearing instruction sending module 607 is configured to send, in response to the stop-iteration condition not being satisfied, a clear instruction to the tracking thread to clear the video frames in the second key frame data set, the tracking thread, after receiving the clear instruction, updating the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is non-empty; and an initial face identity vector update module is configured to take the optimized face identity vector as the initial face identity vector and return to the face tracking data optimization module.
  • the face tracking device provided by the embodiments of the present application can execute the face tracking methods provided by the third and fourth embodiments of the present application, and has functional modules and effects corresponding to the execution methods.
  • the electronic device may include: a processor 701 , a storage device 702 , a display screen 703 with a touch function, an input device 704 , an output device 705 and a communication device 706 .
  • the number of processors 701 in the electronic device may be one or more, and one processor 701 is taken as an example in FIG. 7 .
  • the processor 701 , the storage device 702 , the display screen 703 , the input device 704 , the output device 705 and the communication device 706 of the electronic device may be connected through a bus or other means. In FIG. 7 , the connection through a bus is taken as an example.
  • the electronic device is configured to execute the face tracking method provided by any embodiment of the present application.
  • Embodiments of the present application further provide a computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a device, enable the device to execute the face tracking method described in the foregoing method embodiments.
  • the computer-readable storage medium is a non-transitory storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed herein are a face tracking method and apparatus, an electronic device and a storage medium. The face tracking method includes: judging, during face tracking of a video frame, whether an optimization thread is running; if the optimization thread is running, updating a second key frame data set according to the video frame when the video frame is a key frame, and, upon receiving a clear instruction sent by the optimization thread to clear the video frames in the second key frame data set, clearing the video frames in the second key frame data set and updating the second key frame data set into a first key frame data set; if the optimization thread is not running, updating the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame; and calling the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes the face identity based on the first key frame data set.

Description

Face tracking method and apparatus, electronic device and storage medium
This application claims priority to Chinese patent application No. 202110007729.1, filed with the Chinese Patent Office on January 5, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of video processing, and for example to a face tracking method and apparatus, an electronic device and a storage medium.
Background
Video face tracking can be used to achieve visual enhancement effects, such as trying hats or glasses on a face in a video, or adding beards, tattoos and the like; it can also be used to drive the facial expressions of a virtual avatar.
The 2D face in a video is the 2D projection of a 3D face obtained by combining the user's face identity with the user's expression. In real-time face tracking, the user's face identity and expression must be computed accurately from the 2D key points on the face in the video, so the face identity needs to be reconstructed quickly for face tracking.
In real-time face identity reconstruction, key frames are extracted from the user's real-time video to reconstruct the face identity, which is very convenient for the user; as the video data grows, more key frames can be used to optimize the face identity and reduce the error. However, during face identity reconstruction in the related art, the tracking thread cannot add key frames while the face identity optimization thread is running and can only add a single key frame after the optimization thread finishes, so the tracking thread cannot add a detected key frame to optimize the face identity, and the face identity optimization thread is called once for every added key frame. On one hand, the number of key frames grows slowly and key frames are missed, so face identity optimization converges slowly and the face identity is inaccurate; on the other hand, the number of calls to the face identity optimization thread increases. Both factors make real-time face identity reconstruction poorly real-time, so that it ultimately cannot be applied to real-time face tracking.
Summary
The present application provides a face tracking method and apparatus, an electronic device, and a storage medium, to solve the problems in the related art that the face identity vector converges slowly during face identity reconstruction and the face identity optimization thread is invoked many times.
The present application provides a face tracking method, applied to a tracking thread, where the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking method includes:
judging, during face tracking of a video frame, whether an optimization thread is running;
in response to the optimization thread running, updating the second key frame data set according to the video frame when the video frame is a key frame;
upon receiving a clearing instruction sent by the optimization thread to clear the video frames in the second key frame data set, clearing the video frames in the second key frame data set and updating the second key frame data set into the first key frame data set;
in response to the optimization thread not running, updating the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame;
after the first key frame data set is updated, invoking the optimization thread, so that the optimization thread optimizes a face identity based on the first key frame data set.
The present application provides a face tracking method, applied to an optimization thread, including:
after the optimization thread is invoked, taking the current face identity vector used by a tracking thread as an initial face identity vector;
obtaining a first key frame data set, where the first key frame data set is a data set updated after the tracking thread performs face tracking, and the first key frame data set includes face tracking data;
optimizing the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data;
iteratively optimizing the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector;
after each round of iteration, judging, according to the optimized face identity vector and the initial face identity vector, whether a stop-iteration condition is met;
in response to the stop-iteration condition being met, sending the optimized face identity vector to the tracking thread, where the tracking thread, upon receiving the optimized face identity vector, determines the received face identity vector as the current face identity vector;
in response to the stop-iteration condition not being met, sending to the tracking thread a clearing instruction to clear the video frames in a second key frame data set, where the tracking thread, after receiving the clearing instruction, updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is a non-empty set;
taking the optimized face identity vector as the initial face identity vector, and returning to the step of optimizing the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data.
The present application provides a face tracking apparatus, applied to a tracking thread, where the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking apparatus includes:
an optimization thread running judgment module, configured to judge, during face tracking of a video frame, whether an optimization thread is running;
a second key frame data set update module, configured to, in response to the optimization thread running, update the second key frame data set according to the video frame when the video frame is a key frame;
a clearing module, configured to, upon receiving a clearing instruction sent by the optimization thread to clear the video frames in the second key frame data set, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set;
a first key frame data set update module, configured to, in response to the optimization thread not running, update the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame;
an optimization thread invocation module, configured to invoke the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes a face identity based on the first key frame data set.
The embodiments of the present application provide a face tracking apparatus, applied to an optimization thread, including:
a face identity vector initialization module, configured to take, after the optimization thread is invoked, the current face identity vector used by a tracking thread as an initial face identity vector;
a first key frame data set obtaining module, configured to obtain a first key frame data set, where the first key frame data set is a data set updated after the tracking thread performs face tracking, and the first key frame data set includes face tracking data;
a face tracking data optimization module, configured to optimize the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data;
a face identity vector optimization module, configured to iteratively optimize the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector;
a stop-iteration judgment module, configured to judge, after each round of iteration, whether a stop-iteration condition is met according to the optimized face identity vector and the initial face identity vector;
a stop-iteration module, configured to, in response to the stop-iteration condition being met, send the optimized face identity vector to the tracking thread, where the tracking thread, upon receiving the optimized face identity vector, determines the received face identity vector as the current face identity vector;
a clearing instruction sending module, configured to, in response to the stop-iteration condition not being met, send to the tracking thread a clearing instruction to clear the video frames in a second key frame data set, where the tracking thread, after receiving the clearing instruction, updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is a non-empty set;
an initial face identity vector update module, configured to take the optimized face identity vector as the initial face identity vector and return to the face tracking data optimization module.
The present application provides an electronic device, including:
one or more processors; and
a storage device, configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the above face tracking method.
The present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above face tracking method.
Brief Description of the Drawings
FIG. 1 is a flowchart of the steps of a face tracking method provided in Embodiment One of the present application;
FIG. 2 is a flowchart of the steps of a face tracking method provided in Embodiment Two of the present application;
FIG. 3 is a flowchart of the steps of a face tracking method provided in Embodiment Three of the present application;
FIG. 4 is a flowchart of the steps of a face tracking method provided in Embodiment Four of the present application;
FIG. 5 is a structural block diagram of a face tracking apparatus provided in Embodiment Five of the present application;
FIG. 6 is a structural block diagram of a face tracking apparatus provided in Embodiment Six of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided in Embodiment Seven of the present application.
Detailed Description
The present application is described below with reference to the drawings and embodiments. The specific embodiments described here are merely used to explain the present application. For ease of description, only the parts related to the present application are shown in the drawings.
Embodiment One
FIG. 1 is a flowchart of the steps of a face tracking method provided in Embodiment One of the present application. This embodiment is applicable to the case where a tracking thread tracks a face and extracts key frames during face tracking. The method may be executed by the face tracking apparatus of the embodiments of the present application, which may be implemented by hardware or software and integrated in the electronic device provided by the embodiments of the present application. As shown in FIG. 1, the face tracking method of this embodiment may include the following steps:
S101. During face tracking of a video frame, judge whether an optimization thread is running.
In this embodiment, face tracking of video frames may be performed by a tracking thread and an optimization thread. The tracking thread performs face tracking on the video frames of a video to obtain face tracking data such as face pose data, expression data and face key points, and also detects whether a video frame is a key frame. The optimization thread may be a thread that optimizes the face identity vector based on the key frames detected by the tracking thread; the optimization thread provides the optimized face identity vector to the tracking thread, and the tracking thread performs face tracking on video frames and detects key frames based on the optimized face identity vector.
Data can be exchanged between the tracking thread and the optimization thread, so the tracking thread can judge whether the optimization thread is running. For example, the optimization thread may be provided with a status flag, and the tracking thread judges from this flag whether the optimization thread is running; if it is running, S102 is executed, otherwise S104 is executed.
S102. When the video frame is a key frame, update a second key frame data set according to the video frame.
In this embodiment, the tracking thread maintains a first key frame data set Φ₁ and a second key frame data set Φ₂; each key frame data set includes a key frame set, the frame vectors of the key frames, and the face tracking data of the key frames.
Before the tracking thread starts face tracking on a video, parameters such as the face identity vector, the first key frame data set Φ₁ and the second key frame data set Φ₂ may be initialized. The current face identity vector may be the face identity vector currently used to track video frames; after tracking a video frame with the current face identity vector, the tracking thread obtains face tracking data, which may be face pose data, face expression data and face key points. That is, during face tracking, with the current face identity vector α given, the tracking thread obtains the following face tracking data after tracking the i-th video frame:
$\{Q_i \mid P_i, \delta_i\}$
where $Q_i$ denotes the face key points, $P_i$ the pose data, and $\delta_i$ the expression data.
In this embodiment, the optimization thread optimizes the face identity vector using the face tracking data of the key frames in the first key frame data set, and the first key frame data set is not updated while the optimization thread is running.
In an optional embodiment of the present application, the tracking thread further maintains a first principal component analysis (PCA) subspace and a second PCA subspace. A PCA subspace may be the space formed by the frame vectors of all key frames in the key frame set of a key frame data set, the average frame vector and the eigenvector matrix; the first PCA subspace is then the space formed by the frame vectors of all key frames in the first key frame set of the first key frame data set, the average frame vector and the eigenvector matrix. Before the optimization thread is started, the first PCA subspace is assigned to the second PCA subspace, and whether the current video frame is a key frame can be detected according to the face tracking data of the current video frame and the second PCA subspace. If the current video frame F_i is a key frame, F_i is added as a key frame to the second key frame set of the second key frame data set, its frame vector is added to the second key frame data set, and the second PCA subspace is updated, i.e. Φ₂ = Φ₂ ∪ {F_i}; the updated second PCA subspace is then used to continue detecting whether the next video frame is a key frame.
For example, during face tracking the tracking thread computes the frame vector of each video frame from the expression data δ and the rotation vector in the pose data. A PCA analysis is performed on the frame vectors of all key frames in the first key frame data set, retaining 95% of the variance, which yields an average frame vector v₀ and an eigenvector matrix M; the frame vectors of all key frames, the average frame vector v₀ and the eigenvector matrix M may be taken as the first PCA subspace. After the first PCA subspace is assigned to the second PCA subspace, the distance from the frame vector v of any video frame to the second PCA subspace is:
$\operatorname{dis}(v, M) = \lVert v - (v_0 + M M^{T}(v - v_0)) \rVert$
If the distance dis(v, M) is smaller than a preset threshold ε₁, the video frame is determined to be a key frame: the video frame is added to the second key frame set of the second key frame data set, its frame vector is added to the second key frame data set, and the second PCA subspace is updated according to this frame vector. If dis(v, M) is not smaller than the preset threshold ε₁, the next video frame is tracked.
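For illustration, a minimal Python sketch of the subspace-distance test above. The 95%-variance PCA fit follows the description; the helper names and array shapes are assumptions, not part of the patented implementation.

```python
import numpy as np

def fit_pca_subspace(frame_vectors, var_keep=0.95):
    """Fit a PCA subspace (average frame vector v0, eigenvector matrix M)
    retaining var_keep of the variance of the key-frame frame vectors."""
    V = np.asarray(frame_vectors)                  # shape (n_frames, d)
    v0 = V.mean(axis=0)
    U, S, Vt = np.linalg.svd(V - v0, full_matrices=False)
    ratios = (S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(ratios), var_keep)) + 1
    M = Vt[:k].T                                   # shape (d, k), columns = eigenvectors
    return v0, M

def subspace_distance(v, v0, M):
    """dis(v, M) = ||v - (v0 + M M^T (v - v0))||."""
    return np.linalg.norm(v - (v0 + M @ M.T @ (v - v0)))

def is_key_frame(v, v0, M, eps1):
    """Key-frame test as stated in the text: distance below the threshold."""
    return subspace_distance(v, v0, M) < eps1
```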
In practical applications, judging whether a video frame is a key frame is not limited to the PCA subspace; those skilled in the art may judge whether a video frame is a key frame by other methods, which is not limited by the embodiments of the present application.
S103. Upon receiving an instruction sent by the optimization thread to clear the video frames in the second key frame data set, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set.
If the optimization thread is iteratively optimizing the face identity vector, after each round of iterative optimization the optimization thread judges whether the second key frame set in the second key frame data set is a non-empty set. If it is non-empty, the tracking thread has tracked key frames while the optimization thread was iterating, and the optimization thread may send to the tracking thread an instruction to clear the video frames of the second key frame data set. After receiving the clearing instruction, the tracking thread updates the key frames in the second key frame set of the second key frame data set into the first key frame set of the first key frame data set, clears the second key frame set, and also adds the frame vectors of the key frames in the second key frame set to the first key frame set, so that the running optimization thread can use the face tracking data of the newly tracked key frames to iteratively optimize the face identity vector. On the one hand, this makes it possible to add key frames while the optimization thread is running, avoids missing key frames that cannot be added when tracked, and increases the speed at which key frames are added; on the other hand, multiple key frames can be added while the optimization thread is running, reducing the number of invocations of the optimization thread.
S104. When the video frame is a key frame, update the first key frame data set according to the video frame and the second key frame data set.
If the optimization thread is not running, whether the video frame is a key frame may be judged based on the first PCA subspace and the face tracking data; reference may be made to judging whether a video frame is a key frame based on the second PCA subspace and the face tracking data, which is not detailed here again.
If the video frame is determined to be a key frame, it is added as a key frame to the first key frame set of the first key frame data set, its frame vector is added to the first key frame data set and to the first PCA subspace, the average frame vector and eigenvector matrix of the first PCA subspace are updated, and the next video frame is tracked. If the video frame is not a key frame, whether the second key frame set in the second key frame data set is a non-empty set is judged; if it is non-empty, the key frames in the second key frame set are updated into the first key frame set, and the frame vectors of the key frames in the second key frame set are added to the first key frame data set.
S105. After the first key frame data set is updated, invoke the optimization thread, so that the optimization thread optimizes the face identity based on the updated first key frame data set.
An update of the first key frame data set indicates that a new key frame has been tracked; a non-empty second key frame set in the second key frame data set likewise indicates that a new key frame has been tracked, and likewise leads to an update of the first key frame data set. When the optimization thread is not running, if the first key frame data set is updated, the optimization thread may be invoked to optimize the face identity vector using the face tracking data of the key frames in the first key frame data set.
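A Python-style sketch of one per-frame pass of the tracking thread over S101-S105. The thread-state fields and the helpers `track_face`, `frame_vector` and `invoke_optimizer` are hypothetical names for illustration; `is_key_frame` and `fit_pca_subspace` reuse the previous sketch.

```python
def on_video_frame(frame, state):
    """One tracking-thread pass over a video frame (sketch of S101-S105)."""
    data = track_face(frame, state.current_identity)   # key points, pose, expression
    v = frame_vector(data.pose_rotvec, data.expression)

    if state.optimizer_running:                        # S101 -> S102
        if is_key_frame(v, state.v0_2, state.M_2, state.eps1):
            state.set2.add(frame, v, data)             # grow 2nd key frame data set
            state.eps1 += state.eps0                   # update threshold by the step
            state.v0_2, state.M_2 = fit_pca_subspace(state.set2.frame_vectors())
        if state.clear_instruction_received:           # S103: merge, then clear set 2
            state.set1.merge(state.set2)
            state.set2.clear_frames()
            state.v0_1, state.M_1 = fit_pca_subspace(state.set1.frame_vectors())
    else:                                              # S101 -> S104
        updated = False
        if is_key_frame(v, state.v0_1, state.M_1, state.eps1):
            state.set1.add(frame, v, data)
            state.eps1 += state.eps0
            updated = True
        if not state.set2.empty():                     # fold in frames tracked earlier
            state.set1.merge(state.set2)
            state.set2.clear_frames()
            updated = True
        if updated:                                    # S105: refresh subspace, invoke
            state.v0_1, state.M_1 = fit_pca_subspace(state.set1.frame_vectors())
            invoke_optimizer(state.set1)
```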
With the face tracking method of this embodiment, the second key frame data set is first updated according to detected key frames while the optimization thread is running; upon receiving the optimization thread's instruction to clear the video frames of the second key frame data set, the second key frame data set is updated into the first key frame data set and the video frames in the second key frame data set are cleared; and when the optimization thread is not running, if the video frame is a key frame, the first key frame data set is updated according to the video frame and the second key frame data set. In other words, the tracking thread can add key frames at any time, regardless of whether the optimization thread is running. On the one hand, a large number of key frames can be extracted quickly to optimize the face identity vector, which speeds up the convergence of the face identity vector; on the other hand, the optimization thread need not be invoked once per detected key frame, which reduces its number of invocations. As a result, real-time face identity vector optimization achieves high real-time performance and consumes fewer resources, so more resources can be used for complex optimization algorithms to improve the accuracy of face identity optimization.
Embodiment Two
FIG. 2 is a flowchart of the steps of a face tracking method provided in Embodiment Two of the present application. This embodiment is described on the basis of Embodiment One. As shown in FIG. 2, the face tracking method of this embodiment may include the following steps:
S201. When a face identity vector is received from the optimization thread, determine the received face identity vector as the current face identity vector.
In this embodiment, the tracking thread provides key frames to the optimization thread, and the optimization thread optimizes the face identity vector using those key frames. When the optimization thread obtains a converged face identity vector through iterative optimization, it sends that face identity vector to the tracking thread; the tracking thread takes the received face identity vector as the current face identity vector and performs face tracking on received video frames with it.
S202. Track the video frame based on the current face identity vector, and obtain the face key points, pose data and expression data of the face in the video frame as face tracking data.
Before starting face tracking on a video, the tracking thread may initialize parameters such as the face identity vector; the current face identity vector is then the face identity vector currently used to track video frames. After tracking a video frame with the current face identity vector, the tracking thread obtains face tracking data, which may be face pose data, face expression data and face key points. That is, during face tracking, with the current identity vector α given, the tracking thread obtains the following face tracking data after tracking the i-th video frame:
$\{Q_i \mid P_i, \delta_i\}$
where $Q_i$ denotes the face key points, $P_i$ the pose data, and $\delta_i$ the expression data.
S203. Judge whether the optimization thread is running.
Data can be exchanged between the tracking thread and the optimization thread, so the tracking thread can learn whether the optimization thread is running. For example, the optimization thread may be provided with a status flag, and the tracking thread judges from this flag whether the optimization thread is running; if it is running, S204 is executed, otherwise S212 is executed.
S204. Before invoking the optimization thread, assign the first PCA subspace to the second PCA subspace, where the first PCA subspace is the space formed by the frame vectors of all key frames in the first key frame set of the first key frame data set, the average frame vector and the eigenvector matrix.
In this embodiment, before each run of the optimization thread, the first PCA subspace may be assigned to the second PCA subspace, the first PCA subspace consisting of the frame vectors of all key frames in the first key frame set of the first key frame data set, the average frame vector and the eigenvector matrix.
S205. Judge, based on the second PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame.
Optionally, the frame vector of the video frame may be computed from the pose data and the expression data, and the distance between the frame vector and the second PCA subspace is computed. When the distance is smaller than a preset threshold, the video frame is determined to be a key frame and S206 is executed; when the distance is not smaller than the preset threshold, the video frame is determined not to be a key frame and S209 is executed.
For example, during face tracking the tracking thread computes the frame vector v of each video frame from the expression data δ of the face in the video frame and the rotation vector in the pose data, obtaining an average frame vector v₀ and an eigenvector matrix M, which constitute the first PCA subspace. After the first PCA subspace is assigned to the second PCA subspace, the distance from the frame vector v of any video frame to the second PCA subspace is:
$\operatorname{dis}(v, M) = \lVert v - (v_0 + M M^{T}(v - v_0)) \rVert$
where v₀ is the average frame vector in the second PCA subspace and M is the eigenvector matrix in the second PCA subspace. If the distance dis(v, M) is smaller than the preset threshold ε₁, the video frame is determined to be a key frame; if dis(v, M) is not smaller than the preset threshold ε₁, the video frame is not a key frame.
S206. Update the second key frame data set according to the video frame.
The current video frame may be added as a key frame to the second key frame set of the second key frame data set, and the face key points, pose data, expression data and frame vector of the video frame may be added to the second key frame data set; S207 and S210 may then be executed. The frame vector of the video frame may be computed from its pose data and expression data and added to the second key frame data set.
In another optional embodiment, the second key frame set of the second key frame data set may further be provided with a flag. Whether the flag of the second key frame set is a preset first flag may be judged first, where the preset first flag indicates that the second key frame set is an empty set. For example, the preset first flag may be 1, indicating that the second key frame set is empty; if the flag is not 1, e.g. 0, the second key frame set is non-empty, and the detected key frame can be added to the second key frame set directly without changing the flag. When the flag of the second key frame set is the preset first flag 1, the second key frame set is empty; after a key frame is detected and added to the second key frame set, the flag may be set to a preset second flag 0. By providing the flag of the second key frame set, the optimization thread can conveniently learn whether the second key frame set is non-empty, so that during optimization, when the second key frame set is non-empty, it can promptly send a clearing instruction to the tracking thread to merge the first and second key frame sets and use the newly detected key frames to optimize the face identity vector.
S207. Update the threshold according to a preset step.
This embodiment uses a dynamic threshold, i.e. the threshold is updated in real time, for example:
$\varepsilon_1 = \varepsilon_1 + \varepsilon_0$
where ε₁ is the threshold and ε₀ is the step, i.e. the threshold is updated each time a key frame is detected; S208 is then executed.
S208. Update the second PCA subspace according to the frame vectors in the second key frame data set.
After the frame vector, face key points, pose data and expression data of the video frame are added to the second key frame data set, the frame vector of the video frame may be added to the second PCA subspace, and the average frame vector and eigenvector matrix of the second PCA subspace are recomputed to update the second PCA subspace. The updated second PCA subspace is then used to continue judging whether the next video frame is a key frame, until the optimization thread is no longer running, after which whether a video frame is a key frame is judged based on the first PCA subspace.
S209. Judge whether a face identity vector is received from the optimization thread.
After tracking one video frame, the tracking thread needs to track the next one. Before tracking the next video frame, it may first judge whether a face identity vector has been received from the optimization thread; if so, the face identity vector has been optimized and updated, and the method returns to S201 to take the received face identity vector as the current face identity vector.
If no face identity vector has been received from the face optimization thread, the face identity vector has not been updated, and the method returns to S202 to continue tracking the next video frame with the current face identity vector.
S210. Upon receiving the instruction sent by the optimization thread to clear the video frames of the second key frame data set, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set.
If the optimization thread is iteratively optimizing the face identity vector, after each round of iterative optimization it judges whether the second key frame set in the second key frame data set is a non-empty set. If it is non-empty, the tracking thread has tracked key frames during the iteration, and the optimization thread may send to the tracking thread an instruction to clear the second key frame set. After receiving this instruction, the tracking thread updates the tracked key frames in the second key frame set into the first key frame set and clears the second key frame set, so that the running optimization thread can use the newly tracked key frames to iteratively optimize the face identity vector. On the one hand, key frames can be added while the optimization thread is running, avoiding missed key frames and increasing the speed at which key frames are added; on the other hand, multiple key frames can be added during a single run of the optimization thread, reducing its number of invocations.
S211. Update the first PCA subspace according to the frame vectors in the first key frame data set.
After the frame vectors in the second key frame data set are added to the first key frame data set, the added frame vectors may be added to the first PCA subspace, and its average frame vector and eigenvector matrix are recomputed to update the first PCA subspace.
S212. Judge, based on the first PCA subspace and the face tracking data, whether the video frame is a key frame.
If the optimization thread is not running, the frame vector of the video frame may be computed from the pose data and expression data of the face in the video frame, and the distance between this frame vector and the first PCA subspace is computed. When the distance is smaller than the preset threshold, the video frame is determined to be a key frame and S213 is executed; when the distance is greater than or equal to the preset threshold, the video frame is determined not to be a key frame and S216 is executed.
For judging whether a video frame is a key frame based on the first PCA subspace and the face tracking data, reference may be made to S205, which is not detailed here again.
S213. Update the first key frame data set according to the video frame and the second key frame data set.
If the optimization thread is not running and the video frame is detected to be a key frame, the video frame may be added to the first key frame set of the first key frame data set, and its frame vector, face key points, pose data and expression data are added to the first key frame data set; at the same time, the key frames in the second key frame set of the second key frame data set are added to the first key frame set, and the frame vectors, face key points, pose data and expression data of the key frames in the second key frame data set are added to the first key frame data set.
S214. Update the threshold according to the preset step.
For example:
$\varepsilon_1 = \varepsilon_1 + \varepsilon_0$
where ε₁ is the threshold and ε₀ is the step, i.e. the threshold is updated each time a key frame is detected.
S215. Update the first PCA subspace according to the frame vectors in the first key frame data set.
After the frame vector of a detected key frame, or the frame vectors of the key frames in the second key frame data set, have been added to the first key frame data set, the newly added frame vectors may be added to the first PCA subspace, and the average frame vector and eigenvector matrix are computed from the frame vectors in the first PCA subspace to update it.
S216. Judge whether the second key frame set is a non-empty set.
If the tracking thread detects that the current video frame is not a key frame, it judges whether the second key frame set is a non-empty set; if it is non-empty, S217 is executed; if it is empty, the method returns to S209.
S217. Update the second key frame data set into the first key frame data set, and update the first PCA subspace.
The key frames in the second key frame set of the second key frame data set may be added to the first key frame set, the frame vectors, face key points, pose data and expression data of the key frames in the second key frame data set are added to the first key frame data set, the first PCA subspace is updated according to the newly added frame vectors, and the updated first PCA subspace is used to continue detecting whether the next video frame is a key frame.
S218. After the first key frame data set is updated, invoke the optimization thread.
In this embodiment, when the optimization thread is not running, the optimization thread is invoked whenever the first key frame set is updated, so that the optimization thread uses the pose data and expression data of the key frames in the first key frame data set to optimize the face identity vector.
S219. Compute a face change rate using two consecutive face identity vectors received from the optimization thread.
Each time the optimization thread is invoked, it sends the optimized face identity vector to the tracking thread. Let F(α) be the face mesh corresponding to the face identity vector α, and let s be the diagonal length of the minimum bounding rectangle of the three-dimensional average face. When two consecutive face identity vectors change from α₁ to α₂, the face change rate is computed as:
$\dfrac{\max_j \lVert F(\alpha_2)_j - F(\alpha_1)_j \rVert}{s}$
That is, the maximum displacement among the j vertices of the face mesh, divided by the diagonal length of the minimum bounding rectangle of the three-dimensional average face, is the face change rate.
S220. Judge whether the face change rate is smaller than a preset change rate threshold.
That is, judge whether the following holds:
$\dfrac{\max_j \lVert F(\alpha_2)_j - F(\alpha_1)_j \rVert}{s} < \varepsilon_2$
where ε₂ denotes the preset change rate threshold. If the above inequality holds, S221 is executed; if it does not hold, the method returns to S202 to continue extracting key frames during face tracking, so that the optimization thread uses more key frames to optimize the face identity vector.
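A small Python sketch of the convergence test above, assuming a `face_mesh` function that evaluates the face mesh F(α) as an array of vertices, and a precomputed diagonal s of the average face's minimum bounding rectangle.

```python
import numpy as np

def face_change_rate(mesh_prev, mesh_curr, s):
    """Maximum per-vertex displacement between F(a1) and F(a2),
    normalized by the diagonal s of the average face's bounding rectangle."""
    return np.max(np.linalg.norm(mesh_curr - mesh_prev, axis=1)) / s

def identity_converged(alpha1, alpha2, s, eps2, face_mesh):
    """True when the face change rate falls below the change-rate threshold."""
    return face_change_rate(face_mesh(alpha1), face_mesh(alpha2), s) < eps2
```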
S221. Stop judging whether video frames are key frames while tracking video frames with the current face identity vector, and no longer invoke the optimization thread.
When the face change rate is smaller than the preset change rate threshold, the face identity vector has converged; the tracking thread can track video frames based on the current face identity vector and no longer detects key frames during tracking, i.e. steps S203-S221 are no longer executed.
With the face tracking method of this embodiment, the second key frame data set is first updated according to detected key frames while the optimization thread is running; upon receiving the optimization thread's instruction to clear the video frames of the second key frame data set, the second key frame data set is updated into the first key frame data set and the video frames in it are cleared; and when the optimization thread is not running, if the video frame is a key frame, the first key frame data set is updated according to the video frame and the second key frame data set. In other words, the tracking thread can add key frames at any time, regardless of whether the optimization thread is running. On the one hand, a large number of key frames can be extracted quickly to optimize the face identity vector, speeding up its convergence; on the other hand, the optimization thread need not be invoked once per detected key frame, reducing its number of invocations. As a result, real-time face identity vector optimization achieves high real-time performance and consumes fewer resources, so more resources can be used for complex optimization algorithms to improve the accuracy of face identity optimization.
Upon receiving the optimization thread's instruction to clear the second key frame set, the second key frame data set is updated into the first key frame data set and the second key frame set in the second key frame data set is cleared, so that the optimization thread can use newly detected key frames to optimize the face identity vector during iteration, instead of waiting until the optimization thread finishes and is invoked again; this reduces the number of invocations of the optimization thread.
Each time a new key frame is detected, the threshold is updated by the preset step. As key frames accumulate, the threshold gradually grows, video frames become less likely to be detected as key frames, the first key frame set of the first key frame data set is updated less frequently, and the optimization thread is invoked fewer times.
Embodiment Three
FIG. 3 is a flowchart of the steps of a face tracking method provided in Embodiment Three of the present application. This embodiment is applicable to the case where an optimization thread optimizes the face identity vector. The method may be executed by the face tracking apparatus of the embodiments of the present application, which may be implemented by hardware or software and integrated in the electronic device provided by the embodiments of the present application. As shown in FIG. 3, the face tracking method of this embodiment may include the following steps:
S301. After the optimization thread is invoked, take the current face identity vector used by the tracking thread as the initial face identity vector.
After being invoked, the optimization thread may obtain the current face identity vector used by the tracking thread and take it as the initial face identity vector for optimization; in one example, the optimization thread may take the face identity vector most recently output to the tracking thread as the initial face identity vector.
S302. Obtain the first key frame data set, where the first key frame data set is the data set updated after the tracking thread performs face tracking, and the first key frame data set includes face tracking data.
The tracking thread maintains the first key frame data set, which includes the face tracking data of all key frames detected before the optimization thread was invoked; the face tracking data includes face key points, pose data and expression data.
S303. Optimize the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data.
In an optional embodiment of the present application, a three-dimensional face model may be built based on the initial face identity vector and the expression data; the face key points of the three-dimensional face model are obtained, and the optimal pose data and expression data are solved according to the face key points of the three-dimensional face model and the face key points in the face tracking data, yielding optimized pose data and expression data as the optimized face tracking data.
S304. Iteratively optimize the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector.
In one example, the face size of the tracked face may first be computed from the face key points, and the expression weight of each key frame is computed from its expression data; an optimization equation is then built from the face tracking data, the face size, the expression weights, the current face identity vector and the initial face identity vector, and the equation is solved iteratively to obtain the optimized face identity vector.
S305. After each round of iteration, judge, according to the optimized face identity vector and the initial face identity vector, whether a stop-iteration condition is met.
The face change rate may be computed using the optimized face identity vector and the initial face identity vector; if the face change rate is smaller than a preset threshold, iteration stops and the optimized face identity vector is taken as the result of this invocation. When the stop-iteration condition is met, S306 is executed; when it is not met, S307 is executed.
S306. Send the optimized face identity vector to the tracking thread; upon receiving the optimized face identity vector, the tracking thread determines the received face identity vector as the current face identity vector.
When the optimization thread stops iterating within one invocation and obtains the optimized face identity vector, it may send that face identity vector to the tracking thread, which takes it as the current face identity vector upon receipt.
S307. Send to the tracking thread a clearing instruction to clear the video frames in the second key frame data set; after receiving the clearing instruction, the tracking thread updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is a non-empty set.
If the stop-iteration condition is not met, and the tracking thread has detected new key frames and added them to the second key frame set of the second key frame data set, an instruction to clear the second key frame set may be sent to the tracking thread after each round of iteration, so that the tracking thread adds the key frames in the second key frame set to the first key frame set and clears the second key frame set, allowing the newly detected key frames to be used to optimize the face identity vector in the next optimization round.
S308. Take the optimized face identity vector as the initial face identity vector.
After each round of iterative optimization, if the stop-iteration condition is not met, the optimization thread takes the optimized face identity vector obtained in this round as the initial face identity vector of the next round, and returns to S302 to continue iteratively optimizing the face identity vector.
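A Python-style sketch of the optimization thread's outer loop across S301-S308; `optimize_tracking_data`, `optimize_identity`, `stop_condition` and the messaging helpers are hypothetical stand-ins for the steps above.

```python
def optimization_thread(tracker):
    """One invocation of the optimization thread (sketch of S301-S308)."""
    alpha = tracker.current_identity                   # S301: initial identity vector
    while True:
        keyframes = tracker.first_key_frame_set()      # S302
        tracked = optimize_tracking_data(keyframes, alpha)   # S303: pose/expression
        alpha_new = optimize_identity(tracked, alpha)        # S304: identity update
        if stop_condition(alpha, alpha_new):           # S305: change rate small enough
            tracker.send_identity(alpha_new)           # S306
            return
        tracker.send_clear_instruction()               # S307: merge new key frames
        alpha = alpha_new                              # S308: next round
```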
With the face tracking method of this embodiment, after the optimization thread is invoked, the current face identity vector used by the tracking thread is taken as the initial face identity vector; the first key frame data set is obtained and the face tracking data in it is optimized based on the initial face identity vector, yielding optimized face tracking data; the initial face identity vector is iteratively optimized based on the optimized face tracking data, yielding an optimized face identity vector; after each round of iteration, whether the stop-iteration condition is met is judged according to the optimized face identity vector and the initial face identity vector. If the condition is met, the optimized face identity vector is sent to the tracking thread, which, upon receiving it, determines the received face identity vector as the current face identity vector; if the condition is not met, a clearing instruction to clear the video frames of the second key frame data set is sent to the tracking thread, which, after receiving it, updates the second key frame set into the first key frame set when the second key frame set of the second key frame data set is non-empty; and the optimized face identity vector is taken as the initial face identity vector to continue optimization. The optimization thread can thus use key frames newly added to the second key frame set to optimize the face identity vector during iterative optimization, without waiting for each invocation of the optimization thread to finish before new key frames are added. On the one hand, a large number of key frames can be extracted quickly to optimize the face identity vector, speeding up its convergence; on the other hand, the optimization thread need not be invoked once per detected key frame, reducing its number of invocations. As a result, real-time face identity vector optimization achieves high real-time performance and consumes fewer resources, so more resources can be used for complex optimization algorithms to improve the accuracy of face identity optimization.
Embodiment Four
FIG. 4 is a flowchart of the steps of a face tracking method provided in Embodiment Four of the present application. This embodiment is described on the basis of Embodiment Three. As shown in FIG. 4, the face tracking method of this embodiment may include the following steps:
S401. After the optimization thread is invoked, take the current face identity vector used by the tracking thread as the initial face identity vector.
For example, the current face identity vector is α_pre; before one invocation of the optimization thread ends, the tracking thread uses the current face identity vector α_pre to perform face tracking on received video frames.
S402. Obtain the first key frame data set, where the first key frame data set is the data set updated after the tracking thread performs face tracking, and the first key frame data set includes face tracking data.
S403. Optimize the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data.
Optionally, a three-dimensional face model may be built based on the initial face identity vector and the expression data; the face key points of the three-dimensional face model are obtained, and the optimal pose data and expression data are solved according to the face key points of the three-dimensional face model and the face key points in the face tracking data, yielding optimized pose data and expression data as the optimized face tracking data.
With a given face identity vector α, the three-dimensional face model F_i of the i-th video frame can be expressed as:
$F_i = F_i(\alpha, \delta) = C_0 + C_{exp}\,\delta \qquad (1)$
where C₀ is the user's expressionless neutral face, C_exp is the expression blendshape deformer for this user, and δ is the face expression data.
For a PCA three-dimensional face model:
$F_i = F_i(\alpha, \delta) = B + B_{ID}\,\alpha + B_{exp}(\alpha)\,\delta \qquad (2)$
one may set:
$C_0 = B + B_{ID}\,\alpha, \quad C_{exp} = B_{exp}(\alpha)$
where B is the average face, B_ID is the user's identity blendshape deformer, and B_exp is the expression blendshape deformer designed for the average face B; B, B_ID and B_exp may be preset.
For a bilinear three-dimensional face model:
$F_i = F_i(\alpha, \delta) = C \times_2 \alpha \times_3 \delta \qquad (3)$
where C is the user's expressionless neutral face and $\times_n$ denotes the modal (mode-n tensor) product.
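A minimal numpy sketch of evaluating models (2) and (3) above; the array shapes and the mode-product contraction are illustrative assumptions, with the face geometry flattened to one vector per face.

```python
import numpy as np

def pca_face(B, B_ID, B_exp_fn, alpha, delta):
    """PCA model (2): F = B + B_ID @ alpha + B_exp(alpha) @ delta."""
    return B + B_ID @ alpha + B_exp_fn(alpha) @ delta

def bilinear_face(C, alpha, delta):
    """Bilinear model (3): the tensor C contracted with the identity vector
    (mode 2) and the expression vector (mode 3)."""
    # C has shape (n_coords, n_id, n_exp); contract identity, then expression.
    return np.einsum('vie,i,e->v', C, alpha, delta)
```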
The following optimization equation (4) can be solved from the input face key points Q_i to obtain the optimal P_i, δ_i:
$\min_{P_i,\,\delta_i} \; \sum_{j} \left\lVert \Pi_{P_i}\!\left(C_0^{(k)} + C_{exp}^{(k)}\,\delta_i\right)_j - Q_{i,j} \right\rVert^{2} + \gamma\,\lVert \delta_i \rVert^{2} \qquad (4)$
where k denotes the k-th iteration round, $C_0^{(k)}$ is the neutral face used in the k-th round, $C_{exp}^{(k)}$ is the expression blendshape deformer used in the k-th round, $\Pi_{P_i}(C_0 + C_{exp}\delta_i)_j$ denotes the j key points obtained by projecting the three-dimensional face $(C_0 + C_{exp}\delta_i)_j$, Q_i are the face key points directly extracted by the key point extraction algorithm in the tracking thread, and γ is a parameter. Solving equation (4) for its minimum yields the optimal P_i, δ_i, i.e. the tracking thread obtains the following face tracking data after tracking the i-th video frame:
$\{Q_i \mid P_i, \delta_i\}$
where $Q_i$ denotes the face key points, $P_i$ the pose data, and $\delta_i$ the expression data.
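A sketch of fitting pose and expression with an objective of the reconstructed form of equation (4), using scipy's least-squares solver; the projection callback and the quadratic expression regularizer are assumptions, not the patent's fixed formulation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_pose_expression(C0, Cexp, Q, project, gamma, x0):
    """Solve min over (pose p, expression d) of
    sum_j ||project(p, C0 + Cexp @ d)_j - Q_j||^2 + gamma * ||d||^2."""
    n_pose = 6                                   # e.g. rotation vector + translation

    def residuals(x):
        p, d = x[:n_pose], x[n_pose:]
        pts = project(p, C0 + Cexp @ d)          # (n_keypoints, 2) projected points
        return np.concatenate([(pts - Q).ravel(),
                               np.sqrt(gamma) * d])   # Tikhonov term on expression

    sol = least_squares(residuals, x0)
    return sol.x[:n_pose], sol.x[n_pose:]        # optimal P_i, delta_i
```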
S404. Compute the face size of the tracked face from the face key points.
The minimum bounding rectangle of the face may be determined from the face key points, and the face size f_i is computed from the face key points on the minimum bounding rectangle.
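A one-function numpy sketch of the face-size computation; using the diagonal of the key points' bounding rectangle as f_i is one plausible reading, since the exact measure is not fixed here.

```python
import numpy as np

def face_size(keypoints):
    """Diagonal length of the (axis-aligned) minimum bounding rectangle of
    the 2D face key points, used here as the face size f_i."""
    kp = np.asarray(keypoints)                   # shape (n, 2)
    return float(np.linalg.norm(kp.max(axis=0) - kp.min(axis=0)))
```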
S405. Compute the expression weight of each key frame based on the expression data of each key frame.
Optionally, the first key frame set includes multiple key frames, and face tracking of the multiple key frames yields the expression data of each key frame. The minimum expression data may be determined among the expression data of all key frames, and the expression weight of a key frame is then computed using a preset constant term, the minimum expression data and the expression data of the key frame, where the expression weight of a key frame is negatively correlated with its expression data, as in the following formula.
The expression weight of each key frame is computed by the following formula:
$w_i^{(k)} = \dfrac{r + \min_{l \in I}\lVert \delta_l^{(k)} \rVert}{r + \lVert \delta_i^{(k)} \rVert} \qquad (5)$
where $w_i^{(k)}$ is the expression weight of key frame i in the k-th iteration round, r is a constant, $\delta_i^{(k)}$ is the expression data of key frame i in the k-th round, I is the first key frame set, and $\min_{l \in I}\lVert \delta_l^{(k)} \rVert$ is the minimum expression data among all key frames.
As can be seen from formula (5), the larger the face expression in a key frame, the smaller $w_i^{(k)}$; conversely, the smaller the expression, the larger $w_i^{(k)}$, which reduces the influence of faces with large expressions on the face identity vector.
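A short numpy sketch of expression weights matching the reconstructed form of formula (5): a preset constant r and a weight that decreases as the expression magnitude grows; the exact functional form is an assumption consistent with the stated properties.

```python
import numpy as np

def expression_weights(deltas, r):
    """deltas: one expression vector per key frame in set I.
    The weight shrinks as ||delta_i|| grows (large expressions count less)."""
    mags = np.array([np.linalg.norm(d) for d in deltas])
    return (r + mags.min()) / (r + mags)   # w_i in (0, 1], w = 1 at the minimum
```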
S406. Iteratively solve according to the face tracking data, the face size, the expression weights, the current face identity vector and the initial face identity vector, to obtain the optimized face identity vector of the face model in the key frames.
In an optional embodiment of the present application, the optimization equation is built as follows:
$\alpha^{k} = \operatorname*{argmin}_{\alpha} \sum_{i \in I} w_i^{(k)} \left( \sum_{j} \left\lVert \Pi_{P_i}\!\left(F_i(\alpha, \delta_i^{(k)})\right)_j - Q_i^{j} \right\rVert^{2} + \gamma\,\lVert \delta_i^{(k)} \rVert^{2} + \beta_1 f_i^{2}\,\lVert \alpha \rVert^{2} \right) + \beta_2\,\lVert \alpha - \alpha_{pre} \rVert \qquad (6)$
In the formula, I is the first key frame set, β₁, β₂ and γ are presettable parameters, and α_pre is the current face identity vector used by the tracking thread.
In an optional embodiment of the present application, for the bilinear three-dimensional face model $F_i = F_i(\alpha,\delta) = C \times_2 \alpha \times_3 \delta$, the formula for iteratively optimizing the face identity vector α^k is as follows:
$\alpha^{k} = \operatorname*{argmin}_{\alpha} \sum_{i \in I} w_i^{(k)} \left( \sum_{j} \left\lVert \Pi_{P_i}\!\left(C \times_2 \alpha \times_3 \delta_i^{(k)}\right)_j - Q_i^{j} \right\rVert^{2} + \beta_1 f_i^{2}\,\lVert \alpha \rVert^{2} \right) + \beta_2\,\lVert \alpha - \alpha_{pre} \rVert \qquad (7)$
In another optional embodiment, for the PCA three-dimensional face model $F_i = F_i(\alpha,\delta) = B + B_{ID}\,\alpha + B_{exp}(\alpha)\,\delta$, the formula for iteratively optimizing the face identity vector α^k is as follows:
$\alpha^{k} = \operatorname*{argmin}_{\alpha} \sum_{i \in I} w_i^{(k)} \left( \sum_{j} \left\lVert \Pi_{P_i}\!\left(B + B_{ID}\,\alpha + B_{exp}(\alpha^{k-1})\,\delta_i^{(k)}\right)_j - Q_i^{j} \right\rVert^{2} + \beta_1 f_i^{2}\,\lVert \alpha \rVert^{2} \right) + \beta_2\,\lVert \alpha - \alpha_{pre} \rVert + \beta_3\,\lVert \alpha - \alpha^{k-1} \rVert \qquad (8)$
In the formula, I is the first key frame set, β₁, β₂ and β₃ are presettable parameters, and α_pre is the current face identity vector used by the tracking thread.
In formulas (7) and (8): the face size f_i is introduced into the regularization term $\beta_1 f_i^{2}\lVert \alpha \rVert^{2}$, so that the parameter β₁ can adapt to different face sizes; the smoothing term $\beta_2\lVert \alpha - \alpha_{pre} \rVert$ of the face identity vector is introduced, which, although it lowers the overall convergence speed of the face identity vector, prevents sudden face jitter after the face identity vector is updated in the tracking thread in special application scenarios such as face swapping; $\beta_3\lVert \alpha - \alpha^{k-1} \rVert$ is introduced during optimization, so that $B_{exp}(\alpha^{k-1})$ can be used to approximate $B_{exp}(\alpha^{k})$, making optimization equation (6) applicable to both the PCA and the bilinear three-dimensional face models; and the weight w_i of each key frame is computed dynamically during optimization, lowering the weight of faces with large expressions and thus reducing the influence of large expressions on the face identity vector.
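A compact sketch of an identity-vector update in the spirit of the reconstructed equations (6)-(8), again with scipy's least-squares solver; the squared approximation of the smoothing term and all helper names are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_identity(keyframes, model, alpha0, alpha_pre, project, beta1, beta2):
    """keyframes: list of (w_i, f_i, P_i, delta_i, Q_i) from the first set I;
    model(alpha, delta) evaluates the 3D face (PCA or bilinear)."""
    def residuals(alpha):
        res = []
        for w, f, P, delta, Q in keyframes:
            pts = project(P, model(alpha, delta))        # projected key points
            res.append(np.sqrt(w) * (pts - Q).ravel())   # weighted landmark error
            res.append(np.sqrt(w * beta1) * f * alpha)   # ~ beta1 * f^2 * ||alpha||^2
        res.append(np.sqrt(beta2) * (alpha - alpha_pre)) # smoothing term (squared form)
        return np.concatenate(res)

    return least_squares(residuals, alpha0).x
```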
S407. After each round of iteration, compute the face change rate using the face identity vector obtained in the current round and the face identity vector of the previous round.
After each round of iteration, an optimized face identity vector α^k and the previous round's face identity vector α^{k-1} are obtained, and the face change rate may be computed by the following formula:
$\dfrac{\max_j \lVert F(\alpha^{k})_j - F(\alpha^{k-1})_j \rVert}{s}$
where F(α) is the face mesh corresponding to the face identity vector α, and s is the diagonal length of the minimum bounding rectangle of the three-dimensional average face; that is, the maximum displacement among the j vertices of the face mesh divided by that diagonal length is the face change rate.
S408. Judge whether the face change rate is smaller than the preset change rate threshold.
That is, judge whether the following holds:
$\dfrac{\max_j \lVert F(\alpha^{k})_j - F(\alpha^{k-1})_j \rVert}{s} < \varepsilon_2$
where ε₂ denotes the preset change rate threshold. If the above inequality holds, the stop-iteration condition is determined to be met and S409 is executed; if it is determined not to be met, S410 is executed.
S409. Send the optimized face identity vector to the tracking thread; upon receiving the optimized face identity vector, the tracking thread determines the received face identity vector as the current face identity vector.
The optimized face identity vector α^k is sent to the tracking thread; after receiving the optimized face identity vector α^k, the tracking thread determines the received face identity vector as the current face identity vector for tracking video frames.
Optionally, the optimization thread further computes, from the optimized face identity vector α^k, a new user face $C_0^{(k)}$ and a user face expression blendshape $C_{exp}^{(k)}$, and the optimized face identity vector α^k, the user face $C_0^{(k)}$ and the user face expression blendshape $C_{exp}^{(k)}$ are used for tracking video frames.
The frame vector of each key frame is further updated according to the optimized expression data and pose data of the key frames in the first key frame data set, and the first PCA subspace is updated according to the frame vector of each key frame, making the first PCA subspace more accurate so that key frames can be detected more accurately based on it.
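A small sketch of refreshing the first PCA subspace from the re-optimized key-frame data, reusing `fit_pca_subspace` from the earlier sketch; building the frame vector from the pose rotation vector and the expression data follows Embodiment One.

```python
import numpy as np

def frame_vector(pose_rotvec, delta):
    """Frame vector built from the pose rotation vector and expression data."""
    return np.concatenate([np.asarray(pose_rotvec).ravel(),
                           np.asarray(delta).ravel()])

def refresh_first_subspace(keyframes):
    """Recompute frame vectors from optimized pose/expression, then refit PCA."""
    vecs = [frame_vector(kf.pose_rotvec, kf.delta) for kf in keyframes]
    return fit_pca_subspace(vecs)          # returns (v0, M)
```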
S410. Send to the tracking thread a clearing instruction to clear the video frames in the second key frame data set; after receiving the clearing instruction, the tracking thread updates the second key frame data set into the first key frame data set when the second key frame set of the second key frame data set is a non-empty set.
S411. Take the optimized face identity vector as the initial face identity vector, and return to S402.
With the face tracking method of this embodiment, after each round of iteration, whether the stop-iteration condition is met is judged according to the optimized face identity vector and the initial face identity vector. If the condition is met, the optimized face identity vector is sent to the tracking thread, which determines the received face identity vector as the current face identity vector; if the condition is not met, a clearing instruction to clear the video frames of the second key frame data set is sent to the tracking thread, which, after receiving it, updates the second key frame set into the first key frame set and clears the second key frame set when the second key frame set is non-empty; the optimized face identity vector is taken as the initial face identity vector, and the optimized face identity vector then continues to be solved iteratively. The optimization thread can thus use key frames newly added to the second key frame set to optimize the face identity vector during iterative optimization, without waiting for each invocation of the optimization thread to finish before new key frames are added. On the one hand, a large number of key frames can be extracted quickly to optimize the face identity vector, speeding up its convergence; on the other hand, the optimization thread need not be invoked once per detected key frame, reducing its number of invocations. As a result, real-time face identity vector optimization achieves high real-time performance and consumes fewer resources, so more resources can be used for complex optimization algorithms to improve the accuracy of face identity optimization.
Introducing the face size f_i into the optimization equation adapts it to different face sizes. Introducing the smoothing term $\beta_2\lVert \alpha - \alpha_{pre} \rVert$ of the face identity vector, although it lowers the overall convergence speed, prevents sudden face jitter after the face identity vector is updated in the tracking thread in special application scenarios such as face swapping. Introducing $\beta_3\lVert \alpha - \alpha^{k-1} \rVert$ during optimization allows $B_{exp}(\alpha^{k-1})$ to approximate $B_{exp}(\alpha^{k})$, making the optimization equation applicable to both the PCA and the bilinear three-dimensional face models. Dynamically computing the expression weight of each key frame during optimization lowers the weight of faces with large expressions, reducing the influence of large expressions on the face identity vector.
Embodiment Five
FIG. 5 is a structural block diagram of a face tracking apparatus provided in Embodiment Five of the present application. As shown in FIG. 5, the face tracking apparatus of this embodiment is applied to a tracking thread, where the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking apparatus includes:
an optimization thread running judgment module 501, configured to judge, during face tracking of a video frame, whether an optimization thread is running; a second key frame data set update module 502, configured to, in response to the optimization thread running, update the second key frame data set according to the video frame when the video frame is a key frame; a clearing module 503, configured to, upon receiving a clearing instruction sent by the optimization thread to clear the video frames in the second key frame data set, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set; a first key frame data set update module 504, configured to, in response to the optimization thread not running, update the first key frame data set according to the video frame and the second key frame data set when the video frame is a key frame; an optimization thread invocation module 505, configured to invoke the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes a face identity based on the first key frame data set.
The face tracking apparatus provided by the embodiments of the present application can execute the face tracking methods provided by Embodiments One and Two of the present application, and has the functional modules and effects corresponding to those methods.
Embodiment Six
FIG. 6 is a structural block diagram of a face tracking apparatus provided in Embodiment Six of the present application. As shown in FIG. 6, the face tracking apparatus of this embodiment is applied to an optimization thread and may include the following modules:
a face identity vector initialization module 601, configured to take, after the optimization thread is invoked, the current face identity vector used by a tracking thread as an initial face identity vector; a first key frame data set obtaining module 602, configured to obtain a first key frame data set, where the first key frame data set is a data set updated after the tracking thread performs face tracking, and the first key frame data set includes face tracking data; a face tracking data optimization module 603, configured to optimize the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data; a face identity vector optimization module 604, configured to iteratively optimize the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector; a stop-iteration judgment module 605, configured to judge, after each round of iteration, whether a stop-iteration condition is met according to the optimized face identity vector and the initial face identity vector; a stop-iteration module 606, configured to, in response to the stop-iteration condition being met, send the optimized face identity vector to the tracking thread, where the tracking thread, upon receiving the optimized face identity vector, determines the received face identity vector as the current face identity vector; a clearing instruction sending module 607, configured to, in response to the stop-iteration condition not being met, send to the tracking thread a clearing instruction to clear the video frames in a second key frame data set, where the tracking thread, after receiving the clearing instruction, updates the second key frame data set into the first key frame data set when the second key frame set is a non-empty set; an initial face identity vector update module 608, configured to take the optimized face identity vector as the initial face identity vector and return to the face tracking data optimization module.
The face tracking apparatus provided by the embodiments of the present application can execute the face tracking methods provided by Embodiments Three and Four of the present application, and has the functional modules and effects corresponding to those methods.
Embodiment Seven
Referring to FIG. 7, a schematic structural diagram of an electronic device in an example of the present application is shown. As shown in FIG. 7, the electronic device may include: a processor 701, a storage device 702, a display screen 703 with a touch function, an input device 704, an output device 705 and a communication device 706. The number of processors 701 in the electronic device may be one or more; one processor 701 is taken as an example in FIG. 7. The processor 701, the storage device 702, the display screen 703, the input device 704, the output device 705 and the communication device 706 of the electronic device may be connected through a bus or by other means; connection through a bus is taken as an example in FIG. 7. The electronic device is configured to execute the face tracking method provided by any embodiment of the present application.
Embodiments of the present application further provide a computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a device, the device is enabled to execute the face tracking method described in the foregoing method embodiments. The computer-readable storage medium is a non-transitory storage medium.
As the apparatus, electronic device and storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant parts, reference may be made to the description of the method embodiments.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics may be combined in a suitable manner in any one or more embodiments or examples.

Claims (32)

  1. A face tracking method, applied to a tracking thread, wherein the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking method comprises:
    judging, during face tracking of a video frame, whether an optimization thread is running;
    in response to the optimization thread running, updating the second key frame data set according to the video frame in a case where the video frame is a key frame;
    in a case where a clearing instruction sent by the optimization thread to clear the video frames in the second key frame data set is received, clearing the video frames in the second key frame data set and updating the second key frame data set into the first key frame data set;
    in response to the optimization thread not running, updating the first key frame data set according to the video frame and the second key frame data set in a case where the video frame is a key frame; and
    after the first key frame data set is updated, invoking the optimization thread, so that the optimization thread optimizes a face identity based on the first key frame data set.
  2. The face tracking method according to claim 1, before the judging whether an optimization thread is running, further comprising:
    in a case where a face identity vector is received from the optimization thread, determining the received face identity vector as a current face identity vector; and
    tracking the video frame based on the current face identity vector, and obtaining face key points, pose data and expression data of a face in the video frame as face tracking data.
  3. The face tracking method according to claim 2, wherein the first key frame data set and the second key frame data set each comprise a key frame set, face tracking data and frame vectors of key frames, and the tracking thread further maintains a first principal component analysis (PCA) subspace and a second PCA subspace;
    before updating the second key frame data set according to the video frame in the case where the video frame is a key frame, the method further comprises:
    before invoking the optimization thread, assigning the first PCA subspace to the second PCA subspace, wherein the first PCA subspace is a space formed by the frame vectors of all key frames in a first key frame set of the first key frame data set, an average frame vector and an eigenvector matrix;
    judging, based on the second PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame;
    in response to the video frame being a key frame, executing the step of updating the second key frame data set according to the video frame;
    in response to the video frame not being a key frame, judging whether a face identity vector is received from the optimization thread;
    in response to a face identity vector being received from the optimization thread, returning to the step of, in the case where a face identity vector is received from the optimization thread, determining the received face identity vector as the current face identity vector; and
    in response to no face identity vector being received from the optimization thread, receiving a next video frame, and returning to the step of tracking the video frame based on the current face identity vector and obtaining the face key points, pose data and expression data of the face in the video frame as face tracking data.
  4. The face tracking method according to claim 3, wherein the judging, based on the second PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame comprises:
    determining a frame vector of the video frame based on the pose data and the expression data;
    computing a distance between the frame vector of the video frame and the second PCA subspace using the frame vector of the video frame, the average frame vector of the second PCA subspace and the eigenvector matrix;
    in a case where the distance is smaller than a preset threshold, determining that the video frame is a key frame; and
    in a case where the distance is greater than the preset threshold, determining that the video frame is not a key frame.
  5. The face tracking method according to claim 3, before the judging, based on the second PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame, further comprising:
    judging whether a flag of the second key frame set is a preset first flag, wherein the preset first flag indicates that the second key frame set is an empty set; and, in response to the flag of the second key frame set not being the preset first flag, setting the flag of the second key frame set to a preset second flag.
  6. The face tracking method according to claim 4, wherein the updating the second key frame data set according to the video frame comprises:
    adding the video frame to the second key frame set of the second key frame data set;
    adding the face key points, pose data and expression data of the video frame to the second key frame data set as face tracking data; and
    adding the frame vector of the video frame to the second key frame data set.
  7. The face tracking method according to claim 6, after the updating the second key frame data set according to the video frame, further comprising:
    updating the threshold according to a preset step; and
    updating the second PCA subspace according to the frame vectors in the second key frame data set.
  8. The face tracking method according to claim 3, wherein the updating the second key frame data set into the first key frame data set comprises:
    updating the second key frame set of the second key frame data set into the first key frame set of the first key frame data set;
    updating the face tracking data in the second key frame data set into the first key frame data set; and
    updating the frame vectors of the key frames in the second key frame data set into the first key frame data set.
  9. The face tracking method according to claim 8, after the updating the second key frame data set into the first key frame data set, further comprising:
    updating the first PCA subspace according to the frame vectors in the first key frame data set.
  10. The face tracking method according to claim 5, after the updating the second key frame data set into the first key frame data set, further comprising:
    setting the flag of the second key frame set to the preset first flag.
  11. The face tracking method according to claim 3, before updating the first key frame data set according to the video frame and the second key frame data set in the case where the video frame is a key frame, further comprising:
    judging, based on the first PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame;
    in response to the video frame being a key frame, executing the step of updating the first key frame data set according to the video frame and the second key frame data set;
    in response to the video frame not being a key frame, judging whether the second key frame set of the second key frame data set is a non-empty set;
    in response to the second key frame set being a non-empty set, updating the second key frame data set into the first key frame data set, and updating the first PCA subspace;
    in response to the second key frame set being an empty set, judging whether a face identity vector is received from the optimization thread;
    in response to a face identity vector being received from the optimization thread, returning to the step of, in the case where a face identity vector is received from the optimization thread, determining the received face identity vector as the current face identity vector; and
    in response to no face identity vector being received from the optimization thread, receiving a next video frame, and returning to the step of tracking the video frame based on the current face identity vector and obtaining the pose data and expression data of the face in the video frame as face tracking data.
  12. The face tracking method according to claim 11, wherein the judging, based on the first PCA subspace and the pose data and expression data of the video frame, whether the video frame is a key frame comprises:
    determining a frame vector of the video frame based on the pose data and the expression data;
    computing a distance between the frame vector of the video frame and the first PCA subspace using the frame vector of the video frame, the average frame vector of the first PCA subspace and the eigenvector matrix;
    in a case where the distance is smaller than a preset threshold, determining that the video frame is a key frame; and
    in a case where the distance is greater than the preset threshold, determining that the video frame is not a key frame.
  13. The face tracking method according to claim 12, wherein the updating the first key frame data set according to the video frame and the second key frame data set comprises:
    updating the video frame and the second key frame set of the second key frame data set into the first key frame set of the first key frame data set;
    adding the frame vector of the video frame and the frame vectors in the second key frame data set to the first key frame data set; and
    adding the face tracking data of the video frame and the face tracking data of the video frames in the second key frame data set to the first key frame data set.
  14. The face tracking method according to claim 12, after the updating the first key frame data set according to the video frame and the second key frame data set, further comprising:
    updating the threshold according to a preset step; and
    updating the first PCA subspace according to the frame vectors in the first key frame data set.
  15. The face tracking method according to any one of claims 2-14, after the determining the received face identity vector as the current face identity vector, further comprising:
    computing a face change rate using two consecutive face identity vectors received from the optimization thread;
    judging whether the face change rate is smaller than a preset change rate threshold;
    in response to the face change rate being smaller than the preset change rate threshold, stopping judging whether the video frame is a key frame while tracking video frames with the current face identity vector, and no longer invoking the optimization thread; and
    in response to the face change rate not being smaller than the preset change rate threshold, returning to the step of tracking the video frame based on the current face identity vector and obtaining the pose data and expression data of the face in the video frame as face tracking data.
  16. A face tracking method, applied to an optimization thread, comprising:
    after the optimization thread is invoked, taking the current face identity vector used by a tracking thread as an initial face identity vector;
    obtaining a first key frame data set, wherein the first key frame data set is a data set updated after the tracking thread performs face tracking, and the first key frame data set comprises face tracking data;
    optimizing the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data;
    iteratively optimizing the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector;
    after each round of iteration, judging, according to the optimized face identity vector and the initial face identity vector, whether a stop-iteration condition is met;
    in response to the stop-iteration condition being met, sending the optimized face identity vector to the tracking thread, so that the tracking thread, in a case where the optimized face identity vector is received, determines the received face identity vector as the current face identity vector;
    in response to the stop-iteration condition not being met, sending to the tracking thread a clearing instruction to clear the video frames in a second key frame data set, so that the tracking thread, after receiving the clearing instruction, updates the second key frame data set into the first key frame data set in a case where a second key frame set of the second key frame data set is a non-empty set; and
    taking the optimized face identity vector as the initial face identity vector, and returning to the step of optimizing the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data.
  17. The face tracking method according to claim 16, wherein the face tracking data comprises face key points, pose data and expression data, and the optimizing the face tracking data in the first key frame data set based on the initial face identity vector to obtain optimized face tracking data comprises:
    building a three-dimensional face model based on the initial face identity vector and the expression data;
    obtaining face key points of the three-dimensional face model; and
    solving optimal pose data and expression data according to the face key points of the three-dimensional face model and the face key points in the face tracking data, to obtain optimized pose data and expression data as the optimized face tracking data.
  18. The face tracking method according to claim 17, wherein the solving of optimal pose data and expression data according to the face key points of the three-dimensional face model and the face key points in the face tracking data, to obtain optimized pose data and expression data as the optimized face tracking data, comprises:
    solving the optimized face tracking data by the following formula:
    $\min_{P_i,\,\delta_i} \; \sum_{j} \left\lVert \Pi_{P_i}\!\left(C_0^{(k)} + C_{exp}^{(k)}\,\delta_i\right)_j - Q_{i,j} \right\rVert^{2} + \gamma\,\lVert \delta_i \rVert^{2}$
    wherein k denotes the k-th iteration round, $C_0^{(k)}$ is the neutral face used in the k-th round, $C_{exp}^{(k)}$ is the expression blendshape deformer used in the k-th round, $\Pi_{P_i}(C_0 + C_{exp}\delta_i)_j$ denotes the j key points obtained by projecting the three-dimensional face model $(C_0 + C_{exp}\delta_i)_j$, Q_i are the face key points of the face tracking data in the first key frame data set, γ is a parameter, δ_i is the expression data, P_i is the pose data, and i denotes the i-th key frame.
  19. The face tracking method according to claim 16, wherein the face tracking data comprises face key points, pose data and expression data, and the iteratively optimizing the initial face identity vector based on the optimized face tracking data to obtain the optimized face identity vector comprises:
    computing a face size of the tracked face from the face key points;
    computing an expression weight of each key frame based on the expression data of each key frame; and
    iteratively solving according to the face tracking data, the face size, the expression weight of each key frame, the current face identity vector and the initial face identity vector, to obtain the optimized face identity vector.
  20. The face tracking method according to claim 19, wherein the computing an expression weight of each key frame based on the expression data of each key frame comprises:
    determining minimum expression data among the expression data of all key frames; and
    computing the expression weight of a key frame using a preset constant term, the minimum expression data and the expression data of the key frame, wherein the expression weight of the key frame is negatively correlated with the expression data of the key frame.
  21. The face tracking method according to claim 20, wherein the computing the expression weight of the key frame using the preset constant term, the minimum expression data and the expression data of the key frame comprises:
    computing the expression weight of each key frame by the following formula:
    $w_i^{(k)} = \dfrac{r + \min_{l \in I}\lVert \delta_l^{(k)} \rVert}{r + \lVert \delta_i^{(k)} \rVert}$
    wherein $w_i^{(k)}$ is the expression weight of key frame i in the k-th iteration round, r is a constant, $\delta_i^{(k)}$ is the expression data of key frame i in the k-th round, I is the first key frame set, and $\min_{l \in I}\lVert \delta_l^{(k)} \rVert$ is the minimum expression data among all key frames.
  22. The face tracking method according to claim 19, wherein the iteratively solving according to the face tracking data, the face size, the expression weight of each key frame, the current face identity vector and the initial face identity vector to obtain the optimized face identity vector comprises:
    building a three-dimensional face model based on the current face identity vector and the expression data of each key frame;
    projecting the three-dimensional face model onto a two-dimensional plane to obtain a plurality of projected face key points;
    computing a distance sum between the plurality of projected face key points and the face key points; and
    iteratively solving according to the expression data, the distance sum, the expression weight of each key frame, the face size, the current face identity vector and the initial face identity vector, to obtain the optimized face identity vector.
  23. The face tracking method according to claim 22, wherein the face model comprises a bilinear face model, and the iteratively solving according to the expression data, the distance sum, the expression weight of each key frame, the face size, the current face identity vector and the initial face identity vector to obtain the optimized face identity vector comprises:
    solving the optimized face identity vector by the following formula:
    $\alpha^{k} = \operatorname*{argmin}_{\alpha} \sum_{i \in I} w_i^{(k)} \left( \sum_{j} \left\lVert \Pi_{P_i}\!\left(C \times_2 \alpha \times_3 \delta_i^{(k)}\right)_j - Q_i^{j} \right\rVert^{2} + \beta_1 f_i^{2}\,\lVert \alpha \rVert^{2} \right) + \beta_2\,\lVert \alpha - \alpha_{pre} \rVert$
    wherein α^k is the intermediate face identity vector solved in the k-th iteration round, $F_i^{k} = C \times_2 \alpha \times_3 \delta_i^{(k)}$ is the bilinear three-dimensional face model obtained after face tracking of key frame i in the k-th round, $\delta_i^{(k)}$ is the expression data of key frame i in the k-th round, C is the user's neutral face, $w_i^{(k)}$ is the expression weight of key frame i in the k-th iteration round, I is the first key frame set, $\Pi_{P_i}(\cdot)_j$ denotes the j projected face key points of the bilinear face model in key frame i, $Q_i^{j}$ denotes the j face key points obtained by the tracking thread for key frame i, f_i is the face size, α_pre is the face identity vector currently used by the tracking thread, and β₁ and β₂ are constants.
  24. The face tracking method according to claim 22, wherein the face model comprises a principal component analysis (PCA) face model, and the iteratively solving according to the expression data, the distance sum, the expression weight of each key frame, the face size, the current face identity vector and the initial face identity vector to obtain the optimized face identity vector comprises:
    solving the optimized face identity vector by the following formula:
    $\alpha^{k} = \operatorname*{argmin}_{\alpha} \sum_{i \in I} w_i^{(k)} \left( \sum_{j} \left\lVert \Pi_{P_i}\!\left(B + B_{ID}\,\alpha + B_{exp}(\alpha^{k-1})\,\delta_i^{(k)}\right)_j - Q_i^{j} \right\rVert^{2} + \beta_1 f_i^{2}\,\lVert \alpha \rVert^{2} \right) + \beta_2\,\lVert \alpha - \alpha_{pre} \rVert + \beta_3\,\lVert \alpha - \alpha^{k-1} \rVert$
    wherein α^k is the intermediate face identity vector solved in the k-th iteration round, $F_i^{k} = B + B_{ID}\,\alpha + B_{exp}(\alpha^{k-1})\,\delta_i^{(k)}$ is the PCA three-dimensional face model obtained after face tracking of key frame i in the k-th round, $\delta_i^{(k)}$ is the expression data of key frame i in the k-th round, B is the average face, B_ID is the user's identity blendshape deformer, B_exp is the expression blendshape deformer designed based on the average face B, $w_i^{(k)}$ is the expression weight of key frame i in the k-th iteration round, I is the first key frame set, $\Pi_{P_i}(\cdot)_j$ denotes the j projected face key points of the PCA three-dimensional face model in key frame i, $Q_i^{j}$ denotes the j face key points obtained by the tracking thread for key frame i, f_i is the face size, α_pre is the face identity vector currently used by the tracking thread, β₁, β₂ and β₃ are constants, and α^{k-1} is the face identity vector obtained in the (k-1)-th iteration round.
  25. The face tracking method according to any one of claims 16-24, wherein the judging, after each round of iteration and according to the optimized face identity vector and the initial face identity vector, whether the stop-iteration condition is met comprises:
    computing a face change rate using the optimized face identity vector and the initial face identity vector;
    judging whether the face change rate is smaller than a preset change rate threshold;
    in response to the face change rate being smaller than the preset change rate threshold, determining that the stop-iteration condition is met; and
    in response to the face change rate not being smaller than the preset change rate threshold, determining that the stop-iteration condition is not met.
  26. The face tracking method according to claim 25, wherein the computing a face change rate using the optimized face identity vector and the initial face identity vector comprises:
    obtaining a face size of the average face;
    computing a distance between the face mesh corresponding to the optimized face identity vector and the face mesh corresponding to the initial face identity vector; and
    computing the ratio of the distance to the face size of the average face as the face change rate.
  27. The face tracking method according to claim 16, after the sending the optimized face identity vector to the tracking thread, further comprising:
    updating the frame vector of each key frame according to the optimized expression data and pose data of the key frames in the first key frame data set; and
    updating a first PCA subspace according to the frame vector of each key frame.
  28. The face tracking method according to claim 16, before the taking, after the optimization thread is invoked, the current face identity vector used by the tracking thread as the initial face identity vector, further comprising:
    assigning a first PCA subspace to a second PCA subspace.
  29. A face tracking apparatus, applied to a tracking thread, wherein the tracking thread maintains a first key frame data set and a second key frame data set, and the face tracking apparatus comprises:
    an optimization thread running judgment module, configured to judge, during face tracking of a video frame, whether an optimization thread is running;
    a second key frame data set update module, configured to, in response to the optimization thread running, update the second key frame data set according to the video frame in a case where the video frame is a key frame;
    a clearing module, configured to, in a case where a clearing instruction sent by the optimization thread to clear the video frames in the second key frame data set is received, clear the video frames in the second key frame data set and update the second key frame data set into the first key frame data set;
    a first key frame data set update module, configured to, in response to the optimization thread not running, update the first key frame data set according to the video frame and the second key frame data set in a case where the video frame is a key frame; and
    an optimization thread invocation module, configured to invoke the optimization thread after the first key frame data set is updated, so that the optimization thread optimizes a face identity based on the first key frame data set.
  30. A face tracking apparatus, applied to an optimization thread, comprising:
    a face identity vector initialization module, configured to take, after the optimization thread is invoked, the current face identity vector used by a tracking thread as an initial face identity vector;
    a first key frame data set obtaining module, configured to obtain a first key frame data set, wherein the first key frame data set is a data set updated after the tracking thread performs face tracking, and the first key frame data set comprises face tracking data;
    a face tracking data optimization module, configured to optimize the face tracking data in the first key frame data set based on the initial face identity vector, to obtain optimized face tracking data;
    a face identity vector optimization module, configured to iteratively optimize the initial face identity vector based on the optimized face tracking data, to obtain an optimized face identity vector;
    a stop-iteration judgment module, configured to judge, after each round of iteration, whether a stop-iteration condition is met according to the optimized face identity vector and the initial face identity vector;
    a stop-iteration module, configured to, in response to the stop-iteration condition being met, send the optimized face identity vector to the tracking thread, so that the tracking thread, in a case where the optimized face identity vector is received, determines the received face identity vector as the current face identity vector;
    a clearing instruction sending module, configured to, in response to the stop-iteration condition not being met, send to the tracking thread a clearing instruction to clear the video frames in a second key frame data set, so that the tracking thread, after receiving the clearing instruction, updates the second key frame data set into the first key frame data set in a case where a second key frame set of the second key frame data set is a non-empty set; and
    an initial face identity vector update module, configured to take the optimized face identity vector as the initial face identity vector and return to the face tracking data optimization module.
  31. An electronic device, comprising:
    at least one processor; and
    a storage device, configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the face tracking method according to any one of claims 1-28.
  32. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the face tracking method according to any one of claims 1-28.
PCT/CN2022/070133 2021-01-05 2022-01-04 Face tracking method and apparatus, electronic device, and storage medium WO2022148349A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/260,126 US20240062579A1 (en) 2021-01-05 2022-01-04 Face tracking method and electronic device
EP22736511.1A EP4276681A4 (en) 2021-01-05 2022-01-04 FACE TRACKING METHOD AND APPARATUS AND ELECTRONIC DEVICE AND STORAGE MEDIUM
JP2023541109A JP7503348B2 (ja) 2021-01-05 2022-01-04 顔追跡方法、顔追跡装置、電子デバイスおよび記憶媒体

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110007729.1A 2021-01-05 Face tracking method and apparatus, electronic device, and storage medium
CN202110007729.1 2021-01-05

Publications (1)

Publication Number Publication Date
WO2022148349A1 true WO2022148349A1 (zh) 2022-07-14

Family

ID=75548274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070133 WO2022148349A1 (zh) 2021-01-05 2022-01-04 Face tracking method and apparatus, electronic device, and storage medium

Country Status (5)

Country Link
US (1) US20240062579A1 (zh)
EP (1) EP4276681A4 (zh)
JP (1) JP7503348B2 (zh)
CN (1) CN112712044B (zh)
WO (1) WO2022148349A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712044B (zh) 2021-01-05 2023-08-08 百果园技术(新加坡)有限公司 Face tracking method and apparatus, electronic device, and storage medium
CN113221841A (zh) * 2021-06-02 2021-08-06 云知声(上海)智能科技有限公司 Face detection and tracking method and apparatus, electronic device, and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9558396B2 (en) * 2013-10-22 2017-01-31 Samsung Electronics Co., Ltd. Apparatuses and methods for face tracking based on calculated occlusion probabilities
JP2018055167A (ja) 2016-09-26 2018-04-05 カシオ計算機株式会社 自律移動装置、自律移動方法及びプログラム

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120321134A1 (en) * 2011-06-15 2012-12-20 Samsung Electornics Co., Ltd Face tracking method and device
CN103646391A (zh) * 2013-09-30 2014-03-19 浙江大学 Real-time camera tracking method for dynamically changing scenes
CN108345821A (zh) * 2017-01-24 2018-07-31 成都理想境界科技有限公司 Face tracking method and device
CN111914613A (zh) * 2020-05-21 2020-11-10 淮阴工学院 Multi-target tracking and facial feature information recognition method
CN112712044A (zh) * 2021-01-05 2021-04-27 百果园技术(新加坡)有限公司 Face tracking method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4276681A4

Also Published As

Publication number Publication date
EP4276681A4 (en) 2024-01-24
JP7503348B2 (ja) 2024-06-20
EP4276681A1 (en) 2023-11-15
CN112712044A (zh) 2021-04-27
CN112712044B (zh) 2023-08-08
JP2024502349A (ja) 2024-01-18
US20240062579A1 (en) 2024-02-22

Similar Documents

Publication Publication Date Title
WO2022148349A1 (zh) Face tracking method and apparatus, electronic device, and storage medium
US11403874B2 (en) Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium
CN109523581B (zh) Method and apparatus for three-dimensional point cloud alignment
US11238633B2 (en) Method and apparatus for beautifying face, electronic device, and storage medium
CN115049799B (zh) Method and apparatus for generating a 3D model and a virtual avatar
CN115147558B (zh) Training method for a three-dimensional reconstruction model, three-dimensional reconstruction method, and apparatus
JP7389840B2 (ja) Image quality enhancement method, apparatus, device, and medium
CN115147265B (zh) Virtual avatar generation method and apparatus, electronic device, and storage medium
WO2022135518A1 (zh) Eyeball registration method and apparatus based on a three-dimensional cartoon model, server, and medium
WO2016165614A1 (zh) Expression recognition method in instant video, and electronic device
CN111899159B (zh) Method, apparatus, device, and storage medium for transforming a hairstyle
US20230401799A1 (en) Augmented reality method and related device
AU2014253687A1 (en) System and method of tracking an object
WO2021218650A1 (zh) Adaptive rigid prior model training method, face tracking method, training apparatus, and tracking apparatus
WO2022143264A1 (zh) Face orientation estimation method and apparatus, electronic device, and storage medium
CN112417985A (zh) Face feature point tracking method, system, electronic device, and storage medium
CN114708374A (zh) Virtual avatar generation method and apparatus, electronic device, and storage medium
WO2021208767A1 (zh) Face contour correction method, apparatus, device, and storage medium
US11830236B2 (en) Method and device for generating avatar, electronic equipment, medium and product
CN116543417A (zh) Human pose estimation method, apparatus, device, and storage medium
WO2023151348A1 (zh) Method for processing key points in an image, and related apparatus
CN115359171A (zh) Virtual avatar processing method and apparatus, electronic device, and storage medium
CN115082298A (zh) Image generation method and apparatus, electronic device, and storage medium
CN112561995A (zh) Real-time and efficient 6D pose estimation network, construction method, and estimation method
CN112183155A (zh) Method and apparatus for action pose library establishment, action pose generation, and recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22736511

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18260126

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2023541109

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022736511

Country of ref document: EP

Effective date: 20230807