WO2020098686A1 - Face detection model training method and apparatus, and face key point detection method and apparatus - Google Patents

Face detection model training method and apparatus, and face key point detection method and apparatus

Info

Publication number
WO2020098686A1
WO2020098686A1 PCT/CN2019/117945 CN2019117945W WO2020098686A1 WO 2020098686 A1 WO2020098686 A1 WO 2020098686A1 CN 2019117945 W CN2019117945 W CN 2019117945W WO 2020098686 A1 WO2020098686 A1 WO 2020098686A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
training
dimensional
model
matrix
Prior art date
Application number
PCT/CN2019/117945
Other languages
English (en)
Chinese (zh)
Inventor
陈德健
Original Assignee
广州市百果园信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市百果园信息技术有限公司 filed Critical 广州市百果园信息技术有限公司
Priority to SG11202105115TA priority Critical patent/SG11202105115TA/en
Priority to RU2021115692A priority patent/RU2770752C1/ru
Priority to EP19883820.3A priority patent/EP3882808A4/fr
Priority to US17/294,664 priority patent/US11922707B2/en
Publication of WO2020098686A1 publication Critical patent/WO2020098686A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Definitions

  • Embodiments of the present application relate to the technical field of data processing, and provide, for example, a method for training a face detection model, a method for detecting face key points, an apparatus for training a face detection model, an apparatus for detecting face key points, a device, and a storage medium.
  • The embodiments of the present application provide a method for training a face detection model, a method for detecting face key points, an apparatus for training a face detection model, an apparatus for detecting face key points, a device, and a storage medium, so as to avoid the inaccurate detection of face key points in related detection methods and to improve the accuracy of face key point detection.
  • An embodiment of the present application provides a method for training a face detection model, including: acquiring a training face image; performing three-dimensional reconstruction of the training face image based on a preset three-dimensional face model to obtain a training three-dimensional face model; generating, according to the training three-dimensional face model, a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model; and training a semantic segmentation network using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • An embodiment of the present application provides a method for detecting face key points, including: acquiring a target face image; inputting the target face image into a pre-trained face detection model to generate a UV coordinate map of the target face image, where the UV coordinate map includes a plurality of pixels and each pixel contains three-dimensional coordinates; obtaining a UV template map that contains pre-marked face key points; and determining, in the UV coordinate map, the pixels corresponding to the face key points, so as to detect the three-dimensional coordinates of the face key points.
  • An embodiment of the present application provides an apparatus for training a face detection model, including: a training face image acquisition module configured to acquire a training face image; a three-dimensional reconstruction module configured to perform three-dimensional reconstruction of the training face image based on a preset three-dimensional face model to obtain a training three-dimensional face model; a training UV coordinate map generation module configured to generate, according to the training three-dimensional face model, a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model; and a training module configured to train a semantic segmentation network using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • An embodiment of the present application provides an apparatus for detecting face key points, including: a target face image acquisition module configured to acquire a target face image; a UV coordinate map generation module configured to input the target face image into a pre-trained face detection model to generate a UV coordinate map of the target face image, where the UV coordinate map includes a plurality of pixels and each pixel contains three-dimensional coordinates; a template map acquisition module configured to obtain a UV template map that contains pre-marked face key points; and a face key point three-dimensional coordinate detection module configured to determine, in the UV coordinate map, the pixels corresponding to the face key points, so as to detect the three-dimensional coordinates of the face key points.
  • An embodiment of the present application provides a device, including: at least one processor; and a storage apparatus configured to store at least one program which, when executed by the at least one processor, causes the at least one processor to implement at least one of the method for training a face detection model and the method for detecting face key points according to any embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, at least one of the method for training a face detection model and the method for detecting face key points according to any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a method for training a face detection model provided by an embodiment of the present application
  • FIG. 2A is a flowchart of another training method of a face detection model provided by an embodiment of the present application.
  • FIG. 2B is a schematic diagram of alignment of a three-dimensional face model in an embodiment of the present application.
  • FIG. 2C is a schematic diagram of generating a training three-dimensional face model through three-dimensional reconstruction in an embodiment of the present application.
  • FIG. 3A is a flowchart of a method for detecting face key points provided by an embodiment of the present application.
  • FIG. 3B is a schematic diagram of the UV coordinate map output by the face detection model in an embodiment of the present application.
  • FIG. 3C is a schematic diagram of a UV template map in an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a training device for a face detection model provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a face key point detection device provided by an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a device provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a method for training a face detection model provided by an embodiment of the present application.
  • The embodiment of the present application is applicable to training a face detection model to generate UV (u, v texture map) coordinate maps containing three-dimensional coordinates.
  • The method can be executed by a training apparatus of the face detection model, which can be implemented by at least one of software and hardware and integrated in the device that executes the method; as shown in FIG. 1, the method may include steps S101 to S104.
  • In step S101, a training face image is acquired.
  • The training face image may be a two-dimensional image containing a face, and the storage format of the two-dimensional image may be BMP, JPG, PNG, TIF or another format.
  • BMP is the Bitmap image format; the image depth of BMP files includes 1 bit, 4 bits, 8 bits, and 24 bits.
  • JPG is the Joint Photographic Experts Group (JPEG) format; JPEG pictures store a single bitmap in 24-bit color, and JPEG is a platform-independent format.
  • The Portable Network Graphic (PNG) format is a bitmap file storage format; when PNG is used to store grayscale images, the depth of the grayscale image can be up to 16 bits, when storing color images, the depth of the color image can be up to 48 bits, and alpha channel data of up to 16 bits can also be stored.
  • The Tag Image File Format (TIF) is a flexible bitmap format, mainly used to store images including photos and art images.
  • In practical applications, training face images can be selected from locally stored images, published face images can be obtained from the network as training face images, or face images can be captured from videos as training face images.
  • The embodiment of the present application does not limit the storage format of the training face image, nor does it limit the manner of acquiring the training face image.
  • In step S102, the training face image is three-dimensionally reconstructed based on a preset three-dimensional face model to obtain a training three-dimensional face model.
  • The preset three-dimensional face model can be obtained by three-dimensionally scanning a real face to obtain three-dimensional scan data of the real face; the three-dimensional scan data then constitutes a three-dimensional face model.
  • The three-dimensional face model may also be a published three-dimensional face model.
  • A large number of preset three-dimensional face models can be analyzed to obtain the principal components and eigenvalues of the three-dimensional face models, and different three-dimensional face models can be generated by fitting the principal components with different eigenvalues.
  • Therefore, for a training face image, different eigenvalues can be fitted during three-dimensional reconstruction to generate a training three-dimensional face model corresponding to the training face image.
  • In step S103, a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model is generated according to the training three-dimensional face model.
  • UV is the abbreviation of u, v texture map coordinates, where U is the horizontal coordinate and V is the vertical coordinate.
  • Each point on an image in the UV coordinate system can be mapped to the surface of a three-dimensional face model; that is, each point on the three-dimensional face model has a unique point on the UV coordinate map, and the three-dimensional face model can be restored through the UV coordinate map.
  • The UV coordinate map may be a two-dimensional plan view that stores the three-dimensional coordinates of the training three-dimensional face model.
  • The UV coordinate map may be composed of a three-dimensional array: the first two dimensions give the position of a pixel on the UV coordinate map, and the last dimension represents the three-dimensional coordinates associated with that pixel, namely the three-dimensional coordinates of the corresponding point on the training three-dimensional face model.
  • The training three-dimensional face model may be a collection of multiple vertices, each vertex having determined three-dimensional coordinates, and a corresponding training UV coordinate map may be generated from the vertices of the three-dimensional face model: the pixels correspond one-to-one with the vertices of the training three-dimensional face model, and each pixel is associated with the three-dimensional coordinates of its corresponding vertex (see the sketch below).
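  • For illustration only (this sketch is not part of the patent; the array sizes and variable names are assumptions), the structure described above can be pictured as an H × W × 3 array in which the first two dimensions index the pixel position in UV space and the last dimension stores the three-dimensional coordinates of the associated vertex:

```python
import numpy as np

H, W = 256, 256                      # illustrative UV-map resolution
uv_coord_map = np.zeros((H, W, 3))   # first two dims: pixel position, last dim: (x, y, z)

# Suppose vertex_xyz holds the 3D coordinates of the training model's vertices and
# vertex_uv holds, for each vertex, its (integer) pixel position on the UV coordinate map.
vertex_xyz = np.array([[0.1, -0.2, 0.5],
                       [0.0,  0.3, 0.4]])   # two example vertices
vertex_uv = np.array([[120, 64],
                      [130, 70]])           # their UV-map pixel positions

for (u, v), xyz in zip(vertex_uv, vertex_xyz):
    uv_coord_map[v, u] = xyz          # each UV pixel stores the 3D coordinates of one vertex

# The model can later be restored by reading the stored coordinates back out:
restored_vertices = uv_coord_map[vertex_uv[:, 1], vertex_uv[:, 0]]
```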
  • In step S104, the semantic segmentation network is trained using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • the face detection model can generate a UV coordinate map containing three-dimensional coordinates according to the input face image.
  • The training face image can be used as training data and the training UV coordinate map corresponding to that face image as the training label: the training face image is input into the semantic segmentation network, which estimates the three-dimensional coordinates of each pixel of the training face image and the position of each pixel in the UV coordinate map to generate a predicted UV coordinate map; the loss rate is then calculated from the predicted UV coordinate map and the training UV coordinate map corresponding to the input training face image, and the network parameters are adjusted accordingly to obtain the final trained face detection model.
  • After a face image is input into the trained model, a UV coordinate map corresponding to the face image can be obtained; each pixel on the UV coordinate map corresponds to a pixel on the face image, and each pixel contains three-dimensional coordinates.
  • In the embodiment of the present application, after the training face image is obtained, three-dimensional reconstruction is performed on the training face image based on the preset three-dimensional face model to obtain the training three-dimensional face model, and a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model is generated according to the training three-dimensional face model; the training face image and the training UV coordinate map are then used to train the semantic segmentation network to obtain a face detection model.
  • In other words, the training face image is three-dimensionally reconstructed, and the training UV coordinate map is generated from the training three-dimensional face model produced by the reconstruction, so as to obtain the training data with which the semantic segmentation network is trained into a face detection model.
  • There is no need to manually annotate training face images and training UV coordinate maps, which avoids the problem in related technologies that manual annotation of training data leads to inaccurate training data and thus inaccurate coordinates of the face key points output by the CNN; this improves the performance of the face detection model and also improves the accuracy of face key point detection.
  • In addition, the face detection model can generate a UV coordinate map containing three-dimensional coordinates and can detect the three-dimensional coordinates of key points, so that the detected face key points have depth information, which enriches the application scenarios of the key points.
  • FIG. 2A is a flowchart of yet another method for training a face detection model provided by an embodiment of the present application.
  • This embodiment refines the three-dimensional reconstruction and the generation of the training UV coordinate map on the basis of the foregoing embodiment; as shown in FIG. 2A, the method may include steps S201 to S208.
  • In step S201, a training face image is acquired.
  • In step S202, M three-dimensional face models are selected.
  • M three-dimensional face models can be selected from a preset three-dimensional face model library; the selected three-dimensional face models are preprocessed, and the preprocessed three-dimensional face models are aligned using the optical flow method to obtain aligned three-dimensional face models.
  • A three-dimensional face model is generated by three-dimensional scanning, and different scanners have different imaging principles, so data may be missing in some areas of the model; hole-filling can be used to complete the missing data. To eliminate or reduce the effect of subtle changes in lighting conditions during scanning that leave the surface of the three-dimensional face model unsmooth, the model can be smoothed; coordinate correction can also be applied to the three-dimensional coordinates of vertices in local areas.
  • The embodiment of the present application does not limit the manner of preprocessing.
  • The optical flow method can be used to align the three-dimensional face models, so that all three-dimensional face models have vectors of the same dimension and vertices with the same semantic information correspond to the same vector position.
  • A vertex on the three-dimensional face model is expressed in cylindrical coordinates as I(h, φ), where h is the height of the vertex and φ is the rotation angle. According to the definition of the optical flow method, the brightness of a vertex remains unchanged when the vertex moves slightly, so that the correspondences in h and φ between models can be solved, and each aligned three-dimensional face model can be expressed as a vector S = (x1 y1 z1, x2 y2 z2, ..., xk yk zk).
  • In this way, all three-dimensional face models have k-dimensional vectors of the same form, and vertices with the same semantic information correspond to the same vector position; for example, the points on the nose in S0 and S are stored in the same (for example, the k-th) vector position, so that the three-dimensional face models are aligned.
  • In step S203, principal component analysis is performed on the M three-dimensional face models to obtain a principal component matrix and an eigenvalue matrix.
  • Principal component analysis is a multivariate statistical analysis method that selects a small number of important variables through a linear transformation of multiple variables.
  • Principal component analysis can be performed on the M three-dimensional face models to obtain the principal component matrix and the eigenvalue matrix of the M three-dimensional face models.
  • Any three-dimensional face model S_i can then be linearly represented by the principal component matrix s and the eigenvalue matrix α (see the sketch below).
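  • As a rough illustration of this step (not the patent's exact procedure; the matrix shapes and names below are assumptions), principal component analysis over M aligned face vectors can be sketched as follows, yielding a mean shape, a principal component matrix s, and eigenvalues from which any face model S_i can be approximated as a linear combination:

```python
import numpy as np

# M aligned 3D face models, each flattened into a vector (x1, y1, z1, ..., xk, yk, zk);
# the values here are random placeholders standing in for real scan data.
M, k = 200, 5000
faces = np.random.rand(M, 3 * k)

mean_shape = faces.mean(axis=0)
centered = faces - mean_shape

# SVD of the centered data gives the principal components and the eigenvalues of the covariance matrix.
U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
s = Vt.T                                   # columns are principal component directions
eigenvalues = singular_values ** 2 / (M - 1)

# Any face model S_i can then be linearly represented with a coefficient vector alpha.
alpha = np.zeros(len(eigenvalues))
alpha[:10] = np.random.randn(10) * np.sqrt(eigenvalues[:10])   # fit a few leading components
S_i = mean_shape + s @ alpha
```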
  • In step S204, for each training face image, three-dimensional reconstruction is performed using the principal component matrix and the eigenvalue matrix corresponding to the training face image to obtain a training three-dimensional face model.
  • An initial eigenvalue matrix and an initial projection parameter matrix may be set, and an initial three-dimensional face model is constructed using the initial eigenvalue matrix and the principal component matrix.
  • That is, the initial three-dimensional face model is constructed from the initial eigenvalue matrix α′ and the principal component matrix s.
  • The projected face image of the initial three-dimensional face model in two-dimensional space can then be obtained based on the initial projection parameter matrix.
  • Projecting the initial three-dimensional face model to two-dimensional space needs to take issues such as face translation, face rotation, and lighting into account; these projection parameters are expressed by the matrix ρ.
  • The difference value between the projected face image and the training face image can be calculated, and according to the difference value, the initial eigenvalue matrix and the initial projection parameter matrix are iteratively optimized using the stochastic gradient descent method until the difference value converges.
  • The eigenvalue matrix and the projection parameter matrix obtained at convergence are used as the target eigenvalue matrix and the target projection parameter matrix, and the target eigenvalue matrix is used as the eigenvalue matrix corresponding to the training face image.
  • The difference between the projected face image I_model and the training face image I_input can be calculated by the following formula:
  • E_I = Σ_(x,y) ‖I_input(x, y) − I_model(x, y)‖₂²
  • where E_I is the difference value, I_input(x, y) is a pixel on the training face image, I_model(x, y) is the corresponding pixel on the projected face image, and ‖·‖₂ denotes the L2 norm.
  • When the difference value does not converge, the initial eigenvalue matrix α′ and the initial projection parameter matrix ρ need to be optimized; the three-dimensional face model is reconstructed from the optimized α′, projected again using the optimized ρ, and the difference value E_I is recalculated, until the difference value E_I converges.
  • A stochastic gradient descent (SGD) algorithm can be used for the optimization. For example, in the first N rounds of iterative optimization, K feature points are randomly selected in the training face image and the projected face image, the difference between the projected face image and the training face image is calculated from the K feature points, and the stochastic gradient descent method is used to iteratively optimize part of the eigenvalues in the initial eigenvalue matrix together with the initial projection parameter matrix according to the difference. After the N rounds of iteration, the stochastic gradient descent method is used to optimize all the eigenvalues in the eigenvalue matrix after N rounds of iteration and the projection parameter matrix after N rounds of iteration, to obtain the eigenvalue matrix and the projection parameter matrix when the difference value converges, which are used as the target eigenvalue matrix and the target projection parameter matrix.
  • For example, K = 40 feature points are randomly selected from the training face image I_input and the projected face image I_model for the optimization, where the feature points are the key pixels describing the contours of the facial features, such as pixels on the eyes, mouth or face contour.
  • In the first N rounds of iteration, the n eigenvalues with larger values in the initial eigenvalue matrix α′ can be optimized first, to avoid the overfitting caused by fitting smaller eigenvalues too early; in this way, eigenvalues of higher importance are fitted first. For example, in the first 100 iterations, the 10 largest eigenvalues and the projection parameter matrix are optimized; after 100 iterations, all the eigenvalues and the projection parameter matrix are optimized. When the difference E_I converges, the target eigenvalue matrix is obtained, and the target eigenvalue matrix and the principal component matrix can be used to generate the training three-dimensional face model.
  • That is, the training three-dimensional face model S is formed by fitting the principal components in the principal component matrix with the eigenvalues in the target eigenvalue matrix. Performing three-dimensional reconstruction of the training face image based on the preset three-dimensional face model in this way avoids the high cost of obtaining three-dimensional face models by three-dimensionally scanning faces, and avoids the problem that the sparse data of public three-dimensional face models cannot provide sufficient training data for training the face detection model; it therefore improves the efficiency of obtaining training data and reduces the cost of obtaining training data.
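  • The fitting loop can be pictured with the following simplified sketch. It is not the patent's implementation: the pixel-wise difference E_I is replaced by a toy difference over a handful of feature points, the projection is a bare orthographic mapping, and all tensor values and parameter names are placeholders. It only illustrates stochastic-gradient optimization of the coefficient matrix α (alpha) and the projection parameters ρ (rho):

```python
import torch

# Illustrative stand-ins: mean shape and principal components for K facial feature points only,
# plus the observed 2D positions of those points in the training image (all random placeholders).
K, n_components = 40, 10
mean_pts = torch.randn(K, 3)
components = torch.randn(n_components, K, 3)
target_2d = torch.randn(K, 2)                       # feature points in the training face image

alpha = torch.zeros(n_components, requires_grad=True)    # initial eigenvalue/coefficient matrix
rho = torch.tensor([1.0, 0.0, 0.0], requires_grad=True)  # toy projection params: scale, tx, ty

optimizer = torch.optim.SGD([alpha, rho], lr=1e-2)
for step in range(500):
    optimizer.zero_grad()
    shape_3d = mean_pts + torch.einsum("n,nkd->kd", alpha, components)  # fit the components
    # Very simplified orthographic "projection" standing in for the projected face image
    projected_2d = rho[0] * shape_3d[:, :2] + rho[1:3]
    E_I = ((projected_2d - target_2d) ** 2).sum()    # difference between projection and image
    E_I.backward()
    optimizer.step()
```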
  • In step S205, multiple vertices of the training three-dimensional face model are obtained, each of the vertices having three-dimensional coordinates.
  • the training three-dimensional face model may be a set of multiple vertices, and each vertex has certain three-dimensional coordinates in the training three-dimensional face model.
  • In step S206, the multiple vertices are projected onto a preset spherical surface to obtain multiple corresponding projection points of the multiple vertices on the preset spherical surface.
  • The multiple vertices can be projected onto a spherical surface with a preset radius through spherical projection. For example, for each vertex of the three-dimensional face model, the vertex is connected to the sphere center, and the intersection point of this connecting line with the spherical surface is the projection of the vertex on the sphere. After the multiple vertices of the training three-dimensional face model are projected, the irregular training three-dimensional face model is projected onto the sphere.
  • In step S207, the spherical surface containing the multiple projection points is unfolded to generate a training UV coordinate map containing the multiple projection points, where each projection point in the training UV coordinate map is associated with the three-dimensional coordinates of the corresponding vertex.
  • The preset spherical surface can be unfolded into a two-dimensional plan to obtain a UV coordinate map.
  • The UV coordinate map stores the three-dimensional coordinates of the multiple vertices of the training three-dimensional face model; that is, each pixel in the UV coordinate map is associated with the three-dimensional coordinates of a vertex, and the training three-dimensional face model can be restored through the pixels in the UV coordinate map.
  • The pixels in the UV coordinate map are the projection points (see the sketch below).
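  • A minimal sketch of this projection and unfolding step (not the patent's exact mapping; the spherical parameterization, centering and resolution below are assumptions) could look as follows:

```python
import numpy as np

def vertices_to_uv_map(vertices, resolution=256):
    """Toy sketch: project vertices onto a sphere around their centroid and unroll it into a UV map."""
    uv_map = np.zeros((resolution, resolution, 3))
    center = vertices.mean(axis=0)
    for xyz in vertices:
        d = xyz - center
        d = d / (np.linalg.norm(d) + 1e-8)           # point on the unit sphere (the projection point)
        theta = np.arctan2(d[1], d[0])               # azimuth in [-pi, pi]
        phi = np.arccos(np.clip(d[2], -1.0, 1.0))    # inclination in [0, pi]
        u = int((theta + np.pi) / (2 * np.pi) * (resolution - 1))
        v = int(phi / np.pi * (resolution - 1))
        uv_map[v, u] = xyz                           # the projection point keeps the vertex's 3D coordinates
    return uv_map

uv_map = vertices_to_uv_map(np.random.rand(5000, 3))  # placeholder vertices
```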
  • In step S208, the semantic segmentation network is trained using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • The training face image may be used as the training data, and the training UV coordinate map corresponding to the training face image may be used as the training label to train the semantic segmentation network.
  • A training face image may be randomly extracted and input into the semantic segmentation network to extract a predicted UV coordinate map; the loss rate of the predicted UV coordinate map is calculated using a preset loss function and the training UV coordinate map; the gradient is calculated using the loss rate, and it is determined whether the gradient meets a preset iteration condition. If the gradient meets the preset iteration condition, the semantic segmentation network is determined to be the face detection model; if the gradient does not meet the preset iteration condition, gradient descent is performed on the network parameters of the semantic segmentation network using the gradient and a preset learning rate, and the step of extracting a training face image is repeated.
  • The semantic segmentation network may be a semantic segmentation network such as FCN, SegNet, or U-Net, and may be trained by means of stochastic gradient descent.
  • Inputting the training face image into the semantic segmentation network extracts a predicted UV coordinate map, which has the same resolution as the input training face image, for example 226 × 226, and each pixel on the predicted UV coordinate map is associated with three-dimensional coordinates. The training UV coordinate map corresponding to the training face image is then obtained, and the loss rate of the predicted UV coordinate map is calculated from the predicted UV coordinate map, the training UV coordinate map, the preset loss function and the preset loss weights.
  • The preset loss weight is the preset loss weight of the feature points in the training UV coordinate map, where the feature points are the key pixels describing the contours of the facial features, such as pixels on the eyes, mouth, nose or face contour.
  • The preset loss function is as follows:
  • Loss = Σ_(u,v) ‖P(u, v) − P̂(u, v)‖ · W(u, v)
  • where P(u, v) is the training UV coordinate map, P̂(u, v) is the predicted UV coordinate map, and W(u, v) is the preset loss weight.
  • For P(u, v) and the predicted UV coordinate map, the three-dimensional coordinates associated with the pixels at the same position can be extracted to calculate the per-pixel difference. If a pixel lies in a face key point area, such as the eyes, mouth or nose, the preset loss weight can be increased; for example, W(u, v) is set to 4 for pixels (u, v) in key areas such as the eyes, mouth and nose, and to 1 in other areas, so as to improve accuracy.
  • If the gradient does not meet the preset iteration condition, gradient descent is performed on the network parameters of the semantic segmentation network using the gradient and the preset learning rate according to the following formula:
  • w_(t+1) = w_t − lr × grad(w_t)
  • where w_(t+1) is the updated network parameter, w_t is the network parameter before the update, lr is the learning rate, and grad(w_t) is the gradient calculated from the loss rate.
  • When the gradient meets the preset iteration condition, the training is ended, the network parameter w_t is output, and the face detection model is obtained (see the sketch below).
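  • The weighted loss and the gradient-descent update described above can be sketched as follows. This is not the patent's network or training code: a single convolution stands in for the semantic segmentation network, and all tensors and hyper-parameters are placeholders.

```python
import torch

def weighted_uv_loss(pred_uv, train_uv, weight_map):
    """Per-pixel L2 difference between predicted and training UV coordinate maps, scaled by W(u, v)."""
    per_pixel = ((pred_uv - train_uv) ** 2).sum(dim=1)      # (N, H, W): squared distance of 3D coords
    return (per_pixel * weight_map).mean()

net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)        # stand-in for the segmentation network
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3)       # implements w_{t+1} = w_t - lr * gradient

face_batch = torch.rand(2, 3, 64, 64)     # training face images (placeholders)
train_uv = torch.rand(2, 3, 64, 64)       # training UV coordinate maps (placeholders)
weight_map = torch.ones(2, 64, 64)        # W(u, v): 4 in key-point areas, 1 elsewhere (all 1 here)

optimizer.zero_grad()
pred_uv = net(face_batch)
loss = weighted_uv_loss(pred_uv, train_uv, weight_map)
loss.backward()
optimizer.step()
```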
  • In this embodiment, the training face image is three-dimensionally reconstructed to generate a training three-dimensional face model, and a training UV coordinate map is generated from the training three-dimensional face model to obtain training data.
  • The semantic segmentation network is then trained to obtain a face detection model, without manually labeling the training face images and the training UV coordinate maps; this avoids the problem in related technologies that manual annotation of training data leads to inaccurate training data and thus inaccurate coordinates of the face key points output by the CNN, improves the performance of the face detection model, and also improves the accuracy of face key point detection.
  • In addition, the face detection model can generate a UV coordinate map containing three-dimensional coordinates and can detect the three-dimensional coordinates of key points, so that the detected face key points have depth information, which enriches the application scenarios of the key points.
  • FIG. 3A is a flowchart of a method for detecting face key points according to an embodiment of the present application.
  • The embodiment of the present application can be applied to the case of detecting face key points from a face image.
  • The method can be executed by a face key point detection apparatus, which can be implemented by at least one of software and hardware and integrated in the device that performs the method; as shown in FIG. 3A, the method may include steps S301 to S304.
  • In step S301, a target face image is acquired.
  • The target face image may be a face image to which video effects are to be added; for example, a live video application detects a user operation and captures, from the video frames collected by the camera, one frame containing a face as the target face image.
  • The target face image can also be a face image of the face to be authenticated collected by a face authentication device during face authentication, an image of a face in a user's locally stored images that is being processed, or any other image whose face key points are to be detected.
  • The embodiment of the present application does not limit the manner of acquiring the target face image.
  • In step S302, the target face image is input into a pre-trained face detection model to generate a UV coordinate map of the target face image, where the UV coordinate map includes a plurality of pixels and each pixel in the UV coordinate map contains three-dimensional coordinates.
  • The face detection model can be trained through steps S3021 to S3024.
  • In step S3021, a training face image is acquired.
  • In step S3022, the training face image is three-dimensionally reconstructed based on a preset three-dimensional face model to obtain a training three-dimensional face model.
  • In step S3023, a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model is generated according to the training three-dimensional face model.
  • In step S3024, the semantic segmentation network is trained using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • After the target face image is input into the face detection model, a UV coordinate map of the target face image as shown in FIG. 3B can be generated.
  • The UV coordinate map includes a plurality of pixels, and each pixel in the UV coordinate map contains three-dimensional coordinates.
  • The resolution of the output UV coordinate map is the same as the resolution of the target face image (see the sketch below).
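  • A minimal sketch of this inference step (the model handle below is a hypothetical placeholder rather than the trained network):

```python
import torch

# Placeholder standing in for the pre-trained face detection model (hypothetical handle).
face_detection_model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

target_face_image = torch.rand(1, 3, 226, 226)        # target face image as a tensor
with torch.no_grad():
    uv_coordinate_map = face_detection_model(target_face_image)

# Same resolution as the input; every pixel holds three-dimensional coordinates (x, y, z).
assert uv_coordinate_map.shape == target_face_image.shape
```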
  • In step S303, a UV template map is obtained, where the UV template map contains pre-marked face key points.
  • The UV template map may be a two-dimensional map summarized from a large number of UV coordinate maps, with face key points marked in it; the marked face key points are suitable for key point detection of most faces.
  • FIG. 3C is a schematic diagram of a UV template map according to an embodiment of the present application.
  • The UV template map is pre-marked with face key points; for example, standard key points of the eyes, nose, mouth and face contour are marked in the UV template map.
  • In step S304, the pixels corresponding to the face key points are determined in the UV coordinate map to detect the three-dimensional coordinates of the face key points.
  • The resolutions of the UV coordinate map and the UV template map are the same, and their pixels correspond to each other one-to-one.
  • Therefore, the pixels corresponding to the face key points pre-marked on the UV template map can be determined in the UV coordinate map to detect the three-dimensional coordinates of the face key points, which improves the efficiency of detecting face key points.
  • For example, the key points of the eyes are given the classification mark A, the key points of the nose are given the classification mark B, the key points of the mouth are given the classification mark C, and so on.
  • If the face key points with classification mark A on the UV template map are specified as valid, the pixels corresponding to the face key points with classification mark A can be determined in the UV coordinate map, and the three-dimensional coordinates associated with those pixels can be obtained, so as to realize the detection of the face key points and subsequently perform special effects processing according to the three-dimensional coordinates of the face key points (see the sketch below).
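  • A minimal sketch of this lookup (the template key point positions and classification marks below are invented for illustration and are not the patent's template):

```python
import numpy as np

# Hypothetical template: pre-marked key points stored as (u, v) pixel positions,
# grouped by classification mark, e.g. "A" for the eyes, "B" for the nose, "C" for the mouth.
template_keypoints = {
    "A": [(120, 64), (136, 64)],     # eye key points (illustrative positions)
    "B": [(128, 96)],                # nose key point
    "C": [(128, 128)],               # mouth key point
}

def detect_keypoints_3d(uv_coord_map, marks=("A", "B", "C")):
    """Read the 3D coordinates of the marked key points out of the UV coordinate map."""
    result = {}
    for mark in marks:
        result[mark] = [uv_coord_map[v, u] for (u, v) in template_keypoints[mark]]
    return result

uv_coord_map = np.random.rand(256, 256, 3)                        # placeholder model output
keypoints_3d = detect_keypoints_3d(uv_coord_map, marks=("A",))    # e.g. only the eye key points
```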
  • In this embodiment, the target face image is input into a pre-trained face detection model to generate a UV coordinate map of the target face image, a UV template map containing pre-marked face key points is obtained, and the pixels corresponding to the face key points are determined in the UV coordinate map to detect the three-dimensional coordinates of the face key points.
  • The face detection model of the embodiment of the present application does not require manual marking of training face images and training UV coordinate maps, which avoids the problem in related technologies that manual annotation of training data leads to inaccurate training data and thus inaccurate coordinates of the face key points output by the CNN; this improves the performance of the face detection model and allows accurate face key points to be obtained.
  • Each pixel in the UV coordinate map contains three-dimensional coordinates, so the three-dimensional coordinates of the key points can be detected and the detected face key points have depth information, which enriches the application scenarios of the key points.
  • The apparatus includes a training face image acquisition module 401, a three-dimensional reconstruction module 402, a training UV coordinate map generation module 403, and a training module 404.
  • The training face image acquisition module 401 is configured to acquire training face images.
  • The three-dimensional reconstruction module 402 is configured to perform three-dimensional reconstruction on the training face image based on a preset three-dimensional face model to obtain a training three-dimensional face model.
  • The training UV coordinate map generation module 403 is configured to generate a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model according to the training three-dimensional face model.
  • The training module 404 is configured to train the semantic segmentation network using the training face image and the training UV coordinate map to obtain a face detection model, where the face detection model is configured to generate a UV coordinate map containing three-dimensional coordinates.
  • the three-dimensional reconstruction module 402 includes a three-dimensional face model selection submodule, a principal component analysis submodule, and a three-dimensional reconstruction submodule.
  • the 3D face model selection sub-module is set to select M 3D face models.
  • The principal component analysis submodule is configured to perform principal component analysis on the M three-dimensional face models to obtain a principal component matrix and an eigenvalue matrix.
  • The three-dimensional reconstruction submodule is configured to perform, for each training face image, three-dimensional reconstruction using the principal component matrix and the eigenvalue matrix corresponding to the training face image to obtain a training three-dimensional face model.
  • the 3D reconstruction module 402 further includes a 3D face model preprocessing submodule and a 3D face model alignment submodule.
  • the 3D face model preprocessing submodule is set to preprocess the selected 3D face model.
  • the 3D face model alignment sub-module is set to align the pre-processed 3D face model using the optical flow method to obtain the aligned 3D face model;
  • the preprocessing includes at least one of the following: smoothing processing, hole filling processing and coordinate correction.
  • the 3D reconstruction submodule includes an initial parameter setting unit, an initial 3D face model construction unit, an initial 3D face model projection unit, a difference value calculation unit, an optimization unit, and a training 3D face model generation unit.
  • the initial parameter setting unit is set to set an initial eigenvalue matrix and an initial projection parameter matrix.
  • The initial three-dimensional face model construction unit is configured to construct the initial three-dimensional face model using the initial eigenvalue matrix and the principal component matrix.
  • the initial three-dimensional face model projection unit is set to obtain a projected face image of the initial three-dimensional face model in a two-dimensional space based on the initial projection parameter matrix.
  • the difference value calculation unit is configured to calculate the difference value between the projected face image and the training face image.
  • The optimization unit is configured to iteratively optimize the initial eigenvalue matrix and the initial projection parameter matrix according to the difference value using the stochastic gradient descent method, to obtain the eigenvalue matrix and the projection parameter matrix when the difference value converges as the target eigenvalue matrix and the target projection parameter matrix, and to use the target eigenvalue matrix as the eigenvalue matrix corresponding to the training face image.
  • The training three-dimensional face model generation unit is configured to generate the training three-dimensional face model using the target eigenvalue matrix and the principal component matrix.
  • the optimization unit includes a feature point selection subunit, a difference value calculation subunit, a first iterative optimization subunit, and a second iterative optimization subunit.
  • The feature point selection subunit is configured to, in the first N rounds of iterative optimization, randomly select K feature points from the training face image and the projected face image.
  • The feature points are the key pixels describing the contours of the facial features; for example, a feature point may be a pixel on the eyes, mouth or face contour of the face.
  • the difference value calculation subunit is configured to calculate the difference value between the projected face image and the training face image according to the K feature points.
  • the first iterative optimization subunit is configured to iteratively optimize a part of the initial eigenvalue matrix and the initial projection parameter matrix using a stochastic gradient descent method according to the difference value.
  • The second iterative optimization subunit is configured to, after the N rounds of iterative optimization, use the stochastic gradient descent method to optimize all eigenvalues in the eigenvalue matrix after N rounds of iteration and the projection parameter matrix after N rounds of iteration, to obtain the eigenvalue matrix and projection parameter matrix when the difference value converges as the target eigenvalue matrix and target projection parameter matrix.
  • the training UV coordinate map generation module 403 includes a vertex acquisition submodule, a projection submodule, and a training UV coordinate map generation submodule.
  • the vertex acquisition submodule is configured to acquire multiple vertices of the training three-dimensional face model, and each of the vertices has three-dimensional coordinates.
  • the projection submodule is configured to project the multiple vertices onto a preset spherical surface to obtain multiple corresponding projection points of the multiple vertices on the preset spherical surface.
  • the training UV coordinate map generation sub-module is set to expand the spherical surface containing the multiple projection points to generate a training UV coordinate map containing the multiple projection points; wherein, each of the training UV coordinate maps Each projection point is associated with the three-dimensional coordinates of the corresponding vertex.
  • The training module 404 includes a training face image extraction submodule, a predicted UV coordinate map extraction submodule, a loss rate calculation submodule, a gradient calculation submodule, an iteration condition judgment submodule, a face detection model determination submodule, and a network parameter adjustment submodule.
  • the training face image extraction submodule is set to extract the training face image.
  • the prediction UV coordinate map extraction sub-module is set to input the training face image into the semantic segmentation network to extract the prediction UV coordinate map.
  • the loss rate calculation submodule is configured to calculate the loss rate of the predicted UV coordinate map using a preset loss function and the training UV coordinate map.
  • the gradient calculation submodule is configured to calculate the gradient using the loss rate.
  • the iteration condition determination submodule is configured to determine whether the gradient satisfies the preset iteration condition.
  • The face detection model determination submodule is configured to, in response to the gradient meeting the preset iteration condition, determine the semantic segmentation network to be the face detection model.
  • The network parameter adjustment submodule is configured to, in response to the gradient not meeting the preset iteration condition, perform gradient descent on the network parameters of the semantic segmentation network using the gradient and the preset learning rate, and return to the step of extracting the training face image.
  • The loss rate calculation submodule includes a training UV coordinate map acquisition unit and a loss rate calculation unit.
  • The training UV coordinate map acquisition unit is configured to obtain the training UV coordinate map corresponding to the training face image.
  • The loss rate calculation unit is configured to calculate the loss rate of the predicted UV coordinate map using the training UV coordinate map, the predicted UV coordinate map, the preset loss function, and the preset loss weight, where the preset loss weight is the preset loss weight of the feature points in the training UV coordinate map, and the feature points are the key pixels describing the contours of the facial features, such as pixels on the eyes, mouth, nose or face contour.
  • The apparatus for training a face detection model provided in the embodiment of the present application can execute the method for training a face detection model provided in any embodiment of the present application, and has functional modules and beneficial effects corresponding to the executed method.
  • FIG. 5 is a schematic structural diagram of a face key point detection apparatus according to an embodiment of the present application. As shown in FIG. 5, the apparatus includes a target face image acquisition module 501, a UV coordinate map generation module 502, a template map acquisition module 503, and a face key point three-dimensional coordinate detection module 504.
  • the target face image acquisition module 501 is set to acquire the target face image.
  • the UV coordinate map generation module 502 is configured to input the target face image into a pre-trained face detection model to generate a UV coordinate map of the target face image, the UV coordinate map includes a plurality of pixels, Each pixel in the UV coordinate map contains three-dimensional coordinates.
  • the template image obtaining module 503 is configured to obtain a UV template image, the UV template image including pre-marked key points of the face.
  • the face key point three-dimensional coordinate detection module 504 is set to determine the pixel point corresponding to the face key point in the UV coordinate map to detect the three-dimensional coordinates of the face key point.
  • the face key point detection device provided by the embodiment of the present application can execute the face key point detection method provided by any embodiment of the present application, and has a function module and beneficial effects corresponding to the execution method.
  • FIG. 6 a schematic structural diagram of a device in an example of the present application is shown.
  • the device includes a processor 60, a memory 61, a display screen 62 with a touch function, an input device 63, an output device 64, and a communication device 65.
  • the number of processors 60 in the device may be at least one, and one processor 60 is taken as an example in FIG. 6.
  • the number of the memory 61 in the device may be at least one, and one memory 61 is taken as an example in FIG. 6.
  • the processor 60, the memory 61, the display screen 62, the input device 63, the output device 64, and the communication device 65 of the device may be connected through a bus or in other ways. In FIG. 6, a connection through a bus is used as an example.
  • The memory 61 is a computer-readable storage medium configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for training a face detection model described in any embodiment of the present application (for example, the modules in the above training apparatus of the face detection model), or the program instructions/modules corresponding to the method for detecting face key points (for example, the target face image acquisition module 501, the UV coordinate map generation module 502, the template map acquisition module 503, and the face key point three-dimensional coordinate detection module 504 in the above face key point detection apparatus).
  • the memory 61 may mainly include a storage program area and a storage data area, where the storage program area may store operation devices and application programs required for at least one function; the storage data area may store data created according to the use of the device and the like.
  • the memory 61 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 61 includes memories remotely provided with respect to the processor 60, and these remote memories may be connected to the device through a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.
  • the display screen 62 is a display screen 62 with a touch function, which may be a capacitive screen, an electromagnetic screen or an infrared screen.
  • the display screen 62 is configured to display data according to the instructions of the processor 60, and is also configured to receive a touch operation acting on the display screen 62 and send a corresponding signal to the processor 60 or other devices.
  • When the display screen 62 is an infrared screen, it further includes an infrared touch frame, which is disposed around the display screen 62 and is configured to receive infrared signals and send the infrared signals to the processor 60 or other devices.
  • the communication device 65 is configured to establish a communication connection with other devices, which may be at least one of a wired communication device and a wireless communication device.
  • the input device 63 is configured to receive input digital or character information, and generate key signal input related to user settings and function control of the device, and may also be a camera for acquiring images and a sound pickup device for acquiring audio data.
  • the output device 64 may include audio equipment such as a speaker. It should be noted that the composition of the input device 63 and the output device 64 can be set according to actual conditions.
  • the processor 60 executes various functional applications and data processing of the device by running software programs, instructions, and modules stored in the memory 61, that is, at least one of the above-mentioned face detection model training method and face key point detection method one.
  • the processor 60 executes at least one program stored in the memory 61, at least one of the training method of the face detection model and the detection method of the key points of the face provided in the embodiments of the present application is implemented.
  • Embodiments of the present application also provide a computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a device, the device can execute at least one of the method for training a face detection model and the method for detecting face key points described in the foregoing method embodiments.
  • The technical solution of the present application, or the part that contributes to the related technology, can essentially be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk or an optical disc, and includes several instructions to make a computer device (which can be a robot, a personal computer, a server, or a network device, etc.) execute at least one of the method for training a face detection model and the method for detecting face key points as described in any embodiment of the present application.
  • each part of the present application may be implemented by hardware, software, firmware, or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device.
  • the hardware implementation may use, for example, discrete logic circuits having logic gate circuits for implementing logic functions on data signals, dedicated integrated circuits having appropriate combinational logic gate circuits, programmable gate arrays (PGA), or field-programmable gate arrays (Field-Programmable Gate Array, FPGA).

Abstract

The present application discloses a face detection model training method and apparatus, a face key point detection method and apparatus, and a storage medium. The training method includes: obtaining a training face image; performing three-dimensional reconstruction on the training face image based on a preset three-dimensional face model to obtain a training three-dimensional face model; generating, according to the training three-dimensional face model, a training UV coordinate map containing the three-dimensional coordinates of the training three-dimensional face model; and training a semantic segmentation network using the training face image and the training UV coordinate map to obtain a face detection model.
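As a rough illustration of the training pipeline summarized above (face image → reconstructed 3D face model → training UV coordinate map → network training), the following sketch shows how one regression step of such a training might look. It is a minimal example only, assuming PyTorch; the toy encoder-decoder, the MSE loss, and the tensor shapes are illustrative placeholders and are not taken from the patent, which specifies a semantic segmentation network but not this particular architecture.

```python
# Minimal sketch (assumptions: PyTorch; the network, loss, and shapes are
# illustrative placeholders, not the architecture claimed in the patent).
import torch
import torch.nn as nn

class UVRegressionNet(nn.Module):
    """Toy encoder-decoder mapping a face image to a 3-channel UV coordinate map
    (one 3D coordinate per UV position)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, image):
        return self.decoder(self.encoder(image))

def train_step(model, optimizer, face_image, uv_coordinate_map):
    """One optimization step: regress the training UV coordinate map
    generated from the reconstructed 3D face model."""
    optimizer.zero_grad()
    predicted_uv = model(face_image)
    loss = nn.functional.mse_loss(predicted_uv, uv_coordinate_map)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = UVRegressionNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Random tensors stand in for a real (training face image, training UV
    # coordinate map) pair produced by the 3D reconstruction step.
    face_image = torch.rand(1, 3, 256, 256)
    uv_coordinate_map = torch.rand(1, 3, 256, 256)
    print(train_step(model, optimizer, face_image, uv_coordinate_map))
```

The sketch covers only the training objective described in the abstract; how face key points are subsequently read from the predicted UV coordinate map at inference time is described elsewhere in the patent and is not shown here.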
PCT/CN2019/117945 2018-11-16 2019-11-13 Procédé et appareil d'entraînement de modèle de détection de visage, et procédé et appareil de détection de point clé de visage WO2020098686A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
SG11202105115TA SG11202105115TA (en) 2018-11-16 2019-11-13 Face detection model training method and apparatus, and face key point detection method and apparatus
RU2021115692A RU2770752C1 (ru) 2018-11-16 2019-11-13 Способ и устройство для обучения модели распознавания лица и устройство для определения ключевой точки лица
EP19883820.3A EP3882808A4 (fr) 2018-11-16 2019-11-13 Procédé et appareil d'entraînement de modèle de détection de visage, et procédé et appareil de détection de point clé de visage
US17/294,664 US11922707B2 (en) 2018-11-16 2019-11-13 Method and apparatus for training face detection model, and apparatus for detecting face key point

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811367129.0 2018-11-16
CN201811367129.0A CN109508678B (zh) 2018-11-16 2018-11-16 人脸检测模型的训练方法、人脸关键点的检测方法和装置

Publications (1)

Publication Number Publication Date
WO2020098686A1 true WO2020098686A1 (fr) 2020-05-22

Family

ID=65748813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117945 WO2020098686A1 (fr) 2018-11-16 2019-11-13 Procédé et appareil d'entraînement de modèle de détection de visage, et procédé et appareil de détection de point clé de visage

Country Status (6)

Country Link
US (1) US11922707B2 (fr)
EP (1) EP3882808A4 (fr)
CN (1) CN109508678B (fr)
RU (1) RU2770752C1 (fr)
SG (1) SG11202105115TA (fr)
WO (1) WO2020098686A1 (fr)


Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508678B (zh) 2018-11-16 2021-03-30 广州市百果园信息技术有限公司 人脸检测模型的训练方法、人脸关键点的检测方法和装置
CN110083243A (zh) * 2019-04-29 2019-08-02 深圳前海微众银行股份有限公司 基于摄像头的交互方法、装置、机器人及可读存储介质
US11580673B1 (en) * 2019-06-04 2023-02-14 Duke University Methods, systems, and computer readable media for mask embedding for realistic high-resolution image synthesis
CN110705352A (zh) * 2019-08-29 2020-01-17 杭州晟元数据安全技术股份有限公司 基于深度学习的指纹图像检测方法
CN110826395B (zh) * 2019-09-18 2023-10-31 平安科技(深圳)有限公司 人脸旋转模型的生成方法、装置、计算机设备及存储介质
CN110827342B (zh) 2019-10-21 2023-06-02 中国科学院自动化研究所 三维人体模型重建方法及存储设备、控制设备
CN111062266B (zh) * 2019-11-28 2022-07-15 东华理工大学 基于圆柱坐标的人脸点云关键点定位方法
CN110942142B (zh) * 2019-11-29 2021-09-17 广州市百果园信息技术有限公司 神经网络的训练及人脸检测方法、装置、设备和存储介质
CN111027474B (zh) * 2019-12-09 2024-03-15 Oppo广东移动通信有限公司 人脸区域获取方法、装置、终端设备和存储介质
CN113129425A (zh) * 2019-12-31 2021-07-16 Tcl集团股份有限公司 一种人脸图像三维重建方法、存储介质及终端设备
CN111428579A (zh) * 2020-03-03 2020-07-17 平安科技(深圳)有限公司 人脸图像的获取方法与系统
CN111524226B (zh) * 2020-04-21 2023-04-18 中国科学技术大学 讽刺肖像画的关键点检测与三维重建方法
CN111507304B (zh) * 2020-04-29 2023-06-27 广州市百果园信息技术有限公司 自适应刚性先验模型训练方法、人脸跟踪方法及相关装置
CN111652082B (zh) * 2020-05-13 2021-12-28 北京的卢深视科技有限公司 人脸活体检测方法和装置
CN113743157A (zh) * 2020-05-28 2021-12-03 北京沃东天骏信息技术有限公司 关键点检测模型训练方法和装置、关键点检测方法和装置
CN111797745A (zh) * 2020-06-28 2020-10-20 北京百度网讯科技有限公司 一种物体检测模型的训练及预测方法、装置、设备及介质
CN111667403B (zh) * 2020-07-02 2023-04-18 北京爱笔科技有限公司 一种有遮挡的人脸图像的生成方法及装置
CN112001231B (zh) * 2020-07-09 2023-07-21 哈尔滨工业大学(深圳) 加权多任务稀疏表示的三维人脸识别方法、系统及介质
CN112101105B (zh) * 2020-08-07 2024-04-09 深圳数联天下智能科技有限公司 人脸关键点检测模型的训练方法、装置以及存储介质
CN112869746B (zh) * 2020-11-10 2022-09-20 南方医科大学南方医院 一种检测提上睑肌肌力的方法及装置
CN112613357B (zh) * 2020-12-08 2024-04-09 深圳数联天下智能科技有限公司 人脸测量方法、装置、电子设备和介质
CN112613374A (zh) * 2020-12-16 2021-04-06 厦门美图之家科技有限公司 人脸可见区域解析与分割方法、人脸上妆方法及移动终端
CN112733705A (zh) * 2021-01-07 2021-04-30 中科魔镜(深圳)科技发展有限公司 一种基于人体面部的3d智能分析系统
CN113112596B (zh) * 2021-05-12 2023-10-24 北京深尚科技有限公司 人脸几何模型提取、3d人脸重建方法、设备及存储介质
CN113628322B (zh) * 2021-07-26 2023-12-05 阿里巴巴(中国)有限公司 图像处理、ar显示与直播方法、设备及存储介质
CN113838134B (zh) * 2021-09-26 2024-03-12 广州博冠信息科技有限公司 图像关键点检测方法、装置、终端和存储介质
CN114612971A (zh) * 2022-03-04 2022-06-10 北京百度网讯科技有限公司 人脸检测方法、模型训练方法、电子设备及程序产品
CN115311407B (zh) * 2022-04-19 2023-09-12 北京和华瑞博医疗科技有限公司 一种特征点标记方法、装置、设备及存储介质
CN115209180A (zh) * 2022-06-02 2022-10-18 阿里巴巴(中国)有限公司 视频生成方法以及装置
CN117745990B (zh) * 2024-02-21 2024-05-07 虹软科技股份有限公司 一种虚拟试衣方法、装置和存储介质


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6975750B2 (en) * 2000-12-01 2005-12-13 Microsoft Corp. System and method for face recognition using synthesized training images
KR102477190B1 (ko) 2015-08-10 2022-12-13 삼성전자주식회사 얼굴 인식 방법 및 장치
CN105354531B (zh) * 2015-09-22 2019-05-21 成都通甲优博科技有限责任公司 一种面部关键点的标注方法
RU2610682C1 (ru) * 2016-01-27 2017-02-14 Общество с ограниченной ответственностью "СТИЛСОФТ" Способ распознавания лиц
CN106485230B (zh) 2016-10-18 2019-10-25 中国科学院重庆绿色智能技术研究院 基于神经网络的人脸检测模型的训练、人脸检测方法及系统
US10474881B2 (en) * 2017-03-15 2019-11-12 Nec Corporation Video retrieval system based on larger pose face frontalization
CN107680158A (zh) * 2017-11-01 2018-02-09 长沙学院 一种基于卷积神经网络模型的三维人脸重建方法
CN107766851A (zh) * 2017-12-06 2018-03-06 北京搜狐新媒体信息技术有限公司 一种人脸关键点定位方法及定位装置
CN108805977A (zh) * 2018-06-06 2018-11-13 浙江大学 一种基于端到端卷积神经网络的人脸三维重建方法
CN110866864A (zh) * 2018-08-27 2020-03-06 阿里巴巴集团控股有限公司 人脸姿态估计/三维人脸重构方法、装置及电子设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018085749A1 (fr) * 2016-11-07 2018-05-11 Nec Laboratories America, Inc Système et procédé d'apprentissage d'une propagation d'étiquettes à cheminement aléatoire permettant une segmentation sémantique faiblement supervisée
CN108304765A (zh) * 2017-12-11 2018-07-20 中国科学院自动化研究所 用于人脸关键点定位与语义分割的多任务检测装置
CN108921926A (zh) * 2018-07-02 2018-11-30 广州云从信息科技有限公司 一种基于单张图像的端到端三维人脸重建方法
CN109508678A (zh) * 2018-11-16 2019-03-22 广州市百果园信息技术有限公司 人脸检测模型的训练方法、人脸关键点的检测方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3882808A4

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832648A (zh) * 2020-07-10 2020-10-27 北京百度网讯科技有限公司 关键点标注方法、装置、电子设备及存储介质
CN111832648B (zh) * 2020-07-10 2024-02-09 北京百度网讯科技有限公司 关键点标注方法、装置、电子设备及存储介质
CN111918049A (zh) * 2020-08-14 2020-11-10 广东申义实业投资有限公司 三维成像的方法、装置、电子设备及存储介质
CN111918049B (zh) * 2020-08-14 2022-09-06 广东申义实业投资有限公司 三维成像的方法、装置、电子设备及存储介质
CN112464809A (zh) * 2020-11-26 2021-03-09 北京奇艺世纪科技有限公司 一种人脸关键点检测方法、装置、电子设备及存储介质
CN112464809B (zh) * 2020-11-26 2023-06-06 北京奇艺世纪科技有限公司 一种人脸关键点检测方法、装置、电子设备及存储介质
US20230042654A1 (en) * 2020-12-04 2023-02-09 Tencent Technology (Shenzhen) Company Limited Action synchronization for target object
CN112530003A (zh) * 2020-12-11 2021-03-19 北京奇艺世纪科技有限公司 一种三维人手重建方法、装置及电子设备
CN112530003B (zh) * 2020-12-11 2023-10-27 北京奇艺世纪科技有限公司 一种三维人手重建方法、装置及电子设备
CN113313828B (zh) * 2021-05-19 2022-06-14 华南理工大学 基于单图片本征图像分解的三维重建方法与系统
CN113313828A (zh) * 2021-05-19 2021-08-27 华南理工大学 基于单图片本征图像分解的三维重建方法与系统
CN113487575A (zh) * 2021-07-13 2021-10-08 中国信息通信研究院 用于训练医学影像检测模型的方法及装置、设备、可读存储介质
CN113487575B (zh) * 2021-07-13 2024-01-16 中国信息通信研究院 用于训练医学影像检测模型的方法及装置、设备、可读存储介质
CN113808277A (zh) * 2021-11-05 2021-12-17 腾讯科技(深圳)有限公司 一种图像处理方法及相关装置
CN113808277B (zh) * 2021-11-05 2023-07-18 腾讯科技(深圳)有限公司 一种图像处理方法及相关装置
CN113870267B (zh) * 2021-12-03 2022-03-22 深圳市奥盛通科技有限公司 缺陷检测方法、装置、计算机设备及可读存储介质
CN113870267A (zh) * 2021-12-03 2021-12-31 深圳市奥盛通科技有限公司 缺陷检测方法、装置、计算机设备及可读存储介质
CN114648611A (zh) * 2022-04-12 2022-06-21 清华大学 局域轨道函数的三维重构方法及装置
CN114648611B (zh) * 2022-04-12 2023-07-18 清华大学 局域轨道函数的三维重构方法及装置
CN114565967A (zh) * 2022-04-28 2022-05-31 广州丰石科技有限公司 工牌人脸检测方法、终端以及存储介质
CN114565967B (zh) * 2022-04-28 2022-08-30 广州丰石科技有限公司 工牌人脸检测方法、终端以及存储介质

Also Published As

Publication number Publication date
RU2770752C1 (ru) 2022-04-21
CN109508678A (zh) 2019-03-22
US11922707B2 (en) 2024-03-05
CN109508678B (zh) 2021-03-30
US20210406516A1 (en) 2021-12-30
EP3882808A1 (fr) 2021-09-22
EP3882808A4 (fr) 2022-01-19
SG11202105115TA (en) 2021-06-29

Similar Documents

Publication Publication Date Title
WO2020098686A1 (fr) Procédé et appareil d'entraînement de modèle de détection de visage, et procédé et appareil de détection de point clé de visage
US10198823B1 (en) Segmentation of object image data from background image data
US9965865B1 (en) Image data segmentation using depth data
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
CN109325437B (zh) 图像处理方法、装置和系统
JP6424293B1 (ja) ボディの画像化
US10217195B1 (en) Generation of semantic depth of field effect
US11703949B2 (en) Directional assistance for centering a face in a camera field of view
CN110399825B (zh) 面部表情迁移方法、装置、存储介质及计算机设备
WO2020134528A1 (fr) Procédé de détection cible et produit associé
US20120183238A1 (en) Rapid 3D Face Reconstruction From a 2D Image and Methods Using Such Rapid 3D Face Reconstruction
CN111008935B (zh) 一种人脸图像增强方法、装置、系统及存储介质
US11182945B2 (en) Automatically generating an animatable object from various types of user input
KR20180097915A (ko) 개인 맞춤형 3차원 얼굴 모델 생성 방법 및 그 장치
CN111489396A (zh) 利用临界边缘检测神经网络和几何模型确定相机参数
CN112242002B (zh) 基于深度学习的物体识别和全景漫游方法
CN112598780A (zh) 实例对象模型构建方法及装置、可读介质和电子设备
US11645800B2 (en) Advanced systems and methods for automatically generating an animatable object from various types of user input
US11605220B2 (en) Systems and methods for video surveillance
JP6655513B2 (ja) 姿勢推定システム、姿勢推定装置、及び距離画像カメラ
US11562504B1 (en) System, apparatus and method for predicting lens attribute
WO2023053317A1 (fr) Appareil de mise en correspondance d'images, procédé de commande et support d'enregistrement non transitoire lisible par ordinateur
KR102593135B1 (ko) 딥러닝 기술 기반 3차원 공간 모델링 및 시점 합성을 통해 전문 촬영 기법이 적용된 고품질 동영상을 생성하는 방법과 이를 위한 장치
TWI826201B (zh) 物件檢測方法、物件檢測設備以及其非暫時性儲存媒體
CN117593702B (zh) 远程监控方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19883820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019883820

Country of ref document: EP

Effective date: 20210616