WO2021129107A1 - Method, apparatus, electronic device, and medium for generating a depth face image - Google Patents

Method, apparatus, electronic device, and medium for generating a depth face image

Info

Publication number
WO2021129107A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
feature points
face image
dimensional feature
depth map
Prior art date
Application number
PCT/CN2020/123544
Other languages
English (en)
French (fr)
Inventor
陈锦伟
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 filed Critical 支付宝(杭州)信息技术有限公司
Publication of WO2021129107A1 publication Critical patent/WO2021129107A1/zh



Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • The embodiments of this specification relate to the field of image processing technology, and in particular to a method, apparatus, electronic device, and medium for generating a depth face image.
  • With the rapid development of face recognition technology, it is increasingly applied in people's daily lives, for example in scenarios such as face-scan entry at stations, face-scan payment in supermarkets, and face-scan login in mobile apps. Face recognition usually requires converting a two-dimensional face image into a three-dimensional one; the common technique is to generate a depth face image directly from the two-dimensional face image with a deep neural network, but the generated depth face image is usually not unique and is closer to an average depth face.
  • The embodiments of this specification provide a method, apparatus, electronic device, and medium for generating a depth face image, which can effectively improve the accuracy of the depth face image generated from a two-dimensional face image.
  • The first aspect of the embodiments of this specification provides a method for generating a depth face image, including: after acquiring a two-dimensional face image and a three-dimensional face image of a target user, using a neural network to perform feature extraction on the two-dimensional face image and classifying the extracted two-dimensional feature points to obtain classified two-dimensional feature points; using a neural network to perform feature extraction on the three-dimensional face image and classifying the extracted three-dimensional feature points to obtain classified three-dimensional feature points; mapping the classified two-dimensional feature points into a three-dimensional feature space to obtain three-dimensional mapped feature points, where the three-dimensional feature space is constructed from the classified three-dimensional feature points; generating a depth map according to the three-dimensional mapped feature points; and using the three-dimensional face image to enhance the depth map to obtain the enhanced depth map, which is taken as the depth face image of the target user.
  • The second aspect of the embodiments of this specification provides an apparatus for generating a depth face image, including: a two-dimensional feature point acquiring unit, configured to, after a two-dimensional face image and a three-dimensional face image of a target user are acquired, perform feature extraction on the two-dimensional face image using a neural network and classify the extracted two-dimensional feature points to obtain classified two-dimensional feature points; a three-dimensional feature point acquiring unit, configured to perform feature extraction on the three-dimensional face image using a neural network and classify the extracted three-dimensional feature points to obtain classified three-dimensional feature points; a three-dimensional mapping unit, configured to map the classified two-dimensional feature points into a three-dimensional feature space to obtain three-dimensional mapped feature points, where the three-dimensional feature space is constructed from the classified three-dimensional feature points; a depth map acquiring unit, configured to generate a depth map according to the three-dimensional mapped feature points; and a depth face image acquiring unit, configured to use the three-dimensional face image to enhance the depth map to obtain the enhanced depth map, and to use the enhanced depth map as the depth face image of the target user.
  • The third aspect of the embodiments of this specification further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the steps of the above method for generating a depth face image.
  • The fourth aspect of the embodiments of this specification further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the above method for generating a depth face image are implemented.
  • Based on the above technical solution, after the two-dimensional feature points and the three-dimensional feature points of the target user are obtained, the extracted two-dimensional and three-dimensional feature points are classified, so that the characterization ability of each classified feature point is enhanced. Performing feature mapping on the basis of this enhanced characterization ability can effectively make the convergence and consistency of the mapping from the two-dimensional space to the three-dimensional space more robust, so that the accuracy and quality of the three-dimensional mapped feature points obtained from the two-dimensional feature points are higher; on that basis, the accuracy and quality of the depth face image obtained from the three-dimensional mapped feature points are improved accordingly.
  • FIG. 1 is a flowchart of a method for generating a depth face image in an embodiment of this specification;
  • FIG. 2 is a flowchart of a method for obtaining classified two-dimensional feature points in an embodiment of this specification;
  • FIG. 3 is a flowchart of a method for obtaining classified three-dimensional feature points in an embodiment of this specification;
  • FIG. 4 is a schematic diagram of the process of generating a depth face image in an embodiment of this specification;
  • FIG. 5 is a schematic structural diagram of an apparatus for generating a depth face image in an embodiment of this specification;
  • FIG. 6 is a schematic structural diagram of an electronic device in an embodiment of this specification.
  • In a first aspect, as shown in FIG. 1, an embodiment of this specification provides a method for generating a depth face image, including steps S102 to S110.
  • Step S102: After acquiring a two-dimensional face image and a three-dimensional face image of a target user, use a neural network to perform feature extraction on the two-dimensional face image, and classify the extracted two-dimensional feature points to obtain classified two-dimensional feature points.
  • Step S104: Use a neural network to perform feature extraction on the three-dimensional face image, and classify the extracted three-dimensional feature points to obtain classified three-dimensional feature points.
  • Step S106: Map the classified two-dimensional feature points into a three-dimensional feature space to obtain three-dimensional mapped feature points, where the three-dimensional feature space is constructed from the classified three-dimensional feature points.
  • Step S108: Generate a depth map according to the three-dimensional mapped feature points.
  • Step S110: Use the three-dimensional face image to enhance the depth map to obtain the enhanced depth map, and use the enhanced depth map as the depth face image of the target user.
  • The method for generating a depth face image provided by the embodiments of this specification can be applied to a terminal device or a server. The terminal device includes devices such as smart watches, laptops, desktop computers, smartphones, and tablet computers; the server includes devices such as laptops, desktop computers, all-in-one computers, and tablet computers.
  • In step S102, the two-dimensional face image and the three-dimensional face image of the target user may be acquired first. When acquiring them, a three-dimensional camera device may be used to simultaneously collect the two-dimensional face image and the three-dimensional face image of the target user; the three-dimensional camera device may be, for example, a 3D camera, a 3D pan-tilt camera, or a 3D video camera. In this case, the two-dimensional face image and the three-dimensional face image of the target user can be acquired in real time; alternatively, after the two-dimensional face image and the three-dimensional face image of the target user collected by the three-dimensional camera device are stored, they may be read from the storage device.
  • For example, taking a 3D camera as an example, the two-dimensional face image of user a at the entrance of a store, denoted r1, and the corresponding three-dimensional face image, denoted d1, are simultaneously captured by the 3D camera; r1 and d1 collected by the 3D camera are then transmitted to the server through the network, so that the server can obtain r1 and d1.
  • The extracted two-dimensional feature points include multiple feature points, and each extracted two-dimensional feature point is represented by a multi-dimensional vector, for example a 32-dimensional, 64-dimensional, or 128-dimensional vector; preferably, each two-dimensional feature point is represented by a 128-dimensional vector.
  • In the embodiments of this specification, the neural network may be a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Feedforward Neural Network (FFNN), a Multi-task Cascaded Convolutional Network (MTCNN), a Deep Alignment Network (DAN), or the like; preferably, a DNN may be used to perform feature extraction on the two-dimensional face image.
  • Specifically, when performing feature extraction on the two-dimensional face image, a DNN may be used, which extracts two-dimensional feature points at salient parts of the face such as the eyes, nose, eyebrows, and mouth corners; each extracted two-dimensional feature point includes a multi-dimensional vector, for example, each extracted two-dimensional feature point is represented by a 128-dimensional vector.
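  • As an illustration of this step, the following is a minimal PyTorch sketch of a feature extractor that maps a face image to one 128-dimensional descriptor per salient landmark. It is only a sketch under assumptions: the two-convolution backbone, the 112x112 input size, and the choice of 68 landmarks are illustrative and are not specified by this patent.

```python
import torch
import torch.nn as nn

class LandmarkFeatureExtractor(nn.Module):
    """Toy DNN: maps a face image to a 128-d descriptor per landmark.

    The architecture (two conv blocks plus a linear head) and the choice
    of 68 landmarks are illustrative assumptions, not the patent's network.
    """

    def __init__(self, num_landmarks: int = 68, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # One 128-d descriptor per landmark (eyes, nose, eyebrows, mouth corners, ...)
        self.head = nn.Linear(64 * 4 * 4, num_landmarks * feat_dim)
        self.num_landmarks, self.feat_dim = num_landmarks, feat_dim

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        x = self.backbone(img).flatten(1)
        return self.head(x).view(-1, self.num_landmarks, self.feat_dim)

extractor = LandmarkFeatureExtractor()
face = torch.randn(1, 3, 112, 112)   # a 2D face image (batch of 1)
points_2d = extractor(face)          # shape: (1, 68, 128)
```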
  • In the embodiments of this specification, the classifier may be, for example, a convolutional neural network (CNN) classifier, a softmax classifier, or a decision tree classifier.
  • Specifically, after the two-dimensional feature points are extracted in step S202, the extracted two-dimensional feature points are input into the classifier for classification to obtain the classified two-dimensional feature points.
  • Specifically, the classifier has been trained in advance, and its training data are two-dimensional feature points collected from historical face images. Each two-dimensional feature point in the training data is also represented by a multi-dimensional vector, and its vector dimension is the same as that of each two-dimensional feature point extracted in step S202. In this way, the feature points input into the classifier during training and during actual classification have the same vector dimension, so the training data and the actual classification data are of the same type, which makes the classification of the classified two-dimensional feature points obtained by the classifier more accurate.
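  • Continuing the previous sketch, a minimal softmax classifier over the per-point 128-dimensional descriptors might look as follows. The four part classes (eye, nose, mouth, eyebrow) follow the example given later in the text; the single linear layer is an assumption, and training on historical feature points of the same vector dimension is only hinted at in the trailing comment.

```python
import torch
import torch.nn as nn

# Illustrative softmax classifier over per-point 128-d descriptors.
# The part classes follow the example in the text; the architecture is assumed.
PART_CLASSES = ["eye", "nose", "mouth", "eyebrow"]

classifier = nn.Sequential(
    nn.Linear(128, len(PART_CLASSES)),   # input dim must match the extractor's
    nn.Softmax(dim=-1),
)

points_2d = torch.randn(68, 128)         # extracted 2D feature points
probs = classifier(points_2d)            # (68, 4) class probabilities
labels = probs.argmax(dim=-1)            # class index per feature point

# Training would use historical feature points with the SAME vector dimension
# (here 128), e.g. with a cross-entropy loss on the linear layer's logits:
# loss = nn.CrossEntropyLoss()(classifier[0](historical_points), historical_labels)
```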
  • Next, step S104 is performed. As described above, a three-dimensional camera device can be used to simultaneously collect the two-dimensional face image and the three-dimensional face image of the target user, and the three-dimensional camera device may be, for example, a 3D camera, a 3D pan-tilt camera, or a 3D video camera.
  • The extracted three-dimensional feature points include multiple feature points, and each extracted three-dimensional feature point is represented by a multi-dimensional vector, for example a 32-dimensional, 64-dimensional, or 128-dimensional vector; preferably, each three-dimensional feature point is represented by a 128-dimensional vector.
  • In the embodiments of this specification, the neural network may be a DNN, CNN, RNN, FFNN, MTCNN, DAN, or the like; preferably, a DNN may be used to perform feature extraction on the three-dimensional face image.
  • Specifically, when performing feature extraction on the three-dimensional face image, a DNN may be used, which extracts three-dimensional feature points at salient parts of the face such as the eyes, nose tip, eyebrows, and mouth corners; each extracted three-dimensional feature point is represented by a 128-dimensional vector.
  • In the embodiments of this specification, the classifier may be, for example, a convolutional neural network (CNN) classifier, a softmax classifier, or a decision tree classifier. Further, the vector dimension contained in each extracted two-dimensional feature point is the same as the vector dimension contained in each extracted three-dimensional feature point; for example, if each extracted two-dimensional feature point contains a 64-dimensional vector, each extracted three-dimensional feature point also contains a 64-dimensional vector. Of course, the vector dimension contained in each extracted two-dimensional feature point and that contained in each extracted three-dimensional feature point may also be different.
  • Specifically, after the three-dimensional feature points are extracted in step S302, the extracted three-dimensional feature points are input into the classifier for classification to obtain the classified three-dimensional feature points.
  • Specifically, the classifier has been trained in advance, and its training data are three-dimensional feature points collected from historical face images. Each three-dimensional feature point in the training data is also represented by a multi-dimensional vector, and its vector dimension is the same as that of each three-dimensional feature point extracted in step S302. In this way, the feature points input into the classifier during training and during actual classification have the same vector dimension, so the training data and the actual classification data are of the same type, which makes the classification of the classified three-dimensional feature points obtained by the classifier more accurate.
  • For example, taking the 3D camera as an example, after the 3D camera collects r1 and d1 of user a, they are transmitted to the server, so that the server obtains r1 and d1. The server uses a DNN to perform feature extraction on r1, and the extracted two-dimensional feature points are denoted r1-1, where each feature point in r1-1 is represented by a 128-dimensional vector. After a softmax classifier is used to classify r1-1, the classified two-dimensional feature points are obtained, which include eye-class two-dimensional feature points corresponding to the feature points at the eye positions, nose-class two-dimensional feature points corresponding to the feature points at the nose position, mouth-class two-dimensional feature points corresponding to the feature points at the mouth position, and eyebrow-class two-dimensional feature points corresponding to the feature points at the eyebrow positions.
  • Correspondingly, a DNN is used to perform feature extraction on d1, and the extracted three-dimensional feature points are denoted d1-1, where each feature point in d1-1 is represented by a 128-dimensional vector. After the softmax classifier is used to classify d1-1, the classified three-dimensional feature points are obtained, which include eye-class three-dimensional feature points corresponding to the feature points at the eye positions, nose-class three-dimensional feature points corresponding to the feature points at the nose position, mouth-class three-dimensional feature points corresponding to the feature points at the mouth position, and eyebrow-class three-dimensional feature points corresponding to the feature points at the eyebrow positions.
  • In this way, after the two-dimensional feature points and the three-dimensional feature points of the target user are obtained, they are classified by the classifier, which enhances the characterization ability of each classified two-dimensional and three-dimensional feature point. Performing feature mapping on the basis of this enhanced characterization ability can effectively make the convergence and consistency of the mapping from the two-dimensional space to the three-dimensional space more robust, and completes the cross-modal comparison from the two-dimensional space to the three-dimensional space.
  • Next, step S106 is performed. In this step, the three-dimensional feature space can first be constructed from the classified three-dimensional feature points, and each classified two-dimensional feature point can then be mapped into the three-dimensional feature space through a residual neural network to obtain the three-dimensional mapped feature points.
  • Specifically, after the three-dimensional feature space is constructed, each classified two-dimensional feature point is mapped into the three-dimensional feature space through the identity mapping in the residual neural network to obtain the three-dimensional mapped feature points, where the residual neural network may be ResNet.
  • Specifically, each of the classified three-dimensional feature points is represented by a multi-dimensional vector, so that the coordinates of each classified three-dimensional feature point can be determined in a three-dimensional coordinate system according to its multi-dimensional vector; the classified three-dimensional feature points are then connected in the three-dimensional coordinate system to construct the three-dimensional feature space.
  • Specifically, the algorithm of the residual neural network is shown in Equation 1: F(x) = H(x) - x (Equation 1), where F(x) denotes the residual function, x denotes the input, and H(x) denotes the output. Further, if F(x) = 0, then H(x) = x, which is exactly the identity mapping.
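  • The following is a minimal PyTorch sketch of such a residual mapping block applied to the classified 128-dimensional two-dimensional feature points. The two-layer residual branch is an illustrative assumption; when the learned residual F(x) is zero, the block reduces to the identity mapping H(x) = x.

```python
import torch
import torch.nn as nn

class ResidualMappingBlock(nn.Module):
    """H(x) = x + F(x): maps a 128-d 2D feature point toward the 3D feature space.

    If the learned residual F(x) is zero, the block is the identity mapping
    H(x) = x. The two-layer residual branch is an illustrative assumption.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.residual(x)      # H(x) = x + F(x)

mapper = ResidualMappingBlock()
classified_2d = torch.randn(68, 128)     # classified 2D feature points
mapped_points = mapper(classified_2d)    # 3D mapped feature points, shape (68, 128)
```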
  • Specifically, vector similarity can be used to train the parameters in the residual function, and the vector similarity can be obtained by using a similarity algorithm.
  • In the embodiments of this specification, the similarity algorithm may include the Euclidean distance algorithm, the cosine similarity algorithm, the Manhattan distance algorithm, the Pearson correlation coefficient, and the like.
  • Specifically, the mapping relationship between historical two-dimensional feature points and historical three-dimensional feature points can be obtained; according to the mapping relationship, the similarity algorithm is used to obtain the vector similarity of each group of feature points having the mapping relationship, where each group of feature points includes one historical two-dimensional feature point and one historical three-dimensional feature point having the mapping relationship; the residual function is then trained using the similarity of each group of feature points to obtain the trained residual function.
  • For example, if the historical two-dimensional feature points r11, r12, r13, r14, and r15 correspond in turn to the historical three-dimensional feature points d11, d12, d13, d14, and d15, the cosine similarity algorithm can be used to obtain the vector similarity s1 between r11 and d11, the vector similarity s2 between r12 and d12, the vector similarity s3 between r13 and d13, the vector similarity s4 between r14 and d14, and the vector similarity s5 between r15 and d15. The parameters in the residual function are then trained according to s1, s2, s3, s4, and s5, making the parameters in the residual function more accurate; on this basis, the accuracy of the three-dimensional mapped feature points obtained by mapping with the trained residual function is also improved.
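  • Continuing the previous sketch (reusing its `mapper`), a hedged illustration of such similarity-based training is shown below. Using 1 minus the cosine similarity as the training loss, and the Adam optimizer, are assumptions; the text only states that the similarities s1 to s5 are used to train the parameters of the residual function.

```python
import torch
import torch.nn.functional as F

# Illustrative training step: the residual mapper is trained so that each mapped
# historical 2D feature point is similar (by cosine similarity) to its
# corresponding historical 3D feature point.
hist_2d = torch.randn(5, 128)            # r11 ... r15
hist_3d = torch.randn(5, 128)            # d11 ... d15 (their mapped counterparts)

optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-3)

for _ in range(100):
    optimizer.zero_grad()
    sims = F.cosine_similarity(mapper(hist_2d), hist_3d, dim=-1)   # s1 ... s5
    loss = (1.0 - sims).mean()           # assumed loss: maximize the similarities
    loss.backward()
    optimizer.step()
```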
  • Specifically, by classifying the extracted two-dimensional and three-dimensional feature points, the characterization ability of each classified feature point is enhanced; performing feature mapping on the basis of this enhanced characterization ability, and training the residual function with vector similarity so that its parameters become more accurate, can further improve the accuracy of the three-dimensional mapped feature points obtained by mapping with the trained residual function.
  • Next, step S108 is performed. In this step, a deep network structure can be used to process the three-dimensional mapped feature points to obtain the depth map. The deep network structure is a network structure used in deep learning, and may be a classic convolutional neural network structure (for example, LeNet-5, AlexNet, or VGG-16), a basic deep learning model (for example, PolyNet), a residual network structure (ResNet), or the like, which is not specifically limited in this specification.
  • Specifically, after the three-dimensional mapped feature points are obtained in step S106, since there are multiple three-dimensional mapped feature points, each of them is input into the deep network structure for computation to obtain the depth map. Because the three-dimensional mapped feature points obtained in step S106 have high accuracy, the accuracy of the obtained depth map is also improved accordingly.
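  • As a toy illustration, continuing the earlier sketches, the deep network structure could be any small decoder that turns the mapped feature points into a depth map; the flatten-then-linear decoder and the 56x56 output resolution below are assumptions, since the patent leaves the concrete network structure open (LeNet-5, AlexNet, VGG-16, PolyNet, ResNet, and so on).

```python
import torch
import torch.nn as nn

class DepthMapGenerator(nn.Module):
    """Toy deep network: 68 x 128 mapped feature points -> a 56 x 56 depth map.

    The decoder layout and the output resolution are illustrative assumptions.
    """

    def __init__(self, num_points: int = 68, feat_dim: int = 128, out_size: int = 56):
        super().__init__()
        self.out_size = out_size
        self.decoder = nn.Sequential(
            nn.Linear(num_points * feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, out_size * out_size),
        )

    def forward(self, mapped_points: torch.Tensor) -> torch.Tensor:
        depth = self.decoder(mapped_points.flatten(1))
        return depth.view(-1, 1, self.out_size, self.out_size)

depth_net = DepthMapGenerator()
mapped_3d = torch.randn(1, 68, 128)      # batched 3D mapped feature points
depth_map = depth_net(mapped_3d)         # (1, 1, 56, 56) depth map
```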
  • Next, step S110 is performed. In this step, the relative loss function of the depth map with respect to the three-dimensional face image can be obtained; the depth map and the three-dimensional face image can be processed by a generative adversarial network (GAN) to obtain the adversarial loss function of the depth map; the target loss function of the depth map is determined according to the relative loss function and the adversarial loss function; and the depth face image of the target user is generated according to the target loss function and the three-dimensional mapped feature points.
  • Specifically, the depth map and the three-dimensional face image can be classified or regressed to obtain the relative loss function. The three-dimensional face image can be used to train the discriminator in the generative adversarial network to obtain a trained discriminator; the data in the depth map are then input into the trained discriminator to obtain the probability data of the depth map; the adversarial loss function is obtained according to the probability data of the depth map; the target loss function is determined according to the relative loss function and the adversarial loss function; and the depth face image of the target user is then generated according to the target loss function and the three-dimensional mapped feature points.
  • That is, after the probability data of the depth map are obtained, a loss calculation is performed according to the probability data to obtain the adversarial loss function.
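  • A minimal sketch of this adversarial branch, continuing the earlier sketches (reusing `depth_map`), is shown below. The tiny convolutional discriminator and the binary cross-entropy formulation of the adversarial loss are assumptions for illustration; the patent only requires that a discriminator trained with the three-dimensional face image data produce probability data for the depth map, from which a loss is computed.

```python
import torch
import torch.nn as nn

discriminator = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),      # outputs "probability data" in [0, 1]
)
bce = nn.BCELoss()
d_optim = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

real_3d = torch.rand(1, 1, 56, 56)       # data from the 3D face image (e.g. d1)
fake_depth = depth_map.detach()          # generated depth map (e.g. d2), from the previous sketch

# Train the discriminator: real 3D data -> label 1, generated depth map -> label 0
d_train_loss = bce(discriminator(real_3d), torch.ones(1, 1)) + \
               bce(discriminator(fake_depth), torch.zeros(1, 1))
d_optim.zero_grad(); d_train_loss.backward(); d_optim.step()

# Probability data of the depth map from the trained discriminator,
# turned into the adversarial loss (D-loss) for the depth-map generator
prob = discriminator(depth_map)
adversarial_loss = bce(prob, torch.ones_like(prob))
```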
  • Specifically, when determining the target loss function according to the relative loss function and the adversarial loss function, the relative weight of the relative loss function and the adversarial weight of the adversarial loss function can be obtained; the target loss function is then determined according to the relative loss function and the relative weight, as well as the adversarial loss function and the adversarial weight.
  • The relative weight and the adversarial weight can be obtained through training on historical data, or can be set manually or by the device. The relative weight is usually smaller than the adversarial weight; of course, the relative weight may also be not smaller than the adversarial weight. For example, the relative weight may be 0.1, 0.2, or 0.3, and the adversarial weight may be 0.9, 0.8, or 0.7, which is not specifically limited in this specification.
  • Specifically, the product of the relative loss function and the relative weight may be taken as the relative product, the product of the adversarial loss function and the adversarial weight may be taken as the adversarial product, and the sum of the relative product and the adversarial product is then taken as the target loss function.
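  • Continuing the earlier sketches, the weighted combination can be written in one line. The 0.8/0.2 split mirrors the worked example given later in the text (final-loss = 0.8 × D-loss + 0.2 × mse-loss), and using mean squared error as the relative loss follows the mse-loss naming in that example; both are assumptions about concrete values, not requirements of the method.

```python
import torch.nn as nn

# Relative loss of the depth map with respect to the 3D face image (mse-loss),
# combined with the adversarial loss (D-loss) from the previous sketch.
mse_loss = nn.MSELoss()(depth_map, real_3d)
target_loss = 0.8 * adversarial_loss + 0.2 * mse_loss   # final-loss
```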
  • Specifically, when generating the depth face image according to the target loss function and the three-dimensional mapped feature points, the deep network structure may be optimized with the target loss function, and the three-dimensional mapped feature points are then processed by the optimized deep network structure to obtain the depth face image.
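  • Continuing the previous sketch, one assumed way to perform this optimization is a standard gradient step on the depth-map generator, after which the mapped feature points are processed again by the optimized network:

```python
import torch

g_optim = torch.optim.Adam(depth_net.parameters(), lr=1e-4)

g_optim.zero_grad()
target_loss.backward()                   # optimize the deep network structure with the target loss
g_optim.step()

depth_face_image = depth_net(mapped_3d)  # reprocess the mapped feature points -> depth face image
```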
  • The embodiment of this specification provides a schematic diagram of the process of generating a depth face image, as shown in FIG. 4, which includes: a depth camera 40, which is used to collect a two-dimensional face image 41 and a three-dimensional face image 42 of user a; feature extraction is performed on the two-dimensional face image 41 and the three-dimensional face image 42 through neural networks to obtain two-dimensional feature points 43 and three-dimensional feature points 44; the softmax classifier is used to classify the two-dimensional feature points 43, and the two-dimensional feature points obtained after classification are the two-dimensional classified feature points 431; likewise, the softmax classifier is used to classify the three-dimensional feature points 44, and the three-dimensional feature points obtained after classification are the three-dimensional classified feature points 441.
  • The three-dimensional feature space is then constructed from the three-dimensional classified feature points 441, and the two-dimensional classified feature points 431 are mapped into the three-dimensional feature space through the residual neural network to obtain three-dimensional mapped feature points 432; the three-dimensional mapped feature points 432 are passed through the deep network structure to generate a depth map 45; a relative loss parameter mse-loss 46 is generated according to the depth map 45 and the three-dimensional face image 42, and an adversarial loss parameter d-loss 47 is obtained by training on the depth map 45 with the GAN network; if the relative weight is 0.9 and the adversarial weight is 0.1, the target loss function is determined as final-loss 48, where final-loss = 0.9 × D-loss + 0.1 × mse-loss; the deep network structure is then optimized according to final-loss 48, and the three-dimensional mapped feature points 432 are processed by the optimized deep network structure to obtain a depth face image 49. After the depth face image 49 is obtained, face recognition can be performed on it; since the accuracy and quality of the depth face image 49 are higher, the recognition accuracy of face recognition is improved accordingly.
  • For example, taking r1 and d1 of user a collected by the 3D camera as an example, feature extraction is performed on r1 and d1 respectively and the softmax classifier is used for classification; the classified two-dimensional feature points are denoted r1-2 and the classified three-dimensional feature points are denoted d1-2, where each of the above feature points is represented by a 128-dimensional vector.
  • The three-dimensional feature space is constructed from d1-2, and each feature point in r1-2 is mapped into the three-dimensional feature space through the identity mapping to obtain the three-dimensional mapped feature points, denoted r1-3; r1-3 is then processed by the deep network structure to generate a depth map, denoted d2.
  • Further, the data in d1 and d2 are classified by a classifier to obtain the relative loss function, denoted mse-loss; the discriminator in the GAN network is trained with d1 to obtain the trained discriminator, so as to enhance its discrimination accuracy; the trained discriminator is then used to discriminate the data in d2 to obtain the probability data of d2; and the adversarial loss function, denoted D-loss, is generated according to the probability data of d2.
  • Further, if the relative weight is 0.8 and the adversarial weight is 0.2, the target loss function is final-loss = 0.8 × D-loss + 0.2 × mse-loss; the deep network structure is then optimized according to final-loss, and r1-3 is processed by the optimized deep network structure to obtain the depth face image.
  • In this way, by generating two-dimensional and three-dimensional feature points and completing cross-modal deep learning from two dimensions to three dimensions in the three-dimensional feature space, the learning of the mapping capability between two dimensions and three dimensions is efficient and robust, so that the accuracy and quality of the three-dimensional mapped feature points obtained from the generated two-dimensional feature points are also higher; feature points rich in three-dimensional information can thus be generated from the two-dimensional face image.
  • Moreover, since the GAN network uses the data of the three-dimensional face image to train its discriminator in the process of generating the adversarial loss function, the judgment accuracy of the trained discriminator is higher; on this basis, the accuracy of the probability data obtained by the trained discriminator judging the data in the depth map is also improved, and the adversarial loss function obtained according to the probability data is improved accordingly.
  • In addition, the target loss function is determined according to the adversarial loss function and the relative loss function, where the relative loss function is obtained according to the difference between the depth map and the three-dimensional face image and the adversarial loss function is obtained through the training of the GAN network; the target loss function is therefore obtained under the dual constraints of the original depth image (the three-dimensional face image) and the GAN network, so its accuracy is improved accordingly. With the improved accuracy of the target loss function and the improved accuracy and quality of the three-dimensional mapped feature points, the accuracy and quality of the depth face image obtained according to the target loss function and the three-dimensional mapped feature points are also improved.
  • In a second aspect, based on the same inventive concept as the first aspect, an embodiment of this specification provides an apparatus for generating a depth face image, as shown in FIG. 5, including a two-dimensional feature point acquiring unit 501, a three-dimensional feature point acquiring unit 502, a three-dimensional mapping unit 503, a depth map acquiring unit 504, and a depth face image acquiring unit 505.
  • The two-dimensional feature point acquiring unit 501 is configured to, after a two-dimensional face image and a three-dimensional face image of a target user are acquired, perform feature extraction on the two-dimensional face image using a neural network and classify the extracted two-dimensional feature points to obtain classified two-dimensional feature points.
  • The three-dimensional feature point acquiring unit 502 is configured to perform feature extraction on the three-dimensional face image using a neural network and classify the extracted three-dimensional feature points to obtain classified three-dimensional feature points.
  • The three-dimensional mapping unit 503 is configured to map the classified two-dimensional feature points into a three-dimensional feature space to obtain three-dimensional mapped feature points, where the three-dimensional feature space is constructed from the classified three-dimensional feature points.
  • The depth map acquiring unit 504 is configured to generate a depth map according to the three-dimensional mapped feature points.
  • The depth face image acquiring unit 505 is configured to use the three-dimensional face image to enhance the depth map to obtain the enhanced depth map, and to use the enhanced depth map as the depth face image of the target user.
  • In an optional implementation, the two-dimensional feature point acquiring unit 501 is configured to perform feature extraction on the two-dimensional face image using a neural network to obtain the extracted two-dimensional feature points, where each extracted two-dimensional feature point includes a multi-dimensional vector, and to classify the extracted two-dimensional feature points using a classifier to obtain the classified two-dimensional feature points.
  • In an optional implementation, the three-dimensional feature point acquiring unit 502 is configured to perform feature extraction on the three-dimensional face image using a neural network to obtain the extracted three-dimensional feature points, where each extracted three-dimensional feature point includes a multi-dimensional vector, and to classify the extracted three-dimensional feature points using a classifier to obtain the classified three-dimensional feature points.
  • In an optional implementation, the three-dimensional mapping unit 503 is configured to construct the three-dimensional feature space from the classified three-dimensional feature points, and to map each classified two-dimensional feature point into the three-dimensional feature space through a residual neural network to obtain the three-dimensional mapped feature points.
  • In an optional implementation, the depth map acquiring unit 504 is configured to acquire the relative loss function of the depth map with respect to the three-dimensional face image; process the depth map and the three-dimensional face image through a generative adversarial network to obtain the adversarial loss function of the depth map; determine the target loss function of the depth map according to the relative loss function and the adversarial loss function; and generate the depth face image of the target user according to the target loss function and the three-dimensional mapped feature points.
  • In an optional implementation, the depth face image acquiring unit 505 is configured to train the discriminator in the generative adversarial network with the three-dimensional face image to obtain a trained discriminator; input the data in the depth map into the trained discriminator to obtain the probability data of the depth map; and obtain the adversarial loss function according to the probability data of the depth map.
  • In a third aspect, based on the same inventive concept as the method for generating a depth face image in the foregoing embodiments, an embodiment of this specification further provides an electronic device, as shown in FIG. 6, including a memory 604, a processor 602, and a computer program stored in the memory 604 and runnable on the processor 602, where the processor 602, when executing the program, implements the steps of any one of the foregoing methods for generating a depth face image.
  • In FIG. 6, the bus architecture (represented by a bus 600) may include any number of interconnected buses and bridges; the bus 600 links together various circuits including one or more processors represented by the processor 602 and a memory represented by the memory 604.
  • The bus 600 may also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are all well known in the art and therefore are not further described herein.
  • The bus interface 605 provides an interface between the bus 600 and the receiver 601 and the transmitter 603.
  • The receiver 601 and the transmitter 603 may be the same element, namely a transceiver, which provides a unit for communicating with various other devices over a transmission medium.
  • The processor 602 is responsible for managing the bus 600 and general processing, and the memory 604 may be used to store data used by the processor 602 when performing operations.
  • In a fourth aspect, based on the inventive concept of the method for generating a depth face image in the foregoing embodiments, an embodiment of this specification further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any one of the foregoing methods for generating a depth face image are implemented.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A method for generating a depth face image: after a two-dimensional face image and a three-dimensional face image of a target user are acquired, a neural network is used to perform feature extraction on the two-dimensional face image, and the extracted two-dimensional feature points are classified to obtain classified two-dimensional feature points (S102); a neural network is used to perform feature extraction on the three-dimensional face image, and the extracted three-dimensional feature points are classified to obtain classified three-dimensional feature points (S104); the classified two-dimensional feature points are mapped into a three-dimensional feature space to obtain three-dimensional mapped feature points, where the three-dimensional feature space is constructed from the classified three-dimensional feature points (S106); a depth map is generated according to the three-dimensional mapped feature points (S108); and the depth map is enhanced using the three-dimensional face image to obtain an enhanced depth map, which is taken as the depth face image of the target user (S110).

Description

深度人脸图像的生成方法、装置、电子设备及介质 技术领域
本说明书实施例涉及图像处理技术领域,尤其涉及一种深度人脸图像的生成方法、装置、电子设备及介质。
背景技术
随着人脸识别技术的飞速发展,人脸识别技术越来越多的应用在人们的日常生活中,人脸识别技术应用在例如车站的刷脸进站,超市的刷脸付钱和手机APP的刷脸登录等场景中。
在进行人脸识别时通常需要将二维人脸图像转换成三维人脸图像,在将二维人脸图像转换成三维人脸图像时,采用的技术通常是通过深度神经网络直接将二维人脸图像生成深度人脸图像,但是生成的深度人脸图像通常不具有唯一性,更近似于一种平均深度脸的类型。
发明内容
本说明书实施例提供了一种深度人脸图像的生成方法、装置、电子设备及介质,能够有效提高二维人脸图像生成的深度人脸图像的准确性。
本说明书实施例第一方面提供了一种深度人脸图像的生成方法,包括:在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点;以及使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点;将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成的;根据所述三维映射特征点,生成深度图;利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
本说明书实施例第二方面提供了一种深度人脸图像的生成装置,包括:二维特征点获取单元,用于在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的 二维特征点;以及三维特征点获取单元,用于使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点;三维映射单元,用于将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成的;深度图获取单元,用于根据所述三维映射特征点,生成深度图;深度人脸图像获取单元,用于利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
本说明书实施例第三方面还提供了一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上述深度人脸图像的生成方法的步骤。
本说明书实施例第四方面还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时上述深度人脸图像的生成方法的步骤。
本说明书实施例的有益效果如下。
基于上述技术方案,在获取所述目标用户的二维特征点和三维特征点之后,对提取的二维特点和三维特征点进行分类,以使得分类后的二维特征点和三维特征点中每个特征点的表征能力增强,在特征点的表征能力增强的基础上进行特征映射,能够有效促使二维空间到三维空间的映射收敛性和一致性更加鲁棒,进而使得二维特征点映射成的三维映射特征点的准确性和质量也更高,在三维映射特征点的准确性和质量更高的基础,促使根据三维映射特征点获取的深度人脸图像的准确性和质量也会随之提高。
附图说明
图1为本说明书实施例中深度人脸图像的生成方法的方法流程图;
图2为本说明书实施例中获取分类后的二维特征点的方法流程图;
图3为本说明书实施例中获取分类后的三维特征点的方法流程图;
图4为本说明书实施例中生成深度人脸图像的流程示意图;
图5为本说明书实施例中深度人脸图像的生成装置的结构示意图;
图6为本说明书实施例中电子设备的结构示意图。
具体实施方式
为了更好的理解上述技术方案,下面通过附图以及具体实施例对本说明书实施例的技术方案做详细的说明,应当理解本说明书实施例以及实施例中的具体特征是对本说明书实施例技术方案的详细的说明,而不是对本说明书技术方案的限定,在不冲突的情况下,本说明书实施例以及实施例中的技术特征可以相互组合。
第一方面,如图1所示,本说明书实施例提供一种深度人脸图像的生成方法,包括步骤S102~S110。
步骤S102、在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点。
步骤S104、使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点。
步骤S106、将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成的。
步骤S108、根据所述三维映射特征点,生成深度图。
步骤S110、利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
本说明书实施例提供的深度人脸图像的生成方法可以应用于终端设备或服务器中,所述终端设备包括智能手表、笔记本电脑、台式电脑、智能手机和平板电脑等设备;所述服务器包括笔记本电脑、台式电脑、一体机和平板电脑等设备。
其中,在步骤S102中,可以首先获取所述目标用户的二维人脸图像和三维人脸图像,在获取所述目标用户的二维人脸图像和三维人脸图像时,可以使用三维摄像设备同时采集所述目标用户的二维人脸图像和三维人脸图像,其中,所述三维摄像设备例如可以是3D摄像头、3D云台和3D摄像机等设备,此时,可以实时获取所述目标用户的二维人脸图像和三维人脸图像,也可以在所述三维摄像设备采集的所述目标用户的二维人脸图像和三维人脸图像存储之后,然后从存储设备中读取所述二维人脸图像和所述三维人脸图像。
例如,以3D摄像头为例,通过3D摄像头同时采集商店入口的用户a的二维人脸 图像用r1表示和三维人脸图像用d1表示,然后将3D摄像头采集的r1和d1通过网络传输至服务器,使得服务器能够获取到r1和d1。
以及,在获取所述目标用户的二维人脸图像之后,根据所述二维人脸图像,得到所述分类后的二维特征点,如图2所示,包括步骤S202~S204。
S202、使用神经网络对所述二维人脸图像进行特征提取,得到所述提取到的二维特征点,其中,提取出的每个二维特征点包括多维向量。
其中,所述提取到的二维特征点包含有多个特征点,提取到的每个二维特征点均用多维向量表示,例如每个二维特征点用32维、64维和128维向量等进行向量表示,优选的每个二维特征点均用128维向量表示。
本说明书实施例中,所述神经网络可以是深度神经网络(Deep Neural Networks,简称:DNN)、卷积神经网络(Convolutional Neural Networks,简称:CNN)、循环神经网络(Recurrent Neural Network,简称:RNN)、前馈神经网络(Feed forward neural networks,简称:FFNN)、多任务级联卷积神经网络(Multi-task Cascaded Convolutional Networks,简称MTCNN)和级联深度神经网络(Deep Alignment Network,简称DAN)等,优选地,可以使用DNN对所述二维人脸图像进行特征提取。
具体来讲,在使用所述二维人脸图像进行特征提取时,可以使用DNN对所述二维人脸图像进行特征提取,进而会提取出眼睛、鼻子、眉毛和嘴角等人脸显著部位的二维特征点,提取出的每个二维特征点包括多维向量,例如提取出的每个二维特征点均用128维向量表示。
S204、使用分类器对所述提取到的二维特征点进行分类,得到所述分类后的二维特征点。
本说明书实施例中,所述分类器例如可以是卷积神经网络(Convolutional Neural Networks,简称CNN)分类器、softmax分类器和决策树分类器等。
具体来讲,在通过步骤S202提取到的二维特征点之后,将所述提取到的二维特征点输入到所述分类器进行分类,得到所述分类后的二维特征点。
具体地,所述分类器是已训练的,其训练数据为采集历史人脸图像的二维特征点,训练数据中的每个二维特征点也用多维向量进行表示,且训练数据中每个二维特征点的向量维数与步骤S202中提取的每个二维特征点的向量维数相同,使得训练和实际分类输入到所述分类器中的每个二维特征点的向量维数相同,使得训练数据和实际分类数据 均为同一类型的数据,使得通过所述分类器得到的所述分类后的二维特征点的分类更精确。
接下来执行步骤S104,可以使用三维摄像设备同时采集所述目标用户的二维人脸图像和三维人脸图像,其中,所述三维摄像设备例如可以是3D摄像头、3D云台和3D摄像机等设备。
以及,在获取所述目标用户的三维人脸图像之后,根据所述三维人脸图像,得到所述分类后的三维特征点,如图3所示,包括步骤S302~S304。
S302、使用神经网络对所述三维人脸图像进行特征提取,得到所述提取到的三维特征点,其中,提取出的每个三维特征点包括多维向量。
其中,所述提取到的三维特征点包含有多个特征点,提取到的每个三维特征点均用多维向量表示,例如每个三维特征点用32维、64维和128维向量等进行向量表示,优选的每个三维特征点均用128维向量表示。
本说明书实施例中,所述神经网络可以是深度神经网络DNN、CNN、RNN、FFNN、MTCNN和DAN等,优选地,可以使用DNN对所述三维人脸图像进行特征提取。
具体来讲,在使用所述三维人脸图像进行特征提取时,可以使用DNN对所述三维人脸图像进行特征提取,进而会提取出眼睛、鼻尖、眉毛和嘴角等人脸显著部位的三维特征点,提取出的每个三维特征点均用128维向量表示。
S304、使用分类器对所述提取到的三维特征点进行分类,得到所述分类后的三维特征点。
本说明书实施例中,所述分类器例如可以是卷积神经网络(Convolutional Neural Networks,简称CNN)分类器、softmax分类器和决策树分类器等;进一步的,提取的每个二维特征点包含的向量维度和提取的每个三维特征点包含的向量维度相同,例如提取的每个特征点包含64维向量,则提取的每个三维特征点也包含64维向量;当然,提取的每个二维特征点包含的向量维度和提取的每个三维特征点包含的向量维度也可以不同。
具体来讲,在通过步骤S302提取到的三维特征点之后,将所述提取到的三维特征点输入到所述分类器进行分类,得到所述分类后的三维特征点。
具体地,所述分类器是已训练的,其训练数据为采集历史人脸图像的三维特征点, 训练数据中的每个三维特征点也用多维向量进行表示,且训练数据中每个三维特征点的向量维数与步骤S302中提取的每个三维特征点的向量维数相同,使得训练和实际分类输入到所述分类器中的每个三维特征点的向量维数相同,使得训练数据和实际分类数据均为同一类型的数据,使得通过所述分类器得到的所述分类后的三维特征点的分类更精确。
例如,以3D摄像头为例,在3D摄像头采集用户a的r1和d1之后,将其传输给服务器,使得服务器获取到r1和d1,使用DNN对r1进行特征提取,提取到二维特征点用r1-1表示,其中,r1-1中的每个特征点均使用128维向量表示;在使用softmax分类器对r1-1进行分类,得到分类后的二维特征点,所述分类后的二维特征点包括眼睛位置的特征点对应的眼睛类二维特征点,鼻子位置的特征点对应的鼻子类二维特征点,嘴巴位置的特征点对应的嘴巴类二维特征点,眉毛位置的特征点对应的眉毛类二维特征点。
相应地,使用DNN对d1进行特征提取,提取到三维特征点用d1-1表示,其中,d1-1中的每个特征点均使用128维向量表示;在使用softmax分类器对d1-1进行分类,得到分类后的三维特征点,所述分类后的三维特征点包括眼睛位置的特征点对应的眼睛类三维特征点,鼻子位置的特征点对应的鼻子类三维特征点,嘴巴位置的特征点对应的嘴巴类三维特征点,眉毛位置的特征点对应的眉毛类三维特征点。
如此,在获取所述目标用户的二维特征点和三维特征点之后,通过所述分类器对提取的二维特征点和三维特征点进行分类,通过对提取的二维特征点和三维特征点进行分类,以使得分类后的二维特征点和三维特征点中每个特征点的表征能力增强,在特征点的表征能力增强的基础上进行特征映射,能够有效促使二维空间到三维空间的映射收敛性和一致性更加鲁棒,完成二维空间到三维空间的跨模态比对。
接下来执行步骤S106,在该步骤中,可以首先根据所述分类后的三维特征点构建所述三维特征空间,然后通过残差神经网络将所述分类后的每个二维特征点映射到所述三维特征空间中,得到所述三维映射特征点。
具体来讲,在构建出所述三维特征空间之后,通过所述残差神经网络中的恒等映射,将所述分类后的每个二维特征点映射到所述三维特征空间中,得到所述三维映射特征点,其中,所述残差神经网络可以为ResNet。
具体地,所述分类后的三维特征点中每个特征点均用多维向量表示,从而可以根据所述分类后的三维特征点中每个特征点的多维向量,在三维坐标系中确定出分类后的 每个三维特征点的坐标,在三维坐标系中将分类后的每个三维特征点相连,构建出所述三维特征空间。
具体地,所述残差神经网络的算法具体如下述公式1所示:
F(x)=H(x)-x   公式1
其中,公式1中F(x)表示残差函数,x表示输入,H(x)表示输出;进一步地:若F(x)=0时,x=H(x),即为恒等映射。
具体地,可以使用向量相似度来训练所述残差函数中的参数,其中,在获取所述向量相似度时,可以使用相似度算法获取到所述向量相似度。
本说明书实施例中,所述相似度算法可以包括欧几里得距离算法、余弦相似度算法,曼哈顿距离算法和皮尔逊相关系数等。
具体来讲,可以获取历史二维特征点和历史三维特征点的映射关系;根据所述映射关系,使用相似度算法获取具有映射关系的每组特征点的向量相似度,其中,每组特征点包括具有映射关系的一个历史二维特征点和一个历史三维特征点;利用每组特征点的相似度对所述残差函数进行训练,得到训练的所述残差函数。
具体地,可以使用历史二维特征点和历史三维特征点的映射关系,对所述残差函数中的参数进行训练,例如,若历史二维特征特征点为r11、r12、r13、r14和r15,依次与历史三维特点中的d11、d12、d13、d14和d15对应,如此,可以通过余弦相似度算法依次获取r11与d11之间的向量相似度用s1表示,r12与d12之间的向量相似度s2,r13与d13之间的向量相似度s3,r14与d14之间的向量相似度s4,以及r15与d15之间的向量相似度s5,根据s1、s2、s3、s4和s5对所述残差函数中的参数进行训练,以使得所述残差函数中的参数更准确,在所述残差函数中的参数更准确的基础上,使用训练的所述残差函数进行映射得到的所述三维映射特征点的准确度也会随之提高。
具体地,通过对提取的二维特征点和三维特征点进行分类,以使得分类后的二维特征点和三维特征点中每个特征点的表征能力增强,在特征点的表征能力增强的基础上进行特征映射,以及通过向量相似度对所述残差函数进行训练,使得所述残差函数中的参数更准确,在所述残差函数中的参数更准确且特征点的表征能力增强的基础上,能够促使训练的所述残差函数进行映射得到的所述三维映射特征点的准确度进一步提高。
接下来执行步骤S108,在该步骤中,可以采用深度网络结构对所述三维映射特征点进行处理,得到深度图;其中,所述深度网络结构为深度学习中的网络结构,可以为 经典卷积神经网络结构(例如为LeNet-5、AlexNet和VGG-16),深度学习之基础模型(例如PolyNet),以及残差网络结构(ResNet)等,本说明书不作具体限制。
具体来讲,在通过步骤S106获取到所述三维映射特征点之后,由于所述三维映射特征点的数量为多个,从而将所述三维映射特征点中的每个特征点输入到所述深度网络结构中进行计算,进而得到所述深度图,由于通过步骤S106获取的所述三维映射特征点的准确度较高,进而使得获取的所述深度图的准确度也会随之提高。
接下来执行步骤S110,在该步骤中,可以获取所述深度图相对于所述三维人脸图像的相对损失函数;以及可以通过生成对抗网络(Generative Adversarial Network,简称:GAN)对所述深度图和所述三维人脸图像进行处理,得到所述深度图的对抗损失函数;根据所述相对损失函数和所述对抗损失函数,确定出所述深度图的目标损失函数;根据所述目标损失函数和所述三维映射特征点,生成所述目标用户的深度人脸图像。
具体来讲,可以对所述深度图和所述三维人脸图像进行分类处理或回归处理,得到所述相对损失函数;以及可以通过所述三维人脸图像对所述生成对抗网络中的判别器进行训练,得到已训练的判别器;将所述深度图中的数据输入到所述已训练的判别器中,得到所述深度图的概率数据;根据所述深度图的概率数据,得到所述对抗损失函数;根据所述相对损失函数和所述对抗损失函数,确定所述目标损失函数;再根据所述目标损失函数和所述三维映射特征点,生成所述目标用户的深度人脸图像。
其中,在获取所述深度图的概率数据之后,根据所述概率数据进行损失计算,得到所述对抗损失函数。
具体来讲,在根据所述相对损失函数和所述对抗损失函数,确定所述目标损失函数时,可以获取所述相对损失函数的相对权重,以及获取所述对抗损失函数的对抗权重;再根据所述相对损失函数和所述相对权重,以及所述对抗损失函数和所述对抗权重,确定所述目标损失函数。
其中,所述相对权重和所述对抗权重可以通过历史数据训练得到,也可以由人工或设备自行设定,所述相对权重通常小于所述对抗权重,当然,所述相对权重也可以不小于所述对抗权重,例如所述相对权重为0.1,0.2和0.3等,所述对抗权重为0.9,0.8和0.7等,本说明书不作具体限制。
具体地,可以将所述相对损失函数与所述相对权重的乘积作为相对乘积,以及将所述对抗损失函数和所述对抗权重的乘积作为对抗乘积,再获取所述相对乘积和所述对 抗乘积之和作为所述目标损失函数。
具体地,在根据所述目标损失函数和所述三维映射特征点,生成所述深度人脸图像时,可以通过所述目标损失函数对所述深度网络结构进行优化,根据优化后的所述深度网络结构对所述三维映射特征点进行处理,得到所述深度人脸图像。
本说明书实施例中提供了一种生成深度人脸图像的流程示意图,具体如图4所示,包括:深度相机40,深度相机40用于采集用户a的二维人脸图像41和三维人脸图像42,通过神经网络分别对二维人脸图像41和三维人脸图像42进行特征提取,得到二维特征点43和三维特征点44;使用softmax分类器对二维特征点43进行分类训练,得到分类后的二维特征点为二维分类特征点431;以及使用softmax分类器对三维特征点44进行分类训练,得到分类后的二维特征点为三维分类特征点441。
以及,通过三维分类特征点441构建三维特征空间,然后通过残差神经网络将二维分类特征点431映射到三维特征空间中得到三维映射特征点432;三维映射特征点432通过深度网络结构生成深度图45;根据深度图45和三维人脸图像42生成相对残差参数为mse-loss46;通过GAN网络对深度图45进行训练得到对抗残差参数为d-loss47;若所述相对权重为0.9且所述对抗权重为0.1,则确定目标损失函数为final-loss48,且final-loss=0.9×D-loss+0.1×mse-loss;再根据final-loss48对所述深度网络结构进行优化,根据优化后的所述深度网络结构对三维映射特征点432进行处理,得到深度人脸图像49;在获取到深度人脸图像49,可以对深度人脸图像49进行人脸识别,由于深度人脸图像49的准确性和质量更高,使得人脸识别的识别准确度也会随之提高。
例如,以3D摄像头采集用户a的r1和d1为例,分别对r1和d2进行特征提取并使用softmax分类器进行分类,得到分类后的二维特征点用r1-2表示和分类后的三维特征点用d1-2表示,其中,上述每个特征点均用128维向量表示。
以及,根据d1-2构建三维特征空间,通过恒等映射将r1-2中的每个特征点映射到所述三维特征空间中,得到三维映射特征点用r1-3表示;通过深度网络结构对r1-3进行处理,生成深度图用d2表示。
进一步地,通过分类器对d1和d2中的数据进行分类处理,得到所述相对损失函数用mse-los表示;以及利用d1对GAN网络中判别器进行训练,得到已训练的所述判别器,以增强所述判别器的判别准确性;然后使用已训练的所述判别器对d2中的数据进行判别,得到d2的概率数据;根据d2的概率数据,生成所述对抗损失函数用D-loss 表示。
进一步地,若所述相对权重为0.8且所述对抗权重为0.2,则所述目标损失函数final-loss=0.8×D-loss+0.2×mse-loss;再根据final-loss对所述深度网络结构进行优化,根据优化后的所述深度网络结构对r1-3进行处理,得到所述深度人脸图像。
如此,通过生成二维特征点和三维特征点,并在三维特征空间中完成二维到三维的跨模深度学习,使得二维到三维之间映射能力的学习根据高效和鲁棒,使得生成的二维特征点映射成的三维映射特征点的准确性和质量也更高,进而可以通过二维人脸图像生成含有丰富三维信息的特征点。
以及,由于GAN网络在生成所述对抗损失函数过程中,会使用三维人脸图像的数据对GAN网络中的判别器进行训练,使得已训练的所述判别器的判断准确性更强,在已训练的所述判别器的判断准确性更强的基础上,通过已训练的所述判别器对所述深度图中的数据进行判断而获得的所述概率数据的准确度也会随之提高;在所述概率数据提高的基础上,根据所述概率数据得到的所述对抗损失函数也会随之提高。
以及,由于所述目标损失函数是根据所述对抗损失函数和所述相对损失函数而确定的,而所述相对损失函数是根据所述深度图和所述三维人脸图像之间的差异而获取的,以及所述对抗损失函数是通过GAN网络的训练而得到的,如此,使得所述目标损失函数是在原始深度图像(所述三维人脸图像)和GAN网络的双重约束下而得到的,使得所述目标损失函数的准确度也会随之提高,在所述目标损失函数的准确度提高,且所述三维映射特征点的准确度和质量均提高的基础上,根据所述目标损失函数和所述三维映射特征点而得到的所述深度人脸图像的准确度和质量也会随之提高。
第二方面,基于与第一方面的同一发明构思,本说明书实施例提供了一种深度人脸图像的生成装置,如图5所示,包括二维特征点获取单元501、三维特征点获取单元502、三维映射单元503、深度图获取单元504、深度人脸图像获取单元505。
二维特征点获取单元501,用于在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点;以及
三维特征点获取单元502,用于使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点;
三维映射单元503,用于将所述分类后的二维特征点映射到三维特征空间中,得到 三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成的;
深度图获取单元504,用于根据所述三维映射特征点,生成深度图;
深度人脸图像获取单元505,用于利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
在一种可选的实施方式中,二维特征点获取单元501,用于使用神经网络对所述二维人脸图像进行特征提取,得到所述提取到的二维特征点,其中,提取出的每个二维特征点包括多维向量;使用分类器对所述提取到的二维特征点进行分类,得到所述分类后的二维特征点。
在一种可选的实施方式中,三维特征点获取单元502,用于使用神经网络对所述三维人脸图像进行特征提取,得到所述提取到的三维特征点,其中,提取出的每个三维特征点包括多维向量;使用分类器对所述提取到的三维特征点进行分类,得到所述分类后的三维特征点。
在一种可选的实施方式中,三维映射单元503,用于根据所述分类后的三维特征点构建所述三维特征空间;通过残差神经网络将所述分类后的每个二维特征点映射到所述三维特征空间中,得到所述三维映射特征点。
在一种可选的实施方式中,深度图获取单元504,用于获取所述深度图相对于所述三维人脸图像的相对损失函数;通过生成对抗网络对所述深度图和所述三维人脸图像进行处理,得到所述深度图的对抗损失函数;根据所述相对损失函数和所述对抗损失函数,确定出所述深度图的目标损失函数;根据所述目标损失函数和所述三维映射特征点,生成所述目标用户的深度人脸图像。
在一种可选的实施方式中,深度人脸图像获取单元505,用于通过所述三维人脸图像对所述生成对抗网络中的判别器进行训练,得到已训练的判别器;将所述深度图中的数据输入到所述已训练的判别器中,得到所述深度图的概率数据;根据所述深度图的概率数据,得到所述对抗损失函数。
第三方面,基于与前述实施例中深度人脸图像的生成方法同样的发明构思,本说明书实施例还提供一种电子设备,如图6所示,包括存储器604、处理器602及存储在存储器604上并可在处理器602上运行的计算机程序,所述处理器602执行所述程序时实现前文所述深度人脸图像的生成方法的任一方法的步骤。
其中,在图6中,总线架构(用总线600来代表),总线600可以包括任意数量的互联的总线和桥,总线600将包括由处理器602代表的一个或N个处理器和存储器604代表的存储器的各种电路链接在一起。总线600还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口605在总线600和接收器601和发送器603之间提供接口。接收器601和发送器603可以是同一个元件,即收发机,提供用于在传输介质上与各种其他装置通信的单元。处理器602负责管理总线600和通常的处理,而存储器604可以被用于存储处理器602在执行操作时所使用的数据。
第四方面,基于与前述实施例中深度人脸图像的生成方法的发明构思,本说明书实施例还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现前文所述深度人脸图像的生成方法的任一方法的步骤。
本说明书是参照根据本说明书实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的设备。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令设备的制造品,该指令设备实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本说明书的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本说明书范围的所有变更和修改。
显然,本领域的技术人员可以对本说明书进行各种改动和变型而不脱离本说明书的精神和范围。这样,倘若本说明书的这些修改和变型属于本说明书权利要求及其等同技术的范围之内,则本说明书也意图包含这些改动和变型在内。

Claims (14)

  1. 一种深度人脸图像的生成方法,包括:
    在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点;以及
    使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点;
    将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成;
    根据所述三维映射特征点,生成深度图;
    利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
  2. 如权利要求1所述的生成方法,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点,包括:
    使用神经网络对所述二维人脸图像进行特征提取,得到所述提取到的二维特征点,其中,提取出的每个二维特征点包括多维向量;
    使用分类器对所述提取到的二维特征点进行分类,得到所述分类后的二维特征点。
  3. 如权利要求2所述的生成方法,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点,包括:
    使用神经网络对所述三维人脸图像进行特征提取,得到所述提取到的三维特征点,其中,提取出的每个三维特征点包括多维向量;
    使用分类器对所述提取到的三维特征点进行分类,得到所述分类后的三维特征点。
  4. 如权利要求1所述的生成方法,将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,包括:
    根据所述分类后的三维特征点构建所述三维特征空间;
    通过残差神经网络将所述分类后的每个二维特征点映射到所述三维特征空间中,得到所述三维映射特征点。
  5. 如权利要1所述的生成方法,利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像,包括:
    获取所述深度图相对于所述三维人脸图像的相对损失函数;
    通过生成对抗网络对所述深度图和所述三维人脸图像进行处理,得到所述深度图的对抗损失函数;
    根据所述相对损失函数和所述对抗损失函数,确定出所述深度图的目标损失函数;
    根据所述目标损失函数和所述三维映射特征点,生成所述目标用户的深度人脸图像。
  6. 如权利要5所述的生成方法,通过生成对抗网络对所述深度图和所述三维人脸图像进行处理,得到所述深度图的对抗损失函数,包括:
    通过所述三维人脸图像对所述生成对抗网络中的判别器进行训练,得到已训练的判别器;
    将所述深度图中的数据输入到所述已训练的判别器中,得到所述深度图的概率数据;
    根据所述深度图的概率数据,得到所述对抗损失函数。
  7. 一种深度人脸图像的生成装置,包括:
    二维特征点获取单元,用于在获取目标用户的二维人脸图像和三维人脸图像之后,使用神经网络对所述二维人脸图像进行特征提取,并对提取到的二维特征点进行分类,得到分类后的二维特征点;以及
    三维特征点获取单元,用于使用神经网络对所述三维人脸图像进行特征提取,并对提取到的三维特征点进行分类,得到分类后的三维特征点;
    三维映射单元,用于将所述分类后的二维特征点映射到三维特征空间中,得到三维映射特征点,其中,所述三维特征空间由所述分类后的三维特征点构成的;
    深度图获取单元,用于根据所述三维映射特征点,生成深度图;
    深度人脸图像获取单元,用于利用所述三维人脸图像对所述深度图进行增强处理,得到增强后的所述深度图,并将增强后的所述深度图作为所述目标用户的深度人脸图像。
  8. 如权利要求7所述的生成装置,所述二维特征点获取单元,用于使用神经网络对所述二维人脸图像进行特征提取,得到所述提取到的二维特征点,其中,提取出的每个二维特征点包括多维向量;使用分类器对所述提取到的二维特征点进行分类,得到所述分类后的二维特征点。
  9. 如权利要求8所述的生成装置,所述三维特征点获取单元,用于使用神经网络对所述三维人脸图像进行特征提取,得到所述提取到的三维特征点,其中,提取出的每个三维特征点包括多维向量;使用分类器对所述提取到的三维特征点进行分类,得到所述分类后的三维特征点。
  10. 如权利要求7所述的生成装置,所述三维映射单元,用于根据所述分类后的三维特征点构建所述三维特征空间;通过残差神经网络将所述分类后的每个二维特征点映 射到所述三维特征空间中,得到所述三维映射特征点。
  11. 如权利要7所述的生成装置,所述深度人脸图像获取单元,用于获取所述深度图相对于所述三维人脸图像的相对损失函数;通过生成对抗网络对所述深度图和所述三维人脸图像进行处理,得到所述深度图的对抗损失函数;根据所述相对损失函数和所述对抗损失函数,确定出所述深度图的目标损失函数;根据所述目标损失函数和所述三维映射特征点,生成所述目标用户的深度人脸图像。
  12. 如权利要11所述的生成装置,所述深度人脸图像获取单元,用于通过所述三维人脸图像对所述生成对抗网络中的判别器进行训练,得到已训练的判别器;将所述深度图中的数据输入到所述已训练的判别器中,得到所述深度图的概率数据;根据所述深度图的概率数据,得到所述对抗损失函数。
  13. 一种电子设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现权利要求1-6任一项所述方法的步骤。
  14. 一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现权利要求1-6任一项所述方法的步骤。
PCT/CN2020/123544 2019-12-25 2020-10-26 深度人脸图像的生成方法、装置、电子设备及介质 WO2021129107A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911358809.0 2019-12-25
CN201911358809.0A CN111144284B (zh) 2019-12-25 2019-12-25 深度人脸图像的生成方法、装置、电子设备及介质

Publications (1)

Publication Number Publication Date
WO2021129107A1 true WO2021129107A1 (zh) 2021-07-01

Family

ID=70520088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123544 WO2021129107A1 (zh) 2019-12-25 2020-10-26 深度人脸图像的生成方法、装置、电子设备及介质

Country Status (2)

Country Link
CN (1) CN111144284B (zh)
WO (1) WO2021129107A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902768A (zh) * 2021-10-11 2022-01-07 浙江博采传媒有限公司 一种基于可微渲染的三维人脸模型边缘优化方法及系统
CN114998600A (zh) * 2022-06-17 2022-09-02 北京百度网讯科技有限公司 图像处理方法、模型的训练方法、装置、设备及介质
CN116645299A (zh) * 2023-07-26 2023-08-25 中国人民解放军国防科技大学 一种深度伪造视频数据增强方法、装置及计算机设备

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144284B (zh) * 2019-12-25 2021-03-30 支付宝(杭州)信息技术有限公司 深度人脸图像的生成方法、装置、电子设备及介质
CN113705390B (zh) * 2021-08-13 2022-09-27 北京百度网讯科技有限公司 定位方法、装置、电子设备和存储介质
CN115171196B (zh) * 2022-08-25 2023-03-28 北京瑞莱智慧科技有限公司 人脸图像处理方法、相关装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075315A1 (en) * 2016-09-12 2018-03-15 Sony Corporation Information processing apparatus and information processing method
CN108492248A (zh) * 2018-01-30 2018-09-04 天津大学 基于深度学习的深度图超分辨率方法
CN108492364A (zh) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 用于生成图像生成模型的方法和装置
CN110223230A (zh) * 2019-05-30 2019-09-10 华南理工大学 一种多前端深度图像超分辨率系统及其数据处理方法
CN111144284A (zh) * 2019-12-25 2020-05-12 支付宝(杭州)信息技术有限公司 深度人脸图像的生成方法、装置、电子设备及介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2869239A3 (en) * 2013-11-04 2015-08-19 Facebook, Inc. Systems and methods for facial representation
CN106981091B (zh) * 2017-04-27 2020-04-17 深圳奥比中光科技有限公司 人体三维建模数据处理方法及装置
CN107944422B (zh) * 2017-12-08 2020-05-12 业成科技(成都)有限公司 三维摄像装置、三维摄像方法及人脸识别方法
CN109979013B (zh) * 2017-12-27 2021-03-02 Tcl科技集团股份有限公司 三维人脸贴图方法及终端设备
WO2020037676A1 (zh) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 三维人脸图像生成方法、装置及电子设备
CN109325994B (zh) * 2018-09-11 2023-03-24 合肥工业大学 一种基于三维人脸数据增强的方法
CN109685915B (zh) * 2018-12-11 2023-08-15 维沃移动通信有限公司 一种图像处理方法、装置及移动终端
CN109670487A (zh) * 2019-01-30 2019-04-23 汉王科技股份有限公司 一种人脸识别方法、装置及电子设备
CN110147721B (zh) * 2019-04-11 2023-04-18 创新先进技术有限公司 一种三维人脸识别方法、模型训练方法和装置
CN110427799B (zh) * 2019-06-12 2022-05-06 中国地质大学(武汉) 基于生成对抗网络的人手深度图像数据增强方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075315A1 (en) * 2016-09-12 2018-03-15 Sony Corporation Information processing apparatus and information processing method
CN108492248A (zh) * 2018-01-30 2018-09-04 天津大学 基于深度学习的深度图超分辨率方法
CN108492364A (zh) * 2018-03-27 2018-09-04 百度在线网络技术(北京)有限公司 用于生成图像生成模型的方法和装置
CN110223230A (zh) * 2019-05-30 2019-09-10 华南理工大学 一种多前端深度图像超分辨率系统及其数据处理方法
CN111144284A (zh) * 2019-12-25 2020-05-12 支付宝(杭州)信息技术有限公司 深度人脸图像的生成方法、装置、电子设备及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DEEPAK PATHAK; PHILIPP KRAHENBUHL; JEFF DONAHUE; TREVOR DARRELL; ALEXEI A. EFROS: "Context Encoders: Feature Learning by Inpainting", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 April 2016 (2016-04-25), 201 Olin Library Cornell University Ithaca, NY 14853, XP080697790 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902768A (zh) * 2021-10-11 2022-01-07 浙江博采传媒有限公司 一种基于可微渲染的三维人脸模型边缘优化方法及系统
CN114998600A (zh) * 2022-06-17 2022-09-02 北京百度网讯科技有限公司 图像处理方法、模型的训练方法、装置、设备及介质
CN116645299A (zh) * 2023-07-26 2023-08-25 中国人民解放军国防科技大学 一种深度伪造视频数据增强方法、装置及计算机设备
CN116645299B (zh) * 2023-07-26 2023-10-10 中国人民解放军国防科技大学 一种深度伪造视频数据增强方法、装置及计算机设备

Also Published As

Publication number Publication date
CN111144284A (zh) 2020-05-12
CN111144284B (zh) 2021-03-30

Similar Documents

Publication Publication Date Title
WO2021129107A1 (zh) 深度人脸图像的生成方法、装置、电子设备及介质
Zhou et al. 3D face recognition: a survey
CN110135249B (zh) 基于时间注意力机制和lstm的人体行为识别方法
US20200387748A1 (en) Facial image data collection method, apparatus, terminal device and storage medium
WO2015149534A1 (zh) 基于Gabor二值模式的人脸识别方法及装置
WO2023098128A1 (zh) 活体检测方法及装置、活体检测系统的训练方法及装置
CN110457515B (zh) 基于全局特征捕捉聚合的多视角神经网络的三维模型检索方法
CN110390308B (zh) 一种基于时空对抗生成网络的视频行为识别方法
CN111814620A (zh) 人脸图像质量评价模型建立方法、优选方法、介质及装置
CN111723600B (zh) 一种基于多任务学习的行人重识别特征描述子
CN112562159B (zh) 一种门禁控制方法、装置、计算机设备和存储介质
CN113254491A (zh) 一种信息推荐的方法、装置、计算机设备及存储介质
CN113378770A (zh) 手势识别方法、装置、设备、存储介质以及程序产品
WO2023124278A1 (zh) 图像处理模型的训练方法、图像分类方法及装置
WO2023124869A1 (zh) 用于活体检测的方法、装置、设备及存储介质
Imani et al. Histogram of the node strength and histogram of the edge weight: two new features for RGB-D person re-identification
Li et al. Fitness action counting based on MediaPipe
CN111291611A (zh) 一种基于贝叶斯查询扩展的行人重识别方法及装置
CN110610131A (zh) 人脸运动单元的检测方法、装置、电子设备及存储介质
TW201828156A (zh) 圖像識別方法、度量學習方法、圖像來源識別方法及裝置
CN116994319A (zh) 训练模型的方法和人脸识别方法、设备、介质
Selvi et al. FPGA implementation of a face recognition system
CN112749711B (zh) 视频获取方法和装置及存储介质
Baig et al. Facial Paralysis Recognition Using Face Mesh-Based Learning.
CN114038045A (zh) 一种跨模态人脸识别模型构建方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20906747

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20906747

Country of ref document: EP

Kind code of ref document: A1