WO2021036059A1 - Image conversion model training method, heterogeneous face recognition method, device and apparatus - Google Patents

Image conversion model training method, heterogeneous face recognition method, device and apparatus

Info

Publication number
WO2021036059A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
sketch map
updated
feature vector
Prior art date
Application number
PCT/CN2019/121348
Other languages
English (en)
Chinese (zh)
Inventor
王孝宇
柳军领
王楠楠
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司 filed Critical 深圳云天励飞技术有限公司
Publication of WO2021036059A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • This application relates to the field of artificial intelligence and image processing technology, and in particular to an image conversion model training method, a heterogeneous face recognition method, device and equipment.
  • Face recognition has long been a popular research field, and significant progress has been made on the basis of deep convolutional neural networks, which are now widely used across society.
  • face recognition can be used to narrow the scope of potential suspects and reduce the difficulty of investigation.
  • However, due to the large differences (such as shape, texture, and color) between the face image collected by an image acquisition device and a sketch map, it is difficult for traditional face recognition algorithms to precisely retrieve a face from the public security database according to a sketch map.
  • At present, one face recognition method for sketch maps uses a convolutional neural network to mine the latent relationship between the sketch map and the real image and to model the non-linear mapping between the two, so as to convert the sketch map into a realistic image and then recognize the converted image; another uses a generative adversarial network, whose adversarial loss produces an image with more realistic texture for recognition, but the accuracy is not very high.
  • this application provides an image conversion model training method, a heterogeneous face recognition method, device and equipment, which are beneficial to improve the accuracy of heterogeneous face recognition.
  • the first aspect of the embodiments of the present application provides an image conversion model training method, including:
  • training samples are input into the generator of a generative adversarial network for processing to generate a first synthetic face image; the first synthetic face image and the real face image corresponding to the training sample are input into the discriminator of the generative adversarial network for discrimination, and the discrimination result is output;
  • the first synthetic face image is input into a pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the first synthetic face image, the feature vector of the first synthetic face image is compared with the feature vector of the real face image corresponding to the training sample, and the comparison result is output;
  • the image conversion model is obtained according to the discrimination result and the comparison result.
  • Obtaining the image conversion model according to the discrimination result and the comparison result includes:
  • if any one of the discrimination result and the comparison result does not meet the preset expected value, the network parameter weights of the generator and the discriminator are updated to obtain an updated generator and an updated discriminator;
  • the training sample is input into the updated generator for processing to obtain an updated first face composite image; the updated first face composite image and the real face image corresponding to the training sample are input into the updated discriminator for discrimination, and the updated discrimination result is output; the updated first face composite image is input into the pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the updated first face composite image, the feature vector of the updated first face composite image is compared with the feature vector of the real face image corresponding to the training sample, and the updated comparison result is output;
  • if the updated discrimination result and the updated comparison result both meet the preset expected value, the network parameter weights of the updated generator and the updated discriminator are fixed to obtain the image conversion model.
  • the second aspect of the embodiments of the present application provides a heterogeneous face recognition method, including:
  • a sketch map of the face to be recognized is obtained and cropped to obtain a sketch map of the face area; the sketch map of the face area is input into a pre-trained image conversion model for processing to generate a second synthetic face image; feature extraction is performed on the second synthetic face image to obtain its feature vector; and
  • the feature vector of the second synthetic face image is matched with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • Cropping the sketch map of the face to be recognized to obtain the sketch map of the face area includes:
  • the key points of the face in the sketch map of the face to be recognized are detected through a pre-trained multi-task cascaded convolutional neural network to locate the key points of the face, and the sketch map of the face to be recognized is cropped based on the located key points to obtain the sketch map of the face area.
  • The image conversion model includes a generator; inputting the sketch map of the face region into a pre-trained image conversion model for processing to generate a second face composite image includes:
  • the sketch map of the face region is input into the generator, and feature extraction is performed on it using a multi-layer residual network; the extracted features are deconvolved to obtain a feature map with the same size as the sketch map of the face region; and the dimensions of the feature map are compressed to generate the second face composite image.
  • the third aspect of the embodiments of the present application provides an image conversion model training device, including:
  • the first image generation module is used to input the training samples into the generator of the generative adversarial network for processing to generate the first synthetic face image;
  • the first image discrimination module is configured to input the first synthetic face image and the real face image corresponding to the training sample into the discriminator of the generative adversarial network for discrimination, and output the discrimination result;
  • the first image comparison module is configured to input the first synthetic face image into a pre-trained convolutional neural network model for feature extraction, obtain the feature vector of the first synthetic face image, compare the feature vector of the first synthetic face image with the feature vector of the real face image corresponding to the training sample, and output the comparison result;
  • the model generation module is used to obtain the image conversion model according to the discrimination result and the comparison result.
  • The fourth aspect of the embodiments of the present application provides a heterogeneous face recognition device, including: a face cropping module, configured to obtain a sketch map of a face to be recognized, and crop the sketch map of the face to be recognized to obtain a sketch map of the face area;
  • the second face generation module is configured to input the sketch map of the face area into a pre-trained image conversion model for processing, so as to generate a second face composite image;
  • the first feature extraction module is configured to perform feature extraction on the second face composite image to obtain a feature vector of the second face composite image
  • the face matching module is used to match the feature vector of the second synthetic face image with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • the fifth aspect of the embodiments of the present application provides an electronic device.
  • The electronic device includes a processor, a memory, and a computer program stored on the memory and executable on the processor; the processor executes the computer program to implement the steps in the method of any one of the first aspect or the second aspect.
  • The sixth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps in the method of any one of the first aspect or the second aspect are implemented.
  • The embodiment of the present application obtains a sketch map of the face to be recognized and crops the sketch map of the face to be recognized to obtain a sketch map of the face area; the sketch map of the face area is input into a pre-trained image conversion model for processing to generate a second face composite image; feature extraction is performed on the second face composite image to obtain the feature vector of the second face composite image; and the feature vector of the second face composite image is matched with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • the face area is first cut out from the sketch map of the face to be recognized to reduce the interference of the background area on the face recognition.
  • At the same time, the dual constraints of the adversarial loss and high-level features are used: in addition to making the generated second face composite image clearer, they also bring its high-level features closer to those of the real face image, and heterogeneous face recognition is converted into homogeneous face recognition, which is beneficial to improving the accuracy of heterogeneous face recognition.
  • Figure 1 is an application architecture diagram provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of an image conversion model training method provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a heterogeneous face recognition method provided by an embodiment of this application.
  • FIG. 4 is a schematic structural diagram of a multi-task cascaded convolutional neural network provided by an embodiment of this application;
  • FIG. 5 is a schematic structural diagram of an image conversion model provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of an image conversion model training device provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of another image conversion model training device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of another image conversion model training device provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of a heterogeneous face recognition device provided by an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of another heterogeneous face recognition device provided by an embodiment of this application.
  • FIG. 11 is a schematic structural diagram of another heterogeneous face recognition device provided by an embodiment of this application.
  • FIG. 12 is a schematic structural diagram of another heterogeneous face recognition device provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • FIG. 1 is an application architecture diagram provided by an embodiment of the application. As shown in FIG. 1, it includes a user terminal, a server, and a database.
  • The user terminal and the server interact through a wired or wireless network. The database can be a database in the server or a database independent of the server, and can be a local database or a cloud database, which is not specifically limited.
  • the user terminal can be a smart phone, a tablet computer, a notebook computer, a desktop computer, a mobile Internet device (MID, Mobile Internet Device) or a wearable device, etc.
  • The user terminal is installed with an application program or provides a data upload interface; the user can upload the sketch of the face to be recognized to the server through the application or data upload interface on the user terminal, and receive the face recognition result returned by the server.
  • The server can be a local server, a cloud server, or a server cluster; it is the execution body of the heterogeneous face recognition method provided by this application and is mainly used to process and recognize the sketch map of the face to be recognized and return the recognition result to the user terminal.
  • FIG. 2 is a schematic flowchart of an image conversion model training method provided by an embodiment of the application. As shown in FIG. 2, it includes the steps:
  • S21: Input training samples into the generator of a generative adversarial network for processing to generate a first synthetic face image.
  • the training sample refers to the face region sketch map sample used for training, which can be taken from the sample database
  • the first face composite image refers to the face composite image generated using the training sample.
  • Specifically, the first step is to use a multi-layer residual network to perform feature extraction on the training sample; the extracted features are then deconvolved to obtain a feature map with the same size as the training sample; finally, the dimensions of the feature map are compressed to generate the first face composite image, as sketched below.
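  • To make this three-step pipeline concrete, the following is a minimal PyTorch-style sketch of such a generator; the layer counts, channel widths, and class names (ResBlock, SketchToPhotoGenerator) are illustrative assumptions, not the patent's actual network:

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """One residual block: two 3x3 convolutions plus a skip connection."""
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)

    class SketchToPhotoGenerator(nn.Module):
        def __init__(self, ch=64, n_blocks=6):
            super().__init__()
            # Step 1: a multi-layer residual network extracts features from the sketch.
            self.encode = nn.Sequential(
                nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                *[ResBlock(2 * ch) for _ in range(n_blocks)])
            # Step 2: deconvolution restores a feature map of the input size.
            self.decode = nn.Sequential(
                nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1),
                nn.ReLU(inplace=True))
            # Step 3: a 1x1 convolution compresses the feature dimension to 3 channels.
            self.compress = nn.Conv2d(ch, 3, 1)

        def forward(self, sketch):
            return torch.tanh(self.compress(self.decode(self.encode(sketch))))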
  • The generative adversarial network mainly includes two parts: a generator and a discriminator.
  • The generator is a network that generates pictures and is used to process the input to generate a picture; the discriminator is a discriminative network used to judge whether an input picture is a real picture.
  • The real face image corresponding to the training sample refers to the real face image corresponding to the face area sketch map sample used for training, which comes from the database; for example, for a face area sketch map sample of person A, the database also stores the real face image of A. The real face image corresponding to the training sample can be used by the discriminator to determine whether the generated initial face image conforms to the distribution of the real face image corresponding to the training sample.
  • the first synthetic face image output by the generator and the real face image corresponding to the training sample are input to the discriminator, and the discrimination result is output.
  • The discrimination result is a probability value indicating whether the generated first synthetic face image conforms to the distribution of the real face images corresponding to the training samples. Reconstructing the objective from the definitions given below, the adversarial game takes the standard form:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_2(x)}[\log D(x)] + \mathbb{E}_{z \sim p_1(z)}[\log(1 - D(G(z)))]

    where D(x) represents the actual output of the discriminator on the real face image x; D(G(z)) represents the output of the discriminator on the first synthetic face image G(z) generated by the generator; p_2(x) represents the distribution of the real face images; p_1(z) represents the distribution from which the generator generates the first synthetic face images; and E represents the expectation.
  • the feature vector refers to a 512-dimensional high-level feature vector. Since the high-level feature vector of each person is different, the high-level feature vector has a certain degree of discrimination.
  • The feature vector of the first synthetic face image is compared with the feature vector of the real face image corresponding to the training sample, and the comparison result is output. The comparison result represents how close the first synthetic face image is to the real face image corresponding to the training sample.
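  • As a minimal sketch of such a comparison, assuming the 512-dimensional feature vectors are numpy arrays and using cosine similarity as the measure (the patent leaves the concrete measure open here and later names cosine similarity, Euclidean distance, and Chebyshev distance as options):

    import numpy as np

    def comparison_result(f_synthetic, f_real):
        """Cosine similarity between two 512-dimensional feature vectors.
        The closer the value is to 1, the closer the first synthetic face
        image is to the real face image corresponding to the training sample."""
        f_synthetic = f_synthetic / np.linalg.norm(f_synthetic)
        f_real = f_real / np.linalg.norm(f_real)
        return float(np.dot(f_synthetic, f_real))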
  • step S24 includes:
  • if any one of the discrimination result and the comparison result does not meet the preset expected value, the network parameter weights of the generator and the discriminator are updated to obtain an updated generator and an updated discriminator;
  • the training sample is input into the updated generator for processing to obtain an updated first face composite image; the updated first face composite image and the real face image corresponding to the training sample are input into the updated discriminator for discrimination, and the updated discrimination result is output; the updated first face composite image is input into the pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the updated first face composite image, the feature vector of the updated first face composite image is compared with the feature vector of the real face image corresponding to the training sample, and the updated comparison result is output;
  • if the updated discrimination result and the updated comparison result both meet the preset expected value, the network parameter weights of the updated generator and the updated discriminator are fixed to obtain the image conversion model.
  • In other words, the network parameter weights of the generator and the discriminator stop being updated only when the discrimination result and the comparison result meet the preset expected values; otherwise, the network parameter weights of the generator and the discriminator are updated.
  • The updated generator, under this supervision, uses the training sample to generate an updated first face composite image, and the steps of inputting the updated first face composite image and the real face image corresponding to the training sample into the updated discriminator for discrimination, inputting the updated first face composite image into the pre-trained convolutional neural network model for feature extraction, and comparing the feature vector of the updated first face composite image with the feature vector of the real face image corresponding to the training sample are executed repeatedly until the obtained discrimination result and comparison result meet the preset expected values; the network parameter weights of the updated generator and the updated discriminator are then fixed, and a trained image conversion model is obtained.
  • Each discrimination of the discriminator can be realized through a predefined loss function; consistent with the adversarial objective above, it can be written as:

    LOSS = \frac{1}{n} \sum_{i=1}^{n} \left[ \log D(x_i) + \log(1 - D(G(z_i))) \right]

    where LOSS represents the loss value of the entire generative adversarial network; x_i and z_i represent the real face images and the input variables of the generator, respectively; n represents the number of x_i and z_i; D(x_i) represents the actual output of the discriminator on the real face image x_i; and G(z_i) represents the actual output of the generator for the input variable z_i.
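  • A minimal sketch of one alternating update under these dual constraints is given below; it assumes the discriminator D outputs a probability, F_net is the frozen pre-trained feature extractor, and the feature constraint is expressed as a mean squared error with an assumed weight lam (none of these specifics come from the patent):

    import torch
    import torch.nn.functional as F

    def train_step(G, D, F_net, opt_G, opt_D, sketch, real, lam=1.0):
        """One generator/discriminator update with adversarial + feature losses."""
        fake = G(sketch)

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        d_loss = -(torch.log(D(real) + 1e-8).mean()
                   + torch.log(1 - D(fake.detach()) + 1e-8).mean())
        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()

        # Generator step: fool the discriminator and match high-level features.
        adv_loss = -torch.log(D(fake) + 1e-8).mean()
        feat_loss = F.mse_loss(F_net(fake), F_net(real))
        g_loss = adv_loss + lam * feat_loss
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()
        return d_loss.item(), g_loss.item()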
  • In this way, the network parameter weights of the generator and the discriminator are updated and the better network parameter weights are fixed, which is beneficial to improving the quality of the images converted from sketch maps of the face area.
  • Before the second face composite image is input into the pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the second face composite image, the method further includes:
  • using the pre-trained convolutional neural network model to perform feature extraction on multiple real face images stored in the database to obtain the feature vector of each of the multiple real face images, where the multiple real face images include the real face images corresponding to the training samples;
  • the feature vector of each real face image is stored in the database.
  • In this way, the feature vectors of the multiple real face images in the database are extracted and stored in advance, so that when the feature vector of the first synthetic face image is compared with them, they can be called directly, which helps to improve the comparison speed and, to a certain extent, the efficiency of face recognition.
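  • A minimal sketch of this pre-extraction step, assuming feature_net is the pre-trained convolutional neural network model and image_loader yields (person_id, image_tensor) pairs (both names are hypothetical):

    import numpy as np
    import torch

    def build_gallery(feature_net, image_loader):
        """Extract and store one feature vector per real face image in advance,
        so later comparisons can call the stored vectors directly."""
        gallery = {}
        with torch.no_grad():
            for person_id, image in image_loader:
                vec = feature_net(image.unsqueeze(0)).squeeze(0).numpy()
                gallery[person_id] = vec / np.linalg.norm(vec)  # store normalized
        return gallery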
  • In the image conversion model training method provided by the embodiments of this application, during model training, not only is the adversarial loss of the generative adversarial network used to constrain the generated first face composite image, but high-level feature vectors are also used to constrain it; the image conversion model trained under these dual constraints can generate face composite images of higher quality.
  • FIG. 3 is a schematic flowchart of a heterogeneous face recognition method provided by an embodiment of the application. As shown in FIG. 3, it includes the steps:
  • S31: Obtain a sketch map of the face to be recognized, and crop the sketch map of the face to be recognized to obtain a sketch map of the face area.
  • the face sketch to be recognized is a face sketch drawn by a sketch expert based on a low-resolution image or a description of an eyewitness.
  • The above-mentioned acquisition of the face sketch to be recognized may be receiving the sketch of the face to be recognized uploaded by the user terminal through the application program or the data upload interface.
  • Specifically, the sketch expert can draw the sketch map of the face to be recognized on drawing paper, then photograph or scan it to obtain an electronic file of the sketch map and upload the electronic file to the server through the application or data upload interface on the user terminal; the sketch expert can also directly draw the sketch of the face to be recognized on a user terminal with a drawing function and then upload the completed sketch to the server through the application or the data upload interface.
  • The above-mentioned cropping of the face sketch to be recognized can be implemented based on dlib + opencv (a machine learning open-source toolkit plus an open-source vision library). It can also be achieved by using a two-layer convolutional neural network with different structures to locate the key points of the face in the sketch to be recognized, where the face key points include the eyes, the nose tip, and the two mouth corners, five in total: the image input size of the first layer is 112*112*3, and the output is the coordinate information of the five key points; the image input size of the second layer is 32*32*3, and the output is the coordinate information of a single key point among the five. A multi-task cascaded convolutional neural network can also be used to crop the face sketch to be recognized, which is not specifically limited.
  • Cropping the sketch map of the face to be recognized to obtain the sketch map of the face area includes:
  • the key points of the face in the sketch map of the face to be recognized are detected through a pre-trained multi-task cascaded convolutional neural network to locate the key points of the face, and the sketch map of the face to be recognized is cropped based on the located key points to obtain the sketch map of the face area.
  • The pre-trained multi-task cascaded convolutional neural network is shown in Figure 4 and consists of three network structures: P-Net (Proposal Network), R-Net (Refine Network), and O-Net (Output Network). It integrates face area detection and face key point detection, and each network structure performs face classification, bounding box regression, and key point localization.
  • face classification uses cross-entropy loss
  • bounding box regression uses a bounding box regression (BB regression) loss
  • face key point positioning uses landmark localization loss.
  • the face sketch to be recognized is transformed into different scales to construct an image pyramid to obtain images of different sizes.
  • P-Net: the input image, of size 12*12*3, is processed by convolution 3*3 - pooling 2*2 - convolution 3*3 - convolution 3*3; a face classifier judges whether the area is a face, and bounding box regression and a key point locator produce the preliminary output of the face area.
  • R-Net: the input of R-Net is the output of P-Net, with a size of 24*24*3; after processing by convolution 3*3 - pooling 3*3 - convolution 3*3 - pooling 3*3 - convolution 2*2 - fully connected layer 128, the input is refined, and bounding box regression and the key point locator are used again to perform bounding box regression and key point positioning on the face area, producing an output usable by O-Net.
  • O-Net: the input size of O-Net is 48*48*3; after processing by convolution 3*3 - pooling 3*3 - convolution 3*3 - pooling 3*3 - convolution 3*3 - pooling 2*2 - convolution 2*2 - fully connected layer 256, the face area is identified under stronger supervision, the facial feature points are regressed, and five face key points are finally output.
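  • The cascade can be summarized in code roughly as follows; this is a hedged sketch in which pnet, rnet, and onet stand for the trained sub-networks (assumed callables returning candidate boxes, and for onet also the five key points), and bounding box regression and non-maximum suppression details are omitted:

    from PIL import Image

    def mtcnn_detect(image, pnet, rnet, onet, scales=(1.0, 0.7, 0.5)):
        """Sketch of the P-Net -> R-Net -> O-Net cascade described above."""
        candidates = []
        for s in scales:  # image pyramid: the sketch at several scales
            scaled = image.resize((int(image.width * s), int(image.height * s)))
            candidates += pnet(scaled)  # coarse proposals from 12*12*3 patches

        def crops(boxes, size):
            return [image.crop(box).resize((size, size)) for box in boxes]

        refined = rnet(crops(candidates, 24))        # refine proposals at 24*24*3
        boxes, keypoints = onet(crops(refined, 48))  # 48*48*3, stronger supervision
        return boxes, keypoints                      # five key points per face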
  • Based on the located key points, the sketch of the face to be recognized is cropped to remove the background area in the picture, and finally only a face area of the preset size (for example, 112*128 pixels) is retained to obtain the sketch map of the face area.
  • In this way, the pre-trained multi-task cascaded convolutional neural network locates the five key points of the face in the sketch to be recognized with higher positioning accuracy, and the background area of the sketch is removed, which helps to reduce interference in the subsequent image conversion and face recognition.
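  • A minimal sketch of the final cropping step, assuming the located key points have been turned into a face box given as (left, top, right, bottom) pixel coordinates:

    from PIL import Image

    def crop_face(image, box, out_size=(112, 128)):
        """Cut out the located face area, discard the background, and keep only
        a face region of the preset size (112*128 pixels here)."""
        return image.crop(box).resize(out_size)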
  • The training process of the multi-task cascaded convolutional neural network includes: performing scale transformation on the image set used for training and labeling it with label information to obtain a first training sub-image, a second training sub-image, and a third training sub-image, where the image set used for training includes positive face samples, negative face samples, partial face samples, positive face samples with key points, and background region samples; the size of the first training sub-image is 12*12, the size of the second training sub-image is 24*24, and the size of the third training sub-image is 48*48.
  • The first training sub-image is input into P-Net for processing, the parameters of P-Net are updated by stochastic gradient descent, and a trained P-Net model is obtained after multiple iterations; R-Net is initialized with the trained P-Net model, which predicts the first face candidate windows, and the first face candidate windows and the second training sub-image are input into R-Net for training to obtain a trained R-Net model; O-Net is initialized with the trained R-Net model, the trained P-Net and R-Net cascade predicts the second face candidate windows, and the second face candidate windows and the third training sub-image are input into O-Net for training to obtain a trained O-Net model; the trained multi-task cascaded convolutional neural network is obtained from the trained P-Net, R-Net, and O-Net models.
  • S32 Input the sketch map of the face region into a pre-trained image conversion model for processing to generate a second synthetic face image.
  • The structure of the pre-trained image conversion model may be as shown in FIG. 5, including a generative adversarial network (GAN) and a feature retention module, where the generative adversarial network includes a generator and a discriminator.
  • The generator uses the sketch map of the face area to generate the second face composite image, and during training uses the training samples to generate the corresponding first face composite images; the discriminator is used during training to judge whether the first face composite images generated by the generator are close to the real face images.
  • The main body of the feature retention module is a pre-trained convolutional neural network model. The feature retention module is used to extract features from the first synthetic face images generated during training and compare them with the features of the real face images; the discrimination of the discriminator and the comparison of the feature retention module jointly supervise the generator so that it generates second face composite images of higher quality.
  • S33 Perform feature extraction on the second face composite image to obtain a feature vector of the second face composite image.
  • Specifically, the second face composite image is input into a pre-trained convolutional neural network model for feature extraction; the pre-trained convolutional neural network model is trained on a large number of face image samples with category labels.
  • The difference is that feature vector extraction during image conversion model training serves to retain feature information closer to the real face image corresponding to the training sample and thus obtain a better second face composite image, whereas here the feature vector is matched against multiple real face images in the database to obtain the final face recognition result. A similarity measure or distance measure can be used for the matching, such as cosine similarity, Euclidean distance, or Chebyshev distance, as sketched below.
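  • The three named measures, sketched for two feature vectors a and b given as numpy arrays:

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def euclidean_distance(a, b):
        return float(np.linalg.norm(a - b))

    def chebyshev_distance(a, b):
        return float(np.max(np.abs(a - b)))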
  • The embodiment of the application obtains a sketch map of the face to be recognized and crops the sketch map of the face to be recognized to obtain a sketch map of the face area; the sketch map of the face area is input into a pre-trained image conversion model for processing to generate a second face composite image; feature extraction is performed on the second face composite image to obtain its feature vector; and the feature vector of the second face composite image is matched with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • the face area is first cut out from the sketch map of the face to be recognized to reduce the interference of the background area on the face recognition.
  • At the same time, the dual constraints of the adversarial loss and high-level features are used: in addition to making the generated second face composite image clearer, they also bring its high-level features closer to those of the real face image, and heterogeneous face recognition is converted into homogeneous face recognition, which is beneficial to improving the accuracy of heterogeneous face recognition.
  • The image conversion model includes a generator; inputting the sketch map of the face region into a pre-trained image conversion model for processing to generate a second face composite image includes:
  • inputting the sketch map of the face region into the generator and performing feature extraction on it using a multi-layer residual network; deconvolving the extracted features to obtain a feature map with the same size as the sketch map of the face region; and compressing the dimensions of the feature map to generate the second face composite image.
  • A convolutional neural network can extract low-, middle-, and high-level features of an image; the deeper the network, the richer the features extracted at different layers. However, deeper is not always better: blindly increasing the network depth degrades the entire network and decreases accuracy.
  • Therefore, a multi-layer residual network is used to extract features from the cropped face region sketch map; the multi-layer residual network constructs residual blocks so that the extracted features are richer and more accurate, and then deconvolution and dimension compression operations are performed to obtain the above-mentioned second face composite image. The network parameter weights of the generator were updated and fixed when the image conversion model was trained, so the quality of the generated face composite image is better.
  • Matching the feature vector of the second face composite image with the feature vectors of multiple real face images stored in a database to obtain a face recognition result includes:
  • The preset value can be set as small as possible so that the target real face image with the smallest Euclidean distance is determined; for example, if only the Euclidean distance to the feature vector of real face image A is less than the preset value, then real face image A is the face recognition result.
  • The prompt message of face recognition failure can be a voice prompt output through the speaker of the user terminal, such as "Face recognition failed, please try again!", or output through the display window of the user terminal, such as a warning sign or a text prompt.
  • Measuring the similarity between the converted second face composite image and the multiple real face images in the database by Euclidean distance is intuitive: the smaller the distance, the higher the similarity.
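  • Putting the matching rule together (a hedged sketch; gallery maps person IDs to stored feature vectors and threshold is the preset value, both assumed names):

    import numpy as np

    def recognize(query_vec, gallery, threshold):
        """Euclidean-distance matching of the second face composite image's
        feature vector against the stored real-face feature vectors."""
        best_id, best_dist = None, float("inf")
        for person_id, vec in gallery.items():
            dist = float(np.linalg.norm(query_vec - vec))
            if dist < best_dist:
                best_id, best_dist = person_id, dist
        if best_dist < threshold:
            return best_id  # the target real face image is the recognition result
        return "Face recognition failed, please try again!"  # failure prompt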
  • Fig. 6 is a schematic structural diagram of an image conversion model training device provided by an embodiment of the application. As shown in Fig. 6, the device includes:
  • the first image generation module 61 is configured to input training samples into the generator of a generative adversarial network for processing to generate a first synthetic face image;
  • the first image discrimination module 62 is configured to input the first synthetic face image and the real face image corresponding to the training sample into the discriminator of the generative adversarial network for discrimination, and output the discrimination result;
  • the first image comparison module 63 is configured to input the first synthetic face image into a pre-trained convolutional neural network model for feature extraction, obtain the feature vector of the first synthetic face image, and combine the The feature vector of the first synthetic face image is compared with the feature vector of the real face image corresponding to the training sample, and the comparison result is output;
  • the model generation module 64 is configured to obtain the image conversion model according to the discrimination result and the comparison result.
  • the model generation module 64 includes:
  • the weight update unit 6401 is configured to update the network parameter weights of the generator and the discriminator if any one of the discrimination result and the comparison result does not meet the preset expected value to obtain an updated generator And updated discriminator;
  • the result update unit 6402 is configured to input the training samples into the updated generator for processing to obtain the updated first face composite image; input the updated first face composite image and the real face image corresponding to the training sample into the updated discriminator for discrimination and output the updated discrimination result; input the updated first face composite image into the pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the updated first face composite image; and compare the feature vector of the updated first face composite image with the feature vector of the real face image corresponding to the training sample and output the updated comparison result;
  • the model generation unit 6403 is configured to fix the network parameter weights of the updated generator and the updated discriminator if the updated discrimination result and the updated comparison result both meet the preset expected value, so as to obtain the image conversion model.
  • the device further includes:
  • the second feature extraction module 65 is configured to use the pre-trained convolutional neural network model to perform feature extraction on multiple real face images stored in the database to obtain the feature vector of each real face image in the multiple real face images ;
  • the multiple real face images stored in the database include real face images corresponding to the training samples;
  • the feature storage module 66 is used to store the feature vector of each real face image in the database.
  • FIG. 9 is a schematic structural diagram of a heterogeneous face recognition device provided by an embodiment of the application. As shown in FIG. 9, the device includes:
  • the face cropping module 91 is configured to obtain a sketch map of a face to be recognized, and cut the sketch map of the face to be recognized to obtain a sketch map of a face area;
  • the second face generation module 92 is configured to input the sketch map of the face area into a pre-trained image conversion model for processing, so as to generate a second face composite image;
  • the first feature extraction module 93 is configured to perform feature extraction on the second face composite image to obtain a feature vector of the second face composite image
  • the face matching module 94 is configured to match the feature vector of the second synthetic face image with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • the face trimming module 91 includes:
  • the key point positioning unit 9101 is configured to detect the key points of the face in the sketch map of the face to be recognized through a pre-trained multi-task cascaded convolutional neural network, so as to locate the key points of the face;
  • the face cropping unit 9102 is configured to crop the sketch map of the face to be recognized based on the located key points of the face to obtain the sketch map of the face area.
  • the second face generation module 92 includes:
  • the feature extraction unit 9201 is configured to input the sketch map of the face region into the generator, so as to perform feature extraction on the sketch map of the face region using a multi-layer residual network;
  • a feature map generating unit 9202 configured to deconvolve the extracted features to obtain a feature map with the same size as the sketch map of the face region;
  • the image compression unit 9203 is configured to compress the dimensions of the feature map to generate the second face composite image.
  • the face matching module 94 includes:
  • a calculation unit 9401 configured to calculate the Euclidean distance between the feature vector of the second synthetic face image and the feature vector of each real face image among the multiple real face images stored in the database;
  • the first result obtaining unit 9402 is configured to determine the target real face image as a face recognition result if there is a target real face image whose Euclidean distance is less than a preset value in the database;
  • the second result obtaining unit 9403 is configured to, if there is no target real face image with the Euclidean distance less than the preset value in the database, output a prompt message of a face recognition failure as a face recognition result.
  • the device provided in the embodiment of the present application can be applied in any heterogeneous face recognition scene, and can achieve the same or similar beneficial effects.
  • An integrated unit of the device implemented in the form of a software functional module can be stored in a computer-readable storage medium, and includes several instructions that enable a related device or processor to execute part or all of the steps in the above heterogeneous face recognition method.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the application. As shown in FIG. 13, it includes: a memory 1301 for storing a computer program; a processor 1302 for calling the computer program stored in the memory 1301 to implement the steps in the embodiments of the above heterogeneous face recognition method; at least one output interface 1303 for output; and at least one input interface 1304 for input. The components are connected to and communicate through at least one bus.
  • processor 1302 is specifically configured to call a computer program to execute the following steps:
  • input training samples into the generator of a generative adversarial network for processing to generate a first synthetic face image; input the first synthetic face image and the real face image corresponding to the training sample into the discriminator of the generative adversarial network for discrimination, and output the discrimination result;
  • input the first synthetic face image into a pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the first synthetic face image, compare the feature vector of the first synthetic face image with the feature vector of the real face image corresponding to the training sample, and output the comparison result;
  • the image conversion model is obtained according to the discrimination result and the comparison result.
  • the execution of the processor 1302 to obtain the image conversion model according to the discrimination result and the comparison result includes:
  • if any one of the discrimination result and the comparison result does not meet the preset expected value, the network parameter weights of the generator and the discriminator are updated to obtain an updated generator and an updated discriminator;
  • the training sample is input into the updated generator for processing to obtain an updated first face composite image; the updated first face composite image and the real face image corresponding to the training sample are input into the updated discriminator for discrimination, and the updated discrimination result is output; the updated first face composite image is input into the pre-trained convolutional neural network model for feature extraction to obtain the feature vector of the updated first face composite image, the feature vector of the updated first face composite image is compared with the feature vector of the real face image corresponding to the training sample, and the updated comparison result is output;
  • if the updated discrimination result and the updated comparison result both meet the preset expected value, the network parameter weights of the updated generator and the updated discriminator are fixed to obtain the image conversion model.
  • processor 1302 is also used to:
  • use the pre-trained convolutional neural network model to perform feature extraction on multiple real face images stored in the database to obtain the feature vector of each of the multiple real face images, where the multiple real face images include the real face images corresponding to the training samples;
  • the feature vector of each real face image is stored in the database.
  • When the processor 1302 of the electronic device executes the computer program to implement the steps in the image conversion model training method described above, the embodiments of the image conversion model training method described above are all applicable to the electronic device and can achieve the same or similar beneficial effects.
  • processor 1302 is specifically further configured to call a computer program to execute the following steps:
  • obtain a sketch map of the face to be recognized, and crop it to obtain a sketch map of the face area; input the sketch map of the face area into a pre-trained image conversion model for processing to generate a second synthetic face image; perform feature extraction on the second synthetic face image to obtain its feature vector; and
  • match the feature vector of the second synthetic face image with the feature vectors of multiple real face images stored in the database to obtain a face recognition result.
  • the processor 1302 executes the cropping of the sketch map of the face to be recognized to obtain the sketch map of the face area, including:
  • the key points of the face in the sketch map of the face to be recognized are detected through a pre-trained multi-task cascaded convolutional neural network to locate the key points of the face, and the sketch map of the face to be recognized is cropped based on the located key points to obtain the sketch map of the face area.
  • the image conversion model includes a generator; the processor 1302 executes the input of the sketch map of the face region into a pre-trained image conversion model for processing to generate a second composite face image, including:
  • the sketch map of the face region is input into the generator, and feature extraction is performed on it using a multi-layer residual network; the extracted features are deconvolved to obtain a feature map with the same size as the sketch map of the face region; and the dimensions of the feature map are compressed to generate the second face composite image.
  • The processor 1302 executing the matching of the feature vector of the second face composite image with feature vectors of multiple real face images stored in a database to obtain a face recognition result includes: calculating the Euclidean distance between the feature vector of the second face composite image and the feature vector of each real face image stored in the database; if there is a target real face image whose Euclidean distance is less than a preset value, determining the target real face image as the face recognition result; otherwise, outputting a prompt message of face recognition failure as the face recognition result.
  • the above electronic device may be a computer, a notebook computer, a tablet computer, a palmtop computer, a server, an embedded device, and the like.
  • the electronic device may include, but is not limited to, a processor 1302, a memory 1301, an output interface 1303, and an input interface 1304.
  • the schematic diagram is only an example of the electronic device, and does not constitute a limitation on the electronic device, and may include more or fewer components than those shown in the figure, or a combination of certain components, or different components.
  • Similarly, when the processor 1302 of the electronic device executes the computer program to implement the steps in the above-mentioned heterogeneous face recognition method, the above-mentioned embodiments of the heterogeneous face recognition method are all applicable to the electronic device and can achieve the same or similar beneficial effects.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps of the above-mentioned image conversion model training method or heterogeneous face recognition method are implemented.
  • the computer program in the computer-readable storage medium includes computer program code
  • the computer program code may be in the form of source code, object code, executable file, or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
  • When the computer program of the computer-readable storage medium is executed by the processor to implement the steps in the image conversion model training method or the heterogeneous face recognition method, all the embodiments or implementations of the image conversion model training method or the heterogeneous face recognition method described above are applicable to the computer-readable storage medium and can achieve the same or similar beneficial effects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are an image conversion model training method, a heterogeneous face recognition method, a device, and an apparatus. The heterogeneous face recognition method comprises the following steps: obtaining a sketch map of a face to be recognized, and cropping the sketch map of the face to be recognized to obtain a sketch map of the face region (S31); inputting the sketch map of the face region into a pre-trained image conversion model for processing to generate a second face composite image (S32); performing feature extraction on the second face composite image to obtain a feature vector of the second face composite image (S33); and matching the feature vector of the second face composite image with the feature vectors of a plurality of real face images stored in a database to obtain a face recognition result (S34). In this way, the face region is first cropped from the sketch map of the face to be recognized, which reduces the interference of the background area on face recognition; at the same time, the preset image conversion model is adopted for image conversion, and heterogeneous face recognition is converted into homogeneous face recognition, which improves the accuracy of heterogeneous face recognition.
PCT/CN2019/121348 2019-08-29 2019-11-27 Image conversion model training method, heterogeneous face recognition method, device and apparatus WO2021036059A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910812547.4 2019-08-29
CN201910812547.4A CN110659582A (zh) 2019-08-29 Image conversion model training method, heterogeneous face recognition method, device and equipment

Publications (1)

Publication Number Publication Date
WO2021036059A1 true WO2021036059A1 (fr) 2021-03-04

Family

ID=69036532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121348 WO2021036059A1 (fr) 2019-08-29 2019-11-27 Image conversion model training method, heterogeneous face recognition method, device and apparatus

Country Status (2)

Country Link
CN (1) CN110659582A (fr)
WO (1) WO2021036059A1 (fr)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339832B (zh) * 2020-02-03 2023-09-12 中国人民解放军国防科技大学 人脸合成图像的检测方法及装置
CN111652064A (zh) * 2020-04-30 2020-09-11 平安科技(深圳)有限公司 人脸图像生成方法、电子装置及可读存储介质
CN113486688A (zh) * 2020-05-27 2021-10-08 海信集团有限公司 一种人脸识别方法及智能设备
CN111833240B (zh) * 2020-06-03 2023-07-25 北京百度网讯科技有限公司 人脸图像转换方法、装置、电子设备及存储介质
CN111754596B (zh) * 2020-06-19 2023-09-19 北京灵汐科技有限公司 编辑模型生成、人脸图像编辑方法、装置、设备及介质
CN111862030B (zh) 2020-07-15 2024-02-09 北京百度网讯科技有限公司 一种人脸合成图检测方法、装置、电子设备及存储介质
CN112633154B (zh) * 2020-12-22 2022-07-22 云南翼飞视科技有限公司 一种异源人脸特征向量之间的转换方法及系统
CN112633288B (zh) * 2020-12-29 2024-02-13 杭州电子科技大学 一种基于绘画笔触指导的人脸素描生成方法
CN112766105A (zh) * 2021-01-07 2021-05-07 北京码牛科技有限公司 一种应用于图码联采系统的图像转换方法及装置
CN113011277B (zh) * 2021-02-25 2023-11-21 日立楼宇技术(广州)有限公司 基于人脸识别的数据处理方法、装置、设备及介质
CN112989096B (zh) * 2021-03-05 2023-03-14 浙江大华技术股份有限公司 脸部特征迁移方法、电子设备及存储介质
CN114240810B (zh) * 2021-11-10 2023-08-08 合肥工业大学 一种基于渐进式生成网络的人脸素描-照片合成方法
CN113822245B (zh) * 2021-11-22 2022-03-04 杭州魔点科技有限公司 人脸识别方法、电子设备和介质
CN114255502B (zh) * 2021-12-23 2024-03-29 中国电信股份有限公司 人脸图像生成方法及装置、人脸识别方法、设备、介质
CN114266946A (zh) * 2021-12-31 2022-04-01 智慧眼科技股份有限公司 遮挡条件下的特征识别方法、装置、计算机设备及介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (zh) * 2017-06-15 2018-03-02 华南理工大学 一种基于级联卷积神经网络的多姿态眼睛定位方法
CN108446667A (zh) * 2018-04-04 2018-08-24 北京航空航天大学 基于生成对抗网络数据增强的人脸表情识别方法和装置
CN109493303B (zh) * 2018-05-30 2021-08-17 湘潭大学 一种基于生成对抗网络的图像去雾方法

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279936A (zh) * 2013-06-21 2013-09-04 重庆大学 基于画像的人脸伪照片自动合成及修正方法
CN103902991A (zh) * 2014-04-24 2014-07-02 西安电子科技大学 基于法医素描的人脸识别方法
CN108596110A (zh) * 2018-04-26 2018-09-28 北京京东金融科技控股有限公司 图像识别方法及装置、电子设备、存储介质
CN109063776A (zh) * 2018-08-07 2018-12-21 北京旷视科技有限公司 图像再识别网络训练方法、装置和图像再识别方法及装置
CN109359541A (zh) * 2018-09-17 2019-02-19 南京邮电大学 一种基于深度迁移学习的素描人脸识别方法
CN110069992A (zh) * 2019-03-18 2019-07-30 西安电子科技大学 一种人脸图像合成方法、装置、电子设备及存储介质
CN110119746A (zh) * 2019-05-08 2019-08-13 北京市商汤科技开发有限公司 一种特征识别方法及装置、计算机可读存储介质

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096242A (zh) * 2021-04-29 2021-07-09 平安科技(深圳)有限公司 虚拟主播生成方法、装置、电子设备及存储介质
CN113570564A (zh) * 2021-07-21 2021-10-29 同济大学 一种基于多路卷积网络的多清晰度伪造人脸视频的检测方法
CN113570564B (zh) * 2021-07-21 2024-02-27 同济大学 一种基于多路卷积网络的多清晰度伪造人脸视频的检测方法
CN113642481A (zh) * 2021-08-17 2021-11-12 百度在线网络技术(北京)有限公司 识别方法、训练方法、装置、电子设备以及存储介质
CN113901904A (zh) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 图像处理方法、人脸识别模型训练方法、装置及设备
CN114326655A (zh) * 2021-11-30 2022-04-12 深圳先进技术研究院 工业机器人故障数据生成方法、系统、终端以及存储介质
CN115065863A (zh) * 2022-06-14 2022-09-16 北京达佳互联信息技术有限公司 视频生成方法、装置、电子设备及存储介质
CN115065863B (zh) * 2022-06-14 2024-04-12 北京达佳互联信息技术有限公司 视频生成方法、装置、电子设备及存储介质
CN117830083A (zh) * 2024-03-05 2024-04-05 昆明理工大学 一种人脸素描到人脸照片的生成方法、装置
CN117830083B (zh) * 2024-03-05 2024-05-03 昆明理工大学 一种人脸素描到人脸照片的生成方法、装置

Also Published As

Publication number Publication date
CN110659582A (zh) 2020-01-07

Similar Documents

Publication Publication Date Title
WO2021036059A1 (fr) Procédé d'entraînement d'un modèle de conversion d'image, procédé de reconnaissance faciale hétérogène, dispositif et appareil
US11657602B2 (en) Font identification from imagery
CN108460338B (zh) 人体姿态估计方法和装置、电子设备、存储介质、程序
WO2021073417A1 (fr) Procédé et appareil de génération d'expression, dispositif et support d'informations
WO2019128508A1 (fr) Procédé et appareil de traitement d'image, support de mémoire et dispositif électronique
AU2014368997B2 (en) System and method for identifying faces in unconstrained media
US11163978B2 (en) Method and device for face image processing, storage medium, and electronic device
US8861800B2 (en) Rapid 3D face reconstruction from a 2D image and methods using such rapid 3D face reconstruction
CN109993102B (zh) 相似人脸检索方法、装置及存储介质
AU2018202767B2 (en) Data structure and algorithm for tag less search and svg retrieval
CN108509994B (zh) 人物图像聚类方法和装置
CN112561879B (zh) 模糊度评价模型训练方法、图像模糊度评价方法及装置
WO2022227765A1 (fr) Procédé de génération d'un modèle de complétion d'image, et dispositif, support et produit programme
CN111553838A (zh) 模型参数的更新方法、装置、设备及存储介质
CN111108508B (zh) 脸部情感识别方法、智能装置和计算机可读存储介质
JP2011258036A (ja) 3次元形状検索装置、3次元形状検索方法、及びプログラム
CN115115552B (zh) 图像矫正模型训练及图像矫正方法、装置和计算机设备
JP2015041293A (ja) 画像認識装置および画像認識方法
CN111353353A (zh) 跨姿态的人脸识别方法及装置
WO2024099026A1 (fr) Procédé et appareil de traitement d'image, dispositif, support de stockage et produit programme
CN117373100B (zh) 基于差分量化局部二值模式的人脸识别方法及系统
CN117079313A (zh) 图像处理方法、装置、设备及存储介质
CN114627534A (zh) 活体判别方法及电子设备、存储介质
CN116959125A (zh) 一种数据处理方法以及相关装置
CN117237845A (zh) 生成视频识别模型的方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19942888

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19942888

Country of ref document: EP

Kind code of ref document: A1