CN114612991A - Conversion method and device for attacking face picture, electronic equipment and storage medium


Info

Publication number
CN114612991A
Authority
CN
China
Prior art keywords
face
picture
loss value
generator
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210282529.1A
Other languages
Chinese (zh)
Inventor
刘星
赵晨旭
唐大闰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Minglue Zhaohui Technology Co Ltd
Original Assignee
Beijing Minglue Zhaohui Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Minglue Zhaohui Technology Co Ltd filed Critical Beijing Minglue Zhaohui Technology Co Ltd
Priority to CN202210282529.1A priority Critical patent/CN114612991A/en
Publication of CN114612991A publication Critical patent/CN114612991A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the technical field of face recognition, and discloses a method for converting an attack face picture, comprising the following steps: training a generative adversarial network on an acquired face data sample set and retaining the generator of the network; training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of a mean square error loss value and an identity information loss value; and converting an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator. The application also discloses a conversion apparatus for attack face pictures, an electronic device, and a storage medium.

Description

Conversion method and device for attacking face picture, electronic equipment and storage medium
Technical Field
The present application relates to the field of face recognition technologies, and for example to a method and apparatus for converting an attack face picture, an electronic device, and a storage medium.
Background
At present, owing to the rapid development of computer science and electronic technology, face recognition has become the second most widely used biometric authentication method globally, after only fingerprint recognition, in terms of market share. Face recognition systems have been applied in many areas of daily life, such as electronic door locks, electronic access control, and financial payment. At the same time, face recognition systems also face many security risks, such as video attacks, photo attacks, and three-dimensional mask attacks.
In the process of implementing the embodiments of the present disclosure, it is found that at least the following problems exist in the related art:
In training a face liveness detection model for a face recognition system, both real face pictures and attack face pictures need to be collected. The collection process is generally as follows: collect a real face picture of a target user, three-dimensionally print an attack mask according to the facial features of the target user, and then collect attack face pictures while the mask is worn. However, attack face pictures obtained in this manner differ greatly from the real face pictures in pose, expression, or background information, so the identity information of the attacked target user cannot be accurately determined when the face liveness detection model is applied.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of these embodiments; rather, it serves as a prelude to the more detailed description presented later.
The embodiment of the disclosure provides a conversion method and device for attacking face pictures, electronic equipment and a storage medium, so that a face recognition system can accurately recognize identity information of an attacked target user.
In some embodiments, the method for converting an attack face picture includes:
training a generative adversarial network on the acquired face data sample set and retaining the generator of the network;
training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of a mean square error loss value and an identity information loss value;
and converting an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator.
Optionally, the converting an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator includes:
acquiring an attack face picture of the target user;
inputting the attack face picture into the encoder to obtain a first latent code corresponding to the attack face picture;
and inputting the first latent code corresponding to the attack face picture into the generator to obtain the real face picture of the target user.
Optionally, the training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges includes:
inputting training sample pictures in the face data sample set into an initialized encoder to obtain a second latent code output by the encoder;
inputting the second latent code output by the encoder into the generator to obtain a generated picture output by the generator;
calculating a similarity loss value between the training sample picture and the generated picture;
and performing back propagation according to the similarity loss value to adjust the parameters of the encoder until the similarity loss value of the encoder converges.
Optionally, the calculating a similarity loss value between the training sample picture and the generated picture includes:
calculating a mean square error loss value between the generated picture and the training sample picture according to the pixel values of each pixel point of the two pictures;
acquiring an identity information loss value between the generated picture and the training sample picture according to an ArcFace-based face recognition model;
and summing the mean square error loss value and the identity information loss value to obtain the similarity loss value between the generated picture and the training sample picture.
Optionally, the acquiring an identity information loss value between the generated picture and the training sample picture according to the ArcFace-based face recognition model includes:
training a residual network with the ArcFace loss function on the face data sample set to obtain the face recognition model;
inputting the generated picture and the training sample picture into the face recognition model to obtain a face feature expression corresponding to the generated picture and a face feature expression corresponding to the training sample picture;
and calculating the cosine distance between the two face feature expressions, and taking the cosine distance as the identity information loss value between the generated picture and the training sample picture.
Optionally, before training the generative adversarial network on the acquired face data sample set and retaining its generator, the method further includes:
collecting a plurality of real face pictures from an open-source face data set to form the face data sample set;
acquiring a face bounding box corresponding to each real face picture in the face data sample set through a face detector in a C++ toolkit;
and locating the coordinates of the facial key feature points corresponding to each real face picture in the face data sample set through a face key point detector in the C++ toolkit.
Optionally, the training the generative adversarial network on the acquired face data sample set and retaining its generator includes:
training a generative adversarial network based on the StyleGAN framework according to the face bounding box corresponding to each real face picture in the face data sample set and the facial key feature point coordinates corresponding to each real face picture;
and retaining the generator of the generative adversarial network after the training is finished.
In some embodiments, the apparatus for converting an attack face picture includes:
a generator training module configured to train a generative adversarial network on the acquired face data sample set and retain the generator of the network;
an encoder training module configured to train an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of a mean square error loss value and an identity information loss value;
and a face conversion module configured to convert an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator.
In some embodiments, the electronic device comprises a memory and a processor, wherein:
the memory is configured to store a computer program;
and the processor is configured to execute the computer program to implement the method for converting an attack face picture provided by the present application.
In some embodiments, the storage medium stores program instructions which, when executed, perform the method for converting an attack face picture provided by the present application.
The conversion method and apparatus for attack face pictures, the electronic device, and the storage medium provided by the embodiments of the present disclosure can achieve the following technical effects:
using technical means from the field of machine learning, the trained encoder and generator convert an attack face picture, in which the target user's attack mask is worn, into a real face picture of the target user. This not only ensures the authenticity of the generated face but also keeps the pose, expression, and background information of the face in the picture unchanged, so that the identity information of the attacked target user can be accurately determined once the face liveness detection model has judged the face to be an attack face.
The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not limiting; in the drawings, elements having the same reference numeral designations denote like elements.
FIG. 1 is a schematic diagram of a system architecture for generating a countermeasure network;
FIG. 2 is a schematic diagram of the operation of an encoder;
fig. 3 is a schematic diagram of a conversion method for attacking a face picture according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of another conversion method for attacking a face picture according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of another conversion method for attacking a face picture according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of another conversion method for attacking a face picture according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of another conversion method for attacking a face picture according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of another conversion method for attacking a face picture according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an application of an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of another application of an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of another application of an embodiment of the present disclosure;
fig. 12 is a schematic diagram of a conversion device for attacking a face picture according to an embodiment of the present disclosure;
fig. 13 is a schematic diagram of another conversion apparatus for attacking a face picture according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The terms "first," "second," and the like in the description and in the claims, and the above-described drawings of embodiments of the present disclosure, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the present disclosure described herein may be made. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.
The term "and/or" describes an association between objects and indicates that three relationships may exist. For example, A and/or B represents: A, or B, or A and B.
The term "correspond" may refer to an association or binding relationship; "A corresponds to B" refers to an association or binding relationship between A and B.
Referring to fig. 1, a Generative Adversarial Network (GAN) is a neural network learning model in which a Generator and a Discriminator continuously play a game and the generator learns the distribution of the data. During training, the goal of the generator is to produce pictures as realistic as possible in order to fool the discriminator, while the goal of the discriminator is to distinguish the fake pictures produced by the generator from true pictures. The generator and the discriminator thus form a dynamic game. At the end of this game, the fake samples produced by the generator are almost indistinguishable from true samples and the discriminator can no longer tell them apart; at this point the generator and the discriminator have reached equilibrium and the training process is finished.
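By way of illustration only, this alternating game can be sketched in PyTorch as follows; the toy architectures, picture size, and hyper-parameters are assumptions made for the sketch and are not taken from the present embodiment:

```python
import torch
import torch.nn as nn

latent_dim = 512
# Toy generator and discriminator (illustrative; the embodiment uses StyleGAN).
G = nn.Sequential(nn.Linear(latent_dim, 1024), nn.ReLU(),
                  nn.Linear(1024, 3 * 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Linear(3 * 64 * 64, 1024), nn.LeakyReLU(0.2),
                  nn.Linear(1024, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(real):  # real: (B, 3*64*64) flattened real pictures
    b = real.size(0)
    z = torch.randn(b, latent_dim)

    # Discriminator step: label real pictures 1 and generated pictures 0.
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(D(G(z)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

At equilibrium the discriminator can no longer separate real pictures from generated ones, which is the end-of-training condition described above.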
In the related art, many GAN models, such as PCGAN, BigGAN, and StyleGAN, have been developed to generate high-quality, diverse pictures from random noise input. Recent studies have shown that a GAN can efficiently encode rich semantic information in its intermediate features and latent space, so pictures with diverse characteristics can be synthesized by changing the latent-space code. However, because a GAN lacks inference capability and has no encoder, this process can only be applied to pictures generated by the GAN itself, not to real pictures.
As shown in fig. 2, an Encoder aims to encode an input picture, text, audio, or the like into a low-dimensional Latent Code or feature expression (Embedding). It is generally implemented as a neural network comprising convolutional layers, pooling layers, and batch normalization layers: the convolutional layers extract local features of the picture, the pooling layers downsample the picture and pass scale-invariant features to the next layer, and the batch normalization layers normalize the distribution of the training pictures and accelerate learning. Taking the encoding of a face picture as an example, the encoder extracts features from the face picture to form a latent code that contains the main information of the picture; elements of this vector may represent, for example, skin color, eyebrow position, or eye size.
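A minimal sketch of such an encoder, assuming an RGB input and a 512-dimensional latent code (the layer counts and channel sizes below are illustrative, not those of the embodiment):

```python
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Maps an RGB face picture to a low-dimensional latent code."""
    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),  # convolution: local picture features
            nn.BatchNorm2d(32),              # batch norm: normalize, speed up learning
            nn.ReLU(),
            nn.MaxPool2d(2),                 # pooling: downsample, keep scale-invariant features
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),         # collapse the spatial dimensions to 1x1
        )
        self.fc = nn.Linear(64, latent_dim)  # project to the latent code

    def forward(self, x):                    # x: (B, 3, H, W)
        h = self.features(x).flatten(1)      # (B, 64)
        return self.fc(h)                    # (B, latent_dim)
```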
In the related art, a face recognition system faces many security risks. To meet this challenge, training a face liveness detection model has practical value: for a real face, the model outputs true; for an attack face, it outputs false. Training such a model therefore usually requires large real-face and attack-face data sets. For example, the facial three-dimensional point cloud data of a user is collected, a corresponding three-dimensional face mask is printed in a material such as gypsum, resin, or silica gel, an attacker wears the three-dimensional face mask, and face pictures are shot, thereby collecting attack face pictures of the worn mask.
Meanwhile, when a face recognition system faces an attack, the attack may take the form of paper printing, electronic screen devices, or masks (including resin masks, gypsum masks, silica gel headgear masks, and the like). After the face liveness detection model judges that the face in front of the camera is an attack face wearing a mask, the face recognition system still faces another challenge: how to determine the identity information of the target user being attacked in front of the camera. Therefore, how to restore an attack face picture wearing a mask into a real face picture of the attacked person is of great significance to the security of a face recognition system.
Therefore, with reference to fig. 3, an embodiment of the present disclosure provides a method for converting an attack face picture, including:
Step 301: training a generative adversarial network on the acquired face data sample set and retaining the generator of the network.
In the embodiment of the application, a generative adversarial network is trained, based on the StyleGAN architecture, on the real face pictures in the acquired face data sample set, and its generator is retained. The network can automatically learn and, without supervision, separate high-level attributes of the generated picture (such as pose and identity when trained on faces) from stochastic variation (such as freckles and hair), enabling intuitive, scale-specific control of the synthesis.
Step 302: training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of the mean square error loss value and the identity information loss value.
In the embodiment of the present application, the trained generator is used, together with the face data sample set obtained in step 301, to assist in training the encoder. The role of the encoder is the inverse of that of the generator: its input may be a face picture in RGB format and its output is the latent code corresponding to that picture. The loss function of the encoder is designed so that, while the original mean square error loss of GAN inversion is retained, an identity information loss is introduced; the identity information loss value is calculated with an ArcFace-based face recognition model, and the sum of the mean square error loss value and the identity information loss value is taken as the final similarity loss value.
Step 303: converting an attack face picture of the target user into a real face picture of the target user according to the trained encoder and generator.
In the embodiment of the application, in the application stage, the trained encoder and generator convert the attack face picture, in which the target user's attack mask is worn, into a real face picture of the target user through the latent code. This ensures the authenticity of the generated face while keeping the pose, expression, and background information of the face in the picture unchanged.
With the conversion method for attack face pictures provided by the embodiment of the present disclosure, the trained encoder and generator convert the attack face picture wearing the target user's attack mask into a real face picture of the target user. This not only ensures the authenticity of the generated face but also keeps the pose, expression, and background information of the face in the picture unchanged, so that the identity information of the attacked target user can be accurately determined once the face liveness detection model has judged the face to be an attack face.
Optionally, as shown in fig. 4, the converting an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator includes:
Step 401: acquiring an attack face picture of the target user.
In the embodiment of the application, in the inference and prediction stage, the attack face picture of a given target user is acquired; for example, a picture of an attacker wearing a face mask of the target user is captured by a camera or video camera, thereby obtaining the attack face picture of the target user.
Step 402: inputting the attack face picture into the encoder to obtain a first latent code corresponding to the attack face picture.
In the embodiment of the application, the attack face picture of the target user is input into the encoder, which extracts features from the attack face picture and outputs the corresponding first latent code; for example, the first latent code may be a 16 × 512 dimensional matrix.
Step 403: inputting the first latent code corresponding to the attack face picture into the generator to obtain the real face picture of the target user.
In the embodiment of the application, the first latent code corresponding to the attack face picture is input into the generator, which generates from it the real face picture of the target user, that is, the real face picture of the attacked person.
Therefore, when the liveness detection model judges that a face recognition attack exists, the real face picture of the attacked person can be determined and, through comprehensive judgment, the identity information of the attacked person can be accurately determined, so that a targeted defense can be mounted.
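Steps 401 to 403 amount to a two-stage inference pipeline. A minimal sketch, assuming trained encoder and generator networks with the shapes mentioned above (the function and variable names are illustrative):

```python
import torch

@torch.no_grad()
def restore_real_face(attack_picture, encoder, generator):
    """Steps 401-403: attack face picture -> latent code -> real face picture.

    attack_picture: e.g. a (1, 3, 512, 512) RGB tensor; encoder and generator
    are the trained networks (names here are assumptions for the sketch).
    """
    latent = encoder(attack_picture)   # step 402: first latent code, e.g. (1, 16, 512)
    real_face = generator(latent)      # step 403: real face picture, e.g. (1, 3, 512, 512)
    return real_face
```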
Optionally, as shown in fig. 5, the training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges includes:
Step 501: inputting the training sample pictures in the face data sample set into an initialized encoder to obtain a second latent code output by the encoder.
In the embodiment of the application, an initialized encoder is constructed in the training stage, and the training sample pictures in the face data sample set are then input into the encoder to obtain the second latent code it outputs.
Step 502: inputting the second latent code output by the encoder into the generator to obtain a generated picture output by the generator.
In the embodiment of the present application, the second latent code is further input into the trained generator to obtain the generated picture it outputs. The generator of the present application may be based on the StyleGAN framework; the second latent code may be a 16 × 512 dimensional matrix, and the output of the generator is a face picture, such as a 3 × 512 × 512 RGB picture.
Step 503: calculating a similarity loss value between the training sample picture and the generated picture.
In the embodiment of the application, the similarity loss value between the training sample picture and the generated picture is calculated so that the training of the encoder can be evaluated comprehensively, in terms of both face features and picture pixels, and its training process guided.
Step 504: performing back propagation according to the similarity loss value to adjust the parameters of the encoder until the similarity loss value of the encoder converges.
In this way, the training of the encoder can be better realized, ensuring the accuracy and the generalization ability of the encoder.
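Steps 501 to 504 can be summarized as training the encoder against a frozen, pre-trained generator. A minimal sketch follows, assuming PyTorch modules for the encoder, generator, and face recognition model; the similarity_loss function is the one sketched under steps 601 to 603 below, and the hyper-parameters are assumptions:

```python
import torch

def train_encoder(encoder, generator, id_model, loader, epochs=10, lr=1e-4):
    """Steps 501-504: train the encoder while the generator stays fixed."""
    generator.eval()                                   # generator weights are frozen
    for p in generator.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)

    for _ in range(epochs):
        for sample in loader:                          # training sample pictures
            latent = encoder(sample)                   # step 501: second latent code
            generated = generator(latent)              # step 502: generated picture
            loss = similarity_loss(sample, generated, id_model)  # step 503
            opt.zero_grad()                            # step 504: back propagation
            loss.backward()                            #   adjusts encoder parameters only
            opt.step()
```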
In the above embodiment, as shown in fig. 6, the calculating a similarity loss value between the training sample picture and the generated picture includes:
Step 601: calculating a mean square error loss value between the generated picture and the training sample picture according to the pixel values of each pixel point of the two pictures.
In the embodiment of the application, the pixel values of each pixel point of the generated picture are compared with those of the training sample picture, and the mean square error loss value, namely the average of the squared differences between corresponding pixel values, is calculated.
Step 602: acquiring an identity information loss value between the generated picture and the training sample picture according to the ArcFace-based face recognition model.
In the embodiment of the application, based on a pre-trained face recognition model, the generated picture and the training sample picture are input into the model, and the identity information loss value between the two pictures output by the model is obtained. Here, ArcFace (Additive Angular Margin Loss for Deep Face Recognition) is a loss function for face recognition.
Step 603: summing the mean square error loss value and the identity information loss value to obtain the similarity loss value between the generated picture and the training sample picture.
In the embodiment of the application, the mean square error loss value and the identity information loss value are finally summed to obtain the final similarity loss value, which guides the optimization performed by the back propagation algorithm.
In this way, when the loss function is designed, an identity information loss value that reflects face features is introduced on top of the traditional mean square error loss value, so that the pixel-value similarity and the face-feature similarity of the pictures are combined when judging the similarity between the generated picture and the target picture.
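A minimal sketch of steps 601 to 603, assuming id_model is an ArcFace-style face recognition network that returns one embedding per picture, and taking the cosine distance as one minus the cosine similarity (a common definition, consistent with step 703 below):

```python
import torch
import torch.nn.functional as F

def similarity_loss(sample, generated, id_model):
    """Steps 601-603: similarity loss = mean square error + identity loss."""
    # Step 601: pixel-wise mean square error between the two pictures.
    mse = F.mse_loss(generated, sample)

    # Step 602: identity loss as the cosine distance between the two
    # face feature expressions (embeddings), e.g. 512-dimensional vectors.
    emb_gen = id_model(generated)              # (B, 512)
    emb_ref = id_model(sample)                 # (B, 512)
    id_loss = (1.0 - F.cosine_similarity(emb_gen, emb_ref, dim=1)).mean()

    # Step 603: the similarity loss is the sum of the two terms.
    return mse + id_loss
```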
In the foregoing embodiment, as shown in fig. 7, the acquiring an identity information loss value between the generated picture and the training sample picture according to the ArcFace-based face recognition model includes:
Step 701: training a residual network with the ArcFace loss function on the face data sample set to obtain the face recognition model.
In the embodiment of the application, the face recognition model is obtained by training, on the face data sample set, a residual network such as ResNet50 as the backbone network with the ArcFace loss function.
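For reference, ArcFace adds an additive angular margin m to the angle between the embedding and the weight vector of the target class before a scaled softmax cross-entropy. A minimal sketch of such a loss head on top of a ResNet backbone follows; the scale s = 64 and margin m = 0.5 are typical values from the ArcFace paper, not values taken from the present embodiment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive Angular Margin loss head placed on top of a backbone (e.g. ResNet50)."""
    def __init__(self, emb_dim=512, n_classes=1000, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, emb_dim))
        self.s, self.m = s, m

    def forward(self, emb, labels):
        # Cosine of the angle between normalized embeddings and class weights.
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to each sample's target class.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos) * self.s
        return F.cross_entropy(logits, labels)
```

The margin pushes embeddings of the same identity closer together on the hypersphere, which is why the resulting embeddings are suitable for the cosine-distance identity loss of steps 702 and 703.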
Step 702: inputting the generated picture and the training sample picture into the face recognition model to obtain the face feature expression corresponding to the generated picture and the face feature expression corresponding to the training sample picture.
In the embodiment of the application, the face recognition model extracts features from the generated picture and the training sample picture respectively, obtaining a face feature expression (Embedding) for each; the face feature expression may be a 512-dimensional vector.
Step 703: calculating the cosine distance between the face feature expression corresponding to the generated picture and the face feature expression corresponding to the training sample picture, and taking the cosine distance as the identity information loss value between the two pictures.
In the embodiment of the application, the cosine distance between the two face feature expressions is calculated; the larger the cosine distance, the poorer the similarity between the generated picture and the training sample picture, meaning that their face features differ more.
In this way, by introducing the ArcFace-based face recognition model to obtain the identity information loss value between the generated picture and the training sample picture, face features are taken into account during GAN inversion, ensuring that the generated face and the original face have a high similarity under the face recognition system.
In practical application, with reference to fig. 8, an embodiment of the present disclosure provides a method for converting an attack face picture, including:
Step 801: collecting a plurality of real face pictures from an open-source face data set to form the face data sample set.
In the embodiment of the application, a plurality of real face pictures are collected from an open-source face data set, such as the FFHQ data set or the Glint360K data set, to form the face data sample set.
Step 802: acquiring a face bounding box corresponding to each real face picture in the face data sample set through the face detector in a C++ toolkit.
Step 803: locating the coordinates of the facial key feature points corresponding to each real face picture in the face data sample set through the face key point detector in the C++ toolkit.
In the embodiment of the application, the face detector in Dlib is used to detect a rectangular face box, after which the picture is cropped. At the same time, the face key point detector in Dlib is used to detect the coordinates of the 68 key points of each face, after which the faces are aligned.
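A minimal sketch of steps 802 and 803 using Dlib's Python bindings; the predictor model path below refers to the standard 68-landmark model file and is an assumption of the sketch:

```python
import dlib

detector = dlib.get_frontal_face_detector()  # face bounding-box detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # 68 key points

def detect_and_landmarks(image):
    """Steps 802-803: face bounding boxes plus 68 key-point coordinates per face.

    image is an 8-bit RGB array, e.g. loaded with dlib.load_rgb_image(path).
    """
    boxes = detector(image, 1)               # upsample once to catch small faces
    results = []
    for box in boxes:
        shape = predictor(image, box)         # 68-point shape for this face box
        points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
        results.append((box, points))         # crop with box, align with points
    return results
```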
Step 804: training a generative adversarial network based on the StyleGAN framework according to the face bounding box corresponding to each real face picture in the face data sample set and the facial key feature point coordinates corresponding to each real face picture.
Step 805: retaining the generator of the generative adversarial network after the training is finished.
In the embodiment of the present application, as shown in fig. 9, a generative adversarial network is trained based on the StyleGAN framework using the face bounding box and the facial key feature point coordinates corresponding to each face picture in the face data sample set. The generative adversarial network comprises a generator and a discriminator, and the generator is retained after training ends. The generator is a neural network whose input is called the latent code, for example a matrix of dimension 16 × 512, and whose output is a generated face picture, such as a 3 × 512 × 512 RGB picture.
Step 806: training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of the mean square error loss value and the identity information loss value.
In the embodiment of the present application, as shown in fig. 10, the trained generator is used, together with the face data sample set obtained in step 801, to assist in training the encoder. The role of the encoder is the inverse of that of the generator: its input may be a face picture in RGB format and its output the corresponding latent code. The loss function of the encoder is designed so that, while the original mean square error loss of GAN inversion is retained, an identity information loss is introduced; the identity information loss value is calculated with an ArcFace-based face recognition model, and the sum of the mean square error loss value and the identity information loss value is taken as the final similarity loss value.
Step 807: converting the attack face picture of the target user into a real face picture of the target user according to the trained encoder and generator.
In the embodiment of the application, as shown in fig. 11, in the application stage, the trained encoder and generator convert the attack face picture wearing the target user's attack mask into a real face picture of the target user through the latent code, which ensures the authenticity of the generated face while keeping the pose, expression, and background information of the face in the picture unchanged.
With the conversion method for attack face pictures provided by the embodiment of the present disclosure, the trained encoder and generator convert the attack face picture wearing the target user's attack mask into a real face picture of the target user. This not only ensures the authenticity of the generated face but also keeps the pose, expression, and background information of the face in the picture unchanged, so that the identity information of the attacked target user can be accurately determined once the face liveness detection model has judged the face to be an attack face.
With reference to fig. 12, an embodiment of the present disclosure provides an apparatus for converting an attack face picture, including:
a generator training module 1201 configured to train a generative adversarial network on the acquired face data sample set and retain the generator of the network;
an encoder training module 1202 configured to train an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges, where the similarity loss value is the sum of a mean square error loss value and an identity information loss value;
and a face conversion module 1203 configured to convert an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator.
Optionally, the face conversion module 1203 is specifically configured to:
acquire an attack face picture of the target user;
input the attack face picture into the encoder to obtain a first latent code corresponding to the attack face picture;
and input the first latent code corresponding to the attack face picture into the generator to obtain the real face picture of the target user.
Optionally, the encoder training module 1202 is specifically configured to:
input the training sample pictures in the face data sample set into an initialized encoder to obtain a second latent code output by the encoder;
input the second latent code output by the encoder into the generator to obtain a generated picture output by the generator;
calculate a similarity loss value between the training sample picture and the generated picture;
and perform back propagation according to the similarity loss value to adjust the parameters of the encoder until the similarity loss value of the encoder converges.
Optionally, the encoder training module 1202 is specifically configured to:
calculate a mean square error loss value between the generated picture and the training sample picture according to the pixel values of each pixel point of the two pictures;
acquire an identity information loss value between the generated picture and the training sample picture according to an ArcFace-based face recognition model;
and sum the mean square error loss value and the identity information loss value to obtain the similarity loss value between the generated picture and the training sample picture.
Optionally, the encoder training module 1202 is specifically configured to:
train a residual network with the ArcFace loss function on the face data sample set to obtain the face recognition model;
input the generated picture and the training sample picture into the face recognition model to obtain a face feature expression corresponding to the generated picture and a face feature expression corresponding to the training sample picture;
and calculate the cosine distance between the two face feature expressions, taking the cosine distance as the identity information loss value between the generated picture and the training sample picture.
Optionally, the generator training module 1201 is further configured to:
collect a plurality of real face pictures from an open-source face data set to form the face data sample set;
acquire a face bounding box corresponding to each real face picture in the face data sample set through the face detector in a C++ toolkit;
and locate the coordinates of the facial key feature points corresponding to each real face picture in the face data sample set through the face key point detector in the C++ toolkit.
Optionally, the generator training module 1201 is specifically configured to:
train a generative adversarial network based on the StyleGAN framework according to the face bounding box corresponding to each real face picture in the face data sample set and the facial key feature point coordinates corresponding to each real face picture;
and retain the generator of the generative adversarial network after the training is finished.
With the conversion apparatus for attack face pictures provided by the embodiment of the present disclosure, the trained encoder and generator convert the attack face picture wearing the target user's attack mask into a real face picture of the target user. This not only ensures the authenticity of the generated face but also keeps the pose, expression, and background information of the face in the picture unchanged, so that the identity information of the attacked target user can be accurately determined once the face liveness detection model has judged the face to be an attack face.
As shown in fig. 13, an embodiment of the present disclosure provides an apparatus for converting an attack face picture, which includes a processor (processor) 130 and a memory (memory) 131. Optionally, the apparatus may further include a communication interface (Communication Interface) 132 and a bus 133. The processor 130, the communication interface 132, and the memory 131 may communicate with one another through the bus 133. The communication interface 132 may be used for information transfer. The processor 130 may call logic instructions in the memory 131 to execute the method for converting an attack face picture of the above embodiments.
In addition, the logic instructions in the memory 131 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
The memory 131 is a computer-readable storage medium and can be used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 130 executes functional applications and performs data processing by running the program instructions/modules stored in the memory 131, that is, it implements the method for converting an attack face picture of the above embodiments.
The memory 131 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 131 may include a high-speed random access memory, and may also include a nonvolatile memory.
An embodiment of the present disclosure provides a storage medium storing computer-executable instructions configured to execute the above method for converting an attack face picture.
The storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium comprising: a U-disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program codes, and may also be a transient storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not limiting upon the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of additional identical elements in the process, method, or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on its differences from other embodiments, and the same or similar parts of the embodiments may be referred to one another. For the methods, products, and the like disclosed in the embodiments, where they correspond to the method sections disclosed herein, reference may be made to the description of those method sections.
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1. A method for converting an attack face picture, characterized by comprising the following steps:
training a generative adversarial network on an acquired face data sample set and retaining the generator of the network;
training an encoder through the generator on the face data sample set until a similarity loss value of the encoder converges, wherein the similarity loss value is the sum of a mean square error loss value and an identity information loss value;
and converting an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator.
2. The conversion method according to claim 1, wherein the converting the attack face picture of the target user into the real face picture of the target user according to the trained encoder and generator comprises:
acquiring an attack face picture of the target user;
inputting the attack face picture into the encoder to obtain a first latent code corresponding to the attack face picture;
and inputting the first latent code corresponding to the attack face picture into the generator to obtain the real face picture of the target user.
3. The conversion method according to claim 1, wherein the training an encoder through the generator on the face data sample set until the similarity loss value of the encoder converges comprises:
inputting training sample pictures in the face data sample set into an initialized encoder to obtain a second latent code output by the encoder;
inputting the second latent code output by the encoder into the generator to obtain a generated picture output by the generator;
calculating a similarity loss value between the training sample picture and the generated picture;
and performing back propagation according to the similarity loss value to adjust the parameters of the encoder until the similarity loss value of the encoder converges.
4. The conversion method according to claim 3, wherein the calculating a similarity loss value between the training sample picture and the generated picture comprises:
calculating a mean square error loss value between the generated picture and the training sample picture according to the pixel values of each pixel point of the two pictures;
acquiring an identity information loss value between the generated picture and the training sample picture according to an ArcFace-based face recognition model;
and summing the mean square error loss value and the identity information loss value to obtain the similarity loss value between the generated picture and the training sample picture.
5. The conversion method according to claim 4, wherein the acquiring an identity information loss value between the generated picture and the training sample picture according to the ArcFace-based face recognition model comprises:
training a residual network with the ArcFace loss function on the face data sample set to obtain the face recognition model;
inputting the generated picture and the training sample picture into the face recognition model to obtain a face feature expression corresponding to the generated picture and a face feature expression corresponding to the training sample picture;
and calculating the cosine distance between the two face feature expressions, and taking the cosine distance as the identity information loss value between the generated picture and the training sample picture.
6. The conversion method according to any one of claims 1 to 5, further comprising, before training the generative adversarial network on the acquired face data sample set and retaining its generator:
collecting a plurality of real face pictures from an open-source face data set to form the face data sample set;
acquiring a face bounding box corresponding to each real face picture in the face data sample set through a face detector in a C++ toolkit;
and locating the coordinates of the facial key feature points corresponding to each real face picture in the face data sample set through a face key point detector in the C++ toolkit.
7. The conversion method according to claim 6, wherein the training the generative adversarial network on the acquired face data sample set and retaining its generator comprises:
training a generative adversarial network based on the StyleGAN framework according to the face bounding box corresponding to each real face picture in the face data sample set and the facial key feature point coordinates corresponding to each real face picture;
and retaining the generator of the generative adversarial network after the training is finished.
8. An apparatus for converting an attack face picture, characterized by comprising:
a generator training module configured to train a generative adversarial network on an acquired face data sample set and retain the generator of the network;
an encoder training module configured to train an encoder through the generator on the face data sample set until a similarity loss value of the encoder converges, wherein the similarity loss value is the sum of a mean square error loss value and an identity information loss value;
and a face conversion module configured to convert an attack face picture of a target user into a real face picture of the target user according to the trained encoder and generator.
9. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the method for converting an attack face picture according to any one of claims 1 to 7.
10. A storage medium storing program instructions which, when executed, perform the method for converting an attack face picture according to any one of claims 1 to 7.
CN202210282529.1A 2022-03-22 2022-03-22 Conversion method and device for attacking face picture, electronic equipment and storage medium Pending CN114612991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210282529.1A CN114612991A (en) 2022-03-22 2022-03-22 Conversion method and device for attacking face picture, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210282529.1A CN114612991A (en) 2022-03-22 2022-03-22 Conversion method and device for attacking face picture, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114612991A true CN114612991A (en) 2022-06-10

Family

ID=81864315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210282529.1A Pending CN114612991A (en) 2022-03-22 2022-03-22 Conversion method and device for attacking face picture, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114612991A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171199A (en) * 2022-09-05 2022-10-11 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination