WO2021184898A1 - Facial feature extraction method, apparatus and device - Google Patents

Facial feature extraction method, apparatus and device

Info

Publication number
WO2021184898A1
WO2021184898A1 (PCT/CN2020/140574)
Authority
WO
WIPO (PCT)
Prior art keywords
feature extraction
user
extraction model
vector
face
Prior art date
Application number
PCT/CN2020/140574
Other languages
French (fr)
Chinese (zh)
Inventor
Xu Wei (徐崴)
Original Assignee
Alipay (Hangzhou) Information Technology Co., Ltd. (支付宝(杭州)信息技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of WO2021184898A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • One or more embodiments of this specification relate to the field of computer technology, and in particular to a facial feature extraction method, apparatus, and device.
  • One or more embodiments of this specification provide a facial feature extraction method, apparatus, and device for extracting a user's facial features while ensuring the privacy of the user's facial information.
  • An embodiment of this specification provides a facial feature extraction method that uses a privacy-protecting user feature extraction model. The user feature extraction model includes an encoder and a facial feature extraction model, where the facial feature extraction model is obtained by locking together a decoder and a feature extraction model based on a convolutional neural network; the encoder and the decoder constitute an autoencoder, the encoder is connected to the decoder in the facial feature extraction model, and the decoder is connected to the feature extraction model. The method includes: inputting the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by performing characterization processing on the face image; and having the decoder in the facial feature extraction model, after receiving the encoding vector, output reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and outputs the facial feature vector of the user to be identified.
  • An embodiment of this specification provides a training method for a privacy-protecting user feature extraction model. The method includes: obtaining a first training sample set whose training samples are face images; training the initial autoencoder with the first training sample set to obtain the trained autoencoder; obtaining a second training sample set whose training samples are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder; inputting the training samples of the second training sample set into the decoder of the initial facial feature extraction model, so as to use the reconstructed face image data output by the decoder to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model, the initial facial feature extraction model being obtained by locking the decoder together with the initial feature extraction model, where the decoder is the decoder of the trained autoencoder; and generating a privacy-protecting user feature extraction model from the encoder and the trained facial feature extraction model.
  • An embodiment of this specification provides a facial feature extraction apparatus that uses a privacy-protecting user feature extraction model. The user feature extraction model includes an encoder and a facial feature extraction model, where the facial feature extraction model is obtained by locking together a decoder and a CNN-based feature extraction model; the encoder and the decoder constitute an autoencoder, the encoder is connected to the decoder in the facial feature extraction model, and the decoder is connected to the feature extraction model. The apparatus includes: an input module for inputting the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by characterizing the face image; and a facial feature vector generation module for having the decoder in the facial feature extraction model, after receiving the encoding vector, output reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and then outputs the facial feature vector of the user to be identified.
  • An embodiment of this specification provides a training apparatus for a privacy-protecting user feature extraction model. The apparatus includes: a first acquisition module configured to acquire a first training sample set whose training samples are face images; a first training module used to train the initial autoencoder with the first training sample set to obtain the trained autoencoder; a second acquisition module used to obtain a second training sample set whose training samples are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder in the trained autoencoder; a second training module used to input the training samples of the second training sample set into the decoder of the initial facial feature extraction model, so as to use the reconstructed face image data output by the decoder to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model, the initial facial feature extraction model being obtained by locking the decoder together with the initial feature extraction model, where the decoder is the decoder of the trained autoencoder; and a user feature extraction model generation module used to generate a privacy-protecting user feature extraction model from the encoder and the trained facial feature extraction model.
  • An embodiment of this specification provides a client device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores an image encoder and instructions executable by the at least one processor, the image encoder being the encoder in an autoencoder. When the instructions are executed by the at least one processor, the at least one processor can: input the face image of the user to be identified into the image encoder to obtain the encoding vector of the face image output by the image encoder, the encoding vector being vector data obtained by characterizing the face image; and send the encoding vector to the server device, so that the server device uses the facial feature extraction model to generate the facial feature vector of the user to be identified from the encoding vector, the facial feature extraction model being obtained by locking together the decoder and a CNN-based feature extraction model.
  • An embodiment of this specification provides a server device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a facial feature extraction model obtained by locking together the decoder in an autoencoder and a CNN-based feature extraction model, and also stores instructions executable by the at least one processor. When the instructions are executed by the at least one processor, the at least one processor can: obtain the encoding vector of the face image of the user to be identified, the encoding vector being vector data obtained by characterizing the face image with the encoder in the autoencoder; input the encoding vector into the decoder in the facial feature extraction model so that the decoder outputs reconstructed face image data to the feature extraction model; and have the feature extraction model perform characterization processing on the reconstructed face image data and output the facial feature vector of the user to be identified.
  • An embodiment of this specification provides a training device for a privacy-protecting facial feature extraction model, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and when the instructions are executed by the at least one processor, the at least one processor can: obtain a first training sample set whose training samples are face images; train the initial autoencoder with the first training sample set to obtain the trained autoencoder; obtain a second training sample set whose training samples are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder in the trained autoencoder; input the training samples of the second training sample set into the decoder of the initial facial feature extraction model, so as to use the reconstructed face image data output by the decoder to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model; and generate a privacy-protecting user feature extraction model from the encoder and the trained facial feature extraction model.
  • Embodiments of this specification achieve the following beneficial effects: transmitting, storing, or using the encoding vector of a face image generated by the encoder in the autoencoder does not affect the privacy or security of the user's facial information. The service provider can therefore obtain and process the encoding vector of the face image of the user to be identified and generate that user's facial feature vector without obtaining the original face image, extracting the user's facial feature vector while ensuring the privacy and security of the facial information.
  • Because the facial feature extraction model used to extract the facial feature vector is obtained by locking together the decoder in the autoencoder and the CNN-based feature extraction model, the reconstructed face image data generated by the decoder cannot leak while the model extracts the user's facial feature vector, ensuring the privacy and security of the user's facial information.
  • FIG. 1 is a schematic flowchart of a method for extracting facial features according to an embodiment of this specification
  • FIG. 2 is a schematic structural diagram of a face feature extraction model for privacy protection provided by an embodiment of this specification
  • FIG. 3 is a schematic flowchart of a training method for a facial feature extraction model for privacy protection provided by an embodiment of this specification
  • FIG. 4 is a schematic structural diagram of a face feature extraction device corresponding to FIG. 1 provided by an embodiment of this specification;
  • FIG. 5 is a schematic structural diagram, corresponding to FIG. 3, of a training apparatus for a privacy-protecting facial feature extraction model provided by an embodiment of this specification.
  • At present, the user's face image is usually preprocessed before the facial feature vector is extracted: principal component information is first extracted from the face picture, part of the detailed information is discarded, and the facial feature vector is generated from the principal component information. A facial feature vector generated this way therefore suffers from loss of facial feature information, so the accuracy of currently extracted facial feature vectors is poor.
  • FIG. 1 is a schematic flowchart of a method for extracting facial features provided by an embodiment of this specification.
  • the method uses a face feature extraction model for privacy protection to extract face feature vectors.
  • FIG. 2 is a schematic structural diagram of a face feature extraction model for privacy protection provided by an embodiment of this specification.
  • As shown in FIG. 2, the privacy-protecting user feature extraction model 201 includes an encoder 202 and a facial feature extraction model 203. The facial feature extraction model 203 is obtained by locking together a decoder 204 and a feature extraction model 205 based on a convolutional neural network, where the encoder 202 and the decoder 204 form an autoencoder. The encoder 202 is connected to the decoder 204 in the facial feature extraction model 203, and the decoder 204 is connected to the feature extraction model 205.
  • The process shown in FIG. 1 may be executed by a user facial feature extraction system or by a program carried on that system. The user facial feature extraction system may include a client device and a server device: the client device may carry the encoder of the privacy-protecting user feature extraction model, and the server device may carry its facial feature extraction model.
  • The process may include steps 102 to 104.
  • Step 102: Input the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder, where the encoding vector is vector data obtained by performing characterization processing on the face image.
  • In practice, when a user uses an application, he or she usually needs to register an account with it. When the user logs in to or unlocks the registered account, or makes a payment with it, user identification usually has to be performed on the operating user of the account (that is, the user to be identified), and the user to be identified is allowed to perform subsequent operations only after being confirmed as the authenticated user (that is, the designated user) of the account. Similarly, where a user needs to pass through an access control system, the user (the user to be identified) is usually identified first and allowed through only after being confirmed as a whitelisted user (the designated user) of the system.
  • When performing user recognition based on face recognition technology, the client device usually needs to collect the face image of the user to be identified and extract the encoding vector of that face image with the encoder carried on the device. The client device may then send the encoding vector to the server device, so that the server device generates the facial feature vector of the user to be identified from the encoding vector and performs user identification based on it.
  • The encoder in step 102 may be the encoder in an autoencoder (AE). An autoencoder is a network model structure in deep learning whose characteristic is that the input image itself serves as the supervision information: the network is trained with the goal of reconstructing the input image, thereby encoding it. Because the autoencoder needs no supervision information other than the input image, its training cost is low, making it economical and practical.
  • An autoencoder usually includes two parts: an encoder and a decoder. The encoder can encode a face image to obtain the encoding vector of the face image, and the decoder can reconstruct the face image from that encoding vector to obtain a reconstructed face image.
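  • As a concrete illustration, the following is a minimal sketch of such an autoencoder. PyTorch is assumed, and all layer types, layer sizes, and the 64x64 input resolution are illustrative choices rather than details fixed by this specification.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encoder half of the autoencoder: face image -> encoding vector."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),                   # bottleneck: the encoding vector
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class Decoder(nn.Module):
    """Decoder half: encoding vector -> reconstructed face image data."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16x16 -> 32x32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32x32 -> 64x64
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(self.fc(z).view(-1, 64, 16, 16))
```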
  • In this way, the service provider can transmit, store, and process the encoding vector of the face image without affecting the security and privacy of the identified user's facial information.
  • An autoencoder is an artificial neural network that learns its input data through unsupervised learning and can represent that data efficiently and accurately. The facial feature information contained in the encoding vector generated by the encoder in the autoencoder is therefore comprehensive and low in noise, so extracting the facial feature vector from this encoding vector improves the accuracy of the resulting facial feature vector, which in turn helps improve the accuracy of user recognition results generated from it.
  • In this embodiment of the specification, the face image of the user to be identified may be a multi-channel face image. Specifically, single-channel image data of the user to be identified can be determined first, and the multi-channel face image of the user to be identified is then generated from that single-channel data, with the image data of every channel identical to the single-channel image data.
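  • A minimal sketch of this channel replication step follows, assuming NumPy and an (H, W) array of single-channel data; the function name is illustrative.

```python
import numpy as np

def to_multichannel(gray: np.ndarray, channels: int = 3) -> np.ndarray:
    """Replicate single-channel (H, W) image data across every channel."""
    assert gray.ndim == 2, "expected single-channel (H, W) image data"
    return np.repeat(gray[..., np.newaxis], channels, axis=-1)  # (H, W, channels)

gray = np.random.rand(64, 64).astype(np.float32)   # stand-in for real image data
multi = to_multichannel(gray)
assert all(np.array_equal(multi[..., c], gray) for c in range(multi.shape[-1]))
```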
  • Step 104: After the decoder in the facial feature extraction model receives the encoding vector, it outputs reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and then outputs the facial feature vector of the user to be identified.
  • Because the training goal of the autoencoder is to minimize the difference between the reconstructed face image and the original face image, it is not trained to classify users' faces. If the encoding vector extracted by the encoder in the autoencoder were used directly as the facial feature vector of the user to be identified, the accuracy of the user recognition result would therefore be poor.
  • For this reason, a facial feature extraction model obtained by locking together the decoder in the autoencoder and a CNN-based feature extraction model can be deployed on the server device. The decoder can generate reconstructed face image data from the encoding vector of the face image of the user to be identified, and the CNN-based feature extraction model can classify that reconstructed data, so the output vector of the CNN-based feature extraction model can serve as the facial feature vector of the user to be identified, improving the accuracy of the user recognition result generated from it.
  • The CNN-based feature extraction model in the facial feature extraction model is used to extract the facial feature vector from the reconstructed face image, and it can be implemented with existing CNN-based face recognition models such as DeepFace, FaceNet, MTCNN, or RetinaFace. The facial feature extraction model therefore has good compatibility.
  • Because the decoder in the facial feature extraction model decrypts the encoding vector of the face image of the user to be identified, the reconstructed face image data obtained after this decryption is highly similar to the face image of the user to be identified, so the facial feature vector extracted from it by the CNN-based feature extraction model is accurate.
  • In practice, encryption software can be used to lock the decoder in the autoencoder together with the CNN-based feature extraction model, or the decoder and the feature extraction model can be stored in a secure hardware module of the device, so that users cannot read the reconstructed face image data output by the decoder, thereby ensuring the privacy of the user's facial information. There are many ways to lock the decoder and the feature extraction model, and this is not specifically limited; it suffices to ensure the security of the reconstructed face image data output by the decoder in the autoencoder.
  • In addition, after the service provider or another user obtains read permission for the reconstructed face image data of the user to be identified, they may also read the reconstructed face image data output by the decoder in the facial feature extraction model on the basis of that permission, which helps improve data utilization.
  • With the method in FIG. 1, the service provider can extract the facial feature vector from the encoding vector of the face image of the user to be identified and therefore never needs to obtain the face image itself. This avoids the transmission, storage, and use of the face image of the user to be identified and ensures the privacy and security of that user's facial information. Moreover, because the CNN-based feature extraction model extracts the facial feature vector from the reconstructed face image, the accuracy of the extracted facial feature vector of the user to be identified is good.
  • In this embodiment of the specification, the encoder may include the input layer, first hidden layer, and bottleneck layer of the autoencoder, and the decoder may include the autoencoder's second hidden layer and output layer. The input layer of the encoder is connected to the first hidden layer, the first hidden layer to the bottleneck layer, the bottleneck layer of the encoder to the second hidden layer of the decoder, the second hidden layer to the output layer, and the output layer to the feature extraction model.
  • The input layer may be used to receive the face image of the user to be identified. The first hidden layer may encode the face image to obtain a first feature vector. The bottleneck layer may perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, where the dimensionality of the encoding vector is smaller than that of the first feature vector. The second hidden layer may decode the encoding vector to obtain a second feature vector, and the output layer may generate reconstructed face image data from the second feature vector. The first and second hidden layers may each comprise multiple convolutional layers and may also include pooling layers and fully connected layers. The bottleneck layer serves to reduce the feature dimension: the feature vector output by the hidden layer connected to the bottleneck layer has a higher dimensionality than the feature vector output by the bottleneck layer.
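  • Continuing the autoencoder sketch above, the following usage example traces these layer roles and the bottleneck's dimensionality reduction; the dimensions are the illustrative ones chosen earlier.

```python
# Reuses the Encoder/Decoder classes from the sketch above.
import torch

enc, dec = Encoder(latent_dim=128), Decoder(latent_dim=128)
x = torch.rand(1, 3, 64, 64)   # face image received by the input layer
z = enc(x)                     # bottleneck output: the low-dimensional encoding vector
x_rec = dec(z)                 # output layer: reconstructed face image data
print(z.shape, x_rec.shape)    # torch.Size([1, 128]) torch.Size([1, 3, 64, 64])
```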
  • The CNN-based feature extraction model may include an input layer, a convolutional layer, a fully connected layer, and an output layer, where the input layer is connected to the output of the decoder and to the convolutional layer, the convolutional layer to the fully connected layer, and the fully connected layer to the output layer. The input layer may receive the reconstructed face image data output by the decoder; the convolutional layer may perform local feature extraction on the reconstructed face image data to obtain the local facial feature vector of the user to be identified; the fully connected layer may generate the facial feature vector of the user to be identified from the local facial feature vector; and the output layer may generate a face classification result from the facial feature vector output by the fully connected layer.
  • The facial feature vector of the user to be identified may be the output vector of the fully connected layer adjacent to the output layer; alternatively, when the CNN-based feature extraction model contains multiple fully connected layers, it may be the output vector of a fully connected layer separated from the output layer by N network layers. This is not specifically limited.
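  • The sketch below illustrates such a CNN-based feature extraction model, taking the facial feature vector from the fully connected layer adjacent to the classification output layer. The architecture, class name, and dimensions are assumptions for illustration; as noted above, an existing model such as FaceNet could fill this role instead.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """CNN feature extractor: reconstructed face image -> facial feature vector."""
    def __init__(self, num_classes: int = 1000, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(                     # local feature extraction
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 16 * 16, feat_dim)    # facial feature vector
        self.out = nn.Linear(feat_dim, num_classes)    # face classification output

    def forward(self, x_rec: torch.Tensor, return_features: bool = False) -> torch.Tensor:
        feat = self.fc(self.conv(x_rec))
        return feat if return_features else self.out(feat)
```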
  • In this embodiment of the specification, the facial feature extraction model may further include a user matching model, whose input may be connected to the output of the CNN-based feature extraction model in the facial feature extraction model.
  • After step 104, the method may further include: having the user matching model receive the facial feature vector of the user to be identified and the facial feature vector of the designated user, and generate, from the vector distance between the two facial feature vectors, output information indicating whether the user to be identified is the designated user, where the facial feature vector of the designated user is obtained by processing the designated user's face image with the encoder and the facial feature extraction model.
  • The vector distance between the facial feature vector of the user to be identified and that of the designated user can be used to indicate the similarity between the two vectors. Specifically, when the vector distance is less than or equal to a threshold, the user to be identified and the designated user can be determined to be the same user; when the vector distance is greater than the threshold, they can be determined to be different users.
  • The threshold can be determined according to actual needs and is not specifically limited.
  • Both the facial feature vector of the user to be identified and that of the designated user may be generated with the method in FIG. 1. Since facial feature vectors generated by the method in FIG. 1 are more accurate, this helps improve the accuracy of the user recognition result.
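  • A minimal sketch of this matching rule follows. The Euclidean metric and the example threshold value are assumptions, since the specification fixes neither.

```python
import torch

def same_user(feat_a: torch.Tensor, feat_b: torch.Tensor, threshold: float = 1.0) -> bool:
    """Same user iff the vector distance is at or below the threshold."""
    return torch.dist(feat_a, feat_b).item() <= threshold
```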
  • FIG. 3 is a schematic flowchart of a method for training a face recognition model provided by an embodiment of this specification. From a program perspective, the process may be executed by a server or by a program carried on a server. As shown in FIG. 3, the process may include steps 302 to 310.
  • Step 302: Obtain a first training sample set whose training samples are face images.
  • The training samples in the first training sample set are face images for which usage rights have been obtained, for example, public face images in face databases or face pictures authorized by their users, so that the training process of the face recognition model does not affect the privacy of users' facial information.
  • The training samples in the first training sample set may be multi-channel face images. Specifically, the single-channel image data of a face image may be determined first, and a multi-channel image generated from that single-channel data to serve as a training sample in the first training sample set, with the image data of every channel identical to the single-channel image data, thereby ensuring the consistency of the training samples in the first training sample set.
  • Step 304: Use the first training sample set to train the initial autoencoder to obtain the trained autoencoder.
  • Step 304 may specifically include: for each training sample in the first training sample set, inputting the training sample into the initial autoencoder to obtain reconstructed face image data, and optimizing the model parameters of the initial autoencoder with the goal of minimizing the image reconstruction loss, so as to obtain the trained autoencoder; the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
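  • A minimal training-loop sketch for step 304 follows, reusing the Encoder/Decoder classes from the earlier sketch. The optimizer, learning rate, and the use of mean squared error as the image reconstruction loss are assumptions, and first_training_sample_set is a hypothetical stand-in for a real loader of face image batches.

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(Encoder(), Decoder())   # initial autoencoder
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
recon_loss = nn.MSELoss()   # image reconstruction loss (difference to the sample)

for face in first_training_sample_set:      # hypothetical iterable of image batches
    optimizer.zero_grad()
    reconstructed = autoencoder(face)
    loss = recon_loss(reconstructed, face)  # minimize the reconstruction difference
    loss.backward()
    optimizer.step()
```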
  • In the autoencoder, the input layer, the first hidden layer, and the bottleneck layer constitute the encoder, and the second hidden layer and the output layer constitute the decoder. The encoder can encode a face image to obtain its encoding vector, and the decoder can decode the encoding vector generated by the encoder to obtain a reconstructed face image.
  • The function of each layer of the autoencoder may be the same as described for the embodiment of the method in FIG. 1, and is not repeated here.
  • Step 306: Obtain a second training sample set whose training samples are encoding vectors, where each encoding vector is vector data obtained by performing characterization processing on a face image with the encoder in the trained autoencoder.
  • In this embodiment of the specification, the training samples in the second training sample set may be vector data obtained by using the encoder in the trained autoencoder to characterize the face images of users who need privacy protection. Which users need privacy protection can be determined according to actual needs, for example, the operating user and the authenticated user of a registered account at an application, or the users to be identified and the whitelisted users at a face-recognition-based access control point.
  • In practice, the encoder in the trained autoencoder can be used in advance to generate and store the training samples of the second training sample set, so that in step 306 the pre-generated training samples only need to be retrieved from the database. Because the training samples stored in the database are encoding vectors of users' face images, and an encoding vector cannot reveal the appearance of the user, the service provider's transmission, storage, and processing of these training samples does not affect the privacy of users' facial information.
  • Step 308: Input the training samples in the second training sample set into the decoder of the initial facial feature extraction model, so that the reconstructed face image data output by the decoder can be used to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model; the initial facial feature extraction model is obtained by locking the decoder together with the initial feature extraction model, the decoder being the decoder in the trained autoencoder.
  • During this training, the model parameters of the decoder in the initial facial feature extraction model are not optimized; only the model parameters of the initial CNN-based feature extraction model are optimized.
  • Using the reconstructed face image data output by the decoder to train the initial CNN-based feature extraction model in the initial facial feature extraction model may specifically include: classifying the reconstructed face image data with the initial feature extraction model to obtain the predicted value of the category label of the reconstructed face image data; obtaining the preset value of the category label for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, where the classification loss is the difference between the predicted value and the preset value of the category label.
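  • The sketch below illustrates step 308, reusing the Decoder and FeatureExtractor classes from the earlier sketches. Freezing the decoder's parameters captures only the "not optimized" aspect; the cryptographic or hardware locking of the deployed model is a separate measure. Cross-entropy as the classification loss, and second_training_sample_set as a hypothetical loader of (encoding vector, category label) pairs, are assumptions.

```python
import torch
import torch.nn as nn

decoder = Decoder()                      # decoder from the trained autoencoder
for p in decoder.parameters():
    p.requires_grad = False              # decoder parameters are never optimized

extractor = FeatureExtractor()           # initial CNN-based feature extraction model
optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-3)
cls_loss = nn.CrossEntropyLoss()         # classification loss (predicted vs. preset label)

for z, label in second_training_sample_set:   # hypothetical (vector, label) loader
    optimizer.zero_grad()
    with torch.no_grad():
        x_rec = decoder(z)                    # reconstructed face image data
    loss = cls_loss(extractor(x_rec), label)
    loss.backward()
    optimizer.step()
```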
  • Step 310: Generate a privacy-protecting user feature extraction model from the encoder and the trained facial feature extraction model.
  • The input of the encoder receives the face image of the user to be identified; the output of the encoder is connected to the input of the decoder in the trained facial feature extraction model; the output of the decoder is connected to the input of the CNN-based feature extraction model in the trained facial feature extraction model; and the output of the CNN-based feature extraction model is the facial feature vector of the user to be identified.
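  • Putting the trained pieces together for step 310, a minimal composition sketch follows (the class name is illustrative). In deployment, the encoder would run on the client device while the locked decoder and feature extractor run on the server device.

```python
import torch
import torch.nn as nn

class UserFeatureExtractionModel(nn.Module):
    """Privacy-protecting pipeline: face image -> facial feature vector."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module, extractor: nn.Module):
        super().__init__()
        self.encoder, self.decoder, self.extractor = encoder, decoder, extractor

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        z = self.encoder(face)      # encoding vector (client side in deployment)
        x_rec = self.decoder(z)     # reconstructed image, kept inside the locked model
        return self.extractor(x_rec, return_features=True)

model = UserFeatureExtractionModel(Encoder(), Decoder(), FeatureExtractor())
features = model(torch.rand(1, 3, 64, 64))   # facial feature vector, shape (1, 256)
```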
  • With the method in FIG. 3, the autoencoder and the initial facial feature extraction model are trained, and a privacy-protecting facial feature extraction model is built from the trained autoencoder and the trained facial feature extraction model. Because the autoencoder needs no supervision information other than the input image during training, the training cost of the privacy-protecting facial feature extraction model is low, making it economical and practical.
  • The user feature extraction model generated by the method in FIG. 3 can be applied to user recognition scenarios. After it is used to extract a user's facial feature vector, the facial feature vector usually needs to be compared to generate the final user recognition result.
  • Before the privacy-protecting facial feature extraction model is generated in step 310, the method may further include: establishing a user matching model used to generate, from the vector distance between the first facial feature vector of the user to be identified and the second facial feature vector of the designated user, an output result indicating whether the user to be identified is the designated user; the first facial feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained facial feature extraction model, and the second facial feature vector is obtained by processing the designated user's face image with the encoder and the trained facial feature extraction model. Step 310 may then specifically include: generating a privacy-protecting user feature extraction model composed of the encoder, the trained facial feature extraction model, and the user matching model.
  • FIG. 4 is a schematic structural diagram of a facial feature extraction device corresponding to FIG. 1 provided by an embodiment of this specification.
  • The apparatus uses a privacy-protecting user feature extraction model. The user feature extraction model may include an encoder and a facial feature extraction model, the facial feature extraction model being a model obtained by locking together a decoder and a CNN-based feature extraction model, where the encoder and the decoder form an autoencoder; the encoder is connected to the decoder in the facial feature extraction model, and the decoder is connected to the feature extraction model.
  • The apparatus may include: an input module 402, used to input the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by characterizing the face image; and a facial feature vector generation module 404, used to have the decoder in the facial feature extraction model, after receiving the encoding vector, output reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and then outputs the facial feature vector of the user to be identified.
  • The encoder may include the input layer, first hidden layer, and bottleneck layer of the autoencoder, and the decoder may include the autoencoder's second hidden layer and output layer. The input layer of the encoder is connected to the first hidden layer, the first hidden layer to the bottleneck layer, the bottleneck layer of the encoder to the second hidden layer of the decoder, the second hidden layer to the output layer, and the output layer to the feature extraction model.
  • The input layer of the autoencoder may be used to receive the face image of the user to be identified; the first hidden layer may encode the face image to obtain a first feature vector; the bottleneck layer may perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, where the dimensionality of the encoding vector is less than that of the first feature vector; the second hidden layer may decode the encoding vector to obtain a second feature vector; and the output layer may generate reconstructed face image data from the second feature vector.
  • The CNN-based feature extraction model may include an input layer, a convolutional layer, and a fully connected layer, where the input layer is connected to the output of the decoder and to the convolutional layer, and the convolutional layer is connected to the fully connected layer. The input layer of the CNN-based feature extraction model may receive the reconstructed face image data output by the decoder; the convolutional layer may perform local feature extraction on the reconstructed face image data to obtain the local facial feature vector of the user to be identified; and the fully connected layer is used to generate the facial feature vector of the user to be identified from the local facial feature vector.
  • The user feature extraction model may further include a user matching model connected to the feature extraction model. In that case the apparatus may further include a user matching module configured to have the user matching model receive the facial feature vector of the user to be identified and the facial feature vector of the designated user, and generate, from the vector distance between the two facial feature vectors, output information indicating whether the user to be identified is the designated user, where the facial feature vector of the designated user is obtained by processing the designated user's face image with the encoder and the facial feature extraction model.
  • FIG. 5 is a schematic structural diagram, corresponding to FIG. 3, of a training apparatus for a privacy-protecting facial feature extraction model provided by an embodiment of this specification. As shown in FIG. 5, the apparatus may include the following modules.
  • the first obtaining module 502 is configured to obtain a first training sample set, and the training samples in the first training sample set are face images.
  • the first training module 504 is configured to use the first training sample set to train the initial autoencoder to obtain the trained autoencoder.
  • The second acquisition module 506 is configured to acquire a second training sample set whose training samples are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder in the trained autoencoder.
  • The second training module 508 is configured to input the training samples in the second training sample set into the decoder of the initial facial feature extraction model, so that the reconstructed face image data output by the decoder can be used to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model; the initial facial feature extraction model is obtained by locking the decoder together with the initial feature extraction model, the decoder being the decoder in the trained autoencoder.
  • the user feature extraction model generation module 510 is configured to generate a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
  • The first training module 504 may be specifically configured to: for each training sample in the first training sample set, input the training sample into the initial autoencoder to obtain reconstructed face image data, and optimize the model parameters of the initial autoencoder with the goal of minimizing the image reconstruction loss to obtain the trained autoencoder; the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
  • Using the reconstructed face image data output by the decoder to train the initial CNN-based feature extraction model in the initial facial feature extraction model may specifically include: classifying the reconstructed face image data with the initial feature extraction model to obtain the predicted value of the category label of the reconstructed face image data; obtaining the preset value of the category label for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, where the classification loss is the difference between the predicted value and the preset value of the category label.
  • The apparatus in FIG. 5 may further include a user matching model establishment module configured to establish a user matching model, the user matching model being used to generate, from the vector distance between the first facial feature vector of the user to be identified and the second facial feature vector of the designated user, output information indicating whether the user to be identified is the designated user; the first facial feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained facial feature extraction model, and the second facial feature vector is obtained by processing the designated user's face image with the encoder and the trained facial feature extraction model.
  • the user feature extraction model generation module 510 may be specifically used to generate a user feature extraction model for privacy protection composed of the encoder, the trained facial feature extraction model, and the user matching model.
  • the embodiment of this specification also provides a client device corresponding to the above method.
  • The client device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores an image encoder and instructions executable by the at least one processor, the image encoder being the encoder in an autoencoder. When the instructions are executed by the at least one processor, the at least one processor can: input the face image of the user to be identified into the image encoder to obtain the encoding vector of the face image output by the image encoder, the encoding vector being vector data obtained by characterizing the face image; and send the encoding vector to the server device, so that the server device uses the facial feature extraction model to generate the facial feature vector of the user to be identified from the encoding vector, the facial feature extraction model being a model obtained by locking together the decoder in the autoencoder and a CNN-based feature extraction model.
  • With this client device, the encoder of the autoencoder carried on the device generates the encoding vector of the face image of the user to be identified, so the client device can send that encoding vector to the server device for user identification without sending the face image itself. This avoids transmitting the face image of the user to be identified and ensures the privacy and security of that user's facial information.
  • the embodiment of this specification also provides a server device corresponding to the above method.
  • The server device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores a facial feature extraction model obtained by locking together the decoder in an autoencoder and a CNN-based feature extraction model. The memory also stores instructions executable by the at least one processor; when the instructions are executed, the at least one processor can obtain the encoding vector of the face image of the user to be identified, input the encoding vector into the decoder in the facial feature extraction model so that the decoder outputs reconstructed face image data to the feature extraction model, and have the feature extraction model perform characterization processing on the reconstructed face image data and output the facial feature vector of the user to be identified.
  • With this server device, the facial feature vector of the user to be identified can be generated from the encoding vector of that user's face image by the facial feature extraction model carried on the device, so the server device can perform user identification without obtaining the face image of the user to be identified. This not only avoids the transmission of the face image of the user to be identified but also prevents the server device from storing and processing it, improving the privacy and security of the facial information of the user to be identified.
  • the embodiment of this specification also provides a training device for the facial feature extraction model for privacy protection corresponding to the method in FIG. 3.
  • The device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor. When the instructions are executed by the at least one processor, the at least one processor can: obtain a first training sample set whose training samples are face images.
  • the initial autoencoder is trained by using the first training sample set to obtain the trained autoencoder.
  • A second training sample set is obtained, whose training samples are encoding vectors; each encoding vector is vector data obtained by characterizing a face image with the encoder in the trained autoencoder.
  • The training samples in the second training sample set are input into the decoder of the initial facial feature extraction model, so that the reconstructed face image data output by the decoder can be used to train the initial CNN-based feature extraction model in the initial facial feature extraction model and obtain the trained facial feature extraction model; the initial facial feature extraction model is obtained by locking the decoder together with the initial feature extraction model, the decoder being the decoder in the trained autoencoder.
  • Finally, a privacy-protecting user feature extraction model is generated from the encoder and the trained facial feature extraction model.
  • An improvement to a technology can be clearly distinguished as a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, or switches) or a software improvement (an improvement to a method flow). With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be realized by a hardware entity module.
  • For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. The programming is written in a specific programming language called a hardware description language (HDL), of which there is not just one kind but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), HDCal, JHDL, Lava, Lola, MyHDL, PALASM, RHDL, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language), and Verilog.
  • The controller can be implemented in any suitable manner. For example, the controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller can also be implemented as part of the memory's control logic. In addition to implementing a controller purely as computer-readable program code, it is entirely possible to program the method steps so that the controller realizes the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices included in it for realizing various functions can also be regarded as structures within the hardware component; or the devices for realizing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
  • a typical implementation device is a computer.
  • The computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
  • One or more embodiments of this specification can be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by that processor produce a device that realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in that memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on it to produce computer-implemented processing, the instructions executed there providing steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • In a typical configuration, the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent storage in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage, or other magnetic storage devices, or any other non-transmission media that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • Program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of this specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Facial feature extraction method, apparatus, and device for privacy protection. The method comprises: inputting a facial image of a user to be identified into an encoder to obtain an encoding vector of the facial image output by the encoder, the encoding vector being vector data obtained by performing feature processing on the facial image (102); and, after a decoder in a facial feature extraction model receives the encoding vector, outputting reconstructed facial image data to a feature extraction model in the facial feature extraction model, so that the feature extraction model performs feature processing on the reconstructed facial image data and then outputs a facial feature vector of the user to be identified (104).

Description

Facial feature extraction method, apparatus and device
Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a facial feature extraction method, apparatus, and device.
Background
With the development of computer technology and optical imaging technology, user identification based on face recognition technology is becoming increasingly popular. At present, the face image of a user to be identified, collected by a client device, usually needs to be sent to a server device, so that the server device can extract a facial feature vector from that face image and generate a user identification result based on the facial feature vector. Since the face image of the user to be identified is sensitive user information, this approach of sending the face image to another device for feature extraction carries the risk of leaking the user's sensitive information.
Based on this, how to extract a user's facial features while ensuring the privacy of the user's facial information has become an urgent technical problem to be solved.
Summary of the Invention
In view of this, one or more embodiments of this specification provide a facial feature extraction method, apparatus, and device for extracting a user's facial features while ensuring the privacy of the user's facial information.
To solve the above technical problem, the embodiments of this specification are implemented as follows.
An embodiment of this specification provides a facial feature extraction method. The method uses a user feature extraction model for privacy protection. The user feature extraction model includes an encoder and a facial feature extraction model, where the facial feature extraction model is obtained by locking a decoder and a feature extraction model based on a convolutional neural network, and the encoder and the decoder constitute an autoencoder. The encoder is connected to the decoder in the facial feature extraction model, and the decoder is connected to the feature extraction model. The method includes: inputting a face image of a user to be identified into the encoder to obtain an encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by performing feature processing on the face image; and, after the decoder in the facial feature extraction model receives the encoding vector, outputting reconstructed face image data to the feature extraction model, so that the feature extraction model performs feature processing on the reconstructed face image data and then outputs a facial feature vector of the user to be identified.
An embodiment of this specification provides a training method for a user feature extraction model for privacy protection. The method includes: obtaining a first training sample set, where the training samples in the first training sample set are face images; training an initial autoencoder with the first training sample set to obtain a trained autoencoder; obtaining a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by performing feature processing on a face image with the encoder in the trained autoencoder; inputting the training samples in the second training sample set into the decoder of an initial facial feature extraction model, so that the reconstructed face image data output by the decoder is used to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model, obtaining a trained facial feature extraction model, where the initial facial feature extraction model is obtained by locking the decoder and the initial feature extraction model, and the decoder is the decoder in the trained autoencoder; and generating a user feature extraction model for privacy protection according to the encoder and the trained facial feature extraction model.
An embodiment of this specification provides a facial feature extraction apparatus. The apparatus uses a user feature extraction model for privacy protection. The user feature extraction model includes an encoder and a facial feature extraction model, where the facial feature extraction model is obtained by locking a decoder and a feature extraction model based on a convolutional neural network, and the encoder and the decoder constitute an autoencoder. The encoder is connected to the decoder in the facial feature extraction model, and the decoder is connected to the feature extraction model. The apparatus includes: an input module, configured to input the face image of a user to be identified into the encoder to obtain an encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by performing feature processing on the face image; and a facial feature vector generation module, configured to cause the decoder in the facial feature extraction model, after receiving the encoding vector, to output reconstructed face image data to the feature extraction model, so that the feature extraction model performs feature processing on the reconstructed face image data and then outputs a facial feature vector of the user to be identified.
An embodiment of this specification provides a training apparatus for a user feature extraction model for privacy protection. The apparatus includes: a first acquisition module, configured to obtain a first training sample set, where the training samples in the first training sample set are face images; a first training module, configured to train an initial autoencoder with the first training sample set to obtain a trained autoencoder; a second acquisition module, configured to obtain a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by performing feature processing on a face image with the encoder in the trained autoencoder; a second training module, configured to input the training samples in the second training sample set into the decoder of an initial facial feature extraction model, so that the reconstructed face image data output by the decoder is used to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model, obtaining a trained facial feature extraction model, where the initial facial feature extraction model is obtained by locking the decoder and the initial feature extraction model, and the decoder is the decoder in the trained autoencoder; and a user feature extraction model generation module, configured to generate a user feature extraction model for privacy protection according to the encoder and the trained facial feature extraction model.
An embodiment of this specification provides a client device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores an image encoder and instructions executable by the at least one processor, where the image encoder is the encoder of an autoencoder, and the instructions are executed by the at least one processor to enable the at least one processor to: input the face image of a user to be identified into the image encoder to obtain an encoding vector of the face image output by the image encoder, the encoding vector being vector data obtained by performing feature processing on the face image; and send the encoding vector to a server device, so that the server device uses a facial feature extraction model to generate a facial feature vector of the user to be identified from the encoding vector, where the facial feature extraction model is obtained by locking the decoder of the autoencoder and a feature extraction model based on a convolutional neural network.
An embodiment of this specification provides a server device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a facial feature extraction model obtained by locking the decoder of an autoencoder and a feature extraction model based on a convolutional neural network, and also stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: obtain an encoding vector of the face image of a user to be identified, the encoding vector being vector data obtained by performing feature processing on the face image with the encoder of the autoencoder; and input the encoding vector into the decoder in the facial feature extraction model, whereupon the decoder outputs reconstructed face image data to the feature extraction model, so that the feature extraction model performs feature processing on the reconstructed face image data and then outputs the facial feature vector of the user to be identified.
An embodiment of this specification provides a training device for a facial feature extraction model for privacy protection, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: obtain a first training sample set, where the training samples in the first training sample set are face images; train an initial autoencoder with the first training sample set to obtain a trained autoencoder; obtain a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by performing feature processing on a face image with the encoder in the trained autoencoder; input the training samples in the second training sample set into the decoder of an initial facial feature extraction model, so that the reconstructed face image data output by the decoder is used to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model, obtaining a trained facial feature extraction model, where the initial facial feature extraction model is obtained by locking the decoder and the initial feature extraction model, and the decoder is the decoder in the trained autoencoder; and generate a user feature extraction model for privacy protection according to the encoder and the trained facial feature extraction model.
An embodiment of this specification can achieve the following beneficial effects. Since the transmission, storage, or use of the encoding vector of a face image generated by the encoder of the autoencoder does not affect the privacy or security of the user's facial information, a service provider can obtain and process the encoding vector of the face image of the user to be identified to generate the facial feature vector of the user to be identified, without obtaining the original face image. The user's facial feature vector can thus be extracted while the privacy and security of the user's facial information are guaranteed.
Moreover, since the facial feature extraction model used to extract the facial feature vector is obtained by locking the decoder of the autoencoder and the convolutional-neural-network-based feature extraction model, the model does not leak the reconstructed face image data generated by the decoder while extracting the user's facial feature vector, which ensures the privacy and security of the user's facial information.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of one or more embodiments of this specification and constitute a part of this specification. The exemplary embodiments of this specification and their descriptions are used to explain one or more embodiments of this specification and do not constitute an improper limitation thereof. In the drawings:
FIG. 1 is a schematic flowchart of a facial feature extraction method provided by an embodiment of this specification;
FIG. 2 is a schematic structural diagram of a facial feature extraction model for privacy protection provided by an embodiment of this specification;
FIG. 3 is a schematic flowchart of a training method for a facial feature extraction model for privacy protection provided by an embodiment of this specification;
FIG. 4 is a schematic structural diagram of a facial feature extraction apparatus corresponding to FIG. 1 provided by an embodiment of this specification;
FIG. 5 is a schematic structural diagram of a training apparatus for a facial feature extraction model for privacy protection corresponding to FIG. 3 provided by an embodiment of this specification.
Detailed Description
To make the objectives, technical solutions, and advantages of one or more embodiments of this specification clearer, the technical solutions of one or more embodiments of this specification will be described clearly and completely below in conjunction with specific embodiments of this specification and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this specification, rather than all of them. Based on the embodiments in this specification, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of one or more embodiments of this specification.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the accompanying drawings.
When performing user identification based on face recognition technology, the face image of the user to be identified usually needs to be sent to a service provider, so that the service provider can extract a facial feature vector from the face image and perform user identification based on that facial feature vector. Since this approach requires the service provider to obtain, store, or process the user's face image, it easily affects the privacy and security of the user's facial information.
Moreover, at present, when a facial feature vector is extracted from a user's face image, the image is usually preprocessed before extraction. For example, in a face recognition method based on principal component analysis (PCA), principal component information is first extracted from the face picture and part of the detail information is discarded, and the facial feature vector is generated from the principal component information. A facial feature vector generated in this way suffers from the loss of facial feature information, so the accuracy of the currently extracted facial feature vectors is also poor.
FIG. 1 is a schematic flowchart of a facial feature extraction method provided by an embodiment of this specification. The method uses a facial feature extraction model for privacy protection to extract facial feature vectors.
FIG. 2 is a schematic structural diagram of a facial feature extraction model for privacy protection provided by an embodiment of this specification. As shown in FIG. 2, the user feature extraction model 201 for privacy protection includes an encoder 202 and a facial feature extraction model 203. The facial feature extraction model 203 is obtained by locking a decoder 204 and a convolutional-neural-network-based feature extraction model 205, where the encoder 202 and the decoder 204 constitute an autoencoder. The encoder 202 is connected to the decoder 204 in the facial feature extraction model 203, and the decoder 204 is connected to the feature extraction model 205.
From a program perspective, the execution body of the process shown in FIG. 1 may be a user facial feature extraction system or a program carried on such a system. The user facial feature extraction system may include a client device and a server device, where the client device may carry the encoder of the facial feature extraction model for privacy protection, and the server device may carry its facial feature extraction model.
As shown in FIG. 1, the process may include steps 102 to 104.
Step 102: Input the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by performing feature processing on the face image.
In the embodiments of this specification, when using various applications, a user usually needs to register an account with each application. In scenarios where the user logs in to or unlocks the registered account, or uses it to make a payment, the operating user of the registered account (i.e., the user to be identified) usually needs to be identified, and is allowed to perform subsequent operations only after being determined to be the authenticated user (i.e., the designated user) of the account. Similarly, in a scenario where a user needs to pass through an access control system, the user usually needs to be identified and is allowed to pass only after being determined to be a whitelisted user (i.e., the designated user) of the access control system.
When identifying the user to be identified based on face recognition technology, the client device usually needs to collect the face image of the user to be identified and extract the encoding vector of that image with the encoder it carries. The client device may send the encoding vector to the server device, so that the server device generates the facial feature vector of the user to be identified from the encoding vector and then performs user identification based on it.
The encoder in step 102 may be the encoder of an autoencoder (AE). An autoencoder is a network model structure in deep learning whose characteristic is that the input image itself can be used as supervision information and the network can be trained with the goal of reconstructing the input image, thereby achieving the purpose of encoding the input image. Since an autoencoder needs no information other than the input image as supervision information during training, its training cost is low, making it economical and practical.
An autoencoder usually consists of two parts: an encoder and a decoder. The encoder can be used to encode a face image to obtain the encoding vector of the face image, and the decoder can reconstruct the face image from that encoding vector to obtain a reconstructed face image.
Since the encoding vector of a face image generated by the encoder of the autoencoder is vector data obtained by performing feature processing on the face image, and the encoding vector cannot reveal the appearance of the user to be identified, the service provider's transmission, storage, and processing of the encoding vector does not affect the security or privacy of the facial information of the user to be identified.
In the embodiments of this specification, an autoencoder is an artificial neural network that can learn its input data through unsupervised learning and represent it efficiently and accurately. The encoding vector of a face image generated by its encoder therefore contains fairly comprehensive facial feature information with little noise, so extracting a facial feature vector from such an encoding vector improves the accuracy of the resulting facial feature vector, which in turn helps improve the accuracy of the user identification result generated from it.
In the embodiments of this specification, the face image of the user to be identified may be a multi-channel face image. In practical applications, when the face image collected by the user device is a single-channel face image, the single-channel image data of the user to be identified may be determined first, and a multi-channel image may be generated from the single-channel image data so that the encoder of the autoencoder can process the multi-channel face image, where the image data of each channel of the multi-channel face image is identical to the single-channel image data.
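As an illustration only (the patent prescribes no implementation; Python with NumPy and the 112x112 crop size are assumptions introduced here), the single-channel-to-multi-channel conversion described above could be sketched as follows:

```python
import numpy as np

def to_multi_channel(single_channel: np.ndarray, num_channels: int = 3) -> np.ndarray:
    """Replicate single-channel face image data (H, W) into a multi-channel
    image (H, W, C) whose channels are all identical to the input."""
    assert single_channel.ndim == 2, "expected a single-channel (H, W) image"
    return np.repeat(single_channel[:, :, np.newaxis], num_channels, axis=2)

# Example: a grayscale 112x112 face crop becomes a 3-channel encoder input.
gray_face = np.zeros((112, 112), dtype=np.uint8)
multi_face = to_multi_channel(gray_face)  # shape (112, 112, 3)
```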
Step 104: After receiving the encoding vector, the decoder in the facial feature extraction model outputs reconstructed face image data to the feature extraction model, so that the feature extraction model performs feature processing on the reconstructed face image data and then outputs the facial feature vector of the user to be identified.
In the embodiments of this specification, since the training goal of the autoencoder is to minimize the difference between the reconstructed face image and the original face image, rather than to classify user faces, directly using the encoding vector extracted by the encoder of the autoencoder as the facial feature vector of the user to be identified for user identification would yield poor identification accuracy.
In the embodiments of this specification, a facial feature extraction model obtained by locking the decoder of the autoencoder and a feature extraction model based on a convolutional neural network may be deployed on the server device. Since the decoder of the autoencoder can generate reconstructed face image data from the encoding vector of the face image of the user to be identified, and the convolutional-neural-network-based feature extraction model can classify that reconstructed face image data, the output vector of the convolutional-neural-network-based feature extraction model can be used as the facial feature vector of the user to be identified, improving the accuracy of the user identification result generated from it.
In the embodiments of this specification, since the convolutional-neural-network-based feature extraction model in the facial feature extraction model is used to extract facial feature vectors from reconstructed face images, it can be implemented with existing convolutional-neural-network-based face recognition models such as DeepFace, FaceNet, MTCNN, or RetinaFace. The facial feature extraction model therefore has good compatibility.
Moreover, the reconstructed face image data obtained after the decoder in the facial feature extraction model decodes the encoding vector of the face image of the user to be identified is highly similar to that face image, so the facial feature vector of the user to be identified extracted by the convolutional-neural-network-based feature extraction model is quite accurate.
In the embodiments of this specification, encryption software may be used to lock the decoder of the autoencoder and the convolutional-neural-network-based feature extraction model; alternatively, the decoder and the feature extraction model may be stored in a secure hardware module of the device, so that users cannot read the reconstructed face image data output by the decoder, thereby ensuring the privacy of the user's facial information. There are many ways to lock the decoder of the autoencoder and the feature extraction model, which are not specifically limited here, as long as the security of the reconstructed face image data output by the decoder is guaranteed.
In practical applications, once a service provider or another user has obtained read permission for the reconstructed face image data of the user to be identified, it may also use that permission to obtain the reconstructed face image data output by the decoder in the facial feature extraction model, which helps improve data utilization.
It should be understood that the order of some steps of the method described in one or more embodiments of this specification may be exchanged according to actual needs, or some of the steps may be omitted or deleted.
With the method in FIG. 1, since the service provider can extract the facial feature vector from the encoding vector of the face image of the user to be identified, it does not need to obtain the face image itself. This avoids the transmission, storage, and use of the face image by the service provider and ensures the privacy and security of the facial information of the user to be identified.
Moreover, since the reconstructed face image data generated by the decoder in the facial feature extraction model is highly similar to the face image of the user to be identified, the facial feature vector extracted from the reconstructed image by the convolutional-neural-network-based feature extraction model is quite accurate.
Based on the method in FIG. 1, the embodiments of this specification also provide some specific implementations of the method, which are described below.
In the embodiments of this specification, the encoder may include the input layer, first hidden layer, and bottleneck layer of the autoencoder, and the decoder may include the second hidden layer and output layer of the autoencoder.
The input layer of the encoder is connected to the first hidden layer, the first hidden layer is connected to the bottleneck layer, the bottleneck layer of the encoder is connected to the second hidden layer of the decoder, the second hidden layer is connected to the output layer, and the output layer is connected to the feature extraction model.
The input layer may be used to receive the face image of the user to be identified.
The first hidden layer may be used to encode the face image to obtain a first feature vector.
The bottleneck layer may be used to perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, where the number of dimensions of the encoding vector is smaller than that of the first feature vector.
The second hidden layer may be used to decode the encoding vector to obtain a second feature vector.
The output layer may be used to generate reconstructed face image data from the second feature vector.
In the embodiments of this specification, since the encoder of the autoencoder needs to encode images and the decoder needs to generate reconstructed face images, the first hidden layer and the second hidden layer may each include multiple convolutional layers to guarantee the encoding and decoding effects, and may also include pooling layers and fully connected layers. The bottleneck layer can be used to reduce the feature dimension: the feature vectors output by the hidden layers connected to the bottleneck layer all have more dimensions than the feature vector output by the bottleneck layer.
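A minimal sketch of this layout is given below; the framework (PyTorch), channel counts, 112x112 input size, and 128-dimensional bottleneck are all assumptions made for illustration rather than the patent's own specification. The same assumed components are reused by the later sketches.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Input layer + first hidden layer + bottleneck layer (hypothetical sizes)."""
    def __init__(self, code_dim: int = 128):
        super().__init__()
        self.hidden = nn.Sequential(                         # first hidden layer
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.bottleneck = nn.Linear(64 * 28 * 28, code_dim)  # dimensionality reduction

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        return self.bottleneck(self.hidden(face))            # encoding vector

class Decoder(nn.Module):
    """Second hidden layer + output layer: rebuilds the face from the code."""
    def __init__(self, code_dim: int = 128):
        super().__init__()
        self.hidden = nn.Sequential(                         # second hidden layer
            nn.Linear(code_dim, 64 * 28 * 28), nn.ReLU(),
            nn.Unflatten(1, (64, 28, 28)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.output = nn.Sequential(                         # output layer
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        return self.output(self.hidden(code))                # reconstructed face image data
```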
In the embodiments of this specification, the feature extraction model based on the convolutional neural network may include an input layer, a convolutional layer, a fully connected layer, and an output layer, where the input layer is connected to the output of the decoder, the input layer is also connected to the convolutional layer, the convolutional layer is connected to the fully connected layer, and the fully connected layer is connected to the output layer.
The input layer may be used to receive the reconstructed face image data output by the decoder; the convolutional layer may be used to perform local feature extraction on the reconstructed face image data to obtain local facial feature vectors of the user to be identified; and the fully connected layer may be used to generate the facial feature vector of the user to be identified from the local facial feature vectors.
The output layer may be used to generate a face classification result from the facial feature vector of the user to be identified output by the fully connected layer.
In the embodiments of this specification, the facial feature vector of the user to be identified may be the output vector of the fully connected layer adjacent to the output layer; alternatively, when the convolutional-neural-network-based feature extraction model has multiple fully connected layers, it may be the output vector of a fully connected layer separated from the output layer by N network layers. This is not specifically limited.
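As a sketch only (the patent allows any CNN face recognition backbone, such as FaceNet, so the layer sizes and the 512-dimensional feature vector here are assumptions), a feature extraction model that exposes the output of the fully connected layer adjacent to the output layer as the facial feature vector could look like this:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """CNN that classifies faces; the penultimate FC output is the feature vector."""
    def __init__(self, num_identities: int, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(                       # local feature extraction
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc = nn.Linear(128 * 4 * 4, feat_dim)       # facial feature vector
        self.out = nn.Linear(feat_dim, num_identities)   # classification output layer

    def forward(self, recon_face: torch.Tensor):
        feature = self.fc(self.conv(recon_face))
        return feature, self.out(feature)                # (feature vector, logits)
```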
In the embodiments of this specification, the facial feature vector of the user to be identified generated in step 104 can be used in user identification scenarios. Therefore, the facial feature extraction model may further include a user matching model, whose input may be connected to the output of the convolutional-neural-network-based feature extraction model in the facial feature extraction model.
After step 104, the method may further include: causing the user matching model to receive the facial feature vector of the user to be identified and the facial feature vector of a designated user, and to generate, from the vector distance between the two facial feature vectors, output information indicating whether the user to be identified is the designated user, where the facial feature vector of the designated user is obtained by processing the face image of the designated user with the encoder and the facial feature extraction model.
In the embodiments of this specification, the vector distance between the facial feature vector of the user to be identified and that of the designated user can represent the similarity between the two. Specifically, when the vector distance is less than or equal to a threshold, it can be determined that the user to be identified and the designated user are the same user; when the vector distance is greater than the threshold, it can be determined that they are different users. The threshold can be set according to actual needs and is not specifically limited here.
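The patent fixes neither the distance metric nor the threshold; assuming Euclidean distance on L2-normalized feature vectors and a purely hypothetical threshold of 1.0, the matching step could be sketched as:

```python
import torch
import torch.nn.functional as F

def is_same_user(feat_a: torch.Tensor, feat_b: torch.Tensor,
                 threshold: float = 1.0) -> bool:
    """Decide whether two facial feature vectors belong to the same user
    by comparing their vector distance against a threshold."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    return torch.dist(a, b).item() <= threshold  # same user iff distance <= threshold
```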
In the embodiments of this specification, the method in FIG. 1 may be used to generate the facial feature vector of the user to be identified and that of the designated user. Since facial feature vectors generated with the method in FIG. 1 are relatively accurate, this helps improve the accuracy of the user identification result.
FIG. 3 is a schematic flowchart of a training method for a face recognition model provided by an embodiment of this specification. From a program perspective, the execution body of the process may be a server or a program carried on a server. As shown in FIG. 3, the process may include steps 302 to 310.
Step 302: Obtain a first training sample set, where the training samples in the first training sample set are face images.
In the embodiments of this specification, the training samples in the first training sample set are face images for which usage rights have been obtained, for example, face images from public face databases or face pictures authorized by users, to ensure that the training process of the face recognition model does not affect the privacy of users' facial information.
In the embodiments of this specification, the training samples in the first training sample set may be multi-channel face images. When a face image from a public face database or a user-authorized face picture is a single-channel face image, the single-channel image data of that face image may be determined first, and a multi-channel image may be generated from the single-channel image data and used as a training sample in the first training sample set, where the image data of each channel of the multi-channel face image is identical to the single-channel image data, thereby ensuring the consistency of the training samples in the first training sample set.
Step 304: Train an initial autoencoder with the first training sample set to obtain a trained autoencoder.
In the embodiments of this specification, step 304 may specifically include: for each training sample in the first training sample set, inputting the training sample into the initial autoencoder to obtain reconstructed face image data, and optimizing the model parameters of the initial autoencoder with the goal of minimizing the image reconstruction loss to obtain the trained autoencoder, where the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
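As an illustrative sketch of step 304 (the mean-squared-error loss and the Adam optimizer are assumptions; the patent only requires minimizing the difference between the reconstruction and the input), reusing the Encoder and Decoder sketched earlier:

```python
import torch
import torch.nn as nn

def train_autoencoder(encoder, decoder, loader, epochs: int = 10):
    """Optimize encoder + decoder to minimize the image reconstruction loss."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.MSELoss()                   # image reconstruction loss
    for _ in range(epochs):
        for faces in loader:                 # faces: (B, 3, H, W) training samples
            recon = decoder(encoder(faces))
            loss = loss_fn(recon, faces)     # difference from the input itself
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```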
In the embodiments of this specification, the input layer, first hidden layer, and bottleneck layer of the autoencoder constitute the encoder, and the second hidden layer and output layer constitute the decoder. The encoder can be used to encode a face image to obtain the encoding vector of the face image, and the decoder can decode the encoding vector generated by the encoder to obtain a reconstructed face image. The functions of the layers of the autoencoder may be the same as those described in the embodiment of the method in FIG. 1 and are not repeated here.
Step 306: Obtain a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by performing feature processing on a face image with the encoder of the trained autoencoder.
In the embodiments of this specification, the training samples in the second training sample set may be vector data obtained by using the encoder of the trained autoencoder to perform feature processing on the face images of users requiring privacy protection. The users requiring privacy protection can be determined according to actual needs, for example, the operating users and authenticated users of accounts registered with an application, or the users to be identified and whitelisted users at an access control point based on face recognition technology.
In the embodiments of this specification, the encoder of the trained autoencoder may be used in advance to generate and store the training samples of the second training sample set. When step 306 is performed, the pre-generated training samples only need to be retrieved from the database. Since the training samples of the second training sample set stored in the database are encoding vectors of user face images, and such encoding vectors cannot reveal the appearance of the users to be identified, the service provider's transmission, storage, and processing of these training samples does not affect the privacy of the users' facial information.
Step 308: Input the training samples in the second training sample set into the decoder of the initial facial feature extraction model, so that the reconstructed face image data output by the decoder is used to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model, obtaining a trained facial feature extraction model; the initial facial feature extraction model is obtained by locking the decoder and the initial feature extraction model, and the decoder is the decoder of the trained autoencoder.
In the embodiments of this specification, when training the initial facial feature extraction model, there is no need to optimize the model parameters of the decoder in the model; only the model parameters of the convolutional-neural-network-based initial feature extraction model need to be optimized.
Using the reconstructed face image data output by the decoder to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model may specifically include: classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data; obtaining a preset category label for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, where the classification loss is the difference between the predicted category label and the preset category label.
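The following sketch of step 308 is again assumption-laden (cross-entropy as the classification loss; freezing the decoder's parameters stands in for the "locking", which in practice is an access-control measure rather than just a gradient switch), reusing the Decoder and FeatureExtractor sketched earlier:

```python
import torch
import torch.nn as nn

def train_feature_extractor(decoder, extractor, loader, epochs: int = 10):
    """Train only the CNN feature extraction model on reconstructed faces."""
    for p in decoder.parameters():           # decoder stays fixed (locked)
        p.requires_grad_(False)
    opt = torch.optim.Adam(extractor.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()          # classification loss
    for _ in range(epochs):
        for codes, labels in loader:         # codes: encoding vectors; labels: preset category labels
            recon = decoder(codes)           # reconstructed face image data
            _, logits = extractor(recon)     # predicted category labels
            loss = loss_fn(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return extractor
```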
Step 310: Generate a user feature extraction model for privacy protection according to the encoder and the trained facial feature extraction model.
In the embodiments of this specification, the input of the encoder is used to receive the face image of the user to be identified, the output of the encoder is connected to the input of the decoder in the trained facial feature extraction model, the output of the decoder is connected to the input of the convolutional-neural-network-based feature extraction model in the trained facial feature extraction model, and the output of the convolutional-neural-network-based feature extraction model is the facial feature vector of the user to be identified.
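Putting the pieces together, a hypothetical end-to-end wiring of the generated user feature extraction model (reusing the Encoder, Decoder, and FeatureExtractor sketches above; the identity count of 1000 is invented for the example) might read:

```python
import torch

encoder = Encoder()
decoder = Decoder()
extractor = FeatureExtractor(num_identities=1000)

@torch.no_grad()
def extract_face_feature(face_image: torch.Tensor) -> torch.Tensor:
    code = encoder(face_image)       # runs on the client device
    recon = decoder(code)            # stays inside the locked server-side model
    feature, _ = extractor(recon)    # facial feature vector of the user
    return feature
```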
In the embodiments of this specification, the autoencoder and the initial facial feature extraction model are trained, and the facial feature extraction model for privacy protection is built from the trained autoencoder and the trained initial facial feature extraction model. Since the autoencoder needs no information other than the input image as supervision information during network training, the training cost of this facial feature extraction model for privacy protection can be reduced, making it economical and practical.
Based on the method in FIG. 3, the embodiments of this specification also provide some specific implementations of the method, which are described below.
In the embodiments of this specification, the user feature extraction model generated by the method in FIG. 3 can be applied to user identification scenarios. After the user feature extraction model is used to extract users' facial feature vectors, the facial feature vectors usually still need to be compared to generate the final user identification result.
Therefore, before the facial feature extraction model for privacy protection is generated in step 310, the method may further include: establishing a user matching model, where the user matching model is used to generate, from the vector distance between a first facial feature vector of the user to be identified and a second facial feature vector of a designated user, an output result indicating whether the user to be identified is the designated user; the first facial feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained facial feature extraction model, and the second facial feature vector is obtained by processing the face image of the designated user with the encoder and the trained facial feature extraction model.
Step 310 may then specifically include: generating a user feature extraction model for privacy protection composed of the encoder, the trained facial feature extraction model, and the user matching model.
基于同样的思路,本说明书实施例还提供了上述方法对应的装置。图4为本说明书实施例提供的对应于图1的一种人脸特征提取装置的结构示意图。所述装置使用了用 于隐私保护的用户特征提取模型,所述用户特征提取模型可以包括:编码器及人脸特征提取模型,所述人脸特征提取模型是通过对解码器及基于卷积神经网络的特征提取模型进行锁定而得到的模型,其中,所述编码器和所述解码器组成自编码器;所述编码器与所述人脸特征提取模型中的解码器连接,所述解码器与所述特征提取模型连接;所述装置可以包括:输入模块402,可以用于将待识别用户的人脸图像输入所述编码器,得到所述编码器输出的所述人脸图像的编码向量,所述编码向量为对所述人脸图像进行特征化处理后得到的向量数据;人脸特征向量生成模块404,可以用于令所述人脸特征提取模型中的解码器接收所述编码向量后,向所述特征提取模型输出重建人脸图像数据;以便于所述特征提取模型对所述重建人脸图像数据进行特征化处理后,输出所述待识别用户的人脸特征向量。Based on the same idea, the embodiment of this specification also provides a device corresponding to the above method. FIG. 4 is a schematic structural diagram of a facial feature extraction device corresponding to FIG. 1 provided by an embodiment of this specification. The device uses a user feature extraction model for privacy protection. The user feature extraction model may include an encoder and a face feature extraction model. The face feature extraction model is based on a decoder and a convolutional neural network. A model obtained by locking the feature extraction model of the network, wherein the encoder and the decoder form a self-encoder; the encoder is connected to the decoder in the face feature extraction model, and the decoder Connected to the feature extraction model; the device may include: an input module 402, which may be used to input the face image of the user to be identified into the encoder to obtain the encoding vector of the face image output by the encoder The encoding vector is vector data obtained after characterizing the face image; the face feature vector generating module 404 may be used to make the decoder in the face feature extraction model receive the encoding vector Then, output the reconstructed face image data to the feature extraction model; so that the feature extraction model performs characterization processing on the reconstructed face image data, and then outputs the face feature vector of the user to be identified.
Optionally, the encoder may include the input layer, a first hidden layer, and a bottleneck layer of the autoencoder, and the decoder may include a second hidden layer and the output layer of the autoencoder; the input layer of the encoder is connected to the first hidden layer, the first hidden layer is connected to the bottleneck layer, the bottleneck layer of the encoder is connected to the second hidden layer of the decoder, the second hidden layer is connected to the output layer, and the output layer is connected to the feature extraction model.
The input layer of the autoencoder may be configured to receive the face image of the user to be identified; the first hidden layer may be configured to encode the face image to obtain a first feature vector; the bottleneck layer may be configured to perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, the encoding vector having fewer dimensions than the first feature vector; the second hidden layer may be configured to decode the encoding vector to obtain a second feature vector; and the output layer may be configured to generate reconstructed face image data from the second feature vector.
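To make the layer structure above concrete, the following is a minimal PyTorch sketch of such an autoencoder. It is an illustrative reading of the specification, not the patented implementation: the 112×112 input size, the 512-unit hidden layers, the 64-dimensional bottleneck, and the use of fully connected layers are all assumptions.

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Input layer -> first hidden layer -> bottleneck layer."""
        def __init__(self, in_dim=112 * 112, hidden_dim=512, code_dim=64):  # assumed sizes
            super().__init__()
            self.hidden = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.bottleneck = nn.Linear(hidden_dim, code_dim)

        def forward(self, face_image):
            first_feature = self.hidden(face_image)    # first feature vector
            return self.bottleneck(first_feature)      # encoding vector with fewer dimensions

    class Decoder(nn.Module):
        """Second hidden layer -> output layer (reconstructed face image data)."""
        def __init__(self, code_dim=64, hidden_dim=512, out_dim=112 * 112):
            super().__init__()
            self.hidden = nn.Sequential(nn.Linear(code_dim, hidden_dim), nn.ReLU())
            self.output = nn.Linear(hidden_dim, out_dim)

        def forward(self, code):
            second_feature = self.hidden(code)         # second feature vector
            return self.output(second_feature)         # reconstructed face image data

Because the bottleneck output has far fewer dimensions than the input image, the encoding vector does not by itself reveal the face image, which is what allows it to be transmitted in place of the image.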
Optionally, the feature extraction model based on a convolutional neural network may include an input layer, a convolutional layer, and a fully connected layer, where the input layer is connected to the output of the decoder, the input layer is also connected to the convolutional layer, and the convolutional layer is connected to the fully connected layer.
The input layer of the convolutional-neural-network-based feature extraction model may be configured to receive the reconstructed face image data output by the decoder; the convolutional layer may be configured to perform local feature extraction on the reconstructed face image data to obtain a local face feature vector of the user to be identified; and the fully connected layer may be configured to generate the face feature vector of the user to be identified from the local face feature vector.
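The sketch below shows one way the convolutional feature extraction model could look in PyTorch, continuing the autoencoder sketch above; the two-block convolutional stack, the channel counts, and the 128-dimensional feature vector are assumptions made for illustration.

    import torch.nn as nn

    class FeatureExtractor(nn.Module):
        """Input layer -> convolutional layers -> fully connected layer."""
        def __init__(self, feat_dim=128):              # assumed feature dimensionality
            super().__init__()
            self.conv = nn.Sequential(                 # local feature extraction
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.fc = nn.Linear(64 * 28 * 28, feat_dim)  # face feature vector

        def forward(self, reconstructed):
            x = reconstructed.view(-1, 1, 112, 112)    # decoder output reshaped to an image
            local_features = self.conv(x).flatten(1)   # local face feature vector
            return self.fc(local_features)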
Optionally, the user feature extraction model may further include a user matching model connected to the feature extraction model; and the apparatus may further include:
a user matching module, configured to cause the user matching model, after receiving the face feature vector of the user to be identified and the face feature vector of a specified user, to generate, according to the vector distance between the two vectors, output information indicating whether the user to be identified is the specified user, where the face feature vector of the specified user is obtained by processing the face image of the specified user with the encoder and the face feature extraction model.
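A minimal sketch of such a matching step follows; the specification only says that a vector distance is compared, so the Euclidean metric and the threshold value of 1.0 used here are assumptions.

    import torch

    def match_user(candidate_vec, reference_vec, threshold=1.0):  # threshold is assumed
        """True if the user to be identified matches the specified user."""
        distance = torch.dist(candidate_vec, reference_vec, p=2)  # Euclidean vector distance
        return bool(distance < threshold)

In practice the threshold would be tuned on a validation set so that the false accept and false reject rates meet the application's requirements.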
Based on the same idea, the embodiments of this specification further provide an apparatus corresponding to the above method. FIG. 5 shows a training apparatus, corresponding to FIG. 3, for a face feature extraction model for privacy protection according to an embodiment of this specification. As shown in FIG. 5, the apparatus may include the following modules.
A first acquisition module 502, configured to acquire a first training sample set, where the training samples in the first training sample set are face images.
A first training module 504, configured to train an initial autoencoder with the first training sample set to obtain a trained autoencoder.
A second acquisition module 506, configured to acquire a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder.
A second training module 508, configured to input the training samples in the second training sample set into the decoder of an initial face feature extraction model, so that the reconstructed face image data output by the decoder is used to train the initial feature extraction model, based on a convolutional neural network, in the initial face feature extraction model, thereby obtaining the trained face feature extraction model; the initial face feature extraction model is obtained by locking the decoder and the initial feature extraction model, the decoder being the decoder of the trained autoencoder.
A user feature extraction model generation module 510, configured to generate a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
Optionally, the first training module 504 may be specifically configured to: for each training sample in the first training sample set, input the training sample into the initial autoencoder to obtain reconstructed face image data; and optimize the model parameters of the initial autoencoder with the goal of minimizing the image reconstruction loss, thereby obtaining the trained autoencoder, where the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
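A hedged sketch of this training step, using the Encoder and Decoder classes above: mean squared error stands in for the unspecified "difference value" between reconstruction and training sample, and the Adam optimizer and learning rate are assumptions.

    import torch
    import torch.nn as nn

    def train_autoencoder(encoder, decoder, batches, epochs=10, lr=1e-3):  # assumed schedule
        params = list(encoder.parameters()) + list(decoder.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)
        for _ in range(epochs):
            for face in batches:                       # each training sample is a face image
                reconstructed = decoder(encoder(face))
                loss = nn.functional.mse_loss(         # image reconstruction loss: difference
                    reconstructed, face.flatten(1))    # between reconstruction and sample
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return encoder, decoder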
Optionally, training the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder may specifically include: classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data; acquiring a preset category label for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, where the classification loss is the difference between the predicted category label and the preset category label.
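The sketch below illustrates this second training step under the same assumptions as the sketches above: the decoder's parameters are locked (frozen), and a separate linear classification head, standing in here for the output layer described in the specification, produces the category label prediction scored with cross-entropy loss.

    import torch
    import torch.nn as nn

    def train_extractor(decoder, extractor, batches, num_identities, epochs=10):
        for p in decoder.parameters():
            p.requires_grad = False                    # the decoder is locked during training
        head = nn.Linear(128, num_identities)          # assumed classification output layer
        optimizer = torch.optim.Adam(
            list(extractor.parameters()) + list(head.parameters()), lr=1e-3)
        for _ in range(epochs):
            for code, label in batches:                # encoding vectors and preset labels
                reconstructed = decoder(code)          # reconstructed face image data
                logits = head(extractor(reconstructed))  # category label prediction
                loss = nn.functional.cross_entropy(logits, label)  # classification loss
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return extractor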
Optionally, the apparatus in FIG. 5 may further include a user matching model establishment module, configured to establish a user matching model, where the user matching model is configured to generate, according to the vector distance between a first face feature vector of a user to be identified and a second face feature vector of a specified user, an output result indicating whether the user to be identified is the specified user; the first face feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained face feature extraction model, and the second face feature vector is obtained by processing the face image of the specified user with the encoder and the trained face feature extraction model.
The user feature extraction model generation module 510 may be specifically configured to generate a user feature extraction model for privacy protection composed of the encoder, the trained face feature extraction model, and the user matching model.
Based on the same idea, the embodiments of this specification further provide a client device corresponding to the above method. The client device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores an image encoder and instructions executable by the at least one processor, the image encoder being the encoder of an autoencoder, and the instructions being executed by the at least one processor to enable the at least one processor to: input the face image of the user to be identified into the image encoder to obtain the encoding vector of the face image output by the image encoder, the encoding vector being vector data obtained by characterizing the face image;
and send the encoding vector to a server device, so that the server device generates the face feature vector of the user to be identified from the encoding vector with a face feature extraction model, the face feature extraction model being a model obtained by locking the decoder of the autoencoder and a feature extraction model based on a convolutional neural network.
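One possible shape of this client-side flow is sketched below; the HTTP/JSON transport and the endpoint URL are hypothetical, since the specification only requires that the encoding vector, and not the face image, reach the server.

    import torch
    import requests  # hypothetical transport; any channel carrying the vector would do

    def identify_on_server(encoder, face_image, url="https://example.com/identify"):
        with torch.no_grad():
            code = encoder(face_image)                 # encoding vector, not the raw image
        payload = {"encoding_vector": code.squeeze(0).tolist()}
        return requests.post(url, json=payload).json()  # server returns the match result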
In the embodiments of this specification, the client device uses the encoder of the autoencoder it carries to generate the encoding vector of the face image of the user to be identified, so the client device can send that encoding vector to the server device for user identification without sending the face image itself. Transmission of the face image of the user to be identified is thereby avoided, ensuring the privacy and security of the user's face information.
Based on the same idea, the embodiments of this specification further provide a server device corresponding to the above method. The server device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores a face feature extraction model obtained by locking the decoder of an autoencoder and a feature extraction model based on a convolutional neural network, and further stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: acquire the encoding vector of the face image of the user to be identified, the encoding vector being vector data obtained by characterizing the face image with the encoder of the autoencoder; and input the encoding vector into the decoder of the face feature extraction model, whereupon the decoder outputs reconstructed face image data to the feature extraction model, so that the feature extraction model characterizes the reconstructed face image data and then outputs the face feature vector of the user to be identified.
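A corresponding server-side sketch, under the same assumptions as the client sketch above: the server receives only the encoding vector and never sees the raw face image.

    import torch

    def extract_face_features(decoder, extractor, encoding_vector):
        """Encoding vector -> reconstructed face image data -> face feature vector."""
        with torch.no_grad():
            code = torch.tensor(encoding_vector).unsqueeze(0)
            reconstructed = decoder(code)              # reconstruction happens server-side
            return extractor(reconstructed).squeeze(0)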
In the embodiments of this specification, the server device generates the face feature vector of the user to be identified from the encoding vector of the user's face image, based on the face feature extraction model it carries. The server device can therefore perform user identification without acquiring the face image of the user to be identified, which not only avoids transmitting that face image but also avoids storing and processing it on the server device, improving the privacy and security of the face information of the user to be identified.
Based on the same idea, the embodiments of this specification further provide a training device, corresponding to the method in FIG. 3, for a face feature extraction model for privacy protection. The device may include: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: acquire a first training sample set, where the training samples in the first training sample set are face images;
train an initial autoencoder with the first training sample set to obtain a trained autoencoder;
acquire a second training sample set, where the training samples in the second training sample set are encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder;
input the training samples in the second training sample set into the decoder of an initial face feature extraction model, so that the reconstructed face image data output by the decoder is used to train the initial feature extraction model, based on a convolutional neural network, in the initial face feature extraction model, thereby obtaining the trained face feature extraction model, where the initial face feature extraction model is obtained by locking the decoder and the initial feature extraction model, the decoder being the decoder of the trained autoencoder;
and generate a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that of the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD) (for example, a field-programmable gate array (FPGA)) is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program to "integrate" a digital system onto a single PLD themselves, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, and the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. Those skilled in the art will also appreciate that a hardware circuit implementing a logic method flow can easily be obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included within it for realizing various functions can also be regarded as structures within the hardware component. Or, the means for realizing various functions can even be regarded as both software modules implementing the method and structures within the hardware component.
The systems, apparatuses, modules, or units illustrated in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
One or more embodiments of this specification are described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to one or more embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively simply; for relevant parts, refer to the description of the method embodiment.
The above descriptions are merely embodiments of this specification and are not intended to limit one or more embodiments of this specification. For those skilled in the art, one or more embodiments of this specification may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of this specification shall be included within the scope of the claims of one or more embodiments of this specification.

Claims (22)

1. A face feature extraction method, the method using a user feature extraction model for privacy protection, the user feature extraction model comprising an encoder and a face feature extraction model, the face feature extraction model being a model obtained by locking a decoder and a feature extraction model based on a convolutional neural network, wherein the encoder and the decoder constitute an autoencoder;
    the encoder is connected to the decoder in the face feature extraction model, and the decoder is connected to the feature extraction model; the method comprising:
    inputting a face image of a user to be identified into the encoder to obtain an encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by characterizing the face image;
    and, after the decoder in the face feature extraction model receives the encoding vector, outputting reconstructed face image data to the feature extraction model, so that the feature extraction model characterizes the reconstructed face image data and then outputs a face feature vector of the user to be identified.
2. The method according to claim 1, wherein the encoder comprises an input layer, a first hidden layer, and a bottleneck layer, and the decoder comprises a second hidden layer and an output layer;
    wherein the input layer of the encoder is connected to the first hidden layer, the first hidden layer is connected to the bottleneck layer, the bottleneck layer of the encoder is connected to the second hidden layer of the decoder, the second hidden layer is connected to the output layer, and the output layer is connected to the feature extraction model;
    the input layer is configured to receive the face image of the user to be identified;
    the first hidden layer is configured to encode the face image to obtain a first feature vector;
    the bottleneck layer is configured to perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, the encoding vector having fewer dimensions than the first feature vector;
    the second hidden layer is configured to decode the encoding vector to obtain a second feature vector;
    and the output layer is configured to generate reconstructed face image data from the second feature vector.
3. The method according to claim 1, wherein the feature extraction model based on a convolutional neural network comprises an input layer, a convolutional layer, and a fully connected layer;
    wherein the input layer is connected to the output of the decoder, the input layer is also connected to the convolutional layer, and the convolutional layer is connected to the fully connected layer;
    the input layer is configured to receive the reconstructed face image data output by the decoder;
    the convolutional layer is configured to perform local feature extraction on the reconstructed face image data to obtain a local face feature vector of the user to be identified;
    and the fully connected layer is configured to generate the face feature vector of the user to be identified from the local face feature vector.
4. The method according to claim 3, wherein the feature extraction model based on a convolutional neural network further comprises an output layer connected to the fully connected layer, the output layer being configured to generate a face classification result from the face feature vector of the user to be identified output by the fully connected layer;
    and the face feature vector of the user to be identified is the output vector of the fully connected layer adjacent to the output layer.
5. The method according to claim 1, wherein the user feature extraction model further comprises a user matching model connected to the feature extraction model; the method further comprising:
    receiving, by the user matching model, the face feature vector of the user to be identified and a face feature vector of a specified user, and generating, according to the vector distance between the face feature vector of the user to be identified and the face feature vector of the specified user, output information indicating whether the user to be identified is the specified user, wherein the face feature vector of the specified user is obtained by processing a face image of the specified user with the encoder and the face feature extraction model.
6. A training method for a user feature extraction model for privacy protection, the method comprising:
    acquiring a first training sample set, the training samples in the first training sample set being face images;
    training an initial autoencoder with the first training sample set to obtain a trained autoencoder;
    acquiring a second training sample set, the training samples in the second training sample set being encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder;
    inputting the training samples in the second training sample set into the decoder of an initial face feature extraction model, so that reconstructed face image data output by the decoder is used to train an initial feature extraction model, based on a convolutional neural network, in the initial face feature extraction model, thereby obtaining a trained face feature extraction model, the initial face feature extraction model being obtained by locking the decoder and the initial feature extraction model, the decoder being the decoder of the trained autoencoder;
    and generating a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
7. The method according to claim 6, wherein training the initial autoencoder with the first training sample set to obtain the trained autoencoder specifically comprises:
    for each training sample in the first training sample set, inputting the training sample into the initial autoencoder to obtain reconstructed face image data;
    and optimizing the model parameters of the initial autoencoder with the goal of minimizing an image reconstruction loss to obtain the trained autoencoder, the image reconstruction loss being the difference between the reconstructed face image data and the training sample.
8. The method according to claim 6, wherein training the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder specifically comprises:
    classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data;
    acquiring a preset category label for the reconstructed face image data;
    and optimizing the model parameters of the initial feature extraction model with the goal of minimizing a classification loss, the classification loss being the difference between the predicted category label and the preset category label.
9. The method according to claim 6, wherein the training samples in the first training sample set are face images for which usage rights have been obtained.
10. The method according to claim 6, wherein the training samples in the second training sample set are vector data obtained by characterizing, with the encoder, face images of users who require privacy protection.
11. The method according to claim 6, before generating the user feature extraction model for privacy protection, further comprising:
    establishing a user matching model, the user matching model being configured to generate, according to the vector distance between a first face feature vector of a user to be identified and a second face feature vector of a specified user, an output result indicating whether the user to be identified is the specified user, the first face feature vector being obtained by processing a face image of the user to be identified with the encoder and the trained face feature extraction model, and the second face feature vector being obtained by processing a face image of the specified user with the encoder and the trained face feature extraction model;
    wherein generating the user feature extraction model for privacy protection specifically comprises:
    generating a user feature extraction model for privacy protection composed of the encoder, the trained face feature extraction model, and the user matching model.
12. A face feature extraction apparatus, the apparatus using a user feature extraction model for privacy protection, the user feature extraction model comprising an encoder and a face feature extraction model, the face feature extraction model being a model obtained by locking a decoder and a feature extraction model based on a convolutional neural network, wherein the encoder and the decoder constitute an autoencoder; the encoder is connected to the decoder in the face feature extraction model, and the decoder is connected to the feature extraction model; the apparatus comprising:
    an input module, configured to input a face image of a user to be identified into the encoder to obtain an encoding vector of the face image output by the encoder, the encoding vector being vector data obtained by characterizing the face image;
    and a face feature vector generation module, configured to cause the decoder in the face feature extraction model, after receiving the encoding vector, to output reconstructed face image data to the feature extraction model, so that the feature extraction model characterizes the reconstructed face image data and then outputs a face feature vector of the user to be identified.
13. The apparatus according to claim 12, wherein the encoder comprises an input layer, a first hidden layer, and a bottleneck layer, and the decoder comprises a second hidden layer and an output layer;
    wherein the input layer of the encoder is connected to the first hidden layer, the first hidden layer is connected to the bottleneck layer, the bottleneck layer of the encoder is connected to the second hidden layer of the decoder, the second hidden layer is connected to the output layer, and the output layer is connected to the feature extraction model;
    the input layer is configured to receive the face image of the user to be identified;
    the first hidden layer is configured to encode the face image to obtain a first feature vector;
    the bottleneck layer is configured to perform dimensionality reduction on the first feature vector to obtain the encoding vector of the face image, the encoding vector having fewer dimensions than the first feature vector;
    the second hidden layer is configured to decode the encoding vector to obtain a second feature vector;
    and the output layer is configured to generate reconstructed face image data from the second feature vector.
14. The apparatus according to claim 12, wherein the feature extraction model based on a convolutional neural network comprises an input layer, a convolutional layer, and a fully connected layer;
    wherein the input layer is connected to the output of the decoder, the input layer is also connected to the convolutional layer, and the convolutional layer is connected to the fully connected layer;
    the input layer is configured to receive the reconstructed face image data output by the decoder;
    the convolutional layer is configured to perform local feature extraction on the reconstructed face image data to obtain a local face feature vector of the user to be identified;
    and the fully connected layer is configured to generate the face feature vector of the user to be identified from the local face feature vector.
15. The apparatus according to claim 12, wherein the user feature extraction model further comprises a user matching model connected to the feature extraction model; the apparatus further comprising:
    a user matching module, configured to cause the user matching model, after receiving the face feature vector of the user to be identified and a face feature vector of a specified user, to generate, according to the vector distance between the face feature vector of the user to be identified and the face feature vector of the specified user, output information indicating whether the user to be identified is the specified user, wherein the face feature vector of the specified user is obtained by processing a face image of the specified user with the encoder and the face feature extraction model.
16. A training apparatus for a user feature extraction model for privacy protection, the apparatus comprising:
    a first acquisition module, configured to acquire a first training sample set, the training samples in the first training sample set being face images;
    a first training module, configured to train an initial autoencoder with the first training sample set to obtain a trained autoencoder;
    a second acquisition module, configured to acquire a second training sample set, the training samples in the second training sample set being encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder;
    a second training module, configured to input the training samples in the second training sample set into the decoder of an initial face feature extraction model, so that reconstructed face image data output by the decoder is used to train an initial feature extraction model, based on a convolutional neural network, in the initial face feature extraction model, thereby obtaining a trained face feature extraction model, the initial face feature extraction model being obtained by locking the decoder and the initial feature extraction model, the decoder being the decoder of the trained autoencoder;
    and a user feature extraction model generation module, configured to generate a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
17. The apparatus according to claim 16, wherein the first training module is specifically configured to:
    for each training sample in the first training sample set, input the training sample into the initial autoencoder to obtain reconstructed face image data;
    and optimize the model parameters of the initial autoencoder with the goal of minimizing an image reconstruction loss to obtain the trained autoencoder, the image reconstruction loss being the difference between the reconstructed face image data and the training sample.
18. The apparatus according to claim 16, wherein training the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder specifically comprises:
    classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data;
    acquiring a preset category label for the reconstructed face image data;
    and optimizing the model parameters of the initial feature extraction model with the goal of minimizing a classification loss, the classification loss being the difference between the predicted category label and the preset category label.
19. The apparatus according to claim 16, further comprising:
    a user matching model establishment module, configured to establish a user matching model, the user matching model being configured to generate, according to the vector distance between a first face feature vector of a user to be identified and a second face feature vector of a specified user, an output result indicating whether the user to be identified is the specified user, the first face feature vector being obtained by processing a face image of the user to be identified with the encoder and the trained face feature extraction model, and the second face feature vector being obtained by processing a face image of the specified user with the encoder and the trained face feature extraction model;
    wherein the user feature extraction model generation module is specifically configured to:
    generate a user feature extraction model for privacy protection composed of the encoder, the trained face feature extraction model, and the user matching model.
20. A client device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores an image encoder and instructions executable by the at least one processor, the image encoder being the encoder of an autoencoder, and the instructions being executed by the at least one processor to enable the at least one processor to:
    input a face image of a user to be identified into the image encoder to obtain an encoding vector of the face image output by the image encoder, the encoding vector being vector data obtained by characterizing the face image;
    and send the encoding vector to a server device, so that the server device generates a face feature vector of the user to be identified from the encoding vector with a face feature extraction model, the face feature extraction model being a model obtained by locking the decoder of the autoencoder and a feature extraction model based on a convolutional neural network.
21. A server device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores a face feature extraction model, the face feature extraction model being a model obtained by locking the decoder of an autoencoder and a feature extraction model based on a convolutional neural network, and the memory further stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire an encoding vector of a face image of a user to be identified, the encoding vector being vector data obtained by characterizing the face image with the encoder of the autoencoder;
    and input the encoding vector into the decoder of the face feature extraction model, whereupon the decoder outputs reconstructed face image data to the feature extraction model, so that the feature extraction model characterizes the reconstructed face image data and then outputs a face feature vector of the user to be identified.
  22. A training device for a facial feature extraction model for privacy protection, comprising:
    at least one processor; and
    a memory communicatively connected with the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    acquire a first training sample set, the training samples in the first training sample set being face images;
    train an initial autoencoder with the first training sample set to obtain a trained autoencoder;
    acquire a second training sample set, the training samples in the second training sample set being encoding vectors, each encoding vector being vector data obtained by characterizing a face image with the encoder of the trained autoencoder;
    input the training samples in the second training sample set into the decoder of an initial facial feature extraction model, so that the reconstructed face image data output by the decoder is used to train the convolutional-neural-network-based initial feature extraction model in the initial facial feature extraction model, obtaining a trained facial feature extraction model; the initial facial feature extraction model being obtained by locking the decoder and the initial feature extraction model, the decoder being the decoder of the trained autoencoder; and
    generate a user feature extraction model for privacy protection according to the encoder and the trained facial feature extraction model.
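The two training stages of claim 22 might be sketched as follows, assuming PyTorch; the reconstruction loss, the optimizer, and the use of identity labels with a cross-entropy loss in the second stage are assumptions added for illustration.

import torch
import torch.nn as nn

def train_autoencoder(encoder, decoder, face_images, epochs=10, lr=1e-3):
    # Stage 1: fit the autoencoder on the first training sample set (face images).
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in face_images:                   # x: a batch of face images
            opt.zero_grad()
            loss = mse(decoder(encoder(x)), x)  # reconstruction loss
            loss.backward()
            opt.step()

def train_feature_extractor(decoder, cnn_extractor, encoding_vectors, labels,
                            epochs=10, lr=1e-3):
    # Stage 2: the decoder is locked (frozen); only the CNN-based extractor is
    # trained, on reconstructions produced from the second sample set.
    decoder.eval()
    for p in decoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(cnn_extractor.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()                  # assumes identity labels per vector
    for _ in range(epochs):
        for code, y in zip(encoding_vectors, labels):
            opt.zero_grad()
            with torch.no_grad():
                reconstructed = decoder(code)   # reconstructed face image data
            logits = cnn_extractor(reconstructed)
            loss = ce(logits, y)
            loss.backward()
            opt.step()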
PCT/CN2020/140574 2020-03-19 2020-12-29 Facial feature extraction method, apparatus and device WO2021184898A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010197694.8 2020-03-19
CN202010197694.8A CN111401272B (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment

Publications (1)

Publication Number Publication Date
WO2021184898A1 (en) 2021-09-23

Family ID

71432637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140574 WO2021184898A1 (en) 2020-03-19 2020-12-29 Facial feature extraction method, apparatus and device

Country Status (2)

Country Link
CN (2) CN111401272B (en)
WO (1) WO2021184898A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401272B (en) * 2020-03-19 2021-08-24 支付宝(杭州)信息技术有限公司 Face feature extraction method, device and equipment
CN111401273B (en) * 2020-03-19 2022-04-29 支付宝(杭州)信息技术有限公司 User feature extraction system and device for privacy protection
CN111783965A (en) * 2020-08-14 2020-10-16 支付宝(杭州)信息技术有限公司 Biometric identification method, apparatus and system, and electronic device
CN112949545B (en) * 2021-03-17 2022-12-30 中国工商银行股份有限公司 Method, apparatus, computing device and medium for recognizing face image
CN112926559B (en) * 2021-05-12 2021-07-30 支付宝(杭州)信息技术有限公司 Face image processing method and device
CN113657498B (en) * 2021-08-17 2023-02-10 展讯通信(上海)有限公司 Biological feature extraction method, training method, authentication method, device and equipment
CN113946858B (en) * 2021-12-20 2022-03-18 湖南丰汇银佳科技股份有限公司 Identity security authentication method and system based on data privacy computing
CN115190217B (en) * 2022-07-07 2024-03-26 国家计算机网络与信息安全管理中心 Data security encryption method and device integrating an autoencoder network

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900B (en) * 2015-01-29 2018-01-19 北京工业大学 Deconvolution neural network training method
JP6318211B2 (en) * 2016-10-03 2018-04-25 株式会社Preferred Networks Data compression apparatus, data reproduction apparatus, data compression method, data reproduction method, and data transfer method
CN107220594B (en) * 2017-05-08 2020-06-12 桂林电子科技大学 Face pose reconstruction and recognition method based on a similarity-preserving stacked autoencoder
US11171977B2 (en) * 2018-02-19 2021-11-09 Nec Corporation Unsupervised spoofing detection from traffic data in mobile networks
CN108537120A (en) * 2018-03-06 2018-09-14 安徽电科恒钛智能科技有限公司 Face recognition method and system based on deep learning
CN108664967B (en) * 2018-04-17 2020-08-25 上海媒智科技有限公司 Method and system for predicting visual saliency of multimedia page
WO2020033900A1 (en) * 2018-08-10 2020-02-13 L3 Security & Detection Systems, Inc. Systems and methods for image processing
CN109117801A (en) * 2018-08-20 2019-01-01 深圳壹账通智能科技有限公司 Face recognition method, apparatus, terminal and computer-readable storage medium
CN109495476B (en) * 2018-11-19 2020-11-20 中南大学 Data stream differential privacy protection method and system based on edge computing
CN110147721B (en) * 2019-04-11 2023-04-18 创新先进技术有限公司 Three-dimensional face recognition method, model training method and device
CN110321777B (en) * 2019-04-25 2023-03-28 重庆理工大学 Face recognition method based on stacked convolution sparse denoising autoencoder
CN110310351B (en) * 2019-07-04 2023-07-21 北京信息科技大学 Sketch-based three-dimensional human skeleton animation automatic generation method
CN110766048A (en) * 2019-09-18 2020-02-07 平安科技(深圳)有限公司 Image content identification method and device, computer equipment and storage medium
CN110826056B (en) * 2019-11-11 2024-01-30 南京工业大学 Recommender system attack detection method based on an attention convolutional autoencoder

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190171908A1 (en) * 2017-12-01 2019-06-06 The University Of Chicago Image Transformation with a Hybrid Autoencoder and Generative Adversarial Network Machine Learning Architecture
CN109769080A (en) * 2018-12-06 2019-05-17 西北大学 Encrypted image cracking method and system based on deep learning
CN110598580A (en) * 2019-08-25 2019-12-20 南京理工大学 Face liveness detection method
CN111401272A (en) * 2020-03-19 2020-07-10 支付宝(杭州)信息技术有限公司 Face feature extraction method, device and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU JINGJING: "Image Encryption Based on Variational Auto-encoder Generative Models", CHINESE MASTER'S THESES FULL-TEXT DATABASE, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 January 2019 (2019-01-15), XP055852584, ISSN: 1674-0246 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821751A (en) * 2022-06-27 2022-07-29 北京瑞莱智慧科技有限公司 Image recognition method, device, system and storage medium
CN114842544A (en) * 2022-07-04 2022-08-02 江苏布罗信息技术有限公司 Intelligent face recognition method and system suitable for facial paralysis patients
CN114842544B (en) * 2022-07-04 2022-09-06 江苏布罗信息技术有限公司 Intelligent face recognition method and system suitable for facial paralysis patients
CN116844217A (en) * 2023-08-30 2023-10-03 成都睿瞳科技有限责任公司 Image processing system and method for generating face data
CN116844217B (en) * 2023-08-30 2023-11-14 成都睿瞳科技有限责任公司 Image processing system and method for generating face data

Also Published As

Publication number Publication date
CN111401272B (en) 2021-08-24
CN113657352A (en) 2021-11-16
CN111401272A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
WO2021184898A1 (en) Facial feature extraction method, apparatus and device
WO2021184976A1 (en) User characteristics extraction system and device for privacy protection
WO2021238956A1 (en) Identity verification method, apparatus and device based on privacy protection
US10984225B1 (en) Masked face recognition
US20220172518A1 (en) Image recognition method and apparatus, computer-readable storage medium, and electronic device
CN112000940B (en) User identification method, device and equipment under privacy protection
CN111368795B (en) Face feature extraction method, device and equipment
CN115359219A (en) Virtual image processing method and device of virtual world
CN112084476A (en) Biological identification identity verification method, client, server, equipment and system
WO2020220212A1 (en) Biological feature recognition method and electronic device
CN116994188A (en) Action recognition method and device, electronic equipment and storage medium
CN113221717B (en) Model construction method, device and equipment based on privacy protection
CN116630480B (en) Interactive text-driven image editing method and device and electronic equipment
Wang et al. Multi-format speech biohashing based on energy to zero ratio and improved lp-mmse parameter fusion
CN112395448A (en) Face retrieval method and device
CN113239852B (en) Privacy image processing method, device and equipment based on privacy protection
CN115618375A (en) Service execution method, device, storage medium and electronic equipment
CN111860212B (en) Super-division method, device, equipment and storage medium for face image
CN115048661A (en) Model processing method, device and equipment
CN114662144A (en) Biological detection method, device and equipment
An et al. Verifiable speech retrieval algorithm based on KNN secure hashing
CN117874706B (en) Multi-modal knowledge distillation learning method and device
CN117539452B (en) Face recognition method and device and electronic equipment
CN117612269A (en) Biological attack detection method, device and equipment
CN114882290A (en) Authentication method, training method, device and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926282

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926282

Country of ref document: EP

Kind code of ref document: A1