CN111401272B - Face feature extraction method, device and equipment - Google Patents

Face feature extraction method, device and equipment

Info

Publication number
CN111401272B
Authority
CN
China
Prior art keywords
feature extraction
user
extraction model
face
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010197694.8A
Other languages
Chinese (zh)
Other versions
CN111401272A (en)
Inventor
徐崴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010197694.8A (CN111401272B)
Priority to CN202111156860.0A (CN113657352A)
Publication of CN111401272A
Priority to PCT/CN2020/140574 (WO2021184898A1)
Application granted
Publication of CN111401272B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 Protecting data
    • G06F 21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The embodiments of this specification disclose a privacy-preserving face feature extraction method, apparatus, and device. The scheme comprises: inputting a face image of a user to be identified into an encoder to obtain a coding vector of the face image output by the encoder; and, after a decoder in a face feature extraction model receives the coding vector, outputting reconstructed face image data to a feature extraction model within the face feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.

Description

Face feature extraction method, device and equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method, an apparatus, and a device for extracting a face feature.
Background
With the development of computer technology and optical imaging technology, user recognition based on face recognition technology is becoming increasingly popular. At present, the face image of a user to be recognized, collected by a client device, must be sent to a server device so that the server device can extract a face feature vector from the image and generate a user recognition result based on that vector. Because the face image of the user to be recognized is sensitive user information, sending it to other devices for feature extraction carries the risk of leaking that information.
Based on this, how to extract a user's face features while ensuring the privacy of the user's face information has become a technical problem to be solved urgently.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a method, an apparatus, and a device for extracting a face feature, which are used to extract a face feature of a user on the basis of ensuring privacy of face information of the user.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
an embodiment of the present specification provides a face feature extraction method that uses a user feature extraction model for privacy protection, wherein the user feature extraction model includes: an encoder and a face feature extraction model, the face feature extraction model being obtained by locking together a decoder and a feature extraction model based on a convolutional neural network, wherein the encoder and the decoder form a self-encoder;
the encoder is connected with the decoder in the face feature extraction model, and the decoder is connected with the feature extraction model; the method comprises the following steps:
inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained by performing characterization processing on the face image;
causing the decoder in the face feature extraction model to receive the coding vector and output reconstructed face image data to the feature extraction model, and causing the feature extraction model to perform characterization processing on the reconstructed face image data and output the face feature vector of the user to be identified.
An embodiment of the present specification provides a training method for a user feature extraction model for privacy protection, the method comprising:
acquiring a first training sample set, wherein the training samples in the first training sample set are face images;
training an initial self-encoder with the first training sample set to obtain a trained self-encoder;
acquiring a second training sample set, wherein the training samples in the second training sample set are coding vectors, a coding vector being vector data obtained by performing characterization processing on a face image with the encoder in the trained self-encoder;
inputting the training samples in the second training sample set into the decoder of an initial face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder, obtaining a trained face feature extraction model; the initial face feature extraction model is obtained by locking together the decoder and the initial feature extraction model, the decoder being the decoder in the trained self-encoder;
and generating a user feature extraction model for privacy protection from the encoder and the trained face feature extraction model.
An embodiment of the present specification provides a face feature extraction apparatus that uses a user feature extraction model for privacy protection, wherein the user feature extraction model includes: an encoder and a face feature extraction model, the face feature extraction model being obtained by locking together a decoder and a feature extraction model based on a convolutional neural network, wherein the encoder and the decoder form a self-encoder; the encoder is connected with the decoder in the face feature extraction model, and the decoder is connected with the feature extraction model; the apparatus comprises:
an input module, used for inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained by performing characterization processing on the face image;
a face feature vector generation module, used for causing the decoder in the face feature extraction model to receive the coding vector and output reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.
An embodiment of the present specification provides a training apparatus for a user feature extraction model for privacy protection, the apparatus comprising:
a first acquisition module, used for acquiring a first training sample set, wherein the training samples in the first training sample set are face images;
a first training module, used for training an initial self-encoder with the first training sample set to obtain a trained self-encoder;
a second acquisition module, used for acquiring a second training sample set, wherein the training samples in the second training sample set are coding vectors, a coding vector being vector data obtained by performing characterization processing on a face image with the encoder in the trained self-encoder;
a second training module, used for inputting the training samples in the second training sample set into the decoder of an initial face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder, obtaining a trained face feature extraction model; the initial face feature extraction model is obtained by locking together the decoder and the initial feature extraction model, the decoder being the decoder in the trained self-encoder;
and a user feature extraction model generation module, used for generating a user feature extraction model for privacy protection from the encoder and the trained face feature extraction model.
An embodiment of this specification provides a client device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores an image encoder, which is the encoder in a self-encoder, and instructions executable by the at least one processor to enable the at least one processor to:
input a face image of a user to be identified into the image encoder to obtain a coding vector of the face image output by the image encoder, wherein the coding vector is vector data obtained by performing characterization processing on the face image;
and send the coding vector to a server device so that the server device can generate the face feature vector of the user to be identified from the coding vector using a face feature extraction model, wherein the face feature extraction model is obtained by locking together the decoder in the self-encoder and a feature extraction model based on a convolutional neural network.
An embodiment of this specification provides a server device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a face feature extraction model, which is obtained by locking together the decoder in a self-encoder and a feature extraction model based on a convolutional neural network, and further stores instructions executable by the at least one processor to enable the at least one processor to:
acquire a coding vector of a face image of a user to be identified, wherein the coding vector is vector data obtained by performing characterization processing on the face image with the encoder in the self-encoder;
and input the coding vector into the decoder in the face feature extraction model, so that the decoder outputs reconstructed face image data to the feature extraction model, and the feature extraction model performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.
An embodiment of the present specification provides a training device for a face feature extraction model for privacy protection, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire a first training sample set, wherein the training samples in the first training sample set are face images;
train an initial self-encoder with the first training sample set to obtain a trained self-encoder;
acquire a second training sample set, wherein the training samples in the second training sample set are coding vectors, a coding vector being vector data obtained by performing characterization processing on a face image with the encoder in the trained self-encoder;
input the training samples in the second training sample set into the decoder of an initial face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder, obtaining a trained face feature extraction model; the initial face feature extraction model is obtained by locking together the decoder and the initial feature extraction model, the decoder being the decoder in the trained self-encoder;
and generate a user feature extraction model for privacy protection from the encoder and the trained face feature extraction model.
One embodiment of the present description achieves the following advantageous effects:
when the coding vector of the face image generated by the encoder in the self-encoder is transmitted, stored, or used, the privacy and security of the user's face information are not affected. The service provider can therefore generate the face feature vector of the user to be recognized by acquiring and processing the coding vector of that user's face image, without acquiring the original face image, so that the user's face feature vector can be extracted while the privacy and security of the user's face information are ensured.
Moreover, because the face feature extraction model for extracting the face feature vector is obtained by locking together the decoder in the self-encoder and the feature extraction model based on a convolutional neural network, the reconstructed face image data generated by the decoder cannot be leaked while the face feature extraction model extracts the user's face feature vector, which ensures the privacy and security of the user's face information.
Drawings
The accompanying drawings, which are included to provide a further understanding of one or more embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the disclosure and together with the description serve to explain the embodiments of the disclosure and not to limit the embodiments of the disclosure. In the drawings:
fig. 1 is a schematic flow chart of a face feature extraction method provided in an embodiment of the present specification;
fig. 2 is a schematic structural diagram of a face feature extraction model for privacy protection according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a training method for a face feature extraction model for privacy protection according to an embodiment of the present specification;
fig. 4 is a schematic structural diagram of a face feature extraction apparatus corresponding to fig. 1 provided in an embodiment of the present specification;
fig. 5 is a schematic structural diagram of a training apparatus, corresponding to fig. 3, for a face feature extraction model for privacy protection provided in an embodiment of the present specification.
Detailed Description
To make the objects, technical solutions and advantages of one or more embodiments of the present disclosure more apparent, the technical solutions of one or more embodiments of the present disclosure will be described in detail and completely with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present specification, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without making any creative effort fall within the scope of protection of one or more embodiments of the present specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
In the prior art, when a user is identified based on face recognition technology, the face image of the user to be identified generally needs to be sent to a service provider, so that the service provider can extract a face feature vector from the image and perform user identification based on that vector. Because this approach requires the service provider to acquire, store, or process the user's face image, it easily compromises the privacy and security of the user's face information.
At present, when a face feature vector is extracted from a user's face image, the image is usually preprocessed first. For example, in a face recognition method based on Principal Component Analysis (PCA), principal component information is first extracted from the user's face picture and part of the detail information is discarded, and the face feature vector is generated from the principal component information. Because the face feature vector generated in this way loses information, the accuracy of currently extracted face feature vectors is poor.
In order to solve the defects in the prior art, the scheme provides the following embodiments:
fig. 1 is a schematic flow chart of a face feature extraction method provided in an embodiment of the present specification. The method uses a face feature extraction model for privacy protection to extract face feature vectors.
Fig. 2 is a schematic structural diagram of a face feature extraction model for privacy protection according to an embodiment of the present disclosure. As shown in fig. 2, a user feature extraction model 201 for privacy protection includes an encoder 202 and a face feature extraction model 203, wherein the face feature extraction model 203 is obtained by locking together a decoder 204 and a feature extraction model 205 based on a convolutional neural network, and the encoder 202 and the decoder 204 form a self-encoder. The encoder 202 is connected to the decoder 204 in the face feature extraction model 203, and the decoder 204 is connected to the feature extraction model 205.
From the viewpoint of a program, the execution subject of the flow shown in fig. 1 may be a user face feature extraction system or a program loaded on such a system. The user face feature extraction system can comprise a client device and a server device: the client device may carry the encoder of the user feature extraction model for privacy protection, and the server device may carry the face feature extraction model within that user feature extraction model.
As shown in fig. 1, the process may include the following steps:
step 102: inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained after the face image is characterized.
In this embodiment, when a user uses various applications, the user usually needs to register an account with each application. In scenarios such as logging in to or unlocking the registered account, or paying with the registered account, user identification is usually performed on the operating user of the registered account (i.e., the user to be identified), and the user to be identified is allowed to perform subsequent operations only after being determined to be the authenticated user of the registered account (i.e., the designated user). Similarly, for a scenario in which a user needs to pass through an access control system, the user must be identified, and is allowed to pass only after being determined to be a whitelisted user of the access control system (i.e., a designated user).
When a user to be recognized is identified based on face recognition technology, the client device usually needs to collect a face image of the user to be recognized and extract the coding vector of that face image with the encoder it carries. The client device can then send the coding vector to the server device, so that the server device can generate the face feature vector of the user to be recognized from the coding vector and perform user recognition based on that face feature vector.
The encoder in step 102 may be the encoder in a self-encoder (auto-encoder, AE). The self-encoder is a network model structure in deep learning whose characteristic is that the input image itself can serve as the supervision information: the network is trained to reconstruct its input, thereby achieving the purpose of encoding the input image. Because the self-encoder needs no supervision information other than the input image during network training, its training cost is low, making it economical and practical.
A self-encoder typically comprises two parts, an encoder (encoder) and a decoder (decoder). The encoder in the self-encoder may be configured to perform encoding processing on the face image to obtain an encoding vector of the face image, and the decoder in the self-encoder may reconstruct the face image according to the encoding vector to obtain a reconstructed face image.
The coding vector of the face image generated by the encoder in the self-encoder is vector data obtained by performing characterization processing on the face image. Because the coding vector cannot reveal the appearance of the user to be identified, the service provider can transmit, store, and process it without affecting the security and privacy of the face information of the user to be identified.
In the embodiment of this specification, the self-encoder is an artificial neural network that can learn, through unsupervised learning, an efficient and accurate representation of its input data. The face feature information contained in the coding vector generated by the encoder in the self-encoder is therefore comprehensive and low in noise, so extracting the face feature vector from this coding vector improves the accuracy of the resulting face feature vector and, in turn, the accuracy of the user identification result generated from it.
In this embodiment of the present specification, the face image of the user to be recognized may be a multi-channel face image. In practical applications, when the face image collected by the user equipment is a single-channel face image, the single-channel image data of the user to be identified can first be determined, and a multi-channel image can then be generated from that single-channel image data so that the encoder in the self-encoder can process a multi-channel face image; the image data of each channel of the resulting multi-channel face image is the same as the single-channel image data.
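By way of illustration, the following is a minimal sketch of this channel-replication step. PyTorch is an assumption (the embodiments name no framework), and `to_multi_channel` is a hypothetical helper name:

```python
import torch

def to_multi_channel(gray: torch.Tensor, channels: int = 3) -> torch.Tensor:
    """Replicate a single-channel face image (1 x H x W) into a
    multi-channel image (channels x H x W); every channel carries the
    same data as the original single channel, as described above."""
    assert gray.dim() == 3 and gray.size(0) == 1, "expected a 1 x H x W tensor"
    return gray.repeat(channels, 1, 1)
```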
Step 104: the decoder in the face feature extraction model receives the coding vector and outputs reconstructed face image data to the feature extraction model; the feature extraction model then performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.
In the embodiment of the present specification, the training target of the self-encoder is to minimize the difference between the reconstructed face image and the original face image, not to classify users' faces. Consequently, if the coding vector extracted by the encoder in the self-encoder were used directly as the face feature vector of the user to be identified, the accuracy of the user identification result would be poor.
In this specification, a face feature extraction model obtained by locking together the decoder in the self-encoder and a feature extraction model based on a convolutional neural network may be deployed on the server device. The decoder can generate reconstructed face image data from the coding vector of the face image of the user to be recognized, and the feature extraction model based on the convolutional neural network can classify the reconstructed face image data. The output vector of that feature extraction model can therefore be used as the face feature vector of the user to be recognized, improving the accuracy of the user recognition result generated from it.
In the embodiment of the present specification, because the feature extraction model based on the convolutional neural network in the face feature extraction model is used to extract a face feature vector from a reconstructed face image, it may be implemented with an existing convolutional-neural-network-based face recognition model, for example, DeepFace, FaceNet, MTCNN, RetinaFace, and the like. The face feature extraction model therefore has good compatibility.
Moreover, because the decoder in the face feature extraction model decodes the coding vector of the face image of the user to be recognized, the reconstructed face image data obtained from that coding vector has high similarity to the original face image, so the face feature vector extracted by the feature extraction model based on the convolutional neural network is highly accurate.
In this embodiment of the present specification, the decoder in the self-encoder and the feature extraction model based on the convolutional neural network may be locked together using encryption software, or they may be stored in a secure hardware module of the device, so that users cannot read the reconstructed face image data output by the decoder, thereby ensuring the privacy of the user's face information. There are various ways to implement the locking of the decoder and the feature extraction model; this is not specifically limited here, as long as the usage security of the reconstructed face image data output by the decoder is ensured.
In practical applications, when the service provider or another user obtains the right to read the reconstructed face image data of a user to be identified, the reconstructed face image data output by the decoder in the face feature extraction model can be obtained based on that right, which helps improve the utilization of the data.
It should be understood that the order of some steps in the method described in one or more embodiments of the present disclosure may be interchanged according to actual needs, or some steps may be omitted or deleted.
In the method of fig. 1, the service provider can extract the face feature vector from the coding vector of the face image of the user to be recognized, and therefore does not need to acquire the face image itself. This avoids the transmission, storage, and use of the face image by the service provider, ensuring the privacy and security of the face information of the user to be recognized.
Moreover, because the reconstructed face image data generated by the decoder in the face feature extraction model is highly similar to the face image of the user to be identified, the face feature vector extracted from the reconstructed image by the feature extraction model based on the convolutional neural network is highly accurate.
Based on the process of fig. 1, some specific embodiments of the process are also provided in the examples of this specification, which are described below.
In this specification, the encoder may include the input layer, a first hidden layer, and a bottleneck layer of the self-encoder, and the decoder may include a second hidden layer and the output layer of the self-encoder.
Wherein the input layer of the encoder is connected with the first hidden layer, the first hidden layer is connected with the bottleneck layer, the bottleneck layer of the encoder is connected with the second hidden layer of the decoder, the second hidden layer is connected with the output layer, and the output layer is connected with the feature extraction model.
The input layer may be configured to receive a face image of the user to be recognized.
The first hidden layer may be configured to perform encoding processing on the face image to obtain a first feature vector.
The bottleneck layer may be configured to perform dimension reduction processing on the first feature vector to obtain a coding vector of the face image, where the number of dimensions of the coding vector is smaller than the number of dimensions of the first feature vector.
The second hidden layer may be configured to decode the encoded vector to obtain a second feature vector.
The output layer may be configured to generate reconstructed face image data according to the second feature vector.
In this embodiment of the present disclosure, because the encoder in the self-encoder needs to encode an image and the decoder needs to generate a reconstructed face image, the first hidden layer and the second hidden layer may each include a plurality of convolutional layers, and may further include pooling layers and full connection layers, to ensure the encoding and decoding effects. The bottleneck layer can be used to reduce feature dimensions: the dimensionality of the feature vector output by the hidden layer connected to the bottleneck layer is higher than that of the feature vector output by the bottleneck layer.
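To make the layer arrangement above concrete, here is a minimal PyTorch sketch of the self-encoder. This is a sketch only: the 112 x 112 three-channel input, the 128-dimensional bottleneck, and all layer sizes are illustrative assumptions, not values from this specification:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Input layer -> first hidden layer (convolutions) -> bottleneck layer."""
    def __init__(self, bottleneck_dim: int = 128):
        super().__init__()
        self.hidden = nn.Sequential(                      # first hidden layer
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Bottleneck layer: reduces the hidden features to a coding vector
        # of lower dimensionality than the hidden layer's output.
        self.bottleneck = nn.Linear(64 * 28 * 28, bottleneck_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, 3, 112, 112)
        return self.bottleneck(self.hidden(x))            # coding vector

class Decoder(nn.Module):
    """Second hidden layer -> output layer, reconstructing the face image."""
    def __init__(self, bottleneck_dim: int = 128):
        super().__init__()
        self.hidden = nn.Sequential(                      # second hidden layer
            nn.Linear(bottleneck_dim, 64 * 28 * 28), nn.ReLU(),
            nn.Unflatten(1, (64, 28, 28)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.output = nn.Sequential(                      # output layer
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        return self.output(self.hidden(code))             # reconstructed image
```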
In this embodiment, the feature extraction model based on the convolutional neural network may include: an input layer, a convolution layer, a full connection layer and an output layer; the input layer is connected with the output of the decoder, the input layer is also connected with the convolution layer, the convolution layer is connected with the full-connection layer, and the full-connection layer is connected with the output layer.
The input layer can be used for receiving reconstructed face image data output by the decoder;
the convolution layer can be used for extracting local features of the reconstructed face image data to obtain a face local feature vector of the user to be identified;
the full connection layer may be configured to generate the face feature vector of the user to be identified according to the face local feature vector.
The output layer can be used for generating a face classification result according to the face feature vector of the user to be recognized, which is output by the full connection layer.
In this embodiment of the present specification, the facial feature vector of the user to be recognized may be an output vector of a fully connected layer adjacent to the output layer; or, when there are multiple fully connected layers in the feature extraction model based on the convolutional neural network, the face feature vector of the user to be recognized may also be an output vector of a fully connected layer spaced by N network layers from the output layer; this is not particularly limited.
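Continuing the PyTorch sketch above, a minimal feature extraction model with the input layer, convolution layers, full connection layer, and classification output layer described here might look as follows. Again, the layer sizes are illustrative assumptions; the face feature vector is read from the full connection layer adjacent to the output layer, as the text describes:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Convolution layers -> full connection layer -> output (classification)
    layer. The face feature vector is the output of the full connection
    layer adjacent to the output layer."""
    def __init__(self, num_identities: int, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)           # full connection layer
        self.output = nn.Linear(feat_dim, num_identities)   # face class scores

    def forward(self, x: torch.Tensor, return_features: bool = False):
        features = self.fc(self.conv(x))                    # face feature vector
        return features if return_features else self.output(features)
```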
In this embodiment of the present specification, the face feature vector of the user to be recognized generated in step 104 may be used in user recognition scenarios. Therefore, the face feature extraction model may further include a user matching model, whose input may be connected to the output of the feature extraction model based on the convolutional neural network in the face feature extraction model.
After step 104, the method may further include: causing the user matching model to receive the face feature vector of the user to be identified and the face feature vector of the designated user, and generating output information indicating whether the user to be identified is the designated user according to the vector distance between the two face feature vectors, wherein the face feature vector of the designated user is obtained by processing the designated user's face image with the encoder and the face feature extraction model.
In this specification, a vector distance between a face feature vector of a user to be recognized and a face feature vector of a designated user may be used to represent a similarity between the face feature vector of the user to be recognized and the face feature vector of the designated user. Specifically, when the vector distance is less than or equal to the threshold, it may be determined that the user to be identified and the designated user are the same user. And when the vector distance is larger than the threshold value, the user to be identified and the designated user can be determined to be different users. The threshold may be determined according to actual requirements, and is not particularly limited.
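A minimal sketch of this matching rule follows (the Euclidean distance and the threshold value of 1.0 are assumptions; as noted above, the threshold is determined according to actual requirements):

```python
import torch
import torch.nn.functional as F

def is_same_user(feat_a: torch.Tensor, feat_b: torch.Tensor,
                 threshold: float = 1.0) -> bool:
    """Compare two face feature vectors by their vector distance:
    at or below the threshold -> same user; above it -> different users."""
    distance = F.pairwise_distance(feat_a.unsqueeze(0), feat_b.unsqueeze(0))
    return bool(distance.item() <= threshold)
```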
In this embodiment, the method in fig. 1 may be used to generate both the face feature vector of the user to be identified and the face feature vector of the designated user. Because the user face feature vectors generated by the method in fig. 1 are highly accurate, the accuracy of the user identification result is improved accordingly.
Fig. 3 is a schematic flowchart of a training method for a face feature extraction model for privacy protection according to an embodiment of the present disclosure. From the viewpoint of a program, the execution subject of the flow may be a server or a program installed on a server. As shown in fig. 3, the process may include the following steps:
step 302: obtaining a first training sample set, wherein training samples in the first training sample set are human face images.
In an embodiment of the present specification, the training samples in the first training sample set are face images for which usage rights have been obtained, for example, face images from public face databases or face images authorized by users, so that the privacy of users' face information is not affected during model training.
In this embodiment of the present specification, the training samples in the first training sample set may be multi-channel face images. When a face image from a public face database or a user-authorized face image is a single-channel face image, its single-channel image data can first be determined, and a multi-channel image can then be generated from that data to serve as a training sample in the first training sample set, with the image data of each channel identical to the single-channel image data, thereby ensuring the consistency of the training samples in the first training sample set.
Step 304: training the initial self-encoder with the first training sample set to obtain the trained self-encoder.
In this embodiment of this specification, step 304 may specifically include: for each training sample in the first training sample set, inputting the training sample into the initial self-encoder to obtain reconstructed face image data, and optimizing the model parameters of the initial self-encoder with the goal of minimizing the image reconstruction loss, to obtain the trained self-encoder; the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
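The following is a minimal sketch of such a training loop, using mean squared error as one common choice for the image reconstruction loss (the specification only requires a difference value between the reconstruction and the training sample; the optimizer and learning rate are likewise assumptions):

```python
import torch
from torch.utils.data import DataLoader

def train_autoencoder(encoder, decoder, faces: DataLoader, epochs: int = 10):
    """Train the initial self-encoder: the input face image itself serves as
    the supervision information, and the image reconstruction loss is the
    difference between the reconstructed image and the training sample."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x in faces:                    # x: a batch of face images
            recon = decoder(encoder(x))    # reconstructed face image data
            loss = loss_fn(recon, x)       # image reconstruction loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```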
In an embodiment of the present specification, the input layer, the first hidden layer, and the bottleneck layer in the self-encoder constitute an encoder, and the second hidden layer and the output layer in the self-encoder constitute a decoder. The encoder may be configured to perform encoding processing on a face image to obtain an encoding vector of the face image. And the decoder can decode the coding vector generated by the coder to obtain a reconstructed face image. The functions of each layer of the self-encoder may be the same as those of each layer of the self-encoder mentioned in the embodiment of the method in fig. 1, and are not described again.
Step 306: acquiring a second training sample set, wherein the training samples in the second training sample set are coding vectors, a coding vector being vector data obtained by performing characterization processing on a face image with the encoder in the trained self-encoder.
In this embodiment of the present specification, the training samples in the second training sample set may be vector data obtained by performing characterization processing, with the encoder in the trained self-encoder, on face images of users who need privacy protection. Which users need privacy protection can be determined according to actual requirements, for example, the operating users and authenticated users of registered accounts in an application, or the users to be identified and whitelisted users of an access control system based on face recognition technology.
In this embodiment, the encoder in the trained self-encoder may be used to generate and store the training samples in the second training sample set in advance, so that when step 306 is executed, the pre-generated training samples only need to be extracted from the database. Because those training samples are coding vectors of users' face images, and the coding vectors cannot reveal the appearance of the users, the service provider does not affect the privacy of the users' face information when transmitting, storing, and processing them.
Step 308: inputting the training samples in the second training sample set into the decoder of an initial face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder, obtaining a trained face feature extraction model; the initial face feature extraction model is obtained by locking together the decoder and the initial feature extraction model, the decoder being the decoder in the trained self-encoder.
In the embodiment of the present specification, when the initial face feature extraction model is trained, it is not necessary to optimize the model parameters of the decoder in the initial face feature extraction model, but only the model parameters of the initial feature extraction model based on the convolutional neural network are optimized.
Training the initial feature extraction model based on the convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder may specifically include:
classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data; acquiring the preset category label (i.e., the ground-truth label) for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, wherein the classification loss is the difference between the predicted category label and the preset category label.
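A minimal sketch of this step follows, with cross-entropy as one common realization of the classification loss (the specification only requires a difference between the predicted and preset category labels). Only the feature extraction model's parameters are optimized; the locked decoder stays frozen, matching the description above:

```python
import torch

def train_feature_extractor(decoder, extractor, codes_and_labels, epochs=10):
    """Train only the CNN-based feature extraction model; the decoder in
    the locked initial face feature extraction model is never optimized.
    `codes_and_labels` yields (coding vector batch, preset label batch)."""
    for p in decoder.parameters():
        p.requires_grad = False            # decoder parameters are not optimized
    opt = torch.optim.Adam(extractor.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for code, label in codes_and_labels:
            recon = decoder(code)          # reconstructed face image data
            logits = extractor(recon)      # predicted category label scores
            loss = loss_fn(logits, label)  # classification loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return extractor
```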
Step 310: generating a user feature extraction model for privacy protection from the encoder and the trained face feature extraction model.
In an embodiment of the present specification, an input of the encoder is configured to receive a face image of a user to be recognized, an output of the encoder is connected to an input of a decoder in the trained face feature extraction model, an output of the decoder is connected to an input of a feature extraction model based on a convolutional neural network in the trained face feature extraction model, and an output of the feature extraction model based on the convolutional neural network is a face feature vector of the user to be recognized.
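Reusing the Encoder, Decoder, and FeatureExtractor sketches above, the assembled user feature extraction model would run end to end roughly as follows (a sketch under the same illustrative assumptions):

```python
import torch

def extract_face_feature(encoder, decoder, extractor,
                         face_image: torch.Tensor) -> torch.Tensor:
    """Client-side encoder -> coding vector -> server-side (locked) decoder ->
    reconstructed face image data -> feature extraction model -> face
    feature vector of the user to be recognized."""
    with torch.no_grad():
        code = encoder(face_image.unsqueeze(0))   # runs on the client device
        recon = decoder(code)                     # inside the locked model
        feature = extractor(recon, return_features=True)
    return feature.squeeze(0)
```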
In the embodiment of the present description, the user feature extraction model for privacy protection is built from the trained self-encoder and the trained face feature extraction model. Because the self-encoder needs no supervision information other than the input image during network training, the training cost of the face feature extraction model for privacy protection can be reduced, making it economical and practical.
Based on the method of fig. 3, the present specification also provides some specific embodiments of the method, which are described below.
In this embodiment of the present specification, the user feature extraction model generated by the method in fig. 3 may be applied to a user identification scenario. After the user face feature vectors are extracted by using the user feature extraction model, the user face feature vectors are usually required to be compared to generate a final user identification result.
Therefore, before the user feature extraction model for privacy protection is generated in step 310, the method may further include: establishing a user matching model, wherein the user matching model is used to generate an output result indicating whether a user to be identified is a designated user according to the vector distance between a first face feature vector of the user to be identified and a second face feature vector of the designated user; the first face feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained face feature extraction model, and the second face feature vector is obtained by processing the face image of the designated user in the same way;
step 310 may specifically include: generating a user feature extraction model for privacy protection composed of the encoder, the trained face feature extraction model, and the user matching model.
Based on the same idea, the embodiment of the present specification further provides an apparatus corresponding to the above method. Fig. 4 is a schematic structural diagram of a face feature extraction apparatus corresponding to fig. 1 provided in an embodiment of the present specification. The apparatus uses a user feature extraction model for privacy protection, which may include: an encoder and a face feature extraction model, wherein the face feature extraction model is obtained by locking together a decoder and a feature extraction model based on a convolutional neural network, and the encoder and the decoder form a self-encoder; the encoder is connected with the decoder in the face feature extraction model, and the decoder is connected with the feature extraction model. The apparatus may include:
an input module 402, used for inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained by performing characterization processing on the face image;
a face feature vector generation module 404, used for causing the decoder in the face feature extraction model to receive the coding vector and output reconstructed face image data to the feature extraction model, so that the feature extraction model performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.
Optionally, the encoder may include the input layer, a first hidden layer, and a bottleneck layer of the self-encoder, and the decoder may include a second hidden layer and the output layer of the self-encoder; wherein the input layer of the encoder is connected with the first hidden layer, the first hidden layer is connected with the bottleneck layer, the bottleneck layer of the encoder is connected with the second hidden layer of the decoder, the second hidden layer is connected with the output layer, and the output layer is connected with the feature extraction model.
The input layer in the self-encoder can be used for receiving the face image of the user to be identified; the first hidden layer may be configured to perform encoding processing on the face image to obtain a first feature vector; the bottleneck layer may be configured to perform dimension reduction processing on the first feature vector to obtain a coding vector of the face image, where the number of dimensions of the coding vector is smaller than the number of dimensions of the first feature vector; the second hidden layer may be configured to decode the encoded vector to obtain a second feature vector; the output layer may be configured to generate reconstructed face image data according to the second feature vector.
Optionally, the feature extraction model based on the convolutional neural network may include: an input layer, a convolution layer and a full connection layer; wherein the input layer is connected with the output of the decoder, the input layer is also connected with the convolutional layer, and the convolutional layer is connected with the full connection layer.
The input layer of the feature extraction model based on the convolutional neural network can be used for receiving the reconstructed face image data output by the decoder; the convolution layer can be used for extracting local features of the reconstructed face image data to obtain the local face feature vector of the user to be identified; and the full connection layer is used for generating the face feature vector of the user to be identified from the local face feature vector.
Optionally, the user feature extraction model may further include a user matching model, and the user matching model is connected to the feature extraction model; the apparatus may further include:
and a user matching module, used for causing the user matching model to receive the face feature vector of the user to be identified and the face feature vector of the designated user, and then generating output information indicating whether the user to be identified is the designated user according to the vector distance between the two face feature vectors, wherein the face feature vector of the designated user is obtained by processing the designated user's face image with the encoder and the face feature extraction model.
Based on the same idea, the embodiment of the present specification further provides an apparatus corresponding to the above method. Fig. 5 is a schematic structural diagram of a training apparatus, corresponding to fig. 3, for a face feature extraction model for privacy protection provided in an embodiment of the present specification. As shown in fig. 5, the apparatus may include:
a first acquisition module 502, used for acquiring a first training sample set, wherein the training samples in the first training sample set are face images;
a first training module 504, used for training an initial self-encoder with the first training sample set to obtain a trained self-encoder;
a second acquisition module 506, used for acquiring a second training sample set, wherein the training samples in the second training sample set are coding vectors, a coding vector being vector data obtained by performing characterization processing on a face image with the encoder in the trained self-encoder;
a second training module 508, used for inputting the training samples in the second training sample set into the decoder of an initial face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder, obtaining a trained face feature extraction model; the initial face feature extraction model is obtained by locking together the decoder and the initial feature extraction model, the decoder being the decoder in the trained self-encoder;
and a user feature extraction model generation module 510, used for generating a user feature extraction model for privacy protection from the encoder and the trained face feature extraction model.
Optionally, the first training module 504 may be specifically configured to:
for each training sample in the first training sample set, input the training sample into the initial self-encoder to obtain reconstructed face image data, and optimize the model parameters of the initial self-encoder with the goal of minimizing the image reconstruction loss, to obtain the trained self-encoder; the image reconstruction loss is the difference between the reconstructed face image data and the training sample.
Optionally, training the initial feature extraction model based on the convolutional neural network in the initial face feature extraction model with the reconstructed face image data output by the decoder may specifically include: classifying the reconstructed face image data with the initial feature extraction model to obtain a predicted category label for the reconstructed face image data; acquiring the preset category label for the reconstructed face image data; and optimizing the model parameters of the initial feature extraction model with the goal of minimizing the classification loss, wherein the classification loss is the difference between the predicted category label and the preset category label.
Optionally, the apparatus in fig. 5 may further include a user matching model establishing module, used for establishing a user matching model. The user matching model is used to generate an output result indicating whether a user to be identified is a designated user according to the vector distance between a first face feature vector of the user to be identified and a second face feature vector of the designated user, wherein the first face feature vector is obtained by processing the face image of the user to be identified with the encoder and the trained face feature extraction model, and the second face feature vector is obtained by processing the face image of the designated user in the same way.
The user feature extraction model generation module 510 may be specifically configured to generate a user feature extraction model for privacy protection composed of the encoder, the trained face feature extraction model, and the user matching model.
Based on the same idea, the embodiment of the present specification further provides a client device corresponding to the method. The client device may include:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores an image encoder, which is the encoder in a self-encoder, and instructions executable by the at least one processor to enable the at least one processor to:
input a face image of a user to be identified into the image encoder to obtain a coding vector of the face image output by the image encoder, wherein the coding vector is vector data obtained by performing characterization processing on the face image;
and send the coding vector to a server device so that the server device can generate the face feature vector of the user to be identified from the coding vector using a face feature extraction model, wherein the face feature extraction model is obtained by locking together the decoder in the self-encoder and a feature extraction model based on a convolutional neural network.
In the embodiment of the present specification, the client device can generate the coding vector of the face image of the user to be recognized with the encoder in the self-encoder it carries, and can therefore send the coding vector, rather than the face image itself, to the server device for user recognition. This avoids transmitting the face image of the user to be recognized and ensures the privacy and security of that user's face information.
Based on the same idea, the embodiment of the present specification further provides a server device corresponding to the above method. The server device may include:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein,
the memory stores a face feature extraction model, which is obtained by locking together the decoder in a self-encoder and a feature extraction model based on a convolutional neural network, and further stores instructions executable by the at least one processor to enable the at least one processor to:
acquire a coding vector of a face image of a user to be identified, wherein the coding vector is vector data obtained by performing characterization processing on the face image with the encoder in the self-encoder;
and input the coding vector into the decoder in the face feature extraction model, so that the decoder outputs reconstructed face image data to the feature extraction model, and the feature extraction model performs characterization processing on the reconstructed face image data and outputs the face feature vector of the user to be identified.
In the embodiment of the present specification, the server device can generate the face feature vector of the user to be recognized from the coding vector of that user's face image using the face feature extraction model it carries, and can therefore perform user recognition without acquiring the face image itself. This avoids transmitting the face image and also avoids the server device storing and processing it, improving the privacy and security of the face information of the user to be recognized.
Based on the same idea, an embodiment of the present specification further provides a training device for a face feature extraction model for privacy protection, which corresponds to the method in fig. 3. The apparatus may include:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to:
obtaining a first training sample set, wherein training samples in the first training sample set are human face images.
And training the initial self-encoder by using the first training sample set to obtain the trained self-encoder.
And acquiring a second training sample set, wherein the training samples in the second training sample set are coding vectors, and the coding vectors are vector data obtained by performing characterization processing on the face images by using an encoder in the trained self-encoder.
Inputting the training samples in the second training sample set into a decoder of an initial human face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial human face feature extraction model by using reconstructed human face image data output by the decoder, and obtain a trained human face feature extraction model; the initial face feature extraction model is obtained by locking the decoder and the initial feature extraction model, and the decoder is a decoder in the trained self-encoder.
And generating a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
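An end-to-end sketch of the two training stages just listed, under stated assumptions: mean squared error stands in for the image reconstruction loss, cross-entropy for the classification loss, the autoencoder object is assumed to expose encoder/decoder halves, and all hyper-parameters are hypothetical.

```python
import torch
import torch.nn as nn

def train_user_feature_extraction_model(autoencoder, cnn_extractor,
                                        first_samples, second_samples):
    # Stage 1: train the initial self-encoder on face images (first training sample set).
    opt_ae = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    for img in first_samples:
        recon = autoencoder(img)
        loss = nn.functional.mse_loss(recon, img)   # image reconstruction loss
        opt_ae.zero_grad()
        loss.backward()
        opt_ae.step()

    # Stage 2: lock the trained decoder to the CNN; only the CNN's parameters move.
    decoder = autoencoder.decoder
    for p in decoder.parameters():
        p.requires_grad_(False)                     # "locking": the decoder stays fixed

    opt_cnn = torch.optim.Adam(cnn_extractor.parameters(), lr=1e-3)
    for z, y in second_samples:                     # coding vectors + class labels
        recon = decoder(z)                          # reconstructed face image data
        logits = cnn_extractor(recon)
        loss = nn.functional.cross_entropy(logits, y)  # classification loss
        opt_cnn.zero_grad()
        loss.backward()
        opt_cnn.step()

    return autoencoder.encoder, decoder, cnn_extractor  # parts of the user feature extraction model
```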
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement in a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer integrates a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making an integrated circuit chip, such programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled is written in a specific programming language called a hardware description language (HDL). There is not only one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by slightly logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320 microcontrollers; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included within it for implementing the various functions may also be regarded as structures within the hardware component. Or, the means for implementing the various functions may even be regarded as both software modules implementing the method and structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
One skilled in the art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to one or more embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is merely exemplary of the present disclosure and is not intended to limit one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments described herein will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of the present specification should be included in the scope of claims of one or more embodiments of the present specification.

Claims (22)

1. A face feature extraction method using a user feature extraction model for privacy protection, the user feature extraction model comprising: the system comprises an encoder and a human face feature extraction model, wherein the human face feature extraction model is obtained by carrying out encryption locking on a decoder and a feature extraction model based on a convolutional neural network, and the encoder and the decoder form a self-encoder;
the encoder is connected with a decoder in the human face feature extraction model, and the decoder is connected with the feature extraction model; the method comprises the following steps:
inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained by characterizing the face image; the face image of the user to be identified is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
a decoder in the human face feature extraction model receives the coding vector and outputs reconstructed human face image data to the feature extraction model; and outputting the face feature vector of the user to be identified after the feature extraction model performs the feature processing on the reconstructed face image data.
2. The method of claim 1, the encoder comprising: an input layer, a first hidden layer, and a bottleneck layer, the decoder comprising: a second hidden layer and an output layer;
wherein the input layer of the encoder is connected with the first hidden layer, the first hidden layer is connected with the bottleneck layer, the bottleneck layer of the encoder is connected with the second hidden layer of the decoder, the second hidden layer is connected with the output layer, and the output layer is connected with the feature extraction model;
the input layer is used for receiving the face image of the user to be identified;
the first hidden layer is used for coding the face image to obtain a first feature vector;
the bottleneck layer is used for performing dimension reduction processing on the first feature vector to obtain a coding vector of the face image, and the dimension number of the coding vector is smaller than that of the first feature vector;
the second hidden layer is used for decoding the coding vector to obtain a second feature vector;
and the output layer is used for generating reconstructed face image data according to the second characteristic vector.
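As a reading aid only, the five layers recited in claim 2 map one-to-one onto a simple module like the following sketch; the concrete sizes (4096/512/128) are hypothetical, chosen so that the bottleneck dimension is smaller than that of the first feature vector:

```python
import torch.nn as nn

self_encoder = nn.Sequential(
    nn.Flatten(),            # input layer: receives the face image of the user
    nn.Linear(4096, 512),    # first hidden layer: encodes to the first feature vector
    nn.ReLU(),
    nn.Linear(512, 128),     # bottleneck layer: 128 < 512, yields the coding vector
    nn.Linear(128, 512),     # second hidden layer: decodes to the second feature vector
    nn.ReLU(),
    nn.Linear(512, 4096),    # output layer: reconstructed face image data
)
```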
3. The method of claim 1, the convolutional neural network-based feature extraction model comprising: an input layer, a convolution layer and a full connection layer;
wherein the input layer is connected with the output of the decoder, the input layer is also connected with the convolutional layer, and the convolutional layer is connected with the full connection layer;
the input layer is used for receiving the reconstructed face image data output by the decoder;
the convolution layer is used for extracting local features of the reconstructed face image data to obtain a face local feature vector of the user to be identified;
and the full connection layer is used for generating the face characteristic vector of the user to be identified according to the face local characteristic vector.
4. The method of claim 3, the convolutional neural network-based feature extraction model further comprising an output layer, the output layer connected to the fully-connected layer; the output layer is used for generating a face classification result according to the face feature vector of the user to be identified output by the full connection layer;
and the face feature vector of the user to be recognized is an output vector of a full connection layer adjacent to the output layer.
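Similarly, a sketch of the convolutional-neural-network-based feature extraction model of claims 3 and 4, with hypothetical channel counts and number of identity classes; per claim 4, the face feature vector is the output of the fully connected layer adjacent to the output layer:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.conv = nn.Sequential(                       # convolution layers: local features
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 16 * 16, 256)           # fully connected layer
        self.output = nn.Linear(256, num_classes)        # output layer: classification result

    def forward(self, x, return_features=True):
        h = self.conv(x).flatten(start_dim=1)            # face local feature vector
        features = self.fc(h)                            # face feature vector (claim 4)
        return features if return_features else self.output(features)
```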
5. The method of claim 1, the user feature extraction model further comprising a user matching model, the user matching model connected to the feature extraction model; the method further comprises the following steps:
the user matching model receives the face feature vector of the user to be identified and the face feature vector of the designated user, and generates output information representing whether the user to be identified is the designated user according to the vector distance between the face feature vector of the user to be identified and the face feature vector of the designated user, wherein the face feature vector of the designated user is obtained by processing the face image of the designated user by using the encoder and the face feature extraction model.
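A one-function sketch of claim 5's user matching model; the Euclidean distance and the threshold value are assumptions, since the claim fixes only that matching is decided from a vector distance:

```python
import torch

def is_designated_user(candidate_vec: torch.Tensor, designated_vec: torch.Tensor,
                       threshold: float = 0.8) -> bool:
    distance = torch.dist(candidate_vec, designated_vec, p=2)  # vector distance
    return bool(distance < threshold)  # output: whether the user is the designated user
```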
6. A training method for a user feature extraction model for privacy protection, the method comprising:
acquiring a first training sample set, wherein training samples in the first training sample set are face images; the face image is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
training an initial self-encoder by using the first training sample set to obtain a trained self-encoder;
acquiring a second training sample set, wherein training samples in the second training sample set are coding vectors, and the coding vectors are vector data obtained by characterizing a face image by using an encoder in the trained self-encoder;
inputting the training samples in the second training sample set into a decoder of an initial human face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial human face feature extraction model by using reconstructed human face image data output by the decoder, and obtain a trained human face feature extraction model; the initial face feature extraction model is obtained by encrypting and locking the decoder and the initial feature extraction model, and the decoder is a decoder in the trained self-encoder;
and generating a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
7. The method of claim 6, wherein the training the initial self-encoder with the first training sample set to obtain the trained self-encoder specifically comprises:
inputting the training sample into the initial self-encoder for each training sample in the first training sample set to obtain reconstructed face image data;
optimizing the model parameters of the initial self-encoder by taking the minimized image reconstruction loss as a target to obtain a trained self-encoder; the image reconstruction loss is a difference value between the reconstructed face image data and the training sample.
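Written as a formula, with $E$ and $D$ the encoder and decoder, $S_1$ the first training sample set, and squared error assumed as the "difference value" between the reconstructed face image data and the training sample:

$$\mathcal{L}_{\mathrm{rec}}(\theta_E, \theta_D) = \frac{1}{|S_1|} \sum_{x \in S_1} \bigl\lVert D(E(x)) - x \bigr\rVert_2^2$$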
8. The method according to claim 6, wherein the training of the initial feature extraction model based on the convolutional neural network in the initial face feature extraction model by using the reconstructed face image data output by the decoder specifically comprises:
classifying the reconstructed face image data by using the initial feature extraction model to obtain a category label predicted value of the reconstructed face image data;
acquiring a category label preset value aiming at the reconstructed face image data;
and optimizing the model parameters of the initial feature extraction model by taking the minimized classification loss as a target, wherein the classification loss is a difference value between the predicted value of the class label and the preset value of the class label.
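Correspondingly, with $F$ the initial feature extraction model, $S_2$ the second training sample set of coding vectors $z$ with preset class labels $y$, and cross-entropy assumed as the "difference value" between the predicted and preset class labels (only $\theta_F$ is optimized, since the decoder $D$ is locked):

$$\mathcal{L}_{\mathrm{cls}}(\theta_F) = -\frac{1}{|S_2|} \sum_{(z,\,y) \in S_2} \log \frac{\exp\bigl(F(D(z))_y\bigr)}{\sum_{c} \exp\bigl(F(D(z))_c\bigr)}$$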
9. The method of claim 6, wherein the training samples in the first set of training samples are face images that have been authorized for use.
10. The method of claim 6, wherein the training samples in the second set of training samples are vector data obtained by characterizing facial images of a user who needs privacy protection by the encoder.
11. The method of claim 6, prior to generating the user feature extraction model for privacy protection, further comprising:
establishing a user matching model, wherein the user matching model is used for generating an output result for indicating whether a user to be identified is the appointed user according to a vector distance between a first face feature vector of the user to be identified and a second face feature vector of the appointed user, the first face feature vector is obtained by processing a face image of the user to be identified by using the encoder and the trained face feature extraction model, and the second face feature vector is obtained by processing the face image of the appointed user by using the encoder and the trained face feature extraction model;
the generating of the user feature extraction model for privacy protection specifically includes:
and generating a user feature extraction model for privacy protection, which is composed of the encoder, the trained face feature extraction model and the user matching model.
12. A facial feature extraction apparatus using a user feature extraction model for privacy protection, the user feature extraction model comprising: the system comprises an encoder and a human face feature extraction model, wherein the human face feature extraction model is obtained by carrying out encryption locking on a decoder and a feature extraction model based on a convolutional neural network, and the encoder and the decoder form a self-encoder; the encoder is connected with a decoder in the human face feature extraction model, and the decoder is connected with the feature extraction model; the device comprises:
the input module is used for inputting a face image of a user to be identified into the encoder to obtain a coding vector of the face image output by the encoder, wherein the coding vector is vector data obtained by characterizing the face image; the face image of the user to be identified is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
the human face feature vector generation module is used for enabling a decoder in the human face feature extraction model to receive the coding vector and then outputting reconstructed human face image data to the feature extraction model; and outputting the face feature vector of the user to be identified after the feature extraction model performs the feature processing on the reconstructed face image data.
13. The apparatus of claim 12, the encoder comprising: an input layer, a first hidden layer, and a bottleneck layer, the decoder comprising: a second hidden layer and an output layer;
wherein the input layer of the encoder is connected with the first hidden layer, the first hidden layer is connected with the bottleneck layer, the bottleneck layer of the encoder is connected with the second hidden layer of the decoder, the second hidden layer is connected with the output layer, and the output layer is connected with the feature extraction model;
the input layer is used for receiving the face image of the user to be identified;
the first hidden layer is used for coding the face image to obtain a first feature vector;
the bottleneck layer is used for performing dimension reduction processing on the first feature vector to obtain a coding vector of the face image, and the dimension number of the coding vector is smaller than that of the first feature vector;
the second hidden layer is used for decoding the coding vector to obtain a second feature vector;
and the output layer is used for generating reconstructed face image data according to the second characteristic vector.
14. The apparatus of claim 12, the convolutional neural network-based feature extraction model comprising: an input layer, a convolution layer and a full connection layer;
wherein the input layer is connected with the output of the decoder, the input layer is also connected with the convolutional layer, and the convolutional layer is connected with the full connection layer;
the input layer is used for receiving the reconstructed face image data output by the decoder;
the convolution layer is used for extracting local features of the reconstructed face image data to obtain a face local feature vector of the user to be identified;
and the full connection layer is used for generating the face characteristic vector of the user to be identified according to the face local characteristic vector.
15. The apparatus of claim 12, the user feature extraction model further comprising a user matching model, the user matching model connected to the feature extraction model; the device further comprises:
and the user matching module is used for enabling the user matching model to receive the face feature vector of the user to be identified and the face feature vector of the appointed user, and then generating output information for indicating whether the user to be identified is the appointed user according to the vector distance between the face feature vector of the user to be identified and the face feature vector of the appointed user, wherein the face feature vector of the appointed user is obtained by processing the face image of the appointed user by using the encoder and the face feature extraction model.
16. A training apparatus for a user feature extraction model for privacy protection, the apparatus comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a first training sample set, and training samples in the first training sample set are human face images; the face image is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same as the single-channel image data;
the first training module is used for training an initial self-encoder by utilizing the first training sample set to obtain a trained self-encoder;
a second obtaining module, configured to obtain a second training sample set, where a training sample in the second training sample set is a coding vector, and the coding vector is vector data obtained by performing characterization processing on a face image by using an encoder in the trained self-encoder;
the second training module is used for inputting the training samples in the second training sample set into a decoder of an initial human face feature extraction model so as to train the initial feature extraction model based on a convolutional neural network in the initial human face feature extraction model by using reconstructed human face image data output by the decoder to obtain a trained human face feature extraction model; the initial face feature extraction model is obtained by encrypting and locking the decoder and the initial feature extraction model, and the decoder is a decoder in the trained self-encoder;
and the user feature extraction model generation module is used for generating a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
17. The apparatus of claim 16, the first training module to:
inputting the training sample into the initial self-encoder for each training sample in the first training sample set to obtain reconstructed face image data;
optimizing the model parameters of the initial self-encoder by taking the minimized image reconstruction loss as a target to obtain a trained self-encoder; the image reconstruction loss is a difference value between the reconstructed face image data and the training sample.
18. The apparatus according to claim 16, wherein the training of the initial feature extraction model based on the convolutional neural network in the initial face feature extraction model by using the reconstructed face image data output by the decoder specifically comprises:
classifying the reconstructed face image data by using the initial feature extraction model to obtain a category label predicted value of the reconstructed face image data;
acquiring a category label preset value aiming at the reconstructed face image data;
and optimizing the model parameters of the initial feature extraction model by taking the minimized classification loss as a target, wherein the classification loss is a difference value between the predicted value of the class label and the preset value of the class label.
19. The apparatus of claim 16, further comprising:
a user matching model establishing module, configured to establish a user matching model, where the user matching model is configured to generate an output result indicating whether a user to be identified is an appointed user according to a vector distance between a first face feature vector of the user to be identified and a second face feature vector of the appointed user, the first face feature vector is obtained by processing a face image of the user to be identified using the encoder and the trained face feature extraction model, and the second face feature vector is obtained by processing the face image of the appointed user using the encoder and the trained face feature extraction model;
the user feature extraction model generation module is specifically configured to:
and generating a user feature extraction model for privacy protection, which is composed of the encoder, the trained face feature extraction model and the user matching model.
20. A client device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores an image encoder that is an encoder of a self-encoder and instructions executable by the at least one processor to enable the at least one processor to:
inputting a face image of a user to be identified into the image encoder to obtain a coding vector of the face image output by the image encoder, wherein the coding vector is vector data obtained by characterizing the face image; the face image of the user to be identified is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
and sending the coding vector to a server device, so that the server device generates the face feature vector of the user to be identified from the coding vector using a face feature extraction model, wherein the face feature extraction model is obtained by carrying out encryption locking on a decoder of a self-encoder and a convolutional-neural-network-based feature extraction model.
21. A server device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a face feature extraction model obtained by carrying out encryption locking on a decoder of a self-encoder and a convolutional-neural-network-based feature extraction model, and further stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring a coding vector of a face image of a user to be identified, wherein the coding vector is vector data obtained by characterizing the face image by using an encoder in the self-encoder; the face image of the user to be identified is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
after the coding vector is input into a decoder in the human face feature extraction model, the decoder outputs reconstructed human face image data to the feature extraction model; and outputting the face feature vector of the user to be identified after the feature extraction model performs the feature processing on the reconstructed face image data.
22. A training device for a face feature extraction model for privacy protection, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring a first training sample set, wherein training samples in the first training sample set are face images; the face image is a multi-channel face image, and the image data of each channel of the multi-channel face image is the same single-channel image data;
training an initial self-encoder by using the first training sample set to obtain a trained self-encoder;
acquiring a second training sample set, wherein training samples in the second training sample set are coding vectors, and the coding vectors are vector data obtained by characterizing a face image by using an encoder in the trained self-encoder;
inputting the training samples in the second training sample set into a decoder of an initial human face feature extraction model, so as to train the initial feature extraction model based on a convolutional neural network in the initial human face feature extraction model by using reconstructed human face image data output by the decoder, and obtain a trained human face feature extraction model; the initial face feature extraction model is obtained by encrypting and locking the decoder and the initial feature extraction model, and the decoder is a decoder in the trained self-encoder;
and generating a user feature extraction model for privacy protection according to the encoder and the trained face feature extraction model.
CN202010197694.8A 2020-03-19 2020-03-19 Face feature extraction method, device and equipment Active CN111401272B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010197694.8A CN111401272B (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment
CN202111156860.0A CN113657352A (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment
PCT/CN2020/140574 WO2021184898A1 (en) 2020-03-19 2020-12-29 Facial feature extraction method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010197694.8A CN111401272B (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111156860.0A Division CN113657352A (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment

Publications (2)

Publication Number Publication Date
CN111401272A CN111401272A (en) 2020-07-10
CN111401272B true CN111401272B (en) 2021-08-24

Family

ID=71432637

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111156860.0A Pending CN113657352A (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment
CN202010197694.8A Active CN111401272B (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111156860.0A Pending CN113657352A (en) 2020-03-19 2020-03-19 Face feature extraction method, device and equipment

Country Status (2)

Country Link
CN (2) CN113657352A (en)
WO (1) WO2021184898A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401273B (en) * 2020-03-19 2022-04-29 支付宝(杭州)信息技术有限公司 User feature extraction system and device for privacy protection
CN113657352A (en) * 2020-03-19 2021-11-16 支付宝(杭州)信息技术有限公司 Face feature extraction method, device and equipment
CN111783965A (en) * 2020-08-14 2020-10-16 支付宝(杭州)信息技术有限公司 Method, device and system for biometric identification and electronic equipment
CN112016480A (en) * 2020-08-31 2020-12-01 中移(杭州)信息技术有限公司 Face feature representation method, system, electronic device and storage medium
CN112949545B (en) * 2021-03-17 2022-12-30 中国工商银行股份有限公司 Method, apparatus, computing device and medium for recognizing face image
CN113657498B (en) * 2021-08-17 2023-02-10 展讯通信(上海)有限公司 Biological feature extraction method, training method, authentication method, device and equipment
CN113946858B (en) * 2021-12-20 2022-03-18 湖南丰汇银佳科技股份有限公司 Identity security authentication method and system based on data privacy calculation
CN114821751B (en) * 2022-06-27 2022-09-27 北京瑞莱智慧科技有限公司 Image recognition method, device, system and storage medium
CN114842544B (en) * 2022-07-04 2022-09-06 江苏布罗信息技术有限公司 Intelligent face recognition method and system suitable for facial paralysis patient
CN115190217B (en) * 2022-07-07 2024-03-26 国家计算机网络与信息安全管理中心 Data security encryption method and device integrating self-coding network
CN116844217B (en) * 2023-08-30 2023-11-14 成都睿瞳科技有限责任公司 Image processing system and method for generating face data

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6318211B2 (en) * 2016-10-03 2018-04-25 株式会社Preferred Networks Data compression apparatus, data reproduction apparatus, data compression method, data reproduction method, and data transfer method
US10803347B2 (en) * 2017-12-01 2020-10-13 The University Of Chicago Image transformation with a hybrid autoencoder and generative adversarial network machine learning architecture
US11171977B2 (en) * 2018-02-19 2021-11-09 Nec Corporation Unsupervised spoofing detection from traffic data in mobile networks
AU2019320080A1 (en) * 2018-08-10 2021-03-11 Leidos Security Detection & Automation, Inc. Systems and methods for image processing
CN109769080B (en) * 2018-12-06 2021-05-11 西北大学 Encrypted image cracking method and system based on deep learning
CN110598580A (en) * 2019-08-25 2019-12-20 南京理工大学 Human face living body detection method
CN110826056B (en) * 2019-11-11 2024-01-30 南京工业大学 Recommended system attack detection method based on attention convolution self-encoder
CN113657352A (en) * 2020-03-19 2021-11-16 支付宝(杭州)信息技术有限公司 Face feature extraction method, device and equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900A (en) * 2015-01-29 2015-08-26 北京工业大学 Deconvolution neural network training method
CN107220594A (en) * 2017-05-08 2017-09-29 桂林电子科技大学 It is a kind of to retain the human face posture reconstruction and recognition methods for stacking self-encoding encoder based on similarity
CN108664967A (en) * 2018-04-17 2018-10-16 上海交通大学 A kind of multimedia page vision significance prediction technique and system
CN109495476A (en) * 2018-11-19 2019-03-19 中南大学 A kind of data flow difference method for secret protection and system based on edge calculations
CN110321777A (en) * 2019-04-25 2019-10-11 重庆理工大学 A kind of face identification method based on the sparse denoising self-encoding encoder of stack convolution
CN110310351A (en) * 2019-07-04 2019-10-08 北京信息科技大学 A kind of 3 D human body skeleton cartoon automatic generation method based on sketch
CN110766048A (en) * 2019-09-18 2020-02-07 平安科技(深圳)有限公司 Image content identification method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Reducing the Dimensionality of Data with Neural Networks; G. E. Hinton et al.; Science; 2006-06-28; see pages 504-507 *

Also Published As

Publication number Publication date
CN113657352A (en) 2021-11-16
CN111401272A (en) 2020-07-10
WO2021184898A1 (en) 2021-09-23

Similar Documents

Publication Publication Date Title
CN111401272B (en) Face feature extraction method, device and equipment
CN111401273B (en) User feature extraction system and device for privacy protection
CN111368795B (en) Face feature extraction method, device and equipment
CN109145563B (en) Identity verification method and device
CN112398838B (en) Authentication method, device, equipment and storage medium based on privacy protection
CN112084476A (en) Biological identification identity verification method, client, server, equipment and system
CN115600157A (en) Data processing method and device, storage medium and electronic equipment
CN112837202B (en) Watermark image generation and attack tracing method and device based on privacy protection
CN111242105A (en) User identification method, device and equipment
CN113239852B (en) Privacy image processing method, device and equipment based on privacy protection
CN114662144A (en) Biological detection method, device and equipment
CN116012612A (en) Content detection method and system
CN115618375A (en) Service execution method, device, storage medium and electronic equipment
CN113221717B (en) Model construction method, device and equipment based on privacy protection
CN115578796A (en) Training method, device, equipment and medium for living body detection model
CN115048661A (en) Model processing method, device and equipment
CN115577336A (en) Biological identification processing method, device and equipment
CN114882290A (en) Authentication method, training method, device and equipment
CN113239851B (en) Privacy image processing method, device and equipment based on privacy protection
CN112950732B (en) Image generation method and device, storage medium and electronic equipment
CN117612269A (en) Biological attack detection method, device and equipment
CN114238910A (en) Data processing method, device and equipment
KR102231391B1 (en) Method and Apparatus for Generating Video Based on Keypoints
CN110674497B (en) Malicious program similarity calculation method and device
CN117495649A (en) Image processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40032980

Country of ref document: HK

GR01 Patent grant