CN117011909A - Training method of face recognition model, face recognition method and device - Google Patents
Training method of face recognition model, face recognition method and device
- Publication number
- CN117011909A (application number CN202211449046.2A)
- Authority
- CN
- China
- Prior art keywords
- face recognition
- image
- recognition model
- current
- face
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application provides a training method of a face recognition model, a face recognition method and a face recognition device, and relates to the field of machine learning of artificial intelligence. The model training method comprises the following steps: acquiring a first training sample set; respectively inputting the first training sample set into at least two first face recognition models to obtain first image features output by each first face recognition model; inputting the first image features output by each first face recognition model into a model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model; fusing the first image features output by at least two first face recognition models according to the confidence coefficient of the first image features output by each first face recognition model to obtain fused features; and carrying out knowledge distillation on the second face recognition model according to the fusion characteristics to obtain a trained face recognition model. The embodiment of the application can improve the face recognition accuracy.
Description
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method of a face recognition model, a face recognition method and a device.
Background
Face recognition systems deployed on mobile terminals in practical scenarios such as smart door locks, payment and access control place high requirements on both the inference latency and the accuracy of the face recognition model: the highest possible recognition accuracy should be achieved within the shortest possible inference time. To meet the low-latency requirement, a small network is usually used as the face recognition inference model. However, directly training a small network on a large amount of data often does not yield a model that meets the accuracy requirement, because the fitting capacity of a small network is limited: during training it tends to fall into a local minimum of the loss function and then oscillates without further improvement.
In the related art, a trained large network is used to perform knowledge distillation on a small network: the features of a single large network are used to constrain the features of the small network during training, which effectively reduces the risk of the small model getting stuck in a local minimum. However, in various complex application scenarios, how to further improve the face recognition accuracy of the small network remains to be solved.
Disclosure of Invention
The application provides a training method of a face recognition model, a face recognition method and a face recognition device, which can improve the face recognition accuracy.
In a first aspect, an embodiment of the present application provides a training method for a face recognition model, including:
acquiring a first training sample set, wherein the first training sample set comprises a plurality of face image samples;
respectively inputting the first training sample set into at least two first face recognition models to obtain first image features output by each first face recognition model; the at least two first face recognition models are obtained by training at least two initialized first face recognition models by using the face image samples respectively;
inputting the first image features output by each first face recognition model into a model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model; the model confidence estimation module corresponding to each first face recognition model is obtained by training an initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample;
fusing the first image features output by the at least two first face recognition models according to the confidence level of the first image features output by each first face recognition model to obtain fused features;
and carrying out knowledge distillation on the second face recognition model according to the fusion characteristics to obtain the trained face recognition model, wherein the face recognition model is used for carrying out face recognition on the face image to be recognized.
In a second aspect, an embodiment of the present application provides a method for face recognition, including:
acquiring a first face image to be identified;
inputting the first face image into a face recognition model to obtain a first image feature corresponding to the first face image, wherein the face recognition model is obtained according to the training method of the first aspect;
determining a second image feature matching the first image feature in at least one second image feature; the at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model;
and determining that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image feature.
In a third aspect, an embodiment of the present application provides a training device for a face recognition model, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a first training sample set, and the first training sample set comprises a plurality of face image samples;
the at least two first face recognition models are used for respectively inputting the first training sample sets to obtain first image features output by each first face recognition model; the at least two first face recognition models are obtained by training at least two initialized first face recognition models by using the face image samples respectively;
the at least two model confidence estimation modules are used for inputting the first image features output by each first face recognition model into the model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model; the model confidence estimation module corresponding to each first face recognition model is obtained by training an initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample;
the fusion unit is used for fusing the first image features output by the at least two first face recognition models according to the confidence coefficient of the first image features output by each first face recognition model to obtain fusion features;
and the training unit is used for carrying out knowledge distillation on the second face recognition model according to the fusion characteristics to obtain the trained face recognition model, and the face recognition model is used for carrying out face recognition on the face image to be recognized.
In a fourth aspect, an embodiment of the present application provides a device for face recognition, including:
the acquisition unit is used for acquiring a first face image to be identified;
the face recognition model is used for inputting the first face image to obtain a first image feature corresponding to the first face image, wherein the face recognition model is obtained according to the training method of the first aspect;
a matching unit for determining a second image feature matching the first image feature among at least one second image feature; the at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model;
and the determining unit is used for determining that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image characteristic.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory for storing a computer program, the processor being for invoking and running the computer program stored in the memory for performing the method as in the first or second aspect.
In a sixth aspect, embodiments of the application provide a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform a method as in the first or second aspect.
In a seventh aspect, embodiments of the present application provide a computer program product comprising computer program instructions for causing a computer to perform the method as in the first or second aspect.
In an eighth aspect, embodiments of the present application provide a computer program that causes a computer to perform the method as in the first or second aspect.
Through the technical scheme, the features of the plurality of first face recognition models can be fused according to the confidence of the output features of the plurality of first face recognition models, and then knowledge distillation is performed on the second face recognition model according to the fused features, so that the trained face recognition model is obtained, and the features extracted by the trained face recognition model are gradually converged to be consistent with the distribution of the fused features. Because the fused features can more accurately express the real distribution of the features, the face recognition model obtained through knowledge distillation can extract more accurate face image features, and the face recognition system obtained through deployment can have higher face recognition accuracy.
In addition, as the fusion characteristics are obtained by adopting the characteristics output by a plurality of first face recognition models and the confidence degrees thereof, the second face recognition model obtained by carrying out knowledge distillation based on the fusion characteristics has stronger robustness.
Drawings
FIG. 1 is a schematic diagram of a system architecture of an embodiment of the present application;
fig. 2 is a schematic flow chart of a face recognition scheme provided by an embodiment of the present application;
fig. 3 is a schematic diagram of a training method of a first face recognition model according to an embodiment of the present application;
fig. 4 is another schematic diagram of a training method of the first face recognition model according to the embodiment of the present application;
FIG. 5 is a schematic diagram of a training method of a model confidence estimation module according to an embodiment of the present application;
FIG. 6 is another schematic diagram of a training method of a model confidence estimation module according to an embodiment of the present application;
fig. 7 is a schematic diagram of a training method of a face recognition model according to an embodiment of the present application;
fig. 8 is another schematic diagram of a training method of a face recognition model according to an embodiment of the present application;
fig. 9 is another schematic diagram of a training method of a face recognition model according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a face recognition system according to an embodiment of the present application;
fig. 11 is a schematic flow chart of a method for face recognition according to an embodiment of the present application;
FIG. 12 is a schematic block diagram of a training model apparatus in accordance with an embodiment of the present application;
fig. 13 is a schematic block diagram of a face recognition device according to an embodiment of the present application;
fig. 14 is a schematic block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
It should be understood that in embodiments of the present application, "B corresponding to a" means that B is associated with a. In one implementation, B may be determined from a. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
In the description of the present application, unless otherwise indicated, "at least one" means one or more, and "a plurality" means two or more. In addition, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" and similar expressions refer to any combination of the items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or plural.
It should be further understood that the description of the first, second, etc. in the embodiments of the present application is for illustration and distinction of descriptive objects, and is not intended to represent any limitation on the number of devices in the embodiments of the present application, nor is it intended to constitute any limitation on the embodiments of the present application.
It should also be appreciated that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the application. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application is applied to the technical field of artificial intelligence.
Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
With the research and advancement of artificial intelligence technology, it has been studied and applied in a variety of fields, such as smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare and smart customer service. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.
The embodiment of the application can relate to Computer Vision (CV) technology in artificial intelligence. Computer vision is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers, instead of human eyes, to identify, track and measure targets, and to further process the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
The embodiment of the application can also relate to Machine Learning (ML) in artificial intelligence technology. ML is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how a computer can simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Embodiments of the present application may also relate to face recognition techniques in artificial intelligence. Face recognition technology has two main uses in daily life: one is face verification (also called face comparison), which verifies whether you are a particular claimed person, and the other is face recognition, which determines who you are.
Face verification is a 1:1 comparison. This identity verification mode is essentially a process of quickly comparing the current face with a claimed identity in the face database and determining whether they match; put simply, it proves that "you are who you claim to be". In other words, the user tells the face recognition system "I am Zhang San", and the system then verifies whether the person standing in front of the machine is indeed Zhang San.
Face recognition is a 1:N comparison: after a face image of "me" is acquired, it is matched against a large number of stored face images to find the one that corresponds to the current user, that is, to determine who the person is.
At present, face recognition technology can be applied to mobile terminal face recognition systems in practical scenarios such as smart door locks, payment and access control. To meet the low-latency requirements of a face recognition model, a small network is generally adopted as the face recognition inference model. To further improve the accuracy of the small network, the related art uses a trained large network to perform knowledge distillation on the small network: the features of a single large network are used to constrain the features of the small network during training, which effectively reduces the risk of the small model getting stuck in a local minimum.
However, in existing face recognition model training, the loss function maps each face image to a deterministic point in the feature space and cannot model the uncertainty of each face image in that space. As a result, when the quality of a face image is poor, estimating the distribution of its features in the space becomes difficult, and the accuracy of the features the model extracts from the image is low. Therefore, in various complex application scenarios, how to further improve the face recognition accuracy of the model remains to be solved.
In view of this, the embodiment of the present application fuses the features of the plurality of first face recognition models based on the confidence levels of the output features of the plurality of first face recognition models, and then performs knowledge distillation on the second face recognition model according to the fused features, so as to obtain a trained face recognition model, so that the features extracted by the trained face recognition model converge to be consistent with the distribution of the fused features. Because the fused features can more accurately express the real distribution of the features, the face recognition model obtained by knowledge distillation can extract more accurate face image features. Furthermore, the face recognition system deployed according to the face recognition model can have higher face recognition accuracy.
In addition, as the fusion characteristics are obtained by adopting the characteristics output by a plurality of first face recognition models and the confidence degrees thereof, the second face recognition model obtained by carrying out knowledge distillation based on the fusion characteristics has stronger robustness.
The embodiment of the application can be applied to identity authentication scenarios such as smart door locks, payment and access control, but is not limited thereto. In such a scenario, an image acquisition module acquires a first face image of an object located in a designated area; the face recognition model provided by the embodiment of the application is then used to extract the image features of the first face image, the image features are compared and searched against at least one reference face image feature, the reference face image feature matching the image features of the first face image is obtained, and the identity of the object corresponding to the first face image is determined to be consistent with the identity of the object indicated by the matched reference face image feature. The face recognition model provided by the embodiment of the application is obtained by performing knowledge distillation training on the fused features of a plurality of large face recognition models with strong expressive power; the distribution of the features it extracts is highly similar to the distribution of the fused features of the plurality of large face recognition models, and the fused features can more accurately express the real feature distribution. Therefore, the face recognition model provided by the embodiment of the application can extract more accurate face image features, and a face recognition system deployed with it can achieve higher face recognition accuracy and is suitable for face recognition in various complex application scenarios.
The face recognition system provided by the embodiment of the application can perform face verification of 1:1 and can also perform face recognition of 1:N.
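As an illustration only (not part of the patent text), the following Python sketch shows how 1:1 verification and 1:N recognition can be performed by comparing L2-normalized face features with cosine similarity; the feature dimension, the gallery structure and the decision threshold are assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project feature vectors onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def verify(query_feat, ref_feat, threshold=0.5):
    """1:1 face verification: does the query match the claimed identity?"""
    q, r = l2_normalize(query_feat), l2_normalize(ref_feat)
    return float(np.dot(q, r)) >= threshold          # cosine-similarity test

def identify(query_feat, gallery_feats, gallery_ids, threshold=0.5):
    """1:N face recognition: which gallery identity, if any, matches the query?"""
    q = l2_normalize(query_feat)
    g = l2_normalize(gallery_feats)                   # shape (N, d)
    sims = g @ q                                      # cosine similarity to every reference
    best = int(np.argmax(sims))
    return gallery_ids[best] if sims[best] >= threshold else None

# Usage with dummy features standing in for model outputs:
gallery = np.random.randn(100, 512)
ids = [f"person_{i}" for i in range(100)]
print(identify(gallery[3] + 0.01 * np.random.randn(512), gallery, ids))
```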
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 1, the system architecture may include a user device 101, a data acquisition device 102, a training device 103, an execution device 104, a database 105, and a content library 106.
The data acquisition device 102 is configured to read training data from the content library 106, and store the read training data in the database 105. The training data according to the embodiments of the present application includes a face image sample, which may include tag information.
Training device 103 trains the face recognition model based on training data maintained in database 105. The face recognition model obtained by the training device 103 can extract more accurate face features and perform face recognition. And the model may be further connected to other downstream models. The model obtained by training device 103 may be applied to different systems or devices.
In addition, referring to fig. 1, the execution device 104 may be configured with an I/O interface 107 for data interaction with external devices. For example, the face picture to be recognized sent by the user equipment 101 is received through the I/O interface. The computing module 109 in the execution device 104 processes the input data using the trained face recognition model, outputs the face recognition result, and sends the corresponding result to the user device 101 through the I/O interface.
The user device 101 may include a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a mobile internet device (mobile internet device, MID), or other terminal devices with face recognition function.
The execution device 104 may be a server. By way of example, the server may be a rack server, a blade server, a tower server, or a rack server, among other computing devices. The server may be an independent server or a server cluster formed by a plurality of servers.
In this embodiment, the execution device 104 is connected to the user device 101 through a network. The network may be a wireless or wired network, such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or a telephone network.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawings does not constitute any limitation. In some embodiments, the data acquisition device 102 may be the same device as the user device 101, the training device 103, and the execution device 104. The database 105 may be distributed over one server or over a plurality of servers, and the content library 106 may be distributed over one server or over a plurality of servers.
The following describes the technical scheme of the embodiments of the present application in detail through some embodiments. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 2 is a schematic flow chart of a face recognition scheme provided by an embodiment of the present application. As shown in fig. 2, the technical solution of the embodiment of the present application may relate to a model training phase and a model deployment phase. The following steps 210 through 230 are included in the model training phase, and the following step 240 is included in the model deployment phase.
210, training a plurality of first face recognition models.
Illustratively, the first face recognition model may include a large recognition network module. Specifically, the plurality of first face recognition models may be trained using existing training data.
220, training a plurality of model confidence assessment modules.
Here, the plurality of model confidence evaluation modules are in one-to-one correspondence with the plurality of first face recognition models. That is, in this step, a model confidence assessment module matching each first face recognition model may be trained. The model confidence assessment module may be configured to perform confidence assessment on the features output by the matched first face recognition model. The confidence is used to evaluate how confident the face recognition model is that its output features accurately express the input face.
230, training a second face recognition model.
The second face recognition model may include, for example, a small recognition network module, and has fewer parameters than the first face recognition models. Specifically, in this step, the image features output by the plurality of first face recognition models may be fused according to the confidences obtained by the model confidence evaluation modules, and knowledge distillation training may be performed on the second face recognition model by using the fused features.
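Purely as an illustrative sketch of step 230 (the exact fusion rule and distillation loss are not specified verbatim here and are assumptions), the teacher features can be fused with weights proportional to their estimated confidences, and the second (student) face recognition model can then be distilled toward the fused feature:

```python
import torch
import torch.nn.functional as F

def fuse_features(teacher_feats, confidences):
    """teacher_feats: list of (B, d) features from the first face recognition models.
    confidences:   list of (B, 1) confidences from the matching estimation modules."""
    feats = torch.stack([F.normalize(f, dim=-1) for f in teacher_feats])  # (M, B, d)
    conf = torch.stack(confidences)                                       # (M, B, 1)
    weights = conf / conf.sum(dim=0, keepdim=True)                        # normalize over the M models
    fused = (weights * feats).sum(dim=0)                                  # confidence-weighted sum
    return F.normalize(fused, dim=-1)

def distillation_loss(student_feat, fused_feat):
    # Pull the student feature toward the fused teacher feature (cosine distance).
    s = F.normalize(student_feat, dim=-1)
    return (1.0 - (s * fused_feat.detach()).sum(dim=-1)).mean()
```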
And 240, deploying by using the second face recognition model and other modules.
Specifically, a trained second face recognition model can be adopted for deployment, and a feature comparison module and a search module are configured to form a complete face recognition system.
Therefore, the embodiment of the application fuses the characteristics of the plurality of first face recognition models according to the confidence degrees of the output characteristics of the plurality of first face recognition models, and then carries out knowledge distillation on the second face recognition model according to the fused characteristics to obtain the trained face recognition model, so that the characteristics extracted by the trained face recognition model are gradually converged to be consistent with the distribution of the fused characteristics. Because the fused features can more accurately express the real distribution of the features, the face recognition model obtained by knowledge distillation can extract more accurate face image features, and the face recognition system obtained by deployment can have higher face recognition accuracy.
In addition, as the fusion characteristics are obtained by adopting the characteristics output by a plurality of first face recognition models and the confidence degrees thereof, the second face recognition model obtained by carrying out knowledge distillation based on the fusion characteristics has stronger robustness.
First, the training process of the above step 210, that is, the plurality of first face recognition models, will be described in detail.
In some embodiments, as shown in fig. 3, steps S11-S15 may be repeatedly performed, and training data is used to train a plurality of initialized first face image recognition models, respectively, to obtain a plurality of trained first face recognition models:
s11, acquiring a training sample set, wherein the training sample set comprises a plurality of face image samples.
S12, inputting the current face image sample in the training sample set into a current first face recognition model to obtain a first current image feature corresponding to the current face image sample.
The current first face recognition model is used for extracting image features of an input current face image sample and outputting first current image features. By way of example, the first face recognition model may include, but is not limited to, a convolution layer, an activation function, a pooling layer, a full connection layer, and the like.
In some embodiments, the first current image feature may be a feature map, or an hyperspherical spatial feature vector, as the application is not limited in this respect.
S13, obtaining a current class center vector corresponding to the current face image sample. The current class center vector is used for indicating the identity class to which the current face image sample belongs.
Optionally, the class center vector corresponding to the face image sample may be obtained in the following two ways:
mode one: and (3) carrying out average value operation on all the image features in each category to obtain a category center vector corresponding to each category. Specifically, a plurality of face image samples can be clustered according to label information of the face image samples to obtain a face image sample set with a plurality of identity categories; extracting face features of face image samples contained in the face image sample set of each identity category to obtain category face features corresponding to the identity category; carrying out mean value calculation on the class face features to obtain a sample class center vector corresponding to the identity class; and taking the sample class center vector as a class center vector corresponding to each face image sample in the face image sample set corresponding to the identity class.
Mode two: the classification weight of each category obtained by the face recognition model is used as the class center vector of the corresponding category. Specifically, the face image samples and their label information may be sequentially input into the face recognition model, so that the class center vector corresponding to each face image sample can be determined from the classification weights of the classification layer in the face recognition model.
The label information of the face image sample may include an identity label of the face image sample, which is not limited by the present application.
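A minimal sketch of "mode one" above, in which samples are grouped by identity label and their features are averaged to obtain class center vectors; the re-normalization onto the hypersphere is an assumption consistent with the hyperspherical features used elsewhere:

```python
import numpy as np
from collections import defaultdict

def class_center_vectors(features, labels):
    """features: (N, d) face features; labels: length-N identity labels.
    Returns a dict mapping each identity class to its class center vector."""
    buckets = defaultdict(list)
    for feat, label in zip(features, labels):
        buckets[label].append(feat)
    centers = {}
    for label, feats in buckets.items():
        center = np.mean(feats, axis=0)                              # mean of the class face features
        centers[label] = center / (np.linalg.norm(center) + 1e-12)  # assumed re-normalization
    return centers
```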
S14, obtaining a first current loss of the current first face recognition model according to the first current image feature, the current class center vector and the label information of the current face image sample.
Specifically, the first current image feature, the current class center vector and the label information of the current face image sample may be input into an objective function of the current first face recognition model, to obtain the first current loss.
And S15, according to the first current loss, adjusting current parameters of a current first face recognition model.
For example, the training stopping condition is determined to be satisfied when the number of iterations of the current first face recognition model is greater than or equal to a preset number and/or the first current loss is less than or equal to a preset threshold. The current first face recognition model determined to satisfy the training stop condition may then be output as the first face recognition model.
In summary, through the steps S11 to S15, training for each initialized first face image recognition model may be implemented, so as to obtain a plurality of first face recognition models.
In some embodiments, the parameter values of the plurality of initialized first face recognition models are initial values, and the initial values of the plurality of initialized first face recognition models are different. In this way, a plurality of first face recognition models with different parameters can be obtained, and therefore diversity of the first face recognition models is enriched.
In some embodiments, in the step S14, the first current loss of the current first face recognition model may be obtained according to the first current image feature, the current class center vector, the label information of the current face image sample, and hyperparameters, where different first face recognition models use different hyperparameters. Because the hyperparameters of the at least two first face recognition models are different, the embodiment of the application can ensure the diversity of the constraints imposed on the face images in the feature space and encourage different first face recognition models to learn different information from the same face image.
Specifically, description is given below with reference to the example of fig. 4.
301, a training data sample is acquired.
The training data sample is a face image sample. As one implementation, the training data preparation module may read a plurality of face image samples from the training sample set, and combine the read plurality of face image samples into one data set (batch) for input into the first face recognition model.
302, the deep neural network extracts image features.
Specifically, the first face recognition model may include the deep neural network, the deep neural network may extract spatial features of the face image sample, and the output image features may include spatial structure information of the face picture.
By way of example, the deep neural network may include a convolution layer, an activation function layer (such as the nonlinear activation function ReLU), a pooling layer, and the like. The convolution layer is used for extracting image features of the face image sample; the activation function and the pooling layer are used for processing the image features to obtain a feature map.
Optionally, prior to step 302, the deep neural network may also be randomly initialized. As an implementation, the first face recognition model may be randomly initialized using a random seed control module. The random seeds are control parameters for model parameter initialization, and if the random seeds are inconsistent, the obtained model parameters are inconsistent in initialization.
In some embodiments, different random seeds may be configured for the plurality of first face recognition models, respectively, so that the plurality of first face recognition models have different model parameter initializations, thereby enriching the diversity of the first face recognition models.
303, the full connection layer maps the image features.
In particular, the first face recognition model may include the fully connected layer, which may map the image features extracted by the deep neural network to a 1×n_d-dimensional vector μ.
In some embodiments, the fully connected layer is configured to map the feature map obtained by the pooling layer into hyperspherical spatial feature vectors in hyperspherical space.
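The following is a compact sketch, under illustrative assumptions about layer sizes, of a first face recognition model as described in steps 302-303: a convolution/ReLU/pooling backbone followed by a fully connected layer that maps the pooled features to a 1×n_d hyperspherical feature vector; the per-model random seeds correspond to the random seed control mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherFaceNet(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Backbone: convolution + activation + pooling layers (step 302).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Fully connected layer mapping the pooled features to a 1 x n_d vector (step 303).
        self.fc = nn.Linear(128, embed_dim)

    def forward(self, x):
        feat = self.backbone(x).flatten(1)
        return F.normalize(self.fc(feat), dim=-1)   # hyperspherical feature vector mu

# Different random seeds give the teacher models different parameter initializations.
teachers = []
for seed in (0, 1, 2):
    torch.manual_seed(seed)
    teachers.append(TeacherFaceNet())
```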
304, obtaining a class center vector.
Specifically, a class center vector w_{x∈c} corresponding to each class of face image samples can be obtained, where w_{x∈c} represents the class center of the class-c samples. Each category may correspond to an identity (ID) category. For example, the center vector of a certain class can be obtained by averaging all the image features in that class, or the classification weight of each class obtained by the deep neural network can be used as the class center of that class.
305, a loss function of face recognition is calculated.
Specifically, the loss function may be determined according to the image features (such as the feature vector output by the full connection layer) of the current face image sample, the class center vector, and the label information of the face image sample. This loss function is one example of the first current loss described above. Alternatively, the loss function may include, but is not limited to, a classification function, such as softmax, or various types of softmax of the additive margin type, and other types of loss functions may be used, which are not limited by the embodiments of the present application.
Illustratively, the loss function may be as shown in equation (1) below:
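The equation image is not reproduced in this text. Purely as a plausible reconstruction (the exact expression in the patent may differ), a combined-margin softmax loss consistent with the symbols defined below is:

$$
L_{\text{margin-loss}} = -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{\,s\cos\left(m_1\theta_{y_i}+m_2\right)}}
{e^{\,s\cos\left(m_1\theta_{y_i}+m_2\right)} + \sum_{j=1,\, j\neq y_i}^{k} e^{\,s\cos\theta_j}}
\qquad (1)
$$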
where L_margin-loss is the face recognition loss function, s, m_1 and m_2 are hyperparameters, θ is the angle between the feature vector and the class center vector, N is the number of face image samples in each iteration, k is the number of identity categories, and y_i is the label information corresponding to the i-th face image sample.
Optionally, 306, hyperparameter control.
Specifically, in step 306, the hyperparameters of the at least two first face recognition models (e.g., s, m_1, m_2 described above) are different. For example, the hyperparameters of the objective function of each first face recognition model can be controlled by a loss hyperparameter control module.
For example, in terms of equation (1), the face image samples are gathered in the space around their class center vectors, and the distance s from the origin reflects the quality of the picture. Therefore, across the plurality of first face recognition models, adjusting hyperparameters such as s, m_1 and m_2 helps ensure the diversity of the constraints imposed on the images in the feature space, encourages different first face recognition models to learn different information from the same face image, and makes the image features extracted by the plurality of first face recognition models complementary. Alternatively, different first face recognition models may be configured with different hyperparameters such as s, m_1 and m_2.
307, is a termination training condition satisfied?
For example, the training termination condition may include the number of iterations reaching a set value and/or the loss calculated from the objective function being less than or equal to a set value.
308, obtaining a first face recognition model.
When the training termination condition is satisfied, the first face recognition model obtained at that point may be output as the final first face recognition model.
309, model parameter optimization.
When the termination condition is not satisfied, the parameters of the entire deep neural network and the fully connected layer can be optimized by gradient descent, such as stochastic gradient descent (SGD) or SGD with a momentum term. During training, the above steps 301 to 309 may be repeated until the training result satisfies the above training stop condition.
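A brief, self-contained sketch of the parameter optimization in step 309, assuming PyTorch's SGD with a momentum term; the toy network, the cross-entropy stand-in for the margin loss of equation (1), and the stop threshold are placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))   # stand-in for backbone + FC layer
loss_fn = nn.CrossEntropyLoss()                                      # stand-in for the margin loss (1)
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)        # SGD with a momentum term

for step in range(200):                      # iterate until a termination condition is met
    x = torch.randn(32, 16)                  # dummy batch standing in for face image features
    y = torch.randint(0, 4, (32,))
    loss = loss_fn(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if loss.item() <= 0.05:                  # example stop condition on the loss value
        break
```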
Therefore, through the training process, the embodiment of the application can obtain a plurality of first face recognition models with different parameters, and the plurality of first face recognition models can learn different information of the same face image to obtain a plurality of image features with complementarity.
The training process of the model confidence assessment modules in step 220 is described in detail below.
It should be noted that the training of the multiple model confidence evaluation modules requires the first face recognition models obtained by training in step 210. In this training process, the parameters of the first face recognition models are not updated.
In some embodiments, referring to fig. 5, steps S21-S26 may be repeated, and the plurality of initialized model confidence assessment modules may be respectively trained using training data to obtain a plurality of trained model confidence assessment modules.
S21, acquiring a training sample set, wherein the training sample set comprises a plurality of face image samples.
Alternatively, the training sample set herein may be the same as the training sample set in S11.
S22, inputting the current face image sample in the training sample set into each first face recognition model to obtain a second current image feature corresponding to the current face image sample.
The at least two first face recognition models are obtained by training at least two initialized first face recognition models by using face image samples. Specific training procedures may be found in the description of fig. 3 or fig. 4 above. The second current image feature is similar to the first current image feature and reference is made to the relevant description above.
S23, obtaining a current class center vector corresponding to the current face image sample. The current class center vector is used for indicating the identity class to which the current face image sample belongs.
In particular, class center vectors may be referred to the relevant description above.
S24, inputting the second current image feature into a current model confidence estimation module to obtain the current confidence of the second current image feature; the current confidence level is used for evaluating the confidence level of the face recognition of the current face image sample through the first face recognition model.
Specifically, each first face recognition model may correspond to a model confidence estimating module, which is configured to estimate a confidence level of an image feature output by the corresponding first face recognition model, where the confidence level is capable of evaluating a confidence level of a face image sample for face recognition by the first face recognition model.
In this step, the second current image feature output by each first face recognition model may be input to the current model confidence estimation module corresponding to the first face recognition model, so as to obtain the confidence coefficient of the second current image feature.
And S25, obtaining a second current loss of the current model confidence estimation module according to the second current image characteristics, the current class center vector and the current confidence.
Specifically, the second current image feature, the current class center vector and the current confidence coefficient may be input into an objective function of the current model confidence coefficient estimation module, to obtain the second current loss.
S26, according to the second current loss, current parameters of the current model confidence estimation module are adjusted.
For example, the training stopping condition is determined to be satisfied if the number of iterations of the current model confidence estimation module is greater than or equal to a preset number and/or the second current loss is less than or equal to a preset threshold. Then, for each first face recognition model, the current model confidence estimation module determined when the training stopping condition is satisfied can be output as the model confidence estimation module corresponding to that first face recognition model.
In summary, through the steps S21 to S26, training of the model confidence estimation module corresponding to each first face image recognition model may be respectively implemented, so as to obtain the model confidence estimation module corresponding to each first face image recognition model.
Specifically, the description is given with reference to the example of fig. 6.
401, a training data sample is acquired.
402, the deep neural network extracts image features.
403, the full connection layer maps the image features.
In particular, steps 401 to 403 are similar to steps 301 to 303, and reference may be made to the description hereinabove. Unlike steps 301 to 303, in steps 401 to 403, parameters of the deep neural network and the full connection layer are not updated.
404, the model confidence estimation module performs confidence estimation.
Specifically, each model confidence estimation module may determine the confidence level k of its corresponding deep neural network model for the extracted features of the input image. For example, when the extracted features include hyperspherical spatial feature vectors, the model confidence estimation module may estimate the confidence k of the hyperspherical spatial feature vector in hyperspherical space.
By way of example, the model confidence estimation module may include several stacked fully connected layers, a neural network in the form of a residual network (ResNet), and the like, which is not limited in the present application.
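As a purely illustrative example of such a module (the layer widths, the use of two fully connected layers and the Softplus output that keeps the confidence positive are assumptions of this description, not limitations of the application), a confidence estimation head could be sketched in Python as follows:

import torch
from torch import nn

class ConfidenceHead(nn.Module):
    """Maps an image feature vector to a positive scalar confidence k."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),
            nn.Softplus(),          # keeps the estimated confidence strictly positive
        )

    def forward(self, feat):
        # feat: (batch, feat_dim) hypersphere feature vectors; returns (batch, 1) confidences
        return self.net(feat)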
405, obtaining a class center vector.
Specifically, step 405 may refer to the description of step 304 in fig. 3, which is not repeated.
406, a loss function of model confidence is calculated.
Specifically, the loss function may be determined according to the image features (such as feature vectors output by the full connection layer), the class center vector, and the confidence level of the current face image sample. This loss function is one example of the second current loss described above.
Illustratively, the loss function may be as shown in equation (2) below:
where L_s is the loss function of the model confidence, k is the current confidence, r is the currently configured hypersphere space radius, x is the current face image sample, μ(x) is the hypersphere space feature vector of the current face image sample x, w_{x∈c} is the class center vector of the class c to which the current sample belongs, d is the dimension of the hypersphere space feature vector, and μ(x)^T denotes the transpose of μ(x). I is a modified Bessel function, whose expression is shown in equation (3):
where m is the summation index running from zero to infinity, m! denotes the factorial of m, and Γ(·) denotes the gamma function. In the embodiment of the application, the alpha value is
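Because equation (2) is reproduced only as an image in the published text, the following Python sketch is merely an assumed von Mises-Fisher style negative log-likelihood that is consistent with the variables listed above; in particular, the choice α = d/2 − 1, the unit radius r = 1 and the use of scipy's exponentially scaled Bessel function are assumptions of this description, not the application's equation (2) verbatim.

import numpy as np
from scipy.special import ive   # exponentially scaled modified Bessel function of the first kind

def vmf_confidence_loss(mu, center, k, d):
    """Negative log-likelihood of a von Mises-Fisher distribution with mean direction
    `center` and concentration `k`, evaluated at the normalized feature `mu`."""
    mu = mu / np.linalg.norm(mu)
    center = center / np.linalg.norm(center)
    alpha = d / 2.0 - 1.0                          # assumed order of the Bessel function
    log_bessel = np.log(ive(alpha, k)) + k          # log I_alpha(k), numerically stable for k > 0
    log_norm = alpha * np.log(k) - (d / 2.0) * np.log(2.0 * np.pi) - log_bessel
    return -(log_norm + k * center.dot(mu))

# toy usage with a small feature dimension to keep the Bessel value in range
d = 8
feat = np.random.randn(d)
center = np.random.randn(d)
print(vmf_confidence_loss(feat, center, k=10.0, d=d))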
407, is the training termination condition satisfied?
For example, the training termination condition may include the number of iterations reaching a set value and/or the loss calculated from the objective function being less than or equal to a set value.
408, a model confidence estimation module is obtained.
When the training termination condition is satisfied, the model confidence estimation module determined at that point may be output as the final model confidence estimation module.
409, model parameter optimization.
When the training termination condition is not satisfied, the parameters of the whole model confidence estimation module can be optimized based on gradient descent, such as stochastic gradient descent, stochastic gradient descent with a momentum term, and the like.
As an example, gradient optimization can be performed for k and μ according to the following equations (4) and (5):
where, in equations (4) and (5), ∇μ L(μ, k) denotes the gradient of the loss function L(μ, k) with respect to μ, ∇k L(μ, k) denotes the gradient of the loss function L(μ, k) with respect to k, k denotes the confidence, μ denotes the image feature vector, k(x) denotes the confidence of the current face image sample x, and w_{x∈c}^T denotes the transpose of w_{x∈c}. In addition, the meanings of w_{x∈c}, r, d, μ(x), μ(x)^T, I, and the like can be found in the relevant description of equation (2) above.
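As a hedged illustration only (the loss below is a placeholder alignment term, not the application's L(μ, k)), gradients with respect to μ and k in the spirit of equations (4) and (5) can be obtained by automatic differentiation and applied as plain gradient descent steps:

import torch

# mu and k both require gradients so that dL/dmu and dL/dk can be computed.
mu = torch.randn(8, requires_grad=True)
k = torch.tensor(10.0, requires_grad=True)
center = torch.randn(8)

loss = -(k * torch.dot(center / center.norm(), mu / mu.norm()))  # placeholder loss term
grad_mu, grad_k = torch.autograd.grad(loss, (mu, k))

lr = 0.01
with torch.no_grad():
    mu -= lr * grad_mu   # gradient descent step on mu, cf. equation (4)
    k -= lr * grad_k     # gradient descent step on k, cf. equation (5)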
During the training process, the above steps 401 to 409 may be repeated until the training result satisfies the above training stop condition.
Therefore, through the training process, the embodiment of the application can obtain the model confidence estimation modules respectively corresponding to the plurality of first face recognition models, so that the confidence estimation can be carried out on the image characteristics output by each first face recognition model.
The training process of the second face recognition model in step 230 is described in detail below.
It should be noted that the training process of the second face recognition model requires the plurality of first face recognition models obtained through training in steps 210 and 220 and the model confidence estimation module corresponding to each first face recognition model. In this training process, the parameters of the first face recognition models and the model confidence estimation modules are not updated.
In some embodiments, referring to fig. 7, steps S31 to S36 may be repeatedly performed: the initialized second face recognition model is trained using the training data, and the second face recognition model obtained when the training stopping condition is satisfied is output as the trained face recognition model.
S31, acquiring a training sample set, wherein the training sample set comprises a plurality of face image samples.
Alternatively, the training sample set may be the same as the training sample set in S11 or S12, which is not limited in the present application.
S32, respectively inputting the training sample set into at least two first face recognition models to obtain first image features output by each first face recognition model.
In this way, the at least two first face recognition models can learn different information of the same face image to obtain a plurality of first image features with complementarity.
Specifically, the current face image sample in the training sample set may be input into each first face recognition model, so as to obtain the current image feature corresponding to the current face image sample output by each first face recognition model. The current image feature is one example of the first image feature described above.
The at least two first face recognition models are obtained by training at least two initialized first face recognition models by using face image samples. Specific training procedures may be found in the description of fig. 3 or fig. 4 above. The first image feature is similar to the first current image feature above, and reference may be made to the relevant description above.
S33, inputting the first image features output by each first face recognition model into a model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model.
Specifically, the current image feature corresponding to the current face image sample output by each first face recognition model may be input into a model confidence estimation module corresponding to each first face recognition model, so as to obtain the confidence of the current image feature output by each first face recognition model.
The model confidence estimation module corresponding to each first face recognition model is obtained by training the initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample. Specific training procedures and confidence levels may be found in the relevant descriptions above in fig. 5 or fig. 6.
S34, fusing the first image features output by at least two first face recognition models according to the confidence coefficient of the first image features output by each first face recognition model to obtain fusion features.
Specifically, the corresponding current image features can be fused according to the confidence coefficient of the current image features output by each first face recognition model, and the current fusion features corresponding to the current face image samples can be obtained.
Because the first image features output by the first face recognition models have complementarity, the first image features are fused according to the confidence coefficient corresponding to each first image feature, and the obtained fusion features can more accurately express the real distribution of the image features.
And S35, performing knowledge distillation on the second face recognition model according to the fusion characteristics to obtain a trained face recognition model. The face recognition model can be used for recognizing the face of the face image to be recognized.
Illustratively, the network structure of the second face recognition model is similar to that of the first face recognition model, and may be a convolutional neural network including, but not limited to, convolutional layers, nonlinear activation layers, pooling layers, and fully connected layers, which is not limited in the present application.
Optionally, the second face recognition model has fewer parameters than the first face recognition model. For example, the first face recognition model may include a large deep neural network and the second face recognition model may include a small deep neural network whose number of parameters is far smaller than that of the large deep neural network, so that forward inference of the second face recognition model takes less time.
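Purely for illustration (the channel counts, feature dimension and input size below are assumptions of this description), a small second face recognition backbone built from convolutional, nonlinear activation, pooling and fully connected layers could look as follows:

import torch
from torch import nn

class SmallFaceNet(nn.Module):
    """A deliberately small backbone: conv -> nonlinear activation -> pooling -> fully connected."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        # x: (batch, 3, H, W) face images; returns (batch, feat_dim) embeddings
        return self.fc(self.features(x).flatten(1))

print(SmallFaceNet()(torch.randn(2, 3, 112, 112)).shape)  # torch.Size([2, 512])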
Specifically, knowledge distillation is performed on the second face recognition model according to the fusion features, so that the features extracted by the second face recognition model gradually converge to be consistent with the distribution of the fusion features; the face recognition model obtained through training can therefore obtain a more accurate feature distribution, improving face recognition accuracy.
Therefore, the embodiment of the application fuses the features of at least two first face recognition models according to the confidence of the features output by each first face recognition model, and performs knowledge distillation on the second face recognition model according to the fused features to obtain the trained face recognition model, so that the features extracted by the trained face recognition model gradually converge to be consistent with the distribution of the fused features. Because the fused features can more accurately express the real distribution of the features, the face recognition model obtained by knowledge distillation in the embodiment of the application can extract more accurate face image features.
In addition, as the fusion characteristics are obtained by adopting the characteristics output by a plurality of first face recognition models and the confidence degrees thereof, the second face recognition model obtained by carrying out knowledge distillation based on the fusion characteristics has stronger robustness.
In some embodiments, as shown in fig. 8, step S35 may be specifically implemented as the following steps S351 to S353:
S351, inputting the first training sample set into the second face recognition model to obtain second image features.
Specifically, the current face image sample in the training sample set may be input into the second face recognition model, so as to obtain the current image feature corresponding to the current face image sample output by the second face recognition model. The current image feature is one example of a second image feature.
And S352, determining knowledge distillation loss according to the similarity of the fusion characteristic and the second image characteristic.
For example, the current knowledge distillation loss may be determined according to the similarity between the current fusion feature corresponding to the current face image sample and the current image feature of the current face image sample output by the current second face recognition model. That is, the fusion features are used to constrain the features output by the second face recognition model to stay close to those of the first face recognition models.
And S353, adjusting the parameters of the second face recognition model according to the knowledge distillation loss until the training stopping condition is met, and outputting the second face recognition model determined by meeting the training stopping condition as the face recognition model.
For example, the current parameters of the current second face recognition model may be adjusted according to the current knowledge distillation loss until the training stopping condition is satisfied.
In some embodiments, the training stopping condition is determined to be met if the number of iterations of the second face recognition model is greater than or equal to a preset threshold and/or the knowledge distillation loss is less than or equal to a preset threshold. Then, the current second face recognition model determined to satisfy the training stopping condition may be output as a trained face recognition model.
In some embodiments, the soft label information of the plurality of first face recognition models can be fused according to the confidence of each model, and the probability output by the second face recognition model can be constrained according to the fused soft label information to determine the knowledge distillation loss. Then, the parameters of the second face recognition model are updated according to the knowledge distillation loss to obtain the trained face recognition model.
Alternatively, the knowledge distillation loss may be a combination of a knowledge distillation loss obtained by constraining the probability output by the model with the fused soft label information and a knowledge distillation loss obtained by constraining the features output by the model with the fused features, which is not limited in the present application.
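As an assumed sketch of such a combination (the temperature, the weighting factor alpha and the use of mean squared error and KL divergence are choices of this description, not specified by the application), the combined knowledge distillation loss could be written as:

import torch
import torch.nn.functional as F

def combined_distillation_loss(student_feat, fused_feat,
                               student_logits, fused_soft_labels,
                               temperature=4.0, alpha=0.5):
    # Feature-level term: pull the student feature toward the confidence-fused teacher feature.
    feat_loss = F.mse_loss(student_feat, fused_feat)
    # Soft-label term: KL divergence between student probabilities and fused teacher probabilities.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_loss = F.kl_div(log_p_student, fused_soft_labels, reduction="batchmean") * temperature ** 2
    return alpha * feat_loss + (1.0 - alpha) * soft_loss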
Specifically, the description is given with reference to the example of fig. 9.
501, obtaining training data samples.
Specifically, step 501 may refer to the description of step 301 in fig. 3, and will not be repeated.
502, the n large deep neural network models extract features.
Specifically, the n large deep neural network models may be a plurality of deep neural network models obtained according to the training method shown in fig. 3 or fig. 4, which is not limited in the present application. The large deep neural network model may be one example of the first face recognition model described above. Each large-scale deep neural network model can learn different information of the same face image to obtain n image features with complementarity, such as feature 1, feature 2, …, feature n and the like.
503, n model confidence estimation modules perform confidence estimation.
Specifically, the n model confidence estimation modules correspond one-to-one to the n large deep neural network models, and each model confidence estimation module is used for estimating the confidence of the image features output by its corresponding large deep neural network model. For example, model confidence estimation module 1 obtains confidence 1 of the features extracted by large deep neural network 1, model confidence estimation module 2 obtains confidence 2 of the features extracted by large deep neural network 2, and so on.
Specifically, the n model confidence estimation modules may be obtained according to a training method as shown in fig. 5 or fig. 6, which is not limited in the present application.
504, fusing the features.
Specifically, the features extracted by the n large deep neural networks may be fused according to the confidences of these features to obtain the fused feature (i.e., the fusion feature described above).
Illustratively, the fusion characteristics may be derived according to the following equation (6):
where the left-hand side of equation (6) is the fusion feature, μ_m is the normalized feature extracted by the m-th model, and k_m is the confidence of the normalized feature μ_m extracted by the m-th model.
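Since equation (6) is likewise given only as an image, the following sketch assumes a confidence-weighted average of the normalized teacher features followed by re-normalization; this interpretation is an assumption of this description, consistent with the variables μ_m and k_m described above.

import torch
import torch.nn.functional as F

def fuse_features(features, confidences):
    """features: list of n (batch, d) tensors from the n teacher models;
    confidences: list of n (batch, 1) tensors from the confidence modules."""
    mus = [F.normalize(f, dim=1) for f in features]           # normalized features mu_m
    ks = torch.stack(confidences, dim=0)                       # (n, batch, 1)
    weights = ks / ks.sum(dim=0, keepdim=True)                 # normalize confidences to weights
    fused = (weights * torch.stack(mus, dim=0)).sum(dim=0)     # confidence-weighted sum
    return F.normalize(fused, dim=1)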
505, the small neural network model extracts features.
The small neural network model may, for example, have the same structure as the large deep neural network model but far fewer parameters. The small deep neural network model may be one example of the second face recognition model described above.
506, a knowledge distillation loss function is calculated.
For example, the knowledge distillation loss function may be determined based on the similarity of the fused features and features extracted by the small neural network model, such as cosine similarity. Specifically, the method can be shown in the following formula (7):
L_f = ‖F_X − F_Y‖²    (7)
where L_f is the knowledge distillation loss function, F_X is the feature extracted by the small neural network model, and F_Y is the fusion feature.
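A direct rendering of equation (7) in Python may look as follows; averaging over the batch is an assumption of this description, since equation (7) is written per sample.

import torch

def feature_distillation_loss(f_student, f_fused):
    # L_f = || F_X - F_Y ||^2, averaged over the batch
    return ((f_student - f_fused) ** 2).sum(dim=1).mean()

# toy usage
f_x = torch.randn(4, 512)   # features extracted by the small neural network model
f_y = torch.randn(4, 512)   # fusion features
print(feature_distillation_loss(f_x, f_y))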
507, whether the training termination condition is satisfied.
For example, the training termination condition may include the number of iterations reaching a set value and/or the knowledge distillation loss being less than or equal to a set value.
508, acquiring a face recognition model.
When the training termination condition is satisfied, the small neural network model determined at that point may be output as the final face recognition model.
509, model parameter optimization.
When the training termination condition is not satisfied, the parameters of the small neural network model can be optimized based on gradient descent, such as stochastic gradient descent, stochastic gradient descent with a momentum term, and the like. During the training process, the above steps 501 to 509 may be repeated until the training result satisfies the above training termination condition.
The above step 240, i.e., the module deployment phase, is described in detail below. The module deployment stage mainly deploys the relevant modules obtained in the training stage to form a complete face recognition solution. Fig. 10 shows a schematic diagram of a face recognition system deployed according to an embodiment of the present application, including an image acquisition module, a face recognition model, and a feature comparison search module. The image acquisition module acquires a face image to be recognized and inputs it to the face recognition model. The face recognition model extracts and outputs image features, and the features are input into the feature comparison search module for recognition.
Referring to fig. 11, a schematic flowchart of a method for face recognition according to an embodiment of the present application is shown. As shown in fig. 11, the method includes steps S41 to S45.
S41, acquiring a first face image to be recognized.
For example, an image acquisition module may be deployed, and a first face image of an object requiring face recognition located in a designated area in each scene requiring face recognition may be acquired by the image acquisition module. As an example, the image acquisition module may include a camera.
S42, inputting the first face image into a face recognition model to obtain a first image feature corresponding to the first face image. The face recognition model is obtained according to the model training method shown above.
S43, determining a second image feature matched with the first image feature in the at least one second image feature. The at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model.
For example, a feature comparison search module may be deployed and a second image feature that matches the first image feature may be determined from among at least one second image feature by the feature comparison search module.
For example, the pre-constructed face image data feature library may be traversed to obtain the at least one second image feature. The image features in the face image data feature library are obtained by inputting at least one reference face image into the face recognition model and extracting features of the at least one reference face image by the face recognition model. Optionally, the face image data feature library further includes object identity information corresponding to each second image feature.
In some embodiments, a similarity of the first image feature to at least one second image feature, respectively, may be determined. Then, in the case that the similarity is greater than or equal to a preset threshold, determining that the second image feature corresponding to the similarity is a second image feature matched with the first image feature. In addition, under the condition that the similarity is smaller than a preset threshold value, the second image feature corresponding to the similarity is determined to be not matched with the first image feature.
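As an illustrative deployment-side sketch (cosine similarity and the 0.4 threshold are assumptions of this description; the application only specifies a similarity and a preset threshold), the matching of S43 and the threshold decision above could be implemented as:

import numpy as np

def match_face(query_feat, gallery_feats, gallery_ids, threshold=0.4):
    """Compare one first-image feature against the second-image features in the library.
    Returns the best-matching identity, or None if no similarity reaches the threshold."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity with every reference feature
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return gallery_ids[best], float(sims[best])
    return None, float(sims[best])

# toy usage: a gallery of 3 reference features of dimension 512
gallery = np.random.randn(3, 512)
ids = ["id_0", "id_1", "id_2"]
print(match_face(gallery[1] + 0.01 * np.random.randn(512), gallery, ids))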
S44, determining that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image feature.
In addition, the embodiment of the application can also determine that the identity of the first object indicated by the first face image is inconsistent with the identity of the object corresponding to the unmatched second image characteristic.
It should be noted that, when the at least one second image feature is a single second image feature, the face recognition system performs 1:1 face verification; when the at least one second image feature is a plurality of second image features, the face recognition system performs 1:N face recognition.
In the embodiment of the application, the face recognition model is obtained by performing knowledge distillation training on the fusion features of a plurality of large face recognition models with stronger expressive power, so that the distribution of the features it extracts is highly similar to the distribution of the fusion features of the plurality of large face recognition models. Because the fusion features can more accurately express the real distribution of the features, the face recognition model of the embodiment of the application can extract more accurate face image features, and the face recognition system deployed according to the face recognition model can have higher face recognition accuracy and can be applied to face recognition in various complex application scenes.
The specific embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be regarded as the disclosure of the present application.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present application. It is to be understood that the numbers may be interchanged where appropriate such that the described embodiments of the application may be practiced otherwise than as shown or described.
The method embodiments of the present application are described above in detail, and the apparatus embodiments of the present application are described below in detail with reference to fig. 12 to 14.
Fig. 12 is a schematic block diagram of a training apparatus 600 of a face recognition model according to an embodiment of the present application. As shown in fig. 12, the model training apparatus 600 may include an acquisition unit 610, at least two first face recognition models 620, at least two model confidence estimation modules 630, a fusion unit 640, and a training unit 650.
An obtaining unit 610, configured to obtain a first training sample set, where the first training sample set includes a plurality of face image samples;
at least two first face recognition models 620, configured to input the first training sample sets respectively, and obtain first image features output by each of the first face recognition models; the at least two first face recognition models are obtained by training at least two initialized first face recognition models by using the face image samples respectively;
At least two model confidence estimation modules 630, configured to input the first image feature output by each of the first face recognition models into a model confidence estimation module corresponding to each of the first face recognition models, to obtain a confidence level of the first image feature output by each of the first face recognition models; the model confidence estimation module corresponding to each first face recognition model is obtained by training an initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample;
a fusion unit 640, configured to fuse the first image features output by the at least two first face recognition models according to the confidence level of the first image feature output by each first face recognition model, so as to obtain a fused feature;
and the training unit 650 is configured to perform knowledge distillation on the second face recognition model according to the fusion feature, so as to obtain the trained face recognition model, where the face recognition model is used for performing face recognition on the face image to be recognized.
In some embodiments, training unit 650 is specifically configured to:
Inputting the first training sample set into a second face recognition model to obtain a second image feature;
determining knowledge distillation loss according to the similarity of the fusion feature and the second image feature;
and adjusting parameters of the second face recognition model according to the knowledge distillation loss until a first training stopping condition is met, and outputting the second face recognition model determined by meeting the first training stopping condition as the face recognition model.
In some embodiments, training unit 650 is further to:
repeating the following steps until each first face recognition model meets a second training stopping condition:
inputting a current face image sample into a current first face recognition model to obtain a first current image feature corresponding to the current face image sample;
acquiring a current class center vector corresponding to the current face image sample; the current class center vector is used for indicating the class to which the current face image sample belongs;
obtaining a first current loss of the current first face recognition model according to the first current image characteristics, the current class center vector and the label information of the current face image sample;
According to the first current loss, current parameters of the current first face recognition model are adjusted;
and outputting the current first face recognition model determined by meeting the second training stopping condition as the first face recognition model.
In some embodiments, training unit 650 is specifically configured to:
and obtaining a first current loss of the current first face recognition model according to the first current image feature, the current class center vector, the label information of the current face image sample and hyperparameters, wherein the hyperparameters corresponding to the at least two first face recognition models are different.
In some embodiments, the initial values corresponding to the at least two initialized first face image recognition models are different.
In some embodiments, training unit 650 is specifically configured to:
repeating the following steps until the model confidence estimation module corresponding to each first face recognition model meets a third training stopping condition:
inputting a current face image sample into each first face recognition model to obtain a second current image feature corresponding to the current face image sample;
acquiring a current class center vector corresponding to the current face image sample; the current class center vector is used for indicating the class to which the current face image sample belongs;
Inputting the second current image features into a current model confidence estimation module to obtain the current confidence of the second current image features; the current confidence coefficient is used for evaluating the confidence coefficient of the face recognition of the current face image sample through each first face recognition model;
obtaining a second current loss of the current model confidence estimation module according to the second current image characteristics, the current class center vector and the current confidence;
according to the second current loss, current parameters of the current model confidence estimation module are adjusted;
and outputting the current model confidence coefficient estimation module determined by meeting the third training stopping condition as a trained model confidence coefficient estimation module corresponding to each first face recognition model.
In some embodiments, the first face recognition model includes a convolutional layer, a nonlinear activation layer, a pooling layer, and a fully-connected layer; the second face recognition model comprises a convolution layer, a nonlinear activation layer, a pooling layer and a full connection layer.
In some embodiments, the second face recognition model has fewer parameters than the first face recognition model.
In some embodiments, the model confidence estimation module comprises a fully connected network or a residual network.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 600 shown in fig. 12 may perform the above method embodiments, and the foregoing and other operations and/or functions of each module in the apparatus 600 are respectively for implementing the corresponding flow in the training method of the face recognition model, which is not described herein for brevity.
Fig. 13 is a schematic block diagram of an apparatus 700 for face recognition according to an embodiment of the present application. As shown in fig. 13, the apparatus 700 may include an acquisition unit 710, a face recognition model 720, a matching unit 730, and a determination unit 740.
An acquiring unit 710, configured to acquire a first face image to be identified;
the face recognition model 720 is used for inputting the first face image to obtain a first image feature corresponding to the first face image, wherein the face recognition model is obtained according to the face recognition model training method provided by the embodiment of the application;
a matching unit 730 for determining a second image feature matching the first image feature among at least one second image feature; the at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model;
And the determining unit 740 is configured to determine that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image feature.
In some embodiments, the matching unit 730 is specifically configured to:
determining the similarity of the first image feature and the at least one second image feature respectively;
and under the condition that the similarity is larger than or equal to a preset threshold value, determining the second image feature corresponding to the similarity as the second image feature matched with the first image feature.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the apparatus 700 shown in fig. 13 may perform the above method embodiments, and the foregoing and other operations and/or functions of each module in the apparatus 700 are respectively for implementing the corresponding flow in the above face recognition method, which is not described herein for brevity.
The apparatus of the embodiments of the present application is described above in terms of functional modules with reference to the accompanying drawings. It should be understood that the functional module may be implemented in hardware, or may be implemented by instructions in software, or may be implemented by a combination of hardware and software modules. Specifically, each step of the method embodiment in the embodiment of the present application may be implemented by an integrated logic circuit of hardware in a processor and/or an instruction in a software form, and the steps of the method disclosed in connection with the embodiment of the present application may be directly implemented as a hardware decoding processor or implemented by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a well-established storage medium in the art such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and the like. The storage medium is located in a memory, and the processor reads information in the memory, and in combination with hardware, performs the steps in the above method embodiments.
Fig. 14 is a schematic block diagram of an electronic device 800 provided by an embodiment of the present application.
As shown in fig. 14, the electronic device 800 may include:
a memory 810 and a processor 820, the memory 810 being for storing a computer program and transmitting the program code to the processor 820. In other words, the processor 820 may call and run a computer program from the memory 810 to implement the methods in embodiments of the present application.
For example, the processor 820 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the application, the processor 820 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the application, the memory 810 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the application, the computer program may be partitioned into one or more modules that are stored in the memory 810 and executed by the processor 820 to perform the methods provided by the application. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device.
As shown in fig. 14, the electronic device 800 may further include:
a transceiver 830, the transceiver 830 being connectable to the processor 820 or the memory 810.
Processor 820 may control transceiver 830 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. Transceiver 830 may include a transmitter and a receiver. Transceiver 830 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It will be appreciated that in the specific implementation of the present application, when the above embodiments of the present application are applied to specific products or technologies and relate to data related to user information and the like, user permissions or consents need to be obtained, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in various embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily appreciate variations or alternatives within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (16)
1. A method for training a face recognition model, comprising:
acquiring a first training sample set, wherein the first training sample set comprises a plurality of face image samples;
Respectively inputting the first training sample set into at least two first face recognition models to obtain first image features output by each first face recognition model; the at least two first face recognition models are obtained by training at least two initialized first face recognition models by using the face image samples respectively;
inputting the first image features output by each first face recognition model into a model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model; the model confidence estimation module corresponding to each first face recognition model is obtained by training an initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample;
fusing the first image features output by the at least two first face recognition models according to the confidence level of the first image features output by each first face recognition model to obtain fused features;
and carrying out knowledge distillation on the second face recognition model according to the fusion characteristics to obtain the trained face recognition model, wherein the face recognition model is used for carrying out face recognition on the face image to be recognized.
2. The method according to claim 1, wherein the performing knowledge distillation on the second face recognition model according to the fusion feature to obtain the trained face recognition model comprises:
inputting the first training sample set into a second face recognition model to obtain a second image feature;
determining knowledge distillation loss according to the similarity of the fusion feature and the second image feature;
and adjusting parameters of the second face recognition model according to the knowledge distillation loss until a first training stopping condition is met, and outputting the second face recognition model determined by meeting the first training stopping condition as the face recognition model.
3. The method as recited in claim 1, further comprising:
repeating the following steps until each first face recognition model meets a second training stopping condition:
acquiring a second training sample set, wherein the second training sample set comprises a plurality of face image samples;
inputting a current face image sample in the second training sample set into a current first face recognition model to obtain a first current image feature corresponding to the current face image sample;
Acquiring a current class center vector corresponding to the current face image sample; the current class center vector is used for indicating the class to which the current face image sample belongs;
obtaining a first current loss of the current first face recognition model according to the first current image characteristics, the current class center vector and the label information of the current face image sample;
according to the first current loss, current parameters of the current first face recognition model are adjusted;
and outputting the current first face recognition model determined by meeting the second training stopping condition as the first face recognition model.
4. A method according to claim 3, wherein said deriving a first current loss of the current first face recognition model from the first current image feature, the current class center vector, and tag information of the current face image sample comprises:
and obtaining a first current loss of the current first face recognition model according to the first current image feature, the current class center vector, the label information of the current face image sample and hyperparameters, wherein the hyperparameters corresponding to the at least two first face recognition models are different.
5. The method of claim 1, wherein initial values corresponding to the at least two initialized first face recognition models are different.
6. The method as recited in claim 1, further comprising:
repeating the following steps until the model confidence estimation module corresponding to each first face recognition model meets a third training stopping condition:
acquiring a third training sample set, wherein the third training sample set comprises a plurality of face image samples;
inputting the current face image sample in the third training sample set into each first face recognition model to obtain a second current image feature corresponding to the current face image sample;
acquiring a current class center vector corresponding to the current face image sample; the current class center vector is used for indicating the class to which the current face image sample belongs;
inputting the second current image features into a current model confidence estimation module to obtain the current confidence of the second current image features; the current confidence coefficient is used for evaluating the confidence coefficient of the face recognition of the current face image sample through each first face recognition model;
Obtaining a second current loss of the current model confidence estimation module according to the second current image characteristics, the current class center vector and the current confidence;
according to the second current loss, current parameters of the current model confidence estimation module are adjusted;
and outputting the current model confidence coefficient estimation module determined by meeting the third training stopping condition as a trained model confidence coefficient estimation module corresponding to each first face recognition model.
7. The method of any of claims 1-6, wherein the first face recognition model comprises a convolutional layer, a nonlinear activation layer, a pooling layer, and a fully-connected layer; the second face recognition model comprises a convolution layer, a nonlinear activation layer, a pooling layer and a full connection layer.
8. The method of claim 7, wherein the second face recognition model has fewer parameters than the first face recognition model.
9. The method of any of claims 1-6, wherein the model confidence estimation module comprises a fully connected network or a residual network.
10. A method of face recognition, comprising:
Acquiring a first face image to be identified;
inputting the first face image into a face recognition model to obtain a first image feature corresponding to the first face image, wherein the face recognition model is obtained according to the training method as set forth in any one of claims 1 to 9;
determining a second image feature matching the first image feature in at least one second image feature; the at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model;
and determining that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image feature.
11. The method of claim 10, wherein said determining a second image feature of at least one second image feature that matches said first image feature comprises:
determining the similarity of the first image feature and the at least one second image feature respectively;
and under the condition that the similarity is larger than or equal to a preset threshold value, determining the second image feature corresponding to the similarity as the second image feature matched with the first image feature.
12. A training device for a face recognition model, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a first training sample set, and the first training sample set comprises a plurality of face image samples;
the at least two first face recognition models are used for respectively inputting the first training sample sets to obtain first image features output by each first face recognition model; the at least two first face recognition models are obtained by training at least two initialized first face recognition models by using the face image samples respectively;
the at least two model confidence estimation modules are used for inputting the first image features output by each first face recognition model into the model confidence estimation module corresponding to each first face recognition model to obtain the confidence of the first image features output by each first face recognition model; the model confidence estimation module corresponding to each first face recognition model is obtained by training an initialization model confidence estimation module by utilizing image features output by each first face recognition model and sample class center vectors corresponding to each face image sample;
The fusion unit is used for fusing the first image features output by the at least two first face recognition models according to the confidence coefficient of the first image features output by each first face recognition model to obtain fusion features;
and the training unit is used for carrying out knowledge distillation on the second face recognition model according to the fusion characteristics to obtain the trained face recognition model, and the face recognition model is used for carrying out face recognition on the face image to be recognized.
13. An apparatus for face recognition, comprising:
the acquisition unit is used for acquiring a first face image to be identified;
a face recognition model for inputting the first face image to obtain a first image feature corresponding to the first face image, wherein the face recognition model is obtained according to the training method as set forth in any one of claims 1 to 9;
a matching unit for determining a second image feature matching the first image feature among at least one second image feature; the at least one second image feature is obtained by extracting features of at least one reference face image by the face recognition model;
And the determining unit is used for determining that the first object identity indicated by the first face image is consistent with the second object identity corresponding to the matched second image characteristic.
14. An electronic device comprising a processor and a memory, the memory having instructions stored therein that when executed by the processor cause the processor to perform the method of any of claims 1-11.
15. A computer storage medium for storing a computer program, the computer program comprising instructions for performing the method of any one of claims 1-11.
16. A computer program product comprising computer program code which, when run by an electronic device, causes the electronic device to perform the method of any one of claims 1-11.