CN112541458B - Domain self-adaptive face recognition method, system and device based on meta learning - Google Patents


Info

Publication number
CN112541458B
CN112541458B (application CN202011517834.1A)
Authority
CN
China
Prior art keywords
meta
face
loss
class
face recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011517834.1A
Other languages
Chinese (zh)
Other versions
CN112541458A (en)
Inventor
朱翔昱
雷震
郭建珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202011517834.1A priority Critical patent/CN112541458B/en
Publication of CN112541458A publication Critical patent/CN112541458A/en
Application granted granted Critical
Publication of CN112541458B publication Critical patent/CN112541458B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention belongs to the technical field of face recognition and in particular relates to a meta-learning-based domain-adaptive face recognition method, system and device, aiming to solve the problem that existing face recognition methods depend on the sample scale of the target scene and show poor recognition performance. The method comprises: acquiring a face image to be recognized as an input image; and acquiring a recognition result for the input image through a pre-trained face recognition model, the face recognition model being constructed based on a residual neural network. The invention reduces dependence on the sample scale of the target scene and improves face recognition performance.

Description

Domain self-adaptive face recognition method, system and device based on meta learning
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a domain self-adaptive face recognition method, system and device based on meta learning.
Background
Face recognition models typically need to be deployed to multiple target scenarios, from which often only very limited unlabeled data can be acquired. One option is manual data labeling, which is costly, and fine-tuning on the resulting few labeled samples makes the model prone to overfitting. Another is to gather a large amount of labeled face data from similar domains, but deep models easily overfit the training data, and due to distribution bias the resulting model tends to perform poorly in the target scenario. The present invention therefore proposes a meta-learning-based domain-adaptive face recognition method, so that the trained model can rapidly adapt to the target scenario by updating only a subset of the model parameters with a small number of unlabeled samples from that scenario, improving the model's generalization there.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that existing face recognition methods depend on the sample scale of the target scene and show poor recognition performance, a first aspect of the present invention provides a meta-learning-based domain-adaptive face recognition method, which comprises:
step S10, acquiring a face image to be recognized as an input image;
step S20, acquiring a recognition result of the input image through a pre-trained face recognition model;
the face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
Step A40, resampling face image samples of B persons; preprocessing the resampled face image samples by the methods of steps A10–A20, and extracting their features for clustering; after clustering, sampling from each class to construct a meta-training set and a meta-test set per class, and randomly exchanging each class's meta-test set with those of the other classes; B is a positive integer;
step A50, obtaining the losses corresponding to each class's meta-training set and meta-test set through a pre-constructed hard-sample-pair loss function, as a first loss and a second loss respectively; after weighted summation of the first loss and the second loss, updating the parameters of the face recognition model;
and step A60, executing steps A10–A50 in a loop until a trained face recognition model is obtained.
In some preferred embodiments, the face detection model is constructed based on the FaceBoxes neural network, and the face keypoint model is constructed based on a CNN neural network.
In some preferred embodiments, face detection is performed by a face detection model constructed based on the FaceBoxes neural network, and keypoint detection is performed by a face keypoint detection model constructed based on a CNN neural network.
In some preferred embodiments, the method for obtaining the normalized exponential function cross entropy classification loss is:
$$L_{cos}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$
where $L_{cos}$ denotes the normalized-exponential-function cross-entropy classification loss, $N$ is the number of image samples in the first dataset, i.e., the number of image samples containing faces, $x_i$ is the feature of the i-th image sample with ground-truth class $y_i$, $W_j$ is the template vector of the j-th class, $\theta_j$ is the angle between $W_j$ and $x_i$, $m$ is the angular-interval hyperparameter, $s$ is a scaling-factor constant, $W_*$ is the feature-template matrix of all classes, $x_*$ is the feature matrix formed by all face features, and $\cos\theta_{y_i}$ denotes the similarity between the i-th image sample and its corresponding class template.
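As an illustration, the classification loss above can be sketched in NumPy. The additive-angular-margin form, the scale s = 30 and the helper name are assumptions for this sketch, not the patent's exact implementation:

```python
import numpy as np

def margin_softmax_loss(features, templates, labels, s=30.0, m=0.4):
    """Angular-margin softmax cross-entropy loss (sketch)."""
    # L2-normalize features x_i and class templates W_j so logits are cosines
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)              # (N, classes): cos(theta_j)
    idx = np.arange(len(labels))
    theta_y = np.arccos(cos[idx, labels])          # angle to the true-class template
    logits = s * cos
    logits[idx, labels] = s * np.cos(theta_y + m)  # widen the angular interval by m
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable log-softmax
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[idx, labels].mean()               # mean cross entropy over the batch
```

A positive margin m lowers the true-class logit, so the loss is strictly larger than plain softmax cross entropy on the same batch, which forces a wider angular gap between classes.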
In some preferred embodiments, "extracting their features for clustering" in step A40 proceeds as follows:
extracting the features of the resampled face image samples and averaging them, giving a face-feature mean per person; for each of the B persons, two face image samples are sampled, one used as a registration photo and the other as a search photo;
clustering via a balanced k-means clustering method based on each person's face-feature mean;
the balanced k-means clustering method comprises:
randomly selecting k cluster centers;
sorting each face-feature mean F_i by its Euclidean distance to the k cluster centers and assigning each F_i in turn to a cluster; if the cluster assigned to F_i already holds more than B/k features, traversing the other clusters and assigning F_i to one whose feature count is less than B/k, until every cluster holds B/k features;
computing the cluster centers under the current assignment;
constructing a feature list L, initialized to empty; sorting the features by their distance to the nearest cluster center, from large to small, and traversing them: if L is empty, appending F_i to L; otherwise, traversing F_j ∈ L and, if exchanging the cluster assignments of F_i and F_j would reduce the clustering error, performing the exchange, until every face-feature mean has its cluster category; the clustering error is the sum of the distances from each feature to the center of the cluster it is assigned to.
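A minimal sketch of capacity-constrained (balanced) k-means, assuming random initialization and a greedy capped assignment in place of the k-means++ seeding and swap-based refinement described above:

```python
import numpy as np

def balanced_kmeans(F, k, n_iter=10, seed=0):
    """Balanced k-means sketch: every cluster ends with exactly len(F)//k points."""
    rng = np.random.default_rng(seed)
    B = len(F)
    cap = B // k                                        # per-cluster capacity B/k
    centers = F[rng.choice(B, size=k, replace=False)]   # random init (patent: k-means++)
    labels = np.empty(B, dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(F[:, None] - centers[None], axis=2)  # (B, k) distances
        counts = np.zeros(k, dtype=int)
        # visit points nearest-first; overflow to the next-nearest non-full cluster
        for i in np.argsort(d.min(axis=1)):
            for c in np.argsort(d[i]):
                if counts[c] < cap:
                    labels[i] = c
                    counts[c] += 1
                    break
        # recompute cluster centers under the current balanced assignment
        centers = np.stack([F[labels == c].mean(axis=0) for c in range(k)])
    return labels
```

The capacity cap enforces the B/k-per-cluster constraint that makes each resulting pseudo-domain equally sized.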
In some preferred embodiments, in step A50, "the losses corresponding to each class's meta-training set and meta-test set are obtained through a pre-constructed hard-sample-pair loss function, as a first loss and a second loss respectively" proceeds as follows:
inputting each face image sample of each meta-training set into the face recognition model and, combining the learnable model parameters θ with the batch-normalization function of the face recognition model, computing the batch-normalization mean and variance parameters as first parameters;
based on the first parameters, the learnable model parameters θ, and the face image samples of the meta-training set, obtaining the loss of each class's meta-training set through the pre-constructed hard-sample-pair loss function, as the first loss;
computing the batch-normalization mean and variance parameters of each class's meta-test set as second parameters; decoupling each class's meta-test set from its second parameters, and coupling each meta-test set with the batch-normalization mean and variance parameters of the corresponding meta-training set;
and, for each class's meta-test set, combining its face image samples, the learnable model parameters θ, and the coupled batch-normalization mean and variance parameters, obtaining the loss of each class's meta-test set through the pre-constructed hard-sample-pair loss function, as the second loss.
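The batch-normalization decoupling above can be sketched as follows; `bn_stats` and `bn_apply` are illustrative helpers, not the patent's code:

```python
import numpy as np

def bn_stats(x):
    """Batch-normalization statistics of a feature batch (per-channel mean, variance)."""
    return x.mean(axis=0), x.var(axis=0)

def bn_apply(x, mean, var, eps=1e-5):
    """Normalize a batch with externally supplied statistics."""
    return (x - mean) / np.sqrt(var + eps)

# Decoupling: the meta-test batch is NOT normalized with its own statistics
# (the "second parameters"); it is coupled with the meta-training batch's
# statistics instead, so the meta-objective measures how well the learnable
# parameters transfer across the distribution gap between the two batches.
def meta_test_forward(x_mte, x_mtr):
    mean_mtr, var_mtr = bn_stats(x_mtr)      # first parameters, from meta-training
    return bn_apply(x_mte, mean_mtr, var_mtr)
```

At adaptation time the same mechanism applies: a forward pass over a small unlabeled target-domain batch recomputes the statistics, which is the "rapid adaptation" the patent claims.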
In some preferred embodiments, the pre-constructed hard sample pair-based loss function is:
$$L_{hp}=\frac{1}{|P|}\sum_{(i,j)\in P}\left\|F_g^{i}-F_p^{j}\right\|_2^2-\frac{1}{|N|}\sum_{(i,j)\in N}\left\|F_g^{i}-F_p^{j}\right\|_2^2$$
where $L_{hp}$ denotes the loss value based on hard sample pairs, $P$ is the index set of hard positive pairs, $N$ is the index set of hard negative pairs, $F_g$ is the feature matrix of the registration photos, $F_p$ is the feature matrix of the search photos, and $i$, $j$ index the feature matrices, denoting the i-th and j-th resampled persons. A hard positive pair is a pair formed, within each class's meta-training set/meta-test set, by a face image sample and another sample of the same person whose feature similarity is smaller than a set first threshold; a hard negative pair is a pair formed by a face image sample and a sample of another person whose feature similarity is larger than a set second threshold.
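A hedged sketch of hard-pair mining and the resulting loss; the similarity thresholds `t1`/`t2`, the squared-distance form, and the pull/push signs are assumptions, since the patent's formula appears only as an image:

```python
import numpy as np

def hard_pair_loss(Fg, Fp, t1=0.5, t2=0.5):
    """Hard-sample-pair loss sketch.
    Fg[i], Fp[i]: registration and search features of person i (assumption)."""
    # cosine similarity between every registration/search feature pair
    sim = Fg @ Fp.T / (np.linalg.norm(Fg, axis=1)[:, None]
                       * np.linalg.norm(Fp, axis=1)[None])
    d2 = np.sum((Fg[:, None] - Fp[None]) ** 2, axis=2)   # (B, B) squared distances
    eye = np.eye(len(Fg), dtype=bool)
    P = eye & (sim < t1)        # hard positives: same person, low similarity
    N = ~eye & (sim > t2)       # hard negatives: different persons, high similarity
    loss = 0.0
    if P.any():
        loss += d2[P].mean()    # pull hard positive pairs together
    if N.any():
        loss -= d2[N].mean()    # push hard negative pairs apart
    return loss
```

When every genuine pair is already close and every impostor pair already far, no hard pairs are mined and the loss is zero.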
In some preferred embodiments, "weighted summation of the first loss and the second loss" in step A50 is:
$$\arg\min_{\theta}\ \gamma L_{mtr}(\theta)+(1-\gamma)L_{mte}(\theta)$$
$$L_{mtr}=L_{hp}(X_{mtr};\theta,BN_{mtr})$$
$$L_{mte}=L_{hp}(X_{mte};\theta,BN_{mtr})$$
where $\gamma$ is a weight parameter, $L_{mtr}$ is the first loss, $L_{mte}$ is the second loss, $L_{hp}$ is the hard-sample-pair loss function, $X_{mtr}$ and $X_{mte}$ are the meta-training set and meta-test set, $\theta$ denotes the learnable model parameters, and $BN_{mtr}$ denotes the batch-normalization mean and variance parameters of the meta-training set.
In a second aspect of the present invention, a domain-adaptive face recognition system based on meta-learning is provided, the system comprising: the device comprises an acquisition module and an identification module;
the acquisition module is configured to acquire a face image to be identified as an input image;
the recognition module is configured to acquire a recognition result of the input image through a pre-trained face recognition model;
the face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
step A40, resampling face image samples of B persons; preprocessing the resampled face image samples by the methods of steps A10–A20, and extracting their features for clustering; after clustering, sampling from each class to construct a meta-training set and a meta-test set per class, and randomly exchanging each class's meta-test set with those of the other classes; B is a positive integer;
step A50, obtaining the losses corresponding to each class's meta-training set and meta-test set through a pre-constructed hard-sample-pair loss function, as a first loss and a second loss respectively; after weighted summation of the first loss and the second loss, updating the parameters of the face recognition model;
and step A60, executing steps A10–A50 in a loop until a trained face recognition model is obtained.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-described domain-adaptive face recognition method based on meta-learning.
In a fourth aspect of the present invention, a processing device is provided, including a processor and a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described meta-learning based domain-adaptive face recognition method.
The invention has the beneficial effects that:
the invention reduces the dependence on the sample scale of the target scene and improves the face recognition performance.
In iterative training, the invention samples tasks with distribution differences, each task comprising a meta-training domain and a meta-test domain. During task optimization, the model parameters are decoupled from the statistics of the batch-normalization layers, and the gradient of the weighted loss over the meta-training and meta-test domains is back-propagated to update the original model parameters, so that the model learns the ability to adapt rapidly. Given a small number of unlabeled samples from the target domain, a network forward pass with the trained model parameters, combined with the batch-normalization function, quickly yields the batch-normalization statistics, achieving rapid adaptation and improving recognition performance on the target domain.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings.
FIG. 1 is a flow chart of a domain-adaptive face recognition method based on meta-learning according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a framework of a domain-adaptive face recognition system based on meta-learning in accordance with one embodiment of the present application;
FIG. 3 is a flow chart of the preprocessing steps prior to meta-training of one embodiment of the present application;
FIG. 4 is a schematic diagram of a balanced k-means clustering flow in accordance with one embodiment of the present application;
FIG. 5 is a schematic diagram of a cross-distributed task sampling process according to one embodiment of the application;
FIG. 6 is a schematic diagram of a decoupling process according to one embodiment of the present application;
FIG. 7 is a schematic diagram of a decoupled task distribution according to one embodiment of the present application;
FIG. 8 is a flow chart of a face recognition model training process according to an embodiment of the application;
FIG. 9 is a flow diagram of an adaptation process according to one embodiment of the application;
FIG. 10 is a flow chart of a prediction unit according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
The application relates to a domain self-adaptive face recognition method based on meta learning, as shown in figure 1, which comprises the following steps:
step S10, acquiring a face image to be recognized as an input image;
step S20, acquiring a recognition result of the input image through a pre-trained face recognition model;
the face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
Step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
step A40, resampling face image samples of B persons; preprocessing the resampled face image samples by the methods of steps A10–A20, and extracting their features for clustering; after clustering, sampling from each class to construct a meta-training set and a meta-test set per class, and randomly exchanging each class's meta-test set with those of the other classes; B is a positive integer;
step A50, obtaining the losses corresponding to each class's meta-training set and meta-test set through a pre-constructed hard-sample-pair loss function, as a first loss and a second loss respectively; after weighted summation of the first loss and the second loss, updating the parameters of the face recognition model;
and step A60, executing steps A10–A50 in a loop until a trained face recognition model is obtained.
In order to more clearly describe the domain-adaptive face recognition method based on meta-learning of the present invention, each step in an embodiment of the method of the present invention is described in detail below with reference to the accompanying drawings.
In the following embodiments, a training process of a face recognition model is described in detail, and then a process of obtaining a face recognition result by a domain adaptive face recognition method based on meta learning is described in detail.
1. The training process of the face recognition model, as shown in fig. 8, is specifically as follows:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
in this embodiment, before performing meta-training on the face recognition model, a preprocessing is performed, and the preprocessing process, as shown in fig. 3, is specifically as follows:
Acquire an image sample training dataset, and perform face detection on its image samples (i.e., the input pictures) through a pre-constructed face detection model. If a detected image sample contains a face, the face-containing image samples are assembled into the first data set and keypoint detection is performed; otherwise, face detection continues on the other image samples of the training dataset. In the invention, the face detection model is constructed based on the FaceBoxes neural network.
Step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
In this embodiment, the keypoint detection is performed on the image sample in the first dataset by using a pre-constructed face keypoint detection model. The face key point model is a model constructed based on a CNN neural network.
Based on the detected face keypoints, such as the centers of the two eyes, the nose tip, and the two mouth corners, preprocessing is performed in combination with the predefined keypoint template. Specifically: an affine transformation is applied to the face-containing image sample, which is then cropped and scaled to the predefined size of 120x120, yielding the preprocessed face-containing image sample.
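The keypoint-to-template alignment step can be sketched without OpenCV as a least-squares similarity transform (Umeyama's method); the helper name and five-point template are illustrative, not the patent's code:

```python
import numpy as np

def alignment_matrix(src, dst):
    """2x3 warp matrix of the least-squares similarity transform
    (scale + rotation + translation) mapping keypoints src onto dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s, d = src - mu_s, dst - mu_d                     # centered point sets
    cov = d.T @ s / len(src)                          # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(U @ Vt))             # guard against reflections
    D = np.diag([1.0, sign])
    R = U @ D @ Vt                                    # optimal rotation
    scale = np.trace(np.diag(S) @ D) / (s ** 2).sum() * len(src)
    t = mu_d - scale * R @ mu_s                       # optimal translation
    return np.hstack([scale * R, t[:, None]])         # feed to an affine warp
```

The returned matrix is exactly what an affine-warp routine consumes to produce the 120x120 aligned crop.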
Step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
in this embodiment, the recognition result of each image sample in the second data set is obtained through a face recognition model constructed based on a residual neural network res net, and the normalized exponential function cross entropy classification loss is calculated, so as to update the parameters of the face recognition model. The normalized exponential function cross entropy classification loss is shown in formula (1):
wherein L is cos Represents the normalized exponential function cross entropy classification loss, N is the number of image samples in the first dataset, i.e. the number of image samples containing faces, x i Is the characteristic of the ith image sample, and the corresponding true value class is y i ,W j Is the template vector of the j-th class, theta j Is W j And x i The included angle between the two is the super parameter of the angle interval, and is set to 0.4, s is the scaling factor constant, W * Is a characteristic template matrix corresponding to all categories, x * Is a feature matrix composed of all face features, < +.>And (5) representing the similarity between the ith image sample and the corresponding face feature. And randomly sampling the image samples containing the human face in the training process of the human face recognition model. Each batch randomly samples 128 image samples containing human faces, and carries out data random horizontal turning operation on training pictures for augmentation, which can be understood as carrying out the horizontal turning operation randomly with the probability of 0.5 each time; and then optimizing the model by adopting a batch gradient descent method (Stochastic Gradient Optimization, SGD) until convergence, and taking the model parameters obtained by training as initial model parameters of the meta-training unit.
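The augmentation and optimizer described above are standard; a minimal sketch, with list-of-rows images and a plain parameter list standing in for real tensors:

```python
import random

def maybe_hflip(image, p=0.5, rng=random):
    """Random horizontal flip used to augment training pictures.
    `image` is a list of pixel rows; flipping reverses each row."""
    return [row[::-1] for row in image] if rng.random() < p else image

def sgd_step(params, grads, lr=0.1):
    """Plain stochastic-gradient-descent update for the recognition backbone."""
    return [p - lr * g for p, g in zip(params, grads)]
```

Each training picture thus has a 0.5 chance of being flipped before the SGD update is applied.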
Step A40, resampling face image samples of B persons; preprocessing the resampled face image samples by the methods of steps A10–A20, and extracting their features for clustering; after clustering, sampling from each class to construct a meta-training set and a meta-test set per class, and randomly exchanging each class's meta-test set with those of the other classes; B is a positive integer;
In this embodiment, the clustering process is shown in fig. 4, and specifically includes the following steps:
number of given class clustersThe quantity k is calculated, in each iteration, based on 2B resampled image samples processed by the preprocessing unit, namely, the batch of face samples in fig. 4 (the invention resamples B persons in training, wherein each person samples a face image as a registration photo and a face image as a search photo), and performs one-time network forwarding according to the existing model parameters to obtain the face characteristics corresponding to each image sample input in batch; i.e. each person has a registration and a search corresponding feature. Then average the features of the registration photo and the search photo of each person to obtain a face feature mean value F epsilon R corresponding to the person B ×C Where C is the feature dimension.
Taking a face characteristic mean value F as an input, carrying out characteristic balance k-means clustering, and dividing algorithm steps into three steps: 1) Randomly selecting a center: firstly, using an initialization mode of k-means++, randomly selecting k cluster centers c 1 ,…,c k The method comprises the steps of carrying out a first treatment on the surface of the 2) Initializing: according to the ith face feature mean F i European feature distance ordering to k cluster centers and sequentially endowing each feature F i If the feature quantity of the current class cluster class is larger than B/k, traversing the class cluster class, and endowing the class cluster class with the face feature mean value and the other class cluster class with the feature quantity smaller than B/k until the face feature mean value of each class cluster is B/k, and finishing the initialization step of balanced clustering to ensure that the sample quantity of each class cluster is equal; 3) Iterative optimization: the following iterative optimization of the balance k-means is performed: calculating a cluster center under the current endowed category; constructing a feature list L, initializing to be empty, sorting the feature sequences according to the distance from the center of the nearest cluster from large to small, traversing the feature sequences, and if L is empty, setting F i Put into L, otherwise, traverse F j E L, if exchange F i And F j The clustering error can be smaller, and the switching is performed, so that the feature quantity of each class cluster is ensured to be maintained as B/k; and stopping the iteration after the maximum iteration times are reached, and obtaining class cluster categories corresponding to each characteristic mean value.
After the balanced k-means clustering step, each of the B resampled persons corresponds to one class, yielding k batches of samples with distribution differences, each batch containing 2B/k samples. Specifically, the parameter values are: k = 4, B = 512, C = 256; thus each resampling draws 512 persons (1024 images in total), and the balanced k-means clustering module groups them into 4 clusters, each containing 128 persons (256 images).
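The balanced initialization step described above can be sketched as follows; this is a minimal pure-Python illustration assuming plain Euclidean distance and pre-chosen centers (the function name `balanced_assign` and the toy data are illustrative, not from the patent):

```python
import math

def _dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def balanced_assign(feats, centers, cap):
    """Balanced initialization: each feature goes to its nearest cluster
    center whose quota (cap = B/k) is not yet full, spilling over to the
    next-nearest center otherwise, so every cluster ends with cap members."""
    labels = [-1] * len(feats)
    counts = [0] * len(centers)
    for i, f in enumerate(feats):
        for c in sorted(range(len(centers)), key=lambda c: _dist(f, centers[c])):
            if counts[c] < cap:
                labels[i] = c
                counts[c] += 1
                break
    return labels

# Toy example: B = 6 per-person feature means, k = 2 centers, cap = B/k = 3.
feats = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 1.0], [0.0, 0.1]]
centers = [[0.0, 0.0], [1.0, 1.0]]
labels = balanced_assign(feats, centers, cap=3)
```

Note that the last feature, although closest to the first center, spills over to the second one because the first cluster's quota of B/k is already full — exactly the capacity guarantee the initialization step requires.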
After clustering, one task is sampled from each domain (each cluster is referred to simply as a domain in the invention) D_i; each task T_i comprises a meta-training set X_mtr and a meta-test set X_mte. In total B persons are resampled, two images per person: one as a registration photo and one as a search photo, and it is ensured that the sampled meta-training set X_mtr and meta-test set X_mte contain no overlapping persons. The k domains correspond to k tasks in total.
Because the meta-training set X_mtr and the meta-test set X_mte come from the same domain, their distribution difference is limited. The invention therefore further randomly perturbs the meta-training set X_mtr and meta-test set X_mte of each task T_i; that is, the meta-test set of each class is randomly swapped with the meta-test set of another class, so that a meta-training set X_mtr from domain D_i randomly corresponds to a meta-test set X_mte from domain D_j, enlarging the possible distribution difference between the meta-training set and the meta-test set of one task. If the correspondence between X_mtr and X_mte is unchanged after the random perturbation, the case degenerates to the unperturbed one, as shown in fig. 5. The training domain D_s in fig. 5 comprises three classes D_1, D_2, D_3; the meta-test sets and meta-training sets of the three classes are sampled and distinguished by the superscripts 1, 2 and 3, and after sampling the meta-test sets of the three classes are randomly perturbed.
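The random re-pairing of meta-test sets across tasks can be sketched as below; a minimal Python illustration assuming tasks are stored as dicts (the names `shuffle_meta_tests`, `mtr`, `mte` are illustrative, not from the patent):

```python
import random

def shuffle_meta_tests(tasks, seed=0):
    """Randomly re-pair each task's meta-test set with the meta-test set of a
    (possibly different) task, so a meta-training set from domain D_i may be
    tested against data sampled from another domain D_j."""
    rng = random.Random(seed)
    mtes = [t["mte"] for t in tasks]
    rng.shuffle(mtes)
    return [{"mtr": t["mtr"], "mte": m} for t, m in zip(tasks, mtes)]

tasks = [{"mtr": "mtr1", "mte": "mte1"},
         {"mtr": "mtr2", "mte": "mte2"},
         {"mtr": "mtr3", "mte": "mte3"}]
perturbed = shuffle_meta_tests(tasks)
```

If the permutation happens to be the identity, the pairing degenerates to the unperturbed case, matching the behaviour described above.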
Step A50, obtaining losses corresponding to a meta training set and a meta testing set of each class through a pre-constructed loss function based on a difficult sample pair, wherein the losses are respectively used as a first loss and a second loss; after the first loss and the second loss are weighted and summed, updating parameters of a face recognition model;
in this embodiment, the parameters of the deep neural network model are first divided into two parts: the learnable model parameters θ, and the batch normalization mean and variance parameters BN. The learnable model parameters θ are updated by computing back-propagated gradients, whereas BN requires no gradient back-propagation and is computed from the learnable model parameters θ and the batch normalization function φ_BN. The standard computation of the batch normalization statistics is as follows:
y = (X − E[X]) / √(Var[X] + ε) · γ + β, wherein X is a resampled face image sample, y is the output of the normalization layer, E[X] and Var[X] are the mean and variance of the batch normalization layer, and ε is a small positive value (e.g. 1e-5) that prevents numerical overflow when the divisor approaches 0; γ and β are the scaling and translation parameters, but are not considered by the decoupling module.
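The batch normalization computation above, in a minimal pure-Python sketch (the function name and the 1-D batch shape are illustrative simplifications of a real normalization layer):

```python
import math

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """y = (x - E[X]) / sqrt(Var[X] + eps) * gamma + beta, computed over a
    1-D batch; eps keeps the divisor away from zero."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    return [(v - mean) / math.sqrt(var + eps) * gamma + beta for v in x]

y = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With γ = 1 and β = 0 the output batch has (up to the ε term) zero mean and unit variance, which is the normalization the decoupling module later manipulates through the stored statistics.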
The deep neural network can be expressed as a function f(X; θ, BN). We assume that in the deep neural network model the domain-related information is encoded in BN, while the domain-independent information is encoded in the learnable model parameters θ. Therefore, during training, the invention proposes to decouple the learnable model parameters θ from BN, removing the direct dependence of BN on the resampled face image samples. As shown in fig. 6, the specific decoupling process is as follows: given an input resampled face image sample X_i, the batch normalization function φ_BN and the learnable model parameters θ yield
BN_i = (φ_BN ∘ f(θ))(X_i), and the deep neural network model can then be expressed as f(X_i; θ, BN_i), wherein BN_i corresponds to X_i. Decoupling means removing the correspondence between BN_i and X_i: once decoupled, BN_i no longer corresponds exactly to X_i, and we further couple BN_i to BN_j computed from another input X_j. Based on the cross-distribution task sampling proposed by the invention, a plurality of tasks T_1, T_2, …, T_k with distribution differences are obtained; for each task T_i, the meta-test set X_mte is decoupled from its corresponding BN_mte and coupled to the BN_mtr corresponding to the meta-training set of the task; the structure after decoupling is shown in fig. 7. Specifically, the parameter value is k = 4. The application of decoupling and coupling is described in detail below.
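The decoupling-and-coupling operation can be illustrated on a single 1-D normalization layer; a minimal sketch in which the names `bn_stats`, `normalize_with` and the toy batches are assumptions for illustration:

```python
import math

def bn_stats(x):
    """phi_BN: batch statistics (mean, variance) computed from an input batch."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return mean, var

def normalize_with(x, stats, eps=1e-5):
    """Normalize batch x with externally supplied statistics."""
    mean, var = stats
    return [(v - mean) / math.sqrt(var + eps) for v in x]

x_mtr = [0.0, 1.0, 2.0, 3.0]      # meta-training batch
x_mte = [10.0, 11.0, 12.0, 13.0]  # meta-test batch from another domain
bn_mtr = bn_stats(x_mtr)
# Decoupling: the meta-test batch's own statistics BN_mte are discarded;
# coupling: X_mte is normalized with BN_mtr instead, i.e.
# f(X_mte; theta, BN_mtr) rather than f(X_mte; theta, BN_mte).
y_mte = normalize_with(x_mte, bn_mtr)
```

Because the meta-test batch is normalized with mismatched statistics, its output is far from zero mean — precisely the distribution shift the meta-optimization trains θ to be robust against.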
The meta-training unit further comprises meta-optimization, which is specifically as follows:
first, optimization on the meta-training set: for the sampled meta-training set X_mtr of each task, each image sample is input into the face recognition model, and the batch normalization mean and variance parameters BN_mtr are computed by combining the learnable model parameters θ with the batch normalization function corresponding to the face recognition model, as the first parameter;
based on the first parameter, the learnable model parameters θ and the image samples in the meta-training set, the loss corresponding to each class of meta-training set is obtained through the pre-constructed loss function based on difficult sample pairs, as the first loss;
the first loss acquisition process is as shown in formula (3):
L_mtr = L_hp(X_mtr; θ, BN_mtr) (3)
wherein X_mtr is the input meta-training set, θ are the learnable model parameters, BN_mtr are the batch normalization mean and variance parameters obtained on the meta-training set, and L_hp denotes the loss function based on hard sample pairs; this step performs no decoupling or coupling operation.
Next is the optimization on the meta-test set: for the sampled meta-test set X_mte of each task, the corresponding batch normalization mean and variance parameters BN_mte are computed as the second parameter; each meta-test set is decoupled from its corresponding second parameter and coupled to the batch normalization mean and variance parameters BN_mtr of its corresponding meta-training set;
and for each class of meta-test set, the loss corresponding to that class is obtained through the pre-constructed loss function based on difficult sample pairs, combining each image sample, the learnable model parameters θ, and the batch normalization mean and variance parameters coupled to the meta-test set, as the second loss.
The second loss acquisition process is shown in formula (4):
L_mte = L_hp(X_mte; θ, BN_mtr) (4)
wherein X_mte is the input meta-test set. We use the batch normalization mean and variance parameters BN_mtr obtained on the meta-training set instead of BN_mte from the meta-test set; this step simulates, within iterative training, the adaptation of the model to a target domain, thereby improving the adaptation capability of the model.
The loss function based on the difficult sample pair is specifically shown as the following formula (5):
where P is the index set of difficult positive sample pairs, N is the index set of difficult negative sample pairs, F_g is the feature matrix of the registration photos, F_p is the feature matrix of the search photos, and i, j are indices into the feature matrices denoting the resampled persons i and j. The difficult-sample-pair loss focuses optimization on difficult positive and difficult negative pairs, which drives the model to learn more discriminative features. Specifically, we use a positive-sample distance threshold τ_p and a negative-sample distance threshold τ_n to select the difficult positive and negative sample pairs; the thresholds are initialized to 0.3 and 0.4 and change as the number of iterations n increases: τ_p = 0.3 + 0.1n, τ_n = 0.4 × 0.5^n. For positive pairs, only the most difficult B × τ_p are selected, with index set denoted P; for negative pairs, only the most difficult B × (B − 1) × τ_n are selected, with index set denoted N. That is, a difficult positive sample pair is formed by an image sample in each class's meta-training set/meta-test set and a feature in the same set whose distance is smaller than the set first distance threshold τ_p, and a difficult negative sample pair is formed by an image sample in the meta-training set/meta-test set and an image sample of another class's set whose feature distance is larger than the set second distance threshold τ_n.
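The threshold-based selection of difficult pairs might be sketched as below. This interprets the pairwise scores as cosine similarities between registration and search features (an interpretive assumption, since the patent's threshold directions then read naturally: difficult positives fall below τ_p, difficult negatives rise above τ_n); the function name and toy matrix are illustrative:

```python
def mine_hard_pairs(sim, tau_p=0.3, tau_n=0.4):
    """Given a BxB similarity matrix (row i = registration feature of person i,
    column j = search feature of person j; diagonal entries are positive
    pairs), return the index sets P of difficult positives and N of
    difficult negatives."""
    b = len(sim)
    P = [i for i in range(b) if sim[i][i] < tau_p]      # dissimilar same-person pairs
    N = [(i, j) for i in range(b) for j in range(b)
         if i != j and sim[i][j] > tau_n]               # similar cross-person pairs
    return P, N

sim = [[0.9, 0.5],
       [0.1, 0.2]]
P, N = mine_hard_pairs(sim)
```

On this toy matrix only person 1's own pair is a difficult positive (similarity 0.2 < τ_p) and only the pair (0, 1) is a difficult negative (similarity 0.5 > τ_n); easy pairs are excluded from the loss.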
In addition, to balance optimization across meta-training and meta-test sets, the final meta-optimization objective is as follows:
argmin_θ γ L_mtr(θ) + (1 − γ) L_mte(θ) (6)
where γ is a hyperparameter, set to 0.5, used to balance the losses of the meta-training and meta-test sets. The meta-optimization objective is such that the learnable model parameters θ are optimized both to achieve good performance on the meta-training set and to adapt quickly to another domain, where the adaptation process only needs to update the batch normalization statistics BN of the normalization layers. The second term of the meta-optimization objective, the regularization term L_mte(θ), makes the model parameters θ more robust to BN statistics computed from different distributions, so that the whole model can better adapt to a target domain.
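Formula (6)'s weighted combination is a one-line computation, shown here with γ = 0.5 as in the patent (the function name is illustrative):

```python
def meta_objective(l_mtr, l_mte, gamma=0.5):
    """Weighted meta-optimization loss: gamma * L_mtr + (1 - gamma) * L_mte."""
    return gamma * l_mtr + (1.0 - gamma) * l_mte

loss = meta_objective(0.8, 0.4)  # 0.5 * 0.8 + 0.5 * 0.4 = 0.6
```

With γ = 0.5 the objective weights the meta-training loss and the meta-test regularization term equally, so neither in-domain accuracy nor cross-domain robustness dominates the update of θ.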
The learnable model parameters θ are optimized according to the weighted loss of the meta-optimization; whether the training loss has converged is then judged: if so, training stops and the model parameters θ are updated; otherwise, steps A40-A50 are executed in a loop.
And step A60, circularly executing steps A10-A50 until a trained face recognition model is obtained.
In this embodiment, the learnable model parameters θ are optimized according to the weighted loss of meta-optimization; whether the training loss has converged is judged, and if so, training stops and the model parameters θ are obtained; otherwise, steps A40-A50 are executed until a trained face recognition model is obtained.
In addition, after training is completed, a small number of unlabeled images of a given target scene are used as input, and the batch normalization statistical mean and variance are computed with the trained model parameters, replacing the batch normalization statistical mean and variance parameters contained in the originally trained model. After the substitution, the final model parameters are obtained, i.e. the adaptation of the face recognition model is completed, as shown in fig. 9.
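The deployment-time substitution step can be sketched as follows; a minimal illustration in which the model's BN statistics are stored in a dict and recomputed from a small unlabeled target-domain batch (the names and the 1-D layer are assumptions):

```python
def adapt_bn(target_batch):
    """Recompute BN running statistics from unlabeled target-domain inputs;
    the learnable parameters theta are left untouched."""
    mean = sum(target_batch) / len(target_batch)
    var = sum((v - mean) ** 2 for v in target_batch) / len(target_batch)
    return {"running_mean": mean, "running_var": var}

model_bn = {"running_mean": 0.0, "running_var": 1.0}  # source-domain statistics
model_bn.update(adapt_bn([4.0, 5.0, 6.0]))            # replace with target statistics
```

No labels and no gradient steps are needed: the adaptation consists only of a forward pass over the target images to collect statistics, which is why a handful of unlabeled pictures suffices.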
2. Domain self-adaptive face recognition method based on meta learning
Step S10, acquiring a face image to be recognized as an input image;
in this embodiment, a face image to be recognized is first acquired.
And step S20, acquiring a recognition result of the input image through a pre-trained face recognition model.
In this embodiment, the face image to be recognized is input into the trained face recognition model, and the network forward computation is performed to obtain the face features, which are used for face comparison or recognition, as shown in fig. 10. In addition, before the forward computation, the invention preprocesses the face image with the same method as in steps A10 and A20, i.e. face detection, key point detection, alignment transformation, cropping and scaling are performed on the face image.
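After the forward pass yields a face feature, the comparison/recognition step can be sketched as cosine matching against an enrolled gallery; the threshold value and all names here are hypothetical illustrations, not taken from the patent:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def recognize(feat, gallery, threshold=0.3):
    """Return the gallery identity with the highest cosine similarity to the
    query feature, or None if the best match falls below the threshold."""
    best_id, best_sim = None, -1.0
    for pid, g in gallery.items():
        s = cosine(feat, g)
        if s > best_sim:
            best_id, best_sim = pid, s
    return best_id if best_sim >= threshold else None

gallery = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
match = recognize([0.9, 0.1], gallery)
```

Comparison (1:1 verification) uses a single cosine test against one enrolled feature, while recognition (1:N identification) takes the arg-max over the gallery as above.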
A domain-adaptive face recognition system based on meta-learning according to a second embodiment of the present invention, as shown in fig. 2, specifically includes: the device comprises an acquisition module and an identification module;
the acquisition module is configured to acquire a face image to be identified as an input image;
the recognition module is configured to acquire a recognition result of the input image through a pre-trained face recognition model;
the face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
step A40, resampling a face image sample of the B person; preprocessing the resampled face image sample by adopting a method of the step A10-A20, and extracting the characteristics of the resampled face image sample for clustering; after clustering, sampling each class respectively, constructing a meta training set and a meta testing set corresponding to each class, and exchanging the meta testing set of each class with the meta testing sets of other classes randomly; b is a positive integer;
Step A50, obtaining losses corresponding to a meta training set and a meta testing set of each class through a pre-constructed loss function based on a difficult sample pair, wherein the losses are respectively used as a first loss and a second loss; after the first loss and the second loss are weighted and summed, updating parameters of a face recognition model;
and step A60, circularly executing steps A10-A50 until a trained face recognition model is obtained.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes and related descriptions of the above-described system may refer to corresponding processes in the foregoing method embodiments, which are not described herein again.
It should be noted that, in the domain adaptive face recognition system based on meta learning provided in the foregoing embodiment, only the division of the foregoing functional modules is illustrated, in practical application, the foregoing functional allocation may be performed by different functional modules according to needs, that is, the modules or steps in the foregoing embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present invention are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present application stores therein a plurality of programs adapted to be loaded by a processor and to implement the above-described domain-adaptive face recognition method based on meta-learning.
A processing device according to a fourth embodiment of the present application includes a processor, a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described meta-learning based domain-adaptive face recognition method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the storage device and the processing device described above, and the related descriptions, are not repeated here; reference may be made to the corresponding processes in the foregoing method examples.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (8)

1. The domain self-adaptive face recognition method based on meta learning is characterized by comprising the following steps of:
step S10, acquiring a face image to be recognized as an input image;
Step S20, obtaining a recognition result of the input image through a pre-trained face recognition model:
after the input image is input into a pre-trained face recognition model, calculating statistic parameters of a batch normalization layer by combining a batch normalization function;
combining the statistic value parameters of the batch normalization layers, and performing network forward calculation on the input images to obtain face features; identifying the face features to obtain an identification result of the input image;
the face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
The method for acquiring the normalized exponential function cross entropy classification loss comprises the following steps:
wherein L_cos represents the normalized exponential function cross-entropy classification loss, N is the number of image samples in the first dataset, i.e. the number of image samples containing faces, x_i is the feature of the i-th image sample with corresponding true-value class y_i, W_j is the template vector of the j-th class, θ_j is the included angle between W_j and x_i, m is the hyperparameter of the angle interval, s is the scaling factor constant, W_* is the feature template matrix corresponding to all classes, x_* is the feature matrix formed by all face features, and cos θ_{y_i} represents the similarity between the i-th image sample and the corresponding face feature;
step A40, resampling a face image sample of the B person; preprocessing the resampled face image sample by adopting a method of the step A10-A20, and extracting the characteristics of the resampled face image sample for clustering; after clustering, sampling each class respectively, constructing a meta training set and a meta testing set corresponding to each class, and exchanging the meta testing set of each class with the meta testing sets of other classes randomly; b is a positive integer;
step A50, obtaining losses corresponding to a meta training set and a meta testing set of each class through a pre-constructed loss function based on a difficult sample pair, wherein the losses are respectively used as a first loss and a second loss; after the first loss and the second loss are weighted and summed, updating parameters of a face recognition model;
The method comprises the steps of obtaining losses corresponding to a meta training set and a meta testing set of each class through a pre-constructed loss function based on a difficult sample pair, wherein the losses are respectively used as a first loss and a second loss, and the method comprises the following steps:
inputting each face image sample of each meta training set into a face recognition model, and calculating a batch normalization mean value and a variance parameter as first parameters by combining a learnable model parameter theta and a batch normalization function corresponding to the face recognition model;
based on the first parameter, the learnable model parameter theta and the face image sample in the meta-training set, obtaining the loss corresponding to each type of meta-training set through a pre-constructed loss function based on the difficult sample pair as a first loss;
calculating the corresponding batch normalization mean and variance parameters of each class of meta-test set as the second parameter; decoupling each class of meta-test set from the corresponding second parameter, and coupling each meta-test set with the batch normalization mean and variance parameters of its corresponding meta-training set;
for each class of meta-test set, acquiring the loss corresponding to each class of meta-test set through a pre-constructed loss function based on a difficult sample pair by combining each face image sample, a learnable model parameter theta and a batch normalization mean value and variance parameter of the coupling of the meta-test set, and taking the loss as a second loss;
And step A60, circularly executing steps A10-A50 until a trained face recognition model is obtained.
2. The domain-adaptive face recognition method based on meta-learning of claim 1, wherein face detection is performed through a face detection model constructed based on FaceBoxes neural network; and performing key point detection through a face key point detection model constructed based on the CNN neural network.
3. The domain-adaptive face recognition method based on meta-learning according to claim 1, wherein "extracting features thereof for clustering" in step a40 comprises the following steps:
extracting the features of the resampled face image samples and averaging them as the face feature mean of each person; wherein for each of the B persons two face image samples are sampled, one as a registration photo and the other as a search photo;
clustering by a balance k-means clustering method based on the characteristic mean value of each face;
the balance k-means clustering method comprises the following steps:
randomly selecting k clustering centers;
sorting the Euclidean distances from the face feature mean F_i to the k cluster centers and assigning each F_i a cluster class in turn; if the cluster class to be assigned to F_i already holds more than B/k features, traversing the other cluster classes and assigning the face feature mean F_i to a class whose feature count is less than B/k, until the feature count of each cluster class is B/k;
calculating a cluster center under the current endowed category;
constructing a feature list L, initialized empty; sorting the features by their distance to the nearest cluster center from large to small and traversing them; if L is empty, putting F_i into L; otherwise, traversing F_j ∈ L and, if exchanging F_i and F_j reduces the clustering error, performing the exchange, until the cluster class corresponding to each face feature mean is obtained; the clustering error is the sum of the distances from each feature to the center of the cluster class assigned to that feature.
4. A domain adaptive face recognition method based on meta-learning as claimed in claim 3, wherein the pre-constructed loss function based on hard sample pairs is:
wherein L_hp represents the loss value based on difficult sample pairs, P represents the index set of difficult positive sample pairs, N represents the index set of difficult negative sample pairs, F_g is the feature matrix of the registration photos, F_p is the feature matrix of the search photos, and i, j are indices into the feature matrices denoting the resampled persons i and j; a difficult positive sample pair is a sample pair formed by a face image sample in each class's meta-training set/meta-test set and another face image sample in the same set whose feature distance is smaller than a set first distance threshold, and a difficult negative sample pair is a sample pair formed by a face image sample in each class's meta-training set/meta-test set and another face image sample in another class's set whose feature distance is larger than a set second distance threshold.
5. The method for domain-adaptive face recognition based on meta-learning according to claim 4, wherein "the first loss and the second loss are weighted and summed" in step a50 is as follows:
L_mtr = L_hp(X_mtr; θ, BN_mtr)
L_mte = L_hp(X_mte; θ, BN_mtr)
γ L_mtr + (1 − γ) L_mte
wherein γ represents the weight parameter, L_mtr represents the first loss, L_mte represents the second loss, L_hp represents the loss function based on difficult sample pairs, X_mtr and X_mte represent the meta-training set and meta-test set, θ represents the learnable model parameters, and BN_mtr represents the batch normalization mean and variance parameters corresponding to the meta-training set.
6. A domain-adaptive face recognition system based on meta-learning, the system comprising: the device comprises an acquisition module and an identification module;
the acquisition module is configured to acquire a face image to be identified as an input image;
the recognition module is configured to acquire a recognition result of the input image through a pre-trained face recognition model:
after the input image is input into a pre-trained face recognition model, calculating statistic parameters of a batch normalization layer by combining a batch normalization function;
combining the statistic value parameters of the batch normalization layers, and performing network forward calculation on the input images to obtain face features; identifying the face features to obtain an identification result of the input image;
The face recognition model is constructed based on a residual neural network, and the training method comprises the following steps:
step A10, acquiring an image sample training set; performing face detection on the image samples of the image sample training set, and constructing a first data set based on the detected image samples containing the faces;
step A20, performing keypoint detection on the image samples in the first data set, and preprocessing the image samples in the first data set by combining a predefined keypoint template;
step A30, acquiring the identification result of each preprocessed image sample through a face recognition model, calculating the cross entropy classification loss of the normalized exponential function, and updating the parameters of the face recognition model;
the method for acquiring the normalized exponential function cross entropy classification loss comprises the following steps:
wherein L_cos represents the normalized exponential function cross-entropy classification loss; N is the number of image samples in the first data set, i.e., the number of image samples containing faces; x_i is the feature of the i-th image sample and y_i is its corresponding ground-truth class; W_j is the template vector of the j-th class; θ_j is the angle between W_j and x_i; m is the angular-interval hyper-parameter; s is the scaling-factor constant; W_* is the feature template matrix corresponding to all classes; and x_* is the feature matrix formed by all face features, representing the similarity between the i-th image sample and the corresponding face feature;
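The loss formula itself was dropped during extraction; given the angular margin m, scale s, and angles θ_j defined in this clause, it is presumably the additive angular-margin softmax cross-entropy (ArcFace-style) loss:

```latex
L_{cos} \;=\; -\frac{1}{N}\sum_{i=1}^{N}
\log\frac{e^{\,s\cos(\theta_{y_i}+m)}}
{e^{\,s\cos(\theta_{y_i}+m)} \;+\; \sum_{j\neq y_i} e^{\,s\cos\theta_j}}
```

This reconstruction is an assumption based on the symbol definitions, not the claim's verbatim formula.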
step A40, resampling face image samples of B persons; preprocessing the resampled face image samples using the method of steps A10-A20, and extracting features of the resampled face image samples for clustering; after clustering, sampling each class separately to construct a meta-training set and a meta-test set for each class, and randomly exchanging the meta-test set of each class with the meta-test sets of the other classes; B is a positive integer;
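A minimal sketch of step A40, under loudly stated assumptions: the clustering algorithm, split sizes, and all names below are illustrative placeholders, not the patent's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for step A40: features of B = 3 resampled
# identities, 6 samples each; sizes are illustrative only.
B, n_per_id, dim = 3, 6, 8
feats = rng.normal(size=(B * n_per_id, dim))

# Cluster the extracted features into B classes. A toy nearest-centroid
# assignment stands in for whatever clustering the patent uses.
centroids = feats[rng.choice(len(feats), size=B, replace=False)]
dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
labels = dists.argmin(axis=1)

# Sample each cluster into a meta-training and a meta-test split.
meta_train, meta_test = {}, {}
for c in range(B):
    idx = np.flatnonzero(labels == c)
    rng.shuffle(idx)
    half = len(idx) // 2
    meta_train[c], meta_test[c] = idx[:half], idx[half:]

# Randomly exchange each class's meta-test set with another class's, so
# classes may be meta-tested on samples drawn from a different cluster
# (a plain permutation; it is not guaranteed to move every class).
perm = rng.permutation(B)
meta_test = {c: meta_test[perm[c]] for c in range(B)}
```

The exchange simulates a domain shift between meta-training and meta-testing, which is what the second loss in step A50 is designed to penalize.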
step A50, obtaining the losses corresponding to the meta-training set and the meta-test set of each class through a pre-constructed loss function based on hard sample pairs, as the first loss and the second loss respectively; after a weighted summation of the first loss and the second loss, updating the parameters of the face recognition model;
the obtaining of the losses corresponding to the meta-training set and the meta-test set of each class through the pre-constructed loss function based on hard sample pairs, as the first loss and the second loss respectively, comprises the following steps:
inputting each face image sample of each meta-training set into the face recognition model, and calculating the batch-normalization mean and variance parameters as first parameters by means of the learnable model parameters θ and the batch normalization function corresponding to the face recognition model;
based on the first parameters, the learnable model parameters θ and the face image samples in the meta-training set, obtaining the loss corresponding to each class of meta-training set through the pre-constructed loss function based on hard sample pairs, as the first loss;
calculating the corresponding batch-normalization mean and variance parameters of each class of meta-test set as second parameters; decoupling each class of meta-test set from its corresponding second parameters, and coupling each meta-test set with the batch-normalization mean and variance parameters of the corresponding meta-training set;
for each class of meta-test set, obtaining the loss corresponding to that meta-test set through the pre-constructed loss function based on hard sample pairs, by combining each face image sample, the learnable model parameters θ and the batch-normalization mean and variance parameters coupled to the meta-test set, as the second loss;
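The decoupling/coupling of BN statistics can be sketched in a few lines; this is a toy numpy illustration of the idea (normalize meta-test features with meta-training statistics), not the patented implementation, and all names and distributions are hypothetical:

```python
import numpy as np

def bn_stats(batch):
    """Per-channel batch-normalization mean and variance of a batch."""
    return batch.mean(axis=0), batch.var(axis=0)

def normalize_with(batch, stats, eps=1e-5):
    """Normalize a batch with externally supplied BN statistics."""
    mean, var = stats
    return (batch - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
# Toy feature batches from two different domains (the location shift
# stands in for the domain gap between meta-training and meta-test).
meta_train_batch = rng.normal(loc=2.0, size=(16, 4))
meta_test_batch = rng.normal(loc=-1.0, size=(16, 4))

# "First parameters": BN statistics computed on the meta-training set.
bn_mtr = bn_stats(meta_train_batch)

# Decouple the meta-test set from its own statistics (they are simply
# not used) and couple it with the meta-training statistics instead.
coupled = normalize_with(meta_test_batch, bn_mtr)
# Because the domains differ, the coupled features are far from
# zero-mean; the second loss is computed under exactly this mismatch,
# forcing the model to remain accurate despite the statistic shift.
```

This design choice exposes the model to mismatched normalization statistics during meta-testing, which is what makes the learned features robust to unseen target domains.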
and step A60, cyclically executing steps A10-A50 until a trained face recognition model is obtained.
7. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to implement the meta-learning based domain adaptive face recognition method of any one of claims 1-5.
8. A processing device, comprising a processor and a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor for implementing a meta-learning based domain-adaptive face recognition method as claimed in any one of claims 1-5.
CN202011517834.1A 2020-12-21 2020-12-21 Domain self-adaptive face recognition method, system and device based on meta learning Active CN112541458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011517834.1A CN112541458B (en) 2020-12-21 2020-12-21 Domain self-adaptive face recognition method, system and device based on meta learning


Publications (2)

Publication Number Publication Date
CN112541458A CN112541458A (en) 2021-03-23
CN112541458B true CN112541458B (en) 2023-08-11

Family

ID=75019223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011517834.1A Active CN112541458B (en) 2020-12-21 2020-12-21 Domain self-adaptive face recognition method, system and device based on meta learning

Country Status (1)

Country Link
CN (1) CN112541458B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128419B (en) * 2021-04-23 2023-12-05 京东鲲鹏(江苏)科技有限公司 Obstacle recognition method and device, electronic equipment and storage medium
CN113052144B (en) * 2021-04-30 2023-02-28 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113128619B (en) * 2021-05-10 2022-05-31 北京瑞莱智慧科技有限公司 Method for training detection model of counterfeit sample, method for identifying counterfeit sample, apparatus, medium, and device
CN113095304B (en) * 2021-06-08 2021-09-03 成都考拉悠然科技有限公司 Method for weakening influence of resampling on pedestrian re-identification
CN113657146B (en) * 2021-06-30 2024-02-06 北京惠朗时代科技有限公司 Student non-concentration learning low-consumption recognition method and device based on single image
CN113610105A (en) * 2021-07-01 2021-11-05 南京信息工程大学 Unsupervised domain adaptive image classification method based on dynamic weighted learning and meta-learning
CN113591782A (en) * 2021-08-12 2021-11-02 北京惠朗时代科技有限公司 Training-based face recognition intelligent safety box application method and system
CN114648803B (en) * 2022-05-20 2022-09-06 中国科学技术大学 Method, system, equipment and storage medium for recognizing facial expressions in natural scene
CN115240144B (en) * 2022-09-21 2022-12-27 青岛宏大纺织机械有限责任公司 Method and system for intelligently identifying flaws in spinning twisting

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214360A (en) * 2018-10-15 2019-01-15 北京亮亮视野科技有限公司 A kind of construction method of the human face recognition model based on ParaSoftMax loss function and application
CN111444802A (en) * 2020-03-18 2020-07-24 重庆邮电大学 Face recognition method and device and intelligent terminal
CN111597907A (en) * 2020-04-21 2020-08-28 广东工业大学 Anti-noise meta-learning-based face recognition method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Learning Meta Face Recognition in Unseen Domains; Jianzhu Guo et al.; arXiv; 2020-03-25; abstract, Sections 1-3 *

Also Published As

Publication number Publication date
CN112541458A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN112541458B (en) Domain self-adaptive face recognition method, system and device based on meta learning
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
CN109961089B (en) Small sample and zero sample image classification method based on metric learning and meta learning
Springenberg et al. Improving deep neural networks with probabilistic maxout units
CN108960086B (en) Multi-pose human body target tracking method based on generation of confrontation network positive sample enhancement
WO2019015246A1 (en) Image feature acquisition
CN111444951B (en) Sample recognition model generation method, device, computer equipment and storage medium
US9047566B2 (en) Quadratic regularization for neural network with skip-layer connections
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
CN110647938B (en) Image processing method and related device
CN113449821B (en) Intelligent training method, device, equipment and medium fusing semantics and image characteristics
CN113673482B (en) Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution
CN113095370A (en) Image recognition method and device, electronic equipment and storage medium
CN113283368A (en) Model training method, face attribute analysis method, device and medium
CN115170565A (en) Image fraud detection method and device based on automatic neural network architecture search
WO2023040195A1 (en) Object recognition method and apparatus, network training method and apparatus, device, medium, and product
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN113496148A (en) Multi-source data fusion method and system
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN112446428B (en) Image data processing method and device
CN114638304A (en) Training method of image recognition model, image recognition method and device
CN114886383A (en) Electroencephalogram signal emotional feature classification method based on transfer learning
CN113569081A (en) Image recognition method, device, equipment and storage medium
CN113420879A (en) Prediction method and device of multi-task learning model
CN113806338B (en) Data discrimination method and system based on data sample imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant