CN111898547B

CN111898547B - Training method, device, equipment and storage medium of face recognition model

Info

Publication number: CN111898547B
Application number: CN202010760772.0A
Authority: CN
Inventors: 张国辉; 徐玲玲; 宋晨
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2024-04-16
Anticipated expiration: 2040-07-31
Also published as: CN111898547A; WO2021139309A1

Abstract

The invention relates to the field of artificial intelligence, and provides a training method, device and equipment of a face recognition model and a storage medium, which are used for solving the problem of low recognition accuracy of the existing face recognition model. The training method of the face recognition model comprises the following steps: acquiring a plurality of training data sets, a trunk network of a preset face recognition model and a plurality of classification networks; face feature extraction is respectively carried out on a plurality of training data sets through a backbone network, so that a plurality of feature sets are obtained; classifying the plurality of feature sets through a plurality of classification networks to obtain a plurality of classified data sets; calculating a plurality of feature vector loss function values for the plurality of feature sets and a plurality of classification loss function values for the plurality of classification data sets; calculating a target loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values; and carrying out iterative updating on the backbone network according to the objective loss function value until the objective loss function value is converged, and obtaining an updated face recognition model.

Description

Training method, device, equipment and storage medium of face recognition model

Technical Field

The present invention relates to the field of neural networks for artificial intelligence, and in particular, to a training method, apparatus, device, and storage medium for a face recognition model.

Background

Face recognition is a popular field in the field of image recognition, and a neural network capable of performing face recognition, namely a face recognition model, is usually obtained through deep learning and training. For the recognition accuracy of the face recognition model, the face recognition model obtained by training the training data of the application scene is limited by the application scene of the training data, so that the recognition accuracy is lower, and the recognition accuracy of the face recognition model is improved by adopting a mode of optimizing the universality of the face recognition model.

At present, for the optimization of the universality of the face recognition model, a fine adjustment mode or a mode of mixing a plurality of training sets is generally adopted, but the feature retention of the original training set is little due to the fine adjustment mode after the model is trained, so that the generalization effect of the final face recognition model is poor; the mode of mixing a plurality of training sets has the defect that overlapping data are difficult to clean, and the overlapping data are introduced into training as dirty data to influence the training effect of the model, so that the recognition accuracy of the traditional face recognition model is low.

Disclosure of Invention

The invention mainly aims to solve the problem of low recognition accuracy of the existing face recognition model.

The first aspect of the present invention provides a training method for a face recognition model, including:

acquiring a plurality of preprocessed training data sets, wherein the plurality of training data sets are face training data sets corresponding to a plurality of application scenes respectively;

extracting facial features from the training data sets through a trunk network in a preset face recognition model to obtain a plurality of feature sets, wherein the face recognition model comprises the trunk network and a plurality of classification networks;

classifying the feature sets through the classification networks to obtain a plurality of classified data sets, wherein one classification network corresponds to one feature set;

calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values;

calculating a target loss function value of the face recognition model according to the feature vector loss function values and the classification loss function values;

and carrying out iterative updating on the backbone network according to the objective loss function value until the objective loss function value is converged, and obtaining an updated face recognition model.

Optionally, in a first implementation manner of the first aspect of the present invention, the extracting facial features of the plurality of training data sets through a backbone network in a preset face recognition model to obtain a plurality of feature sets includes:

acquiring the number of data sets of the plurality of training data sets, and calculating the average data volume of each training data set according to the number of data sets;

taking the training data corresponding to the average data volume as batch processing data to obtain target batch processing data corresponding to each training data set;

and sequentially performing face image region detection, face key point detection and face feature vector extraction on the target batch processing data through a backbone network in a preset face recognition model to obtain a plurality of feature sets.

Optionally, in a second implementation manner of the first aspect of the present invention, the calculating the objective loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values includes:

calculating the average value of the feature vector loss function values according to the number of the data sets to obtain an average feature vector loss function value;

Calculating the average value of the multiple classification loss function values according to the number of the data sets to obtain an average classification loss function value;

and calculating the sum of the average feature vector loss function value and the average classification loss function value to obtain the target loss function value of the face recognition model.

Optionally, in a third implementation manner of the first aspect of the present invention, the calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the class loss function value of each class data set to obtain a plurality of class loss function values includes:

calculating a first feature center vector corresponding to each feature set and second feature center vectors corresponding to the feature sets;

calculating a distance value between a first feature center vector and the second feature center vector corresponding to each feature set, and determining the distance value as a feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values;

acquiring preset labels corresponding to the training data in each training data set, and calculating the classification loss function value of each classification data set according to the preset labels and the preset cross entropy loss function to obtain a plurality of classification loss function values.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the calculating a class loss function value of each class data set according to the preset label and the preset cross entropy loss function, to obtain a plurality of class loss function values includes:

counting the number of the labels of the preset labels in each classified data set, and acquiring the feature vector of the classified data corresponding to the preset labels in each classified data set;

according to a preset cross entropy loss function, the number of labels and the feature vector, calculating a classification loss function value of each classification data set to obtain a plurality of classification loss function values, wherein the cross entropy loss function is as follows:

wherein y represents the y-th training dataset and c _y A classification data set corresponding to the y-th training data set, wherein n is as follows _y The label number is the label number _i Preset labels for the ith category, said v _i Is the feature vector.

Optionally, in a fifth implementation manner of the first aspect of the present invention, the iteratively updating the backbone network according to the objective loss function value until the objective loss function value converges, to obtain an updated face recognition model includes:

Judging whether the target loss function value is converged or not, if the target loss function value is not converged, updating the target batch processing data to obtain updated target batch processing data, and updating the network structure of the backbone network to obtain an updated backbone network;

sequentially extracting and classifying facial features of the updated target batch data through the updated backbone network and the plurality of classification networks to obtain a plurality of target classification data sets;

calculating an updated objective loss function value according to the plurality of objective classification data sets, and judging whether the updated objective loss function value is converged or not;

and if the updated objective loss function value is not converged, iteratively updating the updated backbone network according to the updated objective loss function value until the updated objective loss function value is converged, so as to obtain a final updated face recognition model.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the acquiring a plurality of preprocessed training data sets, where the plurality of training data sets are face training data sets corresponding to a plurality of application scenarios respectively includes:

Acquiring initial training data sets respectively corresponding to a plurality of application scenes, wherein the initial training data sets comprise open source data and private data;

and carrying out data cleaning and label marking on each initial training data set in sequence to obtain a plurality of preprocessed training data sets.

The second aspect of the present invention provides a training device for a face recognition model, including:

the acquisition module is used for acquiring a plurality of preprocessed training data sets, wherein the plurality of training data sets are face training data sets corresponding to a plurality of application scenes respectively;

the feature extraction module is used for extracting the facial features of the training data sets through a main network in a preset face recognition model to obtain a plurality of feature sets, and the face recognition model comprises the main network and a plurality of classification networks;

the classification module is used for classifying the feature sets through the classification networks to obtain a plurality of classification data sets, wherein one classification network corresponds to one feature set;

the first calculation module is used for calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values;

The second calculation module is used for calculating the target loss function value of the face recognition model according to the feature vector loss function values and the classification loss function values;

and the iteration updating module is used for carrying out iteration updating on the backbone network according to the objective loss function value until the objective loss function value converges to obtain an updated face recognition model.

Optionally, in a first implementation manner of the second aspect of the present invention, the feature extraction module is specifically configured to:

Optionally, in a second implementation manner of the second aspect of the present invention, the second computing module is specifically configured to:

Optionally, in a third implementation manner of the second aspect of the present invention, the first computing module includes:

a first calculating unit, configured to calculate a first feature center vector corresponding to each feature set, and second feature center vectors corresponding to the feature sets;

the second computing unit is used for computing a distance value between the first characteristic center vector and the second characteristic center vector corresponding to each characteristic set, determining the distance value as a characteristic vector loss function value of each characteristic set and obtaining a plurality of characteristic vector loss function values;

and the third calculation unit is used for acquiring preset labels corresponding to the training data in each training data set, and calculating the classification loss function value of each classification data set according to the preset labels and the preset cross entropy loss function to obtain a plurality of classification loss function values.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the third computing unit is specifically configured to:

Optionally, in a fifth implementation manner of the second aspect of the present invention, the iterative updating module is specifically configured to:

Optionally, in a sixth implementation manner of the second aspect of the present invention, the acquiring module includes:

the acquisition unit is used for acquiring initial training data sets respectively corresponding to a plurality of application scenes, wherein the initial training data sets comprise open source data and private data;

the preprocessing unit is used for sequentially carrying out data cleaning and label marking on each initial training data set to obtain a plurality of preprocessed training data sets.

A third aspect of the present invention provides a training apparatus for a face recognition model, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the training device of the face recognition model to perform the training method of the face recognition model described above.

A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein that, when run on a computer, cause the computer to perform the above-described training method of a face recognition model.

In the technical scheme provided by the invention, a plurality of preprocessed training data sets, a trunk network of a preset face recognition model and a plurality of classification networks are obtained, wherein the plurality of training data sets are face training data sets corresponding to a plurality of application scenes respectively; face feature extraction is respectively carried out on a plurality of training data sets through a backbone network, so that a plurality of feature sets are obtained; classifying the plurality of feature sets through a plurality of classification networks to obtain a plurality of classified data sets, wherein one classification network corresponds to one feature set; calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values; calculating a target loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values; and carrying out iterative updating on the backbone network according to the objective loss function value until the objective loss function value is converged, and obtaining an updated face recognition model. According to the invention, the face feature extraction and classification are respectively carried out on the plurality of training data sets, so that the situation that the model training is adversely affected by overlapping data and the overlapping data serving as dirty data is avoided, and the main network of the face recognition model is updated according to the target loss function values obtained by the plurality of feature vector loss function values and the plurality of classification loss function values, so that the face recognition model has better universality, and the recognition precision of the conventional face recognition model is improved.

Drawings

FIG. 1 is a schematic diagram of an embodiment of a training method of a face recognition model according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of another embodiment of a training method of a face recognition model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of an embodiment of a training apparatus for face recognition models according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another embodiment of a training apparatus for face recognition models according to an embodiment of the present invention;

fig. 5 is a schematic diagram of an embodiment of a training device for a face recognition model according to an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a training method, device and equipment for a face recognition model and a storage medium, which solve the problem of low recognition accuracy of the existing face recognition model.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and one embodiment of a training method for a face recognition model in an embodiment of the present invention includes:

101. and acquiring a plurality of preprocessed training data sets, wherein the plurality of training data sets are face training data sets corresponding to a plurality of application scenes respectively.

It can be understood that the execution subject of the present invention may be a training device of a face recognition model, and may also be a terminal or a server corresponding to a logistics headquarter, which is not limited herein. The embodiment of the invention is described by taking a server corresponding to a logistics headquarter as an execution main body.

One training data set corresponds to one application scenario, for example: the person certificate identifies a scene and a natural scene. The training data sets may be face data, open source data, and private data in different dimensions, such as: face data of natural scenes, face data of asians, attendance data, personnel evidence data and competition data.

The server can extract a plurality of preprocessed training data sets from a preset database, and can also obtain face training data sets under different dimensions corresponding to a plurality of application scenes respectively from a plurality of channels, and preprocess the face training data sets to obtain a plurality of preprocessed training data sets.

102. And respectively extracting facial features from the training data sets through a trunk network in a preset face recognition model to obtain a plurality of feature sets, wherein the face recognition model comprises the trunk network and a plurality of classification networks.

The preset face recognition model comprises a main network and a plurality of classification networks, wherein the output of the main network is the input of the plurality of classification networks, and the data processed by the main network are classified through the plurality of classification networks, so that the face recognition training of the training data set is realized. The backbone network may be a single convolutional neural network or a composite framework of multiple convolutional neural networks, such as: the backbone network can be a deep residual error learning frame ResNet or a target detection network frame ET-Yolov3, and can also be a comprehensive frame of combining the deep residual error learning frame ResNet with the target detection network frame ET-Yolov 3.

The server can perform face frame recognition, frame region division, face key point detection and face feature vector extraction on each training data set through a backbone network of the face recognition model to obtain feature sets (namely a plurality of feature sets) corresponding to each training data set. The convolution network layer in the backbone network adopts a small convolution kernel, more features are reserved through the small convolution kernel, the calculated amount is reduced, and the face feature extraction efficiency is improved.

103. And classifying the plurality of feature sets through a plurality of classification networks to obtain a plurality of classified data sets, wherein one classification network corresponds to one feature set.

The server acquires the labels on the training data corresponding to each feature set, invokes a plurality of classification networks, classifies the feature sets through the classification networks and the labels, and obtains a plurality of classification data sets. Wherein a classification network classifies a feature set, for example: the plurality of classification networks are A1, B1, C1 and D1 respectively, the plurality of feature sets are A2, B2, C2 and D2 respectively, A1 classifies A2, B1 classifies B2, C1 classifies C2, and D1 classifies D2. Each classification network may use the same network structure, or may use different network structures, for example: the classification networks A1, B1, C1 and D1 are all linear classifiers, the classification networks A1, B1, C1 and D1 are convolutional neural network acceptance-v 3, linear classifiers, nearest neighbor classifiers and GoogLeNet, the network complexity is reduced through the same network structure, and the classification efficiency and the universality of the face recognition model are improved by adopting different network structures to process training data of different types.

104. And calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values.

The server calculates a first center vector and a second center vector, calculates a distance value between each first center vector and each second center vector, and uses the distance value as a feature vector loss function value corresponding to each feature set, thereby obtaining a plurality of feature vector loss functions, wherein the first center vector is a center vector corresponding to each feature set, or can be a center vector corresponding to each training data in each feature set, and the second center vector can be a second center vector corresponding to all feature sets, or can be a center vector corresponding to all training data in each feature set.

The server can calculate the average value of the sum value according to the number of the training data by acquiring the number of the training data corresponding to each feature set and calculating the sum value of the first center vectors corresponding to all the training data, wherein the average value is the second center vector corresponding to each feature set, and the server can also calculate the second center vector through a preset center vector formula.

The server calculates the classification loss function value of each classification data set through a preset cross entropy loss function, so that a plurality of classification loss function values are obtained, the cross entropy loss function can be a multi-classification cross entropy loss function, and the derivation is simpler through the multi-classification cross entropy loss function, so that the convergence is faster, and the corresponding weight matrix is updated faster.

105. And calculating the target loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values.

After obtaining the plurality of feature vector loss function values and the plurality of classification loss function values, the server obtains the number of data sets of the plurality of training data sets, calculates an average feature vector loss function value of the plurality of feature vector loss function values and an average classification loss function value of the plurality of classification loss function values according to the number of data sets, and takes the sum value of the average feature vector loss function value and the average classification loss function value as a target loss function value of the face recognition model or takes the weighted sum value of the average feature vector loss function value and the average classification loss function value as a target loss function value of the face recognition model. When each classification network calculates the classification loss function value, the corresponding classification network can be reversely updated according to the classification loss function value.

106. And carrying out iterative updating on the backbone network according to the objective loss function value until the objective loss function value is converged, and obtaining an updated face recognition model.

And the server carries out iterative updating on the network structure and/or the weight value of the main network according to the objective loss function value and the preset iteration times until the objective loss function value converges (namely, the training precision of the face recognition model meets the preset condition) to obtain an updated face recognition model. The network structure of the backbone network can be updated by adding or deleting network layers to the backbone network, or by adding other network frames, or by modifying the convolution kernel size, step size, etc. When the backbone network is iteratively updated, the server can also optimize the face recognition model by combining an optimization algorithm.

In the embodiment of the invention, the face characteristics of the plurality of training data sets are respectively extracted and classified, so that the situation that the model training is adversely affected by overlapping data and the overlapping data serving as dirty data is avoided, and the main network of the face recognition model is updated according to the objective loss function values obtained by the plurality of characteristic vector loss function values and the plurality of classification loss function values, so that the face recognition model has better universality, and the recognition precision of the conventional face recognition model is improved.

Referring to fig. 2, another embodiment of the training method of the face recognition model in the embodiment of the present invention includes:

201. initial training data sets respectively corresponding to a plurality of application scenes are obtained, wherein the initial training data sets comprise open source data and private data.

The server crawls the initial training data sets (open source data) corresponding to the plurality of different application scenes from the network platform by extracting the initial training data sets (open source data) corresponding to the plurality of different application scenes in different dimensionalities from the open source database, and extracts the initial training data sets (private data) corresponding to the plurality of different application scenes from the alliance chain or the private database.

202. And carrying out data cleaning and label marking on each initial training data set in sequence to obtain a plurality of preprocessed training data sets.

And the server sequentially carries out missing value detection, missing value filling and missing value cleaning on each initial training data set according to a preset missing value proportion to obtain initial training data sets after missing value processing, merges and de-duplicates the initial training data sets after missing value processing to obtain initial training data sets after merging and de-duplication, judges whether training data which does not accord with a preset validity judgment rule exist in the initial training data sets after merging and de-duplication, deletes the corresponding training data if the training data do not exist, determines the initial training data sets after merging and de-duplication as candidate training data sets if the training data sets do not exist, and labels the candidate training data sets to obtain a plurality of training data sets after preprocessing.

The content of the label can comprise at least one of classification label, label frame label, region label and description point label, and the classification label comprises the following steps: age-adult, sex-female, race-yellow race, hair-long hair, facial expression-smile, facial wear parts-classification labeling of glasses, box labels such as: the frame positions of the faces in the images are marked, and the region marks are as follows: the positions of the areas of the faces in the images are marked, and the description points are marked by the following steps: and marking key points of the human face.

203. And respectively extracting facial features from the training data sets through a trunk network in a preset face recognition model to obtain a plurality of feature sets, wherein the face recognition model comprises the trunk network and a plurality of classification networks.

Specifically, the server acquires the number of data sets of a plurality of training data sets, and calculates the average data volume of each training data set according to the number of data sets; taking training data corresponding to the average data volume as batch processing data to obtain target batch processing data corresponding to each training data set; and sequentially performing face image region detection, face key point detection and face feature vector extraction on the target batch processing data through a backbone network in a preset face recognition model to obtain a plurality of feature sets.

Taking one of a plurality of training data sets as an example, for example: the number of data sets of a plurality of training data sets obtained by the server is 5, the training data set E has 800 training data, the average data size of the training data set E is 160 training data, then the target batch processing data is 160 training data, face image area detection is carried out on the 160 training data (namely the target batch processing data) through a main network in a preset face recognition model, a face area is obtained, face key point detection is carried out on the face area, face key point information is obtained, face feature vector extraction is carried out on the face key point information, and a plurality of feature sets are obtained. And if the training data quantity of different training data sets is inconsistent, randomly and circularly processing the target batch processing data obtained by different training data sets when the target batch processing data with the small training data quantity is processed firstly in the data processing process until the target batch processing data with the largest training data quantity is processed.

204. And classifying the plurality of feature sets through a plurality of classification networks to obtain a plurality of classified data sets, wherein one classification network corresponds to one feature set.

The server acquires the labels on the training data corresponding to each feature set, invokes a plurality of classification networks, classifies the feature sets through the classification networks and the labels, and obtains a plurality of classification data sets. Wherein a classification network classifies a feature set. Each classification network can adopt the same network structure or different network structures, network complexity is reduced through the same network structure, and different types of training data are processed through the different network structures, so that the classification efficiency and the universality of the face recognition model are improved.

205. And calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values.

Specifically, the server calculates a first feature center vector corresponding to each feature set and second feature center vectors corresponding to a plurality of feature sets; calculating a distance value between a first feature center vector and a second feature center vector corresponding to each feature set, determining the distance value as a feature vector loss function value of each feature set, and obtaining a plurality of feature vector loss function values; acquiring preset labels corresponding to the training data in each training data set, and calculating the classification loss function value of each classification data set according to the preset labels and the preset cross entropy loss function to obtain a plurality of classification loss function values.

The server acquires a first feature vector of each feature set, the number of first training data corresponding to each feature set, and the number of first data of target batch processing data, calculates a first feature center vector corresponding to each feature set through a preset first update center vector formula, and calculates a first update center vector formula of the first feature center vector as follows:

p is used to indicate the p-th feature set, vc _p Vc is the current first feature center vector _p-1 For the first feature center vector of the last iteration, vn _p For the first data number of the current iteration, n _p For the first training data number, v _i Is the current first feature vector. Wherein, the first characteristic center vector vc before the first iteration _p Is 0.

The server acquires second feature vectors of all feature sets, second training data numbers corresponding to all feature sets, second data numbers of target batch processing data corresponding to all feature sets, and a second updating center vector formula of the second feature center vector is calculated through a preset second updating center vector formula as follows:

q is used to indicate the q-th iteration, v _q V is the current second feature center vector _q-1 Vk is the second feature center vector of the last iteration _q For the second data number of the current iteration, _nq for the second training data number, v _j And the second feature vector of all the current feature sets. Wherein the second feature center vector v before the first iteration _q Is 0.

The server acquires the dimension of the first feature vector of each feature set, calculates a feature vector loss function value according to the dimension of the first feature vector of each feature set, the first feature center vector and the second feature center vector, and the calculation formula of the feature vector loss function value is as follows:

p is used to indicate the p-th feature set, m is the dimension of the first feature vector of each feature set, vc _p As the first feature center vector, v _q Is the second feature center vector.

Specifically, the server counts the number of the preset labels in each classified data set, and obtains the feature vector of the classified data corresponding to the preset labels in each classified data set; according to a preset cross entropy loss function, the number of labels and feature vectors, calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values, wherein the cross entropy loss function is as follows:

wherein y represents the y-th training dataset, c _y For the classification dataset corresponding to the y-th training dataset, n _y Label number, label _i Preset label for ith class, v _i Is a feature vector.

The server classifies the features according to the preset labels on each training data to obtain a plurality of classified data sets, so that the label number of the preset labels in each classified data set, the preset labels of each class and the feature vectors generated by the classified data corresponding to the preset labels of each class can be obtained, and the classified loss function value of each classified data set is obtained by combining the preset cross entropy loss function with the obtained label number and the feature vectors to calculate.

206. And calculating the target loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values.

Specifically, the server calculates the average value of a plurality of eigenvector loss function values according to the number of the data sets to obtain an average eigenvector loss function value; calculating the average value of the multiple classification loss function values according to the number of the data sets to obtain an average classification loss function value; and calculating the sum of the average feature vector loss function value and the average classification loss function value to obtain the target loss function value of the face recognition model.

For example: the number of data sets is 6, the plurality of eigenvector loss function values are L1, L2, L3, L4 and L5, the plurality of class loss function values are K1, K2, K3 and K4, respectively, then the average eigenvector loss function value is (l1+l2+l3+l4+l5)/6=l, the average class loss function value is (k1+k2+k3+k4)/6=k, and the objective loss function value is lc=l+k.

207. And carrying out iterative updating on the backbone network according to the objective loss function value until the objective loss function value is converged, and obtaining an updated face recognition model.

Specifically, whether the target loss function value is converged or not is judged, if the target loss function value is not converged, the target batch processing data is updated to obtain updated target batch processing data, and the network structure of the backbone network is updated to obtain an updated backbone network; sequentially extracting and classifying face features of the updated target batch data through the updated backbone network and the plurality of classification networks to obtain a plurality of target classification data sets; calculating an updated objective loss function value according to the plurality of objective classification data sets, and judging whether the updated objective loss function value is converged or not; and if the updated objective loss function value is not converged, carrying out iterative updating on the updated backbone network according to the updated objective loss function value until the updated objective loss function value is converged, and obtaining a final updated face recognition model.

And when the server judges that the target loss function value is converged, the current face recognition model is used as a final face recognition model. And when the server judges that the updated objective loss function value is converged, the face recognition model updated currently is used as the face recognition model updated finally. The operation method for extracting and classifying the face features of the updated target batch data to obtain multiple target classification data sets is similar to the operation methods of steps 102, 103, 203 and 204, and the operation method for calculating the updated target loss function value according to the multiple target classification data sets is similar to the operation methods of steps 104, 105, 205 and 206, which are not described herein. In each iteration, the data quantity of each updated target batch data is different and dynamically changed and is equal to the sum value between the target batch data in the previous iteration and the current target batch data.

In the embodiment of the invention, the cleaning data and the label labeling are respectively carried out on a plurality of initial training data sets, and the face feature extraction and the classification are carried out on a plurality of training data sets, so that the different data sets do not need to be combined and cleaned, only need to be cleaned respectively, the time for cleaning the data is greatly saved, the situation that the model training is adversely affected by overlapping data and the overlapping data serving as dirty data is effectively avoided, and the main network of the face recognition model is updated according to the target loss function values obtained by a plurality of feature vector loss function values and a plurality of classification loss function values, so that the face recognition model has better universality, and the recognition precision of the existing face recognition model is improved.

The above describes the training method of the face recognition model in the embodiment of the present invention, and the following describes the training device of the face recognition model in the embodiment of the present invention, referring to fig. 3, one embodiment of the training device of the face recognition model in the embodiment of the present invention includes:

the acquiring module 301 is configured to acquire a plurality of preprocessed training data sets, where the plurality of training data sets are face training data sets corresponding to a plurality of application scenarios respectively;

the feature extraction module 302 is configured to extract facial features of a plurality of training data sets through a backbone network in a preset face recognition model, so as to obtain a plurality of feature sets, where the face recognition model includes a backbone network and a plurality of classification networks;

a classification module 303, configured to classify a plurality of feature sets through a plurality of classification networks, to obtain a plurality of classification data sets, where one classification network corresponds to one feature set;

a first calculation module 304, configured to calculate a feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculate a classification loss function value of each classification data set to obtain a plurality of classification loss function values;

a second calculation module 305, configured to calculate a target loss function value of the face recognition model according to the plurality of feature vector loss function values and the plurality of classification loss function values;

And the iteration updating module 306 is configured to iteratively update the backbone network according to the objective loss function value until the objective loss function value converges, thereby obtaining an updated face recognition model.

The function implementation of each module in the training device of the face recognition model corresponds to each step in the training method embodiment of the face recognition model, and the function and the implementation process of the module are not described in detail herein.

Referring to fig. 4, another embodiment of the training apparatus for face recognition model according to the embodiment of the present invention includes:

The acquiring module 301 specifically includes:

an obtaining unit 3011, configured to obtain initial training data sets respectively corresponding to multiple application scenarios, where the initial training data sets include open source data and private data;

a preprocessing unit 3012, configured to sequentially perform data cleaning and label labeling on each initial training data set, so as to obtain a plurality of preprocessed training data sets;

Optionally, the feature extraction module 302 may be further specifically configured to:

acquiring the number of data sets of a plurality of training data sets, and calculating the average data volume of each training data set according to the number of the data sets;

taking training data corresponding to the average data volume as batch processing data to obtain target batch processing data corresponding to each training data set;

Optionally, the second computing module 305 may be further specifically configured to:

Optionally, the first computing module 304 includes:

a first calculating unit 3041, configured to calculate a first feature center vector corresponding to each feature set, and a second feature center vector corresponding to a plurality of feature sets;

a second calculating unit 3042, configured to calculate a distance value between the first feature center vector and the second feature center vector corresponding to each feature set, and determine the distance value as a feature vector loss function value of each feature set, to obtain a plurality of feature vector loss function values;

and a third calculation unit 3043, configured to obtain preset labels corresponding to the training data in each training data set, and calculate a class loss function value of each class data set according to the preset labels and a preset cross entropy loss function, so as to obtain a plurality of class loss function values.

Optionally, the third computing unit 3043 may be further specifically configured to:

counting the number of the preset labels in each classified data set, and acquiring the feature vectors of the classified data corresponding to the preset labels in each classified data set;

according to a preset cross entropy loss function, the number of labels and feature vectors, calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values, wherein the cross entropy loss function is as follows:

Optionally, the iterative updating module 306 may be further specifically configured to:

sequentially extracting and classifying face features of the updated target batch data through the updated backbone network and the plurality of classification networks to obtain a plurality of target classification data sets;

and if the updated objective loss function value is not converged, carrying out iterative updating on the updated backbone network according to the updated objective loss function value until the updated objective loss function value is converged, and obtaining a final updated face recognition model.

The function implementation of each module and each unit in the training device of the face recognition model corresponds to each step in the training method embodiment of the face recognition model, and the function and the implementation process of the module and the unit are not described in detail herein.

The training device for the face recognition model in the embodiment of the present invention is described in detail above in fig. 3 and fig. 4 from the point of view of the modularized functional entity, and the training device for the face recognition model in the embodiment of the present invention is described in detail below from the point of view of hardware processing.

Fig. 5 is a schematic structural diagram of a training device for a face recognition model according to an embodiment of the present invention, where the training device 500 for a face recognition model may have relatively large differences due to different configurations or performances, and may include one or more processors (central processing units, CPU) 510 (e.g., one or more processors) and a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. Wherein memory 520 and storage medium 530 may be transitory or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations in the training device 500 for the face recognition model. Still further, the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the training device 500 of the face recognition model.

The training device 500 of the face recognition model may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Serve, mac OS X, unix, linux, freeBSD, and the like. It will be appreciated by those skilled in the art that the training device structure of the face recognition model shown in fig. 5 does not constitute a limitation of the training device of the face recognition model, and may include more or fewer components than shown, or may combine certain components, or may be arranged in different components.

The invention also provides a training device of the face recognition model, which comprises a memory and a processor, wherein the memory stores instructions which, when executed by the processor, cause the processor to execute the steps of the training method of the face recognition model in the above embodiments.

The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, or may be a volatile computer readable storage medium, where instructions are stored in the computer readable storage medium, which when executed on a computer, cause the computer to perform the steps of the training method of the face recognition model.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The training method of the face recognition model is characterized by comprising the following steps of:

iteratively updating the backbone network according to the objective loss function value until the objective loss function value converges to obtain an updated face recognition model;

the step of extracting the face features of the training data sets through a main network in a preset face recognition model to obtain feature sets comprises the following steps:

sequentially performing face image region detection, face key point detection and face feature vector extraction on the target batch processing data through a main network in a preset face recognition model to obtain a plurality of feature sets;

The calculating of the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss function value of each classification data set to obtain a plurality of classification loss function values, including:

acquiring preset labels corresponding to the training data in each training data set, and calculating the classification loss function value of each classification data set according to the preset labels and a preset cross entropy loss function to obtain a plurality of classification loss function values;

calculating the classification loss function value of each classification data set according to the preset label and the preset cross entropy loss function to obtain a plurality of classification loss function values, wherein the method comprises the following steps:

，

wherein y represents the y-th training data set, theFor the classification dataset corresponding to the y-th training dataset, said +.>Is saidNumber of tags, said->Preset tag for the ith class, said +.>Is the feature vector.

2. The method according to claim 1, wherein calculating the objective loss function value of the face recognition model from the plurality of feature vector loss function values and the plurality of classification loss function values comprises:

3. The training method of a face recognition model according to claim 1 or claim 2, wherein the iteratively updating the backbone network according to the objective loss function value until the objective loss function value converges, to obtain an updated face recognition model, includes:

4. The method for training a face recognition model according to claim 1, wherein the acquiring the preprocessed plurality of training data sets, the plurality of training data sets being face training data sets respectively corresponding to a plurality of application scenarios, includes:

5. Training device for a face recognition model, characterized in that the training device for a face recognition model performs the training method for a face recognition model according to any one of claims 1 to 4, the training device for a face recognition model comprising:

6. A training device for a face recognition model, characterized in that the training device for a face recognition model comprises: a memory and at least one processor, the memory having instructions stored therein;

the at least one processor invokes the instructions in the memory to cause the training device of the face recognition model to perform the training method of the face recognition model of any one of claims 1-4.

7. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement a method of training a face recognition model according to any one of claims 1-4.