WO2021139309A1

WO2021139309A1 - Method, apparatus and device for training facial recognition model, and storage medium

Info

Publication number: WO2021139309A1
Application number: PCT/CN2020/122376
Authority: WO
Inventors: 张国辉; 徐玲玲; 宋晨
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-07-31
Filing date: 2020-10-21
Publication date: 2021-07-15
Also published as: CN111898547A; CN111898547B

Abstract

The present application relates to the field of artificial intelligence. Provided are a method, apparatus and device for training a facial recognition model, and a storage medium, which are used for solving the problem of the relatively low recognition accuracy of an existing facial recognition model. The method for training a facial recognition model comprises: acquiring multiple training data sets, and a backbone network and multiple classification networks of a preset facial recognition model; respectively performing facial feature extraction on the multiple training data sets by means of the backbone network to obtain multiple feature sets; classifying the multiple feature sets by means of the multiple classification networks to obtain multiple classification data sets; calculating multiple feature vector loss function values of the multiple feature sets, and multiple classification loss function values of the multiple classification data sets; according to the multiple feature vector loss function values and the multiple classification loss function values, calculating a target loss function value of the facial recognition model; and iteratively updating the backbone network according to the target loss function value until the target loss function value converges, so as to obtain an updated facial recognition model.

Description

Training method, device, equipment and storage medium of face recognition model

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office, the application number is 202010760772.0, and the invention title is "Face Recognition Model Training Method, Apparatus, Equipment and Storage Medium" on July 31, 2020, and its entire contents Incorporated in the application by reference.

Technical field

This application relates to the field of artificial intelligence neural networks, and in particular to a training method, device, equipment and storage medium of a face recognition model.

Background technique

Face recognition is a hot field in the field of image recognition. Usually, deep learning is used to train a neural network that can perform face recognition, that is, a face recognition model. For the recognition accuracy of the face recognition model, because the face recognition model obtained by training using the training data of the application scene will be limited by the application scene of the training data, its recognition accuracy will be low. Therefore, the face recognition model will be used. The universality of the recognition model is optimized to improve the recognition accuracy of the face recognition model.

At present, the optimization of the universality of face recognition models generally adopts fine-tuning finetune or mixing multiple training sets. However, the inventor realizes that due to the fine-tuning method after the model training, the original training set will be changed. Few features are retained, which leads to poor generalization effects of the final face recognition model; the method of mixing multiple training sets has overlapping data that is difficult to clean, and is introduced as dirty data into the training, which affects the training effect of the model. Therefore, the recognition accuracy of the existing face recognition model is low.

Summary of the invention

The main purpose of this application is to solve the problem of low recognition accuracy of existing face recognition models.

The first aspect of this application provides a method for training a face recognition model, including:

Acquiring preprocessed multiple training data sets, where the multiple training data sets are face training data sets corresponding to multiple application scenarios, respectively;

Performing face feature extraction on the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets, the face recognition model including the backbone network and multiple classification networks;

Classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

Calculating a target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

The backbone network is iteratively updated according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.

A second aspect of the present application provides a training device for a face recognition model. The training device for a face recognition model includes a memory, a processor, and a person stored in the memory and running on the processor. The training program of the face recognition model, when the processor executes the training program of the face recognition model, the following steps are implemented:

A third aspect of the present application provides a computer-readable storage medium that stores computer instructions, and when the computer instructions are executed on a computer, the computer executes the following steps:

The fourth aspect of the present application provides a training device for a face recognition model, including:

An obtaining module, configured to obtain a plurality of preprocessed training data sets, where the plurality of training data sets are face training data sets corresponding to a plurality of application scenarios;

The feature extraction module is used to extract the facial features of the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets. The face recognition model includes the backbone network and Multiple classification networks;

The classification module is configured to classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

The first calculation module is used to calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

The second calculation module is configured to calculate the target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

An iterative update module is used to iteratively update the backbone network according to the target loss function value until the target loss function value converges to obtain an updated face recognition model.

In the technical solution provided by this application, multiple pre-processed training data sets, as well as the backbone network and multiple classification networks of the preset face recognition model are obtained, and the multiple training data sets are people corresponding to multiple application scenarios. Face training data set; multiple training data sets are extracted through the backbone network to obtain multiple feature sets; multiple classification networks are used to classify multiple feature sets to obtain multiple classification data sets, of which, one The classification network corresponds to a feature set; calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values; according to Multiple feature vector loss function values and multiple classification loss function values are used to calculate the target loss function value of the face recognition model; the backbone network is iteratively updated according to the target loss function value until the target loss function value converges, and the updated person is obtained Face recognition model. In this application, by performing face feature extraction and classification on multiple training data sets, the existence of overlapping data and overlapping data as dirty data have an adverse effect on model training is avoided, and the loss function value and the value of the multiple feature vectors are avoided in this application. The target loss function value obtained by multiple classification loss function values is used to update the backbone network of the face recognition model, which can make the face recognition model have better universality, thereby improving the recognition accuracy of the existing face recognition model .

Description of the drawings

FIG. 1 is a schematic diagram of an embodiment of a method for training a face recognition model in an embodiment of the application;

FIG. 2 is a schematic diagram of another embodiment of a method for training a face recognition model in an embodiment of the application;

3 is a schematic diagram of an embodiment of a training device for a face recognition model in an embodiment of the application;

FIG. 4 is a schematic diagram of another embodiment of a training device for a face recognition model in an embodiment of the application;

Fig. 5 is a schematic diagram of an embodiment of a training device for a face recognition model in an embodiment of the application.

Detailed ways

The embodiments of the present application provide a method, device, equipment, and storage medium for training a face recognition model, which solve the problem of low recognition accuracy of the existing face recognition model.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects, without having to use To describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances so that the embodiments described herein can be implemented in a sequence other than the content illustrated or described herein. In addition, the terms "including" or "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those clearly listed. Steps or units, but may include other steps or units that are not clearly listed or are inherent to these processes, methods, products, or equipment.

For ease of understanding, the following describes the specific process of the embodiment of the present application. Please refer to FIG. 1. An embodiment of the training method of the face recognition model in the embodiment of the present application includes:

101. Obtain preprocessed multiple training data sets, where the multiple training data sets are face training data sets corresponding to multiple application scenarios, respectively.

It is understandable that the execution subject of this application may be a training device for a face recognition model, or a terminal or server corresponding to the logistics headquarters, which is not specifically limited here. The embodiment of the present application takes the server corresponding to the logistics headquarters as the execution subject as an example for description.

A training data set corresponds to an application scenario, such as: identification scenes and natural scenes. The training data set can be face data, open source data and private data in different dimensions, such as: face data of natural scenes, face data of Asians, attendance data, personal identification data, and competition data.

The server can extract multiple pre-processed training data sets from a preset database, and can also obtain face training data sets in different dimensions corresponding to multiple application scenarios from multiple channels, and perform face training data sets on the face training data sets. Preprocessing to obtain multiple training data sets after preprocessing.

102. Perform face feature extraction on multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets. The face recognition model includes a backbone network and multiple classification networks.

The preset face recognition model includes a backbone network and multiple classification networks. The output of the backbone network is the input of multiple classification networks. The data processed by the backbone network is classified through multiple classification networks to realize the training data set. Face recognition training. The backbone network can be a single convolutional neural network or a comprehensive framework of multiple convolutional neural networks. For example, the backbone network can be a deep residual learning framework ResNet or a target detection network framework ET-YOLOv3, or deep residual learning The framework ResNet combines the comprehensive framework of the target detection network framework ET-YOLOv3.

The server can perform face frame recognition, frame area division, face key point detection, and face feature vector extraction for each training data set through the backbone network of the face recognition model to obtain the corresponding features of each training data set Set (ie multiple feature sets). The convolutional network layer in the backbone network uses a small convolution kernel, which retains more features through the small convolution kernel, reduces the amount of calculation, and improves the efficiency of facial feature extraction.

103. Classify multiple feature sets through multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set.

The server obtains the label on the training data corresponding to each feature set, calls multiple classification networks, and classifies the multiple feature sets through the classification network and the labels to obtain multiple classification data sets. Among them, a classification network classifies a feature set. For example, multiple classification networks are A1, B1, C1, and D1, and multiple feature sets are A2, B2, C2, and D2. A1 classifies A2, and B1 classifies A2. B2 classifies, C1 classifies C2, and D1 classifies D2. Each classification network can adopt the same network structure or different network structure. For example, the classification network is A1, B1, C1, and D1 are linear classifiers, and the classification network is A1, B1, C1, and D1 as volumes. Product neural network Inception-v3, linear classifier, nearest neighbor classifier and Google network GoogLeNet, through the same network structure, reduce the complexity of the network, and use different network structures to process different types of training data, which is beneficial Improve the efficiency of classification and the universality of face recognition models.

104. Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values.

The server calculates the first center vector and the second center vector, calculates the distance value between each first center vector and the second center vector, and uses the distance value as the feature vector loss function value corresponding to each feature set, thereby obtaining more A feature vector loss function, where the first center vector is the center vector corresponding to each feature set, or the center vector corresponding to each training data in each feature set, and the second center vector can be the first center vector corresponding to all feature sets. The two center vector can also be the center vector corresponding to all the training data in each feature set.

The server can obtain the number of training data corresponding to each feature set, and calculate the sum value of the first center vector corresponding to all training data, and calculate the average value of the sum value according to the number of training data. The average value is corresponding to each feature set. For the second center vector, the server can also calculate the second center vector through a preset center vector formula.

The server calculates the classification loss function value of each classification data set through the preset cross-entropy loss function, thereby obtaining multiple classification loss function values. The cross-entropy loss function can be a multi-class cross-entropy loss function. Function, the derivation is simpler, can make the convergence faster, and the update of the corresponding weight matrix is faster.

105. Calculate the target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values.

After the server obtains multiple eigenvector loss function values and multiple classification loss function values, it obtains the number of data sets of multiple training data sets, and calculates the average eigenvector loss function of multiple eigenvector loss function values according to the number of data sets Value, and the average classification loss function value of multiple classification loss function values, the sum of the average feature vector loss function value and the average classification loss function value is used as the target loss function value of the face recognition model, or the average feature vector loss The weighted sum of the function value and the average classification loss function value is used as the target loss function value of the face recognition model. When each classification network calculates the classification loss function value, the corresponding classification network can be updated inversely according to the classification loss function value.

106. Iteratively update the backbone network according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.

The server iteratively updates the network structure and/or weight value of the backbone network according to the target loss function value and the preset number of iterations until the target loss function value converges (that is, the training accuracy of the face recognition model meets the preset conditions), and obtains The updated face recognition model. Among them, the network structure of the backbone network can be updated by adding or deleting the network layer of the backbone network, or by adding other network frameworks to update the network structure of the backbone network, or by modifying the size of the convolution kernel of the backbone network And step size to update the network structure of the backbone network. When iteratively update the backbone network, the server can also optimize the face recognition model in combination with optimization algorithms.

In the embodiment of the application, by extracting and categorizing face features of multiple training data sets, the existence of overlapping data and overlapping data as dirty data have an adverse effect on model training. The loss function based on multiple feature vectors is avoided. Value and the target loss function value obtained by multiple classification loss function values to update the backbone network of the face recognition model, which can make the face recognition model have better universality, thereby improving the performance of the existing face recognition model Recognition accuracy.

Referring to FIG. 2, another embodiment of the training method of the face recognition model in the embodiment of the present application includes:

201. Obtain initial training data sets corresponding to multiple application scenarios, where the initial training data sets include open source data and private data.

The server extracts the initial training data sets (open source data) in different dimensions corresponding to multiple different application scenarios from the open source database, and crawls the initial training data sets corresponding to multiple different application scenarios (open source data) from the network platform. , Extract multiple initial training data sets (private data) corresponding to different application scenarios from the alliance chain or private database.

202. Perform data cleaning and label labeling on each initial training data set in sequence to obtain multiple preprocessed training data sets.

The server performs missing value detection, missing value filling, and missing value cleanup on each initial training data set according to the preset missing value ratio to obtain the initial training data set after missing value processing, and the initial training data set after missing value processing Perform merging and de-duplication to obtain the initial training data set after merging and de-duplication, determine whether there is training data in the initial training data set after merging and de-duplication that does not meet the preset legality determination rules, if there is, delete the corresponding training data If it does not exist, the initial training data set after the merging and de-duplication processing is determined as a candidate training data set, and the candidate training data set is labeled to obtain multiple preprocessed training data sets.

Among them, the content of labeling may include at least one of classification, frame labeling, area labeling, and spot labeling, such as age-adult, gender-female, race-yellow, hair-long hair, face Facial expressions-smiles, face-wearing parts-spectacles classification labeling, frame labeling such as: labeling the frame position of the face in the image, area labeling such as: labeling the area position of the face in the image, marking the point labeling such as: face Key points are marked.

203. Perform face feature extraction on multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets. The face recognition model includes a backbone network and multiple classification networks.

Specifically, the server obtains the number of data sets of multiple training data sets, and calculates the average data volume of each training data set according to the number of data sets; the training data corresponding to the average data volume is used as batch data to obtain each training data set. The target batch data corresponding to the data set; through the backbone network in the preset face recognition model, the target batch data is sequentially subjected to face image region detection, face key point detection and face feature vector extraction to obtain multiple Feature set.

Take a training data set from multiple training data sets as an example. For example, the number of multiple training data sets acquired by the server is 5, the training data set E has 800 training data, and the average data of the training data set E If the amount is 160 training data, the target batch processing data is the 160 training data. Through the backbone network in the preset face recognition model, the 160 training data (that is, the target batch processing data) is processed into the face image area Detect, obtain the face area, perform face key point detection on the face area, obtain face key point information, and perform face feature vector extraction on the face key point information to obtain multiple feature sets. Among them, if the number of training data in different training data sets is inconsistent, the target batch data obtained from different training data sets will be randomly looped when the target batch data with a small number of training data is processed first in the data processing process. Processing until the end of the target batch data processing with the largest number of training data.

204. Classify multiple feature sets through multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set.

The server obtains the label on the training data corresponding to each feature set, calls multiple classification networks, and classifies the multiple feature sets through the classification network and the labels to obtain multiple classification data sets. Among them, a classification network classifies a feature set. Each classification network can use the same network structure or different network structure. Through the same network structure, the network complexity is reduced. By using different network structures to process different types of training data, it is beneficial to Improve the efficiency of classification and the universality of face recognition models.

205. Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values.

Specifically, the server calculates the first feature center vector corresponding to each feature set and the second feature center vector corresponding to multiple feature sets; calculates the difference between the first feature center vector and the second feature center vector corresponding to each feature set The distance value is determined as the feature vector loss function value of each feature set, and multiple feature vector loss function values are obtained; the preset labels corresponding to each training data in each training data set are obtained, according to the preset labels and The preset cross entropy loss function calculates the classification loss function value of each classification data set, and obtains multiple classification loss function values.

Among them, the server obtains the first feature vector of each feature set, the number of corresponding first training data in each feature set, the first number of target batch data, and calculates it through the preset first update center vector formula The first feature center vector corresponding to each feature set, and the first update center vector formula for calculating the first feature center vector is as follows:

p is used to indicate the p-th feature set, vc _p is the current first feature center vector, vc _p-1 is the first feature center vector of the previous iteration, vn _p is the first data number of the current iteration, n _p a first number of training data, v _i first current feature vector. Wherein, prior to the first iteration of the first feature of the center vector vc _p is 0.

The server obtains the second feature vector of all feature sets, the number of second training data corresponding to all feature sets, and the second data number of target batch data corresponding to all feature sets. Through the preset second update center vector formula, The formula for calculating the second update center vector of the second feature center vector is as follows:

q is used to indicate the qth iteration, v _q is the current second feature center vector, v _q-1 is the second feature center vector of the previous iteration, vk _q is the number of second data in the current iteration, n _q is the first 2. Number of training data, v _j is the second feature vector of all current feature sets. Among them, the second feature center vector v _q before the first iteration is zero.

The server obtains the dimension of the first feature vector of each feature set, and calculates the feature vector loss function value and the feature vector loss function value according to the dimension of the first feature vector, the first feature center vector and the second feature center vector of each feature set The calculation formula is as follows:

p is used to indicate the p-th feature set, m is the dimension of the first feature vector of each feature set, vc _p is the first feature center vector, and v _q is the second feature center vector.

Specifically, the server counts the number of preset labels in each classification data set, and obtains the feature vector of the classification data corresponding to the preset labels in each classification data set; according to the preset cross-entropy loss function, the number of labels and The feature vector is used to calculate the classification loss function value of each classification data set to obtain multiple classification loss function values. The cross-entropy loss function is as follows:

Wherein, y represents the y-th training data set, c _y is the y-th training data set corresponding to the classified data set, n _y is the number of labels, label _i is the i-th label preset categories, v _i is the eigenvector .

The server classifies the features according to the preset labels on each training data to obtain multiple classification data sets, so as to obtain the number of labels of the preset labels in each classification data set, the preset labels of each category, and the The feature vector generated by the classification data corresponding to the preset label, through the preset cross entropy loss function, combined with the number of labels and the feature vector calculation, the classification loss function value of each classification data set is obtained, thereby obtaining multiple classification losses Function value.

206. Calculate the target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values.

Specifically, the server calculates the average value of the multiple eigenvector loss function values according to the number of data sets to obtain the average eigenvector loss function value; calculates the average value of the multiple classification loss function values according to the number of data sets to obtain the average classification loss function value; Calculate the sum of the average feature vector loss function value and the average classification loss function value to obtain the target loss function value of the face recognition model.

For example: the number of data sets is 6, multiple eigenvector loss function values are L1, L2, L3, L4, and L5, and multiple classification loss function values are K1, K2, K3, and K4, then the average eigenvector loss function The value is (L1+L2+L3+L4+L5)/6=L, the average classification loss function value is (K1+K2+K3+K4)/6=K, and the target loss function value is LC=L+K.

207. Iteratively update the backbone network according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.

Specifically, it is judged whether the value of the target loss function converges. If the value of the target loss function does not converge, the target batch data is updated to obtain the updated target batch data, and the network structure of the backbone network is updated to obtain the updated The backbone network; through the updated backbone network and multiple classification networks, the updated target batch data is sequentially extracted and classified by facial features to obtain multiple target classification data sets; according to multiple target classification data sets, calculate The updated objective loss function value, and determine whether the updated objective loss function value converges; if the updated objective loss function value does not converge, the updated backbone network is iteratively updated according to the updated objective loss function value, Until the updated target loss function value converges, the final updated face recognition model is obtained.

When the server determines that the value of the target loss function has converged, it uses the current face recognition model as the final face recognition model. When the server determines that the updated target loss function value has converged, it uses the currently updated face recognition model as the final updated face recognition model. Among them, the facial feature extraction and classification are performed on the updated target batch data in sequence to obtain multiple target classification data sets. The operation method is similar to the

above steps

102, 103, 203, and 204, according to the multiple target classification data Set, the operation method of calculating the updated target loss function value is similar to the operation method of the

above steps

104, 105, 205, and 206, and will not be repeated here. In each iteration, the data quantity of each updated target batch data will be different, and it will change dynamically, which is equal to the sum of the target batch data in the previous iteration and the current target batch data.

In the embodiment of this application, by performing data cleaning and labeling on multiple initial training data sets, and extracting and classifying face features on multiple training data sets, different data sets do not need to be merged and cleaned, only You can clean them separately, which not only greatly saves the time of cleaning data, but also effectively avoids the existence of overlapping data and the situation of overlapping data as dirty data that has an adverse effect on model training. By losing function values according to multiple feature vectors and multiple The target loss function value obtained by the classification loss function value is used to update the backbone network of the face recognition model, which can make the face recognition model have better universality, thereby improving the recognition accuracy of the existing face recognition model.

The training method of the face recognition model in the embodiment of the application is described above, and the training device of the face recognition model in the embodiment of the application is described below. Please refer to FIG. 3, the training device of the face recognition model in the embodiment of the application One embodiment includes:

The obtaining module 301 is configured to obtain a plurality of preprocessed training data sets, and the plurality of training data sets are face training data sets corresponding to multiple application scenarios;

The feature extraction module 302 is used to extract facial features from multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets. The face recognition model includes the backbone network and multiple classifications. The internet;

The classification module 303 is configured to classify multiple feature sets through multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

The first calculation module 304 is configured to calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

The second calculation module 305 is configured to calculate the target loss function value of the face recognition model according to multiple feature vector loss function values and multiple classification loss function values;

The iterative update module 306 is configured to iteratively update the backbone network according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.

The function realization of each module in the above-mentioned face recognition model training device corresponds to each step in the embodiment of the above-mentioned face recognition model training method, and its functions and realization process are not repeated here.

Referring to FIG. 4, another embodiment of the training device for the face recognition model in the embodiment of the present application includes:

Among them, the obtaining module 301 specifically includes:

The obtaining unit 3011 is configured to obtain initial training data sets corresponding to multiple application scenarios, and the initial training data sets include open source data and private data;

The preprocessing unit 3012 is configured to sequentially perform data cleaning and label labeling on each initial training data set to obtain multiple preprocessed training data sets;

Optionally, the feature extraction module 302 may also be specifically used for:

Obtain the number of data sets of multiple training data sets, and calculate the average data volume of each training data set according to the number of data sets;

Use the training data corresponding to the average data volume as batch data to obtain the target batch data corresponding to each training data set;

Through the backbone network in the preset face recognition model, face image area detection, face key point detection and face feature vector extraction are sequentially performed on the target batch data to obtain multiple feature sets.

Optionally, the second calculation module 305 may also be specifically used for:

Calculate the average value of multiple eigenvector loss function values according to the number of data sets to obtain the average eigenvector loss function value;

Calculate the average value of multiple classification loss function values according to the number of data sets to obtain the average classification loss function value;

Calculate the sum of the average feature vector loss function value and the average classification loss function value to obtain the target loss function value of the face recognition model.

Optionally, the first calculation module 304 includes:

The first calculation unit 3041 is configured to calculate the first feature center vector corresponding to each feature set and the second feature center vectors corresponding to multiple feature sets;

The second calculation unit 3042 is used to calculate the distance value between the first feature center vector and the second feature center vector corresponding to each feature set, and determine the distance value as the feature vector loss function value of each feature set to obtain Multiple eigenvector loss function values;

The third calculation unit 3043 is used to obtain the preset label corresponding to each training data in each training data set, and calculate the classification loss function value of each classification data set according to the preset label and the preset cross-entropy loss function, and obtain the The value of the classification loss function.

Optionally, the third calculation unit 3043 may also be specifically configured to:

Count the number of tags of the preset tags in each classification data set, and obtain the feature vector of the classification data corresponding to the preset tags in each classification data set;

According to the preset cross-entropy loss function, the number of labels and the feature vector, the classification loss function value of each classification data set is calculated, and multiple classification loss function values are obtained. The cross-entropy loss function is as follows:

Optionally, the iterative update module 306 may also be specifically used to:

Determine whether the value of the target loss function converges. If the value of the target loss function does not converge, update the target batch data to obtain the updated target batch data, and update the network structure of the backbone network to obtain the updated backbone network ；

Through the updated backbone network and multiple classification networks, the updated target batch data is sequentially extracted and classified by facial features to obtain multiple target classification data sets;

According to multiple target classification data sets, calculate the updated target loss function value, and judge whether the updated target loss function value converges;

If the updated objective loss function value does not converge, the updated backbone network is iteratively updated according to the updated objective loss function value until the updated objective loss function value converges to obtain the final updated face recognition model .

The functional realization of each module and each unit in the above-mentioned face recognition model training device corresponds to each step in the above-mentioned embodiment of the face recognition model training method, and the functions and realization processes are not repeated here.

In the embodiment of this application, by performing data cleaning and labeling on multiple initial training data sets, and extracting and classifying face features on multiple training data sets, different data sets do not need to be merged and cleaned, only You can clean them separately, which not only greatly saves the time of cleaning data, but also effectively avoids the existence of overlapping data and overlapping data as dirty data that have an adverse effect on model training. By losing function values and multiple eigenvectors according to multiple The target loss function value obtained by the classification loss function value is used to update the backbone network of the face recognition model, which can make the face recognition model have better universality, thereby improving the recognition accuracy of the existing face recognition model.

The above figures 3 and 4 describe in detail the face recognition model training device in the embodiment of the present application from the perspective of modular functional entities, and the following is a detailed description of the face recognition model training device in the embodiment of the present application from the perspective of hardware processing. description.

FIG. 5 is a schematic structural diagram of a training device for a face recognition model provided by an embodiment of the present application. The training device 500 for the face recognition model may have relatively large differences due to different configurations or performances, and may include one or more A processor (central processing units, CPU) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or one storage device with a large amount of storage) storing application programs 533 or data 532. Among them, the memory 520 and the storage medium 530 may be short-term storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the training device 500 for the face recognition model. Further, the processor 510 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the training device 500 of the face recognition model.

The face recognition model training device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input and output interfaces 560, and/or one or more operating systems 531, For example, Windows Serve, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the structure of the training device for the face recognition model shown in FIG. 5 does not constitute a limitation on the training device for the face recognition model, and may include more or less components than shown in the figure, or a combination of certain components. Some components, or different component arrangements.

This application also provides a training device for a face recognition model. The training device for a face recognition model includes a memory and a processor. The memory stores instructions. When the instructions are executed by the processor, the processor executes each of the foregoing. The steps of the training method of the face recognition model in the embodiment.

This application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium, and the computer-readable storage medium may also be a volatile computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions run on a computer, the computer executes the steps of the method for training the face recognition model.

Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , Including several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program code .

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: The technical solutions recorded in the embodiments are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A method for training a face recognition model, wherein the training method for the face recognition model includes:

Acquiring preprocessed multiple training data sets, where the multiple training data sets are face training data sets corresponding to multiple application scenarios, respectively;

Performing face feature extraction on the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets, the face recognition model including the backbone network and multiple classification networks;

Classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

Calculating a target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

The backbone network is iteratively updated according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.
The method for training a face recognition model according to claim 1, wherein said using the backbone network in the preset face recognition model, the plurality of training data sets are respectively subjected to face feature extraction to obtain a plurality of Feature set, including:

Acquiring the number of data sets of the multiple training data sets, and calculating the average data amount of each training data set according to the number of data sets;

Use the training data corresponding to the average data volume as batch data to obtain target batch data corresponding to each training data set;

Through the backbone network in the preset face recognition model, face image region detection, face key point detection, and face feature vector extraction are sequentially performed on the target batch data to obtain multiple feature sets.
The method for training a face recognition model according to claim 2, wherein the calculation of the target loss function of the face recognition model is based on the plurality of feature vector loss function values and the plurality of classification loss function values Values include:

Calculating an average value of the multiple eigenvector loss function values according to the number of the data sets to obtain an average eigenvector loss function value;

Calculating an average value of the multiple classification loss function values according to the number of the data sets to obtain an average classification loss function value;

The sum of the average feature vector loss function value and the average classification loss function value is calculated to obtain the target loss function value of the face recognition model.
The method for training a face recognition model according to claim 2, wherein said calculating the feature vector loss function value of each feature set to obtain a plurality of feature vector loss function values, and calculating the classification loss of each classification data set Function value, get multiple classification loss function values, including:

Calculating a first feature center vector corresponding to each feature set, and a second feature center vector corresponding to the multiple feature sets;

Calculate the distance value between the first feature center vector corresponding to each feature set and the second feature center vector, and determine the distance value as the feature vector loss function value of each feature set to obtain multiple feature vectors Loss function value;

Obtain a preset label corresponding to each training data in each training data set, and calculate the classification loss function value of each classification data set according to the preset label and the preset cross-entropy loss function to obtain multiple classification loss function values.
The method for training a face recognition model according to claim 4, wherein the value of the classification loss function of each classification data set is calculated according to the preset label and the preset cross-entropy loss function to obtain a plurality of classifications Loss function value, including:

Counting the number of tags of the preset tags in each classification data set, and obtaining the feature vector of the classification data corresponding to the preset tags in each classification data set;

According to the preset cross-entropy loss function, the number of tags, and the feature vector, the classification loss function value of each classification data set is calculated to obtain multiple classification loss function values, and the cross-entropy loss function is as follows:

Wherein, the y represents the y-th training data set, the c y is the classification data set corresponding to the y-th training data set, the n y is the number of the labels, and the label i is the i-th classification preset tag, the v i is the feature vector.
The method for training a face recognition model according to any one of claims 2-5, wherein the iterative update of the backbone network according to the value of the target loss function until the value of the target loss function converges, Get the updated face recognition model, including:

Determine whether the value of the target loss function converges. If the value of the target loss function does not converge, update the target batch data to obtain the updated target batch data, and perform the network structure of the backbone network. Update, get the updated backbone network;

Performing face feature extraction and classification on the updated target batch data sequentially through the updated backbone network and the multiple classification networks to obtain multiple target classification data sets;

Calculating an updated target loss function value according to the multiple target classification data sets, and judging whether the updated target loss function value converges;

If the updated objective loss function value does not converge, the updated backbone network is iteratively updated according to the updated objective loss function value until the updated objective loss function value converges to obtain the final The updated face recognition model.
The method for training a face recognition model according to claim 1, wherein said acquiring a plurality of training data sets after preprocessing, said plurality of training data sets being face training data sets corresponding to a plurality of application scenarios respectively ,include:

Acquiring initial training data sets corresponding to multiple application scenarios, where the initial training data sets include open source data and private data;

Perform data cleaning and label labeling on each initial training data set in order to obtain multiple preprocessed training data sets.
A training device for a face recognition model, wherein the training device for the face recognition model includes: a memory, a processor, and training of a face recognition model stored in the memory and running on the processor Program, the processor implements the following steps when executing the training program of the face recognition model:

Acquiring preprocessed multiple training data sets, where the multiple training data sets are face training data sets corresponding to multiple application scenarios, respectively;

Performing face feature extraction on the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets, the face recognition model including the backbone network and multiple classification networks;

Classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

Calculating a target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

The backbone network is iteratively updated according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.
The training device for a face recognition model according to claim 8, wherein the processor executes the training program of the face recognition model to implement the backbone network in the preset face recognition model, and the The face feature extraction is performed on multiple training data sets, and when multiple feature sets are obtained, the following steps are included:

Acquiring the number of data sets of the multiple training data sets, and calculating the average data amount of each training data set according to the number of data sets;

Use the training data corresponding to the average data volume as batch data to obtain target batch data corresponding to each training data set;

Through the backbone network in the preset face recognition model, face image region detection, face key point detection, and face feature vector extraction are sequentially performed on the target batch data to obtain multiple feature sets.
The training device for a face recognition model according to claim 9, wherein the processor executes a training program of the face recognition model to implement the loss function value according to the plurality of feature vectors and the plurality of classifications Loss function value, when calculating the target loss function value of the face recognition model, the following steps are included:

Calculating an average value of the multiple eigenvector loss function values according to the number of the data sets to obtain an average eigenvector loss function value;

Calculating an average value of the multiple classification loss function values according to the number of the data sets to obtain an average classification loss function value;

The sum of the average feature vector loss function value and the average classification loss function value is calculated to obtain the target loss function value of the face recognition model.
The training device for a face recognition model according to claim 9, wherein the processor executes the training program of the face recognition model to realize the calculation of the feature vector loss function value of each feature set to obtain a plurality of features Vector loss function value, and calculate the classification loss function value of each classification data set. When multiple classification loss function values are obtained, the following steps are included:

Calculating a first feature center vector corresponding to each feature set, and a second feature center vector corresponding to the multiple feature sets;

Calculate the distance value between the first feature center vector corresponding to each feature set and the second feature center vector, and determine the distance value as the feature vector loss function value of each feature set to obtain multiple feature vectors Loss function value;

Obtain a preset label corresponding to each training data in each training data set, and calculate the classification loss function value of each classification data set according to the preset label and the preset cross-entropy loss function to obtain multiple classification loss function values.
The training device for a face recognition model according to claim 11, wherein the processor executes the training program of the face recognition model to realize the calculation according to the preset label and the preset cross-entropy loss function The classification loss function value of each classification data set, when multiple classification loss function values are obtained, the following steps are included:

Counting the number of tags of the preset tags in each classification data set, and obtaining the feature vector of the classification data corresponding to the preset tags in each classification data set;

According to the preset cross-entropy loss function, the number of tags, and the feature vector, the classification loss function value of each classification data set is calculated to obtain multiple classification loss function values, and the cross-entropy loss function is as follows:

Wherein, the y represents the y-th training data set, the c y is the classification data set corresponding to the y-th training data set, the n y is the number of the labels, and the label i is the i-th classification preset tag, the v i is the feature vector.
The training device for a face recognition model according to any one of claims 9-12, wherein the processor executes the training program of the face recognition model to realize the pairing of the face recognition model according to the target loss function value The backbone network is updated iteratively until the value of the target loss function converges, and when the updated face recognition model is obtained, the following steps are included:

Determine whether the value of the target loss function converges. If the value of the target loss function does not converge, update the target batch data to obtain the updated target batch data, and perform the network structure of the backbone network. Update, get the updated backbone network;

Performing face feature extraction and classification on the updated target batch data sequentially through the updated backbone network and the multiple classification networks to obtain multiple target classification data sets;

Calculating an updated target loss function value according to the multiple target classification data sets, and judging whether the updated target loss function value converges;

If the updated objective loss function value does not converge, the updated backbone network is iteratively updated according to the updated objective loss function value until the updated objective loss function value converges to obtain the final The updated face recognition model.
The training device for a face recognition model according to claim 8, wherein the processor executes a training program of the face recognition model to realize the acquisition of a plurality of training data sets after preprocessing, and the plurality of training When the data set is a face training data set corresponding to multiple application scenarios, the following steps are included:

Acquiring initial training data sets corresponding to multiple application scenarios, where the initial training data sets include open source data and private data;

Perform data cleaning and label labeling on each initial training data set in order to obtain multiple preprocessed training data sets.
A computer-readable storage medium in which computer instructions are stored, and when the computer instructions are executed on a computer, the computer executes the following steps:

Acquiring preprocessed multiple training data sets, where the multiple training data sets are face training data sets corresponding to multiple application scenarios, respectively;

Performing face feature extraction on the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets, the face recognition model including the backbone network and multiple classification networks;

Classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

Calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

Calculating a target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

The backbone network is iteratively updated according to the value of the target loss function until the value of the target loss function converges to obtain an updated face recognition model.
The computer-readable storage medium according to claim 15, wherein the computer-readable storage medium executes the computer instructions to implement the backbone network in the preset face recognition model, and performs the respective training data sets for the plurality of training data sets. When performing face feature extraction to obtain multiple feature sets, the following steps are included:

Acquiring the number of data sets of the multiple training data sets, and calculating the average data amount of each training data set according to the number of data sets;

Use the training data corresponding to the average data volume as batch data to obtain target batch data corresponding to each training data set;

Through the backbone network in the preset face recognition model, face image region detection, face key point detection, and face feature vector extraction are sequentially performed on the target batch data to obtain multiple feature sets.
The computer-readable storage medium according to claim 16, wherein the computer-readable storage medium executes the computer instructions to implement the calculation of the loss function value based on the plurality of feature vector loss function values and the plurality of classification loss function values When describing the objective loss function value of the face recognition model, the following steps are included:

Calculating an average value of the multiple eigenvector loss function values according to the number of the data sets to obtain an average eigenvector loss function value;

Calculating an average value of the multiple classification loss function values according to the number of the data sets to obtain an average classification loss function value;

The sum of the average feature vector loss function value and the average classification loss function value is calculated to obtain the target loss function value of the face recognition model.
The computer-readable storage medium according to claim 16, wherein the computer-readable storage medium executes the computer instructions to implement the calculation of the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and Calculate the classification loss function value of each classification data set, and when multiple classification loss function values are obtained, the following steps are included:

Calculating a first feature center vector corresponding to each feature set, and a second feature center vector corresponding to the multiple feature sets;

Calculate the distance value between the first feature center vector corresponding to each feature set and the second feature center vector, and determine the distance value as the feature vector loss function value of each feature set to obtain multiple feature vectors Loss function value;

Obtain a preset label corresponding to each training data in each training data set, and calculate the classification loss function value of each classification data set according to the preset label and the preset cross-entropy loss function to obtain multiple classification loss function values.
The computer-readable storage medium according to claim 18, wherein the computer-readable storage medium executes the computer instructions to realize the calculation of each classification data set according to the preset label and the preset cross-entropy loss function Classification loss function value. When multiple classification loss function values are obtained, the following steps are included:

Counting the number of tags of the preset tags in each classification data set, and obtaining the feature vector of the classification data corresponding to the preset tags in each classification data set;

According to the preset cross-entropy loss function, the number of tags, and the feature vector, the classification loss function value of each classification data set is calculated to obtain multiple classification loss function values, and the cross-entropy loss function is as follows:

Wherein, the y represents the y-th training data set, the c y is the classification data set corresponding to the y-th training data set, the n y is the number of the labels, and the label i is the i-th classification preset tag, the v i is the feature vector.
A training device for a face recognition model, characterized in that the training device for a face recognition model includes:

An obtaining module, configured to obtain a plurality of preprocessed training data sets, where the plurality of training data sets are face training data sets corresponding to a plurality of application scenarios;

The feature extraction module is used to extract the facial features of the multiple training data sets through the backbone network in the preset face recognition model to obtain multiple feature sets. The face recognition model includes the backbone network and Multiple classification networks;

The classification module is configured to classify the multiple feature sets through the multiple classification networks to obtain multiple classification data sets, where one classification network corresponds to one feature set;

The first calculation module is used to calculate the feature vector loss function value of each feature set to obtain multiple feature vector loss function values, and calculate the classification loss function value of each classification data set to obtain multiple classification loss function values;

The second calculation module is configured to calculate the target loss function value of the face recognition model according to the multiple feature vector loss function values and the multiple classification loss function values;

An iterative update module is used to iteratively update the backbone network according to the target loss function value until the target loss function value converges to obtain an updated face recognition model.