CN112801054B - Face recognition model processing method, face recognition method and device - Google Patents


Info

Publication number
CN112801054B
CN112801054B (application CN202110354900.6A)
Authority
CN
China
Prior art keywords
face
sample
face recognition
images
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110354900.6A
Other languages
Chinese (zh)
Other versions
CN112801054A (en)
Inventor
许剑清 (Xu Jianqing)
沈鹏程 (Shen Pengcheng)
李绍欣 (Li Shaoxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110354900.6A priority Critical patent/CN112801054B/en
Publication of CN112801054A publication Critical patent/CN112801054A/en
Application granted granted Critical
Publication of CN112801054B publication Critical patent/CN112801054B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a face recognition model processing method, a face recognition method, and corresponding apparatus, involving computer vision technology in the field of artificial intelligence. The method comprises the following steps: first, pre-training an initial face recognition model using the majority group face images in a first sample set to obtain a pre-trained face recognition model; then, performing fine-tuning training on the pre-trained face recognition model using sample images from the first sample set, a second sample set and a third sample set. During the fine-tuning training, the gradients corresponding to sample images that belong to the third sample set and contain majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image set. The method can be applied to face recognition in scenarios such as smart retail and intelligent transportation, and it can improve the accuracy of face recognition for various population groups.

Description

Face recognition model processing method, face recognition method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for processing a face recognition model, a face recognition method, and an apparatus.
Background
With the development of artificial intelligence, machine learning models are used increasingly widely. For example, before a user performs various operations through a computer device, a face recognition operation is often required, and such operations generally process data through a face recognition model.
At present, the recognition accuracy of face recognition models varies considerably across different user groups; that is, conventional face recognition operations suffer from inaccuracy for some groups.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a face recognition model processing method, a face recognition method and apparatus that can improve the accuracy of face recognition for various population groups.
A processing method of a face recognition model comprises the following steps:
acquiring a first sample set, a second sample set and a third sample set, wherein samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set comprises the majority group face images and the minority group face images;
pre-training an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model;
iteratively obtaining a sample image set required for fine-tuning training from the first sample set, the second sample set and the third sample set; in the process of performing fine-tuning training on the pre-trained face recognition model using the sample images in the sample image set, excluding the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, and updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set, until iteration stops, thereby obtaining a face recognition model for performing face recognition on majority group face images and minority group face images.
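The fine-tuning update described above can be illustrated with a minimal, hypothetical sketch (plain Python standing in for a real training framework; the `source`/`group` field names and the gradient representation are invented for illustration): gradients of samples that come from the third sample set and contain a majority group face are discarded before the parameter update.

```python
def fine_tune_step(params, batch, grad_fn, lr=0.01):
    """One fine-tuning step. Samples that belong to the third sample set
    AND contain a majority group face are excluded; the model parameters
    are updated with the average gradient of the remaining samples."""
    kept = [s for s in batch
            if not (s["source"] == "third" and s["group"] == "majority")]
    if not kept:  # every sample in the batch was excluded
        return params
    grads = [grad_fn(params, s) for s in kept]
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg)]
```

Here `source` marks which sample set a sample was drawn from and `group` whether it shows a majority group or minority group face; an excluded sample contributes nothing to the update, exactly as if its gradient had been removed.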
An apparatus for processing a face recognition model, the apparatus comprising:
an acquisition module, configured to acquire a first sample set, a second sample set and a third sample set, wherein samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set comprises both majority group face images and minority group face images;
a pre-training module, configured to pre-train an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model;
and a fine-tuning training module, configured to iteratively obtain a sample image set required for fine-tuning training from the first sample set, the second sample set and the third sample set; in the process of performing fine-tuning training on the pre-trained face recognition model using the sample images in the sample image set, exclude the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, and update the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set, until iteration stops, to obtain a face recognition model for performing face recognition on majority group face images and minority group face images.
In one embodiment, the pre-training module is further configured to: acquire majority group face images and corresponding face label information from the first sample set; input the majority group face images acquired from the first sample set into the initial face recognition model to obtain face recognition prediction results; construct a pre-training loss function according to the face label information and the face recognition prediction results; and, after taking the model parameters that minimize the pre-training loss function as the updated model parameters of the initial face recognition model, return to the step of acquiring majority group face images from the first sample set to continue training until the training end condition is met.
In one embodiment, the pre-training module is further configured to: extract the face image features corresponding to the majority group face images through a feature extraction network in the initial face recognition model; and obtain, through a classification network in the initial face recognition model, the face recognition prediction results corresponding to the majority group face images based on the face image features.
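As a rough sketch of this two-stage forward pass (linear maps stand in for the real feature extraction and classification networks; all weights here are made up for illustration):

```python
def recognize(image_vec, feat_weights, cls_weights):
    """The feature extraction network maps the image to a feature vector;
    the classification network maps features to per-identity scores."""
    features = [sum(w * x for w, x in zip(row, image_vec)) for row in feat_weights]
    scores = [sum(w * f for w, f in zip(row, features)) for row in cls_weights]
    return features, scores
```

In a real model both stages would be deep networks; the point is only that prediction results come from the classification network applied to the extracted features.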
In one embodiment, the fine-tuning training module is further configured to: acquire a preset sampling proportion; and sample the first sample set, the second sample set and the third sample set respectively according to the sampling proportion to obtain the sample image set required for fine-tuning training.
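A minimal sketch of this proportional sampling (the 2:1:1 ratio and batch size are illustrative values; the patent leaves the sampling proportion as a preset hyperparameter):

```python
import random

def sample_batch(first, second, third, ratio=(0.5, 0.25, 0.25), batch_size=8):
    """Draw a fine-tuning sample image set from the three sample sets
    according to a preset sampling proportion."""
    n1 = round(batch_size * ratio[0])
    n2 = round(batch_size * ratio[1])
    n3 = batch_size - n1 - n2  # remainder goes to the third sample set
    batch = (random.sample(first, n1) + random.sample(second, n2)
             + random.sample(third, n3))
    random.shuffle(batch)
    return batch
```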
In one embodiment, the fine-tuning training module is further configured to: perform fine-tuning training on the pre-trained face recognition model using each sample image in the sample image set, and acquire the gradient corresponding to each sample image during the fine-tuning training; and, after excluding from these the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set.
In one embodiment, the fine-tuning training module is further configured to: input each sample image in the sample image set into the pre-trained face recognition model to obtain the face recognition prediction result corresponding to each sample image; construct a fine-tuning training loss function according to the face label information and face recognition prediction result corresponding to each sample image; and obtain the gradient corresponding to each sample image when the fine-tuning training loss function is minimized.
In one embodiment, the fine-tuning training module is further configured to: extract the face image features corresponding to each sample image through a feature extraction network in the pre-trained face recognition model; and obtain, through a classification network in the pre-trained face recognition model, the face recognition prediction result corresponding to each sample image based on the face image features.
In one embodiment, the fine-tuning training module is further configured to: construct a first loss according to the face label information and face recognition prediction results corresponding to the sample images in the sample image set that belong to the first sample set; construct a second loss according to the face label information and face recognition prediction results corresponding to the sample images that belong to the second sample set; construct a third loss according to the face label information and face recognition prediction results corresponding to the sample images that belong to the third sample set; acquire preset loss weighting coefficients; and sum the first loss, the second loss and the third loss according to the loss weighting coefficients to obtain the fine-tuning training loss function.
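For instance, the weighted combination could look like the following sketch (cross-entropy and the equal default weights are illustrative choices, not specified by the patent; `source`, `probs` and `label` are invented field names):

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the labelled identity."""
    return -math.log(probs[label])

def fine_tune_loss(batch, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the average per-set losses: the first loss from
    first-set samples, the second from second-set samples, the third
    from third-set samples."""
    per_set = {"first": [], "second": [], "third": []}
    for s in batch:
        per_set[s["source"]].append(cross_entropy(s["probs"], s["label"]))
    means = [sum(v) / len(v) if v else 0.0
             for v in (per_set["first"], per_set["second"], per_set["third"])]
    return sum(w * m for w, m in zip(weights, means))
```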
In one embodiment, the fine-tuning training module is further configured to: when the sample images sampled from the third sample set include minority group face images, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images that belong to the third sample set and contain minority group faces, the sample images belonging to the first sample set, and the sample images belonging to the second sample set; and, when the sample images sampled from the third sample set do not include any minority group face images, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images belonging to the first sample set and the sample images belonging to the second sample set.
In one embodiment, the face recognition model processing apparatus further comprises a face recognition module configured to: acquire a face image to be verified and the identity to be verified corresponding to the face image, wherein the face image to be verified contains a majority group face or a minority group face; obtain the image feature to be verified corresponding to the face image to be verified through the face recognition model obtained after fine-tuning training; acquire the reference image feature corresponding to the identity to be verified; and, when the similarity between the image feature to be verified and the reference image feature exceeds a threshold, determine that the face image to be verified passes identity verification.
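A sketch of this 1:1 comparison (cosine similarity and the 0.7 threshold are common choices in face verification but are assumptions here, not values fixed by the patent):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def verify_identity(feature_to_verify, reference_feature, threshold=0.7):
    """Identity verification passes when the similarity between the image
    feature to be verified and the reference feature exceeds the threshold."""
    return cosine_similarity(feature_to_verify, reference_feature) > threshold
```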
In one embodiment, the face recognition module is further configured to: acquire a face image to be matched, wherein the face image to be matched contains a majority group face or a minority group face; obtain the image feature to be matched corresponding to the face image to be matched through the face recognition model obtained after fine-tuning training; and match the image feature to be matched against at least one reference image feature, taking the target identity corresponding to the reference image feature with the highest matching degree as the identity of the face in the face image to be matched.
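The 1:N matching variant might look like the following (again a hypothetical sketch with cosine similarity; `gallery` maps each candidate identity to its reference feature):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def match_identity(query_feature, gallery):
    """Return the target identity whose reference feature has the highest
    matching degree (similarity) with the feature to be matched."""
    return max(gallery, key=lambda ident: cosine_similarity(query_feature, gallery[ident]))
```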
A face recognition method, the method comprising:
acquiring a face image to be recognized, wherein the face image to be recognized comprises at least one of a majority group face and a minority group face;
performing face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized, wherein the face recognition model is obtained by pre-training an initial face recognition model using the majority group face images in a first sample set and then performing fine-tuning training on the pre-trained face recognition model using sample images in a sample image set obtained from the first sample set, a second sample set and a third sample set; during the fine-tuning training, the gradients corresponding to sample images that belong to the third sample set and contain majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image set;
the samples in the first sample set are majority group face images, the samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
A face recognition apparatus, the apparatus comprising:
an acquisition module, configured to acquire a face image to be recognized, wherein the face image to be recognized contains at least one of a majority group face and a minority group face;
a face recognition module, configured to perform face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized, wherein the face recognition model is obtained by pre-training an initial face recognition model using the majority group face images in a first sample set and then performing fine-tuning training on the pre-trained face recognition model using sample images in a sample image set obtained from the first sample set, a second sample set and a third sample set; during the fine-tuning training, the gradients corresponding to sample images that belong to the third sample set and contain majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image set;
the samples in the first sample set are majority group face images, the samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
A computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the steps of the above face recognition model processing method and/or face recognition method.
A computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above face recognition model processing method and/or face recognition method.
A computer program comprises computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the above face recognition model processing method and/or face recognition method.
According to the above face recognition model processing method, face recognition method and apparatus, computer device and storage medium, the initial face recognition model is first pre-trained using the majority group face images in the first sample set to obtain a pre-trained face recognition model, which improves the accuracy of the face recognition model in recognizing majority group face images. The pre-trained face recognition model is then fine-tuned using the sample images in the first sample set, the second sample set and the third sample set: it learns features for distinguishing among the majority group from the majority group face images of the first sample set, learns features for distinguishing among the minority group from the minority group face images of the second sample set, and learns features for distinguishing between the majority group and the minority group from the majority group and minority group face images of the third sample set. Because the third sample set contains majority group face images whose gradients are excluded, the method intervenes in the pre-trained face recognition model's learning of minority group features, expanding the feature space of the minority group, reducing the similarity that the face recognition model computes between different minority group individuals, and improving the recognition accuracy for minority group face images. As a result, the model's accuracy in recognizing majority group face images is improved, and its accuracy in recognizing minority group face images is also greatly improved.
Drawings
FIG. 1 is a diagram of an exemplary environment in which a face recognition model may be implemented;
FIG. 2 is a schematic flow chart illustrating a method for processing a face recognition model according to an embodiment;
FIG. 3 is a diagram of a feature space of a face recognition model in one embodiment;
FIG. 4 is a diagram illustrating pre-training of an initial face recognition model in one embodiment;
FIG. 5 is a diagram illustrating a fine-tuning training of a pre-trained face recognition model in one embodiment;
FIG. 6 is a schematic flow chart illustrating pre-training of an initial face recognition model in one embodiment;
FIG. 7 is a schematic flow chart illustrating a fine-tuning training of a pre-trained face recognition model according to an embodiment;
FIG. 8 is a diagram illustrating a pre-trained model for fine-tuning training in another embodiment;
FIG. 9 is a flowchart illustrating a processing method of a face recognition model according to another embodiment;
FIG. 10 is a flow diagram illustrating a face recognition method according to one embodiment;
FIG. 11 is a block diagram of an apparatus for processing a face recognition model according to an embodiment;
FIG. 12 is a block diagram showing the structure of a face recognition apparatus according to an embodiment;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The face recognition model processing method provided by the embodiments of the present application involves Artificial Intelligence (AI) technology. AI comprises the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The face recognition model processing method provided by the embodiments of the present application mainly involves Machine Learning (ML), a multi-disciplinary field drawing on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
For example, in the embodiments of the present application, the computer device first obtains a first sample set, a second sample set and a third sample set, where samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set includes both majority group face images and minority group face images. It then pre-trains an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model, and performs fine-tuning training on the pre-trained face recognition model using sample image sets obtained from the first sample set, the second sample set and the third sample set. During fine-tuning training, after excluding the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, it updates the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image sets, finally obtaining a face recognition model for performing face recognition on majority group face images and minority group face images.
The face recognition model processing method provided by the embodiments of the present application also involves blockchain technology. For example, the server may be a blockchain node in a blockchain network; the first sample set, the second sample set and the third sample set may be stored on a blockchain, and the server may obtain them from the data blocks of the blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. In essence, a blockchain is a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The face recognition model processing method provided by the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 obtains a first sample set, a second sample set and a third sample set, where samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set includes both majority group face images and minority group face images, and sends the three sample sets to the server 104. The server 104 pre-trains an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model, and then iteratively obtains a sample image set required for fine-tuning training from the first sample set, the second sample set and the third sample set. In the process of performing fine-tuning training on the pre-trained face recognition model using the sample images in the sample image set, after excluding the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, the server updates the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set, until iteration stops, obtaining a face recognition model for performing face recognition on majority group face images and minority group face images.
The terminal 102 may be, but is not limited to, various smart phones, tablet computers, notebook computers, desktop computers, portable wearable devices, smart speakers, and the like. The server 104 may be an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms.
In the face recognition model processing method provided by the embodiments of the present application, the execution subject may be the face recognition model processing apparatus provided by the embodiments, or a computer device integrating that apparatus, where the apparatus may be implemented in hardware or software. The computer device may be the terminal 102 or the server 104 shown in FIG. 1.
The face recognition model processing method provided by the embodiments of the present application can be applied to face recognition model training scenarios. For example, a computer device obtains a first sample set, a second sample set and a third sample set, where samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set includes both majority group face images and minority group face images. It first pre-trains an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model, and then iteratively obtains a sample image set required for fine-tuning training from the three sample sets. In the process of performing fine-tuning training on the pre-trained face recognition model using the sample images in the sample image set, after excluding the gradients corresponding to sample images that belong to the third sample set and contain majority group faces, it updates the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set, until iteration stops, obtaining a face recognition model for performing face recognition on majority group face images and minority group face images.
The face recognition method provided by the application can also be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 obtains a face image to be recognized, which contains at least one of a majority group face and a minority group face, and sends it to the server 104. The server 104 performs face recognition on the face image through a face recognition model trained by the method provided in the embodiment of the application, obtains a face recognition result corresponding to the face image, and may return the result to the terminal 102.
In the face recognition method provided by the embodiment of the present application, the execution subject may be the face recognition apparatus provided by the embodiment of the present application, or a computer device integrated with the face recognition apparatus, where the face recognition apparatus may be implemented in a hardware or software manner. The computer device may be the terminal 102 or the server 104 shown in fig. 1.
The face recognition method provided by the embodiment of the application can be applied to a one-to-one identity verification scenario, where it can improve the face recognition accuracy for the various groups. In a one-to-one identity verification scenario, the identity to be verified is known in advance, and the image feature of the face image to be verified is compared with the reference image feature corresponding to that identity, so as to verify whether the face in the image actually belongs to the claimed identity. Examples include mobile terminal screen unlocking, account login in social applications, transaction payment in banking applications, identifying criminal suspects, and the like.
For example, in order to ensure the security of user data, electronic payment applications, financial service applications, social communication applications, government affairs service applications, travel service applications and the like provide an identity verification function, and a user can transact related business in the application only after passing identity verification. Specifically, the terminal collects a face image of the user, where the face may belong to a majority group or to a minority group. After obtaining, through training by the method provided in the embodiment of the application, a face recognition model capable of recognizing faces of the various groups, the terminal extracts the face image feature corresponding to the face image through the face recognition model, compares that feature with the reference image feature corresponding to the identity to be verified, and determines that the user passes identity verification when the similarity between the two features exceeds a threshold value.
For example, in a terminal screen unlocking scenario, the terminal stores the reference image feature and collects a face image of the user, where the face may belong to a majority group or to a minority group. After obtaining, through training by the method provided in the embodiment of the application, a face recognition model capable of recognizing faces of the various groups, the terminal extracts the face image feature corresponding to the face image through the face recognition model, compares that feature with the stored reference image feature corresponding to the identity to be verified, and unlocks the terminal screen when the similarity between the two features exceeds a threshold value.
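The one-to-one comparison described in the two examples above can be sketched as follows. This is an illustrative sketch only: the patent does not specify the similarity measure or the threshold value, so cosine similarity and the 0.6 threshold are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify_identity(probe_feature, reference_feature, threshold=0.6):
    """One-to-one verification: pass only when the similarity between the
    probe feature and the reference feature exceeds the threshold."""
    return cosine_similarity(probe_feature, reference_feature) > threshold
```

In the screen unlocking example, `reference_feature` would be the feature the terminal stores, and `probe_feature` the feature the trained model extracts from the newly collected face image.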
The face recognition method provided by the embodiment of the application can also be applied to a one-to-many identity recognition scenario, where it can likewise improve the face recognition accuracy for the various groups. In a one-to-many identity recognition scenario, the image feature of the face image to be matched is compared with the reference image features prestored in a database, so that the identity corresponding to the face in the image is determined from the identities corresponding to those reference image features. Examples include clocking in on a face-recognition attendance machine, searching for missing persons, and the like.
For example, in a face payment scenario, the terminal collects a face image of an electronic payment user. After obtaining, through training by the method provided in the embodiment of the application, a face recognition model capable of recognizing faces of the various groups, the terminal extracts the face image feature corresponding to the face image through the face recognition model, matches that feature against at least one reference image feature in the payment system, and executes a deduction operation on the account corresponding to the reference image feature with the highest matching degree. In scenarios with diverse groups of faces, such as smart retail, smart transportation and smart travel, this face recognition method can provide great convenience for both users and merchants.
For example, in a traffic safety scenario, the face image of an offending driver is collected. After obtaining, through training by the method provided in the embodiment of the application, a face recognition model capable of recognizing faces of the various groups, the terminal extracts the face image feature corresponding to the face image through the face recognition model and matches that feature against at least one reference image feature in the traffic system; the identity corresponding to the reference image feature with the highest matching degree is the identity of the offending driver.
For example, in a work attendance scenario, the face image of an employee taking attendance is collected. After obtaining, through training by the method provided in the embodiment of the application, a face recognition model capable of recognizing faces of the various groups, the terminal extracts the face image feature corresponding to the face image through the face recognition model, matches that feature against at least one reference image feature in the attendance system, and updates the attendance record of the identity corresponding to the reference image feature with the highest matching degree.
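The one-to-many matching in the identification examples above can be sketched as follows. As before, cosine similarity is an assumed measure, and `gallery` is a hypothetical stand-in for the reference image features prestored in the database, keyed by identity identifier.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def identify(probe_feature, gallery):
    """One-to-many identification: return the identity whose prestored
    reference feature best matches the probe feature, with its similarity."""
    best_id, best_sim = None, -1.0
    for identity, ref_feature in gallery.items():
        sim = cosine_similarity(probe_feature, ref_feature)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id, best_sim
```

In the attendance example, the returned identity is the one whose attendance record would be updated; a real system would also reject matches below a minimum similarity.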
In an embodiment, as shown in fig. 2, a method for processing a face recognition model is provided, and this embodiment is mainly illustrated by applying the method to a computer device (the terminal 102 or the server 104 in fig. 1 above), and includes the following steps:
step S202, a first sample set, a second sample set and a third sample set are obtained, wherein samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
The first sample set, the second sample set and the third sample set are all image sets used for training the face recognition model. A majority group face image is an image in which the face belongs to a majority group, and a minority group face image is an image in which the face belongs to a minority group.
In one embodiment, the computer device divides the majority group and the minority groups according to the number of samples of each group in the training samples of the face recognition model. In the training samples of a face recognition model, the number of samples of each group is typically unbalanced. For example, the sample images of yellow-skinned persons may far outnumber those of white-skinned, black-skinned and brown-skinned persons, and the facial features of the latter differ considerably from those of the former, so yellow-skinned persons can be treated as the majority group and the others as minority groups. Similarly, the sample images of the Han nationality may far outnumber those of ethnic minorities, whose facial features differ somewhat from those of the Han nationality, so the Han nationality can be treated as the majority group and the ethnic minorities as minority groups. Likewise, the sample images of teenagers, young people and middle-aged people may far outnumber those of elderly people, whose facial features differ considerably from those of the younger groups, so teenagers, young people and middle-aged people can be treated as the majority group and elderly people as a minority group. That is, the majority group is the group whose samples account for the majority of the training samples of the face recognition model, and a minority group is a group whose samples account for only a minority.
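The division of groups by sample count described above can be sketched as follows. The 0.5 majority fraction is an assumed heuristic, since the embodiment only states that the majority group accounts for most of the samples.

```python
from collections import Counter

def split_majority_minority(sample_group_labels, majority_fraction=0.5):
    """Split group labels into a majority group and minority groups based on
    how many training samples each group contributes (fraction is assumed)."""
    counts = Counter(sample_group_labels)
    total = sum(counts.values())
    majority = {g for g, n in counts.items() if n / total >= majority_fraction}
    minority = set(counts) - majority
    return majority, minority
```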
The idea of the inventor is introduced as follows:
The inventor found that the face recognition results of a face recognition model for minority groups are inaccurate, which mainly manifests as the model computing a high similarity between different minority-group persons, so that different minority-group persons are easily misrecognized as the same person. The inventor's analysis attributes this high computed similarity to the following cause: in the training samples of the face recognition model, the majority group accounts for most of the samples and the minority groups for only a few, so in the feature space of the face recognition model, the feature space of the minority groups is compressed by the feature space of the majority group, and the distance between features in the minority-group feature space becomes too small, which makes the similarity computed between different minority-group persons high. The feature space of the face recognition model is the space in which the features learned by the model lie; the feature space of a minority group is the space of the features the model learns from minority-group samples, and the feature space of the majority group is the space of the features the model learns from majority-group samples.
Referring to fig. 3, fig. 3 is a schematic diagram of the feature space of a face recognition model in an embodiment. Suppose the face recognition model is first trained with a small number of minority-group samples: through these samples, the model learns features for distinguishing minority-group samples, yielding the minority-group feature space 302 shown in fig. 3(a). Suppose the model is then trained with a large number of majority-group samples: through these samples, the model learns the features of the majority-group samples, yielding the majority-group feature space 304 shown in fig. 3(b); the majority-group feature space 304 compresses the minority-group feature space 302, reducing the distance between the features within it.
Through a large number of experiments, the inventor found that when the face recognition model attempts to distinguish majority-group samples from minority-group samples, the model, under the intervention of the majority-group samples, learns those minority-group features that lie closer to the majority-group features. The inventor therefore conceived of using majority-group samples to intervene in the model's learning of minority-group features, so as to increase the distance between features in the minority-group feature space and thereby expand it, obtaining the minority-group feature space 302 shown in fig. 3(c). Because the distance between the features in the minority-group feature space 302 becomes larger, the similarity computed by the face recognition model between different minority-group persons decreases, improving the recognition accuracy for minority-group face images.
In the application, the inventor therefore designs a first sample set, a second sample set and a third sample set. The samples in the first sample set are majority group face images, which the face recognition model can use to learn features for distinguishing the majority group; the samples in the second sample set are minority group face images, which the model can use to learn features for distinguishing the minority groups; the third sample set comprises both majority group and minority group face images, which the model can use to learn features for distinguishing the majority group from the minority groups, and the majority group face images in the third sample set can be used to intervene in the model's learning of minority-group features so as to expand the minority-group feature space.
In one embodiment, the computer device obtains the first sample set and the second sample set, and treats the union of the two as the third sample set. In other embodiments, the samples in the third sample set may not overlap, or may only partially overlap, with the samples in the first and second sample sets, as long as the third sample set comprises both majority group face images and minority group face images.
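The first construction described in this embodiment, where the third sample set is simply the union of the first and second, can be sketched as follows (plain lists stand in for the image collections):

```python
def build_sample_sets(majority_images, minority_images):
    """Build the three training sets: the first holds majority group images,
    the second holds minority group images, and the third is their union."""
    first_set = list(majority_images)
    second_set = list(minority_images)
    third_set = first_set + second_set
    return first_set, second_set, third_set
```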
Step S204, pre-training the initial face recognition model by using the majority group face images in the first sample set to obtain a pre-trained face recognition model.
The initial face recognition model may be a face recognition model that is not trained by samples in the first sample set, the second sample set, or the third sample set. The pre-trained face recognition model may be a face recognition model trained on samples from the first set of samples.
In the application, a computer device obtains a first sample set, a second sample set and a third sample set, and pre-trains an initial face recognition model by using a majority of population face images in the first sample set to obtain a pre-trained face recognition model. The first sample set only comprises the face images of the majority groups, and the face images of the majority groups in the first sample set are used for pre-training the initial face recognition model, so that the face recognition model learns essential features for distinguishing the majority groups, and the accuracy of the face recognition model for recognizing the face images of the majority groups is ensured.
In one embodiment, the computer device pre-trains the initial face recognition model, updating model parameters of the feature extraction network and the classification network.
The feature extraction network is a model structure with the face image feature extraction capability through sample learning. The feature extraction network may be a convolutional neural network structure that may perform operations such as convolution calculations, nonlinear activation function calculations, pooling calculations, and the like. The input end of the feature extraction network is a face image, and the output end of the feature extraction network is the features of the face image. The face image features may be feature maps, feature vectors, and the like. The classification network is a model structure with classification regression capability through sample learning. The input end of the classification network is the face image characteristic, and the output end is the face recognition prediction result. It can be understood that the general network structure having the face image feature extraction capability and the classification regression capability can meet the requirements of the feature extraction network and the classification network in the embodiment of the present application, and therefore the embodiment of the present application can adopt the general network structure as the feature extraction network and the classification network.
In one embodiment, referring to fig. 4, fig. 4 is a schematic diagram of pre-training an initial face recognition model in one embodiment. The computer device obtains more than one majority group face image from the first sample set, inputs them into the initial face recognition model, extracts the face image features corresponding to the majority group face images through the feature extraction network in the initial face recognition model, and obtains the face recognition prediction results corresponding to the majority group face images based on those features through the classification network in the initial face recognition model. The computer device constructs a pre-training loss function according to the face recognition prediction results and the face label information corresponding to the majority group face images, determines the gradient of the current training round based on the pre-training loss function, updates the model parameters of the feature extraction network and the classification network according to the gradient, and returns to the step of obtaining more than one majority group face image from the first sample set to continue training until the training end condition is satisfied.
Step S206, iteratively obtaining a sample image set required for fine-tuning training from the first sample set, the second sample set and the third sample set. In the process of fine-tuning the pre-trained face recognition model with the sample images in the sample image set, the gradients corresponding to the sample images that belong to the third sample set and contain majority group faces are excluded, and the model parameters of the pre-trained face recognition model are then updated with the gradients corresponding to the remaining sample images in the sample image set; when iteration stops, a face recognition model for performing face recognition on both majority group face images and minority group face images is obtained.
In the application, the computer device obtains the first, second and third sample sets, pre-trains the initial face recognition model with the majority group face images in the first sample set to obtain a pre-trained face recognition model, then iteratively obtains a sample image set from the three sample sets and fine-tunes the pre-trained face recognition model with the sample images in that set. The pre-trained face recognition model learns features for distinguishing the majority group from the majority group face images of the first sample set, learns features for distinguishing the minority groups from the minority group face images of the second sample set, and learns features for distinguishing the majority group from the minority groups from the majority group and minority group face images of the third sample set; the majority group face images in the third sample set can be used to intervene in the pre-trained model's learning of minority-group features so as to expand the minority-group feature space.
In one embodiment, the computer device performs fine-tuning training on the pre-trained face recognition model, and updates model parameters of the feature extraction network and the classification network.
In one embodiment, referring to fig. 5, fig. 5 is a schematic diagram illustrating fine-tuning training of a pre-trained face recognition model in one embodiment. The computer device samples the first sample set, the second sample set and the third sample set to obtain a sample image set, and inputs the sample images in the set into the pre-trained face recognition model; the face image features corresponding to each sample image are extracted through the feature extraction network in the pre-trained face recognition model, and the face recognition prediction results corresponding to each sample image are obtained based on those features through the classification network. The computer device constructs a fine-tuning training loss function according to the face recognition prediction results and the face label information corresponding to the sample images, determines the gradient of the current training round based on the fine-tuning training loss function, excludes from that gradient the contributions corresponding to the sample images that belong to the third sample set and contain majority group faces, updates the model parameters of the pre-trained face recognition model with the gradients corresponding to the remaining sample images in the sample image set, and returns to the step of sampling the three sample sets to continue training until the training end condition is satisfied. Because the gradients corresponding to the third-set sample images containing majority group faces are excluded, those sample images are used only to expand the feature space of the minority groups and do not affect the updating of the model parameters.
In one embodiment, the computer device determines the gradient of the current training round based on the fine-tuning training loss function, and masks the portion of the gradient corresponding to the sample images that belong to the third sample set and contain majority group faces, so that this portion does not participate in updating the model parameters.
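The gradient masking described above can be sketched as follows, with each sample's gradient represented as a list of per-parameter values; the `sample_tags` argument, the `"third_set_majority"` tag and the learning rate are illustrative assumptions, not names from the patent.

```python
def masked_gradient_update(params, per_sample_grads, sample_tags, lr=0.01):
    """Average the per-sample gradients, masking out (dropping) those from
    samples tagged as third-set majority-group images, then apply one
    gradient-descent step to the parameters."""
    kept = [g for g, tag in zip(per_sample_grads, sample_tags)
            if tag != "third_set_majority"]
    if not kept:  # every gradient was masked: parameters stay unchanged
        return list(params)
    avg = [sum(vals) / len(kept) for vals in zip(*kept)]
    return [p - lr * g for p, g in zip(params, avg)]
```

Note that the masked samples still pass through the forward computation and the loss; only their contribution to the parameter update is removed, which is what lets them expand the minority-group feature space without otherwise steering the model.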
The inventor finds that the method provided by the embodiment of the application can improve the accuracy of the face recognition model in recognizing the face images of the majority group, and simultaneously greatly improve the accuracy of the face recognition model in recognizing the face images of the minority group.
In the processing method of the face recognition model, the initial face recognition model is pre-trained with the majority group face images in the first sample set to obtain a pre-trained face recognition model, which improves the recognition accuracy of the face recognition model on majority group face images. The sample images in the first, second and third sample sets are then used to fine-tune the pre-trained face recognition model: the model learns features for distinguishing the majority group from the majority group face images of the first sample set, learns features for distinguishing the minority groups from the minority group face images of the second sample set, and learns features for distinguishing the majority group from the minority groups from the majority group and minority group face images of the third sample set. Because the third sample set contains majority group face images, it can be used to intervene in the pre-trained model's learning of minority-group features, expanding the minority-group feature space, reducing the similarity computed by the model between different minority-group persons, and improving the recognition accuracy for minority group face images. The method thus improves the recognition accuracy of the face recognition model on majority group face images while also greatly improving its recognition accuracy on minority group face images.
In one embodiment, referring to fig. 6, pre-training the initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model includes:
Step S602, majority group face images and corresponding face label information are obtained from the first sample set.
The face label information may be an identity corresponding to a face in the face image. The identity identifier is used for uniquely identifying the user and can be composed of at least one of letters, numbers and characters.
In one embodiment, the computer device obtains more than one majority group face images from the first sample set, and pre-trains the initial face recognition model once with the more than one majority group face images as a group.
Step S604, inputting the majority group face images acquired from the first sample set into the initial face recognition model to obtain face recognition prediction results.
In one embodiment, the computer device extracts the face image features corresponding to the majority group face images through the feature extraction network in the initial face recognition model, and obtains the face recognition prediction results corresponding to the majority group face images based on those features through the classification network in the initial face recognition model.
The face image features are data for reflecting face features in the face image. The face features are physiological features inherent to the face, such as iris form, positional relationship between facial organs (eyes, nose, mouth, ears, etc.), structure (shape, size, etc.) of the facial organs, skin texture, and the like.
In one embodiment, the facial image features may specifically be one or a combination of several of position information, texture information, shape information, color information, and the like extracted from the facial image and related to the facial features. Taking the position information as an example, the position information may refer to distances, angles, and the like between various facial organs such as eyes, a nose, a mouth, ears, and the like.
In one embodiment, the computer device extracts the face image features corresponding to each majority group face image through the feature extraction network in the initial face recognition model, transmits those features to the classification network in the initial face recognition model, and performs classification regression processing on them through the classification network to obtain the face recognition prediction result corresponding to each majority group face image. The face recognition prediction result may be a probability vector whose dimension matches the number of identity identifiers in the training samples, where the value of each dimension represents the probability that the face in the face image corresponds to that identity identifier.
Step S606, a pre-training loss function is constructed according to the face label information and the face recognition prediction result.
In one embodiment, the computer device constructs the pre-training loss function according to the difference between the face label information and the face recognition prediction result corresponding to each majority group face image. The pre-training loss function may be a Softmax function, a Contrastive Loss function, a Triplet Loss function, a Center Loss function, a Margin-based function, or the like.
In one embodiment, the computer device converts the face label information corresponding to each majority group face image into a label vector consistent with the dimension of the probability vector, and then constructs the pre-training loss function based on the difference between the label vector and the probability vector corresponding to each majority group face image.
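The label-vector construction and the loss built on its difference from the probability vector can be sketched as follows; cross-entropy is used here as one concrete instance of the loss functions listed above, not as the patent's prescribed choice.

```python
import math

def one_hot(identity_index, num_identities):
    """Convert a face label (identity index) into a label vector whose
    dimension matches the probability vector."""
    vec = [0.0] * num_identities
    vec[identity_index] = 1.0
    return vec

def cross_entropy(label_vec, prob_vec, eps=1e-12):
    """Cross-entropy between the label vector and the predicted probability
    vector; small when the prediction agrees with the label."""
    return -sum(t * math.log(p + eps) for t, p in zip(label_vec, prob_vec))
```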
Step S608, after the model parameters that minimize the pre-training loss function are taken as the updated model parameters of the initial face recognition model, return to the step of obtaining majority group face images from the first sample set to continue training until the training end condition is satisfied.
In one embodiment, the computer device obtains the gradient of the current training round in the direction that minimizes the pre-training loss function based on a gradient descent algorithm, and updates the model parameters of the feature extraction network and the classification network according to the gradient. The gradient descent algorithm may be a stochastic gradient descent algorithm, or an algorithm optimized on the basis of stochastic gradient descent, such as stochastic gradient descent with a momentum term.
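A single update step of stochastic gradient descent with a momentum term, as mentioned above, can be sketched as follows; the learning rate and momentum values are illustrative.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One parameter update of SGD with momentum:
    v <- momentum * v + grad, then p <- p - lr * v."""
    new_v = [momentum * v + g for v, g in zip(velocity, grads)]
    new_p = [p - lr * v for p, v in zip(params, new_v)]
    return new_p, new_v
```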
In one embodiment, the training end condition may be that the training times reach a preset number, or that a loss value calculated by the pre-training loss function is smaller than a preset value, or the like.
In this embodiment, the initial face recognition model is pre-trained by using the face images of the majority groups in the first sample set, so that the face recognition model learns essential features for distinguishing the majority groups, thereby ensuring the accuracy of the face recognition model in recognizing the face images of the majority groups.
In one embodiment, obtaining a set of sample images required for fine-tuning training from the first set of samples, the second set of samples, and the third set of samples includes: acquiring a preset sampling proportion; and respectively sampling the first sample set, the second sample set and the third sample set according to a sampling proportion to obtain a sample image set required by fine tuning training.
In one embodiment, the computer device samples the first sample set, the second sample set and the third sample set according to a preset sampling proportion, takes the majority group face images sampled from the first sample set as a first subset, the minority group face images sampled from the second sample set as a second subset, and the majority group and/or minority group face images sampled from the third sample set as a third subset, and merges the three subsets into the sample image set. The computer device then inputs the sample images in the sample image set into the pre-trained face recognition model for multi-task fine-tuning training. Through a large number of experiments, the inventor found that a good training effect can be achieved when the ratio of sample images in the first, second and third subsets is 2:1:3.
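The proportional sampling described above can be sketched as follows, using the 2:1:3 ratio the inventor reports; the batch size and the use of `random.sample` are illustrative assumptions.

```python
import random

def sample_batch(first_set, second_set, third_set, batch_size=60,
                 ratio=(2, 1, 3), seed=None):
    """Build one fine-tuning sample image set by drawing from the three
    sample sets at a fixed ratio (2:1:3 reported as working well)."""
    rng = random.Random(seed)
    total = sum(ratio)
    sizes = [batch_size * r // total for r in ratio]
    batch = []
    for subset, k in zip((first_set, second_set, third_set), sizes):
        batch.extend(rng.sample(subset, k))
    return batch
```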
In one embodiment, referring to fig. 7, in the process of performing fine tuning training on the pre-trained face recognition model by using the sample images in the sample image set, after excluding gradients corresponding to sample images belonging to the third sample set and including a majority of human faces, updating model parameters of the pre-trained face recognition model by using gradients corresponding to remaining sample images in the sample image set, including:
step 702, performing fine tuning training on the pre-trained face recognition model by using each sample image in the sample image set, and acquiring a gradient corresponding to each sample image in the fine tuning training process.
In one embodiment, the fine tuning training of the pre-trained face recognition model is performed by using each sample image in the sample image set, and the obtaining of the gradient corresponding to each sample image in the fine tuning training process includes: inputting each sample image in the sample image set into a pre-training face recognition model to obtain a face recognition prediction result corresponding to each sample image; constructing a fine-tuning training loss function according to the face label information and the face recognition prediction result corresponding to each sample image; and obtaining the corresponding gradient of each sample image when the fine training loss function is minimized.
In one embodiment, inputting each sample image in the sample image set into a pre-training face recognition model to obtain a face recognition prediction result corresponding to each sample image, includes: extracting the face image characteristics corresponding to each sample image through a characteristic extraction network in the pre-training face recognition model; and obtaining a face recognition prediction result corresponding to each sample image based on the face image characteristics through a classification network in the pre-training face recognition model.
In one embodiment, the computer device constructs the fine-tuning training loss function according to the difference between the face label information and the face recognition prediction result corresponding to each sample image. The fine-tuning training loss function may be, for example, a Softmax loss, a Contrastive loss, a Triplet loss, a Center loss, or a margin-based loss.
In one embodiment, constructing a fine-tuning training loss function according to face label information and a face recognition prediction result corresponding to each sample image includes: constructing a first loss according to face label information and a face recognition prediction result corresponding to a sample image belonging to a first sample set in the sample image set; constructing a second loss according to the face label information and the face identification prediction result corresponding to the sample image belonging to the second sample set in the sample image set; constructing a third loss according to the face label information and the face identification prediction result corresponding to the sample image belonging to the third sample set in the sample image set; acquiring a preset loss weighting coefficient; and summing the first loss, the second loss and the third loss according to the loss weighting coefficient to obtain a fine tuning training loss function.
The first loss, the second loss and the third loss may each be a loss function such as a Softmax loss, a Contrastive loss, a Triplet loss, a Center loss, or a margin-based loss.
In one embodiment, separate loss weighting coefficients may be set for the sample images belonging to the first sample set (referred to as the first subset), the sample images belonging to the second sample set (the second subset), and the sample images belonging to the third sample set (the third subset). Through a large number of experiments, the inventors found that a better training effect can be achieved when the ratio of the loss weighting coefficients of the first, second and third subsets is 2:3:1.
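The weighted combination of the three subset losses can be sketched as follows; this is a minimal illustration with assumed names, not the patent's implementation:

```python
def subset_loss(per_sample_losses):
    """Mean loss over one subset of sample images."""
    return sum(per_sample_losses) / len(per_sample_losses)

def finetune_loss(loss_first, loss_second, loss_third, weights=(2.0, 3.0, 1.0)):
    """Weighted sum of the three subset losses. The default 2:3:1
    weighting is the coefficient ratio the inventors report as giving
    a better training effect."""
    w1, w2, w3 = weights
    return w1 * loss_first + w2 * loss_second + w3 * loss_third
```

Each of the three inputs would itself be a Softmax, Contrastive, Triplet, Center or margin-based loss computed over its subset.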
In one embodiment, the fine-tuning training loss function may be represented by the following equation:

$$L(\theta) = \sum_{i=1}^{S} \lambda_i L_i \qquad (1)$$

where $L(\theta)$ is the fine-tuning training loss function; $\theta_j$ is the $j$-th model parameter of the face recognition model; $S$ is the number of subsets in the sample image set; $\lambda_i$ is the loss weighting coefficient corresponding to the $i$-th subset; and $L_i$ is the loss corresponding to the $i$-th subset.
In one embodiment, the computer device obtains the gradient corresponding to the current round of training in the direction that minimizes the fine-tuning training loss function, based on a gradient descent algorithm, and determines the gradient corresponding to each sample image from these gradients. The gradient descent algorithm may be stochastic gradient descent, or an algorithm optimized on the basis of stochastic gradient descent, such as stochastic gradient descent with a momentum term.
For example, the loss corresponding to the $i$-th subset may be expressed by the following formula:

$$L_i = \frac{1}{n_i} \sum_{k=1}^{n_i} \ell\left(\hat{y}_{i,k},\, y_{i,k}\right) \qquad (2)$$

where $n_i$ is the number of sample images in the $i$-th subset; $x_{i,k}$ denotes the face image features corresponding to the $k$-th sample image in the $i$-th subset; $\hat{y}_{i,k} = f(x_{i,k}; \theta)$ is the face recognition prediction result corresponding to the $k$-th sample image in the $i$-th subset; $y_{i,k}$ is the face label information corresponding to the $k$-th sample image in the $i$-th subset; and $\ell(\cdot, \cdot)$ denotes the per-sample loss function.

Combining equation (1) and equation (2), the fine-tuning training loss function can be expressed as:

$$L(\theta) = \sum_{i=1}^{S} \frac{\lambda_i}{n_i} \sum_{k=1}^{n_i} \ell\left(\hat{y}_{i,k},\, y_{i,k}\right) \qquad (3)$$

For each model parameter $\theta_j$, the computer device calculates the partial derivative of the fine-tuning training loss function to obtain the gradient corresponding to this round of training:

$$\frac{\partial L(\theta)}{\partial \theta_j} = \sum_{i=1}^{S} \frac{\lambda_i}{n_i} \sum_{k=1}^{n_i} \frac{\partial\, \ell\left(\hat{y}_{i,k},\, y_{i,k}\right)}{\partial \theta_j} \qquad (4)$$

Since this gradient decomposes into per-sample terms, the computer device can determine the gradient corresponding to each sample image from the gradients corresponding to this round of training.
Step 704, after excluding, from the gradients corresponding to the sample images, the gradients corresponding to sample images that belong to the third sample set and include majority-group faces, updating the model parameters of the pre-trained face recognition model by using the gradients corresponding to the remaining sample images in the sample image set.
Referring to fig. 8, fig. 8 is a diagram illustrating fine-tuning training of a pre-trained model according to an embodiment. Assume that sample image 4 and sample image 5 are the sample images in the sample image set that belong to the third sample set and include majority-group faces. As shown, the computer device extracts the face image features corresponding to each sample image and obtains the face recognition prediction result corresponding to each sample image based on those features. It constructs a first loss from the face label information and prediction results of the first subset, a second loss from those of the second subset, and a third loss from those of the third subset, then sums the three losses according to the loss weighting coefficients to obtain the fine-tuning training loss function. The gradient corresponding to this round of training is obtained with a gradient descent algorithm in the direction that minimizes the fine-tuning training loss function, and the gradient corresponding to each sample image is determined from it. After the gradients corresponding to sample images 4 and 5 are eliminated, the model parameters of the pre-trained face recognition model are updated by using the gradients corresponding to sample images 1, 2, 3 and 6.
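The gradient-exclusion update above can be sketched in Python. This is a minimal illustration under strong assumptions, not the patent's implementation: a toy squared-error linear model stands in for the face recognition network, and the point is only that zeroing a sample's per-sample gradient removes it from the parameter update:

```python
import numpy as np

def masked_update(w, x, y, include_mask, lr=0.1):
    """One gradient-descent step that uses only the gradients of the
    retained samples (include_mask False marks third-set majority-group
    faces, whose gradients are excluded)."""
    pred = x @ w
    # per-sample gradients d(loss_k)/dw for loss_k = (pred_k - y_k)^2
    per_sample_grad = 2.0 * (pred - y)[:, None] * x
    keep = include_mask[:, None].astype(float)
    n_kept = max(int(include_mask.sum()), 1)
    grad = (per_sample_grad * keep).sum(axis=0) / n_kept
    return w - lr * grad
```

In the real method the excluded samples still pass through the network and shape the loss landscape of the other samples; this toy loss has independent per-sample terms, so here exclusion is exactly equivalent to dropping the rows.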
In this embodiment, the pre-trained face recognition model is fine-tuned using sample images from the first, second and third sample sets. The model learns features for distinguishing majority groups from the majority-group face images of the first sample set, features for distinguishing minority groups from the minority-group face images of the second sample set, and features for distinguishing majority groups from minority groups from the mixed images of the third sample set. The majority-group face images in the third sample set intervene in the model's learning of minority-group features, expanding the feature space of the minority groups; this reduces the similarity the face recognition model computes between different persons of a minority group and improves recognition accuracy on minority-group face images.
In one embodiment, the method further includes two alternating stages. First, the model parameters of the feature extraction network of the pre-trained face recognition model are fixed; a sample image set required for fine-tuning training is iteratively obtained from the first, second and third sample sets; the classification network of the pre-trained face recognition model is fine-tuned with the sample images in that set; and, after excluding the gradients corresponding to sample images that belong to the third sample set and include majority-group faces, the model parameters of the classification network are updated with the gradients corresponding to the remaining sample images in the sample image set. When this iteration stops, the model parameters of the classification network are fixed; a sample image set required for fine-tuning training is again iteratively obtained from the three sample sets; the feature extraction network is fine-tuned with the sample images in that set; and, after excluding the gradients corresponding to sample images that belong to the third sample set and include majority-group faces, the model parameters of the feature extraction network are updated with the gradients corresponding to the remaining sample images in the sample image set.
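The alternating freeze-and-update schedule can be sketched as follows. This is a hedged illustration with assumed names (`grad_fn`, the two parameter-group keys), not the patent's implementation; real code would freeze sub-networks of a neural model rather than entries of a dict:

```python
def alternating_finetune(params, grad_fn, lr=0.1, steps=5):
    """Stage 1: feature-extraction parameters frozen, classifier updated.
    Stage 2: classifier frozen, feature extractor updated.
    `grad_fn(params)` returns a gradient dict keyed like `params`."""
    def run(trainable):
        for _ in range(steps):
            grads = grad_fn(params)
            for name in trainable:          # only unfrozen groups move
                params[name] = params[name] - lr * grads[name]
    run(trainable=["classifier"])           # feature extractor frozen
    run(trainable=["feature_extractor"])    # classifier frozen
    return params
```

In a framework such as PyTorch the same effect is obtained by toggling the trainability of each sub-network's parameters between the two stages.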
In this embodiment, after the model parameters of the classification network are updated, the model parameters of the feature extraction network are updated to improve the training effect and the training efficiency.
In one embodiment, in the process of performing fine-tuning training on the pre-trained face recognition model by using the sample images in the sample image set, after excluding gradients corresponding to sample images that belong to the third sample set and include majority-group faces, updating the model parameters of the pre-trained face recognition model by using the gradients corresponding to the remaining sample images includes: when the sample images sampled from the third sample set include minority-group face images, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images that belong to the third sample set and include minority-group faces, the sample images belonging to the first sample set, and the sample images belonging to the second sample set; and when the sample images sampled from the third sample set include no minority-group face images, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images belonging to the first sample set and the sample images belonging to the second sample set.
Specifically, when the computer device samples the first, second and third sample sets, since the third sample set contains both majority-group and minority-group face images, three cases can arise for the samples drawn from the third sample set: (1) the computer device samples only majority-group face images from the third sample set; (2) the computer device samples only minority-group face images from the third sample set; (3) the computer device samples both majority-group and minority-group face images from the third sample set. The computer device takes different parameter update measures for the different cases. For case (1), it updates the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images belonging to the first sample set and the second sample set. For cases (2) and (3), it updates the model parameters using the gradients corresponding to the sample images that belong to the third sample set and include minority-group faces, together with the sample images belonging to the first and second sample sets. In this way, the sample images from the third sample set that include majority-group faces are only used to expand the feature space of the minority groups and do not affect the update of the model parameters.
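The three cases above reduce to one rule per sample, which can be sketched as a mask function (names are illustrative, not from the patent):

```python
def gradient_include_mask(origins, is_minority):
    """True where a sample's gradient should be kept for the parameter
    update. A sample is excluded only when it was drawn from the third
    sample set AND shows a majority-group face; third-set minority-group
    faces (cases (2) and (3)) are kept, as are all first- and second-set
    samples. `origins` holds 1, 2 or 3 for the source sample set."""
    return [not (origin == 3 and not minority)
            for origin, minority in zip(origins, is_minority)]
```

Such a mask feeds directly into a masked gradient update: only samples with a True entry contribute to the parameter step.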
In one embodiment, the method further comprises: acquiring a face image to be verified and a to-be-verified identity corresponding to the face image to be verified, wherein the face image to be verified comprises a majority group face or a minority group face; obtaining the image characteristics to be verified corresponding to the face image to be verified through the face recognition model obtained after fine tuning training; acquiring reference image characteristics corresponding to the identity to be verified; and when the similarity between the image feature to be verified and the reference image feature exceeds a threshold value, determining that the face image to be verified passes identity verification.
The face image to be verified is an image to be subjected to identity verification. The identity authentication is to verify whether the face in the face image to be authenticated matches the identity to be authenticated. The image feature to be verified is data used for reflecting the face feature in the face image to be verified. The reference image features are data for reflecting face features in a reference face image, and the reference face image in the application scene is a pre-stored face image matched with the identity to be verified.
The face recognition model trained by the method provided in the embodiments of this application can be applied to one-to-one identity verification scenarios. In a one-to-one identity verification scenario, with the identity to be verified known, the image features of the face image to be verified are compared with the reference image features corresponding to that identity, to verify whether the identity corresponding to the face in the face image to be verified is the identity to be verified. Examples include identity verification scenarios, terminal screen unlocking scenarios, suspect identification scenarios, and so on.
Specifically, the computer device obtains the image features to be verified corresponding to the face image to be verified through the trained face recognition model, obtains the reference image features corresponding to the identity to be verified, and determines that the face image to be verified passes the identity verification when the similarity between the image features to be verified and the reference image features exceeds a threshold value.
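The 1:1 comparison step can be sketched as follows. This is a minimal sketch assuming the features are numpy vectors compared by cosine similarity; the similarity measure and the 0.6 threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def verify_identity(feat_to_verify, reference_feat, threshold=0.6):
    """1:1 verification: cosine similarity between the feature of the face
    image to be verified and the stored reference feature of the claimed
    identity; verification passes when the similarity exceeds the threshold."""
    a = feat_to_verify / np.linalg.norm(feat_to_verify)
    b = reference_feat / np.linalg.norm(reference_feat)
    return float(a @ b) > threshold
```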
In one embodiment, the computer device may capture an image of a living body in a real scene as a face image to be verified. The computer equipment can firstly identify the living body based on the face image to be verified so as to identify whether the acquisition object corresponding to the face image to be verified is the living body, and then carry out identity verification on the face image to be verified when the acquisition object corresponding to the face image to be verified is judged to be the living body.
The embodiment can be applied to a one-to-one identity verification scene, and the accuracy of face recognition of various groups in the one-to-one identity verification scene is improved.
In one embodiment, the method further comprises: acquiring a face image to be matched, wherein the face image to be matched comprises a majority group face or a minority group face; acquiring the image characteristics to be matched corresponding to the face image to be matched through a face recognition model obtained after fine tuning training; and matching the image features to be matched with at least one reference image feature, and taking the target identity corresponding to the reference image feature with the maximum matching degree as the identity of the face in the face image to be matched.
The face image to be matched is an image to be subjected to identity recognition. The identity recognition is to recognize the identity corresponding to the face in the face image to be matched. The image features to be matched are data used for reflecting the face features in the face images to be matched. The reference image features are data for reflecting face features in a reference face image, and the reference face image in the application scene is a pre-stored face image in a database.
The face recognition model trained by the method provided in the embodiments of this application can also be applied to one-to-many identity recognition scenarios. In a one-to-many identity recognition scenario, the image features of the face image to be matched are compared with the reference image features pre-stored in a database, so as to determine, from the identities corresponding to those reference image features, the identity corresponding to the face in the face image to be matched. Examples include face payment scenarios, traffic security check scenarios, workplace attendance scenarios, missing-person search scenarios, and so on.
Specifically, the computer device obtains image features to be matched corresponding to the face image to be matched through the trained face recognition model, matches the image features to be matched with at least one reference image feature, and takes the target identity corresponding to the reference image feature with the maximum matching degree as the identity of the face in the face image to be matched.
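The 1:N matching step can be sketched as follows, again assuming cosine similarity over numpy feature vectors; the gallery layout and names are assumptions for the example:

```python
import numpy as np

def identify(feat_to_match, gallery):
    """1:N identification: match the feature of the face image to be
    matched against every reference feature and return the identity with
    the highest similarity. `gallery` maps identity -> reference feature."""
    q = feat_to_match / np.linalg.norm(feat_to_match)
    best_id, best_sim = None, -1.0
    for identity, feat in gallery.items():
        sim = float(q @ (feat / np.linalg.norm(feat)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id, best_sim
```

A production system would typically also reject the match when even the best similarity falls below a threshold, so that unknown faces are not forced onto an enrolled identity.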
The embodiment can be applied to one-to-many identity recognition scenes, and the accuracy of face recognition of various groups in one-to-many identity recognition scenes is improved.
In an embodiment, as shown in fig. 9, a method for processing a face recognition model is provided, and this embodiment is mainly illustrated by applying the method to a computer device (the terminal 102 or the server 104 in fig. 1 above), and includes the following steps:
step S902, a first sample set, a second sample set and a third sample set are obtained, where samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set includes majority group face images and minority group face images.
Step S904, obtaining a plurality of groups of face images and corresponding face label information from the first sample set, extracting face image features corresponding to each plurality of groups of face images through a feature extraction network in the initial face recognition model, and obtaining a face recognition prediction result corresponding to each plurality of groups of face images based on the face image features through a classification network in the initial face recognition model.
Step S906, constructing a pre-training loss function according to the difference between the face label information and the face recognition prediction result, taking the model parameter when the pre-training loss function is minimized as the updated model parameter of the initial face recognition model, returning to the step of obtaining the face images of most groups from the first sample set to continue training until the training end condition is met, and obtaining the pre-training face recognition model.
Step S908, respectively sampling the first sample set, the second sample set, and the third sample set according to a sampling ratio, and obtaining a sample image set required by the fine tuning training.
Step S910, inputting each sample image in the sample image set into a pre-training face recognition model, extracting face image features corresponding to each sample image through a feature extraction network in the pre-training face recognition model, and obtaining a face recognition prediction result corresponding to each sample image based on the face image features through a classification network in the pre-training face recognition model.
Step S912, constructing a first loss according to the face label information and the face recognition prediction result corresponding to the sample image belonging to the first sample set in the sample image set, constructing a second loss according to the face label information and the face recognition prediction result corresponding to the sample image belonging to the second sample set in the sample image set, constructing a third loss according to the face label information and the face recognition prediction result corresponding to the sample image belonging to the third sample set in the sample image set, obtaining a preset loss weighting coefficient, and summing the first loss, the second loss, and the third loss according to the loss weighting coefficient to obtain a fine-tuning training loss function.
Step S914, obtaining the gradient corresponding to each sample image when the fine-tuning training loss function is minimized; after eliminating, from these gradients, the gradients corresponding to sample images that belong to the third sample set and include majority-group faces, updating the model parameters of the pre-trained face recognition model by using the gradients corresponding to the remaining sample images in the sample image set; and returning to the step of sampling the first, second and third sample sets according to the sampling proportion to continue training until the training end condition is satisfied, obtaining a face recognition model for performing face recognition on majority-group face images and minority-group face images.
In this embodiment, majority-group samples form the majority of the training samples of the face recognition model and minority-group samples the minority, so the model can learn a large number of essential features for distinguishing majority groups from the majority-group samples, but only a small number of shallow features for distinguishing minority groups from the minority-group samples, which leads to a high similarity between different persons of a minority group as computed by the model. When the model tries to distinguish majority-group samples from minority-group samples, however, it can learn essential features for distinguishing minority groups under the intervention of the majority-group samples. Therefore, the initial face recognition model is first pre-trained with the majority-group face images in the first sample set, so that it learns essential features for distinguishing majority groups, which ensures its recognition accuracy on majority-group face images. The pre-trained face recognition model is then fine-tuned with the first, second and third sample sets: it learns features for distinguishing majority groups from the majority-group face images of the first sample set, features for distinguishing minority groups from the minority-group face images of the second sample set, and features for distinguishing majority groups from minority groups from the mixed images of the third sample set. The majority-group face images in the third sample set intervene in the model's learning of the essential features for distinguishing minority groups, which improves the accuracy with which the model distinguishes minority-group face images. The processing method of the face recognition model thus improves the recognition accuracy of the model on majority-group face images while also greatly improving its recognition accuracy on minority-group face images.
In an embodiment, as shown in fig. 10, a face recognition method is provided, and this embodiment is mainly illustrated by applying the method to a computer device (the terminal 102 or the server 104 in fig. 1 above), and includes the following steps:
step 1002, a face image to be recognized is obtained, wherein the face image to be recognized comprises at least one of a majority group face and a minority group face.
The face image to be recognized is an image to be subjected to face recognition. The face image to be recognized may include one or at least two faces to be recognized. The computer equipment can identify one or at least two faces to be identified in the face images to be identified based on the face images to be identified.
Specifically, the terminal can acquire an image of a real scene through a built-in camera. The terminal can also acquire images of a real scene through an external camera which is associated with the terminal. For example, the terminal may be connected to the image capturing device through a connection line or a network, and the image capturing device captures an image of a real scene through the camera and transmits the captured image to the terminal. The cameras may be monocular cameras, binocular cameras, depth cameras, 3D (three dimensional) cameras, and the like. The terminal can collect images of living bodies in a real scene, and can also collect existing images containing human faces in the real scene, such as identity document scanning pieces and the like.
Step 1004, performing face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized. The face recognition model is obtained by pre-training an initial face recognition model with the majority-group face images in a first sample set to obtain a pre-trained face recognition model, and then fine-tuning the pre-trained model with sample images in a sample image set obtained from the first sample set, a second sample set and a third sample set; during the fine-tuning training, gradients corresponding to sample images that belong to the third sample set and include majority-group faces are eliminated, and the model parameters of the pre-trained face recognition model are updated with the gradients corresponding to the remaining sample images in the sample image set. The samples in the first sample set are majority-group face images, the samples in the second sample set are minority-group face images, and the third sample set includes both majority-group and minority-group face images.
In one embodiment, the computer device obtains the face image characteristics corresponding to the face image to be recognized through the trained face recognition model, and obtains the face recognition result corresponding to the face image to be recognized based on the face image characteristics.
In an embodiment, the face recognition result may be an identity corresponding to a face in a face image to be recognized, for example, in a one-to-many identity recognition scenario, the computer device obtains, through a trained face recognition model, a face image feature corresponding to the face image to be recognized, matches the face image feature with at least one reference image feature, and uses an identity corresponding to the reference image feature with the largest matching degree as an identity of the face in the face image to be recognized. In other embodiments, the face recognition result may be an authentication result corresponding to a face in the face image to be recognized. For example, in a one-to-one identity authentication scene, the computer device obtains the face image features corresponding to the face image to be recognized through the trained face recognition model, obtains the reference image features corresponding to the identity to be verified, and determines that the face image to be recognized passes the identity authentication when the similarity between the image features to be verified and the reference image features exceeds a threshold value.
For the training method of the face recognition model, reference may be made to the above embodiments, which are not described herein again.
In the face recognition method, performing face recognition through the trained face recognition model can improve the accuracy of face recognition for various groups. The initial face recognition model is pre-trained with the majority-group face images in the first sample set to obtain a pre-trained face recognition model, which improves its recognition accuracy on majority-group face images. The pre-trained model is then fine-tuned with sample images from the first, second and third sample sets: it learns features for distinguishing majority groups from the majority-group face images of the first sample set, features for distinguishing minority groups from the minority-group face images of the second sample set, and features for distinguishing majority groups from minority groups from the mixed images of the third sample set. The majority-group face images in the third sample set intervene in the model's learning of minority-group features, expanding the feature space of the minority groups, reducing the similarity the model computes between different persons of a minority group, and improving recognition accuracy on minority-group face images. In this way, the recognition accuracy of the face recognition model on majority-group face images is improved while its recognition accuracy on minority-group face images is also greatly improved.
It should be understood that, although the steps in the flowcharts of fig. 2, 6-7, and 9-10 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2, 6-7, and 9-10 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided an apparatus for processing a face recognition model. The apparatus may be implemented as a software module, a hardware module, or a combination of the two, as a part of a computer device, and specifically includes: an acquisition module 1102, a pre-training module 1104, and a fine-tuning training module 1106, wherein:
an obtaining module 1102, configured to obtain a first sample set, a second sample set, and a third sample set, where samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set includes majority group face images and minority group face images;
a pre-training module 1104, configured to pre-train the initial face recognition model using the majority of group face images in the first sample set to obtain a pre-trained face recognition model;
a fine-tuning training module 1106, configured to iteratively obtain a sample image set required for fine-tuning training from the first sample set, the second sample set, and the third sample set; during the fine-tuning training of the pre-trained face recognition model using the sample images in the sample image set, exclude the gradients corresponding to the sample images that belong to the third sample set and include majority group faces, and then update the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set; and, when iteration stops, obtain a face recognition model for performing face recognition on majority group face images and minority group face images.
In one embodiment, the pre-training module 1104 is further configured to: acquire majority group face images and corresponding face label information from the first sample set; input the majority group face images acquired from the first sample set into the initial face recognition model to obtain face recognition prediction results; construct a pre-training loss function according to the face label information and the face recognition prediction results; and, after taking the model parameters that minimize the pre-training loss function as the updated model parameters of the initial face recognition model, return to the step of acquiring majority group face images from the first sample set to continue training until a training end condition is met.
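The iterative pre-training loop described above (fetch majority group samples, minimize the pre-training loss, repeat until the end condition is met) can be sketched as plain gradient descent. This is a minimal sketch under assumed interfaces: `get_batch` and `compute_grad` are hypothetical callables, and the fixed step budget stands in for the patent's unspecified training end condition.

```python
def pretrain(get_batch, compute_grad, params, steps=100, lr=0.1):
    # Repeatedly fetch a batch of majority-group samples and take one
    # gradient step on the pre-training loss; the fixed step budget
    # stands in for an unspecified training-end condition.
    for _ in range(steps):
        batch = get_batch()
        grads = compute_grad(params, batch)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params
```

For example, with a toy loss (p - 2)^2 whose gradient is 2(p - 2), the loop drives the parameter toward 2.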
In one embodiment, the pre-training module 1104 is further configured to: extract the face image features corresponding to the majority group face images through a feature extraction network in the initial face recognition model; and obtain the face recognition prediction results corresponding to the majority group face images based on the face image features through a classification network in the initial face recognition model.
In one embodiment, the fine-tuning training module 1106 is further configured to: acquire a preset sampling proportion; and sample the first sample set, the second sample set, and the third sample set according to the sampling proportion to obtain the sample image set required for fine-tuning training.
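The proportional sampling step can be sketched as follows; the ratio values and function name are illustrative assumptions, not values from the patent.

```python
import random

def sample_batch(first_set, second_set, third_set, ratios=(0.5, 0.25, 0.25), batch_size=8):
    # Draw a fine-tuning batch from the three sample sets according to a
    # preset sampling proportion (the ratios here are illustrative and
    # assumed to sum to 1).
    counts = [int(batch_size * r) for r in ratios]
    counts[0] += batch_size - sum(counts)  # give any rounding remainder to the first set
    return (random.sample(first_set, counts[0])
            + random.sample(second_set, counts[1])
            + random.sample(third_set, counts[2]))
```

Each element of the returned batch keeps whatever annotation its source set carries, so later steps can still tell which set a sample image came from.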
In one embodiment, the fine-tuning training module 1106 is further configured to: perform fine-tuning training on the pre-trained face recognition model using each sample image in the sample image set, and obtain the gradient corresponding to each sample image during the fine-tuning training; and, after excluding, from the gradients corresponding to the sample images, the gradients corresponding to the sample images that belong to the third sample set and include majority group faces, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set.
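The gradient-exclusion step can be sketched as a mask over per-sample gradients; the `(set_id, is_majority)` tags are assumed per-sample annotations, and the update shown is plain averaged gradient descent rather than the patent's exact optimizer.

```python
def masked_update(params, per_sample_grads, sample_tags, lr=0.1):
    # sample_tags[i] = (set_id, is_majority); gradients of third-set
    # samples containing majority-group faces are excluded before the
    # parameters of the pre-trained model are updated.
    kept = [g for g, (set_id, is_majority) in zip(per_sample_grads, sample_tags)
            if not (set_id == 3 and is_majority)]
    if not kept:
        return params  # nothing left to update with
    avg = [sum(components) / len(kept) for components in zip(*kept)]
    return [p - lr * g for p, g in zip(params, avg)]
```

Excluded samples still participate in the forward pass and loss; only their gradient contribution to the parameter update is dropped.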
In one embodiment, the fine-tuning training module 1106 is further configured to: input each sample image in the sample image set into the pre-trained face recognition model to obtain the face recognition prediction result corresponding to each sample image; construct a fine-tuning training loss function according to the face label information and the face recognition prediction result corresponding to each sample image; and obtain the gradient corresponding to each sample image when the fine-tuning training loss function is minimized.
In one embodiment, the fine-tuning training module 1106 is further configured to: extract the face image features corresponding to each sample image through a feature extraction network in the pre-trained face recognition model; and obtain the face recognition prediction result corresponding to each sample image based on the face image features through a classification network in the pre-trained face recognition model.
In one embodiment, the fine-tuning training module 1106 is further configured to: construct a first loss according to the face label information and the face recognition prediction results corresponding to the sample images belonging to the first sample set in the sample image set; construct a second loss according to the face label information and the face recognition prediction results corresponding to the sample images belonging to the second sample set; construct a third loss according to the face label information and the face recognition prediction results corresponding to the sample images belonging to the third sample set; acquire preset loss weighting coefficients; and sum the first loss, the second loss, and the third loss according to the loss weighting coefficients to obtain the fine-tuning training loss function.
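The weighted combination of the three per-set losses can be sketched as follows, using mean cross-entropy as a stand-in for the patent's unspecified per-sample loss; the equal weights are illustrative assumptions.

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the labelled class.
    return -math.log(probs[label])

def fine_tuning_loss(samples, weights=(1.0, 1.0, 1.0)):
    # samples: list of (set_id, predicted_probs, label).  The first,
    # second, and third losses are the mean cross-entropies over samples
    # from the corresponding sets, summed with preset weighting
    # coefficients (the equal weights here are illustrative).
    per_set = {1: [], 2: [], 3: []}
    for set_id, probs, label in samples:
        per_set[set_id].append(cross_entropy(probs, label))
    set_losses = [sum(v) / len(v) if v else 0.0
                  for v in (per_set[1], per_set[2], per_set[3])]
    return sum(w * l for w, l in zip(weights, set_losses))
```

Raising one weighting coefficient relative to the others shifts the fine-tuning toward the corresponding sample set.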
In one embodiment, the fine-tuning training module 1106 is further configured to: when the sample images sampled from the third sample set in the sample image set include minority group face images, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images that belong to the third sample set and include minority group faces, the sample images belonging to the first sample set, and the sample images belonging to the second sample set; and, when the sample images sampled from the third sample set in the sample image set do not include minority group face images, update the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images belonging to the first sample set and the sample images belonging to the second sample set.
In one embodiment, the processing apparatus of the face recognition model further includes a face recognition module configured to: acquire a face image to be verified and an identity to be verified corresponding to the face image to be verified, where the face image to be verified includes a majority group face or a minority group face; obtain the image features to be verified corresponding to the face image to be verified through the face recognition model obtained after fine-tuning training; acquire the reference image features corresponding to the identity to be verified; and, when the similarity between the image features to be verified and the reference image features exceeds a threshold, determine that the face image to be verified passes identity verification.
In one embodiment, the face recognition module is further configured to: acquire a face image to be matched, where the face image to be matched includes a majority group face or a minority group face; obtain the image features to be matched corresponding to the face image to be matched through the face recognition model obtained after fine-tuning training; and match the image features to be matched against at least one reference image feature, using the target identity corresponding to the reference image feature with the largest matching degree as the identity of the face in the face image to be matched.
For the specific definition of the processing apparatus of the face recognition model, reference may be made to the above definition of the processing method of the face recognition model, and details are not repeated here. All or part of the modules in the processing apparatus of the face recognition model may be implemented by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, as shown in fig. 12, a face recognition apparatus is provided. The apparatus may be implemented as a software module, a hardware module, or a combination of the two, as a part of a computer device, and specifically includes: an acquisition module 1202 and a face recognition module 1204, wherein:
an obtaining module 1202, configured to obtain a face image to be recognized, where the face image to be recognized includes at least one of a majority group face and a minority group face;
a face recognition module 1204, configured to perform face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized, where the face recognition model is obtained by pre-training an initial face recognition model using the majority group face images in a first sample set, and then performing fine-tuning training on the pre-trained face recognition model using the sample images in sample image sets obtained from the first sample set, a second sample set, and a third sample set; during the fine-tuning training, the gradients corresponding to the sample images that belong to the third sample set and include majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image set;
the samples in the first sample set are majority group face images, the samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
In the processing apparatus of the face recognition model and the face recognition apparatus, the initial face recognition model is pre-trained using the majority group face images in the first sample set to obtain a pre-trained face recognition model, which improves the recognition accuracy of the model on majority group face images. The pre-trained face recognition model is then fine-tuned using sample images from the first sample set, the second sample set, and the third sample set: it learns features for distinguishing majority groups from the majority group face images of the first sample set, learns features for distinguishing minority groups from the minority group face images of the second sample set, and learns features for distinguishing majority groups from minority groups based on the majority group and minority group face images of the third sample set. Because the third sample set includes majority group face images, it can be used to intervene in the pre-trained model's learning of minority group features, expanding the feature space of the minority groups, reducing the similarity computed by the face recognition model between different persons of a minority group, and improving the recognition accuracy for minority group face images. Therefore, the accuracy of the face recognition model in recognizing majority group face images is improved, and its accuracy in recognizing minority group face images is also greatly improved.
In one embodiment, a computer device is provided, which may be a server; its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing the processing data of the face recognition model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of processing a face recognition model.
Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of part of the structure related to the present disclosure and does not limit the computer device to which the present disclosure is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (16)

1. A method for processing a face recognition model, the method comprising:
acquiring a first sample set, a second sample set and a third sample set, wherein samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set comprises the majority group face images and the minority group face images;
pre-training an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model;
iteratively obtaining a sample image set required for fine-tuning training from the first sample set, the second sample set, and the third sample set; during the fine-tuning training of the pre-trained face recognition model using the sample images in the sample image set, after excluding the gradients corresponding to the sample images that belong to the third sample set and include majority group faces, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set; and, when iteration stops, obtaining a face recognition model for performing face recognition on majority group face images and minority group face images.
2. The method of claim 1, wherein pre-training an initial face recognition model using the majority group face images in the first sample set to obtain a pre-trained face recognition model comprises:
acquiring majority group face images and corresponding face label information from the first sample set;
inputting the majority group face images acquired from the first sample set into the initial face recognition model to obtain face recognition prediction results;
constructing a pre-training loss function according to the face label information and the face recognition prediction results; and
after taking the model parameters that minimize the pre-training loss function as the updated model parameters of the initial face recognition model, returning to the step of acquiring majority group face images from the first sample set to continue training until a training end condition is met.
3. The method of claim 2, wherein inputting the majority group face images acquired from the first sample set into the initial face recognition model to obtain face recognition prediction results comprises:
extracting the face image features corresponding to the majority group face images through a feature extraction network in the initial face recognition model; and
obtaining the face recognition prediction results corresponding to the majority group face images based on the face image features through a classification network in the initial face recognition model.
4. The method of claim 1, wherein obtaining a set of sample images required for fine-tuning training from the first set of samples, the second set of samples, and the third set of samples comprises:
acquiring a preset sampling proportion;
and sampling the first sample set, the second sample set and the third sample set according to the sampling proportion to obtain the sample image set required by fine tuning training.
5. The method according to claim 1, wherein the updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set after excluding the gradients corresponding to the sample images belonging to the third sample set and including the majority of faces during the fine-tuning training of the pre-trained face recognition model using the sample images in the sample image set comprises:
performing fine tuning training on the pre-trained face recognition model by using each sample image in the sample image set, and acquiring a gradient corresponding to each sample image in the fine tuning training process;
and after the gradients corresponding to the sample images that belong to the third sample set and include majority group faces are excluded from the gradients corresponding to the sample images, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set.
6. The method of claim 5, wherein the performing fine-tuning training on the pre-trained face recognition model by using each sample image in the sample image set, and obtaining a gradient corresponding to each sample image in the fine-tuning training process comprises:
inputting each sample image in the sample image set into the pre-training face recognition model to obtain a face recognition prediction result corresponding to each sample image;
constructing a fine-tuning training loss function according to the face label information corresponding to each sample image and the face recognition prediction result;
and obtaining the corresponding gradient of each sample image when the fine training loss function is minimized.
7. The method according to claim 6, wherein the inputting each sample image in the sample image set into the pre-trained face recognition model to obtain a face recognition prediction result corresponding to each sample image comprises:
extracting the face image characteristics corresponding to each sample image through a characteristic extraction network in the pre-training face recognition model;
and obtaining a face recognition prediction result corresponding to each sample image based on the face image characteristics through a classification network in the pre-training face recognition model.
8. The method of claim 6, wherein constructing a fine-tuning training loss function according to the face label information corresponding to each sample image and the face recognition prediction result comprises:
constructing a first loss according to the face label information corresponding to the sample images belonging to the first sample set in the sample image set and the face identification prediction result;
constructing a second loss according to the face label information corresponding to the sample image belonging to the second sample set in the sample image set and the face identification prediction result;
constructing a third loss according to the face label information corresponding to the sample image belonging to the third sample set in the sample image set and the face identification prediction result;
acquiring a preset loss weighting coefficient;
and summing the first loss, the second loss and the third loss according to the loss weighting coefficient to obtain the fine training loss function.
9. The method according to claim 1, wherein the updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set after excluding the gradients corresponding to the sample images belonging to the third sample set and including the majority of faces during the fine-tuning training of the pre-trained face recognition model using the sample images in the sample image set comprises:
when the sample images sampled from the third sample set in the sample image set include minority group face images, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images that belong to the third sample set and include minority group faces, the sample images belonging to the first sample set, and the sample images belonging to the second sample set;
and when the sample images sampled from the third sample set in the sample image set do not include minority group face images, updating the model parameters of the pre-trained face recognition model using the gradients corresponding to the sample images belonging to the first sample set and the sample images belonging to the second sample set.
10. The method of claim 1, further comprising:
acquiring a face image to be verified and a to-be-verified identity corresponding to the face image to be verified, wherein the face image to be verified comprises a majority group face or a minority group face;
obtaining the image characteristics to be verified corresponding to the face image to be verified through the face recognition model obtained after fine tuning training;
acquiring the reference image characteristics corresponding to the identity to be verified;
and when the similarity between the image feature to be verified and the reference image feature exceeds a threshold value, determining that the face image to be verified passes identity verification.
11. The method of claim 1, further comprising:
acquiring a face image to be matched, wherein the face image to be matched comprises a majority group face or a minority group face;
acquiring the image characteristics to be matched corresponding to the face image to be matched through the face recognition model obtained after fine tuning training;
and matching the image features to be matched with at least one reference image feature, and taking the target identity corresponding to the reference image feature with the maximum matching degree as the identity of the face in the face image to be matched.
12. A face recognition method, comprising:
acquiring a face image to be recognized, wherein the face image to be recognized comprises at least one of a majority group face and a minority group face;
performing face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized, wherein the face recognition model is obtained by pre-training an initial face recognition model using the majority group face images in a first sample set to obtain a pre-trained face recognition model, and then performing fine-tuning training on the pre-trained face recognition model using the sample images in sample image sets obtained from the first sample set, a second sample set, and a third sample set; during the fine-tuning training, the gradients corresponding to the sample images that belong to the third sample set and include majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image sets;
the samples in the first sample set are majority group face images, the samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
13. An apparatus for processing a face recognition model, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a first sample set, a second sample set and a third sample set, samples in the first sample set are majority group face images, samples in the second sample set are minority group face images, and the third sample set comprises the majority group face images and the minority group face images;
the pre-training module is used for pre-training the initial face recognition model by using the majority of population face images in the first sample set to obtain a pre-training face recognition model;
and the fine-tuning training module is configured to iteratively obtain a sample image set required for fine-tuning training from the first sample set, the second sample set, and the third sample set; during the fine-tuning training of the pre-trained face recognition model using the sample images in the sample image set, exclude the gradients corresponding to the sample images that belong to the third sample set and include majority group faces, and update the model parameters of the pre-trained face recognition model using the gradients corresponding to the remaining sample images in the sample image set; and, when iteration stops, obtain a face recognition model for performing face recognition on majority group face images and minority group face images.
14. An apparatus for face recognition, the apparatus comprising:
the system comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring a face image to be recognized, and the face image to be recognized comprises at least one of a majority group face and a minority group face;
a face recognition module, configured to perform face recognition on the face image to be recognized through a trained face recognition model to obtain a face recognition result corresponding to the face image to be recognized, where the face recognition model is obtained by pre-training an initial face recognition model using the majority group face images in the first sample set, and then performing fine-tuning training on the pre-trained face recognition model using the sample images in sample image sets obtained from the first sample set, the second sample set, and the third sample set; during the fine-tuning training, the gradients corresponding to the sample images that belong to the third sample set and include majority group faces are excluded, and the model parameters of the pre-trained face recognition model are updated using the gradients corresponding to the remaining sample images in the sample image set;
the samples in the first sample set are majority group face images, the samples in the second sample set are minority group face images, and the third sample set comprises majority group face images and minority group face images.
15. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
16. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202110354900.6A 2021-04-01 2021-04-01 Face recognition model processing method, face recognition method and device Active CN112801054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110354900.6A CN112801054B (en) 2021-04-01 2021-04-01 Face recognition model processing method, face recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110354900.6A CN112801054B (en) 2021-04-01 2021-04-01 Face recognition model processing method, face recognition method and device

Publications (2)

Publication Number Publication Date
CN112801054A CN112801054A (en) 2021-05-14
CN112801054B true CN112801054B (en) 2021-06-22

Family

ID=75816085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110354900.6A Active CN112801054B (en) 2021-04-01 2021-04-01 Face recognition model processing method, face recognition method and device

Country Status (1)

Country Link
CN (1) CN112801054B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114360008B (en) * 2021-12-23 2023-06-20 上海清鹤科技股份有限公司 Face authentication model generation method, authentication method, equipment and storage medium
CN114462502B (en) * 2022-01-06 2024-07-12 支付宝(杭州)信息技术有限公司 Nuclear body recommendation model training method and device
CN114419400B (en) * 2022-03-28 2022-07-29 北京字节跳动网络技术有限公司 Training method, recognition method, device, medium and equipment of image recognition model
CN114677564B (en) * 2022-04-08 2023-10-13 北京百度网讯科技有限公司 Training sample generation method, deep learning model training method and device
CN115410265B (en) * 2022-11-01 2023-01-31 合肥的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106295584A (en) * 2016-08-16 2017-01-04 深圳云天励飞技术有限公司 Depth migration study is in the recognition methods of crowd's attribute
CN107886064A (en) * 2017-11-06 2018-04-06 安徽大学 A kind of method that recognition of face scene based on convolutional neural networks adapts to
CN108537193A (en) * 2018-04-17 2018-09-14 厦门美图之家科技有限公司 Ethnic attribute recognition approach and mobile terminal in a kind of face character
CN108805048A (en) * 2018-05-25 2018-11-13 腾讯科技(深圳)有限公司 A kind of method of adjustment of human face recognition model, device and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9400925B2 (en) * 2013-11-15 2016-07-26 Facebook, Inc. Pose-aligned networks for deep attribute modeling
US9405963B2 (en) * 2014-07-30 2016-08-02 International Business Machines Corporation Facial image bucketing with expectation maximization and facial coordinates
US10546232B2 (en) * 2017-07-04 2020-01-28 Microsoft Technology Licensing, Llc Image recognition with promotion of underrepresented classes
CN110633604B (en) * 2018-06-25 2023-04-25 富士通株式会社 Information processing method and information processing apparatus


Non-Patent Citations (2)

Title
Face Recognition Based on Phase-Feature; Jiang, Wei et al.; Asia-Pacific Signal and Information Processing Association; October 4, 2009; pp. 672-675 *
Research on Face Population Recognition Technology Based on Transfer Learning (基于迁移学习的人脸种群识别技术研究); Zhang Jiawei et al.; Information Technology (《信息技术》); April 2019, No. 4; pp. 77-81 *

Also Published As

Publication number Publication date
CN112801054A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112801054B (en) Face recognition model processing method, face recognition method and device
Khosravy et al. Model inversion attack by integration of deep generative models: Privacy-sensitive face generation from a face recognition system
CN109190470B (en) Pedestrian re-identification method and device
CN109359541A (en) A kind of sketch face identification method based on depth migration study
CN111680672B (en) Face living body detection method, system, device, computer equipment and storage medium
CN111368943B (en) Method and device for identifying object in image, storage medium and electronic device
US11430255B2 (en) Fast and robust friction ridge impression minutiae extraction using feed-forward convolutional neural network
CN113011387B (en) Network training and human face living body detection method, device, equipment and storage medium
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN111476222B (en) Image processing method, image processing device, computer equipment and computer readable storage medium
CN111104852B (en) Face recognition technology based on heuristic Gaussian cloud transformation
CN115050064A (en) Face living body detection method, device, equipment and medium
Mehraj et al. Feature vector extraction and optimisation for multimodal biometrics employing face, ear and gait utilising artificial neural networks
CN114677611B (en) Data identification method, storage medium and device
Diarra et al. Study of deep learning methods for fingerprint recognition
Lohith et al. Multimodal biometric person authentication using face, ear and periocular region based on convolution neural networks
KR20210071410A (en) Sensor-specific image recognition device and method
CN117011909A (en) Training method of face recognition model, face recognition method and device
CN115708135A (en) Face recognition model processing method, face recognition method and device
CN111832364B (en) Face recognition method and device
CN115082873A (en) Image recognition method and device based on path fusion and storage medium
Elshazly et al. Compression-Based Cancelable Multi-Biometric System
Uddin et al. Artificial neural network inducement for enhancement of cloud computing security
CN113269176B (en) Image processing model training method, image processing device and computer equipment
CN116597500B (en) Iris recognition method, iris recognition device, iris recognition equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40044399

Country of ref document: HK