CN112912887A - Processing method, device and equipment based on face recognition and readable storage medium

Info

Publication number
CN112912887A
CN112912887A (application number CN201880098348.5A)
Authority
CN
China
Prior art keywords
picture
face
euclidean distance
training
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880098348.5A
Other languages
Chinese (zh)
Inventor
吴晓民 (Wu Xiaomin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitmain Technologies Inc
Original Assignee
Bitmain Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitmain Technologies Inc filed Critical Bitmain Technologies Inc
Publication of CN112912887A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A processing method, apparatus, device, and readable storage medium based on face recognition. The method first trains a face feature extraction model with a face data set and a Softmax-based first loss function (S101), and then retrains the model with a person-certificate data set and a Triplet-based second loss function to obtain the final face feature extraction model (S102). Pre-training with the Softmax-based first loss function accelerates convergence when the model is subsequently trained on the person-certificate data set with the Triplet-based second loss function, improving training efficiency. Even when person-certificate data are scarce, the Triplet-based second loss function makes good use of them, fully exploiting the data characteristics of both the face data set and the person-certificate data set, improving the accuracy of face feature extraction when the model is applied to person-certificate scenarios, and thereby improving the accuracy of face recognition.

Description

Processing method, device and equipment based on face recognition and readable storage medium
Technical Field
The present disclosure relates to the field of face recognition technologies, and for example, to a processing method, an apparatus, a device, and a readable storage medium based on face recognition.
Background
In application scenarios such as security monitoring and security protection, in order to track a person (e.g. a fugitive or a suspect), it is often necessary to find, in a database, the certificate picture that matches a captured picture of the person, so as to determine the person's identity. A face recognition scenario in which one picture of a person is matched against certificate pictures may be referred to as a "person-certificate scenario".
Face recognition generally comprises four stages: face detection, face alignment, feature extraction, and feature matching. Feature extraction is the most critical step: the more the extracted features capture what is unique to an individual face, the more meaningful the subsequent feature matching becomes. Most face recognition models rely on deep neural networks, whose parameters are trained from data, so the training data strongly influence model performance.
At present, face recognition models are usually trained on common face data sets collected from the Internet, in which one person typically has dozens or even hundreds of pictures. A model trained on such data does not adapt well to face recognition in the person-certificate scenario, so its recognition accuracy is low. Adapting to this scenario requires training on a person-certificate data set, whose defining characteristic is that each person generally has only one ordinary (life) photo and one certificate photo.
Disclosure of Invention
The embodiment of the disclosure provides a processing method based on face recognition, which comprises the following steps:
training a face feature extraction model by using a face data set and a Softmax-based first loss function;
and retraining the face feature extraction model by using a person-certificate data set and a Triplet-based second loss function to obtain a final face feature extraction model.
The embodiment of the present disclosure further provides a processing apparatus based on face recognition, including:
the first training module is used for training a face feature extraction model by utilizing a face data set and a first loss function based on Softmax;
and the second training module is used for retraining the face feature extraction model by using a person-certificate data set and a Triplet-based second loss function to obtain a final face feature extraction model.
The embodiment of the disclosure also provides a computer device, which comprises the processing device based on the face recognition.
The embodiment of the present disclosure also provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the processing method based on face recognition.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the above-mentioned face recognition-based processing method.
An embodiment of the present disclosure further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to perform the above-described face recognition-based processing method.
Embodiments of the present disclosure provide a processing method, apparatus, device, and readable storage medium based on face recognition. A face feature extraction model is first trained with a face data set and a Softmax-based first loss function, and then retrained with a person-certificate data set and a Triplet-based second loss function to obtain the final face feature extraction model. Pre-training on a common face data set with the Softmax-based first loss function accelerates the convergence of the subsequent Triplet-based training on the person-certificate data set, improving training efficiency. Moreover, the data characteristics of both the common face data set and the person-certificate data set are fully exploited: even when person-certificate data are scarce, the Triplet-based second loss function makes good use of them, improving the accuracy of face feature extraction when the model is applied to the person-certificate scenario and hence the accuracy of face recognition.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which elements bearing the same reference numerals denote like elements, and wherein:
fig. 1 is a flowchart of a processing method based on face recognition according to an embodiment of the present disclosure;
fig. 2 is a flowchart of another processing method based on face recognition according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a processing apparatus based on face recognition according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of another processing apparatus based on face recognition according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The embodiment of the present disclosure provides a processing method based on face recognition, and fig. 1 is a flowchart of the processing method based on face recognition provided by the embodiment of the present disclosure. As shown in fig. 1, the method in this embodiment may include:
and S101, training a face feature extraction model by using a face data set and a first loss function based on Softmax.
Face recognition generally comprises four stages: face detection, face alignment, feature extraction, and feature matching. Feature extraction is the most critical step: the more the extracted features capture what is unique to an individual face, the more meaningful the feature matching becomes. Besides optimizing the network structure, modifying the loss function is another way to improve the performance of a face recognition model; an optimized loss function lets the model learn more valuable information from the existing data.
In this embodiment, the face data set includes multiple face pictures of each person. The face pictures include certificate pictures, such as those used on identity cards, passports, and driver's licenses, and non-certificate pictures, such as everyday (life) photos containing a face region. A non-certificate picture may be an individual photo or a group photo. The face pictures may be downloaded from the Internet and cleaned, or may come from an existing training data set for face recognition models, such as the LFW (Labeled Faces in the Wild), MegaFace, or VGGFace2 data set; this is not specifically limited herein.
Optionally, the face feature extraction model may adopt a network structure based on a residual network (ResNet). Specifically, it may be a plain residual network or one of its variants, for example a deep residual network or a deep pyramid residual network, which is not specifically limited in this embodiment.
For example, the face feature extraction model may employ a 100-layer residual network, abbreviated as ResNet-100.
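For concreteness, such a backbone can be sketched as an off-the-shelf residual network whose classification head is replaced by an embedding layer. The snippet below is a minimal illustration only, assuming PyTorch and torchvision; it uses ResNet-50 as a stand-in for the ResNet-100 mentioned above, and the class name FaceEmbeddingNet and the 512-dimensional embedding size are illustrative choices, not details given by the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50  # stand-in for the 100-layer residual network

class FaceEmbeddingNet(nn.Module):
    """Residual-network backbone that outputs an L2-normalized face feature vector."""

    def __init__(self, embedding_dim: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()  # drop the classification head, keep the 2048-d features
        self.backbone = backbone
        self.embed = nn.Linear(2048, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)              # (N, 2048)
        embedding = self.embed(features)         # (N, embedding_dim)
        return nn.functional.normalize(embedding, dim=1)  # unit-length face features
```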
Optionally, the Softmax-based first loss function may be Softmax Loss, or another loss function improved from Softmax Loss, such as Cosine Loss.
In this embodiment, training the face feature extraction model with the face data set and the Softmax-based first loss function may be implemented as follows:
For the face pictures in the face data set, extract face features with the face feature extraction model to be trained, and feed the extracted features into a classification model using the Softmax-based first loss function for classification training until the classification results meet a preset convergence condition, thereby obtaining the parameters of the face feature extraction model.
The convergence condition may be set by a technician according to actual needs, and this embodiment is not specifically limited herein.
The classification model with the Softmax-based first loss function is trained using a Softmax regression classification algorithm; the cost function of the Softmax regression algorithm can be set by a technician according to experience and actual needs, which is not specifically limited in this embodiment.
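For reference, a textbook form of the Softmax regression cost over N training samples and C identity classes is shown below; this is a common choice, not a formula fixed by the patent, with x_i denoting the face feature of the i-th sample, y_i its identity label, and w_j, b_j the classifier parameters of class j.

```latex
L_{\mathrm{Softmax}} = -\frac{1}{N}\sum_{i=1}^{N}
  \log\frac{e^{w_{y_i}^{\top}x_i + b_{y_i}}}{\sum_{j=1}^{C} e^{w_j^{\top}x_i + b_j}}
```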
It should be noted that training the face feature extraction model with the face data set and the Softmax-based first loss function may also use any prior-art method that trains a face classification model with a Softmax-based loss function, which is not described herein again.
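As an illustration of step S101 (a sketch under the same PyTorch assumption as above, not the patent's exact procedure), the embedding network can feed a linear classifier over the identities in the face data set and be trained with cross-entropy, the standard realization of Softmax Loss. The data loader, learning rate, and epoch count are placeholders.

```python
import torch
import torch.nn as nn

def train_stage1(model, train_loader, num_identities, epochs=10, lr=0.1, device="cuda"):
    """Stage 1 (S101): classification training with a Softmax-based first loss function."""
    classifier = nn.Linear(512, num_identities).to(device)  # 512 = embedding size above
    criterion = nn.CrossEntropyLoss()                       # cross-entropy = Softmax Loss
    optimizer = torch.optim.SGD(
        list(model.parameters()) + list(classifier.parameters()),
        lr=lr, momentum=0.9)
    model.to(device).train()
    for _ in range(epochs):
        for images, identity_labels in train_loader:
            images = images.to(device)
            identity_labels = identity_labels.to(device)
            logits = classifier(model(images))              # embed, then classify
            loss = criterion(logits, identity_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```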
Step S102: retraining the face feature extraction model using the person-certificate data set and the Triplet-based second loss function to obtain the final face feature extraction model.
The person-certificate data set contains face pictures of multiple persons, and the face pictures of each person include at least one certificate picture and one non-certificate picture.
Preferably, in order to suit a scenario with only one certificate picture per person (for example, the face picture on an identity card), the face pictures of each person in the person-certificate data set in this embodiment may consist of one certificate picture and one non-certificate picture.
The Triplet-based second loss function may be Triplet Loss or another loss function modified or improved from Triplet Loss.
Triplet Loss is a loss function used in deep learning for training on samples with small differences, such as human faces. It computes the loss over a triplet of three pictures, consisting of an Anchor example, a Positive example, and a Negative example: the face features of any picture can serve as the anchor; a second picture of the same person is the positive example; and a third picture of a different person is the negative example. Picture similarity is learned by optimizing the model so that the anchor-positive distance is smaller than the anchor-negative distance.
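For reference, a widely used margin-based formulation of Triplet Loss (an assumption here; the patent does not commit to one formula) penalizes a triplet whenever the anchor-positive distance is not smaller than the anchor-negative distance by at least a margin. A one-function sketch over batches of embeddings:

```python
import torch

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based Triplet Loss; each argument is a batch of face embeddings."""
    d_pos = torch.norm(anchor - positive, dim=1)  # first Euclidean distance d(a, p)
    d_neg = torch.norm(anchor - negative, dim=1)  # second Euclidean distance d(a, n)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```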
Specifically, retraining the face feature extraction model with the person-certificate data set and the Triplet-based second loss function to obtain the final face feature extraction model can be implemented as follows:
Select multiple groups of training data from the person-certificate data set, where each group contains three training pictures, two of which belong to the same person. Extract the face features of the three training pictures in each group with the face feature extraction model obtained after the training in step S101; the face features of the three pictures in each group form a triplet, yielding multiple triplets. Feed the triplets into a Triplet model using the Triplet-based second loss function and train until the preset condition is met, thereby obtaining the final parameters of the face feature extraction model, which is then retained.
Specifically, feeding the triplets into the Triplet model with the Triplet-based second loss function for retraining can be implemented as follows:
Calculate the second loss function values corresponding to the triplets and judge whether they meet a preset condition. If they do, take the current face feature extraction model as the final face feature extraction model; if they do not, jump back to the step of selecting triplets from the person-certificate data set and continue training the face feature extraction model.
The preset condition may be set by a technician according to actual needs, for example as a convergence condition; this embodiment is not specifically limited herein.
Optionally, among the three training pictures of each group of training data, the first and second training pictures are the certificate picture and a non-certificate picture of the same person, and the third training picture does not belong to the same person as the second.
Further, calculating the second loss function values corresponding to the triplets may be implemented as follows:
Calculate a first Euclidean distance and a second Euclidean distance for each triplet, where the first Euclidean distance is the Euclidean distance between the face features of the two training pictures belonging to the same person in the triplet and the second Euclidean distance is the Euclidean distance between the face features of two training pictures not belonging to the same person; then calculate the second loss function values of the triplets from the first and second Euclidean distances of each triplet.
Optionally, after the first and second Euclidean distances of each triplet are obtained, a second loss function value can be calculated for each triplet from its two distances, and whether the preset condition is met is judged from the second loss function values of all triplets. If the condition is met, the current face feature extraction model is taken as the final face feature extraction model; otherwise, execution jumps back to the step of selecting triplets from the person-certificate data set and training continues.
Optionally, after the first and second Euclidean distances of each triplet are obtained, the triplets can first be screened: the effective triplets, whose first Euclidean distance is smaller than their second Euclidean distance, are retained. The second loss function values of the effective triplets are then calculated from their first and second Euclidean distances, and whether these values meet the preset condition is judged. If so, the current face feature extraction model is taken as the final face feature extraction model; otherwise, execution jumps back to the step of selecting triplets and training continues. Screening retains the effective triplets that are most conducive to optimizing the face feature extraction model, so the model trained on them performs better.
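Putting the optional screening step into code, the sketch below (illustrative only; the model call, batch layout, and margin value are assumptions carried over from the earlier snippets) computes the first and second Euclidean distances for a batch of triplets, keeps the effective triplets whose first distance is smaller than their second, and averages the second loss function values over them:

```python
import torch

def effective_triplet_loss(model, id_photos, life_photos, negatives, margin=0.2):
    """Screen effective triplets (d_pos < d_neg) and average their loss values."""
    a = model(id_photos)    # anchors: certificate pictures
    p = model(life_photos)  # positives: non-certificate pictures of the same persons
    n = model(negatives)    # negatives: pictures of different persons
    d_pos = torch.norm(a - p, dim=1)  # first Euclidean distance of each triplet
    d_neg = torch.norm(a - n, dim=1)  # second Euclidean distance of each triplet
    keep = d_pos < d_neg              # effective triplets, as described above
    if not keep.any():
        return None                   # no effective triplet in this batch
    return torch.clamp(d_pos[keep] - d_neg[keep] + margin, min=0).mean()
```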
In this way, the face feature extraction model is trained with a face data set and a Softmax-based first loss function and then retrained with a person-certificate data set and a Triplet-based second loss function to obtain the final face feature extraction model. Pre-training on a common face data set with the Softmax-based first loss function accelerates the convergence of the subsequent Triplet-based training on the person-certificate data set, improving training efficiency. Moreover, the data characteristics of both data sets are fully exploited: even when person-certificate data are scarce, the Triplet-based second loss function makes good use of them, improving the accuracy of face feature extraction in the person-certificate scenario and hence the accuracy of face recognition.
Fig. 2 is a flowchart of another processing method based on face recognition according to an embodiment of the present disclosure. On the basis of the embodiment shown in fig. 1, in this embodiment, after the final face feature extraction model is obtained through training, a processing procedure of performing face recognition by using the face feature extraction model is further included. As shown in fig. 2, the method further comprises the steps of:
step S201, obtaining a character picture to be processed.
The face feature extraction model in this embodiment can be applied to face recognition systems for security monitoring and security protection. The person picture to be processed can be acquired by an image acquisition device, such as a monitoring device deployed in advance in a district, on a road, and the like; the picture may be a capture of a surveillance target by a monitoring device, a roadside snapshot of a fleeing suspect, and so on.
Step S202: extracting the face features of the person picture through the face feature extraction model.
Input the person picture to be processed into the face feature extraction model and extract the face features of the picture.
Step S203: calculating the Euclidean distance between the face features of the person picture and the face features of each certificate picture in the picture library.
Step S204: determining the certificate picture matching the person picture according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library.
The picture library contains the certificate pictures of people with known identities, with at least one certificate picture per person.
Optionally, after the final face feature extraction model is obtained, it can be used to extract the face features of each certificate picture in the picture library of the face recognition system and store them, so that the features can be read directly when face recognition is needed, improving recognition efficiency.
In this embodiment, the Euclidean distance between the face features of the person picture and the face features of any certificate picture can be calculated with any prior-art method for computing the Euclidean distance between the feature data of two pictures, which is not described herein again.
Specifically, this step can be implemented as follows:
according to the Euclidean distance between the human face features of the figure pictures and the human face features of each certificate picture in the picture library, the Euclidean distance between the human face features of the two pictures can reflect the difference of human faces of the two pictures, the certificate picture with the minimum Euclidean distance between the certificate picture and the human face features of the figure pictures is determined to be the certificate picture matched with the figure pictures, and the only certificate picture matched with the figure pictures is obtained.
Optionally, this step may also be implemented as follows:
According to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library, determine every certificate picture whose Euclidean distance from the face features of the person picture is smaller than a preset distance threshold to be a certificate picture matching the person picture, yielding at least one matching certificate picture.
The preset distance threshold may be obtained by a technician by statistically analyzing, over the triplets of the last round of training or verification of the face feature extraction model in the embodiment shown in fig. 1, the Euclidean distances between the face features of training pictures belonging to the same person and the Euclidean distances between the face features of training pictures not belonging to the same person; alternatively, it may be set by a technician according to practical experience, and this embodiment is not specifically limited herein.
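A compact sketch of steps S201 to S204 under the same assumptions (the gallery features are assumed to be pre-extracted with the model and stacked row-wise, and the threshold value is a placeholder, not a value given by the patent); it returns both the unique nearest certificate picture and all certificate pictures within the preset distance threshold:

```python
import torch

@torch.no_grad()
def match_person(model, person_image, gallery_features, gallery_ids, threshold=1.1):
    """Match a probe person picture against certificate-picture features (S201-S204)."""
    model.eval()
    probe = model(person_image.unsqueeze(0))               # (1, D) probe face feature
    distances = torch.norm(gallery_features - probe, dim=1)
    nearest = gallery_ids[torch.argmin(distances).item()]  # unique minimum-distance match
    within = [gallery_ids[i]                               # all matches below the threshold
              for i in torch.nonzero(distances < threshold).flatten().tolist()]
    return nearest, within
```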
By training the face feature extraction model with the face data set and the Softmax-based first loss function and then retraining it with the person-certificate data set and the Triplet-based second loss function, the resulting model can accurately extract the face features of the person picture to be processed and of the certificate pictures in the picture library, so the person picture can be accurately matched against the certificate pictures in the library and the accuracy of face recognition is improved.
The embodiment of the present disclosure further provides a processing device based on face recognition, and fig. 3 is a schematic structural diagram of the processing device based on face recognition provided by the embodiment of the present disclosure. The processing device based on the face recognition provided by the embodiment of the disclosure can execute the processing flow provided by the embodiment of the processing method based on the face recognition. As shown in fig. 3, the processing device 30 based on face recognition includes: a first training module 301 and a second training module 302.
Specifically, the first training module 301 is configured to train the face feature extraction model with the face data set and the Softmax-based first loss function.
The second training module 302 is configured to retrain the face feature extraction model with the person-certificate data set and the Triplet-based second loss function to obtain the final face feature extraction model.
Optionally, the person-certificate data set contains face pictures of multiple persons, and the face pictures of each person include at least one certificate picture and one non-certificate picture.
Optionally, the second training module 302 is further configured to:
select multiple groups of training data from the person-certificate data set, each group containing three training pictures, two of which belong to the same person; extract the face features of the three training pictures in each group with the face feature extraction model, the face features of each group forming a triplet, so as to obtain multiple triplets; calculate the second loss function values corresponding to the triplets; if these values meet the preset condition, take the current face feature extraction model as the final face feature extraction model; and if they do not, jump back to the step of selecting triplets from the person-certificate data set.
Optionally, among the three training pictures of each group of training data, the first and second training pictures are the certificate picture and a non-certificate picture of the same person, and the third training picture does not belong to the same person as the second.
Optionally, the second training module 302 is further configured to:
calculate a first Euclidean distance and a second Euclidean distance for each triplet, the first Euclidean distance being the Euclidean distance between the face features of the two training pictures belonging to the same person in the triplet and the second Euclidean distance being the Euclidean distance between the face features of two training pictures not belonging to the same person; and calculate the second loss function values corresponding to the triplets according to the first and second Euclidean distances of each triplet.
Optionally, the second training module 302 is further configured to:
calculate a second loss function value corresponding to each triplet according to the first Euclidean distance and the second Euclidean distance of that triplet.
Optionally, the second training module 302 is further configured to:
screen out effective triplets whose first Euclidean distance is smaller than their second Euclidean distance, according to the first and second Euclidean distances of each triplet; and calculate the second loss function values corresponding to the effective triplets according to the first and second Euclidean distances of the effective triplets.
Optionally, the first loss function is Softmax Loss or Cosine Loss.
Optionally, the face feature extraction model adopts a network structure based on a residual network (ResNet).
The apparatus provided in the embodiment of the present disclosure may be specifically configured to execute the method embodiment shown in fig. 1, and specific functions are not described herein again.
With the above modules, the face feature extraction model is trained with a face data set and a Softmax-based first loss function and then retrained with a person-certificate data set and a Triplet-based second loss function to obtain the final face feature extraction model. Pre-training on a common face data set with the Softmax-based first loss function accelerates the convergence of the subsequent Triplet-based training on the person-certificate data set, improving training efficiency. Moreover, the data characteristics of both data sets are fully exploited: even when person-certificate data are scarce, the Triplet-based second loss function makes good use of them, improving the accuracy of face feature extraction in the person-certificate scenario and hence the accuracy of face recognition.
Fig. 4 is a schematic structural diagram of another processing apparatus based on face recognition according to an embodiment of the present disclosure. On the basis of the above embodiment, in the present embodiment, as shown in fig. 4, the processing device 30 based on face recognition further includes: a face recognition module 303.
Specifically, the face recognition module 303 is configured to:
acquire a person picture to be processed; extract the face features of the person picture through the face feature extraction model; calculate the Euclidean distance between the face features of the person picture and the face features of each certificate picture in the picture library; and determine the certificate picture matching the person picture according to those Euclidean distances.
Optionally, the face recognition module 303 is further configured to:
extract the face features of each certificate picture in the picture library through the face feature extraction model.
Optionally, the face recognition module 303 is further configured to:
determine, according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library, the certificate picture with the minimum Euclidean distance to be the certificate picture matching the person picture.
Optionally, the face recognition module 303 is further configured to:
determine, according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library, every certificate picture whose Euclidean distance from the face features of the person picture is smaller than a preset distance threshold to be a certificate picture matching the person picture.
The apparatus provided in the embodiment of the present disclosure may be specifically configured to execute the method embodiment shown in fig. 2, and specific functions are not described herein again.
By training the face feature extraction model with the face data set and the Softmax-based first loss function and then retraining it with the person-certificate data set and the Triplet-based second loss function, the resulting model can accurately extract the face features of the person picture to be processed and of the certificate pictures in the picture library, so the person picture can be accurately matched against the certificate pictures in the library and the accuracy of face recognition is improved.
The embodiment of the disclosure also provides computer equipment comprising the processing device based on face recognition provided by any one of the embodiments.
The embodiment of the present disclosure further provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the processing method based on face recognition.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the above-mentioned processing method based on face recognition.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, a structure of which is shown in fig. 5, where the electronic device includes:
at least one processor 100 (one processor 100 is taken as an example in fig. 5) and a memory 101, and may further include a communication interface 102 and a bus 103. The processor 100, the communication interface 102, and the memory 101 may communicate with one another via the bus 103. The communication interface 102 may be used for information transfer. The processor 100 may call logic instructions in the memory 101 to execute the processing method based on face recognition of the above-described embodiment.
In addition, the logic instructions in the memory 101 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium.
The memory 101, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing, namely, implements the processing method based on face recognition in the above-described method embodiments, by executing software programs, instructions and modules stored in the memory 101.
The memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 101 may include a high-speed random access memory, and may also include a nonvolatile memory.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium comprising: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes, and may also be a transient storage medium.
Although the terms "first", "second", etc. may be used in this application to describe various elements, these elements should not be limited by these terms; the terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly a second element could be termed a first element, without changing the meaning of the description, provided that all occurrences of the "first element" are renamed consistently and all occurrences of the "second element" are renamed consistently. The first element and the second element are both elements, but may not be the same element.
The words used in this application are words of description only and do not limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising", when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The various aspects, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments may be implemented by software, hardware, or a combination of software and hardware. The described embodiments may also be embodied by a computer-readable medium having computer-readable code stored thereon, the computer-readable code comprising instructions executable by at least one computing device. The computer readable medium can be associated with any data storage device that can store data which can be read by a computer system. Exemplary computer readable media can include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tape, and optical data storage devices, among others. The computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The above description of the technology may refer to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration embodiments in which the described embodiments may be practiced. These embodiments, while described in sufficient detail to enable those skilled in the art to practice them, are non-limiting; other embodiments may be utilized and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in a flowchart is non-limiting, and thus the order of two or more operations illustrated in and described in accordance with the flowchart may be altered in accordance with several embodiments. As another example, in several embodiments, one or more operations illustrated in and described with respect to the flowcharts are optional or may be eliminated. Additionally, certain steps or functions may be added to the disclosed embodiments, or two or more steps may be permuted in order. All such variations are considered to be encompassed by the disclosed embodiments and the claims.
Additionally, the foregoing description uses terminology to provide a thorough understanding of the described embodiments; however, the described embodiments may be practiced without this level of detail. The foregoing description of the embodiments has been presented for purposes of illustration and description; the embodiments and the examples disclosed with them are provided solely to add context and aid understanding. The description is not intended to be exhaustive or to limit the described embodiments to the precise forms disclosed. Many modifications, alternative uses, and variations are possible in light of the above teaching. In some instances, well-known process steps have not been described in detail in order to avoid unnecessarily obscuring the described embodiments.

Claims (18)

  1. A processing method based on face recognition is characterized by comprising the following steps:
    training a face feature extraction model by using a face data set and a Softmax-based first loss function; and
    retraining the face feature extraction model by using a person-certificate data set and a Triplet-based second loss function to obtain a final face feature extraction model.
  2. The method of claim 1, wherein the person-certificate data set comprises face pictures of a plurality of persons, the face pictures of each person including at least one certificate picture and one non-certificate picture.
  3. The method of claim 2, wherein retraining the face feature extraction model with the person-certificate data set and the Triplet-based second loss function to obtain the final face feature extraction model comprises:
    selecting a plurality of groups of training data from the person-certificate data set, wherein each group of training data comprises three training pictures, two of which belong to the same person;
    extracting the face features of the three training pictures in each group of training data through the face feature extraction model, the face features of the three pictures in each group forming a triplet, to obtain a plurality of triplets;
    calculating second loss function values corresponding to the plurality of triplets;
    if the second loss function values corresponding to the triplets meet a preset condition, taking the current face feature extraction model as the final face feature extraction model; and
    if the second loss function values corresponding to the triplets do not meet the preset condition, jumping back to the step of selecting triplets from the person-certificate data set.
  4. The method of claim 3,
    wherein, among the three training pictures of each group of training data, the first training picture and the second training picture are the certificate picture and a non-certificate picture of the same person, and the third training picture does not belong to the same person as the second training picture.
  5. The method of claim 3 or 4, wherein calculating the second loss function values corresponding to the plurality of triplets comprises:
    calculating a first Euclidean distance and a second Euclidean distance for each triplet, wherein the first Euclidean distance is the Euclidean distance between the face features of the two training pictures belonging to the same person in the triplet, and the second Euclidean distance is the Euclidean distance between the face features of two training pictures not belonging to the same person in the triplet; and
    calculating the second loss function values corresponding to the plurality of triplets according to the first Euclidean distance and the second Euclidean distance of each triplet.
  6. The method of claim 5, wherein calculating the second loss function values corresponding to the plurality of triplets according to the first Euclidean distance and the second Euclidean distance of each triplet comprises:
    calculating a second loss function value corresponding to each triplet according to the first Euclidean distance and the second Euclidean distance of that triplet.
  7. The method of claim 5, wherein calculating the second loss function values corresponding to the plurality of triplets according to the first Euclidean distance and the second Euclidean distance of each triplet comprises:
    screening out effective triplets whose first Euclidean distance is smaller than their second Euclidean distance, according to the first Euclidean distance and the second Euclidean distance of each triplet; and
    calculating the second loss function values corresponding to the effective triplets according to the first Euclidean distance and the second Euclidean distance of the effective triplets.
  8. The method of claim 1,
    wherein the first loss function is Softmax Loss or Cosine Loss.
  9. The method of claim 1,
    wherein the face feature extraction model adopts a network structure based on a residual network (ResNet).
  10. The method of claim 1, further comprising, after obtaining the final face feature extraction model:
    acquiring a person picture to be processed;
    extracting the face features of the person picture through the face feature extraction model;
    calculating the Euclidean distance between the face features of the person picture and the face features of each certificate picture in a picture library; and
    determining the certificate picture matching the person picture according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library.
  11. The method of claim 10, further comprising, before calculating the Euclidean distance between the face features of the person picture and the face features of each certificate picture in the picture library:
    extracting the face features of each certificate picture in the picture library through the face feature extraction model.
  12. The method of claim 10, wherein determining the certificate picture matching the person picture according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library comprises:
    determining the certificate picture whose face features have the minimum Euclidean distance to the face features of the person picture to be the certificate picture matching the person picture.
  13. The method of claim 10, wherein determining the certificate picture matching the person picture according to the Euclidean distances between the face features of the person picture and the face features of each certificate picture in the picture library comprises:
    determining every certificate picture whose Euclidean distance from the face features of the person picture is smaller than a preset distance threshold to be a certificate picture matching the person picture.
  14. A processing apparatus based on face recognition, comprising:
    the first training module is used for training a face feature extraction model by using a face data set and a Softmax-based first loss function; and
    the second training module is used for retraining the face feature extraction model by using a person-certificate data set and a Triplet-based second loss function to obtain a final face feature extraction model.
  15. A computer device comprising the apparatus of claim 14.
  16. An electronic device, comprising:
    at least one processor; and
    a memory communicatively coupled to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the at least one processor to perform the method of any of claims 1-13.
  17. A computer-readable storage medium having stored thereon computer-executable instructions configured to perform the method of any one of claims 1-13.
  18. A computer program product, characterized in that the computer program product comprises a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method of any one of claims 1-13.
CN201880098348.5A 2018-11-08 2018-11-08 Processing method, device and equipment based on face recognition and readable storage medium Pending CN112912887A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/114537 WO2020093303A1 (en) 2018-11-08 2018-11-08 Processing method and apparatus based on facial recognition, and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN112912887A (en) 2021-06-04

Family

ID=70611313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880098348.5A Pending CN112912887A (en) 2018-11-08 2018-11-08 Processing method, device and equipment based on face recognition and readable storage medium

Country Status (2)

Country Link
CN (1) CN112912887A (en)
WO (1) WO2020093303A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598004A (en) * 2020-05-18 2020-08-28 北京星闪世图科技有限公司 Progressive-enhancement self-learning unsupervised cross-domain pedestrian re-identification method
CN114663965A (en) * 2022-05-24 2022-06-24 之江实验室 Testimony comparison method and device based on two-stage alternating learning

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738157B (en) * 2020-06-23 2023-07-21 平安科技(深圳)有限公司 Face action unit data set construction method and device and computer equipment
CN113158852B (en) * 2021-04-08 2024-03-29 浙江工业大学 Traffic gate monitoring system based on face and non-motor vehicle cooperative identification
CN116129227B (en) * 2023-04-12 2023-09-01 合肥的卢深视科技有限公司 Model training method, device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975959A (en) * 2016-06-14 2016-09-28 广州视源电子科技股份有限公司 Face characteristic extraction modeling method based on neural network, face identification method, face characteristic extraction modeling device and face identification device
CN107871100A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 The training method and device of faceform, face authentication method and device
WO2018137357A1 (en) * 2017-01-24 2018-08-02 北京大学 Target detection performance optimization method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203533B (en) * 2016-07-26 2019-09-20 厦门大学 Deep learning face verification method based on combined training
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN108197561B (en) * 2017-12-29 2020-11-03 智慧眼科技股份有限公司 Face recognition model optimization control method, device, equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975959A (en) * 2016-06-14 2016-09-28 广州视源电子科技股份有限公司 Face characteristic extraction modeling method based on neural network, face identification method, face characteristic extraction modeling device and face identification device
CN107871100A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 The training method and device of faceform, face authentication method and device
WO2018137357A1 (en) * 2017-01-24 2018-08-02 北京大学 Target detection performance optimization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋一龙 (Song Yilong) et al.: "基于混合训练的深度学习人脸特征提取方法" (A deep learning face feature extraction method based on hybrid training) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598004A (en) * 2020-05-18 2020-08-28 北京星闪世图科技有限公司 Progressive-enhancement self-learning unsupervised cross-domain pedestrian re-identification method
CN114663965A (en) * 2022-05-24 2022-06-24 之江实验室 Testimony comparison method and device based on two-stage alternating learning

Also Published As

Publication number Publication date
WO2020093303A1 (en) 2020-05-14

Similar Documents

Publication Publication Date Title
CN112912887A (en) Processing method, device and equipment based on face recognition and readable storage medium
US11354917B2 (en) Detection of fraudulently generated and photocopied credential documents
CN109190470B (en) Pedestrian re-identification method and device
CN110851835A (en) Image model detection method and device, electronic equipment and storage medium
WO2020164278A1 (en) Image processing method and device, electronic equipment and readable storage medium
EP2360619A1 (en) Fast fingerprint searching method and fast fingerprint searching system
CN109145030B (en) Abnormal data access detection method and device
CN112801054B (en) Face recognition model processing method, face recognition method and device
CN109635625B (en) Intelligent identity verification method, equipment, storage medium and device
CN106557732A (en) A kind of identity identifying method and system
CN110781952A (en) Image identification risk prompting method, device, equipment and storage medium
CN110796054A (en) Certificate authenticity verifying method and device
Tapia et al. Single morphing attack detection using feature selection and visualization based on mutual information
US20220375476A1 (en) Speaker authentication system, method, and program
CN110781467A (en) Abnormal business data analysis method, device, equipment and storage medium
CN111860134A (en) Resume recording method, device, equipment and storage medium
CN111241930A (en) Method and system for face recognition
US20230133033A1 (en) System and method for processing a data subject rights request using biometric data matching
CN111639718B (en) Classifier application method and device
CN114023331A (en) Method, device, equipment and storage medium for detecting performance of voiceprint recognition system
US20200364321A1 (en) Method and apparatus for providing authentication using voice and facial data
CN112381058A (en) Black and white list control method and device based on pedestrian re-identification
CN107844735B (en) Authentication method and device for biological characteristics
Jenkins et al. One-time password for biometric systems: Disposable feature templates
US11580774B2 (en) Method for the classification of a biometric trait represented by an input image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination