Summary of the Invention
In view of the low reliability of conventional face authentication methods, it is necessary to provide a face authentication method, apparatus, computer device and storage medium based on Triplet Loss.
A face authentication method based on Triplet Loss includes:
obtaining, in response to a face authentication request, a certificate photograph and a scene photograph of a person;
performing face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph;
inputting the scene face image and the certificate face image into a convolutional neural network model trained in advance for face authentication, and obtaining a first feature vector corresponding to the scene face image and a second feature vector corresponding to the certificate face image output by the convolutional neural network model, wherein the convolutional neural network model is obtained through supervised training based on a triplet loss function;
calculating the cosine distance between the first feature vector and the second feature vector; and
comparing the cosine distance with a preset threshold, and determining a face authentication result according to the comparison result.
In one embodiment, the method further includes:
obtaining labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image;
training the convolutional neural network model with the training samples, and generating a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample;
training the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm; and
inputting validation-set data into the convolutional neural network model, and obtaining the trained convolutional neural network model for face authentication when a training termination condition is reached.
In another embodiment, the step of training the convolutional neural network model with the training samples and generating the triplet for each training sample through OHEM includes:
randomly selecting one image as a reference sample, and selecting, as a positive sample, an image that belongs to the same labeled object as the reference sample but to a different category from the reference sample; and
according to the OHEM strategy, using the cosine distances between features extracted by the currently trained convolutional neural network model, selecting for each reference sample, from the images not belonging to the same labeled object, the image that is closest (most similar) to the reference sample and belongs to a different category from the reference sample, as the negative sample of that reference sample.
In another embodiment, the triplet loss function includes a constraint on the cosine distance between same-class samples and a constraint on the cosine distance between different-class samples.
In another embodiment, the triplet loss function is:

$$L = \sum_{i=1}^{N}\Big(\big[\cos\big(f(x_i^a), f(x_i^n)\big) - \cos\big(f(x_i^a), f(x_i^p)\big) + \alpha_1\big]_+ + \big[\alpha_2 - \cos\big(f(x_i^a), f(x_i^p)\big)\big]_+\Big)$$

where $\cos(\cdot)$ denotes the cosine distance, computed as $\cos(x, y) = \dfrac{x \cdot y}{\|x\|_2\,\|y\|_2}$; $N$ is the number of triplets; $f(x_i^a)$ denotes the feature vector of the reference sample; $f(x_i^p)$ denotes the feature vector of the same-class positive sample; $f(x_i^n)$ denotes the feature vector of the different-class negative sample; $[z]_+ = \max(z, 0)$; $\alpha_1$ is the inter-class margin parameter; and $\alpha_2$ is the intra-class margin parameter.
In another embodiment, the method further includes: initializing the model parameters with a base model pre-trained on massive open-source face data, and adding a normalization layer and a triplet loss function layer after the feature output layer, to obtain the convolutional neural network model to be trained.
A face authentication apparatus based on Triplet Loss includes an image acquisition module, an image preprocessing module, a feature acquisition module, a computing module and an authentication module.
The image acquisition module is configured to obtain, in response to a face authentication request, a certificate photograph and a scene photograph of a person.
The image preprocessing module is configured to perform face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph.
The feature acquisition module is configured to input the scene face image and the certificate face image into a convolutional neural network model trained in advance for face authentication, and to obtain a first feature vector corresponding to the scene face image and a second feature vector corresponding to the certificate face image output by the convolutional neural network model, where the convolutional neural network model is obtained through supervised training based on a triplet loss function.
The computing module is configured to calculate the cosine distance between the first feature vector and the second feature vector.
The authentication module is configured to compare the cosine distance with a preset threshold and determine a face authentication result according to the comparison result.
In another embodiment, the apparatus further includes a sample acquisition module, a triplet acquisition module, a training module and a verification module.
The sample acquisition module is configured to obtain labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image.
The triplet acquisition module is configured to train the convolutional neural network model with the training samples and to generate a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample.
The training module is configured to train the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm.
The verification module is configured to input validation-set data into the convolutional neural network model, and to obtain the trained convolutional neural network model for face authentication when a training termination condition is reached.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the above face authentication method based on Triplet Loss.
A storage medium stores a computer program which, when executed by a processor, implements the steps of the above face authentication method based on Triplet Loss.
According to the above face authentication method, apparatus, computer device and storage medium based on Triplet Loss, face authentication is performed using a convolutional neural network trained in advance. Because the convolutional neural network model is obtained through supervised training based on a triplet loss function, and the similarity between the scene face image and the certificate face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image, and because the cosine distance measures the angle between space vectors and thus reflects differences in direction, the method better matches the distribution properties of the face feature space and improves the reliability of face authentication.
Embodiment
Fig. 1 is a structural diagram of a face authentication system based on Triplet Loss according to one embodiment. As shown in Fig. 1, the face authentication system includes a server 101 and an image acquisition device 102, which are connected via a network. The image acquisition device 102 captures a real-time scene photograph of a user to be authenticated, together with a certificate photograph, and sends the captured real-time scene photograph and certificate photograph to the server 101. The server 101 judges whether the person in the scene photograph and the person in the certificate photo are the same person, thereby authenticating the identity of the user to be authenticated. Depending on the specific application scenario, the image acquisition device 102 may be a camera, or a user terminal with a photographing function. For example, in an in-person account-opening scenario, the image acquisition device 102 may be a camera; in a scenario where a financial account is opened over the Internet, the image acquisition device 102 may be a mobile terminal with a photographing function.
In other embodiments, the face authentication system may further include a card reader for reading the certificate photo stored in the chip of a certificate (such as an identity card).
Fig. 2 is a flowchart of the face authentication method based on Triplet Loss in one embodiment. As shown in Fig. 2, the method includes the following steps.
S202: obtaining, in response to a face authentication request, a certificate photograph and a scene photograph of a person.
The certificate photograph refers to the photograph on a certificate that can prove a person's identity, such as the photo printed on an identity card or the photo stored in the certificate chip. The certificate photograph may be obtained by photographing the certificate, or by reading the photograph stored in the certificate chip with a card reader. The certificate in this embodiment may be an identity card, a driver's license, a social security card, or the like.
The scene photograph of the person refers to a photograph of the user to be authenticated captured in his or her on-site environment at the time of authentication. The on-site environment is the environment in which the user is located when the photograph is taken, and is not restricted. The scene photograph may be captured by a mobile terminal with a photographing function and sent to the server.
Face authentication refers to comparing the scene photograph of the person collected on site with the certificate photograph in the identity information, and judging whether they show the same person. The face authentication request is triggered by an actual application operation; for example, an account-opening request from a user triggers the face authentication request. An application program prompts the user on the display interface of the user terminal to capture the photographs and, after the capture is completed, sends the captured photographs to the server for face authentication.
S204: performing face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph.
Face detection refers to recognizing a photograph and obtaining the face regions in the photograph.
Key-point localization refers to obtaining, for the face region detected in a photograph, the positions of the face key points in that photograph. The face key points include the eyes, the nose, the mouth-corner points, the eyebrows, and the contour points of the facial components.
In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on multi-task joint learning may be used to perform face detection and face key-point detection at the same time; alternatively, a face detection method based on LBP features together with a shape-regression-based face key-point detection method may be used.
Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected face key points in each picture, so as to obtain a size-normalized scene face image and certificate face image. The scene face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the scene photograph, and the certificate face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the certificate photograph.
S206: inputting the scene face image and the certificate face image into the convolutional neural network model trained in advance for face authentication, and obtaining the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image output by the convolutional neural network model.
The convolutional neural network model is trained in advance on training samples under the supervision of the triplet loss function. The convolutional neural network includes convolutional layers, pooling layers, activation function layers and fully connected layers, and the parameters of the neurons in each layer are determined by training. Using the trained convolutional neural network, through forward propagation, the first feature vector of the scene face image and the second feature vector of the certificate face image are obtained from the output of the fully connected layer of the convolutional neural network model.
A triplet refers to a set of three samples: one sample, called the reference sample, is selected at random from the training data; a sample belonging to the same person as the reference sample is then selected at random as the positive sample; and a sample not belonging to that person is selected as the negative sample, thus forming a (reference sample, positive sample, negative sample) triplet. Since certificate-person comparison is mainly a comparison between a certificate photo and a scene photo, rather than between two certificate photos or two scene photos, the triplets mainly have two combinations: when a certificate photo image is the reference sample, the positive sample and the negative sample are scene photos; when a scene photo image is the reference sample, the positive sample and the negative sample are certificate photos.
For each sample in a triplet, a parameter-sharing network is trained to obtain the feature representations of the three elements. The purpose of the triplet loss is to learn feature representations such that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
S208: calculating the cosine distance between the first feature vector and the second feature vector.
The cosine distance, also called cosine similarity, uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. The larger the cosine distance between the first feature vector and the second feature vector, the greater the similarity between the scene face image and the certificate face image; the smaller the cosine distance, the smaller the similarity. The closer the cosine distance between the scene face image and the certificate face image is to 1, the more likely the two images belong to the same person; the smaller the cosine distance, the less likely the two images belong to the same person.
In the traditional triplet loss method, the Euclidean distance is used to measure the similarity between samples. However, the Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, which does not match the distribution properties of the face feature space. In this embodiment, considering the distribution properties of the face feature space and the practical application scenario, the cosine distance is used to measure the similarity between samples. The cosine distance measures the angle between space vectors and reflects differences in direction rather than in position, and therefore better matches the distribution properties of the face feature space.
Specifically, the cosine distance is calculated as:

$$\cos(x, y) = \frac{x \cdot y}{\|x\|_2\,\|y\|_2}$$

where $x$ denotes the first feature vector and $y$ denotes the second feature vector.
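As a minimal sketch of the formula above (the function name is illustrative, not from the source), the cosine distance between two feature vectors can be computed as:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between feature vectors x and y.

    Matches the formula above: cos(x, y) = (x . y) / (||x||_2 * ||y||_2).
    Values close to 1 mean the two face images are likely the same person.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Note that this measure depends only on vector direction: scaling either feature vector by a positive constant leaves the result unchanged, which is the property the embodiment relies on.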
S210: comparing the cosine distance with a preset threshold, and determining the face authentication result according to the comparison result.
The authentication result includes authentication success, i.e., the certificate photograph and the scene photograph belong to the same person, and authentication failure, i.e., the certificate photograph and the scene photograph do not belong to the same person.
Specifically, the cosine distance is compared with the preset threshold. When the cosine distance is greater than the preset threshold, the similarity between the certificate photograph and the scene photograph exceeds the preset threshold and authentication succeeds; when the cosine distance is less than the preset threshold, the similarity between the certificate photograph and the scene photograph is below the preset threshold and authentication fails.
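The decision step S210 can be sketched as follows; the threshold value 0.7 is an illustrative assumption, since the source does not specify one:

```python
import numpy as np

def authenticate(first_vec, second_vec, threshold=0.7):
    """Compare the cosine distance of the two feature vectors against a
    preset threshold and return the authentication result plus the score.

    threshold=0.7 is a placeholder; in practice it would be tuned on a
    validation set.
    """
    x = np.asarray(first_vec, dtype=float)
    y = np.asarray(second_vec, dtype=float)
    score = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    # Above the threshold: same person; otherwise authentication fails.
    return ("pass" if score > threshold else "fail"), score
```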
In the above face authentication method based on Triplet Loss, face authentication is performed using a convolutional neural network trained in advance. Because the convolutional neural network model is obtained through supervised training based on a triplet loss function, and the similarity between the scene face image and the certificate face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image, and because the cosine distance measures the angle between space vectors and reflects differences in direction, the method better matches the distribution properties of the face feature space and improves the reliability of face authentication.
In another embodiment, the face authentication method further includes the step of training the convolutional neural network model for face authentication. Fig. 3 is a flowchart of the step of training the convolutional neural network model for face authentication in one embodiment. As shown in Fig. 3, the step includes the following.
S302: obtaining labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image.
In this embodiment, a labeled object is a person, and the training samples are labeled per person with the scene face image(s) and certificate face image belonging to that person. Specifically, the scene face images and certificate face images may be obtained by performing face detection, key-point localization and image preprocessing on labeled scene photographs and certificate photographs.
Face detection refers to recognizing a photograph and obtaining the face regions in the photograph.
Key-point localization refers to obtaining, for the face region detected in a photograph, the positions of the face key points in that photograph. The face key points include the eyes, the nose, the mouth-corner points, the eyebrows, and the contour points of the facial components.
In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on multi-task joint learning may be used to perform face detection and face key-point detection at the same time; alternatively, a face detection method based on LBP features together with a shape-regression-based face key-point detection method may be used.
Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected face key points in each picture, so as to obtain a size-normalized scene face image and certificate face image. The scene face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the scene photograph, and the certificate face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the certificate photograph.
S304: training the convolutional neural network model with the training samples, and generating a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample.
The triplets have two combinations: when a certificate photo image is the reference sample, the positive sample and the negative sample are scene photo images; when a scene photo image is the reference sample, the positive sample and the negative sample are certificate photo images.
Specifically, taking a certificate photo as the reference image as an example, a certificate photo sample of one person is randomly selected from the training data as the reference sample; a scene photo sample belonging to the same person as the reference sample is then randomly selected as the positive sample; and a scene photo sample not belonging to that person is selected as the negative sample, thus forming a (reference sample, positive sample, negative sample) triplet.
That is, the positive sample and the reference sample are same-class samples, i.e., images of the same person, while the negative sample is a different-class sample of the reference sample, i.e., an image not belonging to the same person. The reference sample and the positive sample of each triplet are labeled in the training samples, while the negative sample is constructed online during the training of the convolutional neural network using the OHEM (Online Hard Example Mining) strategy: during each iteration of network optimization, forward computation is performed on candidate triplets with the current network, and the image in the training samples that does not belong to the same person as the reference sample and is closest to it in cosine distance (i.e., most similar to it) is selected as the negative sample, so as to obtain the triplet corresponding to each training sample.
In one embodiment, the step of training the convolutional neural network with the training samples and generating the triplet for each training sample includes the following steps S1 and S2.
S1: randomly selecting one image as a reference sample, and selecting, as a positive sample, an image that belongs to the same labeled object as the reference sample but to a different category.
A category refers to the image type. In this embodiment, the categories of the training samples include scene face images and certificate face images. Because face authentication is mainly a comparison between a certificate photo and a scene photo, the reference sample and the positive sample should belong to different categories: if the reference sample is a scene face image, the positive sample is a certificate face image; if the reference sample is a certificate face image, the positive sample is a scene face image.
S2: according to the OHEM strategy, using the cosine distances between features extracted by the currently trained convolutional neural network model, selecting for each reference sample, from the images not belonging to the same labeled object, the closest image (the one most similar to the reference sample) that belongs to a different category, as the negative sample of that reference sample.
The negative sample is selected from the labeled face images that do not belong to the same person as the reference sample. Specifically, the negative sample is constructed online during the training of the convolutional neural network using the OHEM strategy: during each iteration of network optimization, forward computation is performed on candidate triplets with the current network, and the image in the training samples that does not belong to the same person as the reference sample, is closest to it in cosine distance, and belongs to a different category is selected as the negative sample. That is, the negative sample and the reference sample belong to different categories: if the certificate photo is the reference sample of the triplet, the positive sample and the negative sample are scene photos; conversely, if a scene photo is the reference sample, the positive sample and the negative sample are certificate photos.
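The hard-negative selection of step S2 can be sketched as below. The function and argument names are illustrative assumptions; "closest" is implemented as highest cosine similarity to the anchor under the current model's features:

```python
import numpy as np

def mine_hard_negative(anchor_feat, candidate_feats, candidate_ids, anchor_id):
    """Pick the hardest negative for one reference sample (OHEM sketch).

    candidate_feats is assumed to hold features of images from the other
    category (per S1/S2); among candidates whose identity differs from the
    anchor's, the one with the highest cosine similarity to the anchor is
    returned as the hardest negative.
    """
    anchor = np.asarray(anchor_feat, dtype=float)
    anchor = anchor / np.linalg.norm(anchor)
    feats = np.asarray(candidate_feats, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = feats @ anchor                                   # cosine similarity to anchor
    sims[np.asarray(candidate_ids) == anchor_id] = -np.inf  # mask out same identity
    return int(np.argmax(sims))                             # index of hardest negative
```

In a full training loop this mining would run inside every iteration, using the network's current forward pass to produce the features, so the negatives get harder as the model improves.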
S306: training the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm.
A certificate-person verification terminal verifies a user's identity by comparing the photo in the user's certificate chip with a scene photo for consistency. The data collected in the background often contain only two images per person, namely the certificate photo and the scene photo captured at comparison time, while the number of distinct individuals can be in the thousands. If such data, with a large number of categories and few same-class samples, were trained with a classification-based method, the parameters of the classification layer would be excessively large and the network would be very difficult to train; therefore, a metric learning method is adopted instead. A typical metric learning method is the triplet loss method, which learns an effective feature mapping by constructing image triplets, such that under this mapping the feature distance between same-class samples is smaller than the feature distance between different-class samples, thereby achieving correct comparison.
The purpose of the triplet loss is to learn feature representations such that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
In another embodiment, the triplet loss function includes a constraint on the cosine distance between same-class samples and a constraint on the cosine distance between different-class samples.
Here, same-class samples refer to the reference sample and the positive sample, and different-class samples refer to the reference sample and the negative sample. The cosine distance between same-class samples refers to the cosine distance between the reference sample and the positive sample, and the cosine distance between different-class samples refers to the cosine distance between the reference sample and the negative sample.
On the one hand, the original triplet loss method only considers the inter-class gap and does not consider the intra-class gap; if the intra-class distribution is not compact enough, the generalization ability of the network weakens and its adaptability to real scenes decreases accordingly. On the other hand, the original triplet loss method uses the Euclidean distance to measure the similarity between samples, whereas after deployment the feature comparison stage of a face model usually uses the cosine distance. The Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, while the cosine distance measures the angle between space vectors and reflects differences in direction rather than in position, and therefore better matches the distribution properties of the face feature space.
In the triplet loss method, triplet data are constructed online and input into the network, and the triplet metric loss is then back-propagated for iterative optimization. Each triplet contains three images: a reference sample, a positive sample of the same class as the reference sample, and a negative sample of a different class from the reference sample, labeled (anchor, positive, negative). The basic idea of the original triplet loss is to make, through metric learning, the distance between the reference sample and the positive sample smaller than the distance between the reference sample and the negative sample, with the difference between the two distances exceeding a minimum margin parameter $\alpha$. The original triplet loss function is therefore:

$$L = \sum_{i=1}^{N}\big[\,\|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha\,\big]_+$$

where $N$ is the number of triplets, $f(x_i^a)$ denotes the feature vector of the reference sample (anchor), $f(x_i^p)$ denotes the feature vector of the same-class positive sample (positive), $f(x_i^n)$ denotes the feature vector of the different-class negative sample (negative), $\|\cdot\|_2$ denotes the L2 norm (i.e., the Euclidean distance), and $[z]_+ = \max(z, 0)$.
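The original Euclidean-margin loss above can be sketched for a batch of triplets as follows (the default margin alpha=0.2 is an illustrative choice, not taken from the source):

```python
import numpy as np

def original_triplet_loss(anchor, positive, negative, alpha=0.2):
    """Original triplet loss over N triplets of feature vectors (rows).

    Sum over i of [ ||a_i - p_i||^2 - ||a_i - n_i||^2 + alpha ]_+ ,
    matching the formula above. Triplets already satisfying the margin
    contribute zero and produce no gradient.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # squared distance anchor-negative
    return float(np.sum(np.maximum(d_pos - d_neg + alpha, 0.0)))
```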
As can be seen from the above formula, the original triplet loss function only constrains the distance between the same-class pair (anchor, positive) and the different-class pair (anchor, negative), i.e., it increases the inter-class distance as far as possible through the margin parameter $\alpha$, but places no restriction on the intra-class distance, i.e., no constraint on the distance between same-class samples. If the intra-class distances are scattered and the variance is too large, the generalization ability of the network weakens and samples are more likely to be misclassified. Fig. 4 is a schematic diagram of the misclassification probability of samples when the inter-class margin is fixed and the intra-class variance is large, and Fig. 5 is a schematic diagram of the misclassification probability when the inter-class margin is fixed and the intra-class variance is small. As shown in Fig. 4 and Fig. 5, the shaded areas represent the misclassification probability of samples; with the same inter-class margin, the misclassification probability with large intra-class variance is significantly greater than that with small intra-class variance.
To address the above problems, the present invention proposes an improved triplet loss method, which on the one hand retains the inter-class distance constraint of the original method and on the other hand adds a constraint term on the intra-class distance, so that the intra-class distances become as compact as possible. Its loss function expression is:

$$L = \sum_{i=1}^{N}\Big(\big[\cos\big(f(x_i^a), f(x_i^n)\big) - \cos\big(f(x_i^a), f(x_i^p)\big) + \alpha_1\big]_+ + \big[\alpha_2 - \cos\big(f(x_i^a), f(x_i^p)\big)\big]_+\Big)$$

where $\cos(\cdot)$ denotes the cosine distance, computed as $\cos(x, y) = \dfrac{x \cdot y}{\|x\|_2\,\|y\|_2}$; $N$ is the number of triplets; $f(x_i^a)$ denotes the feature vector of the reference sample; $f(x_i^p)$ denotes the feature vector of the same-class positive sample; $f(x_i^n)$ denotes the feature vector of the different-class negative sample; $[z]_+ = \max(z, 0)$; $\alpha_1$ is the inter-class margin parameter; and $\alpha_2$ is the intra-class margin parameter.
Compared with the original triplet loss function, the improved triplet loss function changes the metric from Euclidean distance to cosine distance, which keeps the metric of the training stage consistent with that of the deployment stage and improves the continuity of feature learning. The first term of the new triplet loss function has the same effect as the original triplet loss and serves to enlarge the inter-class gap; the second term adds a distance constraint on same-class sample pairs (positive pairs) and serves to reduce the intra-class gap. α1 is the inter-class margin parameter, with a value range of 0 to 0.2, and α2 is the intra-class margin parameter, with a value range of 0.8 to 1.0. It should be noted that, since the measurement is performed in the cosine manner, the resulting metric corresponds to the similarity between two samples; therefore, in the expression, only those triples whose negative-pair cosine similarity is within α1 of exceeding the positive-pair cosine similarity actually participate in training.
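As an illustrative sketch (not the claimed implementation), the improved loss can be written in plain Python with NumPy; the `alpha1` and `alpha2` defaults are taken from the ranges stated above:

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine similarity: x.y / (|x| * |y|)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def improved_triplet_loss(anchors, positives, negatives, alpha1=0.2, alpha2=0.8):
    """Improved triplet loss: inter-class term plus an intra-class bound term.

    anchors/positives/negatives: lists of N feature vectors forming N triples.
    alpha1: inter-class margin (0 to 0.2); alpha2: intra-class margin (0.8 to 1.0).
    """
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        sim_ap = cosine_sim(a, p)                    # positive-pair similarity
        sim_an = cosine_sim(a, n)                    # negative-pair similarity
        inter = max(sim_an - sim_ap + alpha1, 0.0)   # pushes classes apart
        intra = max(alpha2 - sim_ap, 0.0)            # pulls same-class features together
        total += inter + intra
    return total / len(anchors)
```

With an already well-separated triple both hinge terms vanish and the loss is zero; a fully inverted triple is penalized by both terms.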
The model is trained based on the improved triplet loss function: back-propagation optimization is performed on the model under the joint constraint of the inter-class loss and the intra-class loss, so that same-class samples are as close as possible in the feature space while different-class samples are as far apart as possible, thereby improving the discriminability of the model and, in turn, the reliability of face authentication.
S308: the validation set data are input into the convolutional neural network, and when the training termination condition is reached, the trained convolutional neural network for face authentication is obtained.
Specifically, 90% of the data in the ID-scene image data pool are taken as the training set and the remaining 10% as the validation set. The improved triplet loss value is calculated based on the above formula and fed back into the convolutional neural network for iterative optimization. Meanwhile, the performance of the model on the validation set is monitored; when the validation performance no longer rises, the model has reached a convergent state and the training stage ends.
In the above face authentication method, on the one hand, a constraint on the intra-class sample distance is added to the loss function of the original triplet loss, so that the intra-class gap is reduced while the inter-class gap is enlarged, improving the generalization ability of the model; on the other hand, the metric of the original triplet loss is changed from Euclidean distance to cosine distance, which keeps the training and deployment metrics consistent and improves the continuity of feature learning.
In another embodiment, the step of training the convolutional neural network further includes: initializing the network with the parameters of a base model trained on massive open-source face data, and adding a normalization layer and an improved triplet loss function layer after the feature output layer, to obtain the convolutional neural network to be trained.
Specifically, when ID-versus-person verification is solved with deep learning, the performance of a conventional deep face recognition model trained on massive Internet face data drops sharply when applied to ID-photo comparison under a specific scene, while the sources of ID-scene data under that specific application scene are rather limited, and direct training often yields unsatisfactory results due to insufficient samples. There is therefore a strong need for a method that effectively extends training with scene data from a small data set, so as to improve the accuracy of the face recognition model under the specific application scene and meet market application demand.
Deep learning algorithms tend to rely on training with massive data. In ID-versus-person verification, comparing an ID photo with a scene photo is a heterogeneous sample comparison problem, and a conventional deep face recognition model trained on massive Internet face data suffers a sharp performance drop in this application. However, ID-scene data sources are limited (the ID card image and the corresponding scene image of the same person must both be available), so the amount of data available for training is small, and direct training would give unsatisfactory results due to insufficient samples. Therefore, when performing model training for ID-versus-person verification with deep learning, the idea of transfer learning is often adopted: a base model with reliable performance on open-source test sets is first trained on massive Internet face data, and a second, extended training is then carried out on the limited ID-scene data, so that the model automatically learns modality-specific feature representations and its performance is improved. This process is shown in Fig. 6.
During the second training, the whole network is initialized with the pre-trained base model parameters, and an L2 normalization layer and an improved triplet loss layer are then added after the feature output layer of the network. The structure of the convolutional neural network to be trained is shown in Fig. 7.
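The L2 normalization layer mentioned here simply rescales each feature vector to unit length, so that the inner product of two normalized features equals their cosine similarity; a minimal sketch:

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each row (feature vector) to unit L2 norm.

    After this layer, the dot product of two features equals their cosine
    similarity, matching the deployment-stage metric.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.maximum(norms, eps)  # eps guards against zero vectors
```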
In one embodiment, a flow diagram of a face authentication method is shown in Fig. 8 and includes three stages: a data acquisition and preprocessing stage, a training stage, and a deployment stage.
In the data acquisition and preprocessing stage, the ID chip photo is read by the card reader module of the ID verification terminal device and the scene photo is captured by the front camera; size-normalized ID face images and scene face images are then obtained through a face detector, a key point detector, and a face alignment and cropping module.
In the training stage, 90% of the data in the ID-scene image data pool are taken as the training set and the remaining 10% as the validation set. Since ID-versus-person comparison is mainly between an ID photo and a scene photo, if the ID photo is used as the reference image (anchor) in a triple, the other two images are scene photos; conversely, if a scene photo is used as the reference image, the other two images are ID photos. Triples are constructed online using the OHEM strategy: during each iterative optimization of the network, forward computation is performed on candidate triples with the current network, and the effective triples that satisfy the condition are screened out; the improved triplet loss value is calculated according to the above formula and fed back into the network for iterative optimization. Meanwhile, the performance of the model on the validation set is monitored; when the validation performance no longer rises, the model has reached a convergent state and the training stage ends.
In the deployment stage, the trained model is deployed to the ID verification terminal for use. The images collected by the device pass through the same preprocessing as in the training stage; the feature vector of each face image is then obtained through forward computation of the network, the similarity of the two images is obtained by calculating the cosine distance, and a decision is made according to a preset threshold: above the threshold, the two are judged to be the same person; otherwise, different persons.
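The deployment-stage decision reduces to one cosine similarity computation and a threshold comparison; a minimal sketch, in which the 0.75 threshold is illustrative rather than a value given in the text:

```python
import numpy as np

def verify_faces(scene_feat, id_feat, threshold=0.75):
    """Compare two feature vectors by cosine similarity against a preset
    threshold; returns (similarity, same_person)."""
    sim = float(np.dot(scene_feat, id_feat) /
                (np.linalg.norm(scene_feat) * np.linalg.norm(id_feat)))
    return sim, sim > threshold  # above the threshold -> judged the same person
```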
In the above face authentication method, whereas the original triplet loss function only defines a learning relation on the inter-class distance, a bound term on the intra-class distance is added by improving the original triplet loss function, so that the network reduces the intra-class gap as much as possible while enlarging the inter-class gap during training, thereby improving the generalization ability of the network and, in turn, the scene adaptability of the model. In addition, replacing the Euclidean distance of the original triplet loss with a cosine-distance metric better fits the distribution properties of the face feature space and keeps the metric of the training stage consistent with that of the deployment stage, making the comparison result more reliable.
In one embodiment, a face authentication apparatus is provided which, as shown in Fig. 9, includes: an image acquisition module 902, an image preprocessing module 904, a feature acquisition module 906, a computing module 908, and an authentication module 910.
The image acquisition module 902 is configured to obtain, based on a face authentication request, an ID photograph and a scene photo of a person.
The image preprocessing module 904 is configured to perform face detection, key point location, and image preprocessing on the scene photo and the ID photograph respectively, to obtain the scene face image corresponding to the scene photo and the ID face image corresponding to the ID photograph.
The feature acquisition module 906 is configured to input the scene face image and the ID face image into the pre-trained convolutional neural network model for face authentication, and to obtain the first feature vector corresponding to the scene face image and the second feature vector corresponding to the ID face image output by the convolutional neural network model; wherein the convolutional neural network model is obtained through supervised training based on the triplet loss function.
The computing module 908 is configured to calculate the cosine distance between the first feature vector and the second feature vector.
The authentication module 910 is configured to compare the cosine distance with a preset threshold and determine the face authentication result according to the comparison result.
In the above face authentication apparatus, face authentication is performed with a pre-trained convolutional neural network. The convolutional neural network model is obtained through supervised training based on the improved triplet loss function, and the similarity between the scene face image and the ID face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the ID face image. Since the cosine distance measures the angle between space vectors, it reflects the difference in direction rather than in position, which better fits the distribution properties of the face feature space and improves the reliability of face authentication.
As shown in Fig. 9, in another embodiment the face authentication apparatus further includes: a sample acquisition module 912, a triple acquisition module 914, a training module 916, and a verification module 918.
The sample acquisition module 912 is configured to obtain labeled training samples, the training samples including, for each labeled object, one labeled ID face image and at least one scene face image.
The triple acquisition module 914 is configured to train the convolutional neural network model according to the training samples, and to generate, through OHEM, the triple elements corresponding to each training sample; the triple elements include a reference sample, a positive sample, and a negative sample.
Specifically, the triple acquisition module 914 is configured to randomly select one image as the reference sample, and to select an image that belongs to the same labeled object but differs from the reference sample as the positive sample; it is further configured to, according to the OHEM strategy and using the cosine distance between features extracted by the currently trained convolutional neural network model, select for each reference sample, from among the face images that do not belong to the same labeled object, the image with the smallest distance from the reference sample as the negative sample of that reference sample.
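The negative-sample selection described here (pick, among images of other labeled objects, the one at minimum cosine distance from the reference, i.e. the most confusable one) can be sketched as:

```python
import numpy as np

def pick_hardest_negative(ref_feat, ref_label, feats, labels):
    """Among candidates with a different label, return the index of the one
    with the smallest cosine distance to the reference (hardest negative)."""
    def cos_dist(x, y):
        return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    best_idx, best_dist = -1, float("inf")
    for i, (f, lab) in enumerate(zip(feats, labels)):
        if lab == ref_label:
            continue  # same identity: cannot serve as a negative
        d = cos_dist(ref_feat, f)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx
```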
Specifically, when an ID photo is used as the reference sample, the positive sample and the negative sample are scene photos; when a scene photo is used as the reference sample, the positive sample and the negative sample are ID photos.
The training module 916 is configured to train the convolutional neural network model according to the triple elements of each training sample under the supervision of the triplet loss function; the triplet loss function uses cosine distance as its metric, and the model parameters are optimized by a stochastic gradient descent algorithm.
Specifically, the improved triplet loss function includes a bound on the cosine distance of same-class samples and a bound on the cosine distance of different-class samples.
The improved triplet loss function is:

L = (sum over i = 1 … N) { [ cos(f_i^a, f_i^n) − cos(f_i^a, f_i^p) + α1 ]_+ + [ α2 − cos(f_i^a, f_i^p) ]_+ }

where cos(·,·) denotes the cosine measure, computed as cos(x, y) = x·y / (‖x‖·‖y‖); N is the number of triples; f_i^a denotes the feature vector of the reference sample, f_i^p the feature vector of the same-class positive sample, and f_i^n the feature vector of the different-class negative sample; [z]_+ means max(z, 0); α1 is the inter-class margin parameter and α2 is the intra-class margin parameter.
The verification module 918 is configured to input the validation set data into the convolutional neural network model and, when the training termination condition is reached, obtain the trained convolutional neural network model for face authentication.
In another embodiment, the face authentication apparatus further includes a model initialization module 920, configured to initialize the network with the parameters of a base model trained on massive open-source face data, and to add a normalization layer and a triplet loss function layer after the feature output layer, obtaining the convolutional neural network to be trained. In the above face authentication apparatus, on the one hand, a constraint on the intra-class sample distance is added to the loss function of the original triplet loss, so that the intra-class gap is reduced while the inter-class gap is enlarged, improving the generalization ability of the model; on the other hand, the metric of the original triplet loss is changed from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the face authentication method of each of the above embodiments.
A storage medium has a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the face authentication method of each of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as no contradiction exists in the combination of these technical features, they shall all be regarded as falling within the scope recorded in this specification.
The above embodiments express only several implementations of the present invention, and their descriptions are specific and detailed, but they shall not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.