Summary of the Invention
In view of the low reliability of conventional face authentication methods, it is necessary to provide a face authentication method, apparatus, computer device and storage medium based on Triplet Loss.
A face authentication method based on Triplet Loss includes:
obtaining, in response to a face authentication request, a certificate photograph and a scene photograph of a person;
performing face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph;
inputting the scene face image and the certificate face image into a convolutional neural network model trained in advance for face authentication, and obtaining a first feature vector corresponding to the scene face image and a second feature vector corresponding to the certificate face image output by the convolutional neural network model, wherein the convolutional neural network model is obtained through supervised training based on a triplet loss function;
calculating the cosine distance between the first feature vector and the second feature vector; and
comparing the cosine distance with a preset threshold, and determining a face authentication result according to the comparison result.
In one embodiment, the method further includes:
obtaining labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image;
training the convolutional neural network model with the training samples, and generating a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample;
training the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm; and
inputting validation-set data into the convolutional neural network model, and obtaining the trained convolutional neural network model for face authentication when a training termination condition is reached.
In another embodiment, the step of training the convolutional neural network model with the training samples and generating the triplet for each training sample through OHEM includes:
randomly selecting one image as a reference sample, and selecting, as a positive sample, an image that belongs to the same labeled object as the reference sample but to a different category from the reference sample; and
according to the OHEM strategy, using the cosine distances between features extracted by the currently trained convolutional neural network model, selecting for each reference sample, from the images not belonging to the same labeled object, the image that is closest (most similar) to the reference sample and belongs to a different category from the reference sample, as the negative sample of that reference sample.
In another embodiment, the triplet loss function includes a constraint on the cosine distance between same-class samples and a constraint on the cosine distance between different-class samples.
In another embodiment, the triplet loss function is:

$$L = \sum_{i=1}^{N}\Big(\big[\cos\big(f(x_i^a), f(x_i^n)\big) - \cos\big(f(x_i^a), f(x_i^p)\big) + \alpha_1\big]_+ + \big[\alpha_2 - \cos\big(f(x_i^a), f(x_i^p)\big)\big]_+\Big)$$

where $\cos(\cdot)$ denotes the cosine distance, computed as $\cos(x, y) = \dfrac{x \cdot y}{\|x\|_2\,\|y\|_2}$; $N$ is the number of triplets; $f(x_i^a)$ denotes the feature vector of the reference sample; $f(x_i^p)$ denotes the feature vector of the same-class positive sample; $f(x_i^n)$ denotes the feature vector of the different-class negative sample; $[z]_+ = \max(z, 0)$; $\alpha_1$ is the inter-class margin parameter; and $\alpha_2$ is the intra-class margin parameter.
In another embodiment, the method further includes: initializing the model parameters with a base model pre-trained on massive open-source face data, and adding a normalization layer and a triplet loss function layer after the feature output layer, to obtain the convolutional neural network model to be trained.
A face authentication apparatus based on Triplet Loss includes an image acquisition module, an image preprocessing module, a feature acquisition module, a computing module and an authentication module.
The image acquisition module is configured to obtain, in response to a face authentication request, a certificate photograph and a scene photograph of a person.
The image preprocessing module is configured to perform face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph.
The feature acquisition module is configured to input the scene face image and the certificate face image into a convolutional neural network model trained in advance for face authentication, and to obtain a first feature vector corresponding to the scene face image and a second feature vector corresponding to the certificate face image output by the convolutional neural network model, where the convolutional neural network model is obtained through supervised training based on a triplet loss function.
The computing module is configured to calculate the cosine distance between the first feature vector and the second feature vector.
The authentication module is configured to compare the cosine distance with a preset threshold and determine a face authentication result according to the comparison result.
In another embodiment, the apparatus further includes a sample acquisition module, a triplet acquisition module, a training module and a verification module.
The sample acquisition module is configured to obtain labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image.
The triplet acquisition module is configured to train the convolutional neural network model with the training samples and to generate a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample.
The training module is configured to train the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm.
The verification module is configured to input validation-set data into the convolutional neural network model, and to obtain the trained convolutional neural network model for face authentication when a training termination condition is reached.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the above face authentication method based on Triplet Loss.
A storage medium stores a computer program which, when executed by a processor, implements the steps of the above face authentication method based on Triplet Loss.
According to the above face authentication method, apparatus, computer device and storage medium based on Triplet Loss, face authentication is performed using a convolutional neural network trained in advance. Because the convolutional neural network model is obtained through supervised training based on a triplet loss function, and the similarity between the scene face image and the certificate face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image, and because the cosine distance measures the angle between space vectors and thus reflects differences in direction, the method better matches the distribution properties of the face feature space and improves the reliability of face authentication.
Embodiment
Fig. 1 is a structural diagram of a face authentication system based on Triplet Loss according to one embodiment. As shown in Fig. 1, the face authentication system includes a server 101 and an image acquisition device 102, which are connected via a network. The image acquisition device 102 captures a real-time scene photograph of a user to be authenticated, together with a certificate photograph, and sends the captured real-time scene photograph and certificate photograph to the server 101. The server 101 judges whether the person in the scene photograph and the person in the certificate photo are the same person, thereby authenticating the identity of the user to be authenticated. Depending on the specific application scenario, the image acquisition device 102 may be a camera, or a user terminal with a photographing function. For example, in an in-person account-opening scenario, the image acquisition device 102 may be a camera; in a scenario where a financial account is opened over the Internet, the image acquisition device 102 may be a mobile terminal with a photographing function.
In other embodiments, the face authentication system may further include a card reader for reading the certificate photo stored in the chip of a certificate (such as an identity card).
Fig. 2 is a flowchart of the face authentication method based on Triplet Loss in one embodiment. As shown in Fig. 2, the method includes the following steps.
S202: obtaining, in response to a face authentication request, a certificate photograph and a scene photograph of a person.
The certificate photograph refers to the photograph on a certificate that can prove a person's identity, such as the photo printed on an identity card or the photo stored in the certificate chip. The certificate photograph may be obtained by photographing the certificate, or by reading the photograph stored in the certificate chip with a card reader. The certificate in this embodiment may be an identity card, a driver's license, a social security card, or the like.
The scene photograph of the person refers to a photograph of the user to be authenticated captured in his or her on-site environment at the time of authentication. The on-site environment is the environment in which the user is located when the photograph is taken, and is not restricted. The scene photograph may be captured by a mobile terminal with a photographing function and sent to the server.
Face authentication refers to comparing the scene photograph of the person collected on site with the certificate photograph in the identity information, and judging whether they show the same person. The face authentication request is triggered by an actual application operation; for example, an account-opening request from a user triggers the face authentication request. An application program prompts the user on the display interface of the user terminal to capture the photographs and, after the capture is completed, sends the captured photographs to the server for face authentication.
S204: performing face detection, key-point localization and image preprocessing on the scene photograph and the certificate photograph respectively, to obtain a scene face image corresponding to the scene photograph and a certificate face image corresponding to the certificate photograph.
Face detection refers to recognizing a photograph and obtaining the face regions in the photograph.
Key-point localization refers to obtaining, for the face region detected in a photograph, the positions of the face key points in that photograph. The face key points include the eyes, the nose, the mouth-corner points, the eyebrows, and the contour points of the facial components.
In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on multi-task joint learning may be used to perform face detection and face key-point detection at the same time; alternatively, a face detection method based on LBP features together with a shape-regression-based face key-point detection method may be used.
Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected face key points in each picture, so as to obtain a size-normalized scene face image and certificate face image. The scene face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the scene photograph, and the certificate face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the certificate photograph.
S206: inputting the scene face image and the certificate face image into the convolutional neural network model trained in advance for face authentication, and obtaining the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image output by the convolutional neural network model.
The convolutional neural network model is trained in advance on training samples under the supervision of the triplet loss function. The convolutional neural network includes convolutional layers, pooling layers, activation function layers and fully connected layers, and the parameters of the neurons in each layer are determined by training. Using the trained convolutional neural network, through forward propagation, the first feature vector of the scene face image and the second feature vector of the certificate face image are obtained from the output of the fully connected layer of the convolutional neural network model.
A triplet refers to a set of three samples: one sample, called the reference sample, is selected at random from the training data; a sample belonging to the same person as the reference sample is then selected at random as the positive sample; and a sample not belonging to that person is selected as the negative sample, thus forming a (reference sample, positive sample, negative sample) triplet. Since certificate-person comparison is mainly a comparison between a certificate photo and a scene photo, rather than between two certificate photos or two scene photos, the triplets mainly have two combinations: when a certificate photo image is the reference sample, the positive sample and the negative sample are scene photos; when a scene photo image is the reference sample, the positive sample and the negative sample are certificate photos.
For each sample in a triplet, a parameter-sharing network is trained to obtain the feature representations of the three elements. The purpose of the triplet loss is to learn feature representations such that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
S208: calculating the cosine distance between the first feature vector and the second feature vector.
The cosine distance, also called cosine similarity, uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals. The larger the cosine distance between the first feature vector and the second feature vector, the greater the similarity between the scene face image and the certificate face image; the smaller the cosine distance, the smaller the similarity. The closer the cosine distance between the scene face image and the certificate face image is to 1, the more likely the two images belong to the same person; the smaller the cosine distance, the less likely the two images belong to the same person.
In the traditional triplet loss method, the Euclidean distance is used to measure the similarity between samples. However, the Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, which does not match the distribution properties of the face feature space. In this embodiment, considering the distribution properties of the face feature space and the practical application scenario, the cosine distance is used to measure the similarity between samples. The cosine distance measures the angle between space vectors and reflects differences in direction rather than in position, and therefore better matches the distribution properties of the face feature space.
Specifically, the cosine distance is calculated as:

$$\cos(x, y) = \frac{x \cdot y}{\|x\|_2\,\|y\|_2}$$

where $x$ denotes the first feature vector and $y$ denotes the second feature vector.
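As a minimal sketch of the formula above (the function name is illustrative, not from the source), the cosine distance between two feature vectors can be computed as:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between feature vectors x and y.

    Matches the formula above: cos(x, y) = (x . y) / (||x||_2 * ||y||_2).
    Values close to 1 mean the two face images are likely the same person.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Note that this measure depends only on vector direction: scaling either feature vector by a positive constant leaves the result unchanged, which is the property the embodiment relies on.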
S210: comparing the cosine distance with a preset threshold, and determining the face authentication result according to the comparison result.
The authentication result includes authentication success, i.e., the certificate photograph and the scene photograph belong to the same person, and authentication failure, i.e., the certificate photograph and the scene photograph do not belong to the same person.
Specifically, the cosine distance is compared with the preset threshold. When the cosine distance is greater than the preset threshold, the similarity between the certificate photograph and the scene photograph exceeds the preset threshold and authentication succeeds; when the cosine distance is less than the preset threshold, the similarity between the certificate photograph and the scene photograph is below the preset threshold and authentication fails.
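The decision step S210 can be sketched as follows; the threshold value 0.7 is an illustrative assumption, since the source does not specify one:

```python
import numpy as np

def authenticate(first_vec, second_vec, threshold=0.7):
    """Compare the cosine distance of the two feature vectors against a
    preset threshold and return the authentication result plus the score.

    threshold=0.7 is a placeholder; in practice it would be tuned on a
    validation set.
    """
    x = np.asarray(first_vec, dtype=float)
    y = np.asarray(second_vec, dtype=float)
    score = float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    # Above the threshold: same person; otherwise authentication fails.
    return ("pass" if score > threshold else "fail"), score
```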
In the above face authentication method based on Triplet Loss, face authentication is performed using a convolutional neural network trained in advance. Because the convolutional neural network model is obtained through supervised training based on a triplet loss function, and the similarity between the scene face image and the certificate face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the certificate face image, and because the cosine distance measures the angle between space vectors and reflects differences in direction, the method better matches the distribution properties of the face feature space and improves the reliability of face authentication.
In another embodiment, the face authentication method further includes the step of training the convolutional neural network model for face authentication. Fig. 3 is a flowchart of the step of training the convolutional neural network model for face authentication in one embodiment. As shown in Fig. 3, the step includes the following.
S302: obtaining labeled training samples, where the training samples include, for each labeled object, one labeled certificate face image and at least one labeled scene face image.
In this embodiment, a labeled object is a person, and the training samples are labeled per person with the scene face image(s) and certificate face image belonging to that person. Specifically, the scene face images and certificate face images may be obtained by performing face detection, key-point localization and image preprocessing on labeled scene photographs and certificate photographs.
Face detection refers to recognizing a photograph and obtaining the face regions in the photograph.
Key-point localization refers to obtaining, for the face region detected in a photograph, the positions of the face key points in that photograph. The face key points include the eyes, the nose, the mouth-corner points, the eyebrows, and the contour points of the facial components.
In this embodiment, the multi-task cascaded convolutional neural network (MTCNN) method based on multi-task joint learning may be used to perform face detection and face key-point detection at the same time; alternatively, a face detection method based on LBP features together with a shape-regression-based face key-point detection method may be used.
Image preprocessing refers to performing portrait alignment and cropping according to the positions of the detected face key points in each picture, so as to obtain a size-normalized scene face image and certificate face image. The scene face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the scene photograph, and the certificate face image is the face image obtained after face detection, key-point localization and image preprocessing are performed on the certificate photograph.
S304: training the convolutional neural network model with the training samples, and generating a triplet for each training sample through OHEM, where the triplet includes a reference sample, a positive sample and a negative sample.
The triplets have two combinations: when a certificate photo image is the reference sample, the positive sample and the negative sample are scene photo images; when a scene photo image is the reference sample, the positive sample and the negative sample are certificate photo images.
Specifically, taking a certificate photo as the reference image as an example, a certificate photo sample of one person is randomly selected from the training data as the reference sample; a scene photo sample belonging to the same person as the reference sample is then randomly selected as the positive sample; and a scene photo sample not belonging to that person is selected as the negative sample, thus forming a (reference sample, positive sample, negative sample) triplet.
That is, the positive sample and the reference sample are same-class samples, i.e., images of the same person, while the negative sample is a different-class sample of the reference sample, i.e., an image not belonging to the same person. The reference sample and the positive sample of each triplet are labeled in the training samples, while the negative sample is constructed online during the training of the convolutional neural network using the OHEM (Online Hard Example Mining) strategy: during each iteration of network optimization, forward computation is performed on candidate triplets with the current network, and the image in the training samples that does not belong to the same person as the reference sample and is closest to it in cosine distance (i.e., most similar to it) is selected as the negative sample, so as to obtain the triplet corresponding to each training sample.
In one embodiment, the step of training the convolutional neural network with the training samples and generating the triplet for each training sample includes the following steps S1 and S2.
S1: randomly selecting one image as a reference sample, and selecting, as a positive sample, an image that belongs to the same labeled object as the reference sample but to a different category.
A category refers to the image type. In this embodiment, the categories of the training samples include scene face images and certificate face images. Because face authentication is mainly a comparison between a certificate photo and a scene photo, the reference sample and the positive sample should belong to different categories: if the reference sample is a scene face image, the positive sample is a certificate face image; if the reference sample is a certificate face image, the positive sample is a scene face image.
S2: according to the OHEM strategy, using the cosine distances between features extracted by the currently trained convolutional neural network model, selecting for each reference sample, from the images not belonging to the same labeled object, the closest image (the one most similar to the reference sample) that belongs to a different category, as the negative sample of that reference sample.
The negative sample is selected from the labeled face images that do not belong to the same person as the reference sample. Specifically, the negative sample is constructed online during the training of the convolutional neural network using the OHEM strategy: during each iteration of network optimization, forward computation is performed on candidate triplets with the current network, and the image in the training samples that does not belong to the same person as the reference sample, is closest to it in cosine distance, and belongs to a different category is selected as the negative sample. That is, the negative sample and the reference sample belong to different categories: if the certificate photo is the reference sample of the triplet, the positive sample and the negative sample are scene photos; conversely, if a scene photo is the reference sample, the positive sample and the negative sample are certificate photos.
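The hard-negative selection of step S2 can be sketched as below. The function and argument names are illustrative assumptions; "closest" is implemented as highest cosine similarity to the anchor under the current model's features:

```python
import numpy as np

def mine_hard_negative(anchor_feat, candidate_feats, candidate_ids, anchor_id):
    """Pick the hardest negative for one reference sample (OHEM sketch).

    candidate_feats is assumed to hold features of images from the other
    category (per S1/S2); among candidates whose identity differs from the
    anchor's, the one with the highest cosine similarity to the anchor is
    returned as the hardest negative.
    """
    anchor = np.asarray(anchor_feat, dtype=float)
    anchor = anchor / np.linalg.norm(anchor)
    feats = np.asarray(candidate_feats, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = feats @ anchor                                   # cosine similarity to anchor
    sims[np.asarray(candidate_ids) == anchor_id] = -np.inf  # mask out same identity
    return int(np.argmax(sims))                             # index of hardest negative
```

In a full training loop this mining would run inside every iteration, using the network's current forward pass to produce the features, so the negatives get harder as the model improves.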
S306: training the convolutional neural network model under the supervision of the triplet loss function according to the triplet of each training sample, where the triplet loss function uses cosine distance as its metric and the model parameters are optimized by a stochastic gradient descent algorithm.
A certificate-person verification terminal verifies a user's identity by comparing the photo in the user's certificate chip with a scene photo for consistency. The data collected in the background often contain only two images per person, namely the certificate photo and the scene photo captured at comparison time, while the number of distinct individuals can be in the thousands. If such data, with a large number of categories and few same-class samples, were trained with a classification-based method, the parameters of the classification layer would be excessively large and the network would be very difficult to train; therefore, a metric learning method is adopted instead. A typical metric learning method is the triplet loss method, which learns an effective feature mapping by constructing image triplets, such that under this mapping the feature distance between same-class samples is smaller than the feature distance between different-class samples, thereby achieving correct comparison.
The purpose of the triplet loss is to learn feature representations such that the distance between the feature representations of the reference sample and the positive sample is as small as possible, the distance between the feature representations of the reference sample and the negative sample is as large as possible, and there is a minimum margin between these two distances.
In another embodiment, the triplet loss function includes a constraint on the cosine distance between same-class samples and a constraint on the cosine distance between different-class samples.
Here, same-class samples refer to the reference sample and the positive sample, and different-class samples refer to the reference sample and the negative sample. The cosine distance between same-class samples refers to the cosine distance between the reference sample and the positive sample, and the cosine distance between different-class samples refers to the cosine distance between the reference sample and the negative sample.
On the one hand, the original triplet loss method only considers the inter-class gap and does not consider the intra-class gap; if the intra-class distribution is not compact enough, the generalization ability of the network weakens and its adaptability to real scenes decreases accordingly. On the other hand, the original triplet loss method uses the Euclidean distance to measure the similarity between samples, whereas after deployment the feature comparison stage of a face model usually uses the cosine distance. The Euclidean distance measures the absolute distance between points in space and is directly related to the position coordinates of each point, while the cosine distance measures the angle between space vectors and reflects differences in direction rather than in position, and therefore better matches the distribution properties of the face feature space.
In the triplet loss method, triplet data are constructed online and input into the network, and the triplet metric loss is then back-propagated for iterative optimization. Each triplet contains three images: a reference sample, a positive sample of the same class as the reference sample, and a negative sample of a different class from the reference sample, labeled (anchor, positive, negative). The basic idea of the original triplet loss is to make, through metric learning, the distance between the reference sample and the positive sample smaller than the distance between the reference sample and the negative sample, with the difference between the two distances exceeding a minimum margin parameter $\alpha$. The original triplet loss function is therefore:

$$L = \sum_{i=1}^{N}\big[\,\|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha\,\big]_+$$

where $N$ is the number of triplets, $f(x_i^a)$ denotes the feature vector of the reference sample (anchor), $f(x_i^p)$ denotes the feature vector of the same-class positive sample (positive), $f(x_i^n)$ denotes the feature vector of the different-class negative sample (negative), $\|\cdot\|_2$ denotes the L2 norm (i.e., the Euclidean distance), and $[z]_+ = \max(z, 0)$.
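The original Euclidean-margin loss above can be sketched for a batch of triplets as follows (the default margin alpha=0.2 is an illustrative choice, not taken from the source):

```python
import numpy as np

def original_triplet_loss(anchor, positive, negative, alpha=0.2):
    """Original triplet loss over N triplets of feature vectors (rows).

    Sum over i of [ ||a_i - p_i||^2 - ||a_i - n_i||^2 + alpha ]_+ ,
    matching the formula above. Triplets already satisfying the margin
    contribute zero and produce no gradient.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # squared distance anchor-negative
    return float(np.sum(np.maximum(d_pos - d_neg + alpha, 0.0)))
```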
As can be seen from the above formula, the original triplet loss function only constrains the distance between the same-class pair (anchor, positive) and the different-class pair (anchor, negative), i.e., it increases the inter-class distance as far as possible through the margin parameter $\alpha$, but places no restriction on the intra-class distance, i.e., no constraint on the distance between same-class samples. If the intra-class distances are scattered and the variance is too large, the generalization ability of the network weakens and samples are more likely to be misclassified. Fig. 4 is a schematic diagram of the misclassification probability of samples when the inter-class margin is fixed and the intra-class variance is large, and Fig. 5 is a schematic diagram of the misclassification probability when the inter-class margin is fixed and the intra-class variance is small. As shown in Fig. 4 and Fig. 5, the shaded areas represent the misclassification probability of samples; with the same inter-class margin, the misclassification probability with large intra-class variance is significantly greater than that with small intra-class variance.
To address the above problems, the present invention proposes an improved triplet loss method, which on the one hand retains the inter-class distance constraint of the original method and on the other hand adds a constraint term on the intra-class distance, so that the intra-class distances become as compact as possible. Its loss function expression is:

$$L = \sum_{i=1}^{N}\Big(\big[\cos\big(f(x_i^a), f(x_i^n)\big) - \cos\big(f(x_i^a), f(x_i^p)\big) + \alpha_1\big]_+ + \big[\alpha_2 - \cos\big(f(x_i^a), f(x_i^p)\big)\big]_+\Big)$$

where $\cos(\cdot)$ denotes the cosine distance, computed as $\cos(x, y) = \dfrac{x \cdot y}{\|x\|_2\,\|y\|_2}$; $N$ is the number of triplets; $f(x_i^a)$ denotes the feature vector of the reference sample; $f(x_i^p)$ denotes the feature vector of the same-class positive sample; $f(x_i^n)$ denotes the feature vector of the different-class negative sample; $[z]_+ = \max(z, 0)$; $\alpha_1$ is the inter-class margin parameter; and $\alpha_2$ is the intra-class margin parameter.
Compared with the original triplet loss function, the improved triplet loss function changes the metric from Euclidean distance to cosine distance, which keeps the metric of the training stage consistent with that of the deployment stage and improves the continuity of feature learning. The first term of the new triplet loss function has the same effect as the original triplet loss and serves to enlarge the inter-class gap; the second term adds a distance constraint on same-class sample pairs (positive pairs) and serves to reduce the intra-class gap. α1 is the inter-class margin parameter, with a value range of 0 to 0.2, and α2 is the intra-class margin parameter, with a value range of 0.8 to 1.0. It should be noted that, since the measurement is performed in the cosine manner, the resulting metric corresponds to the similarity between two samples; therefore, in the expression, only those triples whose negative-pair cosine similarity is within α1 of exceeding the positive-pair cosine similarity actually participate in training.
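As an illustrative sketch (not the claimed implementation), the improved loss can be written in plain Python with NumPy; the `alpha1` and `alpha2` defaults are taken from the ranges stated above:

```python
import numpy as np

def cosine_sim(x, y):
    """Cosine similarity: x.y / (|x| * |y|)."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def improved_triplet_loss(anchors, positives, negatives, alpha1=0.2, alpha2=0.8):
    """Improved triplet loss: inter-class term plus an intra-class bound term.

    anchors/positives/negatives: lists of N feature vectors forming N triples.
    alpha1: inter-class margin (0 to 0.2); alpha2: intra-class margin (0.8 to 1.0).
    """
    total = 0.0
    for a, p, n in zip(anchors, positives, negatives):
        sim_ap = cosine_sim(a, p)                    # positive-pair similarity
        sim_an = cosine_sim(a, n)                    # negative-pair similarity
        inter = max(sim_an - sim_ap + alpha1, 0.0)   # pushes classes apart
        intra = max(alpha2 - sim_ap, 0.0)            # pulls same-class features together
        total += inter + intra
    return total / len(anchors)
```

With an already well-separated triple both hinge terms vanish and the loss is zero; a fully inverted triple is penalized by both terms.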
The model is trained based on the improved triplet loss function: back-propagation optimization is performed on the model under the joint constraint of the inter-class loss and the intra-class loss, so that same-class samples are as close as possible in the feature space while different-class samples are as far apart as possible, thereby improving the discriminability of the model and, in turn, the reliability of face authentication.
S308: the validation set data are input into the convolutional neural network, and when the training termination condition is reached, the trained convolutional neural network for face authentication is obtained.
Specifically, 90% of the data in the ID-scene image data pool are taken as the training set and the remaining 10% as the validation set. The improved triplet loss value is calculated based on the above formula and fed back into the convolutional neural network for iterative optimization. Meanwhile, the performance of the model on the validation set is monitored; when the validation performance no longer rises, the model has reached a convergent state and the training stage ends.
In the above face authentication method, on the one hand, a constraint on the intra-class sample distance is added to the loss function of the original triplet loss, so that the intra-class gap is reduced while the inter-class gap is enlarged, improving the generalization ability of the model; on the other hand, the metric of the original triplet loss is changed from Euclidean distance to cosine distance, which keeps the training and deployment metrics consistent and improves the continuity of feature learning.
In another embodiment, the step of training the convolutional neural network further includes: initializing the network with the parameters of a base model trained on massive open-source face data, and adding a normalization layer and an improved triplet loss function layer after the feature output layer, to obtain the convolutional neural network to be trained.
Specifically, when ID-versus-person verification is solved with deep learning, the performance of a conventional deep face recognition model trained on massive Internet face data drops sharply when applied to ID-photo comparison under a specific scene, while the sources of ID-scene data under that specific application scene are rather limited, and direct training often yields unsatisfactory results due to insufficient samples. There is therefore a strong need for a method that effectively extends training with scene data from a small data set, so as to improve the accuracy of the face recognition model under the specific application scene and meet market application demand.
Deep learning algorithms tend to rely on training with massive data. In ID-versus-person verification, comparing an ID photo with a scene photo is a heterogeneous sample comparison problem, and a conventional deep face recognition model trained on massive Internet face data suffers a sharp performance drop in this application. However, ID-scene data sources are limited (the ID card image and the corresponding scene image of the same person must both be available), so the amount of data available for training is small, and direct training would give unsatisfactory results due to insufficient samples. Therefore, when performing model training for ID-versus-person verification with deep learning, the idea of transfer learning is often adopted: a base model with reliable performance on open-source test sets is first trained on massive Internet face data, and a second, extended training is then carried out on the limited ID-scene data, so that the model automatically learns modality-specific feature representations and its performance is improved. This process is shown in Fig. 6.
During the second training, the whole network is initialized with the pre-trained base model parameters, and an L2 normalization layer and an improved triplet loss layer are then added after the feature output layer of the network. The structure of the convolutional neural network to be trained is shown in Fig. 7.
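The L2 normalization layer mentioned here simply rescales each feature vector to unit length, so that the inner product of two normalized features equals their cosine similarity; a minimal sketch:

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each row (feature vector) to unit L2 norm.

    After this layer, the dot product of two features equals their cosine
    similarity, matching the deployment-stage metric.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.maximum(norms, eps)  # eps guards against zero vectors
```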
In one embodiment, a flow diagram of a face authentication method is shown in Fig. 8 and includes three stages: a data acquisition and preprocessing stage, a training stage, and a deployment stage.
In the data acquisition and preprocessing stage, the ID chip photo is read by the card reader module of the ID verification terminal device and the scene photo is captured by the front camera; size-normalized ID face images and scene face images are then obtained through a face detector, a key point detector, and a face alignment and cropping module.
In the training stage, 90% of the data in the ID-scene image data pool are taken as the training set and the remaining 10% as the validation set. Since ID-versus-person comparison is mainly between an ID photo and a scene photo, if the ID photo is used as the reference image (anchor) in a triple, the other two images are scene photos; conversely, if a scene photo is used as the reference image, the other two images are ID photos. Triples are constructed online using the OHEM strategy: during each iterative optimization of the network, forward computation is performed on candidate triples with the current network, and the effective triples that satisfy the condition are screened out; the improved triplet loss value is calculated according to the above formula and fed back into the network for iterative optimization. Meanwhile, the performance of the model on the validation set is monitored; when the validation performance no longer rises, the model has reached a convergent state and the training stage ends.
In the deployment stage, the trained model is deployed to the ID verification terminal for use. The images collected by the device pass through the same preprocessing as in the training stage; the feature vector of each face image is then obtained through forward computation of the network, the similarity of the two images is obtained by calculating the cosine distance, and a decision is made according to a preset threshold: above the threshold, the two are judged to be the same person; otherwise, different persons.
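The deployment-stage decision reduces to one cosine similarity computation and a threshold comparison; a minimal sketch, in which the 0.75 threshold is illustrative rather than a value given in the text:

```python
import numpy as np

def verify_faces(scene_feat, id_feat, threshold=0.75):
    """Compare two feature vectors by cosine similarity against a preset
    threshold; returns (similarity, same_person)."""
    sim = float(np.dot(scene_feat, id_feat) /
                (np.linalg.norm(scene_feat) * np.linalg.norm(id_feat)))
    return sim, sim > threshold  # above the threshold -> judged the same person
```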
In the above face authentication method, whereas the original triplet loss function only defines a learning relation on the inter-class distance, a bound term on the intra-class distance is added by improving the original triplet loss function, so that the network reduces the intra-class gap as much as possible while enlarging the inter-class gap during training, thereby improving the generalization ability of the network and, in turn, the scene adaptability of the model. In addition, replacing the Euclidean distance of the original triplet loss with a cosine-distance metric better fits the distribution properties of the face feature space and keeps the metric of the training stage consistent with that of the deployment stage, making the comparison result more reliable.
In one embodiment, a face authentication apparatus is provided which, as shown in Fig. 9, includes: an image acquisition module 902, an image preprocessing module 904, a feature acquisition module 906, a computing module 908, and an authentication module 910.
The image acquisition module 902 is configured to obtain, based on a face authentication request, an ID photograph and a scene photo of a person.
The image preprocessing module 904 is configured to perform face detection, key point location, and image preprocessing on the scene photo and the ID photograph respectively, to obtain the scene face image corresponding to the scene photo and the ID face image corresponding to the ID photograph.
The feature acquisition module 906 is configured to input the scene face image and the ID face image into the pre-trained convolutional neural network model for face authentication, and to obtain the first feature vector corresponding to the scene face image and the second feature vector corresponding to the ID face image output by the convolutional neural network model; wherein the convolutional neural network model is obtained through supervised training based on the triplet loss function.
The computing module 908 is configured to calculate the cosine distance between the first feature vector and the second feature vector.
The authentication module 910 is configured to compare the cosine distance with a preset threshold and determine the face authentication result according to the comparison result.
In the above face authentication apparatus, face authentication is performed with a pre-trained convolutional neural network. The convolutional neural network model is obtained through supervised training based on the improved triplet loss function, and the similarity between the scene face image and the ID face image is calculated from the cosine distance between the first feature vector corresponding to the scene face image and the second feature vector corresponding to the ID face image. Since the cosine distance measures the angle between space vectors, it reflects the difference in direction rather than in position, which better fits the distribution properties of the face feature space and improves the reliability of face authentication.
As shown in Fig. 9, in another embodiment the face authentication apparatus further includes: a sample acquisition module 912, a triple acquisition module 914, a training module 916, and a verification module 918.
The sample acquisition module 912 is configured to obtain labeled training samples, the training samples including, for each labeled object, one labeled ID face image and at least one scene face image.
The triple acquisition module 914 is configured to train the convolutional neural network model according to the training samples, and to generate, through OHEM, the triple elements corresponding to each training sample; the triple elements include a reference sample, a positive sample, and a negative sample.
Specifically, the triple acquisition module 914 is configured to randomly select one image as the reference sample, and to select an image that belongs to the same labeled object but differs from the reference sample as the positive sample; it is further configured to, according to the OHEM strategy and using the cosine distance between features extracted by the currently trained convolutional neural network model, select for each reference sample, from among the face images that do not belong to the same labeled object, the image with the smallest distance from the reference sample as the negative sample of that reference sample.
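The negative-sample selection described here (pick, among images of other labeled objects, the one at minimum cosine distance from the reference, i.e. the most confusable one) can be sketched as:

```python
import numpy as np

def pick_hardest_negative(ref_feat, ref_label, feats, labels):
    """Among candidates with a different label, return the index of the one
    with the smallest cosine distance to the reference (hardest negative)."""
    def cos_dist(x, y):
        return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    best_idx, best_dist = -1, float("inf")
    for i, (f, lab) in enumerate(zip(feats, labels)):
        if lab == ref_label:
            continue  # same identity: cannot serve as a negative
        d = cos_dist(ref_feat, f)
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx
```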
Specifically, when an ID photo is used as the reference sample, the positive sample and the negative sample are scene photos; when a scene photo is used as the reference sample, the positive sample and the negative sample are ID photos.
The training module 916 is configured to train the convolutional neural network model according to the triple elements of each training sample under the supervision of the triplet loss function; the triplet loss function uses cosine distance as its metric, and the model parameters are optimized by a stochastic gradient descent algorithm.
Specifically, the improved triplet loss function includes a bound on the cosine distance of same-class samples and a bound on the cosine distance of different-class samples.
The improved triplet loss function is:

L = (sum over i = 1 … N) { [ cos(f_i^a, f_i^n) − cos(f_i^a, f_i^p) + α1 ]_+ + [ α2 − cos(f_i^a, f_i^p) ]_+ }

where cos(·,·) denotes the cosine measure, computed as cos(x, y) = x·y / (‖x‖·‖y‖); N is the number of triples; f_i^a denotes the feature vector of the reference sample, f_i^p the feature vector of the same-class positive sample, and f_i^n the feature vector of the different-class negative sample; [z]_+ means max(z, 0); α1 is the inter-class margin parameter and α2 is the intra-class margin parameter.
The verification module 918 is configured to input the validation set data into the convolutional neural network model and, when the training termination condition is reached, obtain the trained convolutional neural network model for face authentication.
In another embodiment, the face authentication apparatus further includes a model initialization module 920, configured to initialize the network with the parameters of a base model trained on massive open-source face data, and to add a normalization layer and a triplet loss function layer after the feature output layer, obtaining the convolutional neural network to be trained. In the above face authentication apparatus, on the one hand, a constraint on the intra-class sample distance is added to the loss function of the original triplet loss, so that the intra-class gap is reduced while the inter-class gap is enlarged, improving the generalization ability of the model; on the other hand, the metric of the original triplet loss is changed from Euclidean distance to cosine distance, keeping the training and deployment metrics consistent and improving the continuity of feature learning.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the face authentication method of each of the above embodiments.
A storage medium has a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the face authentication method of each of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as no contradiction exists in the combination of these technical features, they shall all be regarded as falling within the scope recorded in this specification.
The above embodiments express only several implementations of the present invention, and their descriptions are specific and detailed, but they shall not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.